Research Publication

GhostHunter: A Multi-Test Framework for Detecting Ghost Introgression.

Margaret Wanjiku

PubMed ID: 42094433
Authors: 1
Published: 2026-04-28
Chapter I

Publication Details

Comprehensive information about this research publication

Authors

Margaret Wanjiku
Chapter II

Abstract

Summary of the research findings

Gene flow from extinct or unsampled ghost populations is increasingly being discovered across species, but it remains difficult to detect without genomes from donor populations. Signals of ghost introgression can also resemble other demographic events, including bottlenecks, population structure, or migration among sampled populations. We introduce GhostHunter, a multi-step framework that combines (1) analyses of coalescent time distributions across loci, (2) goodness-of-fit and likelihood-based tests under the isolation-with-migration (IM) model, and (3) population structure inference under an admixture model to detect ghost introgression from population genomic data. By integrating multiple independent tests, GhostHunter is designed to capture different consequences of ghost introgression, including hidden ancestry components, heterogeneity in genealogical histories across the genome, and improved fit of demographic models that include an unsampled source. Using extensive simulations under the IM model with varying ghost migration rates and divergence times, we show that ghost introgression leaves clear genome-wide signatures in distributions of time to the most recent common ancestor (TMRCA). In particular, TMRCA distributions show multimodality and step-like patterns in the empirical cumulative distribution function (ECDF), even when median coalescent times are similar across scenarios. Likelihood comparisons under IM models consistently support the presence of unsampled lineages, although estimating ghost gene flow is more difficult and depends on admixture strength and divergence time. Model-based clustering also indicates that ghost introgression can increase support for additional population structure, although support is generally concentrated at low values of K. We then apply GhostHunter to genomic data from Central Europeans (CEU) and Han Chinese (CHS) in the 1000 Genomes Project Phase 3.
In these data, TMRCA estimates show strong heterogeneity across genomic windows (62,748 TMRCAs; median 42,471 generations; median KS D = 0.271; Hartigan's dip test rejects unimodality), consistent with mixed genealogical histories across the genome. Population structure analyses favor K = 2, consistent with a strong CEU-CHS split. In contrast, IMa3 comparisons of models with and without ghost gene flow do not support estimating non-zero ghost gene flow in the empirical dataset, suggesting that genealogical heterogeneity is clearer than the migration-parameter signal in this case. Overall, these results support GhostHunter as a practical first-pass screening framework for flagging hidden ancestry and reducing avoidable errors before full demographic inference, especially in non-model systems. GhostHunter is available as an open-source Snakemake pipeline at https://github.com/Megmugure/GhostHunter.
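
The ECDF and KS D quantities mentioned in the abstract can be illustrated with a small, self-contained sketch. This is not GhostHunter's code and does not reproduce the paper's simulations or its reported statistics. It assumes a deliberately simple toy model in which each locus has an exponentially distributed TMRCA and a fixed fraction of loci coalesce deeper by a constant offset, standing in for lineages that trace through a ghost population. The function names (ecdf, ks_statistic, simulate_tmrca) and all parameter values are hypothetical. Hartigan's dip test is omitted because it is not part of the Python standard library.

```python
import random
from bisect import bisect_right


def ecdf(sample):
    """Return a function F(x) giving the fraction of sample values <= x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: largest vertical gap between two ECDFs."""
    fa, fb = ecdf(a), ecdf(b)
    # Both ECDFs are step functions, so the largest gap occurs at an observed value.
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))


def simulate_tmrca(n_loci, mean_time, ghost_fraction=0.0, ghost_offset=0.0, seed=0):
    """Toy model (an assumption, not the paper's simulator).

    Each locus draws an exponential TMRCA with the given mean. With probability
    ghost_fraction the locus coalesces ghost_offset generations deeper.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n_loci):
        t = rng.expovariate(1.0 / mean_time)
        if rng.random() < ghost_fraction:
            t += ghost_offset
        times.append(t)
    return times


# Hypothetical parameter values chosen only to make the effect visible.
baseline = simulate_tmrca(5000, mean_time=20000, seed=1)
ghost = simulate_tmrca(5000, mean_time=20000, ghost_fraction=0.2,
                       ghost_offset=100000, seed=2)

d = ks_statistic(baseline, ghost)
print(f"KS D between baseline and ghost scenarios: {d:.3f}")

# A plateau in the ghost ECDF between the two modes is the step-like pattern:
# the ECDF barely rises across the gap separating shallow and deep loci.
f_ghost = ecdf(ghost)
print(f"Ghost ECDF at 80,000 and 100,000 generations: "
      f"{f_ghost(80000):.3f}, {f_ghost(100000):.3f}")
```

In this toy setting the ghost scenario's ECDF flattens in the gap between the shallow and deep coalescence modes, which is one way a step-like pattern can arise. Real TMRCA estimates are noisier, so any threshold on D would need calibration against simulations under the demographic model of interest.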

