High resolution analysis of recent population structure using rare variants.
Lei Huang, Thiseas C. Lamnidis, et al.
Abstract
Identifying population structure from genetic data is a key challenge, for which several statistical methods have been developed, including F-statistics, which measure the average correlation in allele frequency differences between two pairs of populations. F-statistics are typically applied to a subset of genetic variation within the common allele frequency band, available through microarrays and SNP enrichment techniques. Recent advances in sequencing technology increasingly allow the generation of whole-genome sequencing data, both ancient and modern, which not only enables querying nearly every base of the genome, but also contains numerous rare variants. Rare variants, with their more population-specific distribution, allow detection of recent population structure with much finer resolution than common variants, an opportunity that has so far been under-exploited. Here, we develop a new statistical method, RAS (Rare Allele Sharing), for summarizing rare allele frequency correlations, similar to F-statistics but with flexible ascertainment on allele frequencies. We test RAS on both published and simulated data and find that RAS, with appropriate ascertainment, has better resolution than genome-wide F-statistics in identifying population structure caused by recent demographic events. Leveraging this, we further develop the use of RAS to compute ancestry proportions accurately in cases of recently diverged and closely related source populations. We implemented the new statistical methods as an R package and a command-line tool. In summary, our method can provide new perspectives to identify and model population structure, allowing us to understand more subtle relationships among populations in the recent human past.
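To make the contrast between the two kinds of statistic concrete, here is a minimal Python sketch. It is not the authors' R package or command-line tool, and it does not reproduce their exact definition of RAS. The function names, the `max_freq` threshold, and the choice to ascertain on allele frequency in a separate reference panel are all assumptions made for illustration. `f4` follows the standard definition of the statistic as the average product of allele frequency differences between two pairs of populations. `rare_allele_sharing` is a hypothetical stand-in that averages the product of two populations' allele frequencies over sites where the allele is present but rare in the reference panel.

```python
from typing import Sequence


def f4(a: Sequence[float], b: Sequence[float],
       c: Sequence[float], d: Sequence[float]) -> float:
    """Standard f4: mean over sites of (a_i - b_i) * (c_i - d_i)."""
    if not (len(a) == len(b) == len(c) == len(d)) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d))
    return total / len(a)


def rare_allele_sharing(x: Sequence[float], y: Sequence[float],
                        ref_freq: Sequence[float],
                        max_freq: float = 0.05) -> float:
    """Illustrative rare-allele-sharing score (an assumption, not the paper's formula).

    Keeps only sites whose allele frequency in the reference panel is
    above zero and at most max_freq, then averages x_i * y_i over them.
    """
    if not (len(x) == len(y) == len(ref_freq)):
        raise ValueError("inputs must be of equal length")
    kept = [i for i, f in enumerate(ref_freq) if 0.0 < f <= max_freq]
    if not kept:
        raise ValueError("no sites pass the ascertainment filter")
    return sum(x[i] * y[i] for i in kept) / len(kept)


# Toy per-site allele frequencies (made-up numbers, six sites).
ref_freq = [0.01, 0.02, 0.30, 0.50, 0.04, 0.0]
pop_a = [0.5, 0.0, 0.3, 0.5, 0.5, 0.0]
pop_b = [0.5, 0.0, 0.3, 0.5, 0.0, 0.0]
pop_c = [0.0, 0.5, 0.3, 0.5, 0.0, 0.0]

ras_ab = rare_allele_sharing(pop_a, pop_b, ref_freq)
ras_ac = rare_allele_sharing(pop_a, pop_c, ref_freq)
```

In this toy data the three populations have identical frequencies at the two common sites, so common variants alone cannot separate them. The rare sites show that `pop_a` shares a rare allele with `pop_b` but none with `pop_c`, which is the kind of recent, population-specific signal the abstract describes.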