Genetic structure analysis of a Northwest Chinese population using a self-developed DIP system and machine learning for forensic ancestry inference.
Cai Meiming, M Shen, Ruonan R et al.
Publication Details
Abstract
This study investigates forensic ancestry inference in admixed populations using a self-developed 60-DIP panel, analyzing Kyrgyz samples from Northwest China (CNK) and 26 reference populations from the 1000 Genomes Project. Genetic analyses (PCA, ADMIXTURE, TreeMix) indicated that the CNK group shares core ancestry with East Asian populations while exhibiting a partially distinct genetic structure, consistent with previous genome-wide studies. Although inferences regarding deep demographic history are preliminary given the limited number of markers, the DIP panel achieved high forensic efficiency, with a cumulative probability of discrimination > 0.99999999999 and a cumulative probability of exclusion > 0.9996. In ancestry modeling, machine learning algorithms significantly outperformed traditional supervised dimensionality reduction methods. At the continental level, XGBoost achieved the highest accuracy (0.919) and strong performance across all major ancestries, with near-perfect discrimination of African and East Asian populations. For East Asian subregional classification, random forest achieved the best performance (accuracy = 0.709) and showed the highest precision for the CNK group. The loci rs5891435 and rs35171885 were key for continental and subregional differentiation. The results support the East Asian background of the CNK group and show that machine learning with an optimized DIP panel can substantially improve ancestry inference in the admixed CNK group and similar settings, providing a promising and practically useful forensic biogeographical ancestry tool that warrants further validation in broader datasets.
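As a rough illustration of how a cumulative figure such as the probability of discrimination above is commonly derived from per-locus values, the sketch below applies the standard product rule, 1 - prod(1 - p_i), under the assumption that the loci are independent. The abstract does not state the authors' exact computation, and the function names and input values here are hypothetical, not data from the study.

```python
from math import prod


def combined_complement(per_locus):
    """Return prod(1 - p_i): the chance that no locus discriminates.

    Assumes the loci are independent (an assumption of this sketch).
    """
    if not all(0.0 <= p <= 1.0 for p in per_locus):
        raise ValueError("each per-locus value must lie in [0, 1]")
    return prod(1.0 - p for p in per_locus)


def cumulative_power(per_locus):
    """Return the cumulative power 1 - prod(1 - p_i)."""
    return 1.0 - combined_complement(per_locus)


# Hypothetical per-locus discrimination powers, NOT values from the study.
toy_panel = [0.5, 0.5, 0.5]
print(cumulative_power(toy_panel))

# For a large panel the cumulative power rounds to 1.0 in floating point,
# so the complement is the more informative number to report.
hypothetical_60_loci = [0.6] * 60
print(combined_complement(hypothetical_60_loci))
```

The same product rule is commonly applied to per-locus probabilities of exclusion; reporting the complement avoids losing precision when the cumulative value is extremely close to 1.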
Analysis
Important Disclaimer: This review has been performed semi-automatically and is provided for informational purposes only. While we strive for accuracy, this analysis may contain errors, omissions, or misinterpretations of the original research. DNA Genics disclaims all liability for any inaccuracies, errors, or consequences arising from the use of this information. Users should independently verify all information and consult original research publications before making any decisions based on this content. This analysis is not intended as a substitute for professional scientific review or medical advice.