Research Publication

Autoencoders for genomic variation analysis.

Geleta Margarita, M Montserrat, Daniel Mas DM et al.

41558827 PubMed ID
6 Authors
2026-02-03 Published
Chapter I

Publication Details

Comprehensive information about this research publication

Authors

Geleta Margarita
M Montserrat
Daniel Mas DM
Giro-I-Nieto Xavier
X Ioannidis
Alexander G AG
Chapter II

Abstract

Summary of the research findings

Modern biobanks are providing numerous high-resolution genomic sequences of diverse populations. In order to account for diverse and admixed populations, new algorithmic tools are needed in order to properly capture the genetic composition of populations. Here, we explore deep learning techniques, namely, variational autoencoders (VAEs), to process genomic data from a population perspective. We show the power of VAEs for a variety of tasks relating to the interpretation, compression, classification, and simulation of genomic data with several worldwide whole genome data sets from both humans and canids, and evaluate the performance of the proposed applications with and without ancestry conditioning. The unsupervised setting of autoencoders allows for the detection and learning of granular population structure and inferring of informative latent factors. The learned latent spaces of VAEs are able to capture and represent differentiated Gaussian-like clusters of samples with similar genetic composition on a fine scale from single nucleotide polymorphisms (SNPs), enabling applications in dimensionality reduction and data simulation. These individual genotype sequences can then be decomposed into latent representations and reconstruction errors (residuals), which provide a sparse representation useful for lossless compression. We show that different populations have differentiated compression ratios and classification accuracies. Additionally, we analyze the entropy of the SNP data, its effect on compression across populations, and its relation to historical migrations, and we show how to introduce autoencoders into existing compression pipelines.
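
The decomposition into a latent representation plus residuals can be illustrated with a toy sketch. It is not the authors' method: no VAE is trained here. As a loudly labeled stand-in, the "latent code" is just a population label and the "decoder" is that population's per-site majority allele. The data, the frequencies 0.05 and 0.95, and every function name are made up for illustration. The point is only that prediction XOR residual recovers the genotype exactly, and that the residuals are sparser (lower entropy) than the raw sequence, so a general-purpose compressor such as DEFLATE stores them in fewer bytes.

```python
import math
import random
import zlib

random.seed(0)
N_SITES = 2000  # hypothetical number of biallelic SNP sites

def simulate(freqs, n):
    """Draw n binary genotype vectors with per-site alternate-allele frequencies."""
    return [[1 if random.random() < f else 0 for f in freqs] for _ in range(n)]

# Two made-up populations; each site is mostly reference (0.05) or mostly alternate (0.95).
freqs = {k: [random.choice([0.05, 0.95]) for _ in range(N_SITES)] for k in (0, 1)}
samples = [(k, g) for k in (0, 1) for g in simulate(freqs[k], 50)]

def consensus(rows):
    """Per-site majority allele across a list of genotype vectors."""
    return [1 if 2 * sum(col) > len(rows) else 0 for col in zip(*rows)]

# Stand-in "decoder" (assumption, not a VAE): latent code = population label,
# prediction = that population's consensus sequence.
decoder = {k: consensus([g for lab, g in samples if lab == k]) for k in (0, 1)}

def encode(label, genotype):
    """Return the latent code and the XOR residual against the decoder's prediction."""
    return label, [g ^ p for g, p in zip(genotype, decoder[label])]

def decode(label, residual):
    """Invert encode exactly: prediction XOR residual gives back the genotype."""
    return [r ^ p for r, p in zip(residual, decoder[label])]

def mean_site_entropy(rows):
    """Average binary Shannon entropy (bits per site) across samples."""
    total = 0.0
    for col in zip(*rows):
        p = sum(col) / len(col)
        if 0.0 < p < 1.0:
            total -= p * math.log2(p) + (1 - p) * math.log2(1 - p)
    return total / len(rows[0])

def deflate_size(rows):
    """Total bytes after DEFLATE, storing one byte per allele before compression."""
    return sum(len(zlib.compress(bytes(r), 9)) for r in rows)

raw = [g for _, g in samples]
coded = [encode(lab, g) for lab, g in samples]
residuals = [res for _, res in coded]
restored = [decode(lab, res) for lab, res in coded]

print("lossless:", restored == raw)
print("entropy raw vs residual:", mean_site_entropy(raw), mean_site_entropy(residuals))
print("bytes raw vs residual:", deflate_size(raw), deflate_size(residuals))
```

In a real pipeline the decoder's prediction would come from a trained model and the latent code would also have to be stored alongside the residual. The sketch ignores that cost because the label here is a single integer.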

Chapter III

Analysis

Comprehensive review of ancestry and genetic findings

Important Disclaimer: This review has been performed semi-automatically and is provided for informational purposes only. While we strive for accuracy, this analysis may contain errors, omissions, or misinterpretations of the original research. DNA Genics disclaims all liability for any inaccuracies, errors, or consequences arising from the use of this information. Users should independently verify all information and consult original research publications before making any decisions based on this content. This analysis is not intended as a substitute for professional scientific review or medical advice.
