Every Result Has a Method
A transparent overview of the methods used across our reports and tools. Understanding how results are produced is the first step to interpreting them responsibly.
Why Methods Matter
Genetic ancestry results do not appear on their own.
Every result you see—whether a percentage, a distance score, or a population match—is produced through models, reference data, and assumptions. These methods are not invisible machinery; they are choices that shape what the results can and cannot tell you.
Understanding those methods is essential to interpreting results responsibly. Without this understanding, it is easy to read too much into numbers, or to miss what they actually represent.
This page exists because we believe that transparency about methodology builds trust—and because informed users make better sense of their results.
Core Principles of Our Analysis
Before techniques, there are principles. These guide every model we build and every result we present.
Population Genetics Over Labels
We work with genetic patterns, not ethnic categories. Labels like "European" or "African" are geographic conveniences—the genetics are far more complex and continuous.
Models Over Categories
Our results come from statistical models, not fixed categories. Models are tools for understanding—they illuminate patterns, but they don't define who you are.
Transparency Over Simplification
We would rather explain complexity honestly than hide it behind simple-seeming numbers. Simplification that misleads is worse than complexity that informs.
Context Over Certainty
Results gain meaning through context: the model used, the reference populations included, the questions being asked. Without context, numbers can mislead.
Continuous Improvement
Science evolves. Reference datasets grow. Methods improve. We update our models as the field advances, which means results may change over time—and that is a feature, not a flaw.
Data Sources & Reference Panels
Every model reflects the data it is built on. Understanding our sources helps interpret what results mean.
Modern Population References
We use reference samples from global populations, drawn from peer-reviewed datasets including the Human Genome Diversity Project (HGDP), the 1000 Genomes Project, and curated academic collections.
Modern references represent current genetic diversity—not historical populations.
Ancient DNA Datasets
Where applicable, we incorporate ancient DNA from archaeological samples. These provide windows into past genetic variation, but coverage is uneven—some regions and time periods have more samples than others.
Ancient DNA is fragmentary by nature; not all analyses can include it.
Publicly Available, Peer-Reviewed Sources
Our reference panels draw primarily from publicly available, peer-reviewed datasets. This allows reproducibility and enables researchers to understand the basis of our models.
We do not use proprietary or inaccessible data without clear documentation.
Known Coverage Limitations
Global genetic sampling is uneven. European and East Asian populations are overrepresented; many African, Indigenous American, and Pacific populations are underrepresented. This affects model resolution.
Results are more precise for well-sampled regions; less precise for others.
Modeling Approaches Used
Different methods ask different questions. Here is how the main approaches work.
Admixture & Clustering Models
Percentage breakdowns by population cluster
Admixture models use clustering algorithms to identify groups of individuals who share similar genetic patterns. Your DNA is then compared against these clusters to estimate how it distributes across them.
The "K" in model names (K7, K12, K72) refers to the number of ancestral components. Higher K does not mean "more accurate"—it means the genetic variation is being split into more groups. Different K values offer different resolutions.
What it shows: Statistical similarity to reference clusters.
What it does not show: Where your ancestors "came from" in a historical sense.
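To make the idea of distributing DNA across clusters concrete, here is a minimal sketch of one common formulation: reference allele frequencies are held fixed and an individual's proportions are estimated by expectation-maximization. The function name, the toy frequencies, and the two-component setup are illustrative assumptions, not a description of our production models.

```python
def estimate_proportions(genotype, freqs, iterations=200):
    """Estimate mixture proportions for one individual by EM.

    genotype: alt-allele counts (0, 1, or 2), one per marker.
    freqs:    freqs[c][j] is the alt-allele frequency of component c at
              marker j (assumed strictly between 0 and 1).
    """
    k = len(freqs)
    n_copies = 2 * len(genotype)
    q = [1.0 / k] * k  # start from equal proportions
    for _ in range(iterations):
        weight = [0.0] * k
        for j, g in enumerate(genotype):
            # Each marker carries g alt copies and 2 - g ref copies.
            for copies, is_alt in ((g, True), (2 - g, False)):
                if copies == 0:
                    continue
                like = [q[c] * (freqs[c][j] if is_alt else 1.0 - freqs[c][j])
                        for c in range(k)]
                total = sum(like)
                # Share each allele copy among components by posterior weight.
                for c in range(k):
                    weight[c] += copies * like[c] / total
        q = [w / n_copies for w in weight]
    return q


# Hypothetical two-component reference and a toy six-marker genotype.
toy_freqs = [
    [0.9, 0.9, 0.1, 0.1, 0.9, 0.1],  # component 0
    [0.1, 0.1, 0.9, 0.9, 0.1, 0.9],  # component 1
]
toy_genotype = [2, 2, 0, 0, 2, 0]
proportions = estimate_proportions(toy_genotype, toy_freqs)
print([round(p, 3) for p in proportions])
```

The output is a set of proportions that sums to one. Changing the number of rows in `toy_freqs` changes K, and with it the way the same genotype is partitioned.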
Coordinate-Based Representations
Positioning in genetic space (G25, PCA)
Coordinate systems like G25 reduce complex genetic data into a smaller number of dimensions (25 in the case of G25). These dimensions capture the major axes of genetic variation across global populations.
Your position in this space reflects your genetic similarity to other samples. Samples that cluster together are genetically similar; samples far apart are more different.
What it shows: Your position relative to reference populations in genetic space.
What it does not show: Direct ancestry relationships or historical migrations.
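As a rough illustration of how a sample receives coordinates, the sketch below centers a genotype vector on a reference mean and takes its dot product with each axis. The axes and mean here are made-up toy values, and the function name is hypothetical; real coordinate systems use many more markers and axes derived from reference data.

```python
def project(sample, mean, axes):
    """Project a genotype vector onto precomputed axes.

    sample: per-marker values for one individual.
    mean:   per-marker mean of the reference panel (assumed given).
    axes:   list of axis vectors, one per output dimension (assumed given).
    """
    centered = [x - m for x, m in zip(sample, mean)]
    return [sum(c * a for c, a in zip(centered, axis)) for axis in axes]


# Toy example: three markers reduced to two coordinates.
toy_mean = [1.0, 1.0, 1.0]
toy_axes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(project([1.0, 2.0, 3.0], toy_mean, toy_axes))
```

The resulting numbers are positions along mathematical axes, which is why they should not be read as geographic locations.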
Comparative Distance Analysis
Measuring genetic similarity through distance
Distance-based methods calculate how similar or different your DNA is from reference samples. This produces ranked lists of closest matches—populations or ancient individuals whose genetic profiles are most similar to yours.
Genetic distance is a measure of similarity, not a measure of ancestry percentage or geographic origin. Two populations can be genetically close for many reasons, including shared ancient ancestry, recent gene flow, or similar demographic histories.
What it shows: Which samples are most genetically similar to you.
What it does not show: That you "descend from" those populations directly.
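A ranked list of closest matches can be sketched as follows, assuming samples are already expressed as coordinate vectors and that plain Euclidean distance is the metric. The population names and values are placeholders, and other distance measures are possible.

```python
import math


def rank_by_distance(target, references):
    """Return (name, distance) pairs sorted from closest to farthest."""
    distances = [(name, math.dist(target, coords))
                 for name, coords in references.items()]
    return sorted(distances, key=lambda pair: pair[1])


# Hypothetical reference coordinates in a two-dimensional toy space.
toy_references = {"Pop_A": [3.0, 4.0], "Pop_B": [1.0, 0.0]}
for name, distance in rank_by_distance([0.0, 0.0], toy_references):
    print(name, distance)
```

The ranking says which reference is nearest in this space and nothing about how that nearness arose.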
Projection onto Reference Spaces
Placing your sample within established frameworks
Many analyses work by projecting your sample onto a reference space that was built from curated populations. Your coordinates or percentages are estimated based on how you fit within this pre-defined space.
This means results depend on which populations were used to build the reference. If a population similar to your actual ancestry was not included, you may appear as a mixture of the closest available alternatives.
What it shows: How you fit within the reference framework.
What it does not show: Populations outside the reference that might be closer matches.
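The effect of a missing reference can be illustrated with a two-source fit: the sketch below finds the blend of two reference points that lies closest to a target and reports the leftover distance. The function name and coordinates are hypothetical, and real models typically consider many sources at once.

```python
import math


def two_way_fit(target, source_a, source_b):
    """Best-fitting blend of two sources and the residual distance.

    Returns (alpha, residual), where alpha is the weight on source_a,
    clipped to the range 0..1.
    """
    diff = [a - b for a, b in zip(source_a, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two sources are identical")
    alpha = sum((t - b) * d for t, b, d in zip(target, source_b, diff)) / denom
    alpha = min(1.0, max(0.0, alpha))
    fitted = [alpha * a + (1 - alpha) * b for a, b in zip(source_a, source_b)]
    return alpha, math.dist(target, fitted)


# The target sits off the line between the two sources, as it would if
# its closest real population were absent from the reference.
alpha, residual = two_way_fit([0.5, 1.0], [1.0, 0.0], [0.0, 0.0])
print(alpha, residual)
```

The fit still returns a clean-looking split between the two sources. The residual is the hint that neither source, nor any blend of them, actually describes the target well.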
Haplogroup Analysis
Tracing maternal and paternal lineages
Haplogroups are defined by specific mutations in mitochondrial DNA (mtDNA) or the Y-chromosome. They represent deep lineages that can be traced back tens of thousands of years—your direct maternal line (mtDNA) and direct paternal line (Y-DNA).
Unlike admixture analysis, which considers your entire genome, haplogroups trace single lineages. They tell you about one line of ancestors, not your full ancestry.
What it shows: The deep lineage of your direct maternal/paternal line.
What it does not show: Your full ancestry. Each haplogroup follows a single line of descent and reflects only a small fraction of your genome.
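Haplogroup assignment can be pictured as walking down a tree of defining mutations until no deeper branch matches. The tree, branch names, and marker names below are invented for illustration and do not correspond to real haplogroups.

```python
# Hypothetical tree: each branch maps its child branches to the
# mutation that defines them.
TOY_TREE = {
    "Root": {"Branch1": "m1", "Branch2": "m2"},
    "Branch1": {"Branch1a": "m3", "Branch1b": "m4"},
    "Branch2": {"Branch2a": "m5"},
}


def assign_haplogroup(mutations, tree, root="Root"):
    """Follow defining mutations from the root to the deepest matching branch."""
    current = root
    while True:
        children = tree.get(current, {})
        match = next((child for child, marker in children.items()
                      if marker in mutations), None)
        if match is None:
            return current  # no deeper branch is supported by the sample
        current = match


print(assign_haplogroup({"m1", "m4"}, TOY_TREE))
```

Because the walk follows one path through one tree, the result describes a single lineage and leaves every other ancestor out of view.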
Advanced Analysis Methods
Beyond standard admixture percentages, we employ chromosome-level analysis techniques that reveal deeper layers of your genetic heritage.
What Sets These Methods Apart
Most ancestry services stop at population percentages. We go further—analyzing your DNA at the chromosome level, separating parental contributions, and identifying actual shared DNA segments with ancient and modern individuals. These techniques require significantly more computational resources and scientific expertise, but they provide insights that percentage-based approaches simply cannot offer.
Phasing: Parental Separation
Separating maternal from paternal chromosomes
Your DNA is a mixture—you inherited one copy of each chromosome from your mother and one from your father. Standard ancestry tests analyze this mixture as a whole. Phasing computationally separates these parental contributions, creating two distinct genomic profiles from a single test.
This separation enables parent-specific ancestry analysis: you can see which ancestral components travel together on the same parental copy, even without testing your parents. Labeling a copy as maternal or paternal generally requires additional information, such as a tested relative. When combined with Local Ancestry Inference, phasing shows which reference populations each parental chromosome most resembles.
What it enables: Parent-specific ancestry percentages, clearer inheritance patterns, and more accurate chromosome painting.
Technical note: Statistical phasing uses population patterns; duo/trio phasing with family members provides higher accuracy.
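The trio case mentioned in the technical note can be sketched with simple inheritance rules: when the child is heterozygous, a homozygous parent reveals which allele it must have passed on. This is an illustrative simplification with a hypothetical function name. It does not check for inheritance inconsistencies and does not cover statistical phasing.

```python
def phase_site(child, mother, father):
    """Phase one marker using parental genotypes.

    Genotypes are alt-allele counts (0, 1, or 2). Returns a
    (maternal_allele, paternal_allele) pair with 0 = ref and 1 = alt,
    or None when the site cannot be resolved from the trio alone.
    """
    if child == 0:
        return (0, 0)
    if child == 2:
        return (1, 1)
    # Child is heterozygous: look for a homozygous parent.
    if mother != 1:
        maternal = mother // 2  # 0 -> ref allele, 2 -> alt allele
        return (maternal, 1 - maternal)
    if father != 1:
        paternal = father // 2
        return (1 - paternal, paternal)
    return None  # all three heterozygous: ambiguous


print(phase_site(1, 0, 1))  # mother is homozygous ref
print(phase_site(1, 1, 1))  # unresolved
```

Sites that come back unresolved are where statistical phasing, which relies on population patterns, has to fill in.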
Local Ancestry Inference (LAI): Chromosome-Level
Ancestry assignment along each chromosome
While global admixture gives you overall percentages, Local Ancestry Inference (LAI) assigns ancestry to specific segments along each chromosome. This creates a "chromosome painting"—a visual map showing where different ancestral populations contributed to different parts of your genome.
LAI is particularly valuable for individuals with mixed ancestry, as it shows precisely which chromosomal regions carry which ancestral signals. Combined with phasing, you can see which parent contributed each ancestral segment.
What it shows: Ancestry painted onto specific chromosome locations.
What it requires: Phased data for parent-specific painting; works best with sufficient reference coverage.
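A very simplified picture of chromosome painting: cut a phased haplotype into fixed windows and label each window with the reference whose allele frequencies make it most likely. Production LAI methods are considerably more sophisticated, so treat the window size, reference names, and frequencies here as assumptions for illustration.

```python
import math


def paint_haplotype(haplotype, ref_freqs, window=4):
    """Label each window of a phased haplotype with its best-fitting reference.

    haplotype: list of alleles (0 = ref, 1 = alt) along one chromosome copy.
    ref_freqs: maps a reference label to its per-marker alt-allele
               frequencies (assumed strictly between 0 and 1).
    """
    labels = []
    for start in range(0, len(haplotype), window):
        best_label, best_score = None, -math.inf
        for label, freqs in ref_freqs.items():
            # Log-likelihood of the window's alleles under this reference.
            score = 0.0
            for allele, f in zip(haplotype[start:start + window],
                                 freqs[start:start + window]):
                score += math.log(f if allele == 1 else 1.0 - f)
            if score > best_score:
                best_label, best_score = label, score
        labels.append(best_label)
    return labels


toy_refs = {"Ref_X": [0.9] * 8, "Ref_Y": [0.1] * 8}
print(paint_haplotype([1, 1, 1, 1, 0, 0, 0, 0], toy_refs))
```

Each label is only as good as the references supplied, so a window painted with one reference means "best fit among those offered," which ties back to the need for sufficient reference coverage.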
Chromosome Browser Analysis: Segment-Level
Visualizing DNA segments across all 22 autosomes
Chromosome browser technology displays your DNA as visual segments across all 22 autosomal chromosomes. When comparing to reference samples—ancient or modern—matching segments are highlighted, showing exactly where on your genome you share DNA.
This goes beyond percentages to show the actual structure of shared genetic material. Longer segments indicate more recent common ancestry; shorter segments suggest more distant connections (though short segments require careful interpretation).
What it shows: Exact genomic locations where you share DNA with reference samples.
What to consider: Segment length matters—longer segments are more informative.
IBS (Identity by State): Segment Matching
Detecting identical DNA sequences
Identity by State (IBS) identifies DNA segments where two individuals have identical genetic sequences. This is the foundation of all segment-based matching—we first detect where sequences match, then analyze what those matches might mean.
IBS is a powerful detection tool, but matching sequences can occur for multiple reasons: shared ancestry (IBD), chance alignment in common population variants, or convergent patterns. Longer IBS segments are more likely to indicate true shared ancestry.
What it detects: Identical DNA sequences between samples.
Important distinction: All IBD is IBS, but not all IBS is IBD. Interpretation requires statistical analysis.
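Detection of matching runs can be sketched on unphased genotypes, assuming a simple rule: two samples stay in a segment as long as they share at least one allele at every marker, and opposite homozygotes end the run. The function name and the minimum-marker threshold are hypothetical choices for this example.

```python
def ibs_segments(geno_a, geno_b, min_markers=3):
    """Find runs of markers where two genotypes share at least one allele.

    Genotypes are alt-allele counts (0, 1, or 2). Returns a list of
    (first_index, last_index) pairs for runs of at least min_markers.
    """
    segments, start = [], None
    n = min(len(geno_a), len(geno_b))
    for i in range(n):
        # Opposite homozygotes (0 vs 2) share no allele and break the run.
        shares_allele = abs(geno_a[i] - geno_b[i]) < 2
        if shares_allele and start is None:
            start = i
        elif not shares_allele and start is not None:
            if i - start >= min_markers:
                segments.append((start, i - 1))
            start = None
    if start is not None and n - start >= min_markers:
        segments.append((start, n - 1))
    return segments


sample_a = [0, 1, 2, 2, 0, 1, 1, 2]
sample_b = [1, 1, 2, 0, 0, 2, 1, 1]
print(ibs_segments(sample_a, sample_b))
```

A run found this way is only a candidate. Whether it reflects descent from a common ancestor is a separate statistical question, which is why short runs in particular need cautious reading.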
IBD (Identity by Descent): Ancestral Connection
Segments inherited from a common ancestor
Identity by Descent (IBD) refers to DNA segments that two individuals inherited from a shared ancestor. Unlike IBS (which just means sequences match), IBD implies a genealogical connection—somewhere in both individuals' family trees, a common ancestor passed down that segment.
Distinguishing true IBD from IBS requires statistical inference: segment length, SNP density, population background rates, and the number of segments all factor into confidence assessment. Longer segments across multiple chromosomes provide stronger evidence.
What it indicates: A likely shared ancestor who passed down this DNA segment.
Confidence factors: Segment length, count across chromosomes, and statistical comparison to background rates.
Coordinates, Components, and Distances
These concepts appear throughout our tools. Understanding them unlocks deeper interpretation.
Coordinates
Coordinates represent your position in a reduced genetic space. In G25, for example, your DNA is represented by 25 numbers that capture how you relate to global genetic variation.
These are not geographic coordinates. They are mathematical positions derived from genetic data. Two individuals with similar coordinates are genetically similar.
Components
Components are the building blocks of admixture models. A K12 model has 12 components; your percentages show how your DNA distributes across them.
Components are statistical constructs derived from clustering algorithms—they are not ancestral populations. A component labeled "Northern European" represents a cluster of genetic variation, not a historical population called "Northern Europeans."
Genetic Distance
Genetic distance measures how similar or different two samples are. Smaller distances mean greater similarity; larger distances mean greater difference.
Distance is calculated from genetic markers—the more variants two samples share, the smaller the distance between them.
Uncertainty, Limits, and Change
Scientific honesty includes acknowledging limits. Here is what every user should understand.
Why Results Have Uncertainty
All ancestry estimates are probabilistic. They depend on which genetic markers are analyzed, which reference populations are used, and how the model distributes ambiguous signals. Small percentages especially should be treated with caution—they may reflect statistical noise rather than meaningful ancestry.
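One way to see how much an estimate depends on the markers analyzed is to resample the markers and re-estimate. The sketch below uses a simple two-source least-squares estimator and a bootstrap over markers. The estimator, the toy data, and the function names are assumptions chosen for brevity, not the procedure behind any specific report.

```python
import random


def ls_proportion(genotype, freq_a, freq_b):
    """Least-squares weight on source A, clipped to the range 0..1.

    Assumes the two sources differ in frequency at the markers supplied.
    """
    num = sum((g / 2 - b) * (a - b) for g, a, b in zip(genotype, freq_a, freq_b))
    den = sum((a - b) ** 2 for a, b in zip(freq_a, freq_b))
    return min(1.0, max(0.0, num / den))


def bootstrap_interval(genotype, freq_a, freq_b, rounds=500, seed=0):
    """Approximate 95% interval from resampling markers with replacement."""
    rng = random.Random(seed)
    n = len(genotype)
    estimates = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(ls_proportion([genotype[i] for i in idx],
                                       [freq_a[i] for i in idx],
                                       [freq_b[i] for i in idx]))
    estimates.sort()
    return estimates[int(0.025 * rounds)], estimates[int(0.975 * rounds) - 1]


toy_genotype = [2, 1, 0, 1, 2, 0, 1, 1]
toy_freq_a = [0.8, 0.6, 0.2, 0.7, 0.9, 0.1, 0.5, 0.4]
toy_freq_b = [0.3, 0.4, 0.5, 0.2, 0.4, 0.6, 0.3, 0.7]
low, high = bootstrap_interval(toy_genotype, toy_freq_a, toy_freq_b)
print(round(low, 2), round(high, 2))
```

With only a handful of markers the interval tends to be wide. If the interval for a small percentage reaches zero, that percentage may be noise.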
Why Different Models Give Different Views
A K7 model groups genetic variation into 7 clusters. A K72 model uses 72. Neither is "more correct"—they partition the same data differently. Similarly, different reference panels produce different results because they define the space differently. Results from different models are complementary perspectives, not contradictions.
Why Updates May Change Results
As science advances, reference panels expand and algorithms improve. We update our models to reflect current knowledge, which means your results may change over time. This is not an error—it reflects genuine improvement in the underlying methods. Earlier results were not "wrong"; they reflected what was known at the time.
Why No Model Is "Final"
No ancestry model represents absolute truth. Every model is a lens—a way of viewing genetic data that illuminates some patterns while obscuring others. The goal is not to find the "correct" model, but to use multiple perspectives to build a richer understanding. Certainty is not the aim; informed interpretation is.
How to Use Results Responsibly
Results are tools for understanding—not identity assignments. Here is how to interpret them well.
Do Not Treat Models as Identity Assignments
Ancestry results describe genetic patterns—they do not define who you are. A percentage labeled "Scandinavian" does not make you Scandinavian; it means your DNA clusters with reference samples from that region. Identity is cultural, personal, and historical—not determined by algorithms.
Compare Results Across Methods
No single model tells the complete story. Admixture percentages show one view; genetic distance rankings show another; haplogroups trace specific lineages. Use multiple methods together. Where they converge, you can have more confidence; where they differ, you have learned about the limits of each approach.
Use Context and History Together
Genetic results gain meaning when combined with family history, historical records, and geographic context. A result that seems surprising may make sense in light of known migration patterns or family stories. Genetics is one source of information—not the only source.
Avoid Conclusions from a Single Model
If one model shows an unexpected result, do not immediately assume it reveals hidden ancestry. Check other models. Consider whether the result might reflect model limitations, statistical noise, or reference panel gaps. Robust conclusions come from convergent evidence, not single data points.
How Methods Connect to Reports & Tools
Different tools use these methods in different ways. Here is how they relate.
Narrative Reports
Narrative reports like Deep Ancestry, Viking Heritage, and Celtic Heritage add an interpretation layer on top of raw calculations. They contextualize results within historical and cultural frameworks, making data more accessible.
Method used: Multiple approaches combined with historical context.
Explore Narrative Reports
HGDP & K-Models
K-models (HGDP K72, K7, K12, etc.) apply admixture analysis with different numbers of components. They show population structure—how your DNA distributes across genetic clusters defined by reference populations.
Method used: Admixture/clustering models with varying K values.
View HGDP K72
G25 Coordinates
G25 Coordinates represent your position in 25-dimensional genetic space. This is raw positional data that can be used in various downstream analyses, including distance calculations and ancestry modeling.
Method used: Coordinate-based dimensionality reduction (PCA-derived).
Get Your Coordinates
G25 Studio
G25 Studio is an analytical environment where you can run custom models using your coordinates. Test different population combinations, calculate distances, and explore hypotheses about your ancestry.
Method used: User-directed distance analysis and modeling.
Open G25 Studio
Genetic Similarity Reports
Similarity reports rank modern or ancient populations by genetic distance to you. They answer the question: "Which samples in our database are most similar to me?"
Method used: Comparative distance analysis.
View Similarity Reports
Haplogroup Analysis
Haplogroup reports determine your mtDNA (maternal) and Y-DNA (paternal) lineages. These trace specific ancestral lines back thousands of years through mutations that define deep genetic branches.
Method used: Haplogroup determination via mutation analysis.
Explore Haplogroups
Frequently Asked Questions
Ancestry results are produced through statistical models that compare your DNA against reference populations. These models use clustering algorithms to identify patterns of genetic similarity, then estimate how your DNA distributes across different population clusters. Results are probabilistic estimates—not definitive statements about identity or origin.
Different models use different reference populations, different numbers of components (K values), and different algorithms. A K7 model groups genetic variation into 7 clusters; a K72 model uses 72. Neither is more "correct"—they simply partition the data differently. Think of it like viewing a landscape from different elevations: each view reveals different features. Results from different models are complementary perspectives, not contradictions.
Genetic distance measures how similar or different two DNA samples are. Smaller distances indicate greater similarity. Crucially, distance is a measure of genetic similarity—not ancestry percentage or geographic origin. A distance of 0.02 does not mean "2% ancestry." Two populations can be genetically close without sharing recent ancestors, due to parallel evolution, ancient common ancestry, or similar demographic histories.
As science advances, we improve our reference panels and algorithms. When we update our models to reflect current research, your results may shift. This is not an error—it reflects genuine methodological improvement. Your DNA has not changed; our ability to interpret it has. Earlier results were not "wrong"; they were the best available at the time.
Small percentages (typically below 5%) should be interpreted with caution. They may represent genuine distant ancestry, but they can also reflect statistical noise, model artifacts, or limitations in reference panel coverage. If a small percentage is meaningful, it should ideally appear consistently across multiple models. A single small percentage in one model is not strong evidence on its own.
Admixture analysis models your DNA as a mixture of ancestral components, producing percentage breakdowns. Genetic similarity analysis calculates how close your DNA is to various reference samples, producing ranked lists. Admixture answers "What is the composition?" while similarity answers "Who is closest?" Both are valuable and complementary—neither is superior.
Have questions not covered here?
Contact Us
Understanding Methods Leads to Better Insight
You now understand how results are built. The next step is exploring them—with the clarity that comes from knowing what the numbers mean and what they don't.