What methods are used in DNA ancestry analysis?

DNA ancestry analysis uses several complementary methods: admixture modeling (percentage breakdowns), coordinate-based representations (like G25), genetic distance calculations, haplogroup analysis, and advanced chromosome-level techniques like phasing, Local Ancestry Inference (LAI), and IBS/IBD segment analysis. Each method asks a different question of the data and provides a different perspective.

What are the limits of ancestry models?

All ancestry models have limits: they depend on which reference populations are included, they represent statistical patterns rather than historical migrations, and they have difficulty separating closely related populations. Results should be interpreted as showing genetic similarity to reference groups, not as definitive ancestry assignments.

What is genetic distance in ancestry analysis?

Genetic distance measures how similar or different two DNA samples are, based on shared genetic variants. Smaller distances indicate greater similarity. Distance is a measure of genetic similarity—not ancestry percentage or geographic origin. Two populations can be genetically close without sharing recent ancestors.

How should I interpret ancestry percentages?

Ancestry percentages represent how your DNA statistically clusters relative to reference populations in the model. They are not measures of 'how much' of a population you 'are.' Percentages can vary between models, and small percentages may reflect statistical noise rather than meaningful ancestry.

What is the difference between IBS and IBD?

IBS (Identity by State) means two DNA segments have identical sequences. IBD (Identity by Descent) means those segments were inherited from a shared ancestor. All IBD is IBS, but not all IBS is IBD: short segments can match by chance. We detect IBS segments and use statistical methods to assess which are likely to be IBD and therefore point to a real ancestral connection.

What is DNA phasing and Local Ancestry Inference?

Phasing separates your mixed DNA into maternal and paternal chromosomes, enabling parent-specific ancestry analysis. Local Ancestry Inference (LAI) assigns ancestry to specific chromosome segments, creating detailed 'chromosome paintings' that show which ancestral populations contributed to different parts of your genome.

Analysis Methods Explained - How Ancestry Results Are Built

Foundation

Why Methods Matter

Genetic ancestry results do not appear on their own.

Every result you see—whether a percentage, a distance score, or a population match—is produced through models, reference data, and assumptions. These methods are not invisible machinery; they are choices that shape what the results can and cannot tell you.

Understanding those methods is essential to interpreting results responsibly. Without this understanding, it is easy to read too much into numbers, or to miss what they actually represent.

This page exists because we believe that transparency about methodology builds trust—and because informed users make better sense of their results.

Clarity prevents misinterpretation.

Guiding Framework

Core Principles of Our Analysis

Before techniques, there are principles. These guide every model we build and every result we present.

Population Genetics Over Labels

We work with genetic patterns, not ethnic categories. Labels like "European" or "African" are geographic conveniences—the genetics are far more complex and continuous.

Models Over Categories

Our results come from statistical models, not fixed categories. Models are tools for understanding—they illuminate patterns, but they don't define who you are.

Transparency Over Simplification

We would rather explain complexity honestly than hide it behind simple-seeming numbers. Simplification that misleads is worse than complexity that informs.

Context Over Certainty

Results gain meaning through context: the model used, the reference populations included, the questions being asked. Without context, numbers can mislead.

Continuous Improvement

Science evolves. Reference datasets grow. Methods improve. We update our models as the field advances, which means results may change over time—and that is a feature, not a flaw.

Methods explain what results can say—and what they cannot.

Inputs

Data Sources & Reference Panels

Every model reflects the data it is built on. Understanding our sources helps interpret what results mean.

Modern Population References

We use reference samples from global populations, drawn from peer-reviewed datasets including the Human Genome Diversity Project (HGDP), the 1000 Genomes Project, and curated academic collections.

Modern references represent current genetic diversity—not historical populations.

Ancient DNA Datasets

Where applicable, we incorporate ancient DNA from archaeological samples. These provide windows into past genetic variation, but coverage is uneven—some regions and time periods have more samples than others.

Ancient DNA is fragmentary by nature; not all analyses can include it.

Publicly Available, Peer-Reviewed Sources

Our reference panels draw primarily from publicly available, peer-reviewed datasets. This allows reproducibility and enables researchers to understand the basis of our models.

We do not use proprietary or inaccessible data without clear documentation.

Known Coverage Limitations

Global genetic sampling is uneven. European and East Asian populations are overrepresented; many African, Indigenous American, and Pacific populations are underrepresented. This affects model resolution.

Results are more precise for well-sampled regions; less precise for others.

Techniques

Modeling Approaches Used

Different methods ask different questions. Here is how the main approaches work.

Admixture & Clustering Models

Percentage breakdowns by population cluster

Admixture models use clustering algorithms to identify groups of individuals who share similar genetic patterns. Your DNA is then compared against these clusters to estimate how it distributes across them.

The "K" in model names (K7, K12, K72) refers to the number of ancestral components. Higher K does not mean "more accurate"—it means the genetic variation is being split into more groups. Different K values offer different resolutions.

What it shows: Statistical similarity to reference clusters.
What it does not show: Where your ancestors "came from" in a historical sense.
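As a rough illustration of how a percentage breakdown can fall out of a model, the sketch below fits a sample's allele frequencies as a weighted blend of component frequencies. Everything in it is an assumption made for illustration: the component names, the six marker frequencies, and the brute-force least-squares search. Real admixture software typically uses likelihood-based estimation over hundreds of thousands of markers.

```python
from itertools import product

# Hypothetical allele frequencies for three components at six markers.
COMPONENT_FREQS = {
    "component_A": [0.10, 0.80, 0.30, 0.95, 0.50, 0.20],
    "component_B": [0.90, 0.20, 0.40, 0.10, 0.60, 0.70],
    "component_C": [0.50, 0.50, 0.90, 0.40, 0.05, 0.30],
}

def blend(weights, freqs=COMPONENT_FREQS):
    """Expected allele frequency at each marker for a weighted mix of components."""
    names = list(freqs)
    n_markers = len(freqs[names[0]])
    return [sum(w * freqs[name][i] for w, name in zip(weights, names))
            for i in range(n_markers)]

def fit_proportions(sample_freqs, step=0.01, freqs=COMPONENT_FREQS):
    """Grid-search the weights (non-negative, summing to 1) that best match the sample."""
    k = len(freqs)
    steps = round(1 / step)
    best_weights, best_error = None, float("inf")
    # Enumerate the first k-1 weights; the last one is whatever remains.
    for combo in product(range(steps + 1), repeat=k - 1):
        if sum(combo) > steps:
            continue
        weights = [c / steps for c in combo] + [(steps - sum(combo)) / steps]
        predicted = blend(weights, freqs)
        error = sum((p - s) ** 2 for p, s in zip(predicted, sample_freqs))
        if error < best_error:
            best_weights, best_error = weights, error
    return dict(zip(freqs, best_weights))

# A sample constructed as 70% A, 30% B, 0% C (per-marker dosage divided by 2).
sample = blend([0.7, 0.3, 0.0])
proportions = fit_proportions(sample)
print(proportions)
```

The fit recovers the blend only because the sample was built from these exact components. A sample from a population outside the component set would still receive percentages, which is why the numbers describe similarity to the model's clusters rather than historical origin.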

Coordinate-Based Representations

Positioning in genetic space (G25, PCA)

Coordinate systems like G25 reduce complex genetic data into a smaller number of dimensions (25 in the case of G25). These dimensions capture the major axes of genetic variation across global populations.

Your position in this space reflects your genetic similarity to other samples. Samples that cluster together are genetically similar; samples far apart are more different.

What it shows: Your position relative to reference populations in genetic space.
What it does not show: Direct ancestry relationships or historical migrations.

Comparative Distance Analysis

Measuring genetic similarity through distance

Distance-based methods calculate how similar or different your DNA is from reference samples. This produces ranked lists of closest matches—populations or ancient individuals whose genetic profiles are most similar to yours.

Genetic distance is a measure of similarity, not a measure of ancestry percentage or geographic origin. Two populations can be genetically close for many reasons, including shared ancient ancestry, recent gene flow, or similar demographic histories.

What it shows: Which samples are most genetically similar to you.
What it does not show: That you "descend from" those populations directly.
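A ranked list of closest matches can be sketched as a sort by distance in coordinate space. The example below assumes Euclidean distance and uses made-up reference names and coordinates, truncated to four dimensions for readability; it is not a description of any specific tool's scoring.

```python
import math

# Hypothetical coordinates, truncated to 4 dimensions for readability
# (G25-style coordinates would have 25 values per sample).
REFERENCES = {
    "reference_pop_1": [0.120, 0.035, -0.010, 0.004],
    "reference_pop_2": [0.095, 0.060, 0.015, -0.020],
    "reference_pop_3": [-0.210, 0.010, 0.040, 0.030],
}

def rank_by_distance(target, references=REFERENCES):
    """Return (name, Euclidean distance) pairs, closest first."""
    scored = [(name, math.dist(target, coords)) for name, coords in references.items()]
    return sorted(scored, key=lambda pair: pair[1])

my_coords = [0.115, 0.040, -0.005, 0.000]
ranking = rank_by_distance(my_coords)
for name, distance in ranking:
    print(f"{name}: {distance:.4f}")
```

The output is an ordering by similarity and nothing more: the top entry is the closest sample in the panel, not a statement of descent.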

Projection onto Reference Spaces

Placing your sample within established frameworks

Many analyses work by projecting your sample onto a reference space that was built from curated populations. Your coordinates or percentages are estimated based on how you fit within this pre-defined space.

This means results depend on which populations were used to build the reference. If a population similar to your actual ancestry was not included, you may appear as a mixture of the closest available alternatives.

What it shows: How you fit within the reference framework.
What it does not show: Populations outside the reference that might be closer matches.
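The effect of a missing reference can be shown with a small two-way fit in coordinate space. This is a minimal sketch under stated assumptions: the three-dimensional coordinates are invented, and the closed-form least-squares weight stands in for whatever fitting procedure a real tool uses.

```python
def two_way_fit(target, ref_a, ref_b):
    """Best weight w in [0, 1] so that w*ref_a + (1-w)*ref_b is closest to target."""
    diff_ab = [a - b for a, b in zip(ref_a, ref_b)]
    diff_tb = [t - b for t, b in zip(target, ref_b)]
    denom = sum(d * d for d in diff_ab)
    w = sum(x * y for x, y in zip(diff_tb, diff_ab)) / denom
    w = min(1.0, max(0.0, w))  # keep the weight a valid proportion
    fitted = [w * a + (1 - w) * b for a, b in zip(ref_a, ref_b)]
    residual = sum((t - f) ** 2 for t, f in zip(target, fitted)) ** 0.5
    return w, residual

# Hypothetical 3-dimensional coordinates for illustration only.
ref_a = [0.10, 0.00, 0.00]
ref_b = [-0.10, 0.00, 0.00]
# This sample sits between A and B on the first axis but is displaced on the
# second axis, as if it belonged to a population missing from the panel.
unsampled = [0.02, 0.08, 0.00]

weight_a, residual = two_way_fit(unsampled, ref_a, ref_b)
print(f"{weight_a:.0%} A + {1 - weight_a:.0%} B, residual distance {residual:.3f}")
```

The sample is reported as roughly 60% A and 40% B even though it was not constructed as a mixture of the two. The nonzero residual is the only hint that the panel lacks a closer match, which is why fit quality is worth checking alongside the percentages.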

Haplogroup Analysis

Tracing maternal and paternal lineages

Haplogroups are defined by specific mutations in mitochondrial DNA (mtDNA) or the Y-chromosome. They represent deep lineages that can be traced back tens of thousands of years—your direct maternal line (mtDNA) and direct paternal line (Y-DNA).

Unlike admixture analysis, which considers your entire genome, haplogroups trace single lineages. They tell you about one line of ancestors, not your full ancestry.

What it shows: The deep lineage of your direct maternal/paternal line.
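What it does not show: Your full ancestry. A haplogroup follows a single line of descent, and the DNA behind it is a tiny share of your genome (mtDNA is roughly 16,500 of about 3 billion base pairs).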
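Haplogroup assignment can be pictured as walking down a tree of defining mutations. The tree, branch names, and marker names below are entirely made up for illustration; real mtDNA and Y-DNA trees contain thousands of branches and need rules for missing or conflicting markers.

```python
# A toy haplogroup tree: each branch is defined by one marker.
# All branch and marker names are hypothetical.
TREE = {
    "Root": {"marker": None, "children": ["X", "Y"]},
    "X": {"marker": "m1", "children": ["X1", "X2"]},
    "Y": {"marker": "m2", "children": []},
    "X1": {"marker": "m3", "children": []},
    "X2": {"marker": "m4", "children": ["X2a"]},
    "X2a": {"marker": "m5", "children": []},
}

def assign_haplogroup(derived_markers, tree=TREE, node="Root"):
    """Walk down the tree for as long as the sample carries a child's defining marker."""
    path = [node]
    while True:
        next_node = next(
            (child for child in tree[node]["children"]
             if tree[child]["marker"] in derived_markers),
            None,
        )
        if next_node is None:
            return path
        node = next_node
        path.append(node)

sample_markers = {"m1", "m4", "m5"}
lineage = assign_haplogroup(sample_markers)
print(" > ".join(lineage))
```

The result is one path through one tree, which mirrors the point above: a haplogroup describes a single ancestral line, not the rest of the genome.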

Distinctive Capabilities

Advanced Analysis Methods

Beyond standard admixture percentages, we employ chromosome-level analysis techniques that reveal deeper layers of your genetic heritage.

What Sets These Methods Apart

Most ancestry services stop at population percentages. We go further—analyzing your DNA at the chromosome level, separating parental contributions, and identifying actual shared DNA segments with ancient and modern individuals. These techniques require significantly more computational resources and scientific expertise, but they provide insights that percentage-based approaches simply cannot offer.

Phasing: Parental Separation

Separating maternal from paternal chromosomes

Your DNA is a mixture—you inherited one copy of each chromosome from your mother and one from your father. Standard ancestry tests analyze this mixture as a whole. Phasing computationally separates these parental contributions, creating two distinct genomic profiles from a single test.

This separation enables parent-specific ancestry analysis: you can see which ancestral components sit together on the same parental copy, even without testing your parents. Without a relative's data, statistical phasing cannot by itself tell which copy is maternal and which is paternal. When combined with Local Ancestry Inference, phasing shows which reference populations each parental chromosome most resembles.

What it enables: Parent-specific ancestry percentages, clearer inheritance patterns, and more accurate chromosome painting.
Technical note: Statistical phasing uses population patterns; duo/trio phasing with family members provides higher accuracy.

Used in: DNA Phasing Service, Contemporary Populations
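To see why phasing needs inference at all, the sketch below lists every pair of haplotypes that is consistent with one set of unphased genotypes. The genotype values are invented, and the exhaustive enumeration is only an illustration of the ambiguity; statistical phasing tools choose among such candidates using population haplotype patterns.

```python
from itertools import product

def consistent_haplotype_pairs(genotypes):
    """List every pair of haplotypes that could produce the unphased genotypes.

    Each genotype is the count of the alternate allele (0, 1, or 2).
    Only heterozygous sites (count 1) are ambiguous.
    """
    het_sites = [i for i, g in enumerate(genotypes) if g == 1]
    pairs = set()
    for choice in product([0, 1], repeat=len(het_sites)):
        hap_a = [g // 2 for g in genotypes]  # 0 -> 0, 2 -> 1; het sites filled below
        hap_b = list(hap_a)
        for site, allele in zip(het_sites, choice):
            hap_a[site], hap_b[site] = allele, 1 - allele
        # Order within a pair is arbitrary, so store pairs in sorted form.
        pairs.add(tuple(sorted([tuple(hap_a), tuple(hap_b)])))
    return sorted(pairs)

unphased = [1, 0, 1, 2, 1]  # three heterozygous sites
candidates = consistent_haplotype_pairs(unphased)
print(len(candidates), "possible haplotype pairs")
for hap_a, hap_b in candidates:
    print(hap_a, hap_b)
```

The number of candidates doubles with each additional heterozygous site, so across a whole chromosome the choice has to be made statistically or with the help of a relative's genotypes.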

Local Ancestry Inference (LAI): Chromosome-Level

Ancestry assignment along each chromosome

While global admixture gives you overall percentages, Local Ancestry Inference (LAI) assigns ancestry to specific segments along each chromosome. This creates a "chromosome painting"—a visual map showing where different ancestral populations contributed to different parts of your genome.

LAI is particularly valuable for individuals with mixed ancestry, because it shows which chromosomal regions carry which ancestral signals. When the data are phased, each segment can also be attributed to one parental copy of the chromosome.

What it shows: Ancestry painted onto specific chromosome locations.
What it requires: Phased data for parent-specific painting; works best with sufficient reference coverage.

Used in: Contemporary Populations
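A chromosome painting can be sketched as a per-window comparison of one phased haplotype against reference allele frequencies. The populations, frequencies, and window size below are assumptions for illustration, and the independent-window likelihood is a deliberate simplification: real LAI methods usually model neighbouring windows jointly.

```python
import math

# Hypothetical alternate-allele frequencies in two reference populations
# at eight consecutive markers on one chromosome.
REF_FREQS = {
    "pop_A": [0.9, 0.8, 0.9, 0.7, 0.9, 0.8, 0.9, 0.8],
    "pop_B": [0.1, 0.2, 0.1, 0.3, 0.1, 0.2, 0.1, 0.2],
}

def log_likelihood(alleles, freqs):
    """Log-probability of observing these alleles given per-marker frequencies."""
    return sum(math.log(f if a == 1 else 1 - f) for a, f in zip(alleles, freqs))

def paint_haplotype(haplotype, window=4, ref_freqs=REF_FREQS):
    """Label each non-overlapping window with its most likely reference population."""
    labels = []
    for start in range(0, len(haplotype), window):
        chunk = haplotype[start:start + window]
        scores = {
            pop: log_likelihood(chunk, freqs[start:start + window])
            for pop, freqs in ref_freqs.items()
        }
        labels.append((start, start + len(chunk), max(scores, key=scores.get)))
    return labels

# One phased haplotype: the first half resembles pop_A, the second half pop_B.
haplotype = [1, 1, 1, 1, 0, 0, 0, 0]
painting = paint_haplotype(haplotype)
print(painting)
```

Each window is always given the best label available in the panel, so the painting inherits the same reference-coverage limits as the global models.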

Chromosome Browser Analysis: Segment-Level

Visualizing DNA segments across all 22 autosomes

Chromosome browser technology displays your DNA as visual segments across all 22 autosomal chromosomes. When comparing to reference samples—ancient or modern—matching segments are highlighted, showing exactly where on your genome you share DNA.

This goes beyond percentages to show the actual structure of shared genetic material. Longer segments indicate more recent common ancestry; shorter segments suggest more distant connections (though short segments require careful interpretation).

What it shows: Exact genomic locations where you share DNA with reference samples.
What to consider: Segment length matters—longer segments are more informative.
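The link between segment length and time can be made concrete with a standard simplifying model, in which crossovers fall as a Poisson process at a rate of one per Morgan (100 cM). Under that assumption, a segment that has passed through m meioses has an expected length of 100 / m cM. The function name below is hypothetical, and real recombination rates vary along the genome and between the sexes.

```python
def expected_segment_length_cm(meioses):
    """Mean length (in centimorgans) of a segment that survives `meioses` meioses.

    Assumes crossovers fall as a Poisson process at one per Morgan (100 cM),
    so segment lengths are exponential with mean 100 / meioses.
    """
    if meioses < 1:
        raise ValueError("meioses must be at least 1")
    return 100 / meioses

# Two people whose common ancestor lived g generations back are separated by 2*g meioses.
for generations in (2, 5, 10, 50):
    length = expected_segment_length_cm(2 * generations)
    print(f"{generations:>2} generations back: about {length:.1f} cM per shared segment")
```

These are averages over a wide distribution, so a single segment's length gives only a loose indication of how far back the connection lies.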

IBS (Identity by State): Segment Matching

Detecting identical DNA sequences

Identity by State (IBS) identifies DNA segments where two individuals have identical genetic sequences. This is the foundation of all segment-based matching—we first detect where sequences match, then analyze what those matches might mean.

IBS is a powerful detection tool, but matching sequences can occur for multiple reasons: shared ancestry (IBD), chance alignment in common population variants, or convergent patterns. Longer IBS segments are more likely to indicate true shared ancestry.

What it detects: Identical DNA sequences between samples.
Important distinction: All IBD is IBS, but not all IBS is IBD. Interpretation requires statistical analysis.

Used in: Shared Roots Ancient DNA Matching
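Segment detection on unphased data can be sketched as a scan for runs of markers where two samples could share at least one allele. The genotypes and the marker-count threshold below are invented for illustration; real pipelines usually measure segments in centimorgans and tolerate some genotyping error.

```python
def ibs_runs(genotypes_1, genotypes_2, min_markers=3):
    """Find runs of markers where two samples share at least one allele.

    Genotypes are alternate-allele counts (0, 1, 2). A run is broken only by
    opposite homozygotes (0 vs 2), where no allele can be shared.
    Returns (start, end) index pairs, end exclusive.
    """
    runs, start = [], None
    for i, (g1, g2) in enumerate(zip(genotypes_1, genotypes_2)):
        compatible = abs(g1 - g2) < 2
        if compatible and start is None:
            start = i
        elif not compatible and start is not None:
            if i - start >= min_markers:
                runs.append((start, i))
            start = None
    n = min(len(genotypes_1), len(genotypes_2))
    if start is not None and n - start >= min_markers:
        runs.append((start, n))
    return runs

sample_1 = [0, 1, 2, 2, 0, 1, 1, 0, 2, 1, 0, 0]
sample_2 = [1, 1, 2, 1, 2, 0, 1, 0, 2, 2, 1, 0]
segments = ibs_runs(sample_1, sample_2)
print(segments)
```

A run found this way is only a candidate. Whether it reflects descent from a shared ancestor is a separate statistical question, and short runs in particular can arise by chance.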

IBD (Identity by Descent): Ancestral Connection

Segments inherited from a common ancestor

Identity by Descent (IBD) refers to DNA segments that two individuals inherited from a shared ancestor. Unlike IBS (which just means sequences match), IBD implies a genealogical connection—somewhere in both individuals' family trees, a common ancestor passed down that segment.

Identifying true IBD from IBS requires statistical inference: segment length, SNP density, population background rates, and the number of segments all factor into confidence assessment. Longer segments across multiple chromosomes provide stronger evidence.

What it indicates: A likely shared ancestor who passed down this DNA segment.
Confidence factors: Segment length, count across chromosomes, and statistical comparison to background rates.

Used in: Shared Roots Ancient DNA Matching

Chromosome-level analysis reveals what percentages alone cannot show.

Key Concepts

Coordinates, Components, and Distances

These concepts appear throughout our tools. Understanding them unlocks deeper interpretation.

Coordinates

Coordinates represent your position in a reduced genetic space. In G25, for example, your DNA is represented by 25 numbers that capture how you relate to global genetic variation.

These are not geographic coordinates. They are mathematical positions derived from genetic data. Two individuals with similar coordinates are genetically similar.

Key insight: Coordinates describe genetic position, not geographic origin or historical movement.

Components

Components are the building blocks of admixture models. A K12 model has 12 components; your percentages show how your DNA distributes across them.

Components are statistical constructs derived from clustering algorithms—they are not ancestral populations. A component labeled "Northern European" represents a cluster of genetic variation, not a historical population called "Northern Europeans."

Key insight: Components are patterns in data, not populations or identities.

Genetic Distance

Genetic distance measures how similar or different two samples are. Smaller distances mean greater similarity; larger distances mean greater difference.

Distance is calculated from genetic markers—the more variants two samples share, the smaller the distance between them.

Critical clarification: Distance is a measure of similarity—not ancestry percentage. A distance of 0.02 does not mean "2% ancestry." It means the samples are genetically close.

Scientific Honesty

Uncertainty, Limits, and Change

Scientific honesty includes acknowledging limits. Here is what every user should understand.

Why Results Have Uncertainty

All ancestry estimates are probabilistic. They depend on which genetic markers are analyzed, which reference populations are used, and how the model distributes ambiguous signals. Small percentages especially should be treated with caution—they may reflect statistical noise rather than meaningful ancestry.
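One way to see this uncertainty is to resample the markers and watch the estimate move. The sketch below is an assumption-laden illustration, not a description of any production pipeline: it simulates 200 markers, uses a simple two-population least-squares estimate, and reports a percentile bootstrap interval.

```python
import random

def two_way_proportion(sample, freqs_a, freqs_b):
    """Least-squares share of population A, clipped to the range [0, 1]."""
    num = sum((s - b) * (a - b) for s, a, b in zip(sample, freqs_a, freqs_b))
    den = sum((a - b) ** 2 for a, b in zip(freqs_a, freqs_b))
    if den == 0:
        return 0.5  # markers carry no information to separate A from B
    return min(1.0, max(0.0, num / den))

def bootstrap_interval(sample, freqs_a, freqs_b, rounds=1000, seed=0):
    """Resample markers with replacement; return the 2.5th and 97.5th percentiles."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(two_way_proportion(
            [sample[i] for i in idx],
            [freqs_a[i] for i in idx],
            [freqs_b[i] for i in idx],
        ))
    estimates.sort()
    return estimates[int(0.025 * rounds)], estimates[int(0.975 * rounds) - 1]

# Simulated data: 200 markers, a sample that is truly 5% A and 95% B.
rng = random.Random(42)
freqs_a = [rng.uniform(0.05, 0.95) for _ in range(200)]
freqs_b = [rng.uniform(0.05, 0.95) for _ in range(200)]
sample = []
for a, b in zip(freqs_a, freqs_b):
    p = 0.05 * a + 0.95 * b
    # Observed dosage / 2 from two random allele draws, which adds sampling noise.
    sample.append(((rng.random() < p) + (rng.random() < p)) / 2)

point = two_way_proportion(sample, freqs_a, freqs_b)
low, high = bootstrap_interval(sample, freqs_a, freqs_b)
print(f"estimate {point:.1%}, bootstrap 95% interval {low:.1%} to {high:.1%}")
```

With a marker set this small, the interval is usually wide compared with a true value of 5%, which is the practical reason a lone small percentage deserves caution.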

Why Different Models Give Different Views

A K7 model groups genetic variation into 7 clusters. A K72 model uses 72. Neither is "more correct"—they partition the same data differently. Similarly, different reference panels produce different results because they define the space differently. Results from different models are complementary perspectives, not contradictions.

Why Updates May Change Results

As science advances, reference panels expand and algorithms improve. We update our models to reflect current knowledge, which means your results may change over time. This is not an error—it reflects genuine improvement in the underlying methods. Earlier results were not "wrong"; they reflected what was known at the time.

Why No Model Is "Final"

No ancestry model represents absolute truth. Every model is a lens—a way of viewing genetic data that illuminates some patterns while obscuring others. The goal is not to find the "correct" model, but to use multiple perspectives to build a richer understanding. Certainty is not the aim; informed interpretation is.

Guidance

How to Use Results Responsibly

Results are tools for understanding—not identity assignments. Here is how to interpret them well.

Do Not Treat Models as Identity Assignments

Ancestry results describe genetic patterns—they do not define who you are. A percentage labeled "Scandinavian" does not make you Scandinavian; it means your DNA clusters with reference samples from that region. Identity is cultural, personal, and historical—not determined by algorithms.

Compare Results Across Methods

No single model tells the complete story. Admixture percentages show one view; genetic distance rankings show another; haplogroups trace specific lineages. Use multiple methods together. Where they converge, you can have more confidence; where they differ, you have learned about the limits of each approach.

Use Context and History Together

Genetic results gain meaning when combined with family history, historical records, and geographic context. A result that seems surprising may make sense in light of known migration patterns or family stories. Genetics is one source of information—not the only source.

Avoid Conclusions from a Single Model

If one model shows an unexpected result, do not immediately assume it reveals hidden ancestry. Check other models. Consider whether the result might reflect model limitations, statistical noise, or reference panel gaps. Robust conclusions come from convergent evidence, not single data points.

Ecosystem

How Methods Connect to Reports & Tools

Different tools use these methods in different ways. Here is how they relate.

Narrative Reports

Narrative reports like Deep Ancestry, Viking Heritage, and Celtic Heritage add an interpretation layer on top of raw calculations. They contextualize results within historical and cultural frameworks, making data more accessible.

Method used: Multiple approaches combined with historical context.

HGDP & K-Models

K-models (HGDP K72, K7, K12, etc.) apply admixture analysis with different numbers of components. They show population structure—how your DNA distributes across genetic clusters defined by reference populations.

Method used: Admixture/clustering models with varying K values.

G25 Coordinates

G25 Coordinates represent your position in 25-dimensional genetic space. This is raw positional data that can be used in various downstream analyses, including distance calculations and ancestry modeling.

Method used: Coordinate-based dimensionality reduction (PCA-derived).

G25 Studio

G25 Studio is an analytical environment where you can run custom models using your coordinates. Test different population combinations, calculate distances, and explore hypotheses about your ancestry.

Method used: User-directed distance analysis and modeling.

Genetic Similarity Reports

Similarity reports rank modern or ancient populations by genetic distance to you. They answer the question: "Which samples in our database are most similar to me?"

Method used: Comparative distance analysis.

Haplogroup Analysis

Haplogroup reports determine your mtDNA (maternal) and Y-DNA (paternal) lineages. These trace specific ancestral lines back tens of thousands of years through mutations that define deep genetic branches.

Method used: Haplogroup determination via mutation analysis.

Questions

Frequently Asked Questions

How are ancestry results calculated?

Ancestry results are produced through statistical models that compare your DNA against reference populations. These models use clustering algorithms to identify patterns of genetic similarity, then estimate how your DNA distributes across different population clusters. Results are probabilistic estimates—not definitive statements about identity or origin.

Why do different models give different results?

Different models use different reference populations, different numbers of components (K values), and different algorithms. A K7 model groups genetic variation into 7 clusters; a K72 model uses 72. Neither is more "correct"—they simply partition the data differently. Think of it like viewing a landscape from different elevations: each view reveals different features. Results from different models are complementary perspectives, not contradictions.

What does genetic distance actually mean?

Genetic distance measures how similar or different two DNA samples are. Smaller distances indicate greater similarity. Crucially, distance is a measure of genetic similarity, not ancestry percentage or geographic origin. A distance of 0.02 does not mean "2% ancestry." Two populations can be genetically close without sharing recent ancestors, for example because of ancient common ancestry or similar demographic histories.

Why might my results change over time?

As science advances, we improve our reference panels and algorithms. When we update our models to reflect current research, your results may shift. This is not an error—it reflects genuine methodological improvement. Your DNA has not changed; our ability to interpret it has. Earlier results were not "wrong"; they were the best available at the time.

Should I trust small percentages in my results?

Small percentages (typically below 5%) should be interpreted with caution. They may represent genuine distant ancestry, but they can also reflect statistical noise, model artifacts, or limitations in reference panel coverage. If a small percentage is meaningful, it should ideally appear consistently across multiple models. A single small percentage in one model is not strong evidence on its own.

What is the difference between admixture and genetic similarity?

Admixture analysis models your DNA as a mixture of ancestral components, producing percentage breakdowns. Genetic similarity analysis calculates how close your DNA is to various reference samples, producing ranked lists. Admixture answers "What is the composition?" while similarity answers "Who is closest?" Both are valuable and complementary—neither is superior.

Understanding Methods Leads to Better Insight