
Why Methods Matter

Genetic ancestry results do not appear on their own.

Every result you see—whether a percentage, a distance score, or a population match—is produced through models, reference data, and assumptions. These methods are not invisible machinery; they are choices that shape what the results can and cannot tell you.

Understanding those methods is essential to interpreting results responsibly. Without this understanding, it is easy to read too much into numbers, or to miss what they actually represent.

This page exists because we believe that transparency about methodology builds trust—and because informed users make better sense of their results.

Clarity prevents misinterpretation.

Core Principles of Our Analysis

Before techniques, there are principles. These guide every model we build and every result we present.

Population Genetics Over Labels

We work with genetic patterns, not ethnic categories. Labels like "European" or "African" are geographic conveniences—the genetics are far more complex and continuous.

Models Over Categories

Our results come from statistical models, not fixed categories. Models are tools for understanding—they illuminate patterns, but they don't define who you are.

Transparency Over Simplification

We would rather explain complexity honestly than hide it behind simple-seeming numbers. Simplification that misleads is worse than complexity that informs.

Context Over Certainty

Results gain meaning through context: the model used, the reference populations included, the questions being asked. Without context, numbers can mislead.

Continuous Improvement

Science evolves. Reference datasets grow. Methods improve. We update our models as the field advances, which means results may change over time—and that is a feature, not a flaw.

Methods explain what results can say—and what they cannot.

Data Sources & Reference Panels

Every model reflects the data it is built on. Understanding our sources helps interpret what results mean.

Modern Population References

We use reference samples from global populations, drawn from peer-reviewed datasets including the Human Genome Diversity Project (HGDP), the 1000 Genomes Project, and curated academic collections.

Modern references represent current genetic diversity—not historical populations.

Ancient DNA Datasets

Where applicable, we incorporate ancient DNA from archaeological samples. These provide windows into past genetic variation, but coverage is uneven—some regions and time periods have more samples than others.

Ancient DNA is fragmentary by nature; not all analyses can include it.

Publicly Available, Peer-Reviewed Sources

Our reference panels draw primarily from publicly available, peer-reviewed datasets. This allows reproducibility and enables researchers to understand the basis of our models.

We do not use proprietary or inaccessible data without clear documentation.

Known Coverage Limitations

Global genetic sampling is uneven. European and East Asian populations are overrepresented; many African, Indigenous American, and Pacific populations are underrepresented. This affects model resolution.

Results are more precise for well-sampled regions; less precise for others.


Modeling Approaches Used

Different methods ask different questions. Here is how the main approaches work.

Admixture Models

Admixture models use clustering algorithms to identify groups of individuals who share similar genetic patterns. Your DNA is then compared against these clusters to estimate how it distributes across them.

The "K" in model names (K7, K12, K72) refers to the number of ancestral components. Higher K does not mean "more accurate"—it means the genetic variation is being split into more groups. Different K values offer different resolutions.

What it shows: Statistical similarity to reference clusters.
What it does not show: Where your ancestors "came from" in a historical sense.

[Figure: example K5 model. Cluster A 30%, Cluster B 24%, Cluster C 20%, Cluster D 16%, Cluster E 10%. Higher K values split the same data into more clusters.]
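The idea of estimating how your DNA distributes across clusters can be sketched in miniature. This is a minimal illustration, not our production pipeline: it assumes the allele frequencies of each cluster have already been learned and are held fixed (a projection-style setup), and applies the standard expectation-maximization (EM) update for one sample's admixture proportions. The function name `estimate_admixture` and the toy frequencies are hypothetical.

```python
def estimate_admixture(genotypes, freqs, iterations=500):
    """Estimate one sample's admixture proportions by EM.

    genotypes[j]: copies (0, 1 or 2) of a chosen allele at marker j.
    freqs[k][j]: frequency of that same allele in cluster k, held fixed and
    assumed strictly between 0 and 1 (hypothetical projection-style setup).
    """
    k_count = len(freqs)
    n_markers = len(genotypes)
    q = [1.0 / k_count] * k_count  # start from equal proportions
    for _ in range(iterations):
        expected = [0.0] * k_count
        for j, g in enumerate(genotypes):
            p_allele = sum(q[k] * freqs[k][j] for k in range(k_count))
            p_other = sum(q[k] * (1.0 - freqs[k][j]) for k in range(k_count))
            for k in range(k_count):
                # E-step: share of the g copies of the allele, and of the
                # (2 - g) copies of the other allele, credited to cluster k
                if g > 0:
                    expected[k] += g * q[k] * freqs[k][j] / p_allele
                if g < 2:
                    expected[k] += (2 - g) * q[k] * (1.0 - freqs[k][j]) / p_other
        # M-step: every marker contributes two allele copies in total
        q = [e / (2.0 * n_markers) for e in expected]
    return q


# Toy example: two hypothetical clusters, four markers
freqs = [[0.9, 0.8, 0.1, 0.2],
         [0.1, 0.2, 0.9, 0.8]]
print(estimate_admixture([1, 1, 1, 1], freqs))  # roughly [0.5, 0.5]
print(estimate_admixture([2, 2, 0, 0], freqs))  # close to [1.0, 0.0]
```

The proportions always sum to 1, which is why admixture output reads as percentages; that is a property of the model, not evidence of distinct ancestral populations.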

Coordinate Systems

Coordinate systems like G25 reduce complex genetic data to a small number of dimensions (25 in the case of G25). These dimensions capture the major axes of genetic variation across global populations.

Your position in this space reflects your genetic similarity to other samples. Samples that cluster together are genetically similar; samples far apart are more different.

What it shows: Your position relative to reference populations in genetic space.
What it does not show: Direct ancestry relationships or historical migrations.

[Figure: your position plotted on PC1 (the major axis of variation) and PC2 alongside Populations A to D. Proximity reflects genetic similarity, not geographic origin.]
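To make "axes of variation" concrete, here is a minimal principal component analysis (PCA) sketch that finds only the first axis (PC1) by power iteration on the covariance matrix. It is a generic illustration under stated assumptions: G25 is described on this page as PCA-derived, but this is not how G25 coordinates are actually produced. The function name and the toy samples are hypothetical.

```python
import math


def first_principal_component(samples, iterations=100):
    """Return the leading PCA axis and each sample's score on it.

    samples: equal-length numeric vectors (for example, genotype dosages).
    Real pipelines use optimized linear algebra and keep many axes.
    """
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - means[j] for j in range(d)] for s in samples]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    axis = [1.0] * d
    for _ in range(iterations):
        # Power iteration: repeatedly multiply by the covariance matrix
        w = [sum(cov[a][b] * axis[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variation left to capture
        axis = [x / norm for x in w]
    # A sample's score is its position along the axis
    scores = [sum(c[j] * axis[j] for j in range(d)) for c in centered]
    return axis, scores


samples = [[-2.0, 0.0], [-1.0, 0.1], [0.0, 0.0], [1.0, -0.1], [2.0, 0.0]]
axis, scores = first_principal_component(samples)
print(axis)    # first entry near 1: PC1 follows the first column
print(scores)  # positions of the five samples along PC1
```

Samples with nearby scores are similar along this axis, which is the sense in which "position" reflects genetic similarity rather than geography.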

Distance-Based Methods

Distance-based methods calculate how similar or different your DNA is from reference samples. This produces ranked lists of closest matches: populations or ancient individuals whose genetic profiles are most similar to yours.

Genetic distance is a measure of similarity, not a measure of ancestry percentage or geographic origin. Two populations can be genetically close for many reasons, including shared ancient ancestry, recent gene flow, or similar demographic histories.

What it shows: Which samples are most genetically similar to you.
What it does not show: That you "descend from" those populations directly.

[Figure: closest populations ranked by genetic distance: Population A 0.0128, Population B 0.0156, Population C 0.0201, Population D 0.0267, Population E 0.0345. Shorter bars mean smaller distance and greater similarity.]
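A ranked list of closest matches can be produced by computing a distance from your coordinates to each reference and sorting. This sketch assumes plain Euclidean distance between coordinate vectors, a common choice for coordinate-based tools; the function name, the three-dimensional toy coordinates, and the population labels are hypothetical.

```python
import math


def rank_by_distance(target, references):
    """Rank reference samples by Euclidean distance to the target.

    target: list of coordinates; references: dict of label -> coordinates.
    Returns (label, distance) pairs, closest (most similar) first.
    """
    ranked = [(label, math.dist(target, coords))
              for label, coords in references.items()]
    ranked.sort(key=lambda pair: pair[1])
    return ranked


target = [0.10, 0.02, -0.01]
references = {
    "Population A": [0.11, 0.02, -0.01],
    "Population B": [0.05, 0.00, 0.00],
    "Population C": [0.10, 0.05, -0.02],
}
for label, distance in rank_by_distance(target, references):
    print(f"{label}: {distance:.4f}")
```

Here the order is A, then C, then B. The ranking says which references are nearest in this space; it says nothing about descent from them.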

Projection onto Reference Panels

Many analyses work by projecting your sample onto a reference space that was built from curated populations. Your coordinates or percentages are estimated based on how you fit within this pre-defined space.

This means results depend on which populations were used to build the reference. If a population similar to your actual ancestry was not included, you may appear as a mixture of the closest available alternatives.

What it shows: How you fit within the reference framework.
What it does not show: Populations outside the reference that might be closer matches.

[Figure: a reference space bounded by Ref A to Ref D, with your true position projected into it. Your position is estimated within the boundaries of the reference panel.]
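The effect described above, where a missing population shows up as a mixture of the closest available ones, can be seen with a two-source fit. This is a simplified sketch, assuming a least-squares fit of the target as w × A + (1 - w) × B with w between 0 and 1; real modeling tools fit many sources at once with constrained optimization. The function name and toy coordinates are hypothetical.

```python
def fit_two_source_mixture(target, source_a, source_b):
    """Best mix w*A + (1-w)*B (0 <= w <= 1) for a target, by least squares.

    Returns (w, residual_distance). A large residual means the target is
    poorly described by these sources, yet a weight is still returned:
    the model can only answer in terms of the sources it is given.
    """
    diff = [a - b for a, b in zip(source_a, source_b)]
    denom = sum(x * x for x in diff)
    if denom == 0.0:
        w = 0.5  # identical sources: every weight fits equally well
    else:
        w = sum((t - b) * d for t, b, d in zip(target, source_b, diff)) / denom
        w = min(1.0, max(0.0, w))  # keep the weight a valid proportion
    fitted = [w * a + (1 - w) * b for a, b in zip(source_a, source_b)]
    residual = sum((t - f) ** 2 for t, f in zip(target, fitted)) ** 0.5
    return w, residual


source_a, source_b = [0.0, 0.0], [1.0, 0.0]
print(fit_two_source_mixture([0.25, 0.0], source_a, source_b))  # w = 0.75, residual 0
print(fit_two_source_mixture([0.5, 0.4], source_a, source_b))   # w = 0.5, residual about 0.4
```

The second target lies off the line between the two sources, as if its true population were absent from the panel, yet the fit still reports a tidy 50/50 mixture; only the residual reveals the poor fit.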

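Haplogroups

Haplogroups are defined by specific mutations in mitochondrial DNA (mtDNA) or the Y chromosome. They represent deep lineages that can be traced back tens of thousands of years: your direct maternal line (mtDNA) and your direct paternal line (Y-DNA). Because the Y chromosome passes from father to son, a Y-DNA haplogroup can be determined only from samples that carry a Y chromosome.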

Unlike admixture analysis, which draws on markers from across your genome, haplogroups trace single lineages. They tell you about one line of ancestors, not your full ancestry.

What it shows: The deep lineage of your direct maternal/paternal line.
What it does not show: Your full ancestry (mtDNA and the Y chromosome make up only a small fraction of your genome, and each follows just one ancestor per generation).

[Figure: example haplogroup trees. Y-DNA (paternal): R, R1a, R1b, P312, U106, L21. mtDNA (maternal): H, H1, H3, H5. Mutations accumulate over thousands of years, defining branches from about 50,000 years ago to the present.]
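Placement on a haplogroup tree follows the defining mutations branch by branch. This minimal sketch walks a hypothetical toy tree (placeholder branch and marker names, not real phylogeny) and stops at the deepest branch whose defining markers are all in the derived (mutated) state. Production haplogroup callers also handle missing calls, back-mutations, and conflicting markers, which this sketch ignores.

```python
# Hypothetical toy tree: branch -> (defining markers, child branches).
# Names and markers are placeholders, not real phylogeny.
TOY_TREE = {
    "A":   ({"m1"}, ["A1", "A2"]),
    "A1":  ({"m2"}, ["A1a", "A1b"]),
    "A2":  ({"m3"}, []),
    "A1a": ({"m4"}, []),
    "A1b": ({"m5"}, []),
}


def assign_haplogroup(derived, tree, root="A"):
    """Descend from the root while some child's markers are all derived.

    derived: set of markers where the sample carries the derived state.
    Returns the deepest branch reached, or None if the root is unsupported.
    """
    if not tree[root][0] <= derived:
        return None
    current = root
    while True:
        _, children = tree[current]
        next_branch = next((c for c in children if tree[c][0] <= derived), None)
        if next_branch is None:
            return current  # no deeper branch is supported
        current = next_branch


print(assign_haplogroup({"m1", "m2", "m5"}, TOY_TREE))  # A1b
print(assign_haplogroup({"m1"}, TOY_TREE))              # A
```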

Advanced Analysis Methods

Beyond standard admixture percentages, we employ chromosome-level analysis techniques that reveal deeper layers of your genetic heritage.

What Sets These Methods Apart

Most ancestry services stop at population percentages. We go further—analyzing your DNA at the chromosome level, separating parental contributions, and identifying actual shared DNA segments with ancient and modern individuals. These techniques require significantly more computational resources and scientific expertise, but they provide insights that percentage-based approaches simply cannot offer.

Chromosome-level analysis reveals what percentages alone cannot show.

Coordinates, Components, and Distances

These concepts appear throughout our tools. Understanding them unlocks deeper interpretation.

Coordinates

Coordinates represent your position in a reduced genetic space. In G25, for example, your DNA is represented by 25 numbers that capture how you relate to global genetic variation.

These are not geographic coordinates. They are mathematical positions derived from genetic data. Two individuals with similar coordinates are genetically similar.

Key insight: Coordinates describe genetic position, not geographic origin or historical movement.

Components

Components are the building blocks of admixture models. A K12 model has 12 components; your percentages show how your DNA distributes across them.

Components are statistical constructs derived from clustering algorithms—they are not ancestral populations. A component labeled "Northern European" represents a cluster of genetic variation, not a historical population called "Northern Europeans."

Key insight: Components are patterns in data, not populations or identities.

Genetic Distance

Genetic distance measures how similar or different two samples are. Smaller distances mean greater similarity; larger distances mean greater difference.

Distance is calculated from genetic data, either directly from genetic markers or from coordinates derived from them. Broadly, the more genetic variation two samples share, the smaller the distance between them.

Critical clarification: Distance is a measure of similarity—not ancestry percentage. A distance of 0.02 does not mean "2% ancestry." It means the samples are genetically close.
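One simple way to turn shared variants into a distance is identity-by-state (IBS) allele sharing, sketched below under the assumption that genotypes are coded as 0, 1, or 2 copies of one allele. It is an illustration, not necessarily the distance any tool on this site uses; coordinate-based tools typically measure distance between coordinates instead. The function name and toy genotypes are hypothetical.

```python
def allele_sharing_distance(genotypes_1, genotypes_2):
    """Identity-by-state (IBS) distance between two samples.

    Genotypes are 0/1/2 copies of one allele per marker (None = missing).
    Per marker, |g1 - g2| / 2 is 0 when both alleles are shared and 1 when
    none are; the distance is the average over markers called in both.
    """
    diffs = [abs(a - b) / 2 for a, b in zip(genotypes_1, genotypes_2)
             if a is not None and b is not None]
    if not diffs:
        raise ValueError("no markers called in both samples")
    return sum(diffs) / len(diffs)


print(allele_sharing_distance([0, 1, 2, 2], [0, 1, 2, 2]))     # 0.0
print(allele_sharing_distance([0, 1, 2, None], [2, 1, 2, 0]))  # about 0.333
```

The result is an average mismatch between 0 and 1. A value of about 0.33 describes how often alleles differ at the compared markers, not a 33% share of anyone's ancestry.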

Uncertainty, Limits, and Change

Scientific honesty includes acknowledging limits. Here is what every user should understand.

Why Results Have Uncertainty

All ancestry estimates are probabilistic. They depend on which genetic markers are analyzed, which reference populations are used, and how the model distributes ambiguous signals. Small percentages especially should be treated with caution—they may reflect statistical noise rather than meaningful ancestry.
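One standard way to see this uncertainty is to resample the markers and re-estimate many times. The sketch below is a simplified two-source example, assuming a least-squares estimate of the share of a hypothetical source A from allele frequencies and a percentile bootstrap over single markers; real analyses resample blocks of neighboring markers because nearby markers are correlated. Function names and toy data are hypothetical.

```python
import random


def estimate_fraction(genotypes, freqs_a, freqs_b):
    """Least-squares share of source A from allele frequencies (two sources)."""
    num = sum((g / 2 - pb) * (pa - pb)
              for g, pa, pb in zip(genotypes, freqs_a, freqs_b))
    den = sum((pa - pb) ** 2 for pa, pb in zip(freqs_a, freqs_b))
    if den == 0:
        raise ValueError("sources have identical frequencies at every marker")
    return min(1.0, max(0.0, num / den))  # keep the share a valid proportion


def bootstrap_interval(genotypes, freqs_a, freqs_b, replicates=1000, seed=1):
    """Approximate 95% percentile interval from resampling markers."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(genotypes)
    estimates = []
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]  # markers drawn with replacement
        estimates.append(estimate_fraction([genotypes[i] for i in idx],
                                           [freqs_a[i] for i in idx],
                                           [freqs_b[i] for i in idx]))
    estimates.sort()
    return estimates[int(0.025 * replicates)], estimates[int(0.975 * replicates) - 1]


genotypes = [1, 0] * 10  # 20 hypothetical markers
freqs_a = [0.8] * 20     # allele frequency in hypothetical source A
freqs_b = [0.2] * 20     # allele frequency in hypothetical source B
print(estimate_fraction(genotypes, freqs_a, freqs_b))  # about 0.083
print(bootstrap_interval(genotypes, freqs_a, freqs_b))
```

With so few markers the interval is wide relative to the small point estimate, which is exactly why a small percentage on its own deserves caution.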

Why Different Models Give Different Views

A K7 model groups genetic variation into 7 clusters. A K72 model uses 72. Neither is "more correct"—they partition the same data differently. Similarly, different reference panels produce different results because they define the space differently. Results from different models are complementary perspectives, not contradictions.
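To see how one dataset can be partitioned at different resolutions, here is a plain k-means clustering sketch on one-dimensional toy values. K-means is a swapped-in stand-in chosen for brevity: admixture tools use likelihood-based models in which an individual can belong partly to several components, which k-means does not do. The function name and values are hypothetical.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain k-means on 1-D values; returns (centers, cluster sizes)."""
    ordered = sorted(values)
    # Deterministic start: centers spread evenly through the sorted values
    step = (len(ordered) - 1) / max(k - 1, 1)
    centers = [ordered[round(i * step)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty)
        centers = [sum(g) / len(g) if g else centers[c]
                   for c, g in enumerate(groups)]
    return centers, [len(g) for g in groups]


values = [0.0, 0.2, 0.4, 4.0, 4.2, 4.4, 10.0, 10.2, 10.4]
print(kmeans_1d(values, 2))  # two groups: sizes 6 and 3
print(kmeans_1d(values, 3))  # three groups: sizes 3, 3, 3
```

At K = 2 the two closer groups are merged; at K = 3 they are separated. Neither answer is wrong: the data are the same, only the requested resolution differs.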

Why Updates May Change Results

As science advances, reference panels expand and algorithms improve. We update our models to reflect current knowledge, which means your results may change over time. This is not an error—it reflects genuine improvement in the underlying methods. Earlier results were not "wrong"; they reflected what was known at the time.

Why No Model Is "Final"

No ancestry model represents absolute truth. Every model is a lens—a way of viewing genetic data that illuminates some patterns while obscuring others. The goal is not to find the "correct" model, but to use multiple perspectives to build a richer understanding. Certainty is not the aim; informed interpretation is.


How to Use Results Responsibly

Results are tools for understanding—not identity assignments. Here is how to interpret them well.

Do Not Treat Models as Identity Assignments

Ancestry results describe genetic patterns—they do not define who you are. A percentage labeled "Scandinavian" does not make you Scandinavian; it means your DNA clusters with reference samples from that region. Identity is cultural, personal, and historical—not determined by algorithms.

Compare Results Across Methods

No single model tells the complete story. Admixture percentages show one view; genetic distance rankings show another; haplogroups trace specific lineages. Use multiple methods together. Where they converge, you can have more confidence; where they differ, you have learned about the limits of each approach.

Use Context and History Together

Genetic results gain meaning when combined with family history, historical records, and geographic context. A result that seems surprising may make sense in light of known migration patterns or family stories. Genetics is one source of information—not the only source.

Avoid Conclusions from a Single Model

If one model shows an unexpected result, do not immediately assume it reveals hidden ancestry. Check other models. Consider whether the result might reflect model limitations, statistical noise, or reference panel gaps. Robust conclusions come from convergent evidence, not single data points.

How Methods Connect to Reports & Tools

Different tools use these methods in different ways. Here is how they relate.

Narrative Reports

Narrative reports like Deep Ancestry, Viking Heritage, and Celtic Heritage add an interpretation layer on top of raw calculations. They contextualize results within historical and cultural frameworks, making data more accessible.

Method used: Multiple approaches combined with historical context.


HGDP & K-Models

K-models (HGDP K72, K7, K12, etc.) apply admixture analysis with different numbers of components. They show population structure—how your DNA distributes across genetic clusters defined by reference populations.

Method used: Admixture/clustering models with varying K values.


G25 Coordinates

G25 Coordinates represent your position in 25-dimensional genetic space. This is raw positional data that can be used in various downstream analyses, including distance calculations and ancestry modeling.

Method used: Coordinate-based dimensionality reduction (PCA-derived).
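Because the coordinates are plain numbers, they are easy to load into downstream analyses. This sketch assumes the comma-separated layout in which G25 coordinates are commonly shared, a sample label followed by 25 values; check the exact format of your own file. The function `parse_coordinates` and the sample line are hypothetical.

```python
def parse_coordinates(line, expected_dims=25):
    """Split 'Label,v1,...,vN' into (label, list of floats), checking the width."""
    label, *values = line.strip().split(",")
    coords = [float(v) for v in values]
    if len(coords) != expected_dims:
        raise ValueError(f"expected {expected_dims} values, got {len(coords)}")
    return label, coords


line = "Sample_1," + ",".join(["0.0123"] * 25)  # hypothetical coordinate line
label, coords = parse_coordinates(line)
print(label, len(coords))  # Sample_1 25
```

Checking the number of values up front catches truncated or mis-pasted lines before they silently distort a distance or mixture calculation.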


G25 Studio

G25 Studio is an analytical environment where you can run custom models using your coordinates. Test different population combinations, calculate distances, and explore hypotheses about your ancestry.

Method used: User-directed distance analysis and modeling.


Genetic Similarity Reports

Similarity reports rank modern or ancient populations by genetic distance to you. They answer the question: "Which samples in our database are most similar to me?"

Method used: Comparative distance analysis.


Haplogroup Analysis

Haplogroup reports determine your mtDNA (maternal) and Y-DNA (paternal) lineages. These trace specific ancestral lines back thousands of years through mutations that define deep genetic branches.

Method used: Haplogroup determination via mutation analysis.


Frequently Asked Questions

How are ancestry results produced?

Ancestry results are produced through statistical models that compare your DNA against reference populations. These models use clustering algorithms to identify patterns of genetic similarity, then estimate how your DNA distributes across different population clusters. Results are probabilistic estimates, not definitive statements about identity or origin.

Why do different models give different results?

Different models use different reference populations, different numbers of components (K values), and different algorithms. A K7 model groups genetic variation into 7 clusters; a K72 model uses 72. Neither is more "correct"; they simply partition the data differently. Think of it like viewing a landscape from different elevations: each view reveals different features. Results from different models are complementary perspectives, not contradictions.

What does genetic distance mean?

Genetic distance measures how similar or different two DNA samples are. Smaller distances indicate greater similarity. Crucially, distance is a measure of genetic similarity, not ancestry percentage or geographic origin. A distance of 0.02 does not mean "2% ancestry." Two populations can be genetically close without sharing recent ancestors, for example because of ancient common ancestry or similar demographic histories.

Why might my results change over time?

As science advances, we improve our reference panels and algorithms. When we update our models to reflect current research, your results may shift. This is not an error; it reflects genuine methodological improvement. Your DNA has not changed; our ability to interpret it has. Earlier results were not "wrong"; they were the best available at the time.

How should I interpret small percentages?

Small percentages (typically below 5%) should be interpreted with caution. They may represent genuine distant ancestry, but they can also reflect statistical noise, model artifacts, or limitations in reference panel coverage. If a small percentage is meaningful, it should ideally appear consistently across multiple models. A single small percentage in one model is not strong evidence on its own.

What is the difference between admixture and genetic similarity analysis?

Admixture analysis models your DNA as a mixture of ancestral components, producing percentage breakdowns. Genetic similarity analysis calculates how close your DNA is to various reference samples, producing ranked lists. Admixture answers "What is the composition?" while similarity answers "Who is closest?" Both are valuable and complementary; neither is superior.

Have questions not covered here? Contact us.

Understanding Methods Leads to Better Insight

You now understand how results are built. The next step is exploring them—with the clarity that comes from knowing what the numbers mean and what they don't.
