STATISTICAL GENOMICS SERVICE

Unlock your genome's hidden variants

Genotype imputation uses linkage disequilibrium patterns from reference populations to statistically predict the ~29 million variants your DNA chip didn't directly measure, enabling finer-resolution ancestry analysis, archaeological comparisons, and academic research workflows.

0.00
One-time payment · No subscription required
1000 Genomes Phase 3
r² > 0.3 quality threshold
~24h average turnaround
Your Imputation Results Preview
Input: ~700K SNPs
Processing: Beagle 5.4 Imputation
Output: ~30M Variants

Why impute your DNA data?

Your raw DNA chip captures ~700K SNPs, less than 3% of common human variation. Imputation predicts the remaining variants, unlocking advanced analysis capabilities.

Higher-Resolution PCA

More variants means finer population clustering in principal component analysis. Distinguish closely related populations that appear merged with chip-only data.

EIGENSOFT · smartpca · PLINK PCA

Archaeological Comparisons

Ancient DNA studies use imputed data for qpAdm, f-statistics, and admixture modeling. Match the SNP density of published aDNA datasets.

ADMIXTOOLS · qpAdm · f3/f4 stats

Academic Research Workflows

Imputed data integrates seamlessly with standard bioinformatics pipelines. Output files are PLINK-compatible and ready for downstream analysis.

PLINK · VCF/BCF · BEAGLE

Third-Party Platform Compatibility

Many ancestry platforms benefit from denser SNP coverage. Improve your results on sites that accept imputed or high-density raw files.

GEDmatch · MyTrueAncestry · Illustrative DNA

What is genotype imputation?

A statistical method that leverages population-level haplotype patterns to predict genetic variants not directly genotyped by your DNA chip.

The Simple Version

Your DNA chip tests specific positions (SNPs) across your genome, typically 600K–900K sites. But the human genome has over 80 million known variants. Imputation fills in the gaps by comparing your tested SNPs against a reference panel of fully sequenced individuals. Because nearby genetic variants tend to be inherited together in blocks (a correlation called linkage disequilibrium), we can statistically infer what's in the untested regions with high confidence.

Technical Deep Dive For Researchers

The Imputation Process

Modern genotype imputation is a two-stage process, phasing followed by imputation, supported by a reference panel and quality control:

1. Haplotype Phasing

Raw genotype data contains unphased diploid calls (e.g., A/G). Phasing algorithms determine which alleles were inherited together on each chromosome.

  • Uses Hidden Markov Models (HMM)
  • Leverages population LD patterns
  • Outputs phased haplotypes
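To make the phasing problem concrete, the sketch below enumerates every pair of haplotypes that is consistent with a set of unphased genotype calls. It only illustrates the ambiguity that a phasing algorithm has to resolve; it is not how Eagle or SHAPEIT work internally, and the function name `consistent_haplotype_pairs` is hypothetical.

```python
from itertools import product


def consistent_haplotype_pairs(genotypes):
    """List all haplotype pairs compatible with unphased genotypes.

    genotypes: list of two-letter strings such as "AG", one per site.
    Returns a sorted list of (haplotype, haplotype) tuples.
    """
    # At each site the two alleles can be assigned to the two
    # chromosomes in either order.
    orientations = [(g[0] + g[1], g[1] + g[0]) for g in genotypes]
    pairs = set()
    for choice in product(*orientations):
        hap1 = "".join(site[0] for site in choice)
        hap2 = "".join(site[1] for site in choice)
        # The two chromosomes are unordered, so store each pair sorted.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


# Two heterozygous sites leave two possible phasings.
print(consistent_haplotype_pairs(["AG", "CT"]))
```

With two heterozygous sites there are two candidate phasings; population LD patterns are what let a phasing algorithm prefer one over the other.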

2. Imputation Engine

The phased haplotypes are compared against a reference panel of sequenced individuals to predict untyped variants.

  • Li & Stephens HMM framework
  • Probabilistic allele dosages
  • Quality scores (r²/INFO)
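The idea behind the Li & Stephens framework can be sketched with a toy haploid model: the target haplotype is treated as a mosaic copied from reference haplotypes, and a forward-backward pass gives the probability of each allele at untyped sites. This is a deliberately simplified illustration, not Beagle's implementation. The switch rate `rho`, the mismatch rate `eps`, and the function name are assumptions chosen for the example.

```python
def impute_haplotype(ref, obs, rho=0.05, eps=0.01):
    """Toy Li & Stephens style imputation for a single haplotype.

    ref: list of reference haplotypes, each a list of 0/1 alleles.
    obs: target alleles per site (0, 1, or None where untyped).
    rho: assumed per-site probability of switching template.
    eps: assumed probability that a typed allele mismatches its template.
    Returns the posterior probability of allele 1 at every site.
    """
    n_ref, n_sites = len(ref), len(obs)

    def emit(k, m):
        # Untyped sites carry no information about the copied template.
        if obs[m] is None:
            return 1.0
        return 1.0 - eps if ref[k][m] == obs[m] else eps

    # Forward pass: probability of the data up to site m, copying from k.
    fwd = [[0.0] * n_ref for _ in range(n_sites)]
    for k in range(n_ref):
        fwd[0][k] = emit(k, 0) / n_ref
    for m in range(1, n_sites):
        total = sum(fwd[m - 1])
        for k in range(n_ref):
            stay = (1.0 - rho) * fwd[m - 1][k]
            switch = rho * total / n_ref
            fwd[m][k] = emit(k, m) * (stay + switch)

    # Backward pass: probability of the data after site m, given state k.
    bwd = [[1.0] * n_ref for _ in range(n_sites)]
    for m in range(n_sites - 2, -1, -1):
        total = sum(emit(j, m + 1) * bwd[m + 1][j] for j in range(n_ref))
        for k in range(n_ref):
            stay = (1.0 - rho) * emit(k, m + 1) * bwd[m + 1][k]
            bwd[m][k] = stay + rho * total / n_ref

    # Combine the passes and average the template alleles per site.
    probs = []
    for m in range(n_sites):
        weights = [fwd[m][k] * bwd[m][k] for k in range(n_ref)]
        norm = sum(weights)
        probs.append(sum(w * ref[k][m] for k, w in enumerate(weights)) / norm)
    return probs


reference = [[0, 0, 0], [1, 1, 1]]
target = [1, None, 1]  # the middle site was not on the chip
print(impute_haplotype(reference, target))
```

In this toy case the typed flanking alleles match the second reference haplotype, so the untyped middle site is imputed as allele 1 with high probability. Summing these probabilities over a person's two haplotypes gives an allele dosage between 0 and 2.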

3. Reference Panel

The reference panel provides the haplotype templates. Larger, more diverse panels yield better accuracy.

  • 1000 Genomes Phase 3
  • 2,504 individuals, 26 populations
  • ~84M variants (post-QC)

4. Quality Control

Not all imputed variants are equally reliable. We filter based on quality metrics for high-confidence calls.

  • r² (squared correlation) threshold
  • Minor allele frequency filters
  • Strand alignment checks
# Conceptual imputation workflow
Input: raw_genotypes.txt (700K SNPs, unphased)
Step 1: Phase with Eagle/SHAPEIT → phased_haplotypes.vcf
Step 2: Impute with Beagle 5.4 against 1000G → imputed_dosages.vcf
Step 3: Filter r² > 0.3, MAF > 0.001 → qc_passed.vcf
Output: 30M_imputed.txt (30M SNPs, hard calls)

Our Imputation Pipeline

Research-grade bioinformatics workflow using industry-standard tools

Data Processing Flow

  • Input File: your raw DNA data (~700K SNPs)
  • Phasing: haplotype estimation (Eagle / SHAPEIT)
  • Imputation: reference panel matching (Beagle 5.4)
  • Quality Control: filter low-confidence variants (r² > 0.3)
  • Output Files: research-ready data (~30M SNPs)

Technical Specifications

Reference Panel

1000 Genomes Phase 3

2,504 samples from 26 global populations with 84.4M variants

Genome Build

GRCh37 / hg19

Human genome reference assembly, chromosomes 1–22 (autosomes only)

Quality Threshold

r² = 0.3

Imputation quality filter; higher r² = more confident genotype calls
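As a rough illustration of what r² measures, the sketch below computes the squared Pearson correlation between imputed dosages and true genotypes for one variant across several samples. The numbers are invented for the example. In practice the true genotypes are unknown, so imputation software reports an estimate of this quantity derived from the posterior genotype probabilities.

```python
def squared_correlation(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A variant with no variation carries no correlation information.
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov * cov / (var_x * var_y)


# Hypothetical data: true allele counts and imputed dosages per sample.
true_genotypes = [0, 1, 2, 1, 0, 2]
imputed_dosages = [0.1, 0.9, 1.8, 1.2, 0.0, 1.9]
print(round(squared_correlation(true_genotypes, imputed_dosages), 3))
```

A value near 1 means the dosages track the true genotypes closely; a value near 0 means the imputed values carry little information about the variant.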

Output Format

23andMe-style TXT

Tab-delimited format; convertible to PLINK, VCF, or other formats

Before & After Imputation

See the dramatic increase in genomic coverage

Before Imputation
~700K
SNPs from your chip

~2.3% of common variants

After Imputation
~30M
Imputed variants

~43× more variants
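These figures follow from simple arithmetic on the approximate counts quoted on this page. The sketch assumes the percentage is taken relative to the ~30M imputed output, which the page does not state explicitly.

```python
chip_snps = 700_000          # approximate SNPs on a typical chip
imputed_variants = 30_000_000  # approximate variants after imputation

# How many times more variants the imputed file contains.
fold_increase = imputed_variants / chip_snps

# Share of the imputed set that was directly genotyped (assumed basis).
chip_share_percent = 100 * chip_snps / imputed_variants

print(f"~{fold_increase:.0f}x more variants")
print(f"~{chip_share_percent:.1f}% directly genotyped")
```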

Compatible Downstream Tools

PLINK 1.9 / 2.0 · EIGENSOFT · ADMIXTOOLS 2 · ADMIXTURE · smartpca · qpAdm · BCFtools · VCFtools

What You'll Receive

Two output files optimized for different use cases, with clear documentation

Full Dataset ~250 MB (.zip)

30M SNPs · Research-Grade

  • Complete imputed output (r² threshold: 0.3)
  • 23andMe-style format, convertible to PLINK
  • Suitable for PCA, admixture, f-statistics
  • GRCh37/hg19, chromosomes 1–22
File Format Preview
# rsid chromosome position genotype
rs12345678   1   123456   AG
rs23456789   1   234567   CC
rs34567890   1   345678   TT
# ... ~30 million rows
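A file in the layout previewed above can be read with a few lines of Python. This is a minimal sketch that assumes four whitespace-separated columns (rsid, chromosome, position, genotype) and comment lines starting with `#`; the function name `parse_genotype_lines` is hypothetical.

```python
def parse_genotype_lines(lines):
    """Parse 23andMe-style rows into (rsid, chromosome, position, genotype)."""
    records = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and header or comment lines.
        if not line or line.startswith("#"):
            continue
        # split() with no argument handles tabs and runs of spaces alike.
        rsid, chromosome, position, genotype = line.split()
        records.append((rsid, chromosome, int(position), genotype))
    return records


sample = [
    "# rsid chromosome position genotype",
    "rs12345678\t1\t123456\tAG",
    "rs23456789\t1\t234567\tCC",
]
for record in parse_genotype_lines(sample):
    print(record)
```

For a real file, pass an open file handle instead of a list so the ~30 million rows are streamed line by line; collecting them into a list as done here may need several gigabytes of memory.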
High-Confidence ~20 MB (.zip)

2.5M SNPs · Optimized Subset

  • High-confidence variants (r² threshold: 0.8)
  • Optimized for third-party platforms
  • Fast uploads, quick processing
  • Same format, smaller file size

Best for: GEDmatch, MyTrueAncestry, and other sites where you need a balance of density and file size.

Compatible DNA Providers

Imputation quality depends on your chip's SNP density. Higher-density chips yield better results.

  • Optimal (700K+ SNPs): AncestryDNA v2, 23andMe v5, MyHeritage v2, LivingDNA, Nebula 30x*, Dante Labs*
  • Good (500K–700K SNPs): 23andMe v4, AncestryDNA v1, FamilyTreeDNA, MyHeritage v1, Gene by Gene, 24Genetics
  • Standard (<500K SNPs): 23andMe v3, National Geographic, WeGene, Genera, TellMeGen, Adntro

* WGS providers require conversion to RAW format first. Use our WGS to RAW service.

Important Technical Limitations

Imputation is powerful but not perfect. Here's what you should know.

Expected Error Rates

Imputation accuracy varies by allele frequency. Common variants (MAF > 5%) are imputed with ~98% accuracy; rare variants (MAF < 1%) have higher error rates.

Autosomes Only

We impute chromosomes 1–22 only. mtDNA and Y-DNA are not imputed as they require specialized reference panels and analysis pipelines.

GRCh37/hg19 Reference

All coordinates are in the GRCh37 (hg19) assembly. If you need GRCh38, you'll need to lift over the positions using tools like CrossMap or UCSC liftOver.

Not for Health/Medical Use

Imputed genotypes are statistical estimates, not direct measurements. They are not suitable for clinical decision-making or health-related interpretations.

Expected Accuracy by Minor Allele Frequency
MAF Range | Variant Class | Typical Accuracy | Notes
> 5% | Common | ~98–99% | Excellent imputation quality
1–5% | Low frequency | ~95–98% | Good quality, some uncertainty
0.1–1% | Rare | ~85–95% | Increased error rate expected
< 0.1% | Very rare | Variable | Often filtered out (r² < 0.3)

Simple Pricing

One order includes both output files. No subscription.

DNA Imputation Service

0.00
One-time payment · No subscription
  • Full 30M SNP file (r² threshold: 0.3)
  • High-confidence 2.5M file (r² threshold: 0.8)
  • 1000 Genomes Phase 3 reference panel
  • ~24 hour average turnaround
  • Secure processing & private download
  • Email support for questions
  • Files deleted after 15 days
Order Now


Unlock your genome's hidden variants

Get research-grade imputed data for advanced ancestry analysis and academic workflows.