STATISTICAL GENOMICS SERVICE

Unlock your genome's hidden variants

Genotype imputation uses linkage disequilibrium patterns from reference populations to statistically predict the ~29 million variants your DNA chip didn't directly measure—enabling finer-resolution ancestry analysis, archaeological comparisons, and academic research workflows.

21.00
One-time payment, no subscription required
1000 Genomes Phase 3
GP ≥ 99.5% confidence filter
~24h average turnaround
Your Imputation Results Preview
Input: ~700K SNPs
Processing: Beagle 5.4 Imputation
Output: ~30M Variants

Why impute your DNA data?

Your raw DNA chip captures ~700K SNPs—less than 3% of common human variation. Imputation predicts the remaining variants, unlocking advanced analysis capabilities.

Higher-Resolution PCA

More variants means finer population clustering in principal component analysis. Distinguish closely related populations that appear merged with chip-only data.

EIGENSOFT, smartpca, PLINK PCA

Archaeological Comparisons

Ancient DNA studies use imputed data for qpAdm, f-statistics, and admixture modeling. Match the SNP density of published aDNA datasets.

ADMIXTOOLS, qpAdm, f3/f4 stats

Academic Research Workflows

Imputed data slots into standard bioinformatics pipelines. Output files use a 23andMe-style text format that converts to PLINK or VCF for downstream analysis.

PLINK, VCF/BCF, BEAGLE

Third-Party Platform Compatibility

Many ancestry platforms benefit from denser SNP coverage. Improve your results on sites that accept imputed or high-density raw files.

GEDmatch, MyTrueAncestry, Illustrative DNA

What is genotype imputation?

A statistical method that leverages population-level haplotype patterns to predict genetic variants not directly genotyped by your DNA chip.

The Simple Version

Your DNA chip tests specific positions (SNPs) across your genome—typically 600K–900K sites. But the human genome has over 80 million known variants. Imputation fills in the gaps by comparing your tested SNPs against a reference panel of fully sequenced individuals. Because genetic variants are inherited together in blocks (called linkage disequilibrium), we can statistically infer what's in the untested regions with high confidence.

Technical Deep Dive For Researchers

The Imputation Process

Genotype imputation involves two statistical tasks, phasing and imputation. Beagle performs both in a single pass, so the full pipeline runs in four steps:

1. Pre-processing

The raw DNA file is converted to VCF format against the GRCh37 reference. Several normalisation steps run before Beagle sees any data.

  • Y and MT chromosomes removed
  • Haploid calls normalised to diploid
  • Isolated edge markers trimmed

2. Imputation Engine

Beagle simultaneously phases and imputes the pre-processed genotypes against the reference panel in a single pass, chromosome by chromosome.

  • Li & Stephens HMM framework
  • Probabilistic allele dosages
  • Genotype probabilities (GP) per site
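
To make the relationship between genotype probabilities and allele dosage concrete, here is a small sketch. It assumes a biallelic site whose GP triple is ordered as P(0/0), P(0/1), P(1/1), the usual VCF convention; the function name `dosage_and_call` is hypothetical.

```python
def dosage_and_call(gp):
    """Return (expected ALT-allele dosage, most likely genotype index, its probability).

    gp is a triple (P(0/0), P(0/1), P(1/1)) for a biallelic site.
    """
    p_ref, p_het, p_alt = gp
    # Expected number of ALT alleles: 0 * P(0/0) + 1 * P(0/1) + 2 * P(1/1).
    dosage = p_het + 2.0 * p_alt
    # Index of the most probable genotype: 0 = 0/0, 1 = 0/1, 2 = 1/1.
    best = max(range(3), key=lambda i: gp[i])
    return dosage, best, gp[best]


dosage, best, prob = dosage_and_call((0.01, 0.04, 0.95))
```

Here the most likely genotype is homozygous ALT with probability 0.95, and the dosage comes out close to 2 because most of the probability mass sits on two ALT copies.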

3. Reference Panel

The reference panel provides the haplotype templates. Larger, more diverse panels yield better accuracy.

  • 1000 Genomes Phase 3
  • 2,504 individuals, 26 populations
  • ~84M variants (post-QC)

4. Quality Filtering

Two output tiers are produced. The high-confidence file applies a genotype probability filter per imputed site.

  • Typed SNPs kept unconditionally
  • Imputed SNPs: max GP ≥ 0.995
  • Curated SNP panel extraction
# Actual imputation workflow
Input: raw_genotypes.txt (700K SNPs)
Step 1: Convert to VCF; remove Y/MT; normalise ploidy; trim edge markers
Step 2: Impute with Beagle 5.4 against 1000G (chr 1-22 + X) → imputed.vcf
Step 3: Filter imputed SNPs: max(GP) ≥ 0.995 → high_confidence.vcf
Step 4: Merge imputed data back with original typed SNPs
Output: *_L.zip (~30M SNPs, full) + *_S.zip (~2.5M SNPs, GP-filtered)
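
The Step 3 filter can be sketched on individual VCF data lines as follows. This is a hedged illustration rather than the pipeline's actual code: it assumes a single-sample VCF, that Beagle was run with GP output enabled so the FORMAT column contains GP, and that imputed markers carry the IMP flag in the INFO column. The function name `keep_record` is hypothetical.

```python
def keep_record(vcf_line: str, threshold: float = 0.995) -> bool:
    """Decide whether a single-sample VCF data line passes the high-confidence filter."""
    fields = vcf_line.rstrip("\n").split("\t")
    info, fmt, sample = fields[7], fields[8], fields[9]
    if "IMP" not in info.split(";"):
        return True  # typed marker: kept unconditionally
    keys = fmt.split(":")
    if "GP" not in keys:
        return False  # imputed marker without probabilities: cannot be verified
    gp_text = sample.split(":")[keys.index("GP")]
    gp = [float(x) for x in gp_text.split(",")]
    return max(gp) >= threshold  # keep only if one genotype reaches the threshold


typed = "1\t1000\trs1\tA\tG\t.\tPASS\t.\tGT\t0|1"
confident = "1\t2000\trs2\tC\tT\t.\tPASS\tDR2=0.99;AF=0.3;IMP\tGT:DS:GP\t1|1:2:0,0.002,0.998"
uncertain = "1\t3000\trs3\tG\tA\t.\tPASS\tDR2=0.60;AF=0.01;IMP\tGT:DS:GP\t0|0:0.2:0.82,0.16,0.02"

kept = [line for line in (typed, confident, uncertain) if keep_record(line)]
```

The example lines are made up for illustration. The typed record and the imputed record with max(GP) = 0.998 pass; the imputed record with max(GP) = 0.82 is dropped.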

Our Imputation Pipeline

Research-grade bioinformatics workflow using industry-standard tools

Data Processing Flow

  1. Input File: your raw DNA data (~700K SNPs)
  2. Pre-processing: VCF conversion & filters (remove Y/MT, fix ploidy)
  3. Imputation: reference panel matching with Beagle 5.4 (chr 1-22 + X)
  4. Confidence Filtering: GP-based QC, max(GP) ≥ 0.995
  5. Merge with Original: original SNPs re-added (RawDnaMerger)
  6. Output Files: full + GP-filtered (~30M / ~2.5M SNPs)
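
The merge step can be pictured as a keyed union of the two call sets. The sketch below is not RawDnaMerger; it is a hypothetical illustration that assumes calls are keyed by (chromosome, position) and that a directly typed call takes precedence over an imputed call at the same site.

```python
# Sort order for the chromosomes that are imputed: 1-22, then X.
CHROM_ORDER = {str(i): i for i in range(1, 23)}
CHROM_ORDER["X"] = 23


def merge_calls(imputed: dict, typed: dict) -> dict:
    """Combine imputed and typed calls; typed calls win on conflicts (assumed rule)."""
    merged = dict(imputed)
    merged.update(typed)  # re-add original SNPs, overwriting imputed values
    ordered = sorted(merged.items(), key=lambda kv: (CHROM_ORDER[kv[0][0]], kv[0][1]))
    return dict(ordered)


imputed = {
    ("1", 2000): ("rs2", "CT"),
    ("1", 1000): ("rs1", "AA"),
    ("X", 500): ("rs9", "GG"),
}
typed = {("1", 1000): ("rs1", "AG")}

merged = merge_calls(imputed, typed)
```

In this toy example the site at chromosome 1, position 1000 keeps the chip genotype AG rather than the imputed AA, and the result is ordered by chromosome and position.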
Technical Specifications

Reference Panel

1000 Genomes Phase 3

2,504 samples from 26 global populations with 84.4M variants

Genome Build

GRCh37 / hg19

Human genome reference assembly, chromosomes 1–22 and X

Confidence Filter

GP ≥ 0.995

High-confidence file: imputed SNPs kept only when Beagle assigns ≥99.5% posterior probability to one genotype. Typed SNPs are always kept.

Output Format

23andMe-style TXT

Tab-delimited format; convertible to PLINK, VCF, or other formats

Before & After Imputation

See the dramatic increase in genomic coverage

Before Imputation
~700K
SNPs from your chip

~2.3% of common variants

After Imputation
~30M
Imputed variants

~43× more variants
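
The two headline figures follow from simple arithmetic on the rounded counts. The snippet below reproduces them, assuming the ~2.3% figure is measured against the ~30M imputed set.

```python
chip_snps = 700_000            # approximate SNPs on the chip
imputed_variants = 30_000_000  # approximate variants after imputation

fold_increase = imputed_variants / chip_snps  # how many times more variants
chip_share = chip_snps / imputed_variants     # chip SNPs as a share of the imputed set

print(f"{fold_increase:.0f}x more variants")      # 43x more variants
print(f"{chip_share:.1%} of the imputed total")   # 2.3% of the imputed total
```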

Compatible Downstream Tools

PLINK 1.9 / 2.0, EIGENSOFT, ADMIXTOOLS 2, ADMIXTURE, smartpca, qpAdm, BCFtools, VCFtools

What You'll Receive

Two output files optimized for different use cases, with clear documentation

Full Dataset ~250 MB (.zip)

30M SNPs — Research-Grade

  • Complete imputed output (all variants retained)
  • 23andMe-style format, convertible to PLINK
  • Suitable for PCA, admixture, f-statistics
  • GRCh37/hg19, chromosomes 1–22 and X
File Format Preview
# rsid chromosome position genotype
rs12345678   1   123456   AG
rs23456789   1   234567   CC
rs34567890   1   345678   TT
# ... ~30 million rows
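
A file in this layout can be streamed with the Python standard library. The sketch below is illustrative: it assumes tab-delimited columns and "#" comment lines as in the preview, reuses the placeholder rows shown there, and the helper name `read_calls` is hypothetical.

```python
import csv
import io
from collections import Counter

# Stand-in for an opened output file, using the placeholder rows from the preview.
preview = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs12345678\t1\t123456\tAG\n"
    "rs23456789\t1\t234567\tCC\n"
    "rs34567890\t1\t345678\tTT\n"
)


def read_calls(handle):
    """Yield (rsid, chromosome, position, genotype) from a 23andMe-style text stream."""
    data_lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    for rsid, chrom, pos, genotype in csv.reader(data_lines, delimiter="\t"):
        yield rsid, chrom, int(pos), genotype


calls = list(read_calls(preview))
per_chrom = Counter(chrom for _, chrom, _, _ in calls)  # SNP count per chromosome
het = sum(1 for *_, gt in calls if len(gt) == 2 and gt[0] != gt[1])  # heterozygous calls
```

For the full ~30M-row file it would make sense to iterate over `read_calls` directly instead of building a list, so memory use stays flat.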
High-Confidence ~20 MB (.zip)

~2.5M SNPs — High-Confidence Subset

  • Typed SNPs kept unconditionally
  • Imputed SNPs: max(GP) ≥ 0.995
  • Optimized for third-party platforms
  • Same format, smaller file size

Best for: GEDmatch, MyTrueAncestry, and other sites where you need a balance of density and file size.

Compatible DNA Providers

Imputation quality depends on your chip's SNP density. Higher-density chips yield better results.

Optimal (700K+ SNPs): AncestryDNA v2, 23andMe v5, MyHeritage v2, LivingDNA, Nebula 30x*, Dante Labs*
Good (500K–700K SNPs): 23andMe v4, AncestryDNA v1, FamilyTreeDNA, MyHeritage v1, Gene by Gene, 24Genetics
Standard (<500K SNPs): 23andMe v3, National Geographic, WeGene, Genera, TellMeGen, Adntro

* WGS providers require conversion to RAW format first. Use our WGS to RAW service.

Important Technical Limitations

Imputation is powerful but not perfect. Here's what you should know.

Expected Error Rates

Imputation accuracy varies by allele frequency. Common variants (MAF > 5%) are imputed with ~98–99% accuracy; rare variants (MAF < 1%) drop to roughly 85–95% or lower.

Autosomes and X Only

We impute chromosomes 1–22 and X. Y-DNA and mtDNA are not imputed as they require specialised reference panels and analysis pipelines.

GRCh37/hg19 Reference

All coordinates use the GRCh37 (hg19) assembly. If you need GRCh38, lift over the positions with a tool such as CrossMap or UCSC liftOver.

Not for Health/Medical Use

Imputed genotypes are statistical estimates, not direct measurements. They are not suitable for clinical decision-making or health-related interpretations.

Expected Accuracy by Minor Allele Frequency
MAF Range | Variant Class | Typical Accuracy | Notes
> 5% | Common | ~98–99% | Excellent imputation quality
1–5% | Low frequency | ~95–98% | Good quality, some uncertainty
0.1–1% | Rare | ~85–95% | Increased error rate expected
< 0.1% | Very rare | Variable | Often filtered out (low GP)
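
The table can be read as a simple lookup from minor allele frequency to an expected accuracy band. The sketch below encodes it directly; the function name `accuracy_band` and the handling of exact boundary values (1% counted as low frequency, 0.1% as rare) are assumptions, since the table does not say which side the boundaries fall on.

```python
def accuracy_band(maf: float) -> tuple:
    """Map a minor allele frequency (as a fraction, 0 to 0.5) to (class, typical accuracy)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must be between 0 and 0.5")
    if maf > 0.05:
        return ("Common", "~98-99%")
    if maf >= 0.01:
        return ("Low frequency", "~95-98%")
    if maf >= 0.001:
        return ("Rare", "~85-95%")
    return ("Very rare", "Variable")


bands = [accuracy_band(m) for m in (0.20, 0.03, 0.005, 0.0002)]
```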

Simple Pricing

One order includes both output files. No subscription.

DNA Imputation Service

21.00
One-time payment — No subscription
  • Full ~30M SNP file (all variants retained)
  • High-confidence ~2.5M file (GP-filtered)
  • 1000 Genomes Phase 3 reference panel
  • ~24 hour average turnaround
  • Secure processing & private download
  • Email support for questions
  • Files deleted after 15 days

Get research-grade imputed data for advanced ancestry analysis and academic workflows.