
STATISTICAL GENOMICS SERVICE

Unlock your genome's hidden variants

Genotype imputation uses linkage disequilibrium patterns from reference populations to statistically predict the ~29 million variants your DNA chip didn't directly measure—enabling finer-resolution ancestry analysis, archaeological comparisons, and academic research workflows.

€ 22.50

One-time payment. No subscription required.


1000 Genomes Phase 3

GP ≥ 99.5% confidence filter

~24h average turnaround

Your Imputation Results Preview

Input

~700K SNPs — HG19 or HG38

Processing

Beagle 5.4 Imputation (HG19 + HG38)

Output

~30M Variants in HG19 and HG38

Why Impute

Why impute your DNA data?

Your raw DNA chip captures ~700K SNPs—less than 3% of common human variation. Imputation predicts the remaining variants, unlocking advanced analysis capabilities.

Higher-Resolution PCA

More variants mean finer population clustering in principal component analysis. Distinguish closely related populations that appear merged with chip-only data.

EIGENSOFT, smartpca, PLINK PCA

Archaeological Comparisons

Ancient DNA studies use imputed data for qpAdm, f-statistics, and admixture modeling. Match the SNP density of published aDNA datasets.

ADMIXTOOLS, qpAdm, f3/f4 stats

Academic Research Workflows

Imputed data slots into standard bioinformatics pipelines. Output files are delivered in a 23andMe-style text format that converts directly to PLINK or VCF for downstream analysis.

PLINK, VCF/BCF, BEAGLE

Third-Party Platform Compatibility

Many ancestry platforms benefit from denser SNP coverage. Improve your results on sites that accept imputed or high-density raw files.

GEDmatch, MyTrueAncestry, Illustrative DNA

The Science

What is genotype imputation?

A statistical method that leverages population-level haplotype patterns to predict genetic variants not directly genotyped by your DNA chip.

The Simple Version

Your DNA chip tests specific positions (SNPs) across your genome, typically 600K–900K sites. But the human genome has over 80 million known variants. Imputation fills in the gaps by comparing your tested SNPs against a reference panel of fully sequenced individuals. Because nearby genetic variants tend to be inherited together in blocks (a correlation known as linkage disequilibrium), the untested positions can be statistically inferred, often with high confidence.

Technical Deep Dive For Researchers

The Imputation Process

The pipeline runs in four steps. Phasing and imputation, often described as two separate stages, are handled together by Beagle in step 2:

1. Pre-processing

The input build (HG19 or HG38) is auto-detected and the raw DNA file is converted to VCF against the matching reference. Several normalisation steps run before Beagle sees any data.

Y and MT chromosomes removed
Haploid calls normalised to diploid
Isolated edge markers trimmed
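
As a rough illustration of the first two normalisation steps, here is a minimal Python sketch. It assumes 23andMe-style rows held as `(rsid, chromosome, position, genotype)` tuples; the function name `preprocess` and the set of excluded chromosome labels are hypothetical, not the service's actual code. Edge-marker trimming is left out because its criteria are not described here.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, str, int, str]  # (rsid, chromosome, position, genotype)

# Assumed labels for the chromosomes dropped before imputation.
EXCLUDED_CHROMS = {"Y", "MT"}


def preprocess(rows: Iterable[Row]) -> Iterator[Row]:
    """Drop Y/MT rows and write single-letter (haploid) calls as diploid."""
    for rsid, chrom, pos, genotype in rows:
        if chrom.upper() in EXCLUDED_CHROMS:
            continue  # Y and MT chromosomes are removed
        if len(genotype) == 1 and genotype in "ACGT":
            genotype = genotype * 2  # e.g. a male X call "A" becomes "AA"
        yield rsid, chrom, pos, genotype


rows = [
    ("rs1", "1", 100, "AG"),
    ("rs2", "X", 200, "A"),
    ("rs3", "Y", 300, "C"),
    ("rs4", "MT", 400, "T"),
]
cleaned = list(preprocess(rows))
```

Here `cleaned` keeps only the chromosome 1 and X rows, with the X call doubled.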

2. Imputation Engine

Beagle simultaneously phases and imputes the pre-processed genotypes against the reference panel in a single pass, chromosome by chromosome.

Li & Stephens HMM framework
Probabilistic allele dosages
Genotype probabilities (GP) per site
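
To make the relationship between genotype probabilities and dosage concrete, the sketch below computes the expected alternate-allele count from a GP triple ordered as P(0/0), P(0/1), P(1/1), which is the standard VCF ordering for a biallelic site. The function names are illustrative only.

```python
from typing import Sequence


def alt_dosage(gp: Sequence[float]) -> float:
    """Expected number of ALT alleles: 0*P(0/0) + 1*P(0/1) + 2*P(1/1)."""
    _p_hom_ref, p_het, p_hom_alt = gp
    return p_het + 2.0 * p_hom_alt


def best_genotype_probability(gp: Sequence[float]) -> float:
    """Posterior probability of the most likely genotype, i.e. max(GP)."""
    return max(gp)


gp = (0.01, 0.95, 0.04)
dosage = alt_dosage(gp)                     # roughly 1.03 ALT alleles
confidence = best_genotype_probability(gp)  # 0.95
```

A dosage close to 0, 1 or 2 together with a high max(GP) indicates a confident call; values in between reflect uncertainty.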

3. Reference Panel

The reference panel provides the haplotype templates. Larger, more diverse panels yield better accuracy.

1000 Genomes Phase 3
2,504 individuals, 26 populations
~84M variants (post-QC)

4. Quality Filtering

Two output tiers are produced. The high-confidence file applies a genotype probability filter per imputed site.

Typed SNPs kept unconditionally
Imputed SNPs: max GP ≥ 0.995
Curated SNP panel extraction
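
The filter rule above can be sketched in a few lines of Python. This assumes a single-sample VCF data line with a `GP` entry in the FORMAT column and imputed sites marked by an `IMP` flag in INFO, as Beagle normally writes them; the function name `keep_site` is hypothetical and this is not the service's actual filter code.

```python
def keep_site(vcf_line: str, threshold: float = 0.995) -> bool:
    """Return True if a single-sample VCF record passes the confidence filter."""
    fields = vcf_line.rstrip("\n").split("\t")
    info, fmt, sample = fields[7], fields[8], fields[9]
    if "IMP" not in info.split(";"):
        return True  # typed SNP: kept unconditionally
    keys = fmt.split(":")
    values = sample.split(":")
    gp = [float(x) for x in values[keys.index("GP")].split(",")]
    return max(gp) >= threshold  # imputed SNP: needs max(GP) >= 0.995


typed = "1\t100\trs1\tA\tG\t.\tPASS\tDR2=1.00;AF=0.30\tGT:DS:GP\t0|1:1:0,1,0"
imputed_high = "1\t200\trs2\tC\tT\t.\tPASS\tDR2=0.99;AF=0.10;IMP\tGT:DS:GP\t0|0:0:0.998,0.002,0"
imputed_low = "1\t300\trs3\tG\tA\t.\tPASS\tDR2=0.60;AF=0.02;IMP\tGT:DS:GP\t0|1:0.9:0.2,0.7,0.1"

kept = [keep_site(line) for line in (typed, imputed_high, imputed_low)]
```

The example records are made up for illustration; only the first two pass.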


# Actual imputation workflow
Input: raw_genotypes.txt (~700K SNPs, HG19 or HG38 auto-detected)
Step 1: Convert to VCF against matching reference; remove Y/MT; normalise ploidy; trim edge markers
Step 2: Impute with Beagle 5.4 against 1000G (chr 1-22 + X) in both HG19 and HG38 → imputed.vcf
Step 3: Filter imputed SNPs: max(GP) ≥ 0.995 → high_confidence.vcf
Step 4: Merge imputed data back with original typed SNPs
Output: *_Large.zip (~30M SNPs, full, HG19 + HG38) + *_Small.zip (~2.5M SNPs, GP-filtered, HG19 + HG38)
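
For readers who want to reproduce something like Step 2 locally, the sketch below builds one Beagle command line per chromosome. The `gt=`, `ref=`, `out=`, `chrom=` and `gp=` arguments are standard Beagle parameters; the jar name, file names and the helper `beagle_command` are placeholders, and the exact settings the service uses are not documented here. Chromosome names must match those in the VCF files (for example `chr1` rather than `1` in many HG38 files).

```python
from typing import List

# Chromosomes imputed by the workflow: 1-22 plus X.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X"]


def beagle_command(gt_vcf: str, ref_vcf: str, out_prefix: str,
                   chrom: str, jar: str = "beagle.jar") -> List[str]:
    """Build (but do not run) a Beagle invocation for one chromosome."""
    return [
        "java", "-jar", jar,
        f"gt={gt_vcf}",      # study genotypes to phase and impute
        f"ref={ref_vcf}",    # reference panel haplotypes
        f"out={out_prefix}",
        f"chrom={chrom}",
        "gp=true",           # write genotype probabilities (GP) per site
    ]


commands = [
    beagle_command("sample.vcf.gz", f"ref.chr{c}.vcf.gz", f"imputed.chr{c}", c)
    for c in CHROMOSOMES
]
```

Each entry in `commands` could be passed to `subprocess.run` once the placeholder paths point at real files.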

Our Pipeline

Our Imputation Pipeline

Research-grade bioinformatics workflow using industry-standard tools

Data Processing Flow

1. Input File: your raw DNA data (~700K SNPs)
2. Pre-processing: VCF conversion & filters (remove Y/MT, fix ploidy)
3. Imputation: Beagle 5.4, HG19 + HG38 (chr 1-22 + X, both builds)
4. Confidence Filtering: GP-based QC (max(GP) ≥ 0.995)
5. Merge with Original: original SNPs re-added (RawDnaMerger)
6. Output Files: full + GP-filtered (~30M / ~2.5M SNPs)
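
The merge step in the flow above can be pictured as a dictionary update. The internals of RawDnaMerger are not described on this page, so the sketch below rests on two labelled assumptions: directly typed calls take precedence over imputed ones at the same position, and typed no-calls do not overwrite an imputed genotype. The function name is hypothetical.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, position)

# Assumed no-call markers in the original raw file.
NO_CALLS = {"--", "-", ""}


def merge_with_original(imputed: Dict[Site, str],
                        typed: Dict[Site, str]) -> Dict[Site, str]:
    """Re-add original chip genotypes on top of the imputed set."""
    merged = dict(imputed)
    for site, genotype in typed.items():
        if genotype not in NO_CALLS:
            merged[site] = genotype  # assumption: measured call wins
    return merged


imputed = {("1", 100): "AA", ("1", 200): "CT", ("1", 300): "GG"}
typed = {("1", 100): "AG", ("1", 300): "--"}
merged = merge_with_original(imputed, typed)
```

In this toy example the typed call at position 100 replaces the imputed one, while the no-call at position 300 leaves the imputed genotype in place.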

Technical Specifications

Reference Panel

1000 Genomes Phase 3

2,504 samples from 26 global populations with 84.4M variants

Genome Build

GRCh37 (hg19) and GRCh38 (hg38)

Both builds are supported for input and output. Build is auto-detected from your file. Results are delivered in both HG19 and HG38, chromosomes 1–22 and X.

Confidence Filter

GP ≥ 0.995

High-confidence file: imputed SNPs kept only when Beagle assigns ≥99.5% posterior probability to one genotype. Typed SNPs are always kept.

Output Format

23andMe-style TXT

Tab-delimited format; convertible to PLINK, VCF, or other formats
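
As an example of the conversion mentioned above, this sketch turns 23andMe-style rows into PLINK text-format content: one `.map` line per SNP and a single `.ped` line for the sample. The function name, the sample ID and the choice to skip non-diploid calls are assumptions made for illustration. PLINK 1.9 also has its own `--23file` import option, which may be simpler in practice.

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, str, int, str]  # (rsid, chromosome, position, genotype)


def to_plink_text(rows: Iterable[Row],
                  sample_id: str = "SAMPLE1") -> Tuple[List[str], str]:
    """Return (.map lines, .ped line) for one individual."""
    map_lines: List[str] = []
    alleles: List[str] = []
    for rsid, chrom, pos, genotype in rows:
        if len(genotype) != 2:
            continue  # assumption: skip anything that is not a diploid call
        # .map columns: chromosome, variant ID, genetic distance (0 = unknown), bp
        map_lines.append(f"{chrom}\t{rsid}\t0\t{pos}")
        if "-" in genotype:
            alleles += ["0", "0"]  # PLINK's missing-genotype code
        else:
            alleles += [genotype[0], genotype[1]]
    # .ped columns: FID, IID, father, mother, sex (0 = unknown), phenotype (-9 = missing)
    ped_line = " ".join([sample_id, sample_id, "0", "0", "0", "-9"] + alleles)
    return map_lines, ped_line


rows = [("rs1", "1", 100, "AG"), ("rs2", "1", 200, "--"), ("rs3", "2", 50, "CC")]
map_lines, ped_line = to_plink_text(rows)
```

Writing `map_lines` to `sample.map` and `ped_line` to `sample.ped` gives a file pair that PLINK can load with `--file sample`.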

Results

Before & After Imputation

See the dramatic increase in genomic coverage

Before Imputation

~700K

SNPs from your chip

~2.3% of common variants

After Imputation

~30M

Imputed variants

~43× more variants
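
The two headline figures follow from simple arithmetic on the approximate counts quoted above. The snippet assumes the "~2.3%" figure uses the ~30M imputed total as its denominator, which is an inference from the numbers rather than something stated on the page.

```python
chip_snps = 700_000         # approximate SNPs on the chip
imputed_total = 30_000_000  # approximate variants after imputation

fold_increase = imputed_total / chip_snps  # about 42.9, quoted as ~43x
chip_share = chip_snps / imputed_total     # about 0.023, quoted as ~2.3%

print(f"{fold_increase:.1f}x more variants, chip covers {chip_share:.1%}")
```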

Compatible Downstream Tools

PLINK 1.9 / 2.0, EIGENSOFT, ADMIXTOOLS 2, ADMIXTURE, smartpca, qpAdm, BCFtools, VCFtools

Deliverables

What You'll Receive

Two output files optimized for different use cases, with clear documentation

Full Dataset ~250 MB (.zip)

30M SNPs — Research-Grade

Complete imputed output (all variants retained)
23andMe-style format, convertible to PLINK
Suitable for PCA, admixture, f-statistics
Output in both HG19 (GRCh37) and HG38 (GRCh38), chromosomes 1–22 and X

File Format Preview


# rsid	chromosome	position	genotype
rs12345678	1	123456	AG
rs23456789	1	234567	CC
rs34567890	1	345678	TT
# ... ~30 million rows
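
A file in the layout previewed above can be read with a few lines of Python. This is a minimal sketch: the function name `read_raw_dna` is hypothetical, and it splits on any whitespace so that it tolerates both tabs and spaces.

```python
import io
from typing import Iterator, TextIO, Tuple

Row = Tuple[str, str, int, str]  # (rsid, chromosome, position, genotype)


def read_raw_dna(handle: TextIO) -> Iterator[Row]:
    """Yield one row per SNP, skipping comment and blank lines."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split()
        yield rsid, chrom, int(pos), genotype


preview = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs12345678\t1\t123456\tAG\n"
    "rs23456789\t1\t234567\tCC\n"
)
rows = list(read_raw_dna(preview))
```

Because the function is a generator, the full ~30 million row file can be streamed from `open(path)` without loading it all into memory.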

High-Confidence ~20 MB (.zip)

~2.5M SNPs — High-Confidence Subset

Typed SNPs kept unconditionally
Imputed SNPs: max(GP) ≥ 0.995
Optimized for third-party platforms
Same format, smaller file size

Best for: GEDmatch, MyTrueAncestry, and other sites where you need a balance of density and file size.

Compatibility

Compatible DNA Providers

Imputation quality depends on your chip's SNP density. Higher-density chips yield better results.

Optimal

700K+ SNPs

AncestryDNA v2, 23andMe v5, MyHeritage v2, LivingDNA, Nebula 30x*, Dante Labs*

Good

500K–700K SNPs

23andMe v4, AncestryDNA v1, FamilyTreeDNA, MyHeritage v1, Gene by Gene, 24Genetics

Standard

<500K SNPs

23andMe v3, National Geographic, WeGene, Genera, TellMeGen, Adntro

* WGS providers require conversion to RAW format first. Use our WGS to RAW service.

Limitations

Important Technical Limitations

Imputation is powerful but not perfect. Here's what you should know.

Expected Error Rates

Imputation accuracy varies by allele frequency. Common variants (MAF > 5%) are imputed with ~98–99% accuracy; rare variants (MAF < 1%) have higher error rates, with typical accuracy of ~85–95% or lower.

Autosomes and X Only

We impute chromosomes 1–22 and X. Y-DNA and mtDNA are not imputed as they require specialised reference panels and analysis pipelines.

HG19 and HG38 Both Supported

Your input file build (HG19 or HG38) is auto-detected. Output files are delivered in both GRCh37 (hg19) and GRCh38 (hg38), so no manual liftover is required.

Not for Health/Medical Use

Imputed genotypes are statistical estimates, not direct measurements. They are not suitable for clinical decision-making or health-related interpretations.

Expected Accuracy by Minor Allele Frequency

MAF Range	Variant Class	Typical Accuracy	Notes
> 5%	Common	~98–99%	Excellent imputation quality
1–5%	Low frequency	~95–98%	Good quality, some uncertainty
0.1–1%	Rare	~85–95%	Increased error rate expected
< 0.1%	Very rare	Variable	Often filtered out (low GP)
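
The frequency bands in the table can be expressed as a small lookup, for example to label variants before deciding how much to trust them. How the exact boundary values (5%, 1%, 0.1%) are assigned is an assumption here, since the table does not say which band they fall into; the function name is hypothetical.

```python
def variant_class(maf: float) -> str:
    """Map a minor allele frequency (0 to 0.5) to the table's variant class."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must be between 0 and 0.5")
    if maf > 0.05:
        return "Common"
    if maf >= 0.01:
        return "Low frequency"
    if maf >= 0.001:
        return "Rare"
    return "Very rare"


labels = [variant_class(m) for m in (0.20, 0.03, 0.005, 0.0005)]
```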

Pricing

Simple Pricing

One order includes both output files. No subscription.

DNA Imputation Service

€22.50

One-time payment — No subscription

Full ~30M SNP file (all variants retained)
High-confidence ~2.5M file (GP-filtered)
Output in both HG19 (GRCh37) and HG38 (GRCh38)
1000 Genomes Phase 3 reference panel
~24 hour average turnaround
Secure processing & private download
Email support for questions
Files deleted after 15 days


Legal

Legal & Disclaimer

Please review before ordering

Intended Use

This service is intended solely for ancestry research, genealogy, and academic purposes. It is not designed, intended, or suitable for health-related uses or medical decision-making.

Key Terms

Statistical Nature: Imputed genotypes are probabilistic estimates, not direct measurements. Expected error rates depend on variant frequency, from roughly 1–2% for common variants to 5–15% or more for rare variants.
No Health Claims: We explicitly disclaim responsibility for any health interpretations made from imputed data.
Third-Party Compatibility: Compatibility with external platforms is provided as-is and not guaranteed.
No Warranty: This service is provided "as is" without warranties of any kind.
Data Handling: Your files are processed securely and deleted 15 days after delivery.