G25 Studio: Algorithms, Distance Methods, and Oracle Settings
This document explains the configuration options available in G25 Studio in plain language. It covers the Monte Carlo algorithms, the Oracle (Ancestors Predictions) system, distance methods, and distance weights. Read this after you have completed at least one basic run. For first-time setup, start with G25 Studio: Beginner Manual.
Table of contents
- Monte Carlo algorithms
- Monte Carlo vs Oracle: the core difference
- Oracle (Ancestors Predictions)
- Distance methods
- Distance weight
- Choosing settings: quick reference
- Glossary additions
Monte Carlo algorithms
The algorithm decides how G25 Studio searches for the best mixture of reference populations that matches your coordinates. All algorithms solve the same problem; they differ in how they explore the solution space and how precise the result is.
Montecarlo V1 (Free)
The default, beginner-friendly algorithm. It uses iterative random sampling: at each step it tries a random mixture, keeps the combinations that reduce the fit error, and repeats until the result stabilizes.
When to use: First runs, quick checks, or when you do not need the highest possible precision. Fast and reliable for most analyses.
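The sampling loop described for V1 can be sketched as a simple random search. This is a minimal illustration under stated assumptions, not G25 Studio's actual code. The function names (`fit_mixture_random`, `mix`), the step size, the iteration count, and the 3-dimensional toy coordinates are all hypothetical. Real G25 coordinates have 25 dimensions.

```python
import math
import random

def euclidean(a, b):
    # Straight-line distance between two coordinate vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mix(weights, sources):
    # Weighted average of the source coordinates, dimension by dimension.
    return [sum(w * s[d] for w, s in zip(weights, sources)) for d in range(len(sources[0]))]

def fit_mixture_random(target, sources, iterations=20000, step=0.05, seed=1):
    """Hypothetical random-search fit: perturb the mixture, keep it only if the error drops."""
    rng = random.Random(seed)
    weights = [1.0 / len(sources)] * len(sources)  # start from an equal split
    best = euclidean(target, mix(weights, sources))
    for _ in range(iterations):
        # Propose a random change, clip negatives, and rescale so the shares sum to 100%.
        trial = [max(0.0, w + rng.uniform(-step, step)) for w in weights]
        total = sum(trial)
        if total == 0:
            continue
        trial = [w / total for w in trial]
        error = euclidean(target, mix(trial, sources))
        if error < best:  # keep only combinations that reduce the fit error
            weights, best = trial, error
    return weights, best

# Made-up 3-dimensional "populations"; real G25 coordinates have 25 dimensions.
sources = [[0.10, 0.02, 0.00], [0.00, 0.08, 0.03], [0.05, -0.04, 0.06]]
target = mix([0.6, 0.3, 0.1], sources)  # a target that is an exact 60/30/10 blend
weights, error = fit_mixture_random(target, sources)
print([round(w, 2) for w in weights], round(error, 5))
```

Because only improvements are kept, the error can never get worse than the starting equal split. Changing `seed` changes the path the search takes, which is one reason stochastic runs can differ slightly.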
Montecarlo V2 (PRO)
An improved version of V1 with a refined search strategy. It handles complex or mixed ancestry backgrounds more accurately and is less likely to get stuck in a local minimum.
When to use: When V1 results feel inconsistent across runs, or when your ancestry background is unusually diverse.
Montecarlo V3 (PRO)
The most advanced Monte Carlo variant. It uses a more thorough optimization pass and unlocks the full set of distance methods (see Distance methods below).
When to use: Research-quality work, publication-level precision, or when you want to try specialized distance metrics.
SLSQP (PRO)
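Sequential Least Squares Programming is a deterministic, gradient-based optimization method rather than random sampling. It searches directly for the mixture that minimizes the fit error while keeping the proportions valid (non-negative and summing to 100%), and converges to the best-fitting solution within numerical precision.
When to use: When you need deterministic results (the same result every run) that are optimal for the chosen settings. Less exploratory than Monte Carlo, but more consistent.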
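The sketch below shows why a gradient-based optimizer returns the same answer on every run: it has a fixed starting point and no randomness. It is not SLSQP itself. It uses a simpler substitute technique, projected gradient descent on the squared fit error, with each step projected back onto valid proportions. The function names, learning rate, step count, and toy coordinates are assumptions for illustration only.

```python
def project_to_simplex(v):
    """Closest point to v with non-negative entries summing to 1 (sort-based projection)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, value in enumerate(u, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / j
        if value - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]

def fit_mixture_deterministic(target, sources, steps=2000, learning_rate=20.0):
    """Projected gradient descent on the squared fit error (a stand-in, not SLSQP)."""
    n, dims = len(sources), len(target)
    w = [1.0 / n] * n  # fixed starting point, so every run is identical
    for _ in range(steps):
        mixed = [sum(w[i] * sources[i][d] for i in range(n)) for d in range(dims)]
        residual = [mixed[d] - target[d] for d in range(dims)]
        # Gradient of sum(residual^2) with respect to each proportion.
        grad = [2.0 * sum(residual[d] * sources[i][d] for d in range(dims)) for i in range(n)]
        w = project_to_simplex([w[i] - learning_rate * grad[i] for i in range(n)])
    return w

sources = [[0.10, 0.02, 0.00], [0.00, 0.08, 0.03], [0.05, -0.04, 0.06]]
target = [0.065, 0.032, 0.015]  # exactly 60% / 30% / 10% of the sources above
first = fit_mixture_deterministic(target, sources)
second = fit_mixture_deterministic(target, sources)
print([round(x, 4) for x in first], first == second)
```

Running it twice gives bit-identical proportions. A Monte Carlo search with a different random seed would not guarantee that.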
BAT - Bat Algorithm (PRO, Metaheuristic)
A metaheuristic optimizer inspired by bat echolocation. Because it uses stochastic exploration, it may return slightly different results on each run.
When to use: To explore alternative ancestry solutions that standard Monte Carlo might miss. Useful for cross-checking.
PSO - Particle Swarm Optimization (PRO, Metaheuristic)
Another metaheuristic that simulates a swarm of particles searching the solution space. Like BAT, results may vary between runs.
When to use: Same as BAT. Run it alongside other methods to compare whether a different optimizer finds a meaningfully different solution.
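To show what "a swarm of particles searching the solution space" means, here is a textbook particle swarm optimizer applied to the mixture problem. It is an illustrative sketch, not G25 Studio's PSO. The way positions are turned into proportions (absolute values, then rescaling to 100%), the coefficient values, and all names are assumptions.

```python
import math
import random

def to_proportions(position):
    # Map an unconstrained particle position to non-negative shares summing to 1.
    positive = [abs(x) for x in position]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(position)] * len(position)
    return [x / total for x in positive]

def fit_error(position, target, sources):
    w = to_proportions(position)
    mixed = [sum(wi * s[d] for wi, s in zip(w, sources)) for d in range(len(target))]
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(mixed, target)))

def pso_fit(target, sources, particles=30, iterations=200, seed=7,
            inertia=0.7, c_personal=1.5, c_global=1.5):
    """Textbook particle swarm search over mixture proportions (illustrative only)."""
    rng = random.Random(seed)
    n = len(sources)
    pos = [[rng.random() for _ in range(n)] for _ in range(particles)]
    vel = [[0.0] * n for _ in range(particles)]
    best_pos = [p[:] for p in pos]                      # each particle's best position
    best_err = [fit_error(p, target, sources) for p in pos]
    g = min(range(particles), key=best_err.__getitem__)
    swarm_pos, swarm_err = best_pos[g][:], best_err[g]  # best position seen by the swarm
    for _ in range(iterations):
        for i in range(particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward own best, and pull toward swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (swarm_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            err = fit_error(pos[i], target, sources)
            if err < best_err[i]:
                best_pos[i], best_err[i] = pos[i][:], err
                if err < swarm_err:
                    swarm_pos, swarm_err = pos[i][:], err
    return to_proportions(swarm_pos), swarm_err

sources = [[0.10, 0.02, 0.00], [0.00, 0.08, 0.03], [0.05, -0.04, 0.06]]
target = [0.065, 0.032, 0.015]  # exactly 60% / 30% / 10% of the sources above
for seed in (1, 2):
    shares, err = pso_fit(target, sources, seed=seed)
    print(seed, [round(s, 3) for s in shares], round(err, 6))
```

Different seeds can end at slightly different proportions. This is the run-to-run variation that makes PSO and BAT useful as cross-checks.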
Monte Carlo vs Oracle: the core difference
This is the most common point of confusion for new users.
| | Monte Carlo | Oracle (Ancestors Predictions) |
|---|---|---|
| Proportions | Flexible, any combination | Fixed equal shares (50/50, 33/33/33, or 25/25/25/25) |
| Question answered | "What mixture best explains my coordinates?" | "If my ancestry were split evenly across N populations, which would those be?" |
| Output | One admixture result with optimized percentages | A ranked list of combinations sorted by genetic distance |
| Generation model | None implied | Loosely models parents (Two Ways) and grandparents (Four Ways) |
| Good for | Main ancestry read | "What if" scenarios and multi-generation exploration |
Use them together: Monte Carlo gives your primary result. Oracle lets you cross-check it by asking structured questions about how that result might look in a simplified family-tree model.
Oracle (Ancestors Predictions)
Oracle is the section labeled Ancestors Predictions in the G25 Studio configuration panel. When enabled, it runs alongside the main Monte Carlo result and produces up to five categories of output, each testing a different fixed-proportion scenario.
Oracle ranks every combination by genetic distance from your coordinates to the average of the chosen populations. The lower the distance, the better the match for that scenario.
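The ranking rule above can be sketched in a few lines. The assumptions are that each equal-share blend is the plain average of the chosen populations' coordinates and that distance is Euclidean. Names and coordinates are made up, and this is not the app's code.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average(vectors):
    # Equal-share blend: each chosen population contributes 1/N.
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def oracle_equal_shares(target, populations, n_ways, top=10):
    """Rank every n_ways combination of populations by distance to the target."""
    results = []
    for names in combinations(sorted(populations), n_ways):
        blend = average([populations[name] for name in names])
        results.append((euclidean(target, blend), names))
    results.sort()  # smallest distance first
    return results[:top]

populations = {
    "Pop_A": [0.10, 0.02, 0.00],
    "Pop_B": [0.00, 0.08, 0.03],
    "Pop_C": [0.05, -0.04, 0.06],
    "Pop_D": [0.02, 0.01, 0.01],
}
target = [0.05, 0.05, 0.015]  # exactly the 50/50 blend of Pop_A and Pop_B
for n in (1, 2, 3, 4):
    dist, names = oracle_equal_shares(target, populations, n, top=1)[0]
    print(n, names, round(dist, 4))
```

Setting `n_ways` to 1, 2, 3, or 4 corresponds to One Population, Two Ways, Three Ways, and Four Ways.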
One Population
Oracle tests each reference population on its own and ranks them by how close their average coordinates are to yours.
What it tells you: Which single population in the calculator is the closest match to your full coordinate profile.
Practical use: A good sanity check. If the top single-population result is very different from your Monte Carlo result, it usually means your ancestry is genuinely mixed and no single population captures it well.
Generation analogy: Not tied to a specific generation. Think of it as a "which population do I most resemble overall?" question.
Two Ways (50/50)
Oracle tests every pair of populations at an equal 50% each and ranks the pairs by distance.
What it tells you: Which two-population 50/50 split best approximates your coordinates.
Generation analogy: Loosely models a scenario where each of your two parents came from a different population and you inherited roughly equal amounts from both.
Practical use: Useful when you suspect two dominant ancestral streams. The top result is not a genealogical claim; it is the two-population blend that minimizes the reconstruction error for this fixed-proportion model.
Three Ways (33/33/33)
Oracle tests every combination of three populations at an equal 33.3% each and ranks them by distance.
What it tells you: Which three-way equal split best represents your coordinates.
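Generation analogy: Loosely models three distinct ancestral streams contributing roughly equally. It does not map exactly onto a single generation (two parents give 50% shares, four grandparents give 25% shares), so treat it as an intermediate step between the parent and grandparent models.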
Practical use: More granular than Two Ways. Compare the distance scores: if Three Ways gives a substantially lower distance than Two Ways, your mixture genuinely benefits from the third population.
Four Ways (25/25/25/25)
Oracle tests every combination of four populations at an equal 25% each and ranks them by distance.
What it tells you: Which four-population equal split best represents your coordinates.
Generation analogy: Loosely models four distinct grandparental lines, one from each population.
Practical use: The most complex fixed-proportion model. Use it to explore whether a four-source explanation fits better than two or three. Compare fit scores across all levels: the level with the biggest drop in distance is often the most informative.
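The "biggest drop in distance" comparison is simple arithmetic. This sketch assumes you have noted the top-ranked distance at each level. The numbers below are made up for illustration and are not real results.

```python
def most_informative_level(best_distance_by_level):
    """Return the level whose best distance drops most versus the previous level."""
    levels = sorted(best_distance_by_level)
    drops = {
        level: best_distance_by_level[prev] - best_distance_by_level[level]
        for prev, level in zip(levels, levels[1:])
    }
    return max(drops, key=drops.get), drops

# Hypothetical top-1 distances read off the Oracle output for each level.
best = {1: 0.080, 2: 0.035, 3: 0.030, 4: 0.028}
level, drops = most_informative_level(best)
print(level)  # 2: moving from One Population to Two Ways gives the largest drop
```

In this made-up case, adding a third or fourth equal share barely improves the fit. That would suggest the two-source explanation already captures most of the signal.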
Mixed Mode
Mixed Mode is an alternative Oracle output style, enabled via the Alternative Oracle toggle in the configuration. It drops the equal-share requirement: proportions within a combination can differ, but they still sum to 100%.
The app label for this option is Alternative Oracle or Top 10 Mixed Proportions depending on context.
What it tells you: The best-fitting multi-population combinations when you allow the proportions to vary, rather than locking them at equal shares.
When to use it: When you suspect your ancestry is genuinely unequal (for example 75% one stream, 25% another) and standard equal-share Oracle scenarios return poor distances. Mixed Mode relaxes the constraint so it can find that unequal split.
Relationship to the main Monte Carlo result: Mixed Mode Oracle output is closer in spirit to Monte Carlo (flexible proportions) but still ranks combinations by distance rather than optimizing a mixture formula. It bridges the gap between the fixed Oracle scenarios and the fully flexible Monte Carlo admixture.
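To show how relaxing the equal-share constraint recovers an unequal split, this sketch tries two-population blends on a fixed grid of proportions and ranks them by distance. The 5% grid, the restriction to pairs, and all names and coordinates are assumptions for illustration. The documentation does not describe how G25 Studio enumerates Mixed Mode proportions.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mixed_pairs(target, populations, step=5, top=10):
    """Rank two-population blends at unequal proportions (hypothetical grid search)."""
    results = []
    for a, b in combinations(sorted(populations), 2):
        for pct in range(step, 100, step):  # share of population a: 5%, 10%, ..., 95%
            wa = pct / 100
            blend = [wa * x + (1 - wa) * y for x, y in zip(populations[a], populations[b])]
            results.append((euclidean(target, blend), a, pct, b, 100 - pct))
    results.sort()  # smallest distance first
    return results[:top]

populations = {
    "Pop_A": [0.10, 0.02, 0.00],
    "Pop_B": [0.00, 0.08, 0.03],
    "Pop_C": [0.05, -0.04, 0.06],
}
target = [0.075, 0.035, 0.0075]  # exactly 75% Pop_A + 25% Pop_B
print(mixed_pairs(target, populations, top=1))
```

A standard Two Ways run could only report Pop_A + Pop_B at 50/50 with a non-zero distance. The relaxed search finds the 75/25 split.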
Distance methods
A distance method (also called a distance metric) is the mathematical rule that measures how far your coordinates are from a reference population's average coordinates. A smaller distance means greater similarity.
The method you choose affects which populations are ranked as "close" to you, and therefore affects both the Monte Carlo admixture percentages and the Oracle ranking.
Available in all algorithms
Euclidean (Standard) - Recommended default
Measures the straight-line distance in 25-dimensional space. This is the most widely used metric in G25 workflows and the recommended starting point.
Formal idea: square root of the sum of squared differences across all 25 dimensions.
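Written out, with x as your 25 coordinates and y as the reference population's average coordinates:

```latex
d_{\mathrm{Euclidean}}(x, y) = \sqrt{\sum_{i=1}^{25} (x_i - y_i)^2}
```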
When to use: Almost always. Start here and only change if you have a specific reason.
Manhattan
Measures distance as the sum of absolute differences across all 25 dimensions, with no squaring and no square root. Because differences are not squared, a single large difference counts for less than it does in Euclidean, while many small differences add up at face value.
When to use: When you want every dimension's difference to count in proportion to its size, rather than letting the largest differences dominate. Can surface different nearest-population orderings than Euclidean in edge cases.
Chebyshev
Uses the maximum single-dimension difference as the distance. Ignores all other dimensions except the one where you differ most from the reference.
When to use: Specialized use only. Useful when a single genetic dimension is diagnostically important and you want to rank populations by how well they match you on that worst-case dimension.
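The three basic metrics can disagree about which population is closer. This sketch uses standard textbook definitions and two made-up reference points. One differs a little in every dimension and the other differs a lot in a single dimension.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

sample = [0.0, 0.0, 0.0, 0.0]
pop_spread = [0.03, 0.03, 0.03, 0.03]  # small differences in every dimension
pop_spike = [0.07, 0.0, 0.0, 0.0]      # one large difference, otherwise identical
for name, metric in [("Euclidean", euclidean), ("Manhattan", manhattan), ("Chebyshev", chebyshev)]:
    closer = "spread" if metric(sample, pop_spread) < metric(sample, pop_spike) else "spike"
    print(name, closer)
# Euclidean spread   (0.06 vs 0.07)
# Manhattan spike    (0.12 vs 0.07)
# Chebyshev spread   (0.03 vs 0.07)
```

Manhattan prefers the single-spike population because the four small differences add up to more than the one large difference.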
Available for Oracle only
Angular
Measures the angle between your coordinate vector and the reference population vector, rather than the absolute distance in space. Two samples can be at very different scales but point in the same direction; Angular distance would call them similar while Euclidean would not.
When to use: Oracle-only. Useful when you want to match based on genetic direction (the relative pattern of which dimensions are high vs low) rather than the absolute magnitude of coordinates.
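The scale-versus-direction point can be illustrated with the textbook angle between two vectors. This is a sketch under assumptions: it returns the angle in radians, and G25 Studio may scale or normalize its Angular distance differently.

```python
import math

def angular_distance(a, b):
    """Angle in radians between two coordinate vectors (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # clamp tiny rounding overshoot
    return math.acos(cos)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

sample = [0.02, -0.01, 0.03]
scaled = [0.04, -0.02, 0.06]  # same pattern, twice the magnitude
print(angular_distance(sample, scaled), euclidean(sample, scaled))
```

The angle is essentially zero because the two vectors point the same way. The Euclidean distance is clearly non-zero because the magnitudes differ.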
Advanced methods (PRO, Montecarlo V3 only)
These metrics are unlocked when using Montecarlo V3 and a PRO license. Each captures a different mathematical view of similarity.
| Method | Brief description |
|---|---|
| Cosine Similarity | Similar idea to Angular: measures the angle between coordinate vectors. Useful for directional similarity regardless of scale. |
| Mahalanobis | Accounts for correlations between dimensions and scales each dimension by its variance. Treats dimensions differently based on how much they vary across the reference set. |
| Bray Curtis | Originally from ecology; measures the proportion of non-shared values. Can emphasize compositional differences. |
| Canberra | Divides each dimension's absolute difference by the sum of the two absolute values, then sums. Gives extra weight to dimensions where both values are small and can be unstable when values are near zero. |
| Minkowski 3 / Minkowski 4 | Minkowski distance with p=3 and p=4. Manhattan (p=1) and Euclidean (p=2) are lower-order members of the same family; higher p weights the largest differences more heavily. |
| Normalized Euclidean | Euclidean distance after normalizing each dimension by its standard deviation. Reduces the influence of high-variance dimensions. |
| Nested Euclidean | A hierarchical variant that groups dimensions before computing distance. |
| Weighted Euclidean | Applies per-dimension weights before computing Euclidean distance. Lets some dimensions count more than others. |
| Wave Hedges | A non-standard metric that normalizes by the maximum of the two values in each dimension. |
| Lorentzian | Uses a logarithmic difference formula. More robust to large individual-dimension differences than Euclidean. |
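| Squared | Euclidean distance squared (without the square root step). Because squaring preserves order, it ranks single populations exactly as Euclidean does; it amplifies large differences when distances are summed or weighted. |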
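Several of the table's metrics can be written in a few lines each. These are the standard textbook forms and are assumptions about what the app computes; G25 Studio's exact formulas are not documented here. Mahalanobis is omitted because it also needs a covariance matrix estimated from the reference set. Using absolute values inside Wave Hedges is a further assumption, made because G25 coordinates can be negative.

```python
import math

def minkowski(a, b, p):
    # p=1 is Manhattan, p=2 is Euclidean; larger p stresses the largest differences.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    # A similarity, not a distance: 1.0 means the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def canberra(a, b):
    # Each term is divided by |x| + |y|; dimensions where both are 0 are skipped.
    return sum(abs(x - y) / (abs(x) + abs(y)) for x, y in zip(a, b) if abs(x) + abs(y) > 0)

def bray_curtis(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(abs(x + y) for x, y in zip(a, b))

def lorentzian(a, b):
    # ln(1 + |x - y|) grows slowly, damping the effect of one large difference.
    return sum(math.log1p(abs(x - y)) for x, y in zip(a, b))

def wave_hedges(a, b):
    # Divides each difference by the larger absolute value (assumed variant for signed data).
    return sum(abs(x - y) / max(abs(x), abs(y)) for x, y in zip(a, b) if max(abs(x), abs(y)) > 0)

sample = [0.02, -0.01, 0.03]
reference = [0.01, 0.02, 0.05]
for name, value in [
    ("Minkowski 3", minkowski(sample, reference, 3)),
    ("Squared", squared_euclidean(sample, reference)),
    ("Cosine Similarity", cosine_similarity(sample, reference)),
    ("Canberra", canberra(sample, reference)),
    ("Bray Curtis", bray_curtis(sample, reference)),
    ("Lorentzian", lorentzian(sample, reference)),
    ("Wave Hedges", wave_hedges(sample, reference)),
]:
    print(f"{name}: {value:.5f}")
```

The raw numbers are on very different scales. Compare rankings within a metric, not values across metrics.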
Practical advice for advanced methods: Run Euclidean first as your baseline. Then try one or two alternatives and compare whether the ranked populations change substantially. Consistent results across multiple metrics indicate a robust finding; large changes indicate sensitivity to the metric choice.
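The robustness check described above can be automated in a simple way: find the closest population under each metric and see whether they agree. The function name, the three metrics chosen, and the toy coordinates are illustrative assumptions.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def top_match_by_metric(sample, populations, metrics):
    """Return the closest population name under each metric."""
    return {
        metric_name: min(populations, key=lambda pop: metric(sample, populations[pop]))
        for metric_name, metric in metrics.items()
    }

metrics = {"Euclidean": euclidean, "Manhattan": manhattan, "Chebyshev": chebyshev}
populations = {"Pop_A": [0.01, 0.01, 0.0, 0.0], "Pop_B": [0.05, 0.0, 0.02, 0.01]}
sample = [0.0, 0.0, 0.0, 0.0]
tops = top_match_by_metric(sample, populations, metrics)
robust = len(set(tops.values())) == 1
print(tops, "robust" if robust else "metric-sensitive")
```

Here every metric picks Pop_A, so the finding does not depend on the metric. A split verdict would be a signal to treat the top match with caution.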
Distance weight
The distance weight (labeled Weight in the UI) is a multiplier applied to the genetic distances used during the optimization. It controls how strictly the algorithm enforces closeness to reference populations.
The weight is selected from a fixed set of values:
| Weight value | Effect |
|---|---|
| No extra distance (0x) | No additional weighting. The algorithm uses raw distances as calculated. Default for Oracle. |
| 0.25x | Gentle weighting. Broadens the search: populations that are somewhat further away can still contribute. Useful when your ancestry is spread across many references. |
| 0.50x | Moderate broadening. |
| 0.75x | Moderate broadening, closer to standard. |
| 1.00x | Standard weighting. |
| 1.25x | Slight narrowing. |
| 1.50x | Moderate narrowing. |
| 1.75x | Strong narrowing. |
| 2.00x | Strongest narrowing. Only populations very close to your coordinates contribute significantly. |
In plain language:
- Lower weight (0.25x): casts a wider net. More distant populations are allowed to contribute, which can reveal subtle ancestry threads but may produce more diffuse results.
- Higher weight (2.00x): focuses tightly on the very closest populations. Results are dominated by your nearest genetic neighbors; minor distant contributions are suppressed.
- No extra distance: effectively the same as "no additional sharpening or broadening" and is the typical default for Oracle.
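This section does not give the formula behind the weight, so the following is an assumption. One plausible implementation adds a penalty to the fit error: each source's own distance from your coordinates, multiplied by its share and by the weight. The sketch shows how such a penalty would change which mixture wins. It is hypothetical and is not G25 Studio's documented behavior.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def penalized_error(weights, target, sources, distance_weight):
    """Hypothetical objective: fit error plus distance_weight times a distance penalty."""
    mixed = [sum(w * s[d] for w, s in zip(weights, sources)) for d in range(len(target))]
    fit = euclidean(target, mixed)
    # Share-weighted distance of each source from the target.
    penalty = sum(w * euclidean(target, s) for w, s in zip(weights, sources))
    return fit + distance_weight * penalty

target = [0.0, 0.0]
sources = [[0.01, 0.0], [-0.05, 0.0], [0.05, 0.0]]  # one near source, two distant ones
candidates = {
    "two distant sources, exact fit": [0.0, 0.5, 0.5],
    "nearest source only": [1.0, 0.0, 0.0],
}
for distance_weight in (0.0, 2.0):
    best = min(candidates, key=lambda c: penalized_error(candidates[c], target, sources, distance_weight))
    print(f"{distance_weight:.2f}x -> {best}")
# 0.00x -> two distant sources, exact fit
# 2.00x -> nearest source only
```

Under this assumed model, a low weight lets distant populations combine into a good fit. A high weight makes them too costly, so the nearest population dominates.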
Default for Monte Carlo: 0.25x (applies a mild broadening that generally gives well-balanced results across diverse ancestry backgrounds).
Default for Oracle: No extra distance (Oracle already measures distance ranking directly; additional weighting is optional).
Practical advice: If your results look too fragmented (many tiny percentages from unrelated populations), increase the weight toward 1.00x or higher to focus the result. If a known major ancestry component seems suppressed, decrease toward 0.25x.
Choosing settings: quick reference
I want to start simply
- Algorithm: Montecarlo V1
- Distance method: Euclidean
- Distance weight: 0.25x (default)
- Oracle: off
I want to explore multi-generation scenarios
- Algorithm: Montecarlo V2 or V3
- Distance method: Euclidean
- Oracle: on, start with 10 predictions
- Review One Population, Two Ways, Three Ways, Four Ways in order
- Enable Mixed Mode if equal-share scenarios feel too rigid for your background
I want the most precise admixture result
- Algorithm: Montecarlo V3 or SLSQP
- Distance method: Euclidean first, then try Cosine Similarity or Mahalanobis for comparison
- Distance weight: experiment between 0.25x and 1.00x
- Oracle: on with 50-100 predictions for comprehensive scenario coverage
My results look too scattered (many small percentages)
Increase distance weight to 1.00x or 1.25x. Consider reducing maximum populations to 5 and enabling Group Populations.
My main ancestry component seems missing or small
Decrease distance weight toward 0.25x. Also check that you are using a calculator that includes good references for that ancestry region.
Glossary additions
The following terms complement the glossary in the Beginner Manual.
- Alternative Oracle (Mixed Mode): An Oracle output mode that removes the equal-proportion constraint, allowing the algorithm to find the best-fitting multi-population combination at any proportions.
- Angular distance: Measures the angle between two coordinate vectors. Focuses on genetic direction rather than absolute proximity. Available in Oracle only.
- Bat Algorithm (BAT): A metaheuristic optimization algorithm inspired by bat echolocation. Stochastic; results may vary between runs. Used to explore alternative ancestry solutions.
- Canberra distance: A distance metric that divides each dimension's absolute difference by the sum of the two absolute values before summing. Sensitive to values near zero.
- Chebyshev distance: A distance metric defined as the maximum difference across all dimensions (ignoring all but the worst-case dimension).
- Cosine Similarity: Measures the angle between coordinate vectors. Similar in concept to Angular distance but available as a Montecarlo V3 metric.
- Distance method: The mathematical formula used to calculate how far your coordinates are from a reference population. Determines what "close" means numerically.
- Distance weight: A multiplier that sharpens or broadens how strongly closeness is enforced during optimization. Lower values cast a wider net; higher values focus on nearest matches.
- Four Ways (25/25/25/25): Oracle scenario that tests all four-population equal-share combinations. Loosely analogous to a four-grandparent model.
- Lorentzian distance: A logarithmic distance metric. More robust than Euclidean to large individual-dimension outliers.
- Mahalanobis distance: A distance metric that accounts for correlations between dimensions and normalizes by variance. Available in Montecarlo V3 PRO.
- Manhattan distance: Sum of absolute differences across all dimensions. Because differences are not squared, it is less dominated by a single large difference than Euclidean.
- Minkowski distance: A family of distance metrics parameterized by p. Euclidean is Minkowski with p=2; Manhattan is p=1. Minkowski 3 and 4 increase p further.
- Mixed Mode: See Alternative Oracle.
- One Population: Oracle scenario that tests each reference population individually and ranks by distance.
- Particle Swarm Optimization (PSO): A metaheuristic algorithm that simulates a swarm searching the solution space. Stochastic; used to discover alternative ancestry models.
- SLSQP: Sequential Least Squares Programming. A deterministic, gradient-based optimizer that converges to the best-fitting mixture rather than sampling randomly. Produces the same result on every run with the same inputs.
- Three Ways (33/33/33): Oracle scenario that tests all three-population equal-share combinations. Loosely analogous to three equal ancestral streams; it does not map exactly onto a single generation.
- Two Ways (50/50): Oracle scenario that tests all two-population 50/50 combinations. Loosely analogous to a two-parent model.
- Weighted Euclidean: A Euclidean variant where each dimension is multiplied by a weight before computing distance. Allows some dimensions to count more.
Related documentation
- G25 Studio: Beginner Manual - Start here if you are new
- G25 Population Library - Understanding the reference populations used in calculators
Disclaimer
G25 Studio is intended for education and research-style exploration. It is not a medical product. Statistical similarity to a reference population is not proof of descent from that population or of any particular ethnic identity. Interpret results with appropriate caution.