Imputed MK Test
MKado implements the imputed MK test from Murga-Moreno et al. (2022), which corrects for slightly deleterious mutations by imputing their count from the synonymous frequency spectrum rather than discarding low-frequency polymorphisms.
Background
The standard MK test is biased by slightly deleterious mutations that inflate nonsynonymous polymorphism (Pn) without contributing to divergence (Dn). Several approaches exist to address this:
Asymptotic MK test: Fits a curve across the full frequency spectrum and extrapolates to x=1
Standard MK with frequency filtering: Discards all low-frequency polymorphisms below a cutoff (e.g., --min-freq, --no-singletons)
The imputed MK test takes a different approach: instead of discarding low-frequency data, it estimates how many low-frequency nonsynonymous polymorphisms are slightly deleterious, using the synonymous frequency spectrum as a neutral reference.
The Key Insight
Under neutrality, nonsynonymous and synonymous polymorphisms should have the same ratio of low-frequency to high-frequency variants. An excess of low-frequency nonsynonymous polymorphisms (relative to synonymous) indicates segregating slightly deleterious mutations.
By imputing this excess, the test:
Retains more data than methods that discard all low-frequency polymorphisms
Increases statistical power at the gene level
Decomposes the distribution of fitness effects (DFE) into interpretable fractions
The Imputation Formula
Polymorphisms are split at a derived allele frequency (DAF) cutoff (default 15%):
Count low-frequency (DAF <= cutoff) and high-frequency (DAF > cutoff) variants separately for nonsynonymous (Pn) and synonymous (Ps) classes
Compute the neutral ratio from synonymous polymorphisms:
\[r = \frac{P_{s,low}}{P_{s,high}}\]
Impute the number of weakly deleterious nonsynonymous polymorphisms:
\[P_{wd} = P_{n,low} - P_{n,high} \times r\]
This is clamped to >= 0.
Compute neutral nonsynonymous polymorphisms:
\[P_{n,neutral} = P_n - P_{wd}\]
Calculate corrected alpha:
\[\alpha = 1 - \frac{P_{n,neutral}}{P_s} \times \frac{D_s}{D_n}\]
Significance is assessed with Fisher’s exact test on the corrected 2x2 table.
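The steps above can be sketched in a few lines of Python. This is a minimal illustration, not MKado's implementation: the function name imputed_alpha, its arguments, and the choice to raise an error when a denominator is zero are assumptions made for the example, and the counts are hypothetical.

```python
def imputed_alpha(pn_low, pn_high, ps_low, ps_high, dn, ds):
    """Corrected alpha from counts split at the DAF cutoff (illustrative sketch)."""
    if ps_high == 0 or dn == 0:
        # Assumption: undefined ratios are reported as errors in this sketch.
        raise ValueError("ratio undefined: need Ps_high > 0 and Dn > 0")
    pn = pn_low + pn_high
    ps = ps_low + ps_high
    r = ps_low / ps_high                   # neutral low/high ratio from synonymous sites
    p_wd = max(0.0, pn_low - pn_high * r)  # imputed weakly deleterious count, clamped at 0
    pn_neutral = pn - p_wd                 # nonsynonymous polymorphisms treated as neutral
    alpha = 1.0 - (pn_neutral / ps) * (ds / dn)
    return p_wd, pn_neutral, alpha


# Hypothetical counts: 12 low- and 6 high-frequency nonsynonymous,
# 10 low- and 10 high-frequency synonymous, Dn = 9, Ds = 12.
p_wd, pn_neutral, alpha = imputed_alpha(12, 6, 10, 10, dn=9, ds=12)
print(f"Pwd={p_wd:.2f}  Pn_neutral={pn_neutral:.2f}  alpha={alpha:.4f}")
# prints: Pwd=6.00  Pn_neutral=12.00  alpha=0.2000
```

With these counts the synonymous ratio is r = 1, so 6 of the 12 low-frequency nonsynonymous polymorphisms are imputed as weakly deleterious and the remaining 12 enter the alpha calculation.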
DFE Fractions
When the number of synonymous (m0) and nonsynonymous (mi) sites are provided, the test decomposes the DFE into four fractions:
alpha (a): Fraction of adaptive substitutions
f: Fraction of effectively neutral nonsynonymous mutations
b: Fraction of weakly deleterious mutations
d: Fraction of strongly deleterious mutations (d = 1 - f - b)
The fractions f, b, and d sum to 1 and describe the full distribution of fitness effects for new nonsynonymous mutations; alpha is reported alongside them as the fraction of substitutions that are adaptive.
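The page does not spell out how f and b are computed. The sketch below assumes the formulation we understand the impMKT method to use, f = (Pn_neutral / Ps) * (m0 / mi) and b = (Pwd / Ps) * (m0 / mi), with d as the remainder. Treat both formulas, the function name dfe_fractions, and the input values as assumptions to check against MKado's own output.

```python
def dfe_fractions(pn_neutral, p_wd, ps, m0, mi):
    """f, b, d for new nonsynonymous mutations (assumed impMKT-style formulas)."""
    if ps == 0 or mi == 0:
        raise ValueError("need Ps > 0 and mi > 0")
    scale = m0 / mi                # synonymous sites per nonsynonymous site
    f = (pn_neutral / ps) * scale  # effectively neutral fraction
    b = (p_wd / ps) * scale        # weakly deleterious fraction
    d = 1.0 - f - b                # strongly deleterious fraction (remainder)
    return f, b, d


# Hypothetical inputs: Pn_neutral = 12, Pwd = 6, Ps = 20, m0 = 200, mi = 600.
f, b, d = dfe_fractions(12, 6, 20, m0=200, mi=600)
print(f"f={f:.3f}  b={b:.3f}  d={d:.3f}")
# prints: f=0.200  b=0.100  d=0.700
```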
Usage
Single Gene Analysis
# Basic imputed MK test (default 15% DAF cutoff)
mkado test alignment.fa -i ingroup -o outgroup --imputed
# With custom DAF cutoff (10%)
mkado test alignment.fa -i ingroup -o outgroup --imputed --min-freq 0.10
# Separate ingroup/outgroup files
mkado test ingroup.fa outgroup.fa --imputed
Note
The --min-freq option is reused as the DAF cutoff when --imputed is set. If --min-freq is not specified, the default cutoff of 0.15 (15%) is used.
Batch Analysis (Aggregated)
For multi-gene analyses, pooling data across genes increases power:
# Pool polymorphisms and divergence across all genes
mkado batch alignments/ -i ingroup -o outgroup --imputed
Batch Analysis (Per-Gene)
# Run imputed test separately for each gene
mkado batch alignments/ -i ingroup -o outgroup --imputed --per-gene
Bootstrap Confidence Intervals
When --bootstrap N is set (with N > 0), the imputed test runs an
additional case-resampling bootstrap to produce a 95% CI on alpha. Each
replicate resamples the polymorphism list with replacement and re-runs
the imputed MK algorithm; the 2.5/97.5 percentiles of the resulting
alpha distribution are reported as alpha_CI_low / alpha_CI_high.
The omega decomposition CIs are derived from the alpha CI by analytical
scaling (omega_a_ci = alpha_ci * omega), mirroring the asymptotic
test.
# Imputed test with 500-replicate bootstrap
mkado test alignment.fa -i ingroup -o outgroup --imputed --bootstrap 500
# Aggregated imputed batch with bootstrap
mkado batch alignments/ -i ingroup -o outgroup --imputed --bootstrap 200
The legacy default (--bootstrap 100) computes the bootstrap; pass
--bootstrap 0 to disable CI computation entirely.
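The resampling scheme described in this section can be sketched as follows. This is an illustration under stated assumptions rather than MKado's code: polymorphisms are represented as hypothetical (site_class, daf) pairs, divergence counts are held fixed, replicates with an undefined ratio are skipped, and the percentile helper uses linear interpolation.

```python
import random


def alpha_from_polymorphisms(polys, dn, ds, cutoff=0.15):
    """Imputed alpha from a list of (site_class, daf) pairs; None if undefined."""
    pn_low = sum(1 for c, daf in polys if c == "N" and daf <= cutoff)
    pn_high = sum(1 for c, daf in polys if c == "N" and daf > cutoff)
    ps_low = sum(1 for c, daf in polys if c == "S" and daf <= cutoff)
    ps_high = sum(1 for c, daf in polys if c == "S" and daf > cutoff)
    if ps_high == 0 or dn == 0:
        return None  # assumption: skip replicates where a ratio is undefined
    p_wd = max(0.0, pn_low - pn_high * ps_low / ps_high)
    pn_neutral = pn_low + pn_high - p_wd
    return 1.0 - (pn_neutral / (ps_low + ps_high)) * (ds / dn)


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a sorted list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_alpha_ci(polys, dn, ds, n_boot=500, cutoff=0.15, seed=0):
    """95% percentile CI on alpha from case resampling of polymorphisms."""
    rng = random.Random(seed)
    alphas = []
    for _ in range(n_boot):
        sample = rng.choices(polys, k=len(polys))  # resample with replacement
        a = alpha_from_polymorphisms(sample, dn, ds, cutoff)
        if a is not None:
            alphas.append(a)
    if not alphas:
        raise ValueError("no replicate produced a defined alpha")
    alphas.sort()
    return percentile(alphas, 2.5), percentile(alphas, 97.5)


# Hypothetical data: 18 nonsynonymous ("N") and 20 synonymous ("S") polymorphisms.
polys = ([("N", 0.05)] * 12 + [("N", 0.40)] * 6 +
         [("S", 0.05)] * 10 + [("S", 0.40)] * 10)
ci_low, ci_high = bootstrap_alpha_ci(polys, dn=9, ds=12)
print(f"alpha 95% CI: ({ci_low:.4f}, {ci_high:.4f})")
```

With few polymorphisms the interval is wide, because each replicate can shift several variants between the low- and high-frequency classes.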
Output
The imputed test reports:
alpha: Corrected proportion of adaptive substitutions
p_value: Fisher’s exact test on the corrected contingency table
Pwd: Imputed count of weakly deleterious nonsynonymous polymorphisms
Pn_neutral: Nonsynonymous polymorphisms after removing imputed slightly deleterious mutations
Dn, Ds, Pn, Ps: Raw counts
cutoff: The DAF cutoff used
Ln, Ls: Nei-Gojobori non-synonymous and synonymous site totals
omega, omega_a, omega_na: dN/dS and the adaptive/non-adaptive decomposition (omega_a = alpha * omega) following Gossmann, Keightley & Eyre-Walker 2012, applied to MK counts by Coronado-Zamora et al. 2019. See Omega Decomposition (ω, ω_a, ω_na) for the decomposition formula.
alpha_CI_low, alpha_CI_high: 95% bootstrap CI on alpha (when --bootstrap > 0)
omega_a_CI_low/high, omega_na_CI_low/high: 95% CIs on the omega decomposition, derived from the alpha CI scaled by omega
ci_method: "bootstrap" when CI was computed, None otherwise
Example output (pretty format):
Imputed MK Test Results:
Divergence: Dn=6, Ds=8
Polymorphism: Pn=11, Ps=17
DAF cutoff: 0.15
Imputed Pwd: 4.82
Pn (neutral): 6.18
Alpha: 0.5154
p-value: 0.0891
Alpha 95% CI [bootstrap]: (0.3210, 0.7050)
Sites: Ln=576.00, Ls=195.00
omega: 0.2539 (omega_a=0.1308, omega_na=0.1230)
omega_a 95% CI: (0.0815, 0.1790)
omega_na 95% CI: (0.0749, 0.1724)
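The omega lines in this example can be checked by hand. The sketch below assumes omega = (Dn/Ln)/(Ds/Ls) and omega_na = omega - omega_a, which is consistent with the values shown but is not quoted from MKado's source; the function name omega_decomposition is hypothetical. Because alpha is entered here rounded to four decimals, the omega_a and omega_na point estimates can differ from the printed ones in the last digit.

```python
def omega_decomposition(dn, ds, ln, ls, alpha, alpha_ci=None):
    """omega = (Dn/Ln)/(Ds/Ls), split into adaptive and non-adaptive parts."""
    omega = (dn / ln) / (ds / ls)
    out = {
        "omega": omega,
        "omega_a": alpha * omega,
        "omega_na": (1.0 - alpha) * omega,  # assumption: omega_na = omega - omega_a
    }
    if alpha_ci is not None:
        lo, hi = alpha_ci
        out["omega_a_ci"] = (lo * omega, hi * omega)
        # A higher alpha means a lower omega_na, so the bounds swap.
        out["omega_na_ci"] = ((1.0 - hi) * omega, (1.0 - lo) * omega)
    return out


res = omega_decomposition(dn=6, ds=8, ln=576.0, ls=195.0,
                          alpha=0.5154, alpha_ci=(0.3210, 0.7050))
print(f"omega = {res['omega']:.4f}")
print("omega_a CI  = ({:.4f}, {:.4f})".format(*res["omega_a_ci"]))
print("omega_na CI = ({:.4f}, {:.4f})".format(*res["omega_na_ci"]))
# prints:
# omega = 0.2539
# omega_a CI  = (0.0815, 0.1790)
# omega_na CI = (0.0749, 0.1724)
```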
Comparison with Other Methods
| Method | Approach | Best used when |
|---|---|---|
| Standard MK | No correction | Quick assessment; comparing specific genes |
| Asymptotic MK | Curve fitting across frequency spectrum | Genome-wide analyses with many polymorphisms |
| Standard MK with frequency filtering | Discards low-frequency polymorphisms below a cutoff | Simple correction, but loses data |
| Imputed MK | Imputes slightly deleterious count from synonymous spectrum | Gene-level analyses; maximizing statistical power |
The imputed test is particularly useful when:
You want gene-level significance (p-values) rather than only genome-wide estimates
You want to retain as much data as possible
You want to decompose the DFE into interpretable fractions
When to Use the Imputed MK Test
Use imputed MK when:
Analyzing individual genes or small gene sets where power matters
You want per-gene corrected alpha with p-values
You want DFE decomposition (with site count information)
The asymptotic test lacks sufficient data for curve fitting
Consider alternatives when:
You have genome-wide data with thousands of polymorphisms (asymptotic MK may be more robust)
You want frequency-bin visualization of alpha(x) (use asymptotic MK with --plot-asymptotic)
You need unbiased multi-gene weighting without frequency modeling (use alpha_TG)
Interpreting Results
α > 0: Evidence for positive selection (proportion of adaptive substitutions)
α ≈ 0: Consistent with neutral evolution
α < 0: Excess of nonsynonymous polymorphism relative to divergence, suggesting segregating weakly deleterious mutations remain even after correction
Pwd > 0: Low-frequency nonsynonymous excess detected and corrected
Pwd = 0: No evidence for segregating slightly deleterious mutations (or all polymorphisms are high-frequency)
Choosing the DAF Cutoff
The default cutoff of 15% follows the recommendation of Murga-Moreno et al. (2022). The cutoff defines the boundary between “low-frequency” and “high-frequency” polymorphisms:
Lower cutoff (e.g., 5%): Only the rarest variants are considered low-frequency. More conservative imputation.
Higher cutoff (e.g., 25%): More variants classified as low-frequency. More aggressive imputation.
The optimal cutoff depends on the effective population size and strength of selection in your system.
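One way to see the effect of the cutoff is to recompute the imputed count at several values. The sketch below uses hypothetical derived allele frequencies and a hypothetical helper, imputed_pwd; it illustrates the arithmetic only and does not call MKado.

```python
def imputed_pwd(nonsyn_dafs, syn_dafs, cutoff):
    """Split DAFs at the cutoff and return (Pn_low, Ps_low, Pwd)."""
    pn_low = sum(1 for x in nonsyn_dafs if x <= cutoff)
    pn_high = len(nonsyn_dafs) - pn_low
    ps_low = sum(1 for x in syn_dafs if x <= cutoff)
    ps_high = len(syn_dafs) - ps_low
    if ps_high == 0:
        raise ValueError("no high-frequency synonymous polymorphisms at this cutoff")
    return pn_low, ps_low, max(0.0, pn_low - pn_high * ps_low / ps_high)


# Hypothetical derived allele frequencies for one gene.
nonsyn = [0.02, 0.03, 0.04, 0.08, 0.10, 0.12, 0.20, 0.30, 0.50, 0.70]
syn = [0.03, 0.06, 0.12, 0.18, 0.22, 0.35, 0.45, 0.60, 0.80, 0.90]

results = {c: imputed_pwd(nonsyn, syn, c) for c in (0.05, 0.15, 0.25)}
for cutoff, (pn_low, ps_low, p_wd) in results.items():
    print(f"cutoff={cutoff:.2f}  Pn_low={pn_low}  Ps_low={ps_low}  Pwd={p_wd:.2f}")
# prints:
# cutoff=0.05  Pn_low=3  Ps_low=1  Pwd=2.22
# cutoff=0.15  Pn_low=6  Ps_low=3  Pwd=4.29
# cutoff=0.25  Pn_low=7  Ps_low=5  Pwd=4.00
```

Raising the cutoff moves more variants into the low-frequency class for both site types, but the synonymous ratio r changes as well, so Pwd does not have to increase at every step.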
Reference
Murga-Moreno J, Coronado-Zamora M, Casillas S, Barbadilla A (2022) impMKT: the imputed McDonald and Kreitman test, a straightforward correction that significantly increases the evidence of positive selection of the McDonald and Kreitman test at the gene level. G3: Genes, Genomes, Genetics 12(10):jkac206. https://doi.org/10.1093/g3journal/jkac206
Gossmann TI, Keightley PD, Eyre-Walker A (2012) The effect of variation in the effective population size on the rate of adaptive molecular evolution in eukaryotes. Genome Biology and Evolution 4(5):658-667. https://doi.org/10.1093/gbe/evs027
Coronado-Zamora M, Salvador-Martínez I, Castellano D, Barbadilla A, Salazar-Ciudad I (2019) Adaptation and conservation throughout the Drosophila melanogaster life-cycle. Genome Biology and Evolution 11(5):1463-1482. https://doi.org/10.1093/gbe/evz046