Python API Reference
MKado provides a Python API for programmatic access to MK test functionality.
Quick Start
from mkado import mk_test, asymptotic_mk_test, SequenceSet
# Run standard MK test
result = mk_test("ingroup.fa", "outgroup.fa")
print(f"Alpha: {result.alpha}")
print(f"P-value: {result.p_value}")
# Run asymptotic MK test
result = asymptotic_mk_test("ingroup.fa", "outgroup.fa")
print(f"Asymptotic Alpha: {result.alpha_asymptotic}")
Core Classes
SequenceSet
Sequence and SequenceSet classes for handling aligned sequences.
- class mkado.core.sequences.Sequence(name, sequence)[source]
Bases:
objectRepresents a single biological sequence.
- class mkado.core.sequences.SequenceSet(sequences=<factory>, reading_frame=1, genetic_code=<factory>)[source]
Bases:
objectRepresents a set of aligned sequences (e.g., from one species).
- Parameters:
reading_frame (int)
genetic_code (GeneticCode)
- genetic_code: GeneticCode
- classmethod from_fasta(path, reading_frame=1, genetic_code=None)[source]
Load sequences from a FASTA file.
- Parameters:
reading_frame (
int) – Reading frame (1, 2, or 3)genetic_code (
GeneticCode|None) – Genetic code for translation
- Return type:
- Returns:
SequenceSet containing all sequences from the file
- codon_set_clean(codon_index)[source]
Get unique codons at a position, excluding those with N or gaps.
Result is cached per
codon_indexfor the lifetime of theSequenceSet. Callers must not mutate the returned set.
- amino_set_clean(codon_index)[source]
Get unique amino acids at a position, excluding ambiguous codons.
- is_polymorphic(codon_index)[source]
Check if a codon position is polymorphic (has >1 unique codon).
- codon_array()[source]
Get a 2D array of codons (n_sequences x n_codons).
- Return type:
- Returns:
NumPy array of codon strings
- derived_allele_frequency(codon_index, ancestral_codon)[source]
Calculate the derived allele frequency at a codon position.
- __init__(sequences=<factory>, reading_frame=1, genetic_code=<factory>)
- Parameters:
reading_frame (int)
genetic_code (GeneticCode)
- Return type:
None
GeneticCode
Genetic code and codon utilities.
- class mkado.core.codons.GeneticCode(code=None, *, table_id=None)[source]
Bases:
objectRepresents a genetic code for translation and codon path computation.
- translate_sequence(sequence, reading_frame=1)[source]
Translate a nucleotide sequence to amino acids.
- count_synonymous_sites(codon)[source]
Count the number of synonymous sites in a codon.
Uses Nei-Gojobori method: for each site, calculate fraction of possible changes that are synonymous.
AlignedPair
Alignment comparison utilities for MK test.
- class mkado.core.alignment.AlignedPair(ingroup, outgroup, genetic_code=<factory>)[source]
Bases:
objectRepresents a pair of aligned sequence sets (ingroup and outgroup).
- Parameters:
ingroup (SequenceSet)
outgroup (SequenceSet)
genetic_code (GeneticCode)
- ingroup: SequenceSet
- outgroup: SequenceSet
- genetic_code: GeneticCode
- combined_codon_set_clean(codon_index)[source]
Get all clean unique codons at a position from both groups.
- is_fixed_between(codon_index)[source]
Check if a codon position is fixed differently between groups.
A position is a fixed difference if: - The ingroup has a single codon (or all share the same) - The outgroup has a single codon (or all share the same) - The ingroup and outgroup codons are different
- is_polymorphic_within_ingroup(codon_index)[source]
Check if a codon position is polymorphic within the ingroup only.
- polymorphic_sites_pooled()[source]
Get all codon indices polymorphic in either population (union).
This follows the libsequence convention of pooling polymorphisms from both populations.
- count_total_sites()[source]
Aggregate total non-synonymous (Ln) and synonymous (Ls) sites.
For each codon position with at least one clean codon in each group, averages the Nei-Gojobori per-codon synonymous site counts within each group, then averages the two group means. Sums across positions and returns
(Ln, Ls)withLn = 3 * n_codons - Ls.
- classify_fixed_difference(codon_index)[source]
Classify a fixed difference as synonymous/non-synonymous.
Uses the shortest mutational path to count the minimum number of synonymous and non-synonymous changes.
- classify_polymorphism_pooled(codon_index)[source]
Classify a polymorphism using codons from both populations.
This follows the libsequence convention of using all unique codons from both ingroup and outgroup when classifying polymorphisms.
- __init__(ingroup, outgroup, genetic_code=<factory>)
- Parameters:
ingroup (SequenceSet)
outgroup (SequenceSet)
genetic_code (GeneticCode)
- Return type:
None
- class mkado.core.alignment.PolarizedAlignedPair(ingroup, outgroup, genetic_code=<factory>, outgroup2=None)[source]
Bases:
AlignedPairAligned pair with a second outgroup for polarization.
- Parameters:
ingroup (SequenceSet)
outgroup (SequenceSet)
genetic_code (GeneticCode)
outgroup2 (SequenceSet | None)
- outgroup2: SequenceSet | None = None
- polarize_fixed_difference(codon_index)[source]
Polarize a fixed difference to determine which lineage changed.
Uses the second outgroup to determine the ancestral state.
- polarize_ingroup_polymorphism(codon_index)[source]
Polarize and classify an ingroup polymorphism.
Uses outgroup2 to determine if the polymorphism arose on the ingroup lineage. Following the convention from mkTest.rb:
If outgroup2 has ALL ingroup alleles → ancestral polymorphism, cannot attribute to ingroup lineage
If outgroup2 shares allele with outgroup1 → the shared allele is ancestral, ingroup has derived allele(s) → ingroup polymorphism
If outgroup2 shares some (but not all) alleles with ingroup → shared allele is ancestral → ingroup polymorphism
Otherwise → cannot polarize
- __init__(ingroup, outgroup, genetic_code=<factory>, outgroup2=None)
- Parameters:
ingroup (SequenceSet)
outgroup (SequenceSet)
genetic_code (GeneticCode)
outgroup2 (SequenceSet | None)
- Return type:
None
Analysis Functions
Standard MK Test
Standard McDonald-Kreitman test implementation.
- class mkado.analysis.mk_test.MKResult(dn, ds, pn, ps, p_value, ni, alpha, dos, ln=None, ls=None, omega=None)[source]
Bases:
objectResults from a McDonald-Kreitman test.
Site counts follow the standard MK definitions:
Dn / Ds: number of fixed differences between ingroup and outgroup that change (Dn) or do not change (Ds) the encoded amino acid under the supplied genetic code.
Pn / Ps: number of ingroup polymorphic sites whose derived allele causes a non-synonymous (Pn) or synonymous (Ps) substitution. Sites are filtered by
min_frequency(default0.0); the--no-singletonsCLI flag sets this threshold to1/n. Whenpool_polymorphismsis true, polymorphic sites in the outgroup are pooled into the ingroup counts.
- Parameters:
- __init__(dn, ds, pn, ps, p_value, ni, alpha, dos, ln=None, ls=None, omega=None)
- mkado.analysis.mk_test.mk_test(ingroup, outgroup, reading_frame=1, genetic_code=None, pool_polymorphisms=False, min_frequency=0.0)[source]
Perform the standard McDonald-Kreitman test.
The MK test compares the ratio of non-synonymous to synonymous changes within species (polymorphism) to the ratio between species (divergence).
Under neutrality, these ratios should be equal. Deviations suggest selection: - Excess divergence relative to polymorphism suggests positive selection - Excess polymorphism relative to divergence suggests segregating weakly deleterious polymorphisms
- Parameters:
ingroup (
SequenceSet|str|Path) – SequenceSet or path to FASTA file for ingroup sequencesoutgroup (
SequenceSet|str|Path) – SequenceSet or path to FASTA file for outgroup sequencesreading_frame (
int) – Reading frame (1, 2, or 3)genetic_code (
GeneticCode|None) – Genetic code for translation (uses standard if None)pool_polymorphisms (
bool) – If True, count polymorphisms from both ingroup and outgroup (libsequence convention). If False (default), count only ingroup polymorphisms (DnaSP/original MK convention).min_frequency (
float) – Minimum derived allele frequency for a polymorphism to be counted (default 0.0, i.e., include all polymorphisms). Useful for filtering out rare variants that may be slightly deleterious.
- Return type:
- Returns:
MKResult containing test statistics
- mkado.analysis.mk_test.mk_test_from_counts(dn, ds, pn, ps, ln=None, ls=None)[source]
Create MK test results from pre-computed counts.
Useful for testing or when counts are already available.
- Parameters:
dn (
int) – Non-synonymous divergenceds (
int) – Synonymous divergencepn (
int) – Non-synonymous polymorphismsps (
int) – Synonymous polymorphismsln (
float|None) – Total non-synonymous sites (optional, enables omega computation)ls (
float|None) – Total synonymous sites (optional, enables omega computation)
- Return type:
- Returns:
MKResult with calculated statistics
Asymptotic MK Test
Asymptotic McDonald-Kreitman test implementation.
Based on Messer & Petrov (2013) PNAS.
- class mkado.analysis.asymptotic.PolymorphismData(polymorphisms=<factory>, dn=0, ds=0, gene_id='', ln=None, ls=None)[source]
Bases:
objectIntermediate data from a single gene for aggregation.
- Parameters:
- mkado.analysis.asymptotic.sum_site_totals(gene_data)[source]
Sum per-gene Nei-Gojobori (Ln, Ls); return
(None, None)if any gene lacks them.
- class mkado.analysis.asymptotic.AggregatedSFS(bin_edges=<factory>, pn_counts=<factory>, ps_counts=<factory>, dn_total=0, ds_total=0, num_genes=0, ln_total=None, ls_total=None, pn_total=0, ps_total=0, sfs_mode='at')[source]
Bases:
objectAggregated site frequency spectrum across genes.
- Parameters:
- __init__(bin_edges=<factory>, pn_counts=<factory>, ps_counts=<factory>, dn_total=0, ds_total=0, num_genes=0, ln_total=None, ls_total=None, pn_total=0, ps_total=0, sfs_mode='at')
- class mkado.analysis.asymptotic.AsymptoticMKResult(frequency_bins=<factory>, alpha_by_freq=<factory>, alpha_x_values=<factory>, alpha_asymptotic=0.0, ci_low=0.0, ci_high=0.0, fit_a=0.0, fit_b=0.0, fit_c=0.0, dn=0, ds=0, num_genes=0, model_type='exponential', pn_total=0, ps_total=0, ln=None, ls=None, omega=None, omega_a=None, omega_na=None, ci_method='monte-carlo', sfs_mode='at')[source]
Bases:
objectResults from an asymptotic McDonald-Kreitman test.
Pn_totalandPs_totalfollow the standard MK definition: ingroup polymorphic sites whose derived allele causes a non-synonymous (Pn) or synonymous (Ps) substitution. The asymptotic test uses the full SFS with nomin_frequencyfilter; frequency cutoffs apply only to thealpha(x)curve fit, not to the SFS counts themselves.- Parameters:
alpha_asymptotic (float)
ci_low (float)
ci_high (float)
fit_a (float)
fit_b (float)
fit_c (float)
dn (int)
ds (int)
num_genes (int)
model_type (str)
pn_total (int)
ps_total (int)
ln (float | None)
ls (float | None)
omega (float | None)
omega_a (float | None)
omega_na (float | None)
ci_method (str)
sfs_mode (str)
- ci_method: str = 'monte-carlo'
"monte-carlo"samples parameters from the curve-fit covariance matrix;"bootstrap"resamples polymorphisms with replacement and refits per replicate.- Type:
Method used to compute
ci_low/ci_high
- sfs_mode: str = 'at'
"at"(Messer & Petrov 2013, count per bin) or"above"(Uricchio et al. 2019, inclusive right-tail cumulative count).- Type:
SFS construction mode
- __init__(frequency_bins=<factory>, alpha_by_freq=<factory>, alpha_x_values=<factory>, alpha_asymptotic=0.0, ci_low=0.0, ci_high=0.0, fit_a=0.0, fit_b=0.0, fit_c=0.0, dn=0, ds=0, num_genes=0, model_type='exponential', pn_total=0, ps_total=0, ln=None, ls=None, omega=None, omega_a=None, omega_na=None, ci_method='monte-carlo', sfs_mode='at')
- Parameters:
alpha_asymptotic (float)
ci_low (float)
ci_high (float)
fit_a (float)
fit_b (float)
fit_c (float)
dn (int)
ds (int)
num_genes (int)
model_type (str)
pn_total (int)
ps_total (int)
ln (float | None)
ls (float | None)
omega (float | None)
omega_a (float | None)
omega_na (float | None)
ci_method (str)
sfs_mode (str)
- Return type:
None
- mkado.analysis.asymptotic.extract_polymorphism_data(ingroup, outgroup, reading_frame=1, genetic_code=None, pool_polymorphisms=False, gene_id='', min_frequency=0.0)[source]
Extract polymorphism and divergence data without curve fitting.
This function extracts the raw data needed for asymptotic MK analysis, allowing multiple genes to be processed and aggregated before fitting.
- Parameters:
ingroup (
SequenceSet|str|Path) – SequenceSet or path to FASTA file for ingroup sequencesoutgroup (
SequenceSet|str|Path) – SequenceSet or path to FASTA file for outgroup sequencesreading_frame (
int) – Reading frame (1, 2, or 3)genetic_code (
GeneticCode|None) – Genetic code for translation (uses standard if None)pool_polymorphisms (
bool) – If True, consider sites polymorphic in either population (libsequence convention). Frequencies are still calculated from ingroup only.gene_id (
str) – Identifier for this genemin_frequency (
float) – Minimum derived allele frequency for polymorphisms (polymorphisms below this threshold are excluded)
- Return type:
- Returns:
PolymorphismData containing polymorphisms with frequencies and divergence counts
- mkado.analysis.asymptotic.aggregate_polymorphism_data(gene_data, num_bins=20, sfs_mode='at')[source]
Aggregate polymorphism data from multiple genes into SFS bins.
- Parameters:
gene_data (
list[PolymorphismData]) – List of PolymorphismData from individual genesnum_bins (
int) – Number of frequency binssfs_mode (
str) –"at"(default, Messer & Petrov 2013) returns per-bin counts."above"(Uricchio et al. 2019) returns the inclusive right-tail cumulative SFS — biniholds the count of polymorphisms with frequency>= bin_edges[i].
- Return type:
- Returns:
AggregatedSFS with per-bin Pn/Ps counts and total divergence
- mkado.analysis.asymptotic.asymptotic_mk_test_aggregated(gene_data, num_bins=20, ci_replicates=10000, frequency_cutoffs=(0.1, 0.9), ci_method='monte-carlo', seed=42, sfs_mode='at', workers=1)[source]
Genome-wide asymptotic MK test on aggregated data.
This follows the approach of Messer & Petrov (2013) and the asymptoticMK R package: aggregate polymorphism SFS data and divergence counts from many genes, then fit a single exponential curve to the aggregated data.
- Parameters:
gene_data (
list[PolymorphismData]) – List of PolymorphismData from individual genesnum_bins (
int) – Number of frequency bins (default 20)ci_replicates (
int) – Number of CI replicates. Forci_method="monte-carlo"(default), parameters are sampled from the curve-fit covariance — pass ~10000. Forci_method="bootstrap", each replicate refits the curve so 100-500 is typical.frequency_cutoffs (
tuple[float,float]) – (low, high) frequency range for fitting (default 0.1-0.9)ci_method (
str) –"monte-carlo"(default, parametric MVN sampling from the curve-fit covariance) or"bootstrap"(case-resampling of the pooled polymorphism list with replacement, refit per replicate).seed (
int) – Random seed for reproducibility (default 42).sfs_mode (str)
workers (int)
- Return type:
- Returns:
AsymptoticMKResult with aggregated analysis
- mkado.analysis.asymptotic.asymptotic_mk_test(ingroup, outgroup, reading_frame=1, genetic_code=None, num_bins=10, bootstrap_replicates=100, pool_polymorphisms=False, sfs_mode='at', workers=1)[source]
Perform the asymptotic McDonald-Kreitman test.
This method accounts for slightly deleterious mutations by examining how alpha changes across the frequency spectrum and extrapolating to the asymptotic value at x=1.
Based on Messer & Petrov (2013) PNAS.
- Parameters:
ingroup (
SequenceSet|str|Path) – SequenceSet or path to FASTA file for ingroup sequencesoutgroup (
SequenceSet|str|Path) – SequenceSet or path to FASTA file for outgroup sequencesreading_frame (
int) – Reading frame (1, 2, or 3)genetic_code (
GeneticCode|None) – Genetic code for translation (uses standard if None)num_bins (
int) – Number of frequency bins (default 10)bootstrap_replicates (
int) – Number of bootstrap replicates for CI (default 100)pool_polymorphisms (
bool) – If True, consider sites polymorphic in either population (libsequence convention). Frequencies are still calculated from ingroup only. If False (default), only consider ingroup polymorphisms (DnaSP/original MK convention).sfs_mode (str)
workers (int)
- Return type:
- Returns:
AsymptoticMKResult containing test statistics
Imputed MK Test
Imputed McDonald-Kreitman test implementation.
Based on Murga-Moreno et al. (2022) G3: Genes|Genomes|Genetics. https://doi.org/10.1093/g3journal/jkac206
The impMKT corrects for slightly deleterious mutations by imputing the number of weakly deleterious nonsynonymous polymorphisms segregating at low frequency, rather than discarding all low-frequency variants. This retains more data and increases power to detect positive selection at the gene level.
- class mkado.analysis.imputed.ImputedMKResult(alpha, p_value, pn_neutral, pwd, dn, ds, pn_total, ps_total, d=None, b=None, f=None, cutoff=0.15, ln=None, ls=None, omega=None, omega_a=None, omega_na=None, alpha_ci_low=None, alpha_ci_high=None, ci_method=None)[source]
Bases:
objectResults from an imputed McDonald-Kreitman test.
The imputed test partitions
Pninto a weakly-deleterious fraction (Pwd) — imputed from the synonymous SFS scaled by the observed Pn/Ps ratio at low derived allele frequencies — and a neutral fraction (Pn_neutral = Pn_total - Pwd).alphais computed againstPn_neutralrather than the rawPncount, removing the downward bias caused by slightly deleterious nonsynonymous variants.DnandDsfollow the standard MK definitions.Pn_totalandPs_totalare the unfiltered ingroup polymorphism counts (the imputed test uses the entire SFS, including singletons).- Parameters:
alpha (float | None)
p_value (float)
pn_neutral (float)
pwd (float)
dn (int)
ds (int)
pn_total (int)
ps_total (int)
d (float | None)
b (float | None)
f (float | None)
cutoff (float)
ln (float | None)
ls (float | None)
omega (float | None)
omega_a (float | None)
omega_na (float | None)
alpha_ci_low (float | None)
alpha_ci_high (float | None)
ci_method (str | None)
- cutoff: float = 0.15
Derived allele frequency below which polymorphisms are treated as the imputation pool.
- __init__(alpha, p_value, pn_neutral, pwd, dn, ds, pn_total, ps_total, d=None, b=None, f=None, cutoff=0.15, ln=None, ls=None, omega=None, omega_a=None, omega_na=None, alpha_ci_low=None, alpha_ci_high=None, ci_method=None)
- Parameters:
alpha (float | None)
p_value (float)
pn_neutral (float)
pwd (float)
dn (int)
ds (int)
pn_total (int)
ps_total (int)
d (float | None)
b (float | None)
f (float | None)
cutoff (float)
ln (float | None)
ls (float | None)
omega (float | None)
omega_a (float | None)
omega_na (float | None)
alpha_ci_low (float | None)
alpha_ci_high (float | None)
ci_method (str | None)
- Return type:
None
- omega_a: float | None = None
Adaptive rate
alpha * omegawith imputed alpha (Gossmann, Keightley & Eyre-Walker 2012).
- mkado.analysis.imputed.imputed_mk_test(gene_data, cutoff=0.15, num_synonymous_sites=None, num_nonsynonymous_sites=None, n_bootstrap=0, seed=42)[source]
Perform the imputed McDonald-Kreitman test on a single gene.
The impMKT corrects for slightly deleterious mutations by imputing the count of weakly deleterious nonsynonymous polymorphisms from the ratio of low- to high-frequency synonymous polymorphisms.
Based on Murga-Moreno et al. (2022) G3. https://doi.org/10.1093/g3journal/jkac206
- Parameters:
gene_data (
PolymorphismData) – Polymorphism and divergence data from extract_polymorphism_data()cutoff (
float) – DAF cutoff separating low and high frequency (default 0.15)num_synonymous_sites (
float|None) – Number of synonymous sites (m0) for DFE fractionsnum_nonsynonymous_sites (
float|None) – Number of nonsynonymous sites (mi) for DFE fractionsn_bootstrap (
int) – Number of bootstrap replicates for the alpha CI.0(default) disables CI computation and preserves byte-identical legacy output.seed (
int) – Random seed for the bootstrap (default 42).
- Return type:
- Returns:
ImputedMKResult with corrected alpha and optionally DFE fractions
- mkado.analysis.imputed.imputed_mk_test_multi(gene_data, cutoff=0.15, num_synonymous_sites=None, num_nonsynonymous_sites=None, n_bootstrap=0, seed=42)[source]
Perform the imputed MK test on pooled data from multiple genes.
Pools all polymorphisms and divergence counts across genes, then runs the imputed MK algorithm on the combined data.
- Parameters:
gene_data (
list[PolymorphismData]) – List of PolymorphismData from individual genescutoff (
float) – DAF cutoff separating low and high frequency (default 0.15)num_synonymous_sites (
float|None) – Number of synonymous sites (m0) for DFE fractionsnum_nonsynonymous_sites (
float|None) – Number of nonsynonymous sites (mi) for DFE fractionsn_bootstrap (
int) – Number of bootstrap replicates for the alpha CI.0(default) disables CI computation and preserves byte-identical legacy output.seed (
int) – Random seed for the bootstrap (default 42).
- Return type:
- Returns:
ImputedMKResult with corrected alpha from pooled data
Polarized MK Test
Polarized McDonald-Kreitman test implementation.
- class mkado.analysis.polarized.PolarizedMKResult(dn_ingroup, ds_ingroup, pn_ingroup, ps_ingroup, dn_outgroup, ds_outgroup, dn_unpolarized, ds_unpolarized, pn_unpolarized, ps_unpolarized, p_value_ingroup, ni_ingroup, alpha_ingroup, dos_ingroup, ln=None, ls=None, omega=None)[source]
Bases:
objectResults from a polarized McDonald-Kreitman test.
The polarized test uses a second outgroup (
outgroup2) to assign each fixed difference and each ingroup polymorphism to one of two lineages — the ingroup or the first outgroup — by majority rule on the inferred ancestral state. Sites where the ancestral state cannot be determined are reported under*_unpolarized.Pn and Ps count derived alleles segregating in the ingroup (subject to
min_frequencyfiltering, identical to the standard MK test) and are split by lineage assignment.- Parameters:
dn_ingroup (int)
ds_ingroup (int)
pn_ingroup (int)
ps_ingroup (int)
dn_outgroup (int)
ds_outgroup (int)
dn_unpolarized (int)
ds_unpolarized (int)
pn_unpolarized (int)
ps_unpolarized (int)
p_value_ingroup (float)
ni_ingroup (float | None)
alpha_ingroup (float | None)
dos_ingroup (float | None)
ln (float | None)
ls (float | None)
omega (float | None)
- __init__(dn_ingroup, ds_ingroup, pn_ingroup, ps_ingroup, dn_outgroup, ds_outgroup, dn_unpolarized, ds_unpolarized, pn_unpolarized, ps_unpolarized, p_value_ingroup, ni_ingroup, alpha_ingroup, dos_ingroup, ln=None, ls=None, omega=None)
- Parameters:
dn_ingroup (int)
ds_ingroup (int)
pn_ingroup (int)
ps_ingroup (int)
dn_outgroup (int)
ds_outgroup (int)
dn_unpolarized (int)
ds_unpolarized (int)
pn_unpolarized (int)
ps_unpolarized (int)
p_value_ingroup (float)
ni_ingroup (float | None)
alpha_ingroup (float | None)
dos_ingroup (float | None)
ln (float | None)
ls (float | None)
omega (float | None)
- Return type:
None
- mkado.analysis.polarized.polarized_mk_test(ingroup, outgroup1, outgroup2, reading_frame=1, genetic_code=None, pool_polymorphisms=False, min_frequency=0.0)[source]
Perform a polarized McDonald-Kreitman test.
Uses a second outgroup to determine the ancestral state and assign mutations to specific lineages.
- Parameters:
ingroup (
SequenceSet|str|Path) – SequenceSet or path to FASTA file for ingroup sequencesoutgroup1 (
SequenceSet|str|Path) – SequenceSet or path to FASTA file for first outgroupoutgroup2 (
SequenceSet|str|Path) – SequenceSet or path to FASTA file for second outgroup (used for polarization)reading_frame (
int) – Reading frame (1, 2, or 3)genetic_code (
GeneticCode|None) – Genetic code for translation (uses standard if None)pool_polymorphisms (
bool) – If True, count polymorphisms from both ingroup and outgroup1 (libsequence convention). If False (default), count only ingroup polymorphisms (DnaSP/original MK convention).min_frequency (
float) – Minimum derived allele frequency for a polymorphism to be counted (default 0.0, i.e., include all polymorphisms). Useful for filtering out rare variants that may be slightly deleterious.
- Return type:
- Returns:
PolarizedMKResult containing test statistics
Usage Examples
Working with SequenceSet
from mkado import SequenceSet
# Load sequences from FASTA
seqs = SequenceSet.from_fasta("alignment.fa")
# Filter by name pattern
ingroup = seqs.filter_by_name("dmel")
outgroup = seqs.filter_by_name("dsim")
# Get sequence information
print(f"Number of sequences: {len(seqs)}")
print(f"Alignment length: {seqs.alignment_length}")
Running MK Tests
from mkado import mk_test, SequenceSet
# From file paths
result = mk_test("ingroup.fa", "outgroup.fa")
# From SequenceSet objects
all_seqs = SequenceSet.from_fasta("combined.fa")
ingroup = all_seqs.filter_by_name("species1")
outgroup = all_seqs.filter_by_name("species2")
result = mk_test(ingroup, outgroup)
# Access results
print(f"Dn={result.Dn}, Ds={result.Ds}")
print(f"Pn={result.Pn}, Ps={result.Ps}")
print(f"P-value: {result.p_value}")
print(f"NI: {result.NI}")
print(f"Alpha: {result.alpha}")
Asymptotic Analysis
from mkado import asymptotic_mk_test
result = asymptotic_mk_test("ingroup.fa", "outgroup.fa", n_bins=20)
print(f"Asymptotic Alpha: {result.alpha_asymptotic}")
print(f"95% CI: [{result.ci_low}, {result.ci_high}]")
# Access per-bin data
for freq, alpha in zip(result.frequencies, result.alphas):
print(f"Frequency {freq:.2f}: Alpha = {alpha:.3f}")
Batch Processing
from pathlib import Path
from mkado import mk_test, SequenceSet
# Process multiple files
results = {}
for fasta_file in Path("alignments/").glob("*.fa"):
seqs = SequenceSet.from_fasta(fasta_file)
ingroup = seqs.filter_by_name("species1")
outgroup = seqs.filter_by_name("species2")
if len(ingroup) > 0 and len(outgroup) > 0:
results[fasta_file.stem] = mk_test(ingroup, outgroup)
# Summarize results
for gene, result in results.items():
print(f"{gene}: alpha={result.alpha:.3f}, p={result.p_value:.3f}")
Generating Volcano Plots
from pathlib import Path
from mkado import mk_test, SequenceSet
from mkado.io.plotting import create_volcano_plot
# Collect batch results
results = []
for fasta_file in Path("alignments/").glob("*.fa"):
seqs = SequenceSet.from_fasta(fasta_file)
ingroup = seqs.filter_by_name("species1")
outgroup = seqs.filter_by_name("species2")
if len(ingroup) > 0 and len(outgroup) > 0:
result = mk_test(ingroup, outgroup)
results.append((fasta_file.stem, result))
# Generate volcano plot
create_volcano_plot(results, Path("volcano.png"))
Generating Asymptotic Plots
from pathlib import Path
from mkado import asymptotic_mk_test, SequenceSet
from mkado.io.plotting import create_asymptotic_plot
# Run asymptotic MK test
seqs = SequenceSet.from_fasta("alignment.fa")
ingroup = seqs.filter_by_name("species1")
outgroup = seqs.filter_by_name("species2")
result = asymptotic_mk_test(ingroup, outgroup)
# Generate alpha(x) vs frequency plot
create_asymptotic_plot(result, Path("asymptotic_fit.png"))
Plotting Functions
Plotting functions for MK test results.
- mkado.io.plotting.create_volcano_plot(results, output_path, alpha=0.05)[source]
Create a volcano plot from batch MK test results.
The volcano plot shows -log10(NI) on the X-axis and -log10(p-value) on the Y-axis. A horizontal line indicates the Bonferroni-corrected significance threshold.
- mkado.io.plotting.create_asymptotic_plot(result, output_path)[source]
Create an asymptotic MK test plot showing alpha(x) vs derived allele frequency.
This plot follows the style of Messer & Petrov (2013), showing: - Scatter points of alpha at each frequency bin - The fitted curve (exponential or linear) - A horizontal line at alpha_asymptotic with confidence interval band
- Parameters:
result (
AsymptoticMKResult) – AsymptoticMKResult from asymptotic MK testoutput_path (
Path) – Path to save the plot (PNG, PDF, or SVG)
- Return type: