Differential Module

Per-bin differential peak testing between resistant (R) and susceptible (S) groups, with multiple-testing correction, log2 fold change, Cohen’s d effect size, and AMR-aware visualizations.

Analysis

class maldiamrkit.differential.DifferentialAnalysis(X, y)

Bases: object

Identify discriminative m/z peaks between resistant and susceptible groups.

Given a binned feature matrix and binary labels (0 = susceptible, 1 = resistant), the analysis iterates over each m/z bin and computes a statistical test, a log2 fold change, and Cohen’s d effect size comparing the two groups. Multiple-testing correction is applied across bins.
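
The two effect measures named above can be sketched for a single bin with NumPy. This is a minimal illustration, not the library's implementation: the helper names, the small `eps` guard against zero means, and the pooled-standard-deviation form of Cohen's d are assumptions.

```python
import numpy as np

def log2_fold_change(resistant, susceptible, eps=1e-12):
    """log2 ratio of group means; eps (an assumption) avoids log of zero."""
    return float(np.log2((np.mean(resistant) + eps) / (np.mean(susceptible) + eps)))

def cohens_d(resistant, susceptible):
    """Cohen's d using the pooled sample standard deviation (ddof=1)."""
    r = np.asarray(resistant, dtype=float)
    s = np.asarray(susceptible, dtype=float)
    n_r, n_s = r.size, s.size
    pooled_var = ((n_r - 1) * r.var(ddof=1) + (n_s - 1) * s.var(ddof=1)) / (n_r + n_s - 2)
    return float((r.mean() - s.mean()) / np.sqrt(pooled_var))

# Intensities of one m/z bin, split by label (1 = resistant, 0 = susceptible).
bin_r = np.array([4.0, 6.0, 8.0])   # group mean 6.0
bin_s = np.array([1.0, 1.5, 2.0])   # group mean 1.5

fc = log2_fold_change(bin_r, bin_s)  # log2(6.0 / 1.5), i.e. about 2
d = cohens_d(bin_r, bin_s)           # positive: the bin is higher in resistant samples
```

With this sign convention a positive fold change or effect size means the bin is more intense in the resistant group.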

Parameters:

X (pd.DataFrame) – Feature matrix of shape (n_samples, n_features). Column names are m/z bin identifiers (numeric or string).
y (pd.Series or ndarray) – Binary labels aligned with X rows: 0 = susceptible, 1 = resistant. Any sample with a missing / NaN label is dropped before analysis.

Variables:

X (pd.DataFrame) – Feature matrix (possibly subset to rows with non-missing labels).
y (pd.Series) – Labels aligned with X.
results (pd.DataFrame or None) – Populated by run() with columns mz_bin, mean_r, mean_s, fold_change, p_value, adjusted_p_value, effect_size.

Examples

>>> analysis = DifferentialAnalysis(X, y).run()
>>> analysis.top_peaks(n=10)
>>> analysis.significant_peaks(fc_threshold=1.0, p_threshold=0.05)

__init__(X, y)

Parameters:

X (DataFrame)
y (Series | ndarray)

Return type:

None

classmethod from_maldi_set(maldi_set, antibiotic=None)

Build a DifferentialAnalysis from a MaldiSet.

Extracts the feature matrix via maldi_set.X and labels via maldi_set.get_y_single(antibiotic).

Parameters:

maldi_set (MaldiSet) – Dataset providing X and get_y_single.
antibiotic (str or None, default=None) – Antibiotic label to analyse. If None, the first configured antibiotic is used.

Returns:

Unrun analysis (call run() next).

Return type:

DifferentialAnalysis

run(test=StatisticalTest.mann_whitney, correction=CorrectionMethod.fdr_bh, mz_ranges=None, peak_detector=None)

Run per-bin statistical analysis.

For each kept column of X, splits samples by label, computes the requested test statistic and p-value, the log2 fold change of group means, and Cohen’s d. Multiple-testing correction is then applied across the kept bins and the result is stored in results.

Pre-test filters reduce the number of hypotheses. This is often decisive on small datasets, where correcting over a full scan of 1,000 to 10,000 bins can leave too little statistical power for any bin to pass the FDR threshold.
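
To see why the number of tested bins matters, the sketch below applies a plain NumPy version of the Benjamini-Hochberg adjustment to the same raw p-value inside a small and a large hypothesis set. The `benjamini_hochberg` helper and the toy p-values are illustrative assumptions, not the library's code.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards and cap at 1.
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

strong = 0.0004  # raw p-value of one discriminative bin

# The same bin tested alongside 9 versus 9,999 uninformative bins (raw p = 0.5).
small_scan = benjamini_hochberg([strong] + [0.5] * 9)
full_scan = benjamini_hochberg([strong] + [0.5] * 9999)

print(small_scan[0])  # about 0.004: still significant at 0.05
print(full_scan[0])   # 0.5: no longer significant
```

Restricting the scan with mz_ranges or peak_detector shrinks the number of hypotheses in the same way the small scan does here.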

Parameters:

test ({"mann_whitney", "t_test"} or StatisticalTest) – Statistical test to apply per bin.
correction ({"fdr_bh", "fdr_by", "bonferroni"} or CorrectionMethod) – Multiple-testing correction.
mz_ranges (tuple, list of tuples, or None, default=None) – Restrict analysis to bins whose m/z value falls within the given range(s). Pass a single (low, high) tuple or a list of such tuples for a union of intervals. Endpoints are inclusive. Column labels are coerced to float for range comparison; non-numeric columns are excluded. None disables the filter.
peak_detector (MaldiPeakDetector or None, default=None) – Restrict analysis to bins that are peaks in at least one sample according to the provided detector. The detector’s fit_transform is run on the (range-filtered) feature matrix and any bin that is non-zero in at least one row is kept. None disables the filter.

Returns:

self, for method chaining.

Return type:

DifferentialAnalysis

Raises:

ValueError – If y does not contain both classes, or if the combined filters leave no bins to test.
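
The mz_ranges behaviour described above (inclusive endpoints, union of intervals, non-numeric columns excluded) can be approximated with pandas as follows. `columns_in_ranges` is a hypothetical helper written for illustration; the library's internal filtering may differ.

```python
import numpy as np
import pandas as pd

def columns_in_ranges(columns, mz_ranges):
    """Return column labels whose float value lies in any inclusive (low, high) range."""
    if isinstance(mz_ranges, tuple):  # a single (low, high) pair
        mz_ranges = [mz_ranges]
    # Coerce labels to float; labels that are not numeric become NaN.
    mz = pd.to_numeric(pd.Series(list(columns)), errors="coerce").to_numpy(dtype=float)
    keep = np.zeros(mz.size, dtype=bool)
    for low, high in mz_ranges:
        # NaN compares False on both sides, so non-numeric labels are dropped.
        keep |= (mz >= low) & (mz <= high)
    return [c for c, k in zip(columns, keep) if k]

X = pd.DataFrame(
    np.ones((2, 5)),
    columns=["2000", "3500", "7000", "12000", "sample_id"],
)
kept = columns_in_ranges(X.columns, [(2000, 5000), (9000, 12000)])
# kept == ["2000", "3500", "12000"]
```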

property results: DataFrame

Per-bin results table.

Returns:

Columns: mz_bin, mean_r, mean_s, fold_change, p_value, adjusted_p_value, effect_size.

Return type:

pd.DataFrame

Raises:

RuntimeError – If run() has not been called yet.

top_peaks(n=20)

Return the top n peaks sorted by adjusted p-value ascending.

Parameters:

n (int, default=20) – Number of peaks to return.

Returns:

Sub-table with the n lowest adjusted p-values.

Return type:

pd.DataFrame

significant_peaks(fc_threshold=1.0, p_threshold=0.05)

Return peaks passing both fold-change and adjusted p-value filters.

Parameters:

fc_threshold (float, default=1.0) – Absolute log2 fold-change threshold (inclusive).
p_threshold (float, default=0.05) – Adjusted p-value threshold (inclusive).

Returns:

Peaks where |fold_change| >= fc_threshold AND adjusted_p_value <= p_threshold.

Return type:

pd.DataFrame
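
The ranking and filtering rules of top_peaks and significant_peaks amount to ordinary pandas operations on the results table. The sketch below uses a hand-made table with the documented column names; the values are invented for illustration only.

```python
import pandas as pd

# Toy results table using the documented column names (values are made up).
results = pd.DataFrame({
    "mz_bin": [2100, 4300, 6800, 9500],
    "fold_change": [1.0, -1.8, 0.4, 2.2],
    "adjusted_p_value": [0.05, 0.001, 0.0005, 0.20],
})

# Both thresholds are inclusive: |fold_change| >= 1.0 AND adjusted_p_value <= 0.05.
mask = (results["fold_change"].abs() >= 1.0) & (results["adjusted_p_value"] <= 0.05)
significant = results[mask]

# Ranking by adjusted p-value, ascending, then keeping the first n rows.
top2 = results.sort_values("adjusted_p_value").head(2)
```

Bin 2100 sits exactly on both thresholds and is kept, bin 6800 has the lowest adjusted p-value but fails the fold-change filter, and bin 9500 has a large fold change but fails the p-value filter.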

static compare_drugs(analyses, fc_threshold=1.0, p_threshold=0.05)

Build a boolean significance matrix across multiple drug analyses.

Parameters:

analyses (dict[str, DifferentialAnalysis]) – Mapping from drug name to a fitted DifferentialAnalysis.
fc_threshold (float, default=1.0) – Absolute log2 fold-change threshold for significance.
p_threshold (float, default=0.05) – Adjusted p-value threshold for significance.

Returns:

Boolean matrix indexed by the union of significant m/z bins across all drugs; columns are drug names; True indicates the peak is significant for that drug.

Return type:

pd.DataFrame

Raises:

ValueError – If analyses is empty.
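
The shape of the returned matrix can be reproduced from plain results tables. The `significance_matrix` helper below is a hypothetical stand-in that takes results DataFrames rather than DifferentialAnalysis objects; it only illustrates the union-of-bins layout described above.

```python
import pandas as pd

def significance_matrix(results_by_drug, fc_threshold=1.0, p_threshold=0.05):
    """Boolean matrix: rows are the union of significant bins, columns are drugs."""
    if not results_by_drug:
        raise ValueError("at least one results table is required")
    significant = {}
    for drug, res in results_by_drug.items():
        mask = (res["fold_change"].abs() >= fc_threshold) & (
            res["adjusted_p_value"] <= p_threshold
        )
        significant[drug] = set(res.loc[mask, "mz_bin"])
    all_bins = sorted(set().union(*significant.values()))
    return pd.DataFrame(
        {drug: [b in bins for b in all_bins] for drug, bins in significant.items()},
        index=pd.Index(all_bins, name="mz_bin"),
    )

# Toy per-drug results (values are made up).
cro = pd.DataFrame({"mz_bin": [2100, 4300, 6800],
                    "fold_change": [1.5, -2.0, 0.2],
                    "adjusted_p_value": [0.01, 0.03, 0.001]})
mem = pd.DataFrame({"mz_bin": [2100, 4300, 6800],
                    "fold_change": [0.3, -1.2, 1.9],
                    "adjusted_p_value": [0.01, 0.04, 0.02]})

matrix = significance_matrix({"Ceftriaxone": cro, "Meropenem": mem})
```

Here bin 4300 is significant for both drugs, while 2100 and 6800 are each specific to one drug.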

class maldiamrkit.differential.StatisticalTest(value)

Bases: str, Enum

Supported statistical tests for run().

Variables:

mann_whitney (str) – Two-sided Mann-Whitney U test (non-parametric).
t_test (str) – Welch’s two-sample t-test (unequal variances).

mann_whitney = 'mann_whitney'

t_test = 't_test'
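
For orientation, the two tests correspond to standard SciPy routines. Whether the library calls SciPy in exactly this way is an assumption; the sketch only shows what each option computes for one bin, using simulated intensities.

```python
import numpy as np
from scipy import stats

# Simulated intensities for one bin (not real spectra).
rng = np.random.default_rng(0)
resistant = rng.normal(loc=1.0, scale=1.0, size=30)
susceptible = rng.normal(loc=0.0, scale=0.5, size=40)

# Two-sided Mann-Whitney U test: rank-based, no normality assumption.
u_stat, p_mw = stats.mannwhitneyu(resistant, susceptible, alternative="two-sided")

# Welch's t-test: compares means without assuming equal variances.
t_stat, p_welch = stats.ttest_ind(resistant, susceptible, equal_var=False)
```

Mann-Whitney is the more robust default for skewed intensity distributions; Welch's t-test can be more powerful when the per-group distributions are roughly normal.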

class maldiamrkit.differential.CorrectionMethod(value)

Bases: str, Enum

Supported multiple-testing corrections for run().

Variables:

fdr_bh (str) – Benjamini-Hochberg false discovery rate.
fdr_by (str) – Benjamini-Yekutieli false discovery rate.
bonferroni (str) – Bonferroni family-wise correction.

fdr_bh = 'fdr_bh'

fdr_by = 'fdr_by'

bonferroni = 'bonferroni'

Visualization

maldiamrkit.differential.plot_volcano(results, fc_threshold=1.0, p_threshold=0.05, *, ax=None, title=None, drug=None, figsize=(8, 6), annotate_top_k=None, grid=True, show=True)

Volcano plot of log2 fold change vs. -log10 adjusted p-value.

Points are coloured by direction and significance: grey for non-significant, red for up in resistant (fold_change > fc_threshold and adjusted_p_value <= p_threshold), blue for up in susceptible (fold_change < -fc_threshold and adjusted_p_value <= p_threshold). Horizontal and vertical dashed lines mark the thresholds and are referenced in the legend with their counts.

Parameters:

results (pd.DataFrame) – Output of DifferentialAnalysis.results. Must contain fold_change and adjusted_p_value columns.
fc_threshold (float, default=1.0) – Absolute log2 fold-change threshold (drawn as vertical dashed lines at ±fc_threshold).
p_threshold (float, default=0.05) – Adjusted p-value threshold (drawn as a horizontal dashed line at -log10(p_threshold)).
ax (Axes or None, default=None) – Pre-existing axes. If None, a new figure and axes are created.
title (str or None, default=None) – Plot title. Defaults to "Volcano plot"; if drug is given, the default becomes f"Volcano plot - {drug}".
drug (str or None, default=None) – Drug name appended to the default title. Ignored when title is explicitly provided.
figsize (tuple of float, default=(8, 6)) – Figure size in inches (only used when ax is None).
annotate_top_k (int, optional) – If given, label the k most significant peaks with their mz_bin value. Requires an mz_bin column in results.
grid (bool, default=True) – Draw a faint background grid.
show (bool, default=True) – Call plt.show() at the end.

Return type:

tuple[Figure, Axes]

Returns:

fig (matplotlib.figure.Figure)
ax (matplotlib.axes.Axes)
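
The colouring rule described for the volcano plot can be written out without any plotting code. This is a sketch of the classification only, on a made-up table; it follows the strict inequalities on fold_change stated above and does not reproduce the function's drawing logic.

```python
import numpy as np
import pandas as pd

# Toy results (values are made up).
results = pd.DataFrame({
    "fold_change": [2.0, -1.5, 0.3, 1.0],
    "adjusted_p_value": [0.001, 0.01, 0.0001, 0.01],
})
fc_threshold, p_threshold = 1.0, 0.05

sig = (results["adjusted_p_value"] <= p_threshold).to_numpy()
up_r = sig & (results["fold_change"] > fc_threshold).to_numpy()    # up in resistant
up_s = sig & (results["fold_change"] < -fc_threshold).to_numpy()   # up in susceptible

colour = np.select([up_r, up_s], ["red", "blue"], default="grey")
y = -np.log10(results["adjusted_p_value"])  # y-axis coordinate of each point
```

The last row has a fold change exactly equal to the threshold, so under the strict comparison it stays grey even though its adjusted p-value passes.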

maldiamrkit.differential.plot_manhattan(results, p_threshold=0.05, *, ax=None, title=None, drug=None, figsize=(12, 4), annotate_top_k=None, grid=True, show=True)

Manhattan plot along the m/z axis.

The x-axis shows the numeric m/z bin value and the y-axis shows -log10(adjusted_p_value). Points with adjusted_p_value <= p_threshold are highlighted in red, and the legend reports per-class counts.

Parameters:

results (pd.DataFrame) – Output of DifferentialAnalysis.results. Must contain mz_bin and adjusted_p_value columns. mz_bin values that cannot be coerced to float are excluded.
p_threshold (float, default=0.05) – Adjusted p-value threshold.
ax (Axes or None, default=None) – Pre-existing axes.
title (str or None, default=None) – Plot title. Defaults to "Manhattan plot"; if drug is given, the default becomes f"Manhattan plot - {drug}".
drug (str or None, default=None) – Drug name appended to the default title. Ignored when title is explicitly provided.
figsize (tuple of float, default=(12, 4)) – Figure size in inches.
annotate_top_k (int, optional) – If given, label the k most significant peaks with their mz_bin value.
grid (bool, default=True) – Draw a faint background grid.
show (bool, default=True) – Call plt.show() at the end.

Return type:

tuple[Figure, Axes]

Returns:

fig (matplotlib.figure.Figure)
ax (matplotlib.axes.Axes)

maldiamrkit.differential.plot_drug_comparison(comparison_df, *, kind=DrugComparisonKind.heatmap, ax=None, title=None, figsize=(10, 8), show=True)

Visualise a multi-drug differential-peak comparison matrix.

Parameters:

comparison_df (pd.DataFrame) – Boolean significance matrix from DifferentialAnalysis.compare_drugs(). Index = m/z bins, columns = drug names, values coerced to bool.
kind ({"heatmap", "upset"} or DrugComparisonKind, default="heatmap") –
Rendering style.
- "heatmap": compact binary heatmap of peaks x drugs. Drug labels show per-drug significant-peak counts.
- "upset": UpSet-style plot showing intersection counts across drug combinations.
ax (Axes or None, default=None) – Pre-existing axes (used only by kind="heatmap"; ignored for "upset" which needs its own composite figure).
title (str or None, default=None) – Plot title. Defaults to "Drug comparison".
figsize (tuple of float, default=(10, 8)) – Figure size in inches (only used when ax is None).
show (bool, default=True) – Call plt.show() at the end.

Return type:

tuple[Figure, Axes]

Returns:

fig (matplotlib.figure.Figure)
ax (matplotlib.axes.Axes) – For kind="upset", the returned Axes is the intersection-size bar chart; the drug-membership matrix is drawn on a second Axes inside the same Figure.
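
The quantities the two rendering styles summarise can be computed directly from the boolean matrix. The sketch below uses a made-up comparison table and plain pandas; it illustrates per-drug counts (heatmap labels) and intersection sizes (UpSet bars), not the plotting code itself.

```python
import pandas as pd

# Toy boolean significance matrix (values are made up).
comparison = pd.DataFrame(
    {
        "Ceftriaxone": [True, True, False, True, True],
        "Ceftazidime": [True, False, False, True, True],
        "Meropenem": [False, False, True, True, False],
    },
    index=pd.Index([2100, 4300, 6800, 9500, 11200], name="mz_bin"),
)

# Per-drug significant-peak counts, as shown in the heatmap's drug labels.
per_drug = comparison.sum()

# Intersection sizes: how many bins share each exact membership pattern.
sizes = comparison.groupby(list(comparison.columns)).size()
counts = {tuple(bool(v) for v in combo): int(n) for combo, n in sizes.items()}
```

Each key in `counts` is a (Ceftriaxone, Ceftazidime, Meropenem) membership pattern; for example, (True, True, False) counts bins significant for the two cephalosporins but not for meropenem.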

class maldiamrkit.differential.DrugComparisonKind(value)

Bases: str, Enum

Rendering kind for plot_drug_comparison().

Variables:

heatmap (str) – Boolean rows x drugs heatmap (compact, precise positions).
upset (str) – UpSet-style intersection plot: bar chart of intersection sizes plus a dot matrix of drug membership.

heatmap = 'heatmap'

upset = 'upset'

Example

from maldiamrkit.detection import MaldiPeakDetector
from maldiamrkit.differential import (
    DifferentialAnalysis,
    plot_volcano,
    plot_manhattan,
    plot_drug_comparison,
)

# `maldi_set` is assumed to be an existing MaldiSet with AMR labels loaded.

# Per-drug analysis: run Mann-Whitney + FDR-BH across all m/z bins
analysis = DifferentialAnalysis.from_maldi_set(
    maldi_set, antibiotic="Ceftriaxone"
).run(test="mann_whitney", correction="fdr_bh")

# On small datasets, narrow the hypothesis set before correction:
analysis = DifferentialAnalysis.from_maldi_set(
    maldi_set, antibiotic="Ceftriaxone"
).run(
    mz_ranges=[(2000, 5000), (9000, 12000)],
    peak_detector=MaldiPeakDetector(prominence=1e-4),
)

# Inspect the top 20 peaks by adjusted p-value
analysis.top_peaks(n=20)

# Significance filter: |log2FC| >= 1 and adjusted p-value <= 0.05
analysis.significant_peaks(fc_threshold=1.0, p_threshold=0.05)

# Volcano and Manhattan visualizations
plot_volcano(analysis.results, fc_threshold=1.0, p_threshold=0.05)
plot_manhattan(analysis.results, p_threshold=0.05)

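# Multi-drug comparison: which peaks are shared / unique across drugs?
# Build one fitted analysis per drug (assumes maldi_set has labels for each).
analyses = {
    drug: DifferentialAnalysis.from_maldi_set(maldi_set, antibiotic=drug).run()
    for drug in ("Ceftriaxone", "Ceftazidime", "Meropenem")
}
comparison = DifferentialAnalysis.compare_drugs(analyses)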
plot_drug_comparison(comparison, kind="heatmap")
plot_drug_comparison(comparison, kind="upset")