Differential Module
Per-bin differential peak testing between resistant (R) and susceptible (S) groups, with multiple-testing correction, log2 fold change, Cohen’s d effect size, and AMR-aware visualizations.
Analysis
- class maldiamrkit.differential.DifferentialAnalysis(X, y)
Bases: object
Identify discriminative m/z peaks between resistant and susceptible groups.
Given a binned feature matrix and binary labels (0 = susceptible, 1 = resistant), the analysis iterates over each m/z bin and computes a statistical test, a log2 fold change, and Cohen’s d effect size comparing the two groups. Multiple-testing correction is applied across bins.
- Parameters:
X (pd.DataFrame) – Feature matrix of shape (n_samples, n_features). Column names are m/z bin identifiers (numeric or string).
y (pd.Series or ndarray) – Binary labels aligned with X rows: 0 = susceptible, 1 = resistant. Any sample with a missing / NaN label is dropped before analysis.
- Variables:
X (pd.DataFrame) – Feature matrix (possibly subset to rows with non-missing labels).
y (pd.Series) – Labels aligned with X.
results (pd.DataFrame or None) – Populated by run() with columns mz_bin, mean_r, mean_s, fold_change, p_value, adjusted_p_value, effect_size.
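As a rough illustration of the fold_change and effect_size columns, the sketch below computes a log2 fold change and Cohen's d for a single bin using only the standard library. It is not the library's implementation: the helper names, the eps guard against zero means, and the pooled-standard-deviation form of Cohen's d are assumptions made for this example. The sign convention (resistant over susceptible) follows the "up in resistant" wording used for positive fold changes.

```python
import math
from statistics import mean, variance


def log2_fold_change(resistant, susceptible, eps=1e-12):
    """log2(mean_R / mean_S); eps is an assumed guard against zero means."""
    return math.log2((mean(resistant) + eps) / (mean(susceptible) + eps))


def cohens_d(resistant, susceptible):
    """Mean difference divided by the pooled sample standard deviation."""
    n_r, n_s = len(resistant), len(susceptible)
    pooled_var = (
        (n_r - 1) * variance(resistant) + (n_s - 1) * variance(susceptible)
    ) / (n_r + n_s - 2)
    if pooled_var == 0:
        return float("nan")  # undefined when neither group varies
    return (mean(resistant) - mean(susceptible)) / math.sqrt(pooled_var)


# Hypothetical intensities for one m/z bin.
resistant = [2.0, 4.0, 6.0]
susceptible = [1.0, 2.0, 3.0]

fc = log2_fold_change(resistant, susceptible)
d = cohens_d(resistant, susceptible)
print(f"log2FC={fc:.3f}, Cohen's d={d:.3f}")
```

Each group needs at least two samples here, because statistics.variance raises an error on a single value.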
Examples
>>> analysis = DifferentialAnalysis(X, y).run()
>>> analysis.top_peaks(n=10)
>>> analysis.significant_peaks(fc_threshold=1.0, p_threshold=0.05)
- classmethod from_maldi_set(maldi_set, antibiotic=None)
Build a DifferentialAnalysis from a MaldiSet.
Extracts the feature matrix via maldi_set.X and labels via maldi_set.get_y_single(antibiotic).
- Parameters:
- Returns:
Unrun analysis (call run() next).
- Return type:
- run(test=StatisticalTest.mann_whitney, correction=CorrectionMethod.fdr_bh, mz_ranges=None, peak_detector=None)
Run per-bin statistical analysis.
For each kept column of X, splits samples by label, computes the requested test statistic and p-value, the log2 fold change of group means, and Cohen’s d. Multiple-testing correction is then applied across the kept bins and the result is stored in results.
Pre-test filters reduce the number of hypotheses. This is often decisive on small datasets, where correcting across a full scan of 1,000 to 10,000 bins can leave too little statistical power for any bin to survive FDR control.
- Parameters:
test ({"mann_whitney", "t_test"} or StatisticalTest) – Statistical test to apply per bin.
correction ({"fdr_bh", "fdr_by", "bonferroni"} or CorrectionMethod) – Multiple-testing correction.
mz_ranges (tuple, list of tuples, or None, default=None) – Restrict analysis to bins whose m/z value falls within the given range(s). Pass a single (low, high) tuple or a list of such tuples for a union of intervals. Endpoints are inclusive. Column labels are coerced to float for range comparison; non-numeric columns are excluded. None disables the filter.
peak_detector (MaldiPeakDetector or None, default=None) – Restrict analysis to bins that are peaks in at least one sample according to the provided detector. The detector’s fit_transform is run on the (range-filtered) feature matrix and any bin that is non-zero in at least one row is kept. None disables the filter.
- Returns:
self, for method chaining.
- Return type:
- Raises:
ValueError – If y does not contain both classes, or if the combined filters leave no bins to test.
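The effect of narrowing the hypothesis set before correction can be sketched in plain Python. The two helpers below are illustrative assumptions, not the library's code: select_bins mimics the documented mz_ranges behaviour (float coercion, inclusive endpoints, non-numeric labels excluded), and benjamini_hochberg is a textbook implementation of the fdr_bh adjustment. The p-values are made up for the example.

```python
def select_bins(columns, mz_ranges):
    """Keep columns whose float value lies in any inclusive (low, high) range."""
    if mz_ranges is None:
        return list(columns)
    # Accept a single (low, high) tuple as well as a list of tuples.
    if (
        isinstance(mz_ranges, tuple)
        and len(mz_ranges) == 2
        and not isinstance(mz_ranges[0], (tuple, list))
    ):
        mz_ranges = [mz_ranges]
    kept = []
    for col in columns:
        try:
            mz = float(col)
        except (TypeError, ValueError):
            continue  # non-numeric labels are excluded
        if any(low <= mz <= high for low, high in mz_ranges):
            kept.append(col)
    return kept


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values keyed by m/z bin label.
raw = {"2000": 0.004, "3500.5": 0.03, "7000": 0.6, "8000": 0.9, "12000": 0.02}

full_adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

kept = select_bins(raw, [(2000, 5000), (9000, 12000)])
filtered_adj = dict(zip(kept, benjamini_hochberg([raw[k] for k in kept])))

for mz_bin in kept:
    print(mz_bin, round(full_adj[mz_bin], 4), round(filtered_adj[mz_bin], 4))
```

With these made-up numbers, the bin labelled 3500.5 has an adjusted p-value of about 0.05 when all five bins are tested and about 0.03 when only the three bins inside the ranges are tested, which is the kind of gain the pre-test filters aim for.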
- property results: DataFrame
Per-bin results table.
- Returns:
Columns: mz_bin, mean_r, mean_s, fold_change, p_value, adjusted_p_value, effect_size.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If run() has not been called yet.
- top_peaks(n=20)
Return the top n peaks sorted by adjusted p-value ascending.
- Parameters:
n (int, default=20) – Number of peaks to return.
- Returns:
Sub-table with the n lowest adjusted p-values.
- Return type:
pd.DataFrame
- significant_peaks(fc_threshold=1.0, p_threshold=0.05)
Return peaks passing both fold-change and adjusted p-value filters.
- static compare_drugs(analyses, fc_threshold=1.0, p_threshold=0.05)
Build a boolean significance matrix across multiple drug analyses.
- Parameters:
analyses (dict[str, DifferentialAnalysis]) – Mapping from drug name to a fitted DifferentialAnalysis.
fc_threshold (float, default=1.0) – Absolute log2 fold-change threshold for significance.
p_threshold (float, default=0.05) – Adjusted p-value threshold for significance.
- Returns:
Boolean matrix indexed by the union of significant m/z bins across all drugs; columns are drug names; True indicates the peak is significant for that drug.
- Return type:
pd.DataFrame
- Raises:
ValueError – If analyses is empty.
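The shape of the returned matrix can be pictured with a small standard-library sketch. It is only an approximation under stated assumptions: significance_matrix is a hypothetical helper, the per-drug sets of significant bins are invented, and a nested dict stands in for the boolean DataFrame that compare_drugs() returns.

```python
def significance_matrix(significant_by_drug):
    """Map each m/z bin in the union to a {drug: bool} row."""
    if not significant_by_drug:
        raise ValueError("analyses is empty")
    # Rows: union of significant bins across all drugs, in ascending m/z order.
    all_bins = sorted(set().union(*significant_by_drug.values()))
    return {
        mz: {drug: mz in bins for drug, bins in significant_by_drug.items()}
        for mz in all_bins
    }


# Hypothetical sets of significant m/z bins per drug.
significant_by_drug = {
    "Ceftriaxone": {2500.0, 4300.0},
    "Ceftazidime": {4300.0},
    "Meropenem": {9800.0},
}

matrix = significance_matrix(significant_by_drug)
for mz, row in matrix.items():
    print(mz, row)
```

A bin appears as a row as soon as it is significant for at least one drug, so every row contains at least one True.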
- class maldiamrkit.differential.StatisticalTest(value)
Supported statistical tests for run().
- Variables:
- mann_whitney = 'mann_whitney'
- t_test = 't_test'
Visualization
- maldiamrkit.differential.plot_volcano(results, fc_threshold=1.0, p_threshold=0.05, *, ax=None, title=None, drug=None, figsize=(8, 6), annotate_top_k=None, grid=True, show=True)
Volcano plot of log2 fold change vs. -log10 adjusted p-value.
Points are coloured by direction and significance: grey for non-significant, red for up in resistant (fold_change > fc_threshold and adjusted_p_value <= p_threshold), blue for up in susceptible (fold_change < -fc_threshold and adjusted_p_value <= p_threshold). Horizontal and vertical dashed lines mark the thresholds and are referenced in the legend with their counts.
- Parameters:
results (pd.DataFrame) – Output of DifferentialAnalysis.results. Must contain fold_change and adjusted_p_value columns.
fc_threshold (float, default=1.0) – Absolute log2 fold-change threshold (drawn as vertical dashed lines at ±fc_threshold).
p_threshold (float, default=0.05) – Adjusted p-value threshold (drawn as a horizontal dashed line at -log10(p_threshold)).
ax (Axes or None, default=None) – Pre-existing axes. If None, a new figure and axes are created.
title (str or None, default=None) – Plot title. Defaults to "Volcano plot"; if drug is given, the default becomes f"Volcano plot - {drug}".
drug (str or None, default=None) – Drug name appended to the default title. Ignored when title is explicitly provided.
figsize (tuple of float, default=(8, 6)) – Figure size in inches (only used when ax is None).
annotate_top_k (int, optional) – If given, label the k most significant peaks with their mz_bin value. Requires an mz_bin column in results.
grid (bool, default=True) – Draw a faint background grid.
show (bool, default=True) – Call plt.show() at the end.
- Return type:
tuple[Figure, Axes]
- Returns:
fig (matplotlib.figure.Figure)
ax (matplotlib.axes.Axes)
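The colouring rule described for the volcano plot can be written out as a small classifier. This is a sketch of the documented thresholds only, not the plotting code itself; classify_peak, volcano_point and the category labels are hypothetical names chosen for the example.

```python
import math


def classify_peak(fold_change, adjusted_p, fc_threshold=1.0, p_threshold=0.05):
    """Assign a peak to one of the three colour groups described above."""
    significant = adjusted_p <= p_threshold
    if significant and fold_change > fc_threshold:
        return "up_in_resistant"  # drawn in red
    if significant and fold_change < -fc_threshold:
        return "up_in_susceptible"  # drawn in blue
    return "non_significant"  # drawn in grey


def volcano_point(fold_change, adjusted_p):
    """(x, y) coordinates: log2 fold change vs. -log10 adjusted p-value."""
    return fold_change, -math.log10(adjusted_p)


# Hypothetical (fold_change, adjusted_p_value) pairs.
peaks = [(1.5, 0.01), (-2.0, 0.05), (1.0, 0.001), (3.0, 0.2)]
labels = [classify_peak(fc, p) for fc, p in peaks]
print(labels)
```

Because the documented comparison on fold change is strict, a peak sitting exactly at fc_threshold stays grey in this sketch. An adjusted p-value of exactly 0 would need special handling before taking the logarithm.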
- maldiamrkit.differential.plot_manhattan(results, p_threshold=0.05, *, ax=None, title=None, drug=None, figsize=(12, 4), annotate_top_k=None, grid=True, show=True)
Manhattan plot along the m/z axis.
x-axis is the numeric m/z bin value; y-axis is -log10(adjusted_p_value). Points with adjusted_p_value <= p_threshold are highlighted in red, and the legend reports per-class counts.
- Parameters:
results (pd.DataFrame) – Output of DifferentialAnalysis.results. Must contain mz_bin and adjusted_p_value columns. mz_bin values that cannot be coerced to float are excluded.
p_threshold (float, default=0.05) – Adjusted p-value threshold.
ax (Axes or None, default=None) – Pre-existing axes.
title (str or None, default=None) – Plot title. Defaults to "Manhattan plot"; if drug is given, the default becomes f"Manhattan plot - {drug}".
drug (str or None, default=None) – Drug name appended to the default title. Ignored when title is explicitly provided.
figsize (tuple of float, default=(12, 4)) – Figure size in inches.
annotate_top_k (int, optional) – If given, label the k most significant peaks with their mz_bin value.
grid (bool, default=True) – Draw a faint background grid.
show (bool, default=True) – Call plt.show() at the end.
- Return type:
tuple[Figure, Axes]
- Returns:
fig (matplotlib.figure.Figure)
ax (matplotlib.axes.Axes)
- maldiamrkit.differential.plot_drug_comparison(comparison_df, *, kind=DrugComparisonKind.heatmap, ax=None, title=None, figsize=(10, 8), show=True)
Visualise a multi-drug differential-peak comparison matrix.
- Parameters:
comparison_df (pd.DataFrame) – Boolean significance matrix from DifferentialAnalysis.compare_drugs(). Index = m/z bins, columns = drug names, values coerced to bool.
kind ({"heatmap", "upset"} or DrugComparisonKind, default="heatmap") – Rendering style.
"heatmap": compact binary heatmap of peaks x drugs. Drug labels show per-drug significant-peak counts.
"upset": UpSet-style plot showing intersection counts across drug combinations.
ax (Axes or None, default=None) – Pre-existing axes (used only by kind="heatmap"; ignored for "upset", which needs its own composite figure).
title (str or None, default=None) – Plot title. Defaults to "Drug comparison".
figsize (tuple of float, default=(10, 8)) – Figure size in inches (only used when ax is None).
show (bool, default=True) – Call plt.show() at the end.
- Return type:
tuple[Figure, Axes]
- Returns:
fig (matplotlib.figure.Figure)
ax (matplotlib.axes.Axes) – For kind="upset", the returned Axes is the intersection-size bar chart; the drug-membership matrix is drawn on a second Axes inside the same Figure.
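To make the "intersection counts across drug combinations" concrete, the sketch below tallies how many m/z bins are significant for each exact combination of drugs, which is the quantity an UpSet-style bar chart displays. It is an illustration under assumptions: intersection_counts is a hypothetical helper, the matrix values are invented, and a nested dict stands in for the boolean comparison DataFrame.

```python
from collections import Counter


def intersection_counts(matrix):
    """Count bins per exact combination of drugs in which they are significant."""
    counts = Counter()
    for row in matrix.values():
        combo = tuple(sorted(drug for drug, flag in row.items() if flag))
        if combo:  # skip bins that are significant for no drug
            counts[combo] += 1
    return counts


# Hypothetical boolean matrix: m/z bin -> {drug: significant?}
matrix = {
    2500.0: {"Ceftriaxone": True, "Ceftazidime": False, "Meropenem": False},
    4300.0: {"Ceftriaxone": True, "Ceftazidime": True, "Meropenem": False},
    5100.0: {"Ceftriaxone": True, "Ceftazidime": True, "Meropenem": False},
    9800.0: {"Ceftriaxone": False, "Ceftazidime": False, "Meropenem": True},
}

counts = intersection_counts(matrix)
for combo, n in counts.most_common():
    print(" & ".join(combo), n)
```

Each bin is counted once, under the full set of drugs it is significant for, so the counts sum to the number of rows with at least one True.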
- class maldiamrkit.differential.DrugComparisonKind(value)
Rendering kind for plot_drug_comparison().
- Variables:
- heatmap = 'heatmap'
- upset = 'upset'
Example
from maldiamrkit.detection import MaldiPeakDetector
from maldiamrkit.differential import (
    DifferentialAnalysis,
    plot_volcano,
    plot_manhattan,
    plot_drug_comparison,
)

# maldi_set is an existing MaldiSet.
# Per-drug analysis: run Mann-Whitney + FDR-BH across all m/z bins
analysis = DifferentialAnalysis.from_maldi_set(
    maldi_set, antibiotic="Ceftriaxone"
).run(test="mann_whitney", correction="fdr_bh")

# On small datasets, narrow the hypothesis set before correction:
analysis = DifferentialAnalysis.from_maldi_set(
    maldi_set, antibiotic="Ceftriaxone"
).run(
    mz_ranges=[(2000, 5000), (9000, 12000)],
    peak_detector=MaldiPeakDetector(prominence=1e-4),
)

# Inspect the top 20 peaks by adjusted p-value
analysis.top_peaks(n=20)

# Significance filter: |log2FC| >= 1 and adjusted p-value <= 0.05
analysis.significant_peaks(fc_threshold=1.0, p_threshold=0.05)

# Volcano and Manhattan visualizations
plot_volcano(analysis.results, fc_threshold=1.0, p_threshold=0.05)
plot_manhattan(analysis.results, p_threshold=0.05)

# Multi-drug comparison: which peaks are shared / unique across drugs?
# analysis_cro, analysis_caz and analysis_mem are per-drug analyses
# on which run() has already been called, as above.
comparison = DifferentialAnalysis.compare_drugs({
    "Ceftriaxone": analysis_cro,
    "Ceftazidime": analysis_caz,
    "Meropenem": analysis_mem,
})
plot_drug_comparison(comparison, kind="heatmap")
plot_drug_comparison(comparison, kind="upset")