Drift Module
Temporal drift monitoring for MALDI-TOF spectra. The DriftMonitor
class anchors a baseline on the earliest timestamps and reports
three complementary views of drift over subsequent time windows:
reference similarity, PCA centroid trajectory, and peak-selection
stability (plus per-peak Cohen’s d tracking).
Monitor
- class maldiamrkit.drift.DriftMonitor(time_column, window='30D', baseline_end=None, metric='cosine', n_components=2, min_samples=5)
Bases: object
Monitor spectral drift over time using baseline-anchored metrics.
Establishes a baseline from the earliest timestamps, then quantifies drift for later time windows via three complementary views:
- reference similarity (distance to baseline median spectrum)
- PCA centroid trajectory (baseline-fitted PCA space)
- peak-selection stability (Jaccard overlap of top-k discriminative peaks per window vs. baseline) and Cohen’s d tracking of specific peaks
Output is data + plots only; no automated alerts.
- Parameters:
time_column (str) – Metadata column containing timestamps (parsed via pandas.to_datetime()).
window (str or pd.Timedelta, default="30D") – Time window size for pandas.Grouper (e.g. "30D", "7D").
baseline_end (str, pd.Timestamp, or None, default=None) – End of the baseline period (inclusive). If None, defaults to the timestamp at the 20th percentile of sorted timestamps.
metric (str, default="cosine") – Distance metric for reference similarity (see maldiamrkit.similarity.spectral_distance()).
n_components (int, default=2) – PCA components for PCA-drift monitoring.
min_samples (int, default=5) – Skip time windows with fewer spectra than this (and, for peak-stability / effect-size monitoring, fewer than this many samples in either the R or S class).
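
The interplay of baseline_end, window and min_samples can be illustrated with a dependency-free sketch. It is not the library's implementation: the helper names are hypothetical, the percentile is taken by flooring the rank (no interpolation), and windows are anchored at the first post-baseline timestamp, whereas pandas.Grouper may align bin edges differently.

```python
from datetime import datetime, timedelta


def baseline_cutoff(timestamps, quantile=0.20):
    """Timestamp at the given quantile of the sorted timestamps."""
    ordered = sorted(timestamps)
    idx = int(quantile * (len(ordered) - 1))  # assumption: floor the rank
    return ordered[idx]


def post_baseline_windows(timestamps, cutoff, window=timedelta(days=30), min_samples=5):
    """Bucket timestamps strictly after `cutoff`; drop windows that are too sparse."""
    later = sorted(t for t in timestamps if t > cutoff)
    if not later:
        return []
    origin = later[0]  # assumption: bins start at the first monitored timestamp
    counts = {}
    for t in later:
        k = (t - origin) // window  # timedelta // timedelta yields an int
        counts[k] = counts.get(k, 0) + 1
    return [
        (origin + k * window, origin + (k + 1) * window, n)
        for k, n in sorted(counts.items())
        if n >= min_samples
    ]


# Synthetic acquisition dates: one spectrum per day for 100 days.
start = datetime(2024, 1, 1)
dates = [start + timedelta(days=i) for i in range(100)]

cutoff = baseline_cutoff(dates)
windows = post_baseline_windows(dates, cutoff)
for w_start, w_end, n in windows:
    print(w_start.date(), w_end.date(), n)
```

With these synthetic dates the first 20 spectra form the baseline and the remaining 80 fall into three windows; lowering the sampling density or raising min_samples would drop the sparse ones.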
- __init__(time_column, window='30D', baseline_end=None, metric='cosine', n_components=2, min_samples=5)
- fit(maldi_set)
Establish the baseline reference and PCA space.
- Parameters:
maldi_set (MaldiSet) – Dataset exposing .X and .meta with time_column.
- Returns:
self, for chaining.
- Return type:
DriftMonitor
- monitor(maldi_set)
Reference-similarity timeseries.
Only spectra with timestamp > baseline_end_ are monitored; the baseline is reserved as a reference and excluded from windows to avoid a self-reference artefact in the first window.
Returns a DataFrame with columns window_start, window_end, n_spectra, distance_to_reference.
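
As a rough illustration of reference similarity with the default cosine metric, the sketch below builds a bin-wise median spectrum from baseline spectra and measures how far a later window sits from it. The function names are hypothetical, and averaging the per-spectrum distances within a window is an assumption; the library may aggregate differently.

```python
import math
from statistics import median


def median_spectrum(spectra):
    """Bin-wise median across equal-length intensity vectors."""
    return [median(column) for column in zip(*spectra)]


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def window_distance(window_spectra, reference):
    """Mean per-spectrum distance to the reference (aggregation is assumed)."""
    distances = [cosine_distance(s, reference) for s in window_spectra]
    return sum(distances) / len(distances)


# Toy three-bin spectra; real spectra have thousands of m/z bins.
baseline = [[1.0, 0.0, 2.0], [1.0, 0.2, 2.0], [1.2, 0.0, 1.8]]
reference = median_spectrum(baseline)

stable_window = [[2.0, 0.0, 4.0], [1.0, 0.0, 2.0]]   # same shape, different scale
drifted_window = [[1.0, 1.5, 0.5], [0.8, 1.7, 0.4]]  # intensity moved to the middle bin

d_stable = window_distance(stable_window, reference)
d_drifted = window_distance(drifted_window, reference)
print(reference, round(d_stable, 3), round(d_drifted, 3))
```

Cosine distance ignores overall intensity scale, so the rescaled window stays at the reference while the window with redistributed intensity moves away from it.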
- monitor_pca(maldi_set)
PCA centroid + dispersion timeseries.
Only post-baseline spectra (timestamp > baseline_end_) are included.
Returns a DataFrame with columns window_start, window_end, centroid_pc1, centroid_pc2, dispersion, n_spectra.
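
The centroid and dispersion columns can be pictured with a small NumPy sketch: fit PCA on baseline rows only, project a later window into that fixed space, then summarise where the window sits and how spread out it is. This is an assumed reconstruction using a plain SVD, with hypothetical helper names; dispersion is taken as the mean distance from the centroid, as described for the plotting function.

```python
import numpy as np


def fit_baseline_pca(X_base, n_components=2):
    """Return (mean, components) of a PCA fitted on baseline rows via SVD."""
    mean = X_base.mean(axis=0)
    _, _, vt = np.linalg.svd(X_base - mean, full_matrices=False)
    return mean, vt[:n_components]


def centroid_and_dispersion(X_win, mean, components):
    """Project a window into baseline PC space; summarise position and spread."""
    scores = (X_win - mean) @ components.T
    centroid = scores.mean(axis=0)
    dispersion = np.linalg.norm(scores - centroid, axis=1).mean()
    return centroid, dispersion


rng = np.random.default_rng(0)
X_base = rng.normal(size=(40, 6))  # synthetic baseline: 40 spectra, 6 bins
mean, comps = fit_baseline_pca(X_base)

# A later window whose spectra are shifted along the first baseline component.
X_win = rng.normal(size=(15, 6)) + 3.0 * comps[0]

c_base, _ = centroid_and_dispersion(X_base, mean, comps)
c_win, disp_win = centroid_and_dispersion(X_win, mean, comps)
print(np.round(c_base, 3), np.round(c_win, 3), round(float(disp_win), 3))
```

Because the PCA is fitted once on the baseline, the baseline centroid sits at the origin and any displacement of later centroids reflects a change relative to that fixed reference.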
- monitor_peak_stability(maldi_set, differential_analysis, antibiotic=None, n_top=20)
Peak-selection stability (Jaccard) timeseries.
differential_analysis must already have had .run() called; its top-n_top peaks define the baseline peak set. Only post-baseline spectra (timestamp > baseline_end_) are included in the monitored windows.
Returns a DataFrame with columns window_start, stability_score, n_spectra.
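
The stability score is a Jaccard overlap between two top-k peak sets. The sketch below shows that calculation on made-up per-peak scores; ranking by absolute score and the helper names are assumptions, not the library's selection procedure.

```python
def top_k_peaks(scores, k):
    """Names of the k peaks with the largest absolute score."""
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Intersection size over union size; 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical discriminative scores keyed by m/z bin (values are invented).
baseline_scores = {"2000": 1.9, "3500": -1.4, "4200": 1.1, "5100": 0.3, "6800": 0.1}
window_scores = {"2000": 1.7, "3500": -0.2, "4200": 1.3, "5100": 0.9, "6800": 0.1}

baseline_top = top_k_peaks(baseline_scores, k=3)
window_top = top_k_peaks(window_scores, k=3)
stability = jaccard(baseline_top, window_top)
print(sorted(baseline_top), sorted(window_top), stability)
```

Here one of the three baseline peaks is replaced in the later window, so two peaks are shared out of four distinct ones and the score is 0.5.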
Visualization
- maldiamrkit.drift.plot_reference_drift(monitoring_df, *, baseline_end=None, warning_threshold=None, ax=None, title=None, figsize=(10, 4), show=True)
Line plot of reference-similarity distance over time.
- Parameters:
monitoring_df (pd.DataFrame) – Output of DriftMonitor.monitor(). Must contain window_start and distance_to_reference columns.
baseline_end (pd.Timestamp or str, optional) – If given, draw a dashed vertical line at this timestamp so the reader can tell where the baseline period ends and monitoring begins.
warning_threshold (float, optional) – If given, draw a horizontal dashed line at this distance so windows exceeding the threshold are visually flagged.
ax (Axes or None, default=None) – Pre-existing axes.
title (str or None, default=None) – Plot title. Defaults to "Reference drift".
figsize (tuple of float, default=(10, 4)) – Figure size in inches (only used when ax is None).
show (bool, default=True) – Whether to call plt.show().
- Return type:
tuple[Figure, Axes]
- Returns:
fig (matplotlib.figure.Figure)
ax (matplotlib.axes.Axes)
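
Since the module produces data and plots but no automated alerts, the comparison that warning_threshold draws on the plot can also be made directly on the monitoring output. The sketch below uses plain dictionaries as a stand-in for the rows of monitoring_df (for example via the pandas call monitoring_df.to_dict("records")); the function name, the values and the 0.08 threshold are all illustrative assumptions.

```python
from datetime import date


def flag_windows(rows, threshold):
    """Return the window_start of every row whose distance exceeds `threshold`."""
    return [
        row["window_start"]
        for row in rows
        if row["distance_to_reference"] > threshold
    ]


# Stand-in rows carrying the two documented columns; the values are made up.
rows = [
    {"window_start": date(2024, 2, 1), "distance_to_reference": 0.03},
    {"window_start": date(2024, 3, 2), "distance_to_reference": 0.05},
    {"window_start": date(2024, 4, 1), "distance_to_reference": 0.12},
    {"window_start": date(2024, 5, 1), "distance_to_reference": 0.09},
]

flagged = flag_windows(rows, threshold=0.08)
print(flagged)
```

A suitable threshold depends on the metric and the instrument, so it is best chosen from the spread of distances observed during a period known to be stable.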
- maldiamrkit.drift.plot_pca_drift(pca_df, *, baseline_end=None, ax=None, title=None, figsize=(8, 6), show=True)
PCA centroid trajectory colored by time.
Consecutive windows are connected by a thin grey polyline so the reader can follow the temporal order; time direction is encoded by the colorbar (early → late). Marker size encodes per-window dispersion (mean distance from centroid) when the dispersion column is present.
- Parameters:
pca_df (pd.DataFrame) – Output of DriftMonitor.monitor_pca(). Must contain window_start, centroid_pc1, and centroid_pc2 columns; dispersion is used for marker sizing when available.
baseline_end (pd.Timestamp or str, optional) – If given, ring the first post-baseline point with a thicker black outline and annotate it "post-baseline start".
ax (Axes or None, default=None) – Pre-existing axes.
title (str or None, default=None) – Plot title. Defaults to "PCA centroid drift".
figsize (tuple of float, default=(8, 6)) – Figure size in inches (only used when ax is None).
show (bool, default=True) – Whether to call plt.show().
- Return type:
tuple[Figure, Axes]
- Returns:
fig (matplotlib.figure.Figure)
ax (matplotlib.axes.Axes)
- maldiamrkit.drift.plot_peak_stability(stability_df, *, drug=None, threshold=0.5, ax=None, title=None, figsize=(10, 4), show=True)
Line plot of peak-selection Jaccard stability over time.
- Parameters:
stability_df (pd.DataFrame) – Output of DriftMonitor.monitor_peak_stability(). Must contain window_start and stability_score columns.
drug (str, optional) – Drug name appended to the default title.
threshold (float or None, default=0.5) – Horizontal dashed line at this Jaccard value (conventional “still-stable” cut-off). Pass None to omit.
ax (Axes or None, default=None) – Pre-existing axes.
title (str or None, default=None) – Plot title. Defaults to "Peak stability" (optionally f"Peak stability - {drug}").
figsize (tuple of float, default=(10, 4)) – Figure size in inches.
show (bool, default=True) – Whether to call plt.show().
- Return type:
tuple[Figure, Axes]
- Returns:
fig (matplotlib.figure.Figure)
ax (matplotlib.axes.Axes)
- maldiamrkit.drift.plot_effect_size_drift(effect_df, peaks=None, *, drug=None, ax=None, title=None, figsize=(10, 4), legend_loc='best', reference_lines=True, show=True)
Multi-line plot of per-peak Cohen’s d over time.
- Parameters:
effect_df (pd.DataFrame) – Output of DriftMonitor.monitor_effect_sizes(). Must contain a window_start column plus one column per tracked peak.
peaks (list of str or None, default=None) – Subset of peak columns to plot. None plots every peak column present in effect_df.
drug (str, optional) – Drug name appended to the default title.
ax (Axes or None, default=None) – Pre-existing axes.
title (str or None, default=None) – Plot title. Defaults to "Effect size drift" (optionally f"Effect size drift - {drug}").
figsize (tuple of float, default=(10, 4)) – Figure size in inches.
legend_loc (str, default="best") – matplotlib legend location or "outside" to place the legend to the right of the axes (useful for many peaks).
reference_lines (bool, default=True) – Draw dashed guides at Cohen’s d = ±0.5 (medium effect) and ±0.8 (large effect).
show (bool, default=True) – Whether to call plt.show().
- Return type:
tuple[Figure, Axes]
- Returns:
fig (matplotlib.figure.Figure)
ax (matplotlib.axes.Axes)
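
For readers unfamiliar with the quantity being plotted, the sketch below computes Cohen's d for one peak in one window using the standard pooled-standard-deviation formula and reads it against the ±0.5 and ±0.8 guides. The sign convention (resistant minus susceptible), the helper names and the intensity values are assumptions for illustration only.

```python
import math
from statistics import mean, variance


def cohens_d(group_r, group_s):
    """Standardised mean difference with a pooled sample standard deviation."""
    n_r, n_s = len(group_r), len(group_s)
    pooled_var = (
        (n_r - 1) * variance(group_r) + (n_s - 1) * variance(group_s)
    ) / (n_r + n_s - 2)
    return (mean(group_r) - mean(group_s)) / math.sqrt(pooled_var)


def effect_label(d):
    """Read |d| against the 0.5 (medium) and 0.8 (large) reference guides."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    return "below medium"


# Invented intensities of a single peak in one time window.
resistant = [5.0, 6.0, 7.0, 6.0, 6.0]
susceptible = [4.0, 5.0, 6.0, 5.0, 5.0]

d = cohens_d(resistant, susceptible)
print(round(d, 3), effect_label(d))
```

Each class needs at least two samples for the variance to be defined, which is one reason windows with too few R or S spectra are skipped.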
Example
from maldiamrkit import MaldiSet
from maldiamrkit.differential import DifferentialAnalysis
from maldiamrkit.drift import (
    DriftMonitor,
    plot_reference_drift,
    plot_pca_drift,
    plot_peak_stability,
    plot_effect_size_drift,
)

# MaldiSet with an acquisition-date metadata column
data = MaldiSet.from_directory(
    "spectra/", "metadata.csv",
    aggregate_by=dict(antibiotics="Ceftriaxone"),
)

# Reference similarity + PCA drift (no labels required)
monitor = DriftMonitor(
    time_column="acquisition_date", window="30D",
).fit(data)
ref_df = monitor.monitor(data)
pca_df = monitor.monitor_pca(data)
plot_reference_drift(ref_df, title="Cosine distance to baseline median")
plot_pca_drift(pca_df, title="Centroid trajectory")

# Peak-selection stability + per-peak effect size drift
baseline_analysis = DifferentialAnalysis.from_maldi_set(
    data, antibiotic="Ceftriaxone"
).run()
stability_df = monitor.monitor_peak_stability(
    data, baseline_analysis, antibiotic="Ceftriaxone", n_top=20,
)
effect_df = monitor.monitor_effect_sizes(
    data,
    peaks=list(baseline_analysis.top_peaks(n=5)["mz_bin"]),
    antibiotic="Ceftriaxone",
)
plot_peak_stability(stability_df)
plot_effect_size_drift(effect_df)