Drift Module

Temporal drift monitoring for MALDI-TOF spectra. The DriftMonitor class anchors a baseline on the earliest timestamps and reports three complementary views of drift over subsequent time windows: reference similarity, PCA centroid trajectory, and peak-selection stability (plus per-peak Cohen’s d tracking).

Monitor

class maldiamrkit.drift.DriftMonitor(time_column, window='30D', baseline_end=None, metric='cosine', n_components=2, min_samples=5)

Bases: object

Monitor spectral drift over time using baseline-anchored metrics.

Establishes a baseline from the earliest timestamps, then quantifies drift for later time windows via three complementary views:

reference similarity (distance to baseline median spectrum)
PCA centroid trajectory (baseline-fitted PCA space)
peak-selection stability (Jaccard overlap of top-k discriminative peaks per window vs. baseline) and Cohen’s d tracking of specific peaks

Output is data + plots only; no automated alerts.

Parameters:

time_column (str) – Metadata column containing timestamps (parsed via pandas.to_datetime()).
window (str or pd.Timedelta, default="30D") – Time window size for pandas.Grouper (e.g. "30D", "7D").
baseline_end (str, pd.Timestamp, or None, default=None) – End of the baseline period (inclusive). If None, defaults to the timestamp at the 20th percentile of sorted timestamps.
metric (str, default="cosine") – Distance metric for reference similarity (see maldiamrkit.similarity.spectral_distance()).
n_components (int, default=2) – PCA components for PCA-drift monitoring.
min_samples (int, default=5) – Minimum number of spectra per time window; windows with fewer are skipped. For peak-stability and effect-size monitoring, a window is also skipped when either the R (resistant) or the S (susceptible) class has fewer than this many samples.

__init__(time_column, window='30D', baseline_end=None, metric='cosine', n_components=2, min_samples=5)

Parameters:

time_column (str)
window (str | Timedelta)
baseline_end (str | Timestamp | None)
metric (str)
n_components (int)
min_samples (int)

Return type:

None

fit(maldi_set)

Establish the baseline reference and PCA space.

Parameters:

maldi_set (MaldiSet) – Dataset exposing .X and .meta, where .meta contains the time_column column.

Returns:

self, for chaining.

Return type:

DriftMonitor

property reference_: ndarray

Baseline reference spectrum (read-only).

property baseline_end_: Timestamp

Timestamp used as the (inclusive) baseline cut-off.
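When baseline_end is None, the cut-off falls at the 20th percentile of the sorted timestamps. The sketch below illustrates that idea with the standard library only. It is not the library's implementation: the helper name default_baseline_end is hypothetical, and the nearest-rank quantile rule is an assumption (the library may interpolate differently).

```python
import math
from datetime import datetime


def default_baseline_end(timestamps, percent=20):
    """Hypothetical helper: timestamp at the given percentile (nearest-rank rule)."""
    ordered = sorted(timestamps)
    if not ordered:
        raise ValueError("at least one timestamp is required")
    # Nearest-rank: smallest 1-based rank covering `percent` of the samples.
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


# Ten daily acquisitions, 2024-01-01 through 2024-01-10.
stamps = [datetime(2024, 1, day) for day in range(1, 11)]
cut = default_baseline_end(stamps)

# The cut-off is inclusive for the baseline; monitoring uses strictly later spectra.
baseline = [t for t in stamps if t <= cut]
monitored = [t for t in stamps if t > cut]
```

With ten evenly spaced timestamps, the two earliest form the baseline and the remaining eight are left for monitoring.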

monitor(maldi_set)

Reference-similarity timeseries.

Only spectra with timestamp > baseline_end_ are monitored; the baseline is reserved as a reference and excluded from windows to avoid a self-reference artefact in the first window.

Returns a DataFrame with columns window_start, window_end, n_spectra, distance_to_reference.

Parameters:

maldi_set (MaldiSet)

Return type:

DataFrame
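To make distance_to_reference concrete, here is a minimal sketch of a cosine distance to a baseline median spectrum. The function names are hypothetical and the toy intensities are made up for illustration. How the library summarises a window (for example, median spectrum versus mean of per-spectrum distances) is not stated above, so the window median used here is an assumption.

```python
import math
from statistics import median


def median_spectrum(spectra):
    """Per-bin median across equal-length intensity vectors."""
    return [median(column) for column in zip(*spectra)]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for an all-zero spectrum")
    return 1.0 - dot / (norm_a * norm_b)


# Toy data: three bins per spectrum.
baseline = [[1.0, 0.0, 2.0], [1.0, 0.2, 2.0], [0.8, 0.0, 2.2]]
window = [[0.0, 1.0, 0.0], [0.0, 3.0, 0.0], [0.0, 2.0, 0.0]]

reference = median_spectrum(baseline)
# Scaling a spectrum does not change its cosine distance to the reference.
same_shape = cosine_distance(reference, [2.0, 0.0, 4.0])
# A window whose signal sits in a different bin is maximally distant here.
shifted = cosine_distance(reference, median_spectrum(window))
```

Cosine distance ignores overall intensity scale, so it responds to changes in spectral shape rather than to uniform gain changes.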

monitor_pca(maldi_set)

PCA centroid + dispersion timeseries.

Only post-baseline spectra (timestamp > baseline_end_) are included.

Returns a DataFrame with columns window_start, window_end, centroid_pc1, centroid_pc2, dispersion, n_spectra.

Parameters:

maldi_set (MaldiSet)

Return type:

DataFrame
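The per-window summary can be sketched as follows, assuming the spectra of one window have already been projected into the baseline-fitted PCA space. The helper name is hypothetical, and dispersion is taken as the mean Euclidean distance from the centroid, matching the description given for plot_pca_drift.

```python
import math


def centroid_and_dispersion(scores):
    """scores: list of (pc1, pc2) tuples for the spectra in one window."""
    n = len(scores)
    if n == 0:
        raise ValueError("window contains no spectra")
    centroid = (
        sum(point[0] for point in scores) / n,
        sum(point[1] for point in scores) / n,
    )
    # Mean Euclidean distance of each spectrum from the window centroid.
    dispersion = sum(math.dist(point, centroid) for point in scores) / n
    return centroid, dispersion


# Four hypothetical PC scores on the corners of a square.
window_scores = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
centroid, dispersion = centroid_and_dispersion(window_scores)
```

A centroid that moves away from the baseline region suggests a systematic shift, while growing dispersion suggests the spectra within a window are becoming more heterogeneous.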

monitor_peak_stability(maldi_set, differential_analysis, antibiotic=None, n_top=20)

Peak-selection stability (Jaccard) timeseries.

differential_analysis must already have been run (via .run()); its top n_top peaks define the baseline peak set. Only post-baseline spectra (timestamp > baseline_end_) are included in the monitored windows.

Returns a DataFrame with columns window_start, stability_score, n_spectra.

Parameters:

maldi_set (MaldiSet)
differential_analysis (DifferentialAnalysis)
antibiotic (str | None)
n_top (int)

Return type:

DataFrame
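The stability score is a Jaccard overlap between two peak sets. A minimal sketch, with a hypothetical helper name and made-up peak labels:

```python
def jaccard(baseline_peaks, window_peaks):
    """Size of the intersection divided by the size of the union."""
    a, b = set(baseline_peaks), set(window_peaks)
    if not a and not b:
        # Convention chosen for this sketch only: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical mz_bin labels for the baseline and for one later window.
baseline_top = ["2000", "2003", "4120", "6250"]
window_top = ["2003", "4120", "6250", "7100"]

score = jaccard(baseline_top, window_top)  # 3 shared peaks out of 5 distinct
```

A score of 1 means the window selects exactly the baseline peaks; a score of 0 means the two sets share none.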

monitor_effect_sizes(maldi_set, peaks, antibiotic=None)

Per-peak Cohen’s d timeseries.

Only post-baseline spectra (timestamp > baseline_end_) are included.

Returns a DataFrame with window_start plus one column per requested peak (the peak’s mz_bin label as a string).

Parameters:

maldi_set (MaldiSet)
peaks (list[str])
antibiotic (str | None)

Return type:

DataFrame
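For a single peak in a single window, Cohen's d compares the intensities of the two classes. The sketch below uses the pooled-standard-deviation form. The helper name, the toy intensities, and the sign convention (R minus S) are assumptions; the library may use a different variant.

```python
import math
from statistics import mean, variance


def cohens_d(group_r, group_s):
    """Cohen's d with a pooled standard deviation (R minus S, by assumption)."""
    n_r, n_s = len(group_r), len(group_s)
    if n_r < 2 or n_s < 2:
        raise ValueError("each class needs at least two samples")
    pooled_var = (
        (n_r - 1) * variance(group_r) + (n_s - 1) * variance(group_s)
    ) / (n_r + n_s - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(group_r) - mean(group_s)) / math.sqrt(pooled_var)


# Toy intensities of one peak in one window, split by class.
resistant = [2.0, 4.0, 6.0]
susceptible = [1.0, 3.0, 5.0]

d = cohens_d(resistant, susceptible)
```

With only a handful of samples per class the estimate is noisy, which is one reason to skip windows where either class falls below min_samples.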

Visualization

maldiamrkit.drift.plot_reference_drift(monitoring_df, *, baseline_end=None, warning_threshold=None, ax=None, title=None, figsize=(10, 4), show=True)

Line plot of reference-similarity distance over time.

Parameters:

monitoring_df (pd.DataFrame) – Output of DriftMonitor.monitor(). Must contain window_start and distance_to_reference columns.
baseline_end (pd.Timestamp or str, optional) – If given, draw a dashed vertical line at this timestamp so the reader can tell where the baseline period ends and monitoring begins.
warning_threshold (float, optional) – If given, draw a horizontal dashed line at this distance so windows exceeding the threshold are visually flagged.
ax (Axes or None, default=None) – Pre-existing axes.
title (str or None, default=None) – Plot title. Defaults to "Reference drift".
figsize (tuple of float, default=(10, 4)) – Figure size in inches (only used when ax is None).
show (bool, default=True) – Whether to call plt.show().

Return type:

tuple[Figure, Axes]

Returns:

fig (matplotlib.figure.Figure)
ax (matplotlib.axes.Axes)

maldiamrkit.drift.plot_pca_drift(pca_df, *, baseline_end=None, ax=None, title=None, figsize=(8, 6), show=True)

PCA centroid trajectory colored by time.

Consecutive windows are connected by a thin grey polyline so the reader can follow the temporal order; time direction is encoded by the colorbar (early → late). Marker size encodes per-window dispersion (mean distance from centroid) when the dispersion column is present.

Parameters:

pca_df (pd.DataFrame) – Output of DriftMonitor.monitor_pca(). Must contain window_start, centroid_pc1, and centroid_pc2 columns; dispersion is used for marker sizing when available.
baseline_end (pd.Timestamp or str, optional) – If given, ring the first post-baseline point with a thicker black outline and annotate it "post-baseline start".
ax (Axes or None, default=None) – Pre-existing axes.
title (str or None, default=None) – Plot title. Defaults to "PCA centroid drift".
figsize (tuple of float, default=(8, 6)) – Figure size in inches (only used when ax is None).
show (bool, default=True) – Whether to call plt.show().

Return type:

tuple[Figure, Axes]

Returns:

fig (matplotlib.figure.Figure)
ax (matplotlib.axes.Axes)

maldiamrkit.drift.plot_peak_stability(stability_df, *, drug=None, threshold=0.5, ax=None, title=None, figsize=(10, 4), show=True)

Line plot of peak-selection Jaccard stability over time.

Parameters:

stability_df (pd.DataFrame) – Output of DriftMonitor.monitor_peak_stability(). Must contain window_start and stability_score columns.
drug (str, optional) – Drug name appended to the default title.
threshold (float or None, default=0.5) – Horizontal dashed line at this Jaccard value (conventional “still-stable” cut-off). Pass None to omit.
ax (Axes or None, default=None) – Pre-existing axes.
title (str or None, default=None) – Plot title. Defaults to "Peak stability" (optionally f"Peak stability - {drug}").
figsize (tuple of float, default=(10, 4)) – Figure size in inches (only used when ax is None).
show (bool, default=True) – Whether to call plt.show().

Return type:

tuple[Figure, Axes]

Returns:

fig (matplotlib.figure.Figure)
ax (matplotlib.axes.Axes)

maldiamrkit.drift.plot_effect_size_drift(effect_df, peaks=None, *, drug=None, ax=None, title=None, figsize=(10, 4), legend_loc='best', reference_lines=True, show=True)

Multi-line plot of per-peak Cohen’s d over time.

Parameters:

effect_df (pd.DataFrame) – Output of DriftMonitor.monitor_effect_sizes(). Must contain a window_start column plus one column per tracked peak.
peaks (list of str or None, default=None) – Subset of peak columns to plot. None plots every peak column present in effect_df.
drug (str, optional) – Drug name appended to the default title.
ax (Axes or None, default=None) – Pre-existing axes.
title (str or None, default=None) – Plot title. Defaults to "Effect size drift" (optionally f"Effect size drift - {drug}").
figsize (tuple of float, default=(10, 4)) – Figure size in inches (only used when ax is None).
legend_loc (str, default="best") – matplotlib legend location or "outside" to place the legend to the right of the axes (useful for many peaks).
reference_lines (bool, default=True) – Draw dashed guides at Cohen’s d = ±0.5 (medium effect) and ±0.8 (large effect).
show (bool, default=True) – Whether to call plt.show().

Return type:

tuple[Figure, Axes]

Returns:

fig (matplotlib.figure.Figure)
ax (matplotlib.axes.Axes)

Example

from maldiamrkit import MaldiSet
from maldiamrkit.differential import DifferentialAnalysis
from maldiamrkit.drift import (
    DriftMonitor,
    plot_reference_drift,
    plot_pca_drift,
    plot_peak_stability,
    plot_effect_size_drift,
)

# MaldiSet with an acquisition-date metadata column
data = MaldiSet.from_directory(
    "spectra/", "metadata.csv",
    aggregate_by=dict(antibiotics="Ceftriaxone"),
)

# Reference similarity + PCA drift (no labels required)
monitor = DriftMonitor(
    time_column="acquisition_date", window="30D",
).fit(data)

ref_df = monitor.monitor(data)
pca_df = monitor.monitor_pca(data)

plot_reference_drift(ref_df, title="Cosine distance to baseline median")
plot_pca_drift(pca_df, title="Centroid trajectory")

# Peak-selection stability + per-peak effect size drift (both need R/S labels)
baseline_analysis = DifferentialAnalysis.from_maldi_set(
    data, antibiotic="Ceftriaxone"
).run()
stability_df = monitor.monitor_peak_stability(
    data, baseline_analysis, antibiotic="Ceftriaxone", n_top=20,
)
effect_df = monitor.monitor_effect_sizes(
    data,
    peaks=list(baseline_analysis.top_peaks(n=5)["mz_bin"]),
    antibiotic="Ceftriaxone",
)

plot_peak_stability(stability_df)
plot_effect_size_drift(effect_df)