MaldiAMRKit Documentation
A Python toolkit for preprocessing MALDI-TOF mass spectrometry data for antimicrobial resistance (AMR) prediction. It provides scikit-learn compatible transformers that integrate directly into machine learning pipelines.
Key Features
Composable transformers (smoothing, baseline correction, trimming, normalization), multiple binning strategies, and peak detection. Serializable to JSON/YAML.
Scikit-learn compatible transformers. Drop into any Pipeline, cross_val_score, or GridSearchCV workflow.
Shift, linear, piecewise, and DTW warping for both binned and raw full-resolution spectra.
Very major error (VME), major error (ME), sensitivity, specificity, and classification reports following EUCAST conventions. Species-drug stratified and case-based splitting to prevent data leakage.
Build and load DRIAMS-like dataset directories from raw spectra and metadata with year-based subfolders and custom processing handlers.
SpeciesFilter, DrugFilter, QualityFilter, and MetadataFilter, combinable with the &, |, and ~ operators.
PCA, t-SNE, and UMAP scatter plots colored by species, resistance phenotype, or any metadata column.
maldiamrkit preprocess, maldiamrkit quality, and maldiamrkit build for batch processing. Export to CSV/TXT.
Multi-site and multi-instrument batch-effect correction via combatlearn.
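The filter feature above mentions combining filters with the &, |, and ~ operators. The sketch below illustrates how that kind of composition can be built in plain Python through operator overloading. It is not MaldiAMRKit's implementation: `RowFilter`, `species_is`, `min_quality`, and the metadata keys `species` and `quality` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

Row = Mapping[str, Any]


@dataclass(frozen=True)
class RowFilter:
    """Hypothetical composable predicate over one metadata row (not the library's class)."""

    predicate: Callable[[Row], bool]

    def __call__(self, row: Row) -> bool:
        return self.predicate(row)

    def __and__(self, other: "RowFilter") -> "RowFilter":
        # a & b keeps rows accepted by both filters
        return RowFilter(lambda row: self(row) and other(row))

    def __or__(self, other: "RowFilter") -> "RowFilter":
        # a | b keeps rows accepted by at least one filter
        return RowFilter(lambda row: self(row) or other(row))

    def __invert__(self) -> "RowFilter":
        # ~a keeps rows rejected by the filter
        return RowFilter(lambda row: not self(row))


def species_is(name: str) -> RowFilter:
    """Hypothetical helper: accept rows whose 'species' equals `name`."""
    return RowFilter(lambda row: row.get("species") == name)


def min_quality(threshold: float) -> RowFilter:
    """Hypothetical helper: accept rows whose 'quality' is at least `threshold`."""
    return RowFilter(lambda row: row.get("quality", 0.0) >= threshold)


# Toy metadata rows (made up for this sketch)
rows = [
    {"id": "a", "species": "Escherichia coli", "quality": 0.9},
    {"id": "b", "species": "Escherichia coli", "quality": 0.2},
    {"id": "c", "species": "Klebsiella pneumoniae", "quality": 0.8},
]

good_ecoli = species_is("Escherichia coli") & min_quality(0.5)
other_or_excellent = ~species_is("Escherichia coli") | min_quality(0.85)

good_ecoli_ids = [row["id"] for row in rows if good_ecoli(row)]
other_or_excellent_ids = [row["id"] for row in rows if other_or_excellent(row)]
print(good_ecoli_ids, other_or_excellent_ids)
```

Because Python gives ~ higher precedence than &, and & higher precedence than |, compound expressions read the way they would in boolean algebra; parentheses are only needed to override that order.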
Quick Example
from maldiamrkit import MaldiSpectrum, MaldiSet
from maldiamrkit.alignment import Warping
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Load dataset (with parallel loading)
data = MaldiSet.from_directory(
    "spectra/", "metadata.csv",
    aggregate_by=dict(antibiotics="Ceftriaxone"),
    n_jobs=-1  # Use all cores
)

# Create pipeline (with parallel warping)
pipe = Pipeline([
    ("warp", Warping(method="shift", n_jobs=-1)),
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier())
])

# Train and evaluate
pipe.fit(data.X, data.get_y_single())
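Once a fitted pipeline produces predictions, they can be summarized with the error rates listed under Key Features. The following is a minimal standalone sketch, not the toolkit's own evaluation API. It assumes labels coded as 1 = resistant and 0 = susceptible, and uses the common susceptibility-testing definitions: VME is the fraction of truly resistant isolates predicted susceptible, and ME is the fraction of truly susceptible isolates predicted resistant. The function name `amr_error_rates` and the toy labels are hypothetical.

```python
def amr_error_rates(y_true, y_pred, resistant=1):
    """Return VME, ME, sensitivity, and specificity, with `resistant` as the positive class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == resistant:
            if pred == resistant:
                tp += 1
            else:
                fn += 1  # resistant isolate called susceptible: very major error
        else:
            if pred == resistant:
                fp += 1  # susceptible isolate called resistant: major error
            else:
                tn += 1

    n_resistant = tp + fn
    n_susceptible = tn + fp
    nan = float("nan")  # returned when a class is absent from y_true
    return {
        "vme": fn / n_resistant if n_resistant else nan,
        "me": fp / n_susceptible if n_susceptible else nan,
        "sensitivity": tp / n_resistant if n_resistant else nan,
        "specificity": tn / n_susceptible if n_susceptible else nan,
    }


# Toy labels (made up for this sketch): 1 = resistant, 0 = susceptible
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

rates = amr_error_rates(y_true, y_pred)
print(rates)
```

With these definitions VME equals 1 minus sensitivity and ME equals 1 minus specificity, so reporting all four is redundant but matches how results are usually tabulated. In practice `y_pred` should come from held-out data, for example predictions on a test split, rather than from the samples used for fitting.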