Evaluation Module
AMR-specific evaluation metrics and stratified splitting utilities, following EUCAST conventions.
Note
LabelEncoder and IntermediateHandling moved to the Susceptibility module (maldiamrkit.susceptibility) in v0.15. Importing them from maldiamrkit.evaluation still works but emits a DeprecationWarning; this compatibility import will be removed in v0.17.
Metrics
- maldiamrkit.evaluation.very_major_error_rate(y_true, y_pred, resistant_label=1)
Very Major Error rate: resistant isolates classified as susceptible.
VME = FN / (FN + TP), i.e., the miss rate for resistant samples. This is the most dangerous error type in clinical microbiology, because a resistant infection may then be treated with an antibiotic that will not work.
- Parameters:
y_true (array-like) – True labels.
y_pred (array-like) – Predicted labels.
resistant_label (int, default=1) – Label value representing the resistant class.
- Returns:
VME rate in [0, 1]. Returns 0.0 if no resistant samples exist.
- Return type:
float
Examples
>>> very_major_error_rate([1, 1, 0, 0], [0, 1, 0, 0])
0.5
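As a cross-check on the formula above, here is a minimal NumPy sketch written independently of maldiamrkit. vme_sketch is a hypothetical name, and binary labels are assumed: any prediction other than resistant_label counts as susceptible.

```python
import numpy as np


def vme_sketch(y_true, y_pred, resistant_label=1):
    """Hypothetical re-implementation of VME = FN / (FN + TP)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    resistant = y_true == resistant_label      # truly resistant isolates (FN + TP)
    if not resistant.any():
        return 0.0                             # no resistant samples: documented edge case
    # FN: resistant isolates not predicted as resistant
    fn = np.sum(resistant & (y_pred != resistant_label))
    return float(fn / resistant.sum())


print(vme_sketch([1, 1, 0, 0], [0, 1, 0, 0]))  # 0.5, matching the doctest example
```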
- maldiamrkit.evaluation.major_error_rate(y_true, y_pred, resistant_label=1)
Major Error rate: susceptible isolates classified as resistant.
ME = FP / (FP + TN), i.e., the false alarm rate for susceptible samples.
- Parameters:
y_true (array-like) – True labels.
y_pred (array-like) – Predicted labels.
resistant_label (int, default=1) – Label value representing the resistant class.
- Returns:
ME rate in [0, 1]. Returns 0.0 if no susceptible samples exist.
- Return type:
float
Examples
>>> major_error_rate([1, 1, 0, 0], [1, 1, 1, 0])
0.5
- maldiamrkit.evaluation.sensitivity_score(y_true, y_pred, resistant_label=1)
Sensitivity (recall) for the resistant class.
Sensitivity = TP / (TP + FN) = 1 - VME.
- maldiamrkit.evaluation.specificity_score(y_true, y_pred, resistant_label=1)
Specificity (true negative rate) for the susceptible class.
Specificity = TN / (TN + FP) = 1 - ME.
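Both identities follow from the 2×2 confusion matrix. The sketch below uses scikit-learn's confusion_matrix on made-up labels (not maldiamrkit) and checks them numerically, assuming 0 = susceptible and 1 = resistant.

```python
import math

from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 0, 1, 0]   # 1 = resistant, 0 = susceptible
y_pred = [1, 0, 0, 1, 1, 0]

# labels=[0, 1] fixes the layout [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

vme = fn / (fn + tp)
me = fp / (fp + tn)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

# Floating-point tolerant checks of Sensitivity = 1 - VME and Specificity = 1 - ME
assert math.isclose(sensitivity, 1 - vme)
assert math.isclose(specificity, 1 - me)
print(f"VME={vme:.3f} sensitivity={sensitivity:.3f} ME={me:.3f} specificity={specificity:.3f}")
```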
- maldiamrkit.evaluation.categorical_agreement(y_true, y_pred)
Categorical agreement (accuracy) as reported in AST studies.
CA = (TP + TN) / N.
- Parameters:
y_true (array-like) – True labels.
y_pred (array-like) – Predicted labels.
- Returns:
Agreement rate in [0, 1].
- Return type:
float
- maldiamrkit.evaluation.vme_me_curve(y_true, y_score, resistant_label=1)
VME and ME rates at varying decision thresholds.
Useful for selecting an optimal threshold balancing VME against ME.
- Parameters:
y_true (array-like) – True binary labels.
y_score (array-like) – Predicted scores (e.g., probabilities for the resistant class).
resistant_label (int, default=1) – Label value representing the resistant class.
- Returns:
vme_rates (np.ndarray) – VME rates at each threshold.
me_rates (np.ndarray) – ME rates at each threshold.
thresholds (np.ndarray) – Decision thresholds (sorted ascending).
- Return type:
tuple of np.ndarray
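The sweep can be sketched as follows. This is a hypothetical re-implementation, not the library's code: it assumes an isolate is called resistant when its score is at or above the threshold, and it uses each distinct score as a candidate threshold.

```python
import numpy as np


def vme_me_curve_sketch(y_true, y_score, resistant_label=1):
    """Hypothetical sweep: an isolate is called resistant when score >= threshold."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    resistant = y_true == resistant_label
    susceptible = ~resistant
    n_r = max(int(resistant.sum()), 1)      # guard against division by zero
    n_s = max(int(susceptible.sum()), 1)
    thresholds = np.unique(y_score)          # distinct scores, sorted ascending
    vme_rates, me_rates = [], []
    for t in thresholds:
        called_r = y_score >= t
        # VME: resistant isolates scored below t (missed resistance)
        vme_rates.append(np.sum(resistant & ~called_r) / n_r)
        # ME: susceptible isolates scored at or above t (false alarms)
        me_rates.append(np.sum(susceptible & called_r) / n_s)
    return np.array(vme_rates), np.array(me_rates), thresholds


vme, me, thr = vme_me_curve_sketch([1, 1, 0, 0], [0.9, 0.4, 0.35, 0.1])
for t, v, m in zip(thr, vme, me):
    print(f"threshold={t:.2f}  VME={v:.2f}  ME={m:.2f}")
```

In this toy case a threshold of 0.4 gives zero VME and zero ME. On real data, raising the threshold lowers ME at the cost of higher VME, so the choice reflects how many major errors are acceptable to keep very major errors down.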
- maldiamrkit.evaluation.amr_classification_report(y_true, y_pred, resistant_label=1)
Full AMR classification report.
Returns all clinical metrics in a single dictionary.
- Parameters:
y_true (array-like) – True labels.
y_pred (array-like) – Predicted labels.
resistant_label (int, default=1) – Label value representing the resistant class.
- Returns:
Dictionary with keys: vme, me, sensitivity, specificity, categorical_agreement, n_resistant, n_susceptible, n_total.
- Return type:
dict
Examples
>>> report = amr_classification_report([1, 1, 0, 0], [1, 0, 0, 1])
>>> report["vme"]
0.5
- maldiamrkit.evaluation.amr_multilabel_report(y_true, y_pred, *, resistant_label=1, as_dataframe=False)
AMR classification report for multiple antibiotics.
Computes per-drug VME, ME, sensitivity, specificity, and categorical agreement, plus a macro-average across all drugs.
- Parameters:
y_true (pd.DataFrame) – True binary labels with one column per antibiotic.
y_pred (pd.DataFrame) – Predicted binary labels with matching columns.
resistant_label (int, default=1) – Label value representing the resistant class.
as_dataframe (bool, default=False) – If True, return a DataFrame instead of a nested dict.
- Returns:
Per-drug metrics plus a "macro_avg" entry. When as_dataframe is True, rows are the drugs plus "macro_avg" and columns are metric names.
- Return type:
dict or pd.DataFrame
Examples
>>> report = amr_multilabel_report(y_true, y_pred, as_dataframe=True)
>>> report.loc["macro_avg", "vme"]
0.15
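Macro-averaging here means an unweighted mean of each metric over drugs. The pandas sketch below shows that aggregation for VME and ME only, under two assumptions: missing labels (e.g., intermediates encoded as NaN) are skipped per drug, and a drug with no resistant (or susceptible) isolates scores 0.0, mirroring the single-drug functions. The function name and the drug columns are hypothetical.

```python
import pandas as pd


def macro_vme_me_sketch(y_true: pd.DataFrame, y_pred: pd.DataFrame, resistant_label=1):
    """Hypothetical per-drug VME/ME table with an unweighted macro-average row."""
    rows = {}
    for drug in y_true.columns:
        labelled = y_true[drug].notna()              # skip missing labels (assumption)
        t = y_true.loc[labelled, drug]
        p = y_pred.loc[labelled, drug]
        r = t == resistant_label
        s = ~r
        rows[drug] = {
            "vme": float((r & (p != resistant_label)).sum() / r.sum()) if r.any() else 0.0,
            "me": float((s & (p == resistant_label)).sum() / s.sum()) if s.any() else 0.0,
        }
    report = pd.DataFrame.from_dict(rows, orient="index")
    report.loc["macro_avg"] = report.mean()          # unweighted mean over drugs
    return report


y_true = pd.DataFrame({"CIP": [1, 1, 0, 0], "AMP": [1, 0, 0, 0]})
y_pred = pd.DataFrame({"CIP": [0, 1, 0, 0], "AMP": [1, 0, 1, 0]})
print(macro_vme_me_sketch(y_true, y_pred))
```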
- maldiamrkit.evaluation.mic_regression_report(y_true, y_pred, *, breakpoints=None, species=None, drug=None, sample_weight=None)
Compute MIC regression metrics on log2-MIC predictions.
- Parameters:
y_true (array-like) – True log2(MIC) values.
y_pred (array-like) – Predicted log2(MIC) values.
breakpoints (BreakpointTable or None, default=None) – When provided, the report also includes categorical agreement after re-binning both y_true and y_pred to S/I/R. Requires species and drug.
species (str or array-like, optional) – Species per sample (or a single species applied to all). Required when breakpoints is provided.
drug (str or array-like, optional) – Drug per sample (or a single drug applied to all). Required when breakpoints is provided.
sample_weight (array-like, optional) – Per-sample weights for the regression metrics. Ignored for categorical agreement.
- Returns:
Keys: n, rmse_log2, mae_log2, bias_log2, essential_agreement (fraction within ±1 dilution), and, when breakpoints are provided, also categorical_agreement, very_major_error_rate (R predicted as S), major_error_rate (S predicted as R), and per-category sample counts.
- Return type:
dict
Notes
“Essential agreement” is the standard clinical benchmark for MIC prediction accuracy: a prediction is in essential agreement if it lies within one log2 dilution of the true value.
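The essential-agreement criterion reduces to one comparison on the log2 scale. A minimal sketch follows; whether the library counts a difference of exactly one dilution as agreement is an assumption here (the sketch uses an inclusive ≤ 1).

```python
import numpy as np


def essential_agreement_sketch(y_true_log2, y_pred_log2):
    """Fraction of predictions within +/-1 log2 dilution of the true MIC (inclusive)."""
    diff = np.asarray(y_pred_log2, dtype=float) - np.asarray(y_true_log2, dtype=float)
    return float(np.mean(np.abs(diff) <= 1.0))


true_log2 = np.log2([0.5, 1, 2, 4])         # MICs in mg/L -> [-1, 0, 1, 2]
pred_log2 = np.array([-1.4, 1.8, 0.2, 2.0])
# 0.75: only the second prediction is off by more than one dilution
print(essential_agreement_sketch(true_log2, pred_log2))
```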
Sklearn Scorers
Pre-built scorers for use with cross_val_score or GridSearchCV:
- maldiamrkit.evaluation.vme_scorer
Scorer that minimizes VME (Very Major Error rate). Use with cross_val_score(pipe, X, y, scoring=vme_scorer).
- maldiamrkit.evaluation.me_scorer
Scorer that minimizes ME (Major Error rate). Use with cross_val_score(pipe, X, y, scoring=me_scorer).
Metrics Example
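from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from maldiamrkit.evaluation import (
    very_major_error_rate, major_error_rate,
    amr_classification_report, vme_scorer,
)
# Toy labels: 1 = resistant, 0 = susceptible
y_true = [1, 1, 0, 0]
y_pred = [0, 1, 0, 0]
# Individual metrics
vme = very_major_error_rate(y_true, y_pred)
me = major_error_rate(y_true, y_pred)
# Full report
report = amr_classification_report(y_true, y_pred)
# Use scorer in cross-validation (synthetic data stands in for spectra)
X, y = make_classification(n_samples=200, random_state=0)
pipe = LogisticRegression(max_iter=1000)
scores = cross_val_score(pipe, X, y, cv=5, scoring=vme_scorer)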
Splitting Utilities
- maldiamrkit.evaluation.stratified_species_drug_split(X, y, species, test_size=0.2, random_state=None, min_count=2)
Stratified train/test split preserving species-drug label distributions.
- Parameters:
X (pd.DataFrame or np.ndarray) – Feature matrix.
y (array-like) – Resistance labels.
species (array-like) – Species labels aligned with X.
test_size (float, default=0.2) – Fraction of samples for the test set.
random_state (int or None, default=None) – Random seed for reproducibility.
min_count (int, default=2) – Minimum samples per species-drug stratum. Smaller groups are merged.
- Returns:
X_train, X_test, y_train, y_test – Split data.
- Return type:
arrays
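One plausible way to implement this split is to build a composite stratum key and pass it to scikit-learn's train_test_split. The sketch assumes y holds labels for a single drug (so each stratum is species × label) and that rare strata are pooled into one shared "__rare__" stratum; the library's actual merging rule is not documented here. If the pooled stratum still has a single member, train_test_split raises a ValueError.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def species_label_split_sketch(X, y, species, test_size=0.2, random_state=None, min_count=2):
    """Hypothetical split stratified on a composite 'species|label' key."""
    keys = np.array([f"{s}|{label}" for s, label in zip(species, y)])
    counts = pd.Series(keys).map(pd.Series(keys).value_counts()).to_numpy()
    # Pool strata smaller than min_count into one shared stratum (assumed merge rule)
    strata = np.where(counts < min_count, "__rare__", keys)
    return train_test_split(X, y, test_size=test_size,
                            random_state=random_state, stratify=strata)


species = ["A"] * 20 + ["B"] * 20 + ["C"] * 2   # hypothetical species labels
y = [0, 1] * 21                                 # one resistance label per sample
X = np.arange(42).reshape(-1, 1)
X_train, X_test, y_train, y_test = species_label_split_sketch(X, y, species, random_state=0)
print(len(X_train), len(X_test))   # 33 9
```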
- maldiamrkit.evaluation.case_based_split(X, y, case_ids, test_size=0.2, random_state=None)
Train/test split keeping all samples from the same patient together.
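Prevents the data leakage that occurs when samples from the same patient appear in both the train and test sets.
- Parameters:
X (pd.DataFrame or np.ndarray) – Feature matrix.
y (array-like) – Resistance labels.
case_ids (array-like) – Case/patient identifiers aligned with X.
test_size (float, default=0.2) – Fraction of the data for the test set.
random_state (int or None, default=None) – Random seed for reproducibility.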
- Returns:
X_train, X_test, y_train, y_test – Split data.
- Return type:
arrays
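A comparable split can be sketched with scikit-learn's GroupShuffleSplit, which holds out whole groups. This is an illustration, not the library's implementation; note that GroupShuffleSplit interprets test_size as a fraction of groups, so the number of test samples varies with group sizes.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit


def case_split_sketch(X, y, case_ids, test_size=0.2, random_state=None):
    """Hypothetical patient-grouped split: whole cases go to train or to test."""
    X, y = np.asarray(X), np.asarray(y)
    # test_size is a fraction of *groups* (cases) for GroupShuffleSplit
    gss = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=random_state)
    train_idx, test_idx = next(gss.split(X, y, groups=case_ids))
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


case_ids = np.repeat(np.arange(10), 2)      # 10 hypothetical patients, 2 spectra each
X = np.arange(20).reshape(-1, 1)            # row value == row index, for easy checking
y = np.tile([0, 1], 10)
X_train, X_test, y_train, y_test = case_split_sketch(X, y, case_ids, random_state=0)
train_cases = set(case_ids[X_train.ravel()])
test_cases = set(case_ids[X_test.ravel()])
print(train_cases & test_cases)   # set(): no patient on both sides
```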
- class maldiamrkit.evaluation.SpeciesDrugStratifiedKFold(n_splits=5, shuffle=True, random_state=None, min_count=2)
Bases: object
K-fold cross-validation with species-drug stratification.
Ensures each fold preserves the distribution of species-drug combinations. Implements the sklearn splitter interface.
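- Parameters:
n_splits (int, default=5) – Number of folds.
shuffle (bool, default=True) – Whether to shuffle samples before splitting into folds.
random_state (int or None, default=None) – Random seed used when shuffle is True.
min_count (int, default=2) – Minimum samples per species-drug stratum. Smaller groups are merged.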
Examples
>>> cv = SpeciesDrugStratifiedKFold(n_splits=5)
>>> for train_idx, test_idx in cv.split(X, y, species=species):
...     X_train, X_test = X[train_idx], X[test_idx]
- split(X, y, species=None, groups=None)
Generate train/test indices for each fold.
- Parameters:
X (array-like) – Feature matrix.
y (array-like) – Resistance labels.
species (array-like) – Species labels. If None, falls back to plain stratified KFold.
groups (ignored) – Not used, present for API compatibility.
- Yields:
train_idx, test_idx (np.ndarray) – Indices for train and test sets.
- Return type:
Iterator[tuple[np.ndarray, np.ndarray]]
- class maldiamrkit.evaluation.CaseGroupedKFold(n_splits=5, shuffle=True, random_state=None)
Bases: object
K-fold cross-validation that keeps patient cases together and stratifies by y.
All samples from the same case/patient always fall in the same fold, and folds are stratified on the resistance label to preserve class balance. Wraps sklearn.model_selection.StratifiedGroupKFold.
- Parameters:
n_splits (int, default=5) – Number of folds.
shuffle (bool, default=True) – Whether to shuffle before splitting.
random_state (int or None, default=None) – Random seed used when shuffle is True.
Examples
>>> cv = CaseGroupedKFold(n_splits=5)
>>> for train_idx, test_idx in cv.split(X, y, groups=case_ids):
...     X_train, X_test = X[train_idx], X[test_idx]
- split(X, y=None, groups=None)
Generate stratified, group-preserving train/test indices for each fold.
- Parameters:
X (array-like) – Feature matrix.
y (array-like) – Resistance labels. Required for stratification.
groups (array-like) – Case/patient identifiers. Required.
- Yields:
train_idx, test_idx (np.ndarray) – Indices for train and test sets.
- Raises:
ValueError – If groups or y is None.
- Return type:
Iterator[tuple[np.ndarray, np.ndarray]]
Splitting Example
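import numpy as np
from maldiamrkit.evaluation import (
    stratified_species_drug_split,
    case_based_split,
    SpeciesDrugStratifiedKFold,
    CaseGroupedKFold,
)
# Synthetic stand-ins for spectra, labels, species, and patient IDs
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 50))
y = rng.integers(0, 2, size=200)
species_labels = rng.choice(["E. coli", "K. pneumoniae"], size=200)
patient_ids = rng.integers(0, 60, size=200)
# Single split preserving species-drug distributions
X_train, X_test, y_train, y_test = stratified_species_drug_split(
    X, y, species=species_labels, test_size=0.2, random_state=42
)
# Patient-grouped split
X_train, X_test, y_train, y_test = case_based_split(
    X, y, case_ids=patient_ids, test_size=0.2
)
# Sklearn-compatible CV splitters
cv = SpeciesDrugStratifiedKFold(n_splits=5)
for train_idx, test_idx in cv.split(X, y, species=species_labels):
    X_train, X_test = X[train_idx], X[test_idx]
cv = CaseGroupedKFold(n_splits=5)
for train_idx, test_idx in cv.split(X, y, groups=patient_ids):
    X_train, X_test = X[train_idx], X[test_idx]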
Multi-Drug Evaluation
For predicting resistance to multiple antibiotics simultaneously:
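import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from maldiamrkit.susceptibility import LabelEncoder
from maldiamrkit.evaluation import amr_multilabel_report
# Encode multi-drug labels (intermediate -> NaN)
enc = LabelEncoder(intermediate="nan")
y_encoded = enc.fit_transform(data.y)  # DataFrame with one column per drug
# RandomForestClassifier cannot fit NaN targets: keep samples labelled for every drug.
# X is the feature matrix aligned row-wise with data.y.
mask = y_encoded.notna().all(axis=1).to_numpy()
X_lab, y_lab = X[mask], y_encoded[mask].astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X_lab, y_lab, test_size=0.2, random_state=42
)
# Train multi-output model
clf = MultiOutputClassifier(RandomForestClassifier(random_state=42))
clf.fit(X_train, y_train)
# predict() returns an ndarray; wrap it so the columns match y_test
y_pred = pd.DataFrame(clf.predict(X_test), index=y_test.index, columns=y_test.columns)
# Per-drug AMR report
report = amr_multilabel_report(y_test, y_pred, as_dataframe=True)
print(report)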