Susceptibility Module#
Clinical susceptibility utilities: MIC encoding, breakpoint tables, and
R/I/S label encoding. Added in v0.15. The
LabelEncoder previously lived in the
Evaluation module and was moved here to sit alongside
the new MIC tooling; the old import path still works for one release with
a DeprecationWarning.
The regression-style evaluation function
maldiamrkit.evaluation.mic_regression_report() lives in the
Evaluation module alongside the binary AMR metrics it
complements.
MIC Encoding#
- class maldiamrkit.susceptibility.MICEncoder(breakpoints=None, *, mic_col='MIC', species_col=None, species=None, drug=None, drug_col=None)[source]#
Bases: BaseEstimator, TransformerMixin
Encode MIC strings into log2 numeric values and optional S/I/R labels.
- Parameters:
breakpoints (BreakpointTable or None, default=None) – When provided, each MIC is also categorised as S/I/R and flagged for ATU. When None, only the log2_mic and censored columns are populated; the category/atu/source columns are present but filled with pd.NA.
mic_col (str, default="MIC") – Name of the MIC column in the input DataFrame.
species_col (str or None, default=None) – Name of the species column in the input DataFrame. Required when breakpoints is provided unless species is given as a scalar.
drug (str or None, default=None) – Antibiotic name applied to all rows (single-drug case). Mutually exclusive with drug_col.
drug_col (str or None, default=None) – Name of the drug column in the input DataFrame (multi-drug case). Mutually exclusive with drug.
species (str or None, default=None) – Species applied to all rows (single-species case). Mutually exclusive with species_col.
Notes
The censoring rule treats ≤ / < / ≥ / > qualifiers in the source MIC strings as censored point estimates: the parsed numeric is kept as log2_mic and censored is set to True, so downstream code (e.g. censoring-aware loss functions) can choose how to use them.
See also
BreakpointTable – Clinical breakpoint lookup consumed by this encoder.
maldiamrkit.io.parse_mic_column – Underlying MIC string parser.
- __init__(breakpoints=None, *, mic_col='MIC', species_col=None, species=None, drug=None, drug_col=None)[source]#
- fit(X, y=None, **kwargs)[source]#
Validate configuration (no statistics learned).
- Parameters:
X (pd.DataFrame) – Input frame with at least mic_col. Other required columns depend on the chosen species/drug configuration.
y (ignored) – Present for sklearn API compatibility.
**kwargs – Ignored.
- Return type:
self
- transform(X)[source]#
Encode MIC strings.
- Parameters:
X (pd.DataFrame) – Input frame with mic_col.
- Returns:
Columns log2_mic, censored, category, atu, source, indexed like X.
- Return type:
pd.DataFrame
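The censoring rule in the Notes above can be illustrated with a minimal standard-library sketch. It is not the implementation of maldiamrkit.io.parse_mic_column or MICEncoder: the helper name encode_mic, the accepted qualifier spellings, and the choice to return (nan, False) for unparseable input are all assumptions made for illustration.

```python
import math
import re

# Optional qualifier (<=, >=, ≤, ≥, <, >) followed by a non-negative number.
# The accepted spellings are an assumption, not the library's grammar.
_MIC_PATTERN = re.compile(r"^\s*(<=|>=|≤|≥|<|>)?\s*(\d+(?:\.\d+)?)\s*$")


def encode_mic(mic_string):
    """Return (log2_mic, censored) for one MIC string.

    Hypothetical helper: unparseable or non-positive values give (nan, False).
    """
    match = _MIC_PATTERN.match(mic_string)
    if match is None:
        return math.nan, False
    qualifier, number = match.groups()
    value = float(number)
    if value <= 0:
        return math.nan, False
    # The numeric part is kept as a point estimate on the log2 scale;
    # the qualifier only sets the censored flag.
    return math.log2(value), qualifier is not None


for raw in ["0.5", "<=0.25", ">8", "n/a"]:
    print(raw, encode_mic(raw))
```

Keeping the point estimate and the flag separate means a plain regressor can ignore the flag, while a censoring-aware loss can treat the flagged values as bounds.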
Breakpoints#
- class maldiamrkit.susceptibility.BreakpointTable(rows, *, guideline='EUCAST', version='', year=None, source=None)[source]#
Bases: object
Clinical breakpoint table for MIC interpretation.
Holds a set of (species, drug) → (s_le, r_gt, [atu_low, atu_high]) rows from a single guideline release (e.g. EUCAST v16.0). Use apply() for single MICs and apply_batch() for arrays; MICEncoder consumes the batch API.
- Parameters:
rows (pd.DataFrame) – DataFrame with at least the columns species, drug, s_le, r_gt. Optional columns: atu_low, atu_high.
guideline (str, default="EUCAST") – Guideline name, e.g. "EUCAST".
version (str, default="") – Guideline version, e.g. "16.0".
year (int or None, default=None) – Calendar year the guideline was published.
source (str or None, default=None) – Free-text provenance, e.g. "EUCAST Clinical Breakpoints v16.0 (2026-01-01)".
- Raises:
ValueError – If required columns are missing, threshold types are not numeric, or any row violates s_le ≤ r_gt.
Notes
EUCAST’s literal table format is preserved: s_le is the largest MIC classified as S and r_gt is the largest MIC not classified as R. When s_le == r_gt there is no I zone.
- apply(species, drug, mic)[source]#
Categorise a single MIC value against the table.
- Parameters:
- Returns:
See BreakpointResult.
- Return type:
- apply_batch(species, drug, mic)[source]#
Categorise an array of MIC values.
species and drug may be scalars (broadcast to all rows) or arrays of the same length as mic.
- Parameters:
- Returns:
Columns: category (object, "S"/"I"/"R"/NA), atu (bool), source (object, possibly NA for unmatched rows).
- Return type:
pd.DataFrame
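The s_le / r_gt convention described in the Notes for BreakpointTable can be sketched for a single breakpoint row as follows. This is an illustration of the rule, not the code behind apply() or apply_batch(): the function name categorise, the inclusive ATU bounds, and the example thresholds are assumptions.

```python
import math


def categorise(mic, s_le, r_gt, atu_low=None, atu_high=None):
    """Return (category, atu) for one MIC against one breakpoint row.

    Hypothetical helper. ATU bounds are treated as inclusive, which is an
    assumption; the library may define the range differently.
    """
    if s_le > r_gt:
        raise ValueError("breakpoint row must satisfy s_le <= r_gt")
    if mic is None or math.isnan(mic):
        return None, False  # lookup failed: no category
    if mic <= s_le:
        category = "S"      # at or below the largest susceptible MIC
    elif mic > r_gt:
        category = "R"      # strictly above the largest non-resistant MIC
    else:
        category = "I"      # only reachable when s_le < r_gt
    atu = (
        atu_low is not None
        and atu_high is not None
        and atu_low <= mic <= atu_high
    )
    return category, atu


# Made-up thresholds, not real EUCAST values.
print(categorise(1.0, s_le=1.0, r_gt=2.0))  # ('S', False)
print(categorise(2.0, s_le=1.0, r_gt=2.0))  # ('I', False)
print(categorise(4.0, s_le=1.0, r_gt=2.0))  # ('R', False)
print(categorise(2.0, s_le=2.0, r_gt=2.0))  # ('S', False), no I zone
```

Because the ATU flag is computed independently of the category, a value can be both "I" and flagged, which matches the description of atu as orthogonal to category.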
- classmethod from_yaml(path)[source]#
Load a breakpoint table from a YAML file.
The YAML must have keys guideline, version, optional year and source, and a rows list whose entries carry species, drug, s_le, r_gt and optionally atu_low, atu_high.
- Parameters:
- Return type:
- classmethod from_version(version)[source]#
Load a bundled EUCAST table by version string, e.g. "16.0".
- Parameters:
version (str)
- Return type:
- classmethod from_year(year)[source]#
Load a bundled EUCAST table by calendar year of publication.
EUCAST publishes annually but the version-to-year mapping isn’t a clean function (mid-year dot releases exist). When several bundled versions match the same year, the highest version is returned.
- Parameters:
year (int)
- Return type:
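The tie-breaking rule described for from_year ("the highest version is returned") can be sketched as below. The version-to-year mapping is invented for illustration, the helper names are hypothetical, and comparing versions numerically rather than as text is an assumption about how "highest" is meant.

```python
def version_key(version):
    """Turn "16.10" into (16, 10) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def pick_version(year, bundled):
    """Return the highest version published in `year`.

    Raising ValueError when nothing matches is an assumption.
    """
    matches = [v for v, published in bundled.items() if published == year]
    if not matches:
        raise ValueError(f"no bundled table for year {year}")
    return max(matches, key=version_key)


# Hypothetical version -> publication year mapping, for illustration only.
bundled = {"15.0": 2025, "16.0": 2026, "16.2": 2026, "16.10": 2026}
print(pick_version(2026, bundled))  # 16.10
```

A plain string comparison would rank "16.2" above "16.10", which is why the sketch splits the version into integer parts first.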
- class maldiamrkit.susceptibility.BreakpointResult(category, atu, source)[source]#
Bases: object
Result of applying a clinical breakpoint to a single MIC value.
- Variables:
category ({"S", "I", "R"} or None) – Clinical category: "S" (Susceptible, standard dosing), "I" (Susceptible, increased exposure – modern EUCAST), or "R" (Resistant). None when the lookup failed (no row for this (species, drug), or MIC is NaN).
atu (bool) – True when the MIC value falls in the species/drug ATU range. Orthogonal to category – not a third clinical category.
source (str or None) – Provenance string, e.g. "EUCAST v16.0". None when the lookup failed.
- Parameters:
Label Encoding#
- class maldiamrkit.susceptibility.LabelEncoder(intermediate=IntermediateHandling.susceptible)[source]#
Bases: BaseEstimator, TransformerMixin
Encode R/I/S resistance labels to binary (0/1).
Supports configurable handling of intermediate (I) labels. Accepts both 1-D arrays (single drug) and 2-D DataFrames (multiple drugs).
- Parameters:
intermediate (str, default="susceptible") –
How to handle intermediate (“I”) labels:
"susceptible": treat I as susceptible (0) - conservative, avoids false resistance calls.
"resistant": treat I as resistant (1) - stricter, avoids missing resistance.
"drop": remove samples with I labels entirely. Note: this changes the output array length (samples with I labels are excluded) and is not compatible with sklearn pipelines that expect consistent sample counts.
"nan": map I to NaN. Useful for multi-drug encoding where each drug is handled independently. Output dtype is float64 (required to hold NaN).
- Variables:
classes_ (ndarray) – Array of [0, 1] after fitting.
- Raises:
ValueError – If intermediate is not one of the accepted values.
- __init__(intermediate=IntermediateHandling.susceptible)[source]#
- Parameters:
intermediate (str | IntermediateHandling)
- Return type:
None
- fit(y, **kwargs)[source]#
Fit the encoder (no-op, just sets classes_).
- Parameters:
y (array-like) – Labels to learn from (unused beyond validation).
**kwargs (dict) – Additional keyword arguments (unused, accepted for sklearn compatibility).
- Return type:
self
- transform(y)[source]#
Transform labels to binary.
- Parameters:
y (array-like or pd.DataFrame) – String labels (R/I/S or resistant/intermediate/susceptible). If a DataFrame is passed, each column is encoded independently.
- Returns:
Binary encoded labels. Returns a DataFrame when the input is a DataFrame (or a 2-D ndarray), preserving column names and index. Returns a 1-D ndarray for 1-D input.
- Return type:
ndarray or pd.DataFrame
- fit_transform(y, **kwargs)[source]#
Fit the encoder and transform labels in one step.
- Parameters:
y (array-like or pd.DataFrame) – String labels (R/I/S or resistant/intermediate/susceptible). If a DataFrame is passed, each column is encoded independently.
**kwargs (dict) – Additional keyword arguments (unused, accepted for sklearn compatibility).
- Returns:
Binary encoded labels. Returns a DataFrame when the input is a DataFrame, preserving column names and index.
- Return type:
ndarray or pd.DataFrame
- class maldiamrkit.susceptibility.IntermediateHandling(value)[source]#
Strategy for handling intermediate (I) resistance labels.
- Variables:
- susceptible = 'susceptible'#
- resistant = 'resistant'#
- drop = 'drop'#
- nan = 'nan'#
Label Encoding Example#
from maldiamrkit.susceptibility import LabelEncoder

enc = LabelEncoder()  # I -> susceptible (default)
y_binary = enc.fit_transform(["R", "S", "I", "R", "S"])
# array([1, 0, 0, 1, 0])

# Treat intermediate as resistant
enc = LabelEncoder(intermediate="resistant")
y_binary = enc.fit_transform(["R", "S", "I"])
# array([1, 0, 1])

# Drop intermediate samples entirely
enc = LabelEncoder(intermediate="drop")
y_binary = enc.fit_transform(["R", "S", "I"])
# array([1, 0])
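For readers who want to see all four intermediate strategies documented for LabelEncoder side by side, including "nan", here is a standalone sketch in plain Python. It mirrors the documented behaviour but is not the library's code: the function encode_labels, the alias table, and the case-insensitive matching are assumptions.

```python
import math

# Accepted spellings, per the transform() docs (R/I/S or full words).
# Case-insensitive matching is an assumption.
_ALIASES = {
    "r": "R", "resistant": "R",
    "i": "I", "intermediate": "I",
    "s": "S", "susceptible": "S",
}
_STRATEGIES = {"susceptible", "resistant", "drop", "nan"}


def encode_labels(labels, intermediate="susceptible"):
    """Map R/I/S labels to 0/1 using the chosen intermediate strategy."""
    if intermediate not in _STRATEGIES:
        raise ValueError(f"unknown intermediate strategy: {intermediate!r}")
    encoded = []
    for label in labels:
        code = _ALIASES[label.strip().lower()]
        if code == "R":
            encoded.append(1)
        elif code == "S":
            encoded.append(0)
        elif intermediate == "susceptible":
            encoded.append(0)
        elif intermediate == "resistant":
            encoded.append(1)
        elif intermediate == "nan":
            encoded.append(math.nan)
        # "drop": the sample is skipped, so the output gets shorter
    if intermediate == "nan":
        # NaN forces a float representation, like the float64 dtype in the docs
        return [float(value) for value in encoded]
    return encoded


print(encode_labels(["R", "S", "I"], intermediate="drop"))  # [1, 0]
print(encode_labels(["R", "S", "I"], intermediate="nan"))   # [1.0, 0.0, nan]
```

The "drop" branch shows why that strategy does not fit pipelines that expect a fixed sample count: the output is shorter than the input.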
MIC Encoding Example#
End-to-end: from raw MIC strings to log2(MIC) regression targets and
S/I/R category labels, using a bundled EUCAST breakpoint table. The
regression evaluator (maldiamrkit.evaluation.mic_regression_report())
is imported from the Evaluation module.
from maldiamrkit.susceptibility import BreakpointTable, MICEncoder
from maldiamrkit.evaluation import mic_regression_report
# Load a bundled EUCAST table by version string
bp = BreakpointTable.from_version("16.0")
enc = MICEncoder(
    breakpoints=bp,
    species_col="Species",
    drug="Ceftriaxone",
)
# meta: DataFrame with "MIC" and "Species" columns
targets = enc.fit_transform(meta)  # log2_mic, censored, category, atu, source

# Evaluate regression predictions against ground truth
# (y_pred_log2: model predictions on the log2(MIC) scale)
report = mic_regression_report(
    y_true=targets["log2_mic"],
    y_pred=y_pred_log2,
    breakpoints=bp,
    species="Klebsiella pneumoniae",
    drug="Ceftriaxone",
)
print(report["rmse_log2"], report["essential_agreement"])
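To make the two report keys printed above more concrete, the sketch below computes an RMSE on the log2 scale and an essential-agreement rate by hand. It uses the common definition of essential agreement (prediction within one two-fold dilution of the reference, i.e. within ±1 on the log2 scale); whether mic_regression_report() uses exactly this tolerance, or rounds predictions to the dilution grid first, is an assumption. The function names and the example values are made up.

```python
import math


def rmse_log2(y_true, y_pred):
    """Root-mean-square error between log2(MIC) targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def essential_agreement(y_true, y_pred, tolerance=1.0):
    """Fraction of predictions within `tolerance` log2 units of the reference."""
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Made-up log2(MIC) values for illustration.
y_true = [0.0, 1.0, 3.0, -2.0]
y_pred = [0.5, 1.0, 1.0, -2.5]

print(rmse_log2(y_true, y_pred))
print(essential_agreement(y_true, y_pred))  # 0.75
```

The third prediction is two dilutions off, so it counts against essential agreement and dominates the RMSE.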