Susceptibility Module#

Clinical susceptibility utilities: MIC encoding, breakpoint tables, and R/I/S label encoding. Added in v0.15. The LabelEncoder previously lived in the Evaluation module and was moved here to sit alongside the new MIC tooling; the old import path still works for one release with a DeprecationWarning.

The regression-style evaluation function maldiamrkit.evaluation.mic_regression_report() lives in the Evaluation module alongside the binary AMR metrics it complements.

MIC Encoding#

class maldiamrkit.susceptibility.MICEncoder(breakpoints=None, *, mic_col='MIC', species_col=None, species=None, drug=None, drug_col=None)[source]#

Bases: BaseEstimator, TransformerMixin

Encode MIC strings into log2 numeric values and optional S/I/R labels.

Parameters:

breakpoints (BreakpointTable or None, default=None) – When provided, each MIC is also categorised as S/I/R and flagged when it falls in the Area of Technical Uncertainty (ATU). When None, only the log2_mic and censored columns are populated; the category, atu and source columns are present but filled with pd.NA.
mic_col (str, default="MIC") – Name of the MIC column in the input DataFrame.
species_col (str or None, default=None) – Name of the species column in the input DataFrame. Required when breakpoints is provided unless species is given as a scalar.
species (str or None, default=None) – Species applied to all rows (single-species case). Mutually exclusive with species_col.
drug (str or None, default=None) – Antibiotic name applied to all rows (single-drug case). Mutually exclusive with drug_col.
drug_col (str or None, default=None) – Name of the drug column in the input DataFrame (multi-drug case). Mutually exclusive with drug.

Notes

The censoring rule treats ≤ / < / ≥ / > qualifiers in the source MIC strings as censored point estimates: the parsed numeric is kept as log2_mic and censored is set to True, so downstream code (e.g. censoring-aware loss functions) can choose how to use them.
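
The censoring rule can be illustrated with a minimal standalone sketch. parse_mic is a hypothetical helper written for this illustration, not the parser used by MICEncoder; it assumes each MIC string is an optional qualifier followed by a positive number in mg/L.

```python
import math
import re

# Hypothetical helper for illustration only; not part of maldiamrkit.
_MIC_RE = re.compile(r"^\s*(<=|>=|≤|≥|<|>)?\s*(\d+(?:\.\d+)?)\s*$")


def parse_mic(text):
    """Return (log2_mic, censored) for a MIC string such as '<=0.25' or '8'."""
    match = _MIC_RE.match(text)
    if match is None:
        raise ValueError(f"unparseable MIC string: {text!r}")
    qualifier, number = match.groups()
    value = float(number)
    if value <= 0:
        raise ValueError("MIC must be positive")
    # The numeric part is kept as a point estimate; the qualifier only sets the flag.
    return math.log2(value), qualifier is not None


for raw in ["0.5", "<=0.25", ">32", "≥64"]:
    print(raw, parse_mic(raw))
```

Keeping the flag separate from the value leaves the decision to downstream code: a plain regressor can ignore censored, while a censoring-aware loss can treat those rows as bounds.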

Breakpoints#

class maldiamrkit.susceptibility.BreakpointTable(rows, *, guideline='EUCAST', version='', year=None, source=None)[source]#

Bases: object

Clinical breakpoint table for MIC interpretation.

Holds a set of (species, drug) → (s_le, r_gt, [atu_low, atu_high]) rows from a single guideline release (e.g. EUCAST v16.0). Use apply() for single MICs and apply_batch() for arrays; MICEncoder consumes the batch API.

Parameters:

rows (pd.DataFrame) – DataFrame with at least the columns species, drug, s_le, r_gt. Optional columns: atu_low, atu_high.
guideline (str, default="EUCAST") – Name of the guideline the rows come from, e.g. "EUCAST".
version (str, default="") – Guideline version, e.g. "16.0".
year (int or None, default=None) – Calendar year the guideline was published.
source (str or None, default=None) – Free-text provenance, e.g. "EUCAST Clinical Breakpoints v16.0 (2026-01-01)".

Raises:

ValueError – If required columns are missing, threshold types are not numeric, or any row violates s_le ≤ r_gt.

Notes

EUCAST’s literal table format is preserved: s_le is the largest MIC classified as S and r_gt is the largest MIC not classified as R. A MIC is therefore S when MIC ≤ s_le, R when MIC > r_gt, and I otherwise. When s_le == r_gt there is no I zone.
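
The threshold semantics can be checked with a short sketch. categorise is a hypothetical function written for this illustration, not the library's implementation, and the thresholds are placeholders rather than real EUCAST values.

```python
def categorise(mic, s_le, r_gt):
    """Hypothetical helper: S if mic <= s_le, R if mic > r_gt, otherwise I."""
    if mic <= s_le:
        return "S"
    if mic > r_gt:
        return "R"
    return "I"


# Placeholder thresholds chosen for illustration, not real EUCAST values.
print([categorise(m, s_le=1.0, r_gt=4.0) for m in (0.5, 1.0, 2.0, 4.0, 8.0)])
# ['S', 'S', 'I', 'I', 'R']

# With s_le == r_gt the I zone disappears.
print([categorise(m, s_le=2.0, r_gt=2.0) for m in (1.0, 2.0, 4.0)])
# ['S', 'S', 'R']
```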

__init__(rows, *, guideline='EUCAST', version='', year=None, source=None)[source]#

Parameters:

rows (DataFrame)
guideline (str)
version (str)
year (int | None)
source (str | None)

Return type:

None

property rows: DataFrame#

Return a copy of the underlying breakpoint rows.

species()[source]#

List unique species present in the table.

Return type: list[str]

drugs()[source]#

List unique drugs present in the table.

Return type: list[str]

apply(species, drug, mic)[source]#

Categorise a single MIC value against the table.

Parameters:

species (str) – Bacterial species, e.g. "Klebsiella pneumoniae". Matched case-insensitively against the table.
drug (str) – Antibiotic name. Matched case-insensitively.
mic (float or None) – MIC value in mg/L (linear scale, not log2). None / NaN returns a result with category=None.

Returns:

See BreakpointResult.

Return type:

BreakpointResult

apply_batch(species, drug, mic)[source]#

Categorise an array of MIC values.

species and drug may be scalars (broadcast to all rows) or arrays of the same length as mic.

Parameters:

species (str or array-like) – Species per sample, or a single species applied to all.
drug (str or array-like) – Drug per sample, or a single drug applied to all.
mic (array-like) – MIC values in mg/L (linear scale).

Returns:

Columns: category (object, "S"/"I"/"R"/NA), atu (bool), source (object, possibly NA for unmatched rows).

Return type:

pd.DataFrame
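
The broadcasting and case-insensitive matching described above can be sketched in plain Python. apply_batch_sketch and its dict-based table are hypothetical stand-ins: the real method returns a DataFrame with category, atu and source columns, whereas this sketch returns only a list of categories. The thresholds are placeholders, not real EUCAST values.

```python
import math


def _broadcast(value, n):
    """Repeat a scalar string n times; pass sequences through after a length check."""
    if isinstance(value, str):
        return [value] * n
    values = list(value)
    if len(values) != n:
        raise ValueError("species/drug must be a scalar or match len(mic)")
    return values


def apply_batch_sketch(table, species, drug, mic):
    """table maps (species.lower(), drug.lower()) -> (s_le, r_gt)."""
    mic = list(mic)
    n = len(mic)
    out = []
    for sp, dr, m in zip(_broadcast(species, n), _broadcast(drug, n), mic):
        row = table.get((sp.lower(), dr.lower()))
        if row is None or m is None or math.isnan(m):
            out.append(None)  # unmatched (species, drug) pair or missing MIC
            continue
        s_le, r_gt = row
        out.append("S" if m <= s_le else "R" if m > r_gt else "I")
    return out


# Placeholder thresholds for illustration only, not real EUCAST values.
table = {("klebsiella pneumoniae", "ceftriaxone"): (1.0, 2.0)}

print(apply_batch_sketch(table, "Klebsiella pneumoniae", "CEFTRIAXONE",
                         [0.5, 2.0, 8.0, float("nan")]))
# ['S', 'I', 'R', None]
print(apply_batch_sketch(table, ["Escherichia coli", "klebsiella pneumoniae"],
                         "ceftriaxone", [0.5, 0.5]))
# [None, 'S']
```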

classmethod from_yaml(path)[source]#

Load a breakpoint table from a YAML file.

The YAML must have keys guideline, version, optional year and source, and a rows list whose entries carry species, drug, s_le, r_gt and optionally atu_low, atu_high.

Parameters: path (str | Path)
Return type: BreakpointTable
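
A file with the expected layout might look like the one written by the sketch below. The key names follow the description above; the species, drugs and thresholds are placeholders for illustration, not real EUCAST values, and the final call is left commented out because it needs maldiamrkit installed.

```python
from pathlib import Path
import tempfile

# Key names follow the from_yaml description; all thresholds are
# placeholders, not real EUCAST values.
YAML_TEXT = """\
guideline: EUCAST
version: "16.0"
year: 2026
source: "EUCAST Clinical Breakpoints v16.0 (2026-01-01)"
rows:
  - species: Klebsiella pneumoniae
    drug: Ceftriaxone
    s_le: 1.0
    r_gt: 2.0
  - species: Escherichia coli
    drug: Ceftriaxone
    s_le: 1.0
    r_gt: 2.0
    atu_low: 2.0
    atu_high: 2.0
"""

path = Path(tempfile.mkdtemp()) / "breakpoints.yaml"
path.write_text(YAML_TEXT, encoding="utf-8")

# With maldiamrkit installed, the file would then be loaded with:
# bp = BreakpointTable.from_yaml(path)
```

Quoting the version keeps "16.0" a string, which matches the version (str) parameter of the constructor.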

classmethod from_version(version)[source]#

Load a bundled EUCAST table by version string, e.g. "16.0".

Parameters: version (str)
Return type: BreakpointTable

classmethod from_year(year)[source]#

Load a bundled EUCAST table by calendar year of publication.

EUCAST publishes annually, but versions do not map one-to-one onto years because mid-year dot releases exist. When several bundled versions match the same year, the highest version is returned.

Parameters: year (int)
Return type: BreakpointTable

classmethod from_latest()[source]#

Load the highest-numbered bundled EUCAST table.

Return type: BreakpointTable

classmethod list_available()[source]#

List bundled EUCAST version strings, sorted numerically.

Return type: list[str]
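
Numeric ordering matters because version strings sort incorrectly as text ("10.0" sorts before "9.0"). The sketch below shows one way to get the ordering that list_available(), from_latest() and from_year() describe. version_key and the bundled mapping are hypothetical, used only for illustration; the library's internals may differ.

```python
def version_key(version):
    """Turn '16.0' into (16, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical version -> publication year mapping, for illustration only.
bundled = {"10.0": 2020, "13.1": 2023, "9.0": 2019, "13.0": 2023}

available = sorted(bundled, key=version_key)
print(available)  # ['9.0', '10.0', '13.0', '13.1']

latest = max(bundled, key=version_key)
print(latest)  # 13.1

# Several versions can share a year; the highest one wins.
matches_2023 = [v for v, year in bundled.items() if year == 2023]
print(max(matches_2023, key=version_key))  # 13.1
```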

class maldiamrkit.susceptibility.BreakpointResult(category, atu, source)[source]#

Bases: object

Result of applying a clinical breakpoint to a single MIC value.

Variables:

category ({"S", "I", "R"} or None) – Clinical category. "S" (Susceptible, standard dosing), "I" (Susceptible, increased exposure – modern EUCAST), or "R" (Resistant). None when the lookup failed (no row for this (species, drug), or MIC is NaN).
atu (bool) – True when the MIC value falls in the species/drug ATU range. Orthogonal to category – not a third clinical category.
source (str or None) – Provenance string, e.g. "EUCAST v16.0". None when the lookup failed.

Parameters:

category (str | None)
atu (bool)
source (str | None)

category: str | None#

atu: bool#

source: str | None#

__init__(category, atu, source)#

Parameters:

category (str | None)
atu (bool)
source (str | None)

Return type:

None

Label Encoding#

class maldiamrkit.susceptibility.LabelEncoder(intermediate=IntermediateHandling.susceptible)[source]#

Bases: BaseEstimator, TransformerMixin

Encode R/I/S resistance labels to binary (0/1).

Supports configurable handling of intermediate (I) labels. Accepts both 1-D arrays (single drug) and 2-D DataFrames (multiple drugs).

Parameters:

intermediate (str or IntermediateHandling, default="susceptible") – How to handle intermediate ("I") labels:

"susceptible": treat I as susceptible (0) - conservative, avoids false resistance calls.
"resistant": treat I as resistant (1) - stricter, avoids missing resistance.
"drop": remove samples with I labels entirely. Note: this changes the output array length (samples with I labels are excluded) and is not compatible with sklearn pipelines that expect consistent sample counts.
"nan": map I to NaN. Useful for multi-drug encoding where each drug is handled independently. Output dtype is float64 (required to hold NaN).
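
The four strategies can be compared side by side with a small re-implementation for 1-D input. encode_labels is a hypothetical function written for this illustration; it returns plain lists rather than ndarrays and accepts only the single-letter labels.

```python
import math

_BASE = {"R": 1, "S": 0}


def encode_labels(labels, intermediate="susceptible"):
    """Hypothetical re-implementation of the four strategies for a 1-D list."""
    if intermediate not in {"susceptible", "resistant", "drop", "nan"}:
        raise ValueError(f"unknown intermediate strategy: {intermediate!r}")
    fill = {"susceptible": 0, "resistant": 1, "nan": math.nan}
    out = []
    for label in labels:
        if label == "I":
            if intermediate == "drop":
                continue  # output becomes shorter than the input
            out.append(fill[intermediate])
        else:
            out.append(_BASE[label])
    if intermediate == "nan":
        out = [float(v) for v in out]  # NaN forces a float representation
    return out


labels = ["R", "S", "I", "R"]
print(encode_labels(labels))               # [1, 0, 0, 1]
print(encode_labels(labels, "resistant"))  # [1, 0, 1, 1]
print(encode_labels(labels, "drop"))       # [1, 0, 1]
print(encode_labels(labels, "nan"))        # [1.0, 0.0, nan, 1.0]
```

The "drop" result has three elements for four inputs, which is the length mismatch that makes it unsuitable inside a pipeline where the feature matrix keeps all four rows.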

Variables:

classes_ (ndarray) – Array of [0, 1] after fitting.

Raises:

ValueError – If intermediate is not one of the accepted values.

__init__(intermediate=IntermediateHandling.susceptible)[source]#

Parameters: intermediate (str | IntermediateHandling)
Return type: None

fit(y, **kwargs)[source]#

Fit the encoder (no-op, just sets classes_).

Parameters:

y (array-like) – Labels to learn from (unused beyond validation).
**kwargs (dict) – Additional keyword arguments (unused, accepted for sklearn compatibility).

Return type:

self

transform(y)[source]#

Transform labels to binary.

Parameters: y (array-like or pd.DataFrame) – String labels (R/I/S or resistant/intermediate/susceptible). If a DataFrame is passed, each column is encoded independently.
Returns: Binary encoded labels. Returns a DataFrame when the input is a DataFrame (or a 2-D ndarray), preserving column names and index. Returns a 1-D ndarray for 1-D input.
Return type: ndarray or pd.DataFrame

fit_transform(y, **kwargs)[source]#

Fit the encoder and transform labels in one step.

Parameters:

y (array-like or pd.DataFrame) – String labels (R/I/S or resistant/intermediate/susceptible). If a DataFrame is passed, each column is encoded independently.
**kwargs (dict) – Additional keyword arguments (unused, accepted for sklearn compatibility).

Returns:

Binary encoded labels. Returns a DataFrame when the input is a DataFrame, preserving column names and index. Returns a 1-D ndarray for 1-D input.

Return type:

ndarray or pd.DataFrame

class maldiamrkit.susceptibility.IntermediateHandling(value)[source]#

Bases: str, Enum

Strategy for handling intermediate (I) resistance labels.

Variables:

susceptible (str) – Map intermediate to susceptible (0).
resistant (str) – Map intermediate to resistant (1).
drop (str) – Remove intermediate samples.
nan (str) – Map intermediate to NaN.

susceptible = 'susceptible'#

resistant = 'resistant'#

drop = 'drop'#

nan = 'nan'#

Label Encoding Example#

from maldiamrkit.susceptibility import LabelEncoder

enc = LabelEncoder()  # I -> susceptible (default)
y_binary = enc.fit_transform(["R", "S", "I", "R", "S"])
# array([1, 0, 0, 1, 0])

# Treat intermediate as resistant
enc = LabelEncoder(intermediate="resistant")
y_binary = enc.fit_transform(["R", "S", "I"])
# array([1, 0, 1])

# Drop intermediate samples entirely
enc = LabelEncoder(intermediate="drop")
y_binary = enc.fit_transform(["R", "S", "I"])
# array([1, 0])

MIC Encoding Example#

End-to-end: from raw MIC strings to log2(MIC) regression targets and S/I/R category labels, using a bundled EUCAST breakpoint table. The regression evaluator (maldiamrkit.evaluation.mic_regression_report()) is imported from the Evaluation module.

from maldiamrkit.susceptibility import BreakpointTable, MICEncoder
from maldiamrkit.evaluation import mic_regression_report

# Load the latest bundled EUCAST table
bp = BreakpointTable.from_latest()

enc = MICEncoder(
    breakpoints=bp,
    species_col="Species",
    drug="Ceftriaxone",
)
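# meta is assumed to be a DataFrame with "MIC" and "Species" columns, one row per isolate
targets = enc.fit_transform(meta)  # log2_mic, censored, category, atu, source

# Evaluate regression predictions against ground truth;
# y_pred_log2 is assumed to hold model predictions on the log2(MIC) scale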
report = mic_regression_report(
    y_true=targets["log2_mic"],
    y_pred=y_pred_log2,
    breakpoints=bp,
    species="Klebsiella pneumoniae",
    drug="Ceftriaxone",
)
print(report["rmse_log2"], report["essential_agreement"])
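
For intuition about the two report keys printed above, the sketch below computes a root-mean-square error on the log2 scale and essential agreement in its conventional sense, the fraction of predictions within ±1 two-fold dilution of the reference MIC. Both functions are hypothetical illustrations; how mic_regression_report() rounds predictions or treats censored values is not specified here and may differ.

```python
import math


def rmse_log2(y_true, y_pred):
    """Root-mean-square error between reference and predicted log2(MIC)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def essential_agreement(y_true, y_pred, tolerance=1.0):
    """Fraction of predictions within +/- tolerance log2 dilutions of the reference."""
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy values on the log2(MIC) scale, for illustration only.
y_true = [-1.0, 0.0, 2.0, 5.0]
y_pred = [-1.0, 1.0, 4.0, 5.0]

print(rmse_log2(y_true, y_pred))            # sqrt(1.25), about 1.118
print(essential_agreement(y_true, y_pred))  # 0.75
```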