Susceptibility Module

Clinical susceptibility utilities: MIC encoding, breakpoint tables, and R/I/S label encoding. Added in v0.15. The LabelEncoder previously lived in the Evaluation module and was moved here to sit alongside the new MIC tooling; the old import path still works for one release with a DeprecationWarning.

The regression-style evaluation function maldiamrkit.evaluation.mic_regression_report() lives in the Evaluation module alongside the binary AMR metrics it complements.

MIC Encoding

class maldiamrkit.susceptibility.MICEncoder(breakpoints=None, *, mic_col='MIC', species_col=None, species=None, drug=None, drug_col=None)

Bases: BaseEstimator, TransformerMixin

Encode MIC strings into log2 numeric values and optional S/I/R labels.

Parameters:
  • breakpoints (BreakpointTable or None, default=None) – When provided, each MIC is also categorised as S/I/R and flagged for ATU. When None, only log2_mic and censored columns are populated; category / atu / source columns are present but filled with pd.NA.

  • mic_col (str, default="MIC") – Name of the MIC column in the input DataFrame.

  • species_col (str or None, default=None) – Name of the species column in the input DataFrame. Required when breakpoints is provided unless species is given as a scalar.

  • drug (str or None, default=None) – Antibiotic name applied to all rows (single-drug case). Mutually exclusive with drug_col.

  • drug_col (str or None, default=None) – Name of the drug column in the input DataFrame (multi-drug case). Mutually exclusive with drug.

  • species (str or None, default=None) – Species applied to all rows (single-species case). Mutually exclusive with species_col.

Notes

The censoring rule treats ≤ / < / ≥ / > qualifiers in the source MIC strings as censored point estimates: the parsed numeric value is kept (log2-transformed) as log2_mic and censored is set to True, so downstream code (e.g. censoring-aware loss functions) can choose how to use them.
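
The rule can be illustrated with a small standalone sketch. It is not the library's parser (that is maldiamrkit.io.parse_mic_column); the function name parse_mic, the regular expression, and the choice to return NaN for unparseable input are assumptions made for illustration only.

```python
import math
import re

# Illustrative only: NOT maldiamrkit.io.parse_mic_column.
# Optional qualifier (<=, >=, ≤, ≥, <, >) followed by a positive number.
_MIC_PATTERN = re.compile(r"^\s*(<=|>=|≤|≥|<|>)?\s*([0-9]*\.?[0-9]+)\s*$")


def parse_mic(text):
    """Return (log2_mic, censored) for a MIC string such as '<=0.5' or '8'."""
    match = _MIC_PATTERN.match(text)
    if match is None:
        # Assumption: unparseable strings become NaN and are not censored.
        return (math.nan, False)
    qualifier, number = match.groups()
    value = float(number)
    if value <= 0:
        return (math.nan, False)
    # The numeric part is kept as a point estimate on the log2 scale;
    # any qualifier only sets the censored flag.
    return (math.log2(value), qualifier is not None)


for raw in ["<=0.5", "8", ">32"]:
    print(raw, parse_mic(raw))
```

A censoring-aware loss can then treat rows with censored=True as bounds rather than exact targets, while a plain regression loss can use log2_mic directly.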

See also

BreakpointTable

Clinical breakpoint lookup consumed by this encoder.

maldiamrkit.io.parse_mic_column

Underlying MIC string parser.

__init__(breakpoints=None, *, mic_col='MIC', species_col=None, species=None, drug=None, drug_col=None)
Return type:

None

fit(X, y=None, **kwargs)

Validate configuration (no statistics learned).

Parameters:
  • X (pd.DataFrame) – Input frame with at least mic_col. Other required columns depend on the chosen species/drug configuration.

  • y (ignored) – Present for sklearn API compatibility.

  • **kwargs – Ignored.

Return type:

self

transform(X)

Encode MIC strings.

Parameters:

X (pd.DataFrame) – Input frame with mic_col.

Returns:

Columns log2_mic, censored, category, atu, source indexed like X.

Return type:

pd.DataFrame

fit_transform(X, y=None, **kwargs)

Fit then transform in one step.

Parameters:

X (DataFrame)

Return type:

DataFrame

get_feature_names_out(input_features=None)

Return output column names for sklearn pipelines.

Parameters:

input_features (Optional[Iterable[str]])

Return type:

ndarray

Breakpoints

class maldiamrkit.susceptibility.BreakpointTable(rows, *, guideline='EUCAST', version='', year=None, source=None)

Bases: object

Clinical breakpoint table for MIC interpretation.

Holds a set of (species, drug) → (s_le, r_gt, [atu_low, atu_high]) rows from a single guideline release (e.g. EUCAST v16.0). Use apply() for single MICs and apply_batch() for arrays; MICEncoder consumes the batch API.

Parameters:
  • rows (pd.DataFrame) – DataFrame with at least the columns species, drug, s_le, r_gt. Optional columns: atu_low, atu_high.

  • guideline (str, default="EUCAST") – Name of the guideline the rows come from, e.g. "EUCAST".

  • version (str, default="") – Guideline version, e.g. "16.0".

  • year (int or None, default=None) – Calendar year the guideline was published.

  • source (str or None, default=None) – Free-text provenance, e.g. "EUCAST Clinical Breakpoints v16.0 (2026-01-01)".

Raises:

ValueError – If required columns are missing, threshold types are not numeric, or any row violates s_le <= r_gt.

Notes

EUCAST’s literal table format is preserved: s_le is the largest MIC classified as S and r_gt is the largest MIC not classified as R. When s_le == r_gt there is no I zone.
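
Read literally, the two thresholds give a three-way rule: S when MIC <= s_le, R when MIC > r_gt, and I in between. The sketch below spells that out in plain Python; the function name categorise and the threshold values are hypothetical and are not taken from any EUCAST table or from the library's implementation.

```python
def categorise(mic, s_le, r_gt):
    """Classify a linear-scale MIC (mg/L) as 'S', 'I' or 'R'."""
    if s_le > r_gt:
        raise ValueError("breakpoint row must satisfy s_le <= r_gt")
    if mic <= s_le:
        return "S"  # at or below the largest MIC classified as S
    if mic > r_gt:
        return "R"  # above the largest MIC not classified as R
    return "I"      # strictly between the two thresholds


# Hypothetical thresholds: s_le=1, r_gt=4 leaves an I zone at 2 and 4 mg/L.
print([categorise(m, s_le=1, r_gt=4) for m in (0.5, 1, 2, 4, 8)])

# With s_le == r_gt the I zone is empty: every MIC is either S or R.
print([categorise(m, s_le=2, r_gt=2) for m in (1, 2, 4)])
```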

__init__(rows, *, guideline='EUCAST', version='', year=None, source=None)
Return type:

None

property rows: DataFrame

Return a copy of the underlying breakpoint rows.

species()

List unique species present in the table.

Return type:

list[str]

drugs()

List unique drugs present in the table.

Return type:

list[str]

apply(species, drug, mic)

Categorise a single MIC value against the table.

Parameters:
  • species (str) – Bacterial species, e.g. "Klebsiella pneumoniae". Matched case-insensitively against the table.

  • drug (str) – Antibiotic name. Matched case-insensitively.

  • mic (float or None) – MIC value in mg/L (linear scale, not log2). None / NaN returns a result with category=None.

Returns:

See BreakpointResult.

Return type:

BreakpointResult

apply_batch(species, drug, mic)

Categorise an array of MIC values.

species and drug may be scalars (broadcast to all rows) or arrays of the same length as mic.

Parameters:
  • species (str or array-like) – Species per sample, or a single species applied to all.

  • drug (str or array-like) – Drug per sample, or a single drug applied to all.

  • mic (array-like) – MIC values in mg/L (linear scale).

Returns:

Columns: category (object, "S"/"I"/"R"/NA), atu (bool), source (object, possibly NA for unmatched rows).

Return type:

pd.DataFrame
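
The scalar-or-array convention for species and drug can be pictured as a simple broadcast step before the per-row lookup. The helper below is a hypothetical illustration in plain Python, not the library's code; in particular, raising on a length mismatch is an assumption.

```python
def broadcast(value, n):
    """Repeat a scalar string n times, or pass through a sequence of length n."""
    if isinstance(value, str):
        return [value] * n
    values = list(value)
    if len(values) != n:
        # Assumption: mismatched lengths are treated as an error.
        raise ValueError(f"expected {n} values, got {len(values)}")
    return values


mic = [0.5, 2.0, 16.0]
species = broadcast("Klebsiella pneumoniae", len(mic))               # scalar
drug = broadcast(["Ceftriaxone", "Meropenem", "Ceftriaxone"], len(mic))  # per row
for row in zip(species, drug, mic):
    print(row)
```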

classmethod from_yaml(path)

Load a breakpoint table from a YAML file.

The YAML must have keys guideline, version, optional year and source, and a rows list whose entries carry species, drug, s_le, r_gt and optionally atu_low, atu_high.

Parameters:

path (str | Path)

Return type:

BreakpointTable

classmethod from_version(version)

Load a bundled EUCAST table by version string, e.g. "16.0".

Parameters:

version (str)

Return type:

BreakpointTable

classmethod from_year(year)

Load a bundled EUCAST table by calendar year of publication.

EUCAST publishes annually but the version-to-year mapping isn’t a clean function (mid-year dot releases exist). When several bundled versions match the same year, the highest version is returned.

Parameters:

year (int)

Return type:

BreakpointTable

classmethod from_latest()

Load the highest-numbered bundled EUCAST table.

Return type:

BreakpointTable

classmethod list_available()

List bundled EUCAST version strings, sorted numerically.

Return type:

list[str]
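
Numeric sorting matters because version strings do not order correctly as plain text ("9.0" sorts after "10.0" lexicographically). The sketch below shows one way to get a numeric order; the version list is hypothetical and does not reflect which tables are actually bundled, and version_key is an illustrative helper rather than part of the library.

```python
def version_key(version):
    """Turn '13.1' into (13, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


bundled = ["9.0", "16.0", "13.1", "13.0", "10.0"]  # hypothetical list

print(sorted(bundled))                    # lexicographic: '9.0' ends up last
ordered = sorted(bundled, key=version_key)
print(ordered)                            # numeric order
latest = ordered[-1]                      # highest-numbered version
print(latest)
```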

class maldiamrkit.susceptibility.BreakpointResult(category, atu, source)

Bases: object

Result of applying a clinical breakpoint to a single MIC value.

Variables:
  • category ({"S", "I", "R"} or None) – Clinical category. "S" (Susceptible, standard dosing), "I" (Susceptible, increased exposure – modern EUCAST), or "R" (Resistant). None when the lookup failed (no row for this (species, drug), or MIC is NaN).

  • atu (bool) – True when the MIC value falls in the species/drug ATU range. Orthogonal to category – not a third clinical category.

  • source (str or None) – Provenance string, e.g. "EUCAST v16.0". None when the lookup failed.

category: str | None
atu: bool
source: str | None
__init__(category, atu, source)
Return type:

None

Label Encoding

class maldiamrkit.susceptibility.LabelEncoder(intermediate=IntermediateHandling.susceptible)

Bases: BaseEstimator, TransformerMixin

Encode R/I/S resistance labels to binary (0/1).

Supports configurable handling of intermediate (I) labels. Accepts both 1-D arrays (single drug) and 2-D DataFrames (multiple drugs).

Parameters:

intermediate (str, default="susceptible") –

How to handle intermediate (“I”) labels:

  • "susceptible": treat I as susceptible (0) - conservative, avoids false resistance calls.

  • "resistant": treat I as resistant (1) - stricter, avoids missing resistance.

  • "drop": remove samples with I labels entirely. Note: this changes the output array length (samples with I labels are excluded) and is not compatible with sklearn pipelines that expect consistent sample counts.

  • "nan": map I to NaN. Useful for multi-drug encoding where each drug is handled independently. Output dtype is float64 (required to hold NaN).

Variables:

classes_ (ndarray) – Array of [0, 1] after fitting.

Raises:

ValueError – If intermediate is not one of the accepted values.

__init__(intermediate=IntermediateHandling.susceptible)
Parameters:

intermediate (str | IntermediateHandling)

Return type:

None

fit(y, **kwargs)

Fit the encoder (no-op, just sets classes_).

Parameters:
  • y (array-like) – Labels to learn from (unused beyond validation).

  • **kwargs (dict) – Additional keyword arguments (unused, accepted for sklearn compatibility).

Return type:

self

transform(y)

Transform labels to binary.

Parameters:

y (array-like or pd.DataFrame) – String labels (R/I/S or resistant/intermediate/susceptible). If a DataFrame is passed, each column is encoded independently.

Returns:

Binary encoded labels. Returns a DataFrame when the input is a DataFrame (or a 2-D ndarray), preserving column names and index. Returns a 1-D ndarray for 1-D input.

Return type:

ndarray or pd.DataFrame

fit_transform(y, **kwargs)

Fit the encoder and transform labels in one step.

Parameters:
  • y (array-like or pd.DataFrame) – String labels (R/I/S or resistant/intermediate/susceptible). If a DataFrame is passed, each column is encoded independently.

  • **kwargs (dict) – Additional keyword arguments (unused, accepted for sklearn compatibility).

Returns:

Binary encoded labels. Returns a DataFrame when the input is a DataFrame, preserving column names and index.

Return type:

ndarray or pd.DataFrame

class maldiamrkit.susceptibility.IntermediateHandling(value)

Bases: str, Enum

Strategy for handling intermediate (I) resistance labels.

Variables:
  • susceptible (str) – Map intermediate to susceptible (0).

  • resistant (str) – Map intermediate to resistant (1).

  • drop (str) – Remove intermediate samples.

  • nan (str) – Map intermediate to NaN.

susceptible = 'susceptible'
resistant = 'resistant'
drop = 'drop'
nan = 'nan'

Label Encoding Example

from maldiamrkit.susceptibility import LabelEncoder

enc = LabelEncoder()  # I -> susceptible (default)
y_binary = enc.fit_transform(["R", "S", "I", "R", "S"])
# array([1, 0, 0, 1, 0])

# Treat intermediate as resistant
enc = LabelEncoder(intermediate="resistant")
y_binary = enc.fit_transform(["R", "S", "I"])
# array([1, 0, 1])

# Drop intermediate samples entirely
enc = LabelEncoder(intermediate="drop")
y_binary = enc.fit_transform(["R", "S", "I"])
# array([1, 0])
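
The "nan" strategy is not shown in the calls above. The mapping rules for all four strategies can be sketched in plain Python as follows; this is an illustration of the documented behaviour using lists, not the library's implementation, and encode_labels is a hypothetical name.

```python
import math

_STRATEGIES = ("susceptible", "resistant", "drop", "nan")


def encode_labels(labels, intermediate="susceptible"):
    """Map R/I/S labels to 1/0, handling I according to `intermediate`."""
    if intermediate not in _STRATEGIES:
        raise ValueError(f"intermediate must be one of {_STRATEGIES}")
    encoded = []
    for label in labels:
        if label == "R":
            encoded.append(1)
        elif label == "S":
            encoded.append(0)
        elif label == "I":
            if intermediate == "susceptible":
                encoded.append(0)
            elif intermediate == "resistant":
                encoded.append(1)
            elif intermediate == "nan":
                encoded.append(math.nan)
            # "drop": append nothing, so the output is shorter than the input
        else:
            raise ValueError(f"unknown label: {label!r}")
    if intermediate == "nan":
        # NaN needs a float container, mirroring the float64 output dtype.
        return [float(v) for v in encoded]
    return encoded


print(encode_labels(["R", "S", "I"], intermediate="nan"))
print(encode_labels(["R", "S", "I"], intermediate="drop"))
```

Only "drop" changes the number of samples, which is why it is the one strategy that does not fit pipelines expecting aligned X and y.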

MIC Encoding Example

End-to-end: from raw MIC strings to log2(MIC) regression targets and S/I/R category labels, using a bundled EUCAST breakpoint table. The regression evaluator (maldiamrkit.evaluation.mic_regression_report()) is imported from the Evaluation module.

from maldiamrkit.susceptibility import BreakpointTable, MICEncoder
from maldiamrkit.evaluation import mic_regression_report

# Load the latest bundled EUCAST table
bp = BreakpointTable.from_latest()

enc = MICEncoder(
    breakpoints=bp,
    species_col="Species",
    drug="Ceftriaxone",
)
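# meta is assumed to be a pandas DataFrame with one row per isolate and
# at least the "MIC" (default mic_col) and "Species" columns
targets = enc.fit_transform(meta)  # log2_mic, censored, category, atu, source

# Evaluate regression predictions against ground truth
# (y_pred_log2 is assumed to hold model predictions on the log2(MIC) scale)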
report = mic_regression_report(
    y_true=targets["log2_mic"],
    y_pred=y_pred_log2,
    breakpoints=bp,
    species="Klebsiella pneumoniae",
    drug="Ceftriaxone",
)
print(report["rmse_log2"], report["essential_agreement"])
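
For intuition about the two keys printed above, the sketch below computes a root-mean-square error on the log2 scale and an essential agreement rate, using the usual definition of essential agreement as a prediction within ±1 two-fold dilution of the reference MIC. How mic_regression_report() computes these internally (for example, whether predictions are rounded to whole dilutions first) is not stated here, so treat this as an assumption-laden illustration; the sample values are made up.

```python
import math


def rmse_log2(y_true, y_pred):
    """Root-mean-square error between log2(MIC) targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def essential_agreement(y_true, y_pred, tolerance=1.0):
    """Fraction of predictions within `tolerance` log2 dilutions of the truth."""
    hits = [abs(t - p) <= tolerance for t, p in zip(y_true, y_pred)]
    return sum(hits) / len(hits)


# Made-up log2(MIC) values for four isolates.
y_true = [-1.0, 3.0, 5.0, 0.0]
y_pred = [0.0, 3.0, 2.0, 0.0]

print(rmse_log2(y_true, y_pred))
print(essential_agreement(y_true, y_pred))  # 3 of 4 within one dilution
```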