{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MaldiAMRKit - Peak Detection\n", "\n", "This notebook covers peak detection methods including local maxima and persistent homology." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import Libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "from sklearn.linear_model import LogisticRegression\nfrom sklearn.model_selection import StratifiedKFold, cross_val_score\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.preprocessing import StandardScaler\n\nfrom maldiamrkit import MaldiSet\nfrom maldiamrkit.detection import MaldiPeakDetector\nfrom maldiamrkit.visualization import plot_peaks" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Dataset" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Features shape: (29, 6000)\n", "Labels shape: (29,)\n" ] } ], "source": [ "data = MaldiSet.from_directory(\n", " \"../data/\",\n", " \"../data/metadata/metadata.csv\",\n", " aggregate_by=dict(antibiotics=\"Drug\"),\n", ")\n", "X = data.X\n", "y = data.y[\"Drug\"].map({\"S\": 0, \"I\": 1, \"R\": 1})\n", "\n", "print(f\"Features shape: {X.shape}\")\n", "print(f\"Labels shape: {y.shape}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Peak Detection with Local Maxima\n", "\n", "The `MaldiPeakDetector` uses local maxima detection by default. It's fast and works well for most cases." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CV ROC AUC: 0.417 +/- 0.260\n" ] } ], "source": [ "cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)\n", "\n", "pipe = Pipeline(\n", " [\n", " (\"peaks\", MaldiPeakDetector(binary=False, prominence=1e-8)),\n", " (\"scaler\", StandardScaler()),\n", " (\"clf\", LogisticRegression()),\n", " ]\n", ")\n", "\n", "scores = cross_val_score(pipe, X, y, cv=cv, scoring=\"roc_auc\")\n", "print(f\"CV ROC AUC: {scores.mean():.3f} +/- {scores.std():.3f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Peak Detection with Persistent Homology\n", "\n", "Persistent homology (`method=\"ph\"`) is a topological approach that ranks each peak by its persistence, roughly how far it rises above the level at which it merges into a taller neighbor, so small noise-induced maxima can be filtered out. It is slower than local maxima detection but often more robust to noise.\n", "\n", "**Note:** Requires the `gudhi` package." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | n_peaks | mean_intensity | max_intensity |
|---|---|---|---|
| 10s | 40 | 0.001594 | 0.005164 |
| 11s | 43 | 0.001695 | 0.006086 |
| 12s | 48 | 0.001844 | 0.007017 |
| 13s | 56 | 0.001946 | 0.007615 |
| 14s | 49 | 0.001784 | 0.005047 |