Detection Module
Peak detection algorithms and transformers.
MaldiPeakDetector supports parallel processing via the n_jobs parameter.
Use n_jobs=-1 to utilize all available CPU cores.
MaldiPeakDetector
- class maldiamrkit.detection.MaldiPeakDetector(method=PeakMethod.local, binary=True, persistence_threshold=1e-06, n_jobs=1, prominence=None, height=None, distance=None, width=None, **kwargs)
Bases: BaseEstimator, TransformerMixin
Peak detector for MALDI-TOF spectra with local maxima and topological methods.
The transformer maintains the original feature dimension; all non-peak positions are set to 0. Peaks can be returned as binary flags or with their original intensities.
- Parameters:
method ({"local", "ph"}, default="local") – Detection method to use:
- "local": Local maxima detection using scipy.signal.find_peaks
- "ph": Persistent homology based detection using gudhi
binary (bool, default=True) – If True, peaks are marked with 1; otherwise, original intensity is kept.
persistence_threshold (float, default=1e-6) – Minimum persistence (death - birth) required for a peak when using method="ph". For normalized spectra (sum=1), typical values are 1e-6 to 1e-4. Higher values detect fewer, more prominent peaks.
n_jobs (int, default=1) – Number of parallel jobs for peak detection. Use -1 for all available cores, 1 for sequential processing. Parallelization is particularly beneficial for the "ph" method, which is CPU-intensive.
prominence (float or None, default=None) – Minimum prominence of peaks (recommended: 1e-5 to 1e-2). Passed to scipy.signal.find_peaks() when method="local".
height (float or None, default=None) – Minimum height of peaks. Passed to scipy.signal.find_peaks() when method="local".
distance (int or None, default=None) – Minimum distance between peaks in bins. Passed to scipy.signal.find_peaks() when method="local".
width (float or None, default=None) – Minimum width of peaks. Passed to scipy.signal.find_peaks() when method="local".
**kwargs – Additional keyword arguments passed to scipy.signal.find_peaks() when method="local".
Notes
For MALDI-TOF spectra normalized to sum=1:
- prominence=1e-5 to 1e-3 typically works well for local maxima
- persistence_threshold=1e-6 to 1e-4 for persistent homology
- Raises:
ValueError – If method is not one of "local" or "ph".
Examples
>>> from maldiamrkit.detection import MaldiPeakDetector
>>> # spectra_df: pd.DataFrame of spectra with shape (n_samples, n_bins)
>>> # Local maxima detection with prominence filter
>>> detector = MaldiPeakDetector(method="local", prominence=0.01)
>>> peaks = detector.fit_transform(spectra_df)
>>> # Persistent homology based detection
>>> detector = MaldiPeakDetector(method="ph", persistence_threshold=1e-6)
>>> peaks = detector.fit_transform(spectra_df)
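The masking behavior described for this class (same feature dimension, non-peak positions set to 0, peaks kept as flags or intensities) can be illustrated outside the library. The following is a minimal sketch, assuming a single spectrum held in a NumPy array. `mask_peaks` is a hypothetical helper written for this illustration, not part of maldiamrkit, and it calls scipy.signal.find_peaks directly rather than reproducing the class internals.

```python
import numpy as np
from scipy.signal import find_peaks


def mask_peaks(spectrum, binary=True, **find_peaks_kwargs):
    """Hypothetical helper: zero every non-peak bin of a 1-D spectrum."""
    spectrum = np.asarray(spectrum, dtype=float)
    # find_peaks returns the indices of local maxima that pass the filters
    peak_idx, _ = find_peaks(spectrum, **find_peaks_kwargs)
    masked = np.zeros_like(spectrum)
    # Keep a flag (1.0) or the original intensity at each peak position
    masked[peak_idx] = 1.0 if binary else spectrum[peak_idx]
    return masked


# Toy spectrum (illustrative values only), normalized to sum=1
spectrum = np.array([0.0, 0.1, 0.0, 0.05, 0.6, 0.05, 0.0, 0.2, 0.0])
spectrum = spectrum / spectrum.sum()

# A prominence of 0.15 drops the small maximum at index 1
flags = mask_peaks(spectrum, binary=True, prominence=0.15)
intensities = mask_peaks(spectrum, binary=False, prominence=0.15)
```

Both outputs have the same length as the input, which is what lets the masked spectra feed into downstream steps that expect a fixed number of bins.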
- __init__(method=PeakMethod.local, binary=True, persistence_threshold=1e-06, n_jobs=1, prominence=None, height=None, distance=None, width=None, **kwargs)
- fit(X, y=None)
Fit the peak detector (no learning performed).
- Parameters:
X (pd.DataFrame) – Input spectra with shape (n_samples, n_bins).
y (array-like, optional) – Target values (ignored).
- Returns:
self – Fitted transformer.
- Return type:
MaldiPeakDetector
- Raises:
ValueError – If the input DataFrame is empty.
- transform(X)
Detect peaks in each spectrum and mask non-peak positions.
- Parameters:
X (pd.DataFrame or pd.Series) – Input spectra with shape (n_samples, n_bins).
- Returns:
X_peaks – Transformed spectra where non-peak positions are set to 0. Peak positions contain 1 (if binary=True) or original intensity.
- Return type:
pd.DataFrame or pd.Series
- fit_transform(X, y=None, **fit_params)
Fit and transform in one step.
- Parameters:
X (pd.DataFrame or pd.Series) – Input spectra with shape (n_samples, n_bins).
y (array-like, optional) – Target values (ignored).
**fit_params – Additional fit parameters (unused).
- Returns:
X_peaks – Transformed spectra with detected peaks.
- Return type:
pd.DataFrame or pd.Series
- get_peak_statistics(X)
Get statistics about detected peaks for each spectrum.
- Parameters:
X (pd.DataFrame or pd.Series) – Input spectra with shape (n_samples, n_bins).
- Returns:
stats – DataFrame with columns:
- n_peaks: number of peaks detected
- mean_intensity: mean intensity of detected peaks
- max_intensity: maximum intensity of detected peaks
- Return type:
pd.DataFrame
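To make the three statistics concrete, here is a minimal sketch of how they could be computed from spectra that have already been masked with binary=False, so that peak positions hold intensities and all other bins are 0. `peak_statistics` is a hypothetical helper and the m/z column labels are invented for the example. The library method takes the raw spectra and runs detection itself, so this is an approximation of the output rather than its implementation.

```python
import pandas as pd


def peak_statistics(peaks):
    """Hypothetical helper: summarize an intensity-masked spectra table."""
    is_peak = peaks != 0
    # Replace non-peak bins with NaN so they do not affect mean or max
    peak_values = peaks.where(is_peak)
    return pd.DataFrame(
        {
            "n_peaks": is_peak.sum(axis=1),
            "mean_intensity": peak_values.mean(axis=1),
            "max_intensity": peak_values.max(axis=1),
        }
    )


# Two toy spectra over four bins (illustrative values only)
peaks = pd.DataFrame(
    [[0.0, 0.2, 0.0, 0.6], [0.5, 0.0, 0.0, 0.0]],
    index=["s1", "s2"],
    columns=[2000, 2003, 2006, 2009],
)
stats = peak_statistics(peaks)
```

Each row of `stats` corresponds to one input spectrum, so the result can be joined back onto sample metadata by index.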
Peak Detection Methods
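The "local" method keeps maxima that pass scipy.signal.find_peaks filters, while the "ph" method keeps maxima whose persistence exceeds persistence_threshold. As an illustration of what persistence measures on a 1-D spectrum, the sketch below computes it in pure Python with a union-find sweep from the highest bin downwards. It does not use gudhi and is not the library's implementation. `peak_persistence` is a hypothetical helper, and persistence is taken here as the peak height minus the level at which its component merges into a higher peak, which is the magnitude of the birth/death gap.

```python
def peak_persistence(values):
    """Hypothetical helper: map each local-maximum index to its persistence."""
    parent = {}  # union-find forest; each root is the highest bin of its component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    persistence = {}
    # Sweep a threshold downwards: visit bins from highest to lowest intensity
    for i in sorted(range(len(values)), key=lambda k: -values[k]):
        roots = [find(j) for j in (i - 1, i + 1) if j in parent]
        if not roots:
            parent[i] = i  # a new component is born at a local maximum
            continue
        if len(roots) == 2:
            # Two components meet at bin i: the one with the lower peak dies here
            high, low = sorted(roots, key=lambda r: -values[r])
            persistence[low] = values[low] - values[i]
            parent[low] = high
            roots = [high]
        parent[i] = roots[0]
    # Convention chosen for this sketch: the global maximum never merges,
    # so its persistence is measured against the global minimum
    if len(values) > 0:
        top = find(0)
        persistence[top] = values[top] - min(values)
    return persistence


# Toy signal (illustrative values only)
signal = [0.0, 0.3, 0.1, 0.5, 0.0, 0.2, 0.15]
pers = peak_persistence(signal)

# Thresholding keeps only the sufficiently persistent peaks
kept = sorted(i for i, p in pers.items() if p >= 0.25)
```

Raising the threshold removes shallow peaks first, which matches the documented effect of higher persistence_threshold values detecting fewer, more prominent peaks.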
Parallel Processing Example
from maldiamrkit.detection import MaldiPeakDetector
# X: spectra as a pd.DataFrame with shape (n_samples, n_bins)
# Parallel peak detection
detector = MaldiPeakDetector(
    method="local",
    prominence=0.01,
    n_jobs=-1,  # use all cores
)
peaks = detector.fit_transform(X)
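Peak detection treats each spectrum independently, which is why the work can be split across workers. The sketch below shows that general pattern using only the standard library. All three functions are hypothetical helpers, the thread pool is a stand-in chosen for portability, and this page does not state which backend MaldiPeakDetector uses. Threads will not speed up CPU-bound pure-Python code; the point here is only the n_jobs convention and the order-preserving map.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_n_jobs(n_jobs):
    """Hypothetical helper: turn an n_jobs value into a worker count."""
    if n_jobs == -1:
        return os.cpu_count() or 1  # cpu_count() may return None
    if n_jobs < 1:
        raise ValueError("this sketch only accepts n_jobs=-1 or n_jobs>=1")
    return n_jobs


def count_local_maxima(spectrum):
    """Count interior bins strictly higher than both neighbours."""
    return sum(
        spectrum[i - 1] < spectrum[i] > spectrum[i + 1]
        for i in range(1, len(spectrum) - 1)
    )


def map_spectra(func, spectra, n_jobs=1):
    """Apply func to every spectrum, sequentially or on a worker pool."""
    workers = resolve_n_jobs(n_jobs)
    if workers == 1:
        return [func(s) for s in spectra]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with spectra
        return list(pool.map(func, spectra))


# Toy spectra (illustrative values only)
spectra = [[0, 2, 0, 3, 0], [1, 0, 1, 0, 1], [0, 1, 2, 1, 0]]
sequential = map_spectra(count_local_maxima, spectra, n_jobs=1)
parallel = map_spectra(count_local_maxima, spectra, n_jobs=-1)
```

Because the results come back in input order, the parallel and sequential paths are interchangeable from the caller's point of view.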