Detection Module

Peak detection algorithms and transformers.

MaldiPeakDetector can process spectra in parallel via the n_jobs parameter; set n_jobs=-1 to use all available CPU cores.

MaldiPeakDetector

class maldiamrkit.detection.MaldiPeakDetector(method=PeakMethod.local, binary=True, persistence_threshold=1e-06, n_jobs=1, prominence=None, height=None, distance=None, width=None, **kwargs)

Bases: BaseEstimator, TransformerMixin

Peak detector for MALDI-TOF spectra with local maxima and topological methods.

The transformer maintains the original feature dimension; all non-peak positions are set to 0. Peaks can be returned as binary flags or with their original intensities.
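The masking behaviour described above can be illustrated for a single spectrum with a minimal sketch. It assumes that the "local" method amounts to calling scipy.signal.find_peaks and zeroing every other bin. `mask_non_peaks` is a hypothetical helper, not part of maldiamrkit.

```python
import numpy as np
from scipy.signal import find_peaks


def mask_non_peaks(spectrum: np.ndarray, binary: bool = True, **find_peaks_kwargs) -> np.ndarray:
    """Hypothetical helper: keep only peak bins of a 1-D spectrum, zero elsewhere."""
    peak_idx, _ = find_peaks(spectrum, **find_peaks_kwargs)
    out = np.zeros(len(spectrum), dtype=float)  # same length as the input
    # Peak bins become 1 (binary flags) or keep their original intensity
    out[peak_idx] = 1.0 if binary else spectrum[peak_idx]
    return out


spectrum = np.array([0.0, 0.2, 0.1, 0.0, 0.5, 0.3, 0.0])
print(mask_non_peaks(spectrum).tolist())                  # [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(mask_non_peaks(spectrum, binary=False).tolist())    # [0.0, 0.2, 0.0, 0.0, 0.5, 0.0, 0.0]
print(mask_non_peaks(spectrum, prominence=0.3).tolist())  # [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

The output has the same length as the input in every case. Only the values at peak positions differ between binary=True and binary=False.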

Parameters:
  • method ({"local", "ph"}, default="local") – Detection method to use:
      - "local": local maxima detection using scipy.signal.find_peaks
      - "ph": persistent homology based detection using gudhi

  • binary (bool, default=True) – If True, peaks are marked with 1; otherwise, original intensity is kept.

  • persistence_threshold (float, default=1e-6) – Minimum persistence (death - birth) required for a peak when using method="ph". For normalized spectra (sum=1), typical values are 1e-6 to 1e-4. Higher values detect fewer, more prominent peaks.

  • n_jobs (int, default=1) – Number of parallel jobs for peak detection. Use -1 for all available cores or 1 for sequential processing. Parallelization is particularly beneficial for the CPU-intensive "ph" method.

  • prominence (float or None, default=None) – Minimum prominence of peaks (recommended: 1e-5 to 1e-2). Passed to scipy.signal.find_peaks() when method="local".

  • height (float or None, default=None) – Minimum height of peaks. Passed to scipy.signal.find_peaks() when method="local".

  • distance (int or None, default=None) – Minimum distance between peaks in bins. Passed to scipy.signal.find_peaks() when method="local".

  • width (float or None, default=None) – Minimum width of peaks. Passed to scipy.signal.find_peaks() when method="local".

  • **kwargs – Additional keyword arguments passed to scipy.signal.find_peaks() when method="local".
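The "ph" method thresholds peaks by persistence and delegates the computation to gudhi. The sketch below avoids gudhi. It computes 0-dimensional persistence of local maxima directly, using a union-find over a superlevel-set filtration: bins are added from highest to lowest intensity, and when two components merge, the one with the lower peak dies (the elder rule). A peak's persistence is therefore its height minus the height at which it merges, which matches death - birth when the filtration is built on the negated signal. This is an illustration of the technique only. Whether maldiamrkit builds its filtration exactly this way is an assumption, and `peak_persistence` is a hypothetical name.

```python
import numpy as np


def peak_persistence(signal: np.ndarray) -> dict:
    """0-dimensional persistence of each local maximum (superlevel-set filtration).

    Returns {peak_index: persistence}; the global maximum never dies and gets inf.
    """
    n = len(signal)
    parent = [-1] * n  # -1 marks bins not yet added to the filtration
    peak_of = {}       # component root -> index of the peak that created it
    persistence = {}

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    order = np.argsort(-signal, kind="stable")  # highest intensity first
    for i in map(int, order):
        roots = {find(j) for j in (i - 1, i + 1) if 0 <= j < n and parent[j] != -1}
        if not roots:
            parent[i] = i   # no active neighbour: a new local maximum is born
            peak_of[i] = i
            continue
        # Elder rule: the component with the highest peak survives the merge
        ranked = sorted(roots, key=lambda r: signal[peak_of[r]], reverse=True)
        survivor = ranked[0]
        parent[i] = survivor
        for young in ranked[1:]:
            p = peak_of[young]
            persistence[p] = float(signal[p] - signal[i])  # peak height - merge height
            parent[young] = survivor
    global_peak = peak_of[find(int(order[0]))]
    persistence[global_peak] = float("inf")
    return persistence


pers = peak_persistence(np.array([0.0, 0.2, 0.1, 0.0, 0.5, 0.3, 0.0]))
print(pers)  # {1: 0.2, 4: inf}
kept = sorted(p for p, v in pers.items() if v >= 0.3)
print(kept)  # [4]
```

The final filter plays the role of persistence_threshold. Raising the threshold discards the smaller peak at index 1. Using `>=` rather than `>` for the comparison is an assumption.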

Notes

For MALDI-TOF spectra normalized to sum=1:
  • prominence=1e-5 to 1e-3 typically works well for local maxima
  • persistence_threshold=1e-6 to 1e-4 for persistent homology
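The ranges above assume every spectrum sums to 1. Here is a minimal pandas sketch of that total-intensity normalization, assuming rows are spectra and columns are bins. `normalize_tic` is a hypothetical helper; maldiamrkit may provide its own preprocessing step, which is not documented here.

```python
import numpy as np
import pandas as pd


def normalize_tic(spectra: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: scale each row (spectrum) so its intensities sum to 1."""
    totals = spectra.sum(axis=1)
    # Divide row-wise; any resulting NaN (e.g. from an all-zero row) is set to 0
    return spectra.div(totals.replace(0, np.nan), axis=0).fillna(0.0)


raw = pd.DataFrame([[2.0, 6.0, 2.0], [1.0, 1.0, 2.0]], columns=["bin_0", "bin_1", "bin_2"])
norm = normalize_tic(raw)
print(norm.sum(axis=1).tolist())  # each row now sums to 1 (up to float rounding)
```

After this step, thresholds such as prominence=1e-4 are relative to each spectrum's total intensity rather than to raw counts.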

Raises:

ValueError – If method is not one of "local" or "ph".

Examples

>>> # Local maxima detection with prominence filter
>>> detector = MaldiPeakDetector(method="local", prominence=0.01)
>>> peaks = detector.fit_transform(spectra_df)
>>> # Persistent homology based detection
>>> detector = MaldiPeakDetector(method="ph", persistence_threshold=1e-6)
>>> peaks = detector.fit_transform(spectra_df)

__init__(method=PeakMethod.local, binary=True, persistence_threshold=1e-06, n_jobs=1, prominence=None, height=None, distance=None, width=None, **kwargs)

Return type:

None

fit(X, y=None)

Fit the peak detector (no learning performed).

Parameters:
  • X (pd.DataFrame) – Input spectra with shape (n_samples, n_bins).

  • y (array-like, optional) – Target values (ignored).

Returns:

self – Fitted transformer.

Return type:

MaldiPeakDetector

Raises:

ValueError – If the input DataFrame is empty.

transform(X)

Detect peaks in each spectrum and mask non-peak positions.

Parameters:

X (pd.DataFrame or pd.Series) – Input spectra with shape (n_samples, n_bins).

Returns:

X_peaks – Transformed spectra where non-peak positions are set to 0. Peak positions contain 1 (if binary=True) or original intensity.

Return type:

pd.DataFrame or pd.Series

fit_transform(X, y=None, **fit_params)

Fit and transform in one step.

Parameters:
  • X (pd.DataFrame or pd.Series) – Input spectra with shape (n_samples, n_bins).

  • y (array-like, optional) – Target values (ignored).

  • **fit_params – Additional fit parameters (unused).

Returns:

X_peaks – Transformed spectra with detected peaks.

Return type:

pd.DataFrame or pd.Series

get_peak_statistics(X)

Get statistics about detected peaks for each spectrum.

Parameters:

X (pd.DataFrame or pd.Series) – Input spectra with shape (n_samples, n_bins).

Returns:

stats – DataFrame with columns:
  • n_peaks: number of peaks detected
  • mean_intensity: mean intensity of detected peaks
  • max_intensity: maximum intensity of detected peaks

Return type:

pd.DataFrame
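The three statistics can be reproduced with pandas for spectra that have already been masked. This sketch assumes you hold the output of transform with binary=False, so that peak bins carry their (strictly positive) intensities and all other bins are 0. get_peak_statistics itself takes the raw spectra and runs detection internally. `peak_statistics` is a hypothetical name.

```python
import pandas as pd


def peak_statistics(peaks: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: summarise a peak-masked matrix (non-peak bins = 0)."""
    nonzero = peaks.where(peaks > 0)  # non-peak bins become NaN and are skipped below
    return pd.DataFrame({
        "n_peaks": (peaks > 0).sum(axis=1),
        "mean_intensity": nonzero.mean(axis=1),
        "max_intensity": nonzero.max(axis=1),
    })


masked = pd.DataFrame([[0.0, 0.2, 0.0, 0.5], [0.0, 0.0, 0.0, 0.0]], index=["s1", "s2"])
stats = peak_statistics(masked)
print(stats)  # s1: 2 peaks, mean 0.35, max 0.5; s2: 0 peaks, NaN intensities
```

A spectrum with no detected peaks gets NaN in the intensity columns in this sketch. How the library reports that case is not stated above.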

Peak Detection Methods

class maldiamrkit.detection.PeakMethod(value)

Bases: str, Enum

Supported peak detection methods.

Variables:
  • local (str) – Local maxima detection via scipy.signal.find_peaks.

  • ph (str) – Persistent homology based peak detection.

local = 'local'
ph = 'ph'
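PeakMethod subclasses both str and Enum, so its members compare equal to their plain string values. That is why method="local" and method=PeakMethod.local can be used interchangeably. The stand-in class below mirrors only the two documented members and is not imported from maldiamrkit. That the documented ValueError for an unknown method comes from such a lookup is an assumption.

```python
from enum import Enum


class PeakMethod(str, Enum):
    """Stand-in mirroring the documented members; not imported from maldiamrkit."""
    local = "local"
    ph = "ph"


# A str-based Enum member compares equal to its plain string value...
assert PeakMethod.local == "local"
# ...and can be looked up from a user-supplied string
assert PeakMethod("ph") is PeakMethod.ph

# Unknown strings are rejected with ValueError
try:
    PeakMethod("wavelet")  # hypothetical, unsupported method name
except ValueError as err:
    print(f"rejected: {err}")
```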

Parallel Processing Example

from maldiamrkit.detection import MaldiPeakDetector

# X: spectra as a pandas DataFrame of shape (n_samples, n_bins)
# Parallel peak detection
detector = MaldiPeakDetector(
    method="local",
    prominence=0.01,
    n_jobs=-1,  # use all available CPU cores
)
peaks = detector.fit_transform(X)
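The n_jobs=-1 convention follows the scikit-learn/joblib style, where negative values count back from the number of CPUs. This hypothetical sketch is not how MaldiPeakDetector works internally, which is not documented here. It resolves n_jobs to a worker count and runs scipy.signal.find_peaks on each spectrum (row) concurrently. It uses a thread pool for portability. For CPU-heavy work such as the "ph" method, a process pool or joblib would typically scale better.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd
from scipy.signal import find_peaks


def resolve_n_jobs(n_jobs: int) -> int:
    """Map the joblib-style n_jobs convention to a worker count."""
    cpus = os.cpu_count() or 1
    if n_jobs < 0:
        # -1 -> all cores, -2 -> all but one, ...
        return max(1, cpus + 1 + n_jobs)
    return max(1, n_jobs)  # 0 is treated as sequential in this sketch


def detect_rows(spectra: pd.DataFrame, n_jobs: int = 1, **find_peaks_kwargs) -> pd.DataFrame:
    """Hypothetical helper: binary peak mask per row, rows processed in parallel."""
    def one_row(values: np.ndarray) -> np.ndarray:
        mask = np.zeros(values.shape, dtype=int)
        idx, _ = find_peaks(values, **find_peaks_kwargs)
        mask[idx] = 1
        return mask

    rows = spectra.to_numpy(dtype=float)
    with ThreadPoolExecutor(max_workers=resolve_n_jobs(n_jobs)) as pool:
        masks = list(pool.map(one_row, rows))  # map preserves row order
    return pd.DataFrame(masks, index=spectra.index, columns=spectra.columns)


demo = pd.DataFrame([[0.0, 0.2, 0.1, 0.0, 0.5, 0.3, 0.0],
                     [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]])
peaks_parallel = detect_rows(demo, n_jobs=-1)
peaks_serial = detect_rows(demo, n_jobs=1)
```

Because each spectrum is processed independently, the parallel and sequential results are identical. Only the wall-clock time changes.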