TIMS-Bench: Towards community standards for benchmarking untargeted trapped ion mobility metabolomics tools and datasets
This repository contains code and data described in detail in our paper (Rajkumar et al., 2026).
If you have found our manuscript useful in your work, please consider citing:
- Python >= 3.13.5
- UV for environment management
- A Linux machine is recommended for computing the DreaMS embeddings (see the note in How to run)
Datasets are publicly available and can be directly downloaded from Zenodo (DOI: TBD).
Unzip the downloaded files and place them under the data/ directory as described in Repository structure.
.
├── benchmarking/ # Python package with shared utilities
│ ├── harmonizer/ # Tool-specific parsers (MetaboScape, MS-DIAL, MZmine)
│ ├── metrics/ # Benchmarking metrics (base metrics, clique analysis)
│ ├── similarity/ # Spectral similarity methods (cosine, entropy, DreaMS)
│ ├── constants.py
│ ├── loader.py
│ ├── plots.py
│ └── utils.py
├── data/ # Downloaded from Zenodo (not tracked by git)
│ ├── groundtruth_dataset/ # MSV000098263, plant_spikein, nist_srm
│ ├── library_spectra/ # Reference library files (.parquet, .pq)
│ └── public_dataset/ # E.g. MSV000084402, MSV000090327, ...
├── figures/ # Output figures for the manuscript
├── notebooks/ # Analysis notebooks (run in numbered order)
│ ├── 01a_harmonization.ipynb
│ ├── 01b_annotations.ipynb
│ ├── 01b_annotations.py
│ ├── 01c_dataset_qc.ipynb
│ ├── 02_tolerance_selection.ipynb
│ ├── 03_base_metrics.ipynb
│ ├── 03_groundtruth_metrics.ipynb
│ ├── 04a_reframe_based_metrics.ipynb
│ ├── 04b_reframe_css_evaluation.ipynb
│ ├── 04c_reframe_mirror_plots.ipynb
│ ├── 05_nist_srm_based_metrics.ipynb
│ ├── 06_plant_spikein_base_metrics.ipynb
│ └── 06_plant_spikein_overlap.ipynb
└── pyproject.toml
Each dataset folder under groundtruth_dataset/ and public_dataset/ follows the same layout:
{dataset}/
├── raw/ # Original tool exports (MetaboScape, MS-DIAL, MZmine)
├── harmonized/ # Unified parquet files per tool
├── annotated_cosine_similarity/
├── annotated_spectral_entropy/
├── annotated_dreams_similarity/
└── embeddings/ # DreaMS embedding files (.npz)
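To confirm that a data/ directory matches this layout, you could use a small script like the one below. It is a minimal sketch and not part of the benchmarking package: the names `EXPECTED_SUBDIRS`, `missing_subdirs` and `check_data_root` are hypothetical, and only folder names are checked, not file contents. This README does not say which subfolders ship in the Zenodo archive and which are produced by the notebooks, so a missing folder may simply mean that the corresponding notebook has not been run yet.

```python
from pathlib import Path

# Subfolders expected in every dataset folder, following the layout above.
EXPECTED_SUBDIRS = (
    "raw",
    "harmonized",
    "annotated_cosine_similarity",
    "annotated_spectral_entropy",
    "annotated_dreams_similarity",
    "embeddings",
)


def missing_subdirs(dataset_dir: Path) -> list[str]:
    """Return the expected subfolders that are absent from dataset_dir."""
    return [name for name in EXPECTED_SUBDIRS if not (dataset_dir / name).is_dir()]


def check_data_root(data_root: Path) -> dict[str, list[str]]:
    """Map each dataset folder under groundtruth_dataset/ and public_dataset/
    to the subfolders it is missing (an empty list means complete)."""
    report: dict[str, list[str]] = {}
    for group in ("groundtruth_dataset", "public_dataset"):
        group_dir = data_root / group
        if not group_dir.is_dir():
            continue  # group not downloaded yet
        for dataset_dir in sorted(p for p in group_dir.iterdir() if p.is_dir()):
            report[f"{group}/{dataset_dir.name}"] = missing_subdirs(dataset_dir)
    return report


if __name__ == "__main__":
    for name, missing in check_data_root(Path("data")).items():
        status = "OK" if not missing else "missing: " + ", ".join(missing)
        print(f"{name}: {status}")
```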
The library_spectra/ folder contains the following files:
.
├── all_sorted_library_spectra.parquet
├── all_sorted_library_spectra.npz # DreaMS embedding files (.npz)
├── nist_srm_spikein_lib.pq
├── plant_spikein_lib.pq
├── reframe_ms2s_with_ccs.parquet
└── reframe_spikein_lib.pq
- Clone the repository:
git clone https://github.com/enveda/benchmarking-untargeted-metabolomics-software.git
cd benchmarking-untargeted-metabolomics-software
- Prepare the data/ directory as described in the Data and repository section.
- Install dependencies using UV:
uv sync
- Run the notebooks in numbered order. Select the UV virtual environment as the kernel, or launch Jupyter directly:
uv run jupyter notebook
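If you prefer to run the notebooks non-interactively, one option is to execute them in order with `jupyter nbconvert --execute`. The sketch below is an assumption and not a script provided by this repository: `notebook_order` and `run_all` are hypothetical helpers, and the sketch assumes `jupyter nbconvert` is available in the uv environment. It only collects `.ipynb` files, so 01b_annotations.py (the DreaMS script) still has to be run separately.

```python
import re
import subprocess
from pathlib import Path

# Matches the leading number and optional letter, e.g. "04b" in "04b_reframe_css_evaluation".
_PREFIX = re.compile(r"^(\d+)([a-z]?)_")


def notebook_order(notebook_dir: Path) -> list[Path]:
    """Return the .ipynb files sorted by numeric prefix, then optional letter,
    then file name (so notebooks sharing a prefix keep a stable order)."""

    def key(path: Path) -> tuple[int, str, str]:
        match = _PREFIX.match(path.name)
        if match is None:
            return (10**9, "", path.name)  # unnumbered notebooks go last
        return (int(match.group(1)), match.group(2), path.name)

    return sorted(notebook_dir.glob("*.ipynb"), key=key)


def run_all(notebook_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Build one nbconvert command per notebook; execute them only if dry_run is False."""
    commands = [
        ["uv", "run", "jupyter", "nbconvert", "--to", "notebook",
         "--execute", "--inplace", str(path)]
        for path in notebook_order(notebook_dir)
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing notebook
    return commands


if __name__ == "__main__":
    for cmd in run_all(Path("notebooks")):
        print(" ".join(cmd))
```

For the current file names a plain alphabetical sort gives the same order; the explicit key only guards against a future `10_` notebook sorting before `02_`. By default `run_all` is a dry run that returns the commands without executing them.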
To run the standalone Python script (used only for DreaMS matching):
uv run python notebooks/01b_annotations.py
NOTE: The DreaMS embeddings and matching were run separately on a Linux server. Make sure your environment is configured as described in the DreaMS GitHub repository.
- 01a_harmonization - Reads the raw output files of each tool (MetaboScape, MS-DIAL, MZmine) and generates harmonized feature tables for analysis.
- 01b_annotations - Annotates the harmonized feature tables against a spectral library using two similarity approaches (spectral entropy and cosine).
- 01b_annotations.py - Python script used to run the DreaMS similarity search. Works only in a Linux environment.
- 01c_dataset_qc - Merges and performs quality control on feature tables from public and internal datasets across multiple tools with configurable similarity thresholds.
- 02_tolerance_selection - Identifies optimal MS1 and MS2 tolerance parameters by testing varied tolerance values on the ReFRAME library dataset and comparing annotation results.
- 03_base_metrics - Computes and visualizes base performance metrics across 10 public metabolomics datasets, comparing detection and annotation performance across analysis tools.
- 03_groundtruth_metrics - Calculates and visualizes base metrics across three ground-truth datasets (ReFRAME, NIST SRM, plant spike-in) with radar plots comparing tool performance.
- 04a_reframe_based_metrics - Analyzes ReFRAME spike-in library performance using precision-recall curves, F1 scores, and CCS error distributions across different similarity thresholds and annotation methods.
- 04b_reframe_css_evaluation - Evaluates CCS-based discrimination of structural isomers from the ReFRAME library using relative CCS differences and ion mobility separation thresholds.
- 04c_reframe_mirror_plots - Generates spectral mirror plots comparing experimental MS2 spectra against ReFRAME library reference spectra to visually validate annotations.
- 05_nist_srm_based_metrics - Computes precision-recall curves and R² distributions for the NIST SRM spike-in dataset to evaluate annotation accuracy and correlation with expected concentrations.
- 06_plant_spikein_base_metrics - Analyzes plant spike-in dataset performance using precision-recall metrics, R² distributions, and concentration-dependent recovery curves across analysis tools.
- 06_plant_spikein_overlap - Visualizes compound detection overlap across analysis tools at different spike-in concentrations using Venn diagrams and identifies compounds detected at all concentration levels.
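For readers unfamiliar with the two library-matching scores used in 01b_annotations, the sketch below shows the textbook form of each on toy spectra: cosine similarity, and the unweighted spectral entropy similarity 1 - (2*S_AB - S_A - S_B) / ln(4), where S is the Shannon entropy of intensities normalized to sum to 1 and AB is the 50/50 merged spectrum. It is illustrative only and is not the implementation used in this repository. The greedy peak matching, the absolute tolerance in Da (`tol_da`), and all function names are assumptions, and real pipelines typically add preprocessing (noise filtering, precursor removal, intensity weighting) that is omitted here.

```python
import math

Spectrum = list[tuple[float, float]]  # (m/z, intensity) pairs


def match_peaks(a: Spectrum, b: Spectrum, tol_da: float) -> list[tuple[float, float]]:
    """Greedy one-to-one matching of peaks whose m/z differ by <= tol_da,
    most intense pairs first. Unmatched peaks are paired with 0."""
    candidates = sorted(
        ((ia * ib, i, j)
         for i, (mza, ia) in enumerate(a)
         for j, (mzb, ib) in enumerate(b)
         if abs(mza - mzb) <= tol_da),
        reverse=True,
    )
    used_a: set[int] = set()
    used_b: set[int] = set()
    pairs: list[tuple[float, float]] = []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((a[i][1], b[j][1]))
    pairs += [(ia, 0.0) for i, (_, ia) in enumerate(a) if i not in used_a]
    pairs += [(0.0, ib) for j, (_, ib) in enumerate(b) if j not in used_b]
    return pairs


def cosine_similarity(a: Spectrum, b: Spectrum, tol_da: float = 0.01) -> float:
    pairs = match_peaks(a, b, tol_da)
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def _entropy(probs: list[float]) -> float:
    """Shannon entropy (natural log) of a distribution that sums to 1."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_similarity(a: Spectrum, b: Spectrum, tol_da: float = 0.01) -> float:
    """Unweighted spectral entropy similarity: 1 - (2*S_AB - S_A - S_B) / ln(4)."""
    pairs = match_peaks(a, b, tol_da)
    total_a = sum(x for x, _ in pairs)
    total_b = sum(y for _, y in pairs)
    if not total_a or not total_b:
        return 0.0
    pa = [x / total_a for x, _ in pairs]
    pb = [y / total_b for _, y in pairs]
    merged = [(x + y) / 2 for x, y in zip(pa, pb)]
    return 1 - (2 * _entropy(merged) - _entropy(pa) - _entropy(pb)) / math.log(4)


if __name__ == "__main__":
    query = [(100.0, 1.0), (150.0, 0.5)]
    library = [(100.0, 1.0), (200.0, 0.5)]
    print(round(cosine_similarity(query, library), 3))   # prints 0.8
    print(round(entropy_similarity(query, library), 3))  # prints 0.667
```

Both scores range from 0 (no shared peaks) to 1 (identical spectra), but they weigh unmatched intensity differently, which is why the two can rank the same library hits differently.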
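02_tolerance_selection and 04b_reframe_css_evaluation revolve around two simple quantities: the mass error in parts per million and the relative difference between two CCS values (in Å²). The helpers below are a hedged sketch rather than the notebooks' code. The names are hypothetical, the relative CCS difference is taken against the mean of the two values (some studies divide by the smaller or the reference value instead), and the separation threshold is a parameter you supply, not the value used in the manuscript.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_ppm(observed_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """True if the absolute mass error is within tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def relative_ccs_diff_percent(ccs_a: float, ccs_b: float) -> float:
    """Absolute CCS difference as a percentage of the mean of the two values."""
    return abs(ccs_a - ccs_b) / ((ccs_a + ccs_b) / 2) * 100


def separable(ccs_a: float, ccs_b: float, threshold_percent: float) -> bool:
    """True if two isomers differ in CCS by at least threshold_percent."""
    return relative_ccs_diff_percent(ccs_a, ccs_b) >= threshold_percent


if __name__ == "__main__":
    print(round(ppm_error(100.001, 100.0), 3))                # prints 10.0
    print(round(relative_ccs_diff_percent(200.0, 204.0), 2))  # prints 1.98
    # 1.0 % is an illustrative threshold, not the manuscript's value.
    print(separable(200.0, 204.0, threshold_percent=1.0))     # prints True
```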