OpenProteo

OpenProteo is a pure-Rust mass-spectrometry I/O stack. It reads native vendor acquisitions from Thermo Fisher, Bruker, and Waters instruments and emits standards-compliant mzML, Arrow record batches, or native Rust / Python data structures - without any vendor SDK, runtime, or binary blob.

What is in the box

Component            Purpose
openproteo-core      Shared schema, mzML writer, conformance harness, Arrow.
openproteo-io        Umbrella crate: auto-detects vendor format and dispatches.
openproteo-io-cli    vendor2mzml binary: one-shot conversion + introspection.
openproteo-io-py     PyO3 bindings (openproteo-io on PyPI).
opentfraw            Thermo Finnigan .raw reader (Rust 2021, MSRV 1.75).
opentimstdf          Bruker .d/ (TDF) reader.
openwraw             Waters MassLynx .raw/ reader.
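
To make the "auto-detects vendor format and dispatches" role concrete, here is a minimal sketch of how such detection could work from the path alone. It assumes, based on the trailing slashes in the table, that Thermo .raw is a single file while Bruker .d/ and Waters .raw/ are directories. The Vendor enum and the classify and detect functions are hypothetical names for illustration, not the openproteo-io API, and a real detector would presumably also inspect file contents.

```rust
use std::path::Path;

/// Hypothetical vendor tag used only in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Vendor {
    Thermo,
    Bruker,
    Waters,
}

/// Classify an acquisition from its extension and whether it is a directory.
/// Assumption: `.raw` file => Thermo, `.raw` directory => Waters,
/// `.d` directory => Bruker. Anything else is left unclassified.
pub fn classify(path: &Path, is_dir: bool) -> Option<Vendor> {
    // Compare extensions case-insensitively (e.g. `.RAW`).
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match (ext.as_str(), is_dir) {
        ("raw", false) => Some(Vendor::Thermo),
        ("raw", true) => Some(Vendor::Waters),
        ("d", true) => Some(Vendor::Bruker),
        _ => None,
    }
}

/// Convenience wrapper that asks the filesystem whether `path` is a directory.
pub fn detect(path: &Path) -> Option<Vendor> {
    classify(path, path.is_dir())
}
```

Keeping classify free of filesystem access makes the dispatch rule easy to unit-test; detect is the thin wrapper a caller would use on real paths.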

Design goals

  1. Pure Rust, no vendor SDK. No Thermo .NET assemblies, no Bruker shared library, no MassLynx COM server. The reader stack forbids unsafe_code throughout.
  2. mzML byte-stability. The same input on the same OpenProteo release always produces byte-identical mzML. Conformance tests pin this against the PSI-MS controlled vocabulary.
  3. One schema across vendors. Every reader yields the same SpectrumRecord shape; Arrow batches share a single schema across Thermo / Bruker / Waters.
  4. Streaming where possible. Spectra are produced as an iterator; the mzML writer never buffers the full run.
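
Goals 2 to 4 can be illustrated together with a small sketch: a single record type, a lazy iterator standing in for a vendor reader, and a writer that handles one record at a time and formats numbers with fixed precision so repeated runs give identical bytes. Everything here is an assumption for illustration: the SpectrumRecord fields, write_stream, and demo_reader are hypothetical, and the output is a simplified XML-like format, not real mzML.

```rust
use std::io::{self, Write};

/// Hypothetical record shape; the real SpectrumRecord fields may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct SpectrumRecord {
    pub index: usize,
    pub ms_level: u8,
    pub mz: Vec<f64>,
    pub intensity: Vec<f32>,
}

/// Write spectra as they arrive. Only the current record is held in memory,
/// so the full run is never buffered by the writer.
pub fn write_stream<W, I>(mut out: W, spectra: I) -> io::Result<usize>
where
    W: Write,
    I: IntoIterator<Item = SpectrumRecord>,
{
    let mut count = 0;
    for s in spectra {
        writeln!(
            out,
            "<spectrum index=\"{}\" msLevel=\"{}\" peaks=\"{}\">",
            s.index,
            s.ms_level,
            s.mz.len()
        )?;
        for (mz, inten) in s.mz.iter().zip(&s.intensity) {
            // Fixed precision keeps the text form of each number deterministic.
            writeln!(out, "  <peak mz=\"{:.6}\" intensity=\"{:.3}\"/>", mz, inten)?;
        }
        writeln!(out, "</spectrum>")?;
        count += 1;
    }
    Ok(count)
}

/// Stand-in for a vendor reader: yields records lazily, one per call to `next`.
pub fn demo_reader(n: usize) -> impl Iterator<Item = SpectrumRecord> {
    (0..n).map(|i| SpectrumRecord {
        index: i,
        ms_level: 1,
        mz: vec![100.0 + i as f64, 200.5],
        intensity: vec![10.0, 20.0],
    })
}
```

A byte-stability check in this style converts the same input twice and compares the two output buffers for equality, which is the property goal 2 describes.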

Where to start