
openproteo-core

openproteo-core is the shared Rust foundation every vendor parser in the stack builds on. It defines the vendor-neutral records, the SpectrumSource trait, the canonical mzML 1.1.0 writer, an optional Apache Arrow bridge, and the cross-vendor conformance harness.

Install

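```toml
[dependencies]
openproteo-core = "0.1"

# Or, to enable the optional zero-copy Arrow RecordBatch builder, use this
# line instead (a key may appear only once per TOML table):
# openproteo-core = { version = "0.1", features = ["arrow"] }
```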

The SpectrumSource trait

Every vendor parser (opentfraw, opentimstdf, openwraw) implements this trait. Everything downstream of a parser works against a &mut dyn SpectrumSource. That includes the canonical mzML writer, the Arrow batch builder, the conformance harness, the openproteo-io umbrella and the vendor2mzml CLI.

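```rust
use openproteo_core::{RunMetadata, SpectrumRecord, ChromatogramRecord};

pub trait SpectrumSource {
    /// Run-level metadata (instrument, source format, native id format).
    fn run_metadata(&self) -> RunMetadata;

    /// Streams decoded spectra.
    fn iter_spectra<'a>(&'a mut self)
        -> Box<dyn Iterator<Item = SpectrumRecord> + 'a>;

    /// Streams chromatograms; by default a source yields none.
    fn iter_chromatograms<'a>(&'a mut self)
        -> Box<dyn Iterator<Item = ChromatogramRecord> + 'a> {
        Box::new(std::iter::empty())
    }

    /// Spectrum count when known up front; by default None (unknown).
    fn spectrum_count(&self) -> Option<usize> { None }
}
```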

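The iterators are returned boxed rather than as return-position impl Trait in traits (RPITIT). This keeps the trait dyn-compatible, so the rest of the stack can hold a &mut dyn SpectrumSource. The cost is one heap allocation per iterator and a dynamic dispatch on each next() call.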
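The sketch below mirrors the shape of SpectrumSource using hypothetical stand-in types so it compiles on its own. Spectrum, Source, InMemory and count_ms2 are illustrative names, not crate API. Run metadata and chromatograms are omitted for brevity. The point is the design choice: because iter_spectra returns a boxed iterator, count_ms2 can accept any source behind &mut dyn Source. Changing that return type to impl Iterator would make the dyn Source uses fail to compile.

```rust
/// Hypothetical, heavily simplified stand-in for SpectrumRecord.
#[derive(Clone)]
struct Spectrum {
    ms_level: u8,
}

/// Mirrors the shape of SpectrumSource: the boxed iterator keeps the
/// trait dyn-compatible.
trait Source {
    fn iter_spectra<'a>(&'a mut self) -> Box<dyn Iterator<Item = Spectrum> + 'a>;
    fn spectrum_count(&self) -> Option<usize> {
        None
    }
}

/// A toy in-memory "parser" (hypothetical).
struct InMemory {
    spectra: Vec<Spectrum>,
}

impl Source for InMemory {
    fn iter_spectra<'a>(&'a mut self) -> Box<dyn Iterator<Item = Spectrum> + 'a> {
        // Borrow the stored spectra and yield owned clones.
        Box::new(self.spectra.iter().cloned())
    }
    fn spectrum_count(&self) -> Option<usize> {
        Some(self.spectra.len())
    }
}

/// Downstream code sees only the trait object, never the concrete parser.
fn count_ms2(src: &mut dyn Source) -> usize {
    src.iter_spectra().filter(|s| s.ms_level == 2).count()
}
```

A real parser would decode lazily inside its iterator rather than cloning from a Vec. The trait shape is the same either way.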

Public API

| Symbol | Module | Purpose |
|---|---|---|
| SpectrumRecord | types | Decoded spectrum: id, MS level, polarity, retention time, peaks, precursor. |
| PrecursorInfo | types | Selected / isolated precursor, charge, activation, scan window. |
| ChromatogramRecord | types | TIC / BPC / SRM trace. |
| RunMetadata | types | Run-level CV terms: instrument, source format, native id format. |
| CvTerm | types | A PSI-MS controlled-vocabulary term. |
| Polarity, Analyzer, ScanMode, MsPower, Activation | enums | Standard enumerations. |
| MobilityArrayKind | enums | Per-peak inverse-mobility / drift-time array kind. |
| SpectrumSource | source | Trait every parser implements. |
| write_mzml | mzml | Stream a SpectrumSource to a plain mzML 1.1.0 document. |
| write_indexed_mzml | mzml | Same, with an <indexList> and SHA-1 footer for byte-offset indexing. |
| conformance::assert_source_invariants | conformance | Check a live SpectrumSource for cross-vendor invariants. |
| conformance::assert_iter_invariants | conformance | Same checks, over any IntoIterator<Item = SpectrumRecord>. |
| arrow::SpectrumBatchBuilder | arrow (feature) | Zero-copy builder for arrow_array::RecordBatch from a spectrum stream. |
| arrow::spectrum_record_schema | arrow (feature) | The canonical Arrow schema. |
| Error | error | Aggregate thiserror-based error type. |

mzML writer

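```rust
use std::fs::File;
use std::path::Path;

use openproteo_core::{write_indexed_mzml, SpectrumSource};

/// Streams every spectrum from `src` into an indexed mzML file at `path`.
fn export<S: SpectrumSource>(mut src: S, path: &Path) -> std::io::Result<()> {
    let mut out = File::create(path)?;
    // Wrap the crate's error in io::Error so `?` works in an io::Result fn.
    write_indexed_mzml(&mut src, &mut out).map_err(std::io::Error::other)?;
    Ok(())
}
```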

write_indexed_mzml emits a standards-compliant <indexList> of byte offsets plus a SHA-1 checksum footer. Downstream tools (mzML2HDF, ProteoWizard msconvert, MzIdentML builders, ...) can then seek directly to an individual spectrum instead of parsing the whole file.
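To show what the index gives a reader, here is a minimal std-only sketch that locates the index in an indexed mzML document. It assumes the output follows the standard indexedmzML layout, where an <indexListOffset> element near the end of the file records the byte offset at which <indexList> begins. The function names are hypothetical and not part of openproteo-core. A production reader would seek to the tail of the file instead of loading the whole document.

```rust
/// Hypothetical helper: returns the byte offset stored in the last
/// <indexListOffset> element, if one is present and parses as an integer.
fn index_list_offset(doc: &[u8]) -> Option<u64> {
    let text = std::str::from_utf8(doc).ok()?;
    let open = "<indexListOffset>";
    // rfind: the element sits near the end of an indexed mzML file.
    let start = text.rfind(open)? + open.len();
    let end = start + text[start..].find("</indexListOffset>")?;
    text[start..end].trim().parse().ok()
}

/// Hypothetical helper: checks that the recorded offset really lands on
/// the <indexList> element, which is what lets a reader jump straight to it.
fn offset_points_at_index_list(doc: &[u8]) -> bool {
    match index_list_offset(doc) {
        Some(off) => doc
            .get(off as usize..)
            .map_or(false, |rest| rest.starts_with(b"<indexList")),
        None => false,
    }
}
```

The same offset-then-seek pattern applies one level down. Each <offset> entry inside <indexList> points at an individual <spectrum> element.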

Conformance harness

The harness enforces the cross-vendor invariants every parser must satisfy, surfaced as structured ConformanceError variants (PeakArrayLengthMismatch, MobilityArrayLengthMismatch, RetentionTimeNonMonotonic, MissingPrecursor, IndexSequence, EmptySpectrum, ...).

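```rust
use openproteo_core::conformance::{assert_iter_invariants, ConformanceError};
use openproteo_core::SpectrumRecord;

fn validate(records: Vec<SpectrumRecord>) -> Result<(), ConformanceError> {
    // Returns the number of spectra checked, or a ConformanceError variant.
    let count = assert_iter_invariants(records)?;
    println!("validated {count} spectra");
    Ok(())
}
```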

The vendor2mzml validate subcommand runs this harness on any vendor input or pre-existing mzML.
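Here is a simplified, single-pass sketch of how three of these invariants could be checked. Spec, Violation and check are hypothetical. The variant names echo the ConformanceError list above, but several details are assumptions, not the harness's documented behavior: the payloads, the rule that indices are 0-based and contiguous, and treating equal retention times as monotonic.

```rust
/// Hypothetical stand-in for a decoded spectrum; field names are illustrative.
struct Spec {
    index: usize,
    rt_seconds: f64,
    mz: Vec<f64>,
    intensity: Vec<f32>,
}

/// Illustrative subset of violations; payloads are assumptions.
#[derive(Debug, PartialEq)]
enum Violation {
    IndexSequence { expected: usize, found: usize },
    PeakArrayLengthMismatch { index: usize, mz: usize, intensity: usize },
    RetentionTimeNonMonotonic { index: usize },
}

/// Walks the stream once, returning how many spectra passed or the
/// first violation encountered.
fn check<I: IntoIterator<Item = Spec>>(spectra: I) -> Result<usize, Violation> {
    let mut last_rt = f64::NEG_INFINITY;
    let mut count = 0;
    for s in spectra {
        // Assumed rule: indices run 0, 1, 2, ... with no gaps.
        if s.index != count {
            return Err(Violation::IndexSequence { expected: count, found: s.index });
        }
        // Every m/z value needs exactly one intensity value.
        if s.mz.len() != s.intensity.len() {
            return Err(Violation::PeakArrayLengthMismatch {
                index: s.index,
                mz: s.mz.len(),
                intensity: s.intensity.len(),
            });
        }
        // Assumed rule: retention time may repeat but never decrease.
        if s.rt_seconds < last_rt {
            return Err(Violation::RetentionTimeNonMonotonic { index: s.index });
        }
        last_rt = s.rt_seconds;
        count += 1;
    }
    Ok(count)
}
```

Streaming over an IntoIterator keeps memory flat, which matters for runs with hundreds of thousands of spectra.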

Arrow bridge (feature: arrow)

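```rust
// Requires features = ["arrow"] on openproteo-core, plus arrow_array as a
// direct dependency so the RecordBatch type can be named.
use arrow_array::RecordBatch;
use openproteo_core::arrow::SpectrumBatchBuilder;
use openproteo_core::SpectrumSource;

/// Drains every spectrum from `src` into a single Arrow RecordBatch.
fn to_record_batch(src: &mut dyn SpectrumSource) -> RecordBatch {
    let mut builder = SpectrumBatchBuilder::new();
    for spectrum in src.iter_spectra() {
        builder.push(&spectrum);
    }
    builder.finish()
}
```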

The canonical schema is documented on the Arrow schema page.
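Spectra have variable-length peak arrays. Arrow's usual way to store such data in a column is the list layout: one flat values buffer, plus an offsets buffer with n + 1 entries for n rows. Whether the canonical schema stores m/z and intensity exactly this way is an assumption here; the Arrow schema page is authoritative. The sketch below uses std only, and ListColumn is a hypothetical type that illustrates the layout rather than SpectrumBatchBuilder itself.

```rust
/// Hypothetical Arrow-style list column: row i's values live in
/// values[offsets[i]..offsets[i + 1]].
struct ListColumn {
    offsets: Vec<i32>, // Arrow's List type uses i32 offsets (LargeList uses i64)
    values: Vec<f64>,
}

impl ListColumn {
    fn new() -> Self {
        // The offsets buffer always starts with 0.
        ListColumn { offsets: vec![0], values: Vec::new() }
    }

    /// Appends one spectrum's array as a single list entry.
    fn push(&mut self, peaks: &[f64]) {
        self.values.extend_from_slice(peaks);
        let end = i32::try_from(self.values.len()).expect("list offsets overflow i32");
        self.offsets.push(end);
    }

    /// Number of rows (spectra) in the column.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrows row i's values without copying.
    fn get(&self, i: usize) -> &[f64] {
        let (start, end) = (self.offsets[i] as usize, self.offsets[i + 1] as usize);
        &self.values[start..end]
    }
}
```

Each row is read back as a borrowed slice of the shared buffer. That is why columnar consumers can scan peaks across many spectra without per-spectrum allocations.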

Feature flags

| Flag | Default | Effect |
|---|---|---|
| arrow | no | Enables building arrow_array::RecordBatch values from spectra. |

Where it sits in the stack

```text
                   openproteo-core (this crate)
                          ^
             +------------+------------+
             |            |            |
         opentfraw   opentimstdf   openwraw    (vendor parsers)
             |            |            |
             +------------+------------+
                          v
                    openproteo-io (umbrella: detect_format, collect, to_mzml)
                          |
             +------------+------------+
             |                         |
      vendor2mzml CLI             openproteo (Python metapackage)
```
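The umbrella's detect_format is not documented on this page. Below is a hypothetical std-only sketch of one way such routing could work. It assumes, from the crate names, that opentfraw reads Thermo .raw files (a single file), openwraw reads Waters .raw directories, and opentimstdf reads Bruker timsTOF .d directories containing analysis.tdf. Vendor, classify and detect are illustrative names, and the real function may use different or additional checks such as magic bytes.

```rust
use std::path::Path;

/// Hypothetical vendor tags; not openproteo-io types.
#[derive(Debug, PartialEq, Eq)]
enum Vendor {
    ThermoRaw, // assumed input of opentfraw
    BrukerTdf, // assumed input of opentimstdf
    WatersRaw, // assumed input of openwraw
}

/// Pure classification from the name and on-disk shape, so it can be
/// tested without touching the filesystem.
fn classify(name: &str, is_dir: bool, has_analysis_tdf: bool) -> Option<Vendor> {
    let ext = Path::new(name).extension()?.to_str()?.to_ascii_lowercase();
    match (ext.as_str(), is_dir) {
        ("raw", false) => Some(Vendor::ThermoRaw), // single .raw file
        ("raw", true) => Some(Vendor::WatersRaw),  // .raw directory
        ("d", true) if has_analysis_tdf => Some(Vendor::BrukerTdf),
        _ => None,
    }
}

/// Filesystem-backed wrapper: returns None for paths that do not exist.
fn detect(path: &Path) -> Option<Vendor> {
    if !path.exists() {
        return None;
    }
    let name = path.file_name()?.to_str()?;
    let is_dir = path.is_dir();
    let has_tdf = is_dir && path.join("analysis.tdf").is_file();
    classify(name, is_dir, has_tdf)
}
```

Keeping the decision logic in a pure function and the filesystem probing in a thin wrapper makes the routing easy to test. Each Vendor value would then map to one parser, returned as a boxed SpectrumSource.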