Waters RAW Format - Overview

The Waters MassLynx RAW format is a directory-based vendor format used by Waters LC-MS instruments including the Synapt, Xevo, ACQUITY, and MALDI HDMS product lines.

Each acquisition produces a .raw directory (not a single file) containing a set of binary and plain-text files that together describe the instrument method, calibration state, and all acquired spectra.

Files Present in a Typical .raw Directory

Filename | Type | Status | Description
_HEADER.TXT | ASCII | Fully known | Run metadata, calibration polynomials
_FUNCTNS.INF | Binary | Fully known | Function table: one 416-byte record per MS function
_FUNCnnn.IDX | Binary | Fully known | Scan index (DAT offsets, RT, housekeeping)
_FUNCnnn.DAT | Binary | Fully known | Packed spectrum data (3 encodings; see below)
_FUNCnnn.STS | Binary | Fully decoded | Per-scan instrument statistics (voltages, TIC, push count)
_CHROMS.INF | Binary | Fully decoded | LC channel descriptor table
_CHROnnnn.DAT | Binary | Fully decoded | LC channel time-series data (f32 RT + f32 value)
_extern.inf | ASCII | Fully known | Instrument geometry constants (Lteff, Veff, pusher period)
_INLET.INF | ASCII text | Fully known | ACE inlet method record (LC runs only)
_HISTORY.INF | Binary | Partially decoded | Waters PT with 0 descriptors; data opaque
_PROCnnn.DAT/IDX/STS | Binary | Partially decoded | Post-processed IMS-MS peak data (IMS runs only)
APEXnnnD.BIN | Binary | Container decoded | Multi-section binary; ASCII command-line params + binary peak data (PeakEx not decoded)
APEXnnnDIONS.CSV | CSV | Fully known | Apex3D ion list: m/z, RT, intensity, drift time per detected 3D peak

Files without a numeric suffix appear once per .raw directory. Files whose names contain nnn are numbered 001-099, one per MS function.
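The naming rule above can be sketched as a small helper that groups the per-function files by function number. This is a minimal illustration, not part of any Waters tooling: the function names `group_function_files` and `list_raw_directory` are hypothetical, and case-insensitive matching of filenames is an assumption.

```python
import re
from collections import defaultdict
from pathlib import Path

# Per-function files such as _FUNC001.DAT (nnn = 001-099).
# Case-insensitive matching is an assumption, not stated by the format notes.
FUNC_FILE = re.compile(r"^_FUNC(\d{3})\.(DAT|IDX|STS)$", re.IGNORECASE)


def group_function_files(names):
    """Map function number -> {extension: filename} for _FUNCnnn.* files."""
    functions = defaultdict(dict)
    for name in names:
        match = FUNC_FILE.match(name)
        if match:
            functions[int(match.group(1))][match.group(2).upper()] = name
    return dict(functions)


def list_raw_directory(raw_dir):
    """Group the _FUNCnnn.* files found in a .raw directory on disk."""
    return group_function_files(
        p.name for p in Path(raw_dir).iterdir() if p.is_file()
    )
```

Files that do not match the pattern (for example _HEADER.TXT or _extern.inf) are simply ignored by the grouping step.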

Function Concept

A "function" in Waters terminology is a discrete acquisition channel. A typical experiment structure:

Experiment type | Functions
MS survey only | Function 1 = MS1
DDA (auto-MS/MS) | Function 1 = survey, Functions 2-N = triggered MS/MS
IMS-MS (HDMS) | Function 1 = IMS-MS, Function 2 = reference/lock-mass
Lock-mass reference | Last function = calibrant channel

An MRM experiment (triple-quadrupole) would have one function per precursor/product pair. MRM data is rare in public repositories and this format variant has not yet been observed in corpus data.

DAT Encoding Variants

Three distinct record encodings have been observed in _FUNCnnn.DAT:

Encoding | Record size | IDX variant | Instruments | Description
A | 6 bytes | Variant A (22-byte IDX) | Older QTOF (Q-TOF Ultima) | flags(u8), zero(u8), intensity(u8), tof_bin(u16)
B | 8 bytes | Variant B (30-byte IDX) | SYNAPT G2-Si IMS | zero(u16), count(u16), dt_bin(u16), tof_bin(u16)
C | 8 bytes | Variant B (30-byte IDX) | Xevo G2-XS QTof | zero(u16), count(u16), sub_bin(u16), tof_bin(u16)

All three encodings are fully decoded: each converts to m/z, and Encoding B additionally yields IMS drift time, using formulas derived from _extern.inf, _FUNCTNS.INF, and the T1 calibration polynomial.
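As an illustration of the 8-byte layout listed for Encoding B, the sketch below unpacks a scan's bytes into records. It assumes little-endian byte order, which the field list does not state; `RecordB` and `unpack_encoding_b` are hypothetical names introduced here.

```python
import struct
from typing import NamedTuple


class RecordB(NamedTuple):
    count: int    # bytes[2:4]
    dt_bin: int   # bytes[4:6]
    tof_bin: int  # bytes[6:8]


# "<" = little-endian (an assumption); four u16 fields = 8 bytes per record.
RECORD_B = struct.Struct("<4H")


def unpack_encoding_b(scan_bytes):
    """Split one scan's bytes into RecordB tuples, dropping the leading zero field."""
    records = []
    # iter_unpack raises struct.error if the length is not a multiple of 8.
    for _zero, count, dt_bin, tof_bin in RECORD_B.iter_unpack(scan_bytes):
        records.append(RecordB(count, dt_bin, tof_bin))
    return records
```

Encoding C has the same field widths, so the same struct format would apply with dt_bin read as sub_bin.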

IDX Variants

Variant | Record size | DAT offset field | Observed in
A | 22 bytes | u32@0x00 | Older non-IMS QTOF
B | 30 bytes | u32@0x16 | SYNAPT G2-Si (IMS and non-IMS), Xevo G2-XS

Variant B is used by both IMS and non-IMS Xevo/SYNAPT G2-generation instruments. IDX stride alone does not distinguish IMS from non-IMS; presence of APEXnnnD.BIN or APEXnnnDIONS.CSV is the reliable IMS indicator.
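The two IDX layouts and the IMS check described above can be sketched as follows. The byte order (little-endian) is an assumption, the caller is expected to know the variant from the instrument type, and `dat_offsets` and `looks_like_ims` are hypothetical helper names.

```python
import re
import struct

# Variant -> (record size in bytes, byte position of the u32 DAT offset).
IDX_LAYOUT = {"A": (22, 0x00), "B": (30, 0x16)}

# Matches APEXnnnD.BIN and APEXnnnDIONS.CSV.
APEX_FILE = re.compile(r"^APEX\d{3}D(\.BIN|IONS\.CSV)$", re.IGNORECASE)


def dat_offsets(idx_bytes, variant):
    """Return the DAT offset of every scan in an IDX file of the given variant."""
    size, pos = IDX_LAYOUT[variant]
    if len(idx_bytes) % size:
        raise ValueError(f"IDX length is not a multiple of {size} bytes")
    # "<I" = little-endian u32 (byte order is an assumption).
    return [
        struct.unpack_from("<I", idx_bytes, start + pos)[0]
        for start in range(0, len(idx_bytes), size)
    ]


def looks_like_ims(filenames):
    """True if any Apex3D output file is present in the directory listing."""
    return any(APEX_FILE.match(name) for name in filenames)
```

Because a file length can be divisible by both 22 and 30, the sketch does not try to guess the variant from the file size.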

m/z Decoding Summary

The formulas below convert a raw tof_bin to m/z for each encoding. Encoding B additionally yields IMS drift time from dt_bin.

# Common to all:
A_us = sqrt(m_proton * Lteff_m / (2 * e * Veff)) * 1e6 # from _extern.inf
t_cal_us = c0 + c1*t_raw_us + c2*t_raw_us^2 + ... + ck*t_raw_us^k # T1 polynomial, _HEADER.TXT
mz = (t_cal_us / A_us)^2

# Encoding A (6-byte, non-IMS QTOF):
# First record of each scan is a zero-intensity sentinel;
# sentinel.tof_bin = max TOF bin corresponding to mz_high.
t_bin_us = A_us * sqrt(mz_high) / sentinel_tof_bin # bin width in microseconds
t_raw_us = tof_bin * t_bin_us

# Encoding B (8-byte, SYNAPT G2-Si IMS):
# bytes[2:4]=count(u16), bytes[4:6]=dt_bin(u16), bytes[6:8]=tof_bin(u16)
# tof_bin_low/high from first/last record of scan (sentinel if count=0, else first hit).
t_low_us = A_us * sqrt(mz_low)
t_high_us = A_us * sqrt(mz_high)
t_bin_us = (t_high_us - t_low_us) / (tof_bin_high - tof_bin_low)
t_raw_us = t_low_us + (tof_bin - tof_bin_low) * t_bin_us
drift_ms = dt_bin * scan_time_ms / 65536 # scan_time_ms from _FUNCTNS.INF

# Encoding C (8-byte, Xevo G2-XS):
# First record = sentinel at mz_low_bin, last = sentinel at mz_high_bin.
t_low_us = A_us * sqrt(mz_low)
t_high_us = A_us * sqrt(mz_high)
t_bin_us = (t_high_us - t_low_us) / (mz_high_bin - mz_low_bin)
frac_bin = (tof_bin - mz_low_bin) + sub_bin / 65536
t_raw_us = t_low_us + frac_bin * t_bin_us

where m_proton = 1.6726e-27 kg, e = 1.6022e-19 C, Lteff_m = Lteff_mm / 1000, and mz_low/mz_high come from _FUNCTNS.INF.
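The Encoding C path can be written out as a short Python sketch. It follows the expressions exactly as given in this section, including the form of A_us; the function names (`a_us`, `apply_t1`, `mz_encoding_c`) and the argument order are hypothetical, and the T1 coefficients are assumed to be supplied lowest order first.

```python
import math

M_PROTON_KG = 1.6726e-27
E_CHARGE_C = 1.6022e-19


def a_us(lteff_mm, veff):
    """Flight-time constant in microseconds, as written in this section."""
    lteff_m = lteff_mm / 1000
    return math.sqrt(M_PROTON_KG * lteff_m / (2 * E_CHARGE_C * veff)) * 1e6


def apply_t1(t_raw_us, coeffs):
    """Evaluate c0 + c1*t + ... + ck*t^k using Horner's method."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t_raw_us + c
    return result


def mz_encoding_c(tof_bin, sub_bin, low_bin, high_bin, mz_low, mz_high, a, coeffs):
    """Convert one Encoding C record to m/z.

    low_bin/high_bin are the tof_bin values of the first and last sentinel
    records; mz_low/mz_high come from _FUNCTNS.INF.
    """
    t_low = a * math.sqrt(mz_low)
    t_high = a * math.sqrt(mz_high)
    t_bin = (t_high - t_low) / (high_bin - low_bin)
    frac_bin = (tof_bin - low_bin) + sub_bin / 65536
    t_raw = t_low + frac_bin * t_bin
    return (apply_t1(t_raw, coeffs) / a) ** 2
```

With an identity polynomial (coefficients [0, 1]), a record at the low sentinel bin maps back to mz_low and one at the high sentinel bin maps to mz_high, which makes a convenient sanity check before applying real calibration coefficients.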

Waters Parameter Table Format

Several binary files (_CHROMS.INF, _FUNCnnn.STS, _CHROnnnn.DAT) share a common "parameter table" structure:

[32-byte preamble]
    u16@0 = data_offset (= 32 + n_desc * 48)
    u16@2 = version (always 1)
    u16@4 = record_size
    u16@6 = n_desc
[n_desc * 48-byte descriptor records, starting at 0x20]
    u16@0 = channel sequence number
    u16@2 = encoding type (0=u8, 1=i16, 2=u32, 3=f32)
    u16@4 = byte offset in data record
    bytes[6:48] = null-padded ASCII channel name
[n_records * record_size bytes of data]

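_CHROMS.INF is an exception to this layout: it uses a 128-byte header followed by 85-byte records, so its strides differ from the 32-byte preamble and 48-byte descriptors shown above.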
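A reader for the generic preamble/descriptor/data layout could look like the sketch below. It assumes little-endian byte order and that every record holds one value per descriptor at the stated byte offset; `parse_parameter_table` is a hypothetical name, and the sketch does not handle the _CHROMS.INF strides.

```python
import struct

# Encoding type -> struct format ("<" = little-endian, an assumption).
TYPE_FORMATS = {0: "<B", 1: "<h", 2: "<I", 3: "<f"}


def parse_parameter_table(blob):
    """Return a list of {channel name: value} dicts, one per data record."""
    data_offset, _version, record_size, n_desc = struct.unpack_from("<4H", blob, 0)

    descriptors = []
    for i in range(n_desc):
        base = 0x20 + i * 48
        _seq, enc, offset = struct.unpack_from("<3H", blob, base)
        raw_name = blob[base + 6 : base + 48]
        name = raw_name.split(b"\x00", 1)[0].decode("ascii", "replace")
        descriptors.append((name, enc, offset))

    records = []
    if record_size:
        last_start = len(blob) - record_size
        for start in range(data_offset, last_start + 1, record_size):
            records.append(
                {
                    name: struct.unpack_from(TYPE_FORMATS[enc], blob, start + offset)[0]
                    for name, enc, offset in descriptors
                }
            )
    return records
```

For a table with zero descriptors, the loop still walks the records but each resulting dict is empty, since no channel layout is available to interpret the bytes.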

Known Instrument Generations

Instrument | Notes
Waters SYNAPT G2-Si | IMS + MS; IDX Variant B; DAT Encoding B (IMS)
Waters Xevo G2-XS QTof | No IMS; IDX Variant B; DAT Encoding C
Waters Q-TOF Ultima | No IMS; IDX Variant A; DAT Encoding A

Corpus

Accession | Instrument | Notes
PXD058812 | Q-TOF (non-IMS) | 3 small files, Encoding A, 197-426 scans
PXD066594 | SYNAPT G2-Si IMS | WANG.raw, 590 scans, large IMS data
PXD068881 | SYNAPT G2-Si IMS | CtpA LC-MS, 1138 scans, has _CHROMS.INF
PXD075602 | Xevo G2-XS QTof | DHPR LC-MS, 3 functions, Encoding C

See Also