Waters RAW Format - Overview
The Waters MassLynx RAW format is a directory-based vendor format used by Waters LC-MS instruments including the Synapt, Xevo, ACQUITY, and MALDI HDMS product lines.
Each acquisition produces a .raw directory (not a single file) containing
a set of binary and plain-text files that together describe the instrument
method, calibration state, and all acquired spectra.
Files Present in a Typical .raw Directory
| Filename | Type | Status | Description |
|---|---|---|---|
| _HEADER.TXT | ASCII | Fully known | Run metadata, calibration polynomials |
| _FUNCTNS.INF | Binary | Fully known | Function table: one 416-byte record per MS function |
| _FUNCnnn.IDX | Binary | Fully known | Scan index (DAT offsets, RT, housekeeping) |
| _FUNCnnn.DAT | Binary | Fully known | Packed spectrum data (3 encodings; see below) |
| _FUNCnnn.STS | Binary | Fully decoded | Per-scan instrument statistics (voltages, TIC, push count) |
| _CHROMS.INF | Binary | Fully decoded | LC channel descriptor table |
| _CHROnnnn.DAT | Binary | Fully decoded | LC channel time-series data (f32 RT + f32 value) |
| _extern.inf | ASCII | Fully known | Instrument geometry constants (Lteff, Veff, pusher period) |
| _INLET.INF | ASCII | Fully known | ACE inlet method record (LC runs only) |
| _HISTORY.INF | Binary | Partially decoded | Waters PT with 0 descriptors; data opaque |
| _PROCnnn.DAT/IDX/STS | Binary | Partially decoded | Post-processed IMS-MS peak data (IMS runs only) |
| APEXnnnD.BIN | Binary | Container decoded | Multi-section binary; ASCII command-line params + binary peak data (PeakEx not decoded) |
| APEXnnnDIONS.CSV | CSV | Fully known | Apex3D ion list: m/z, RT, intensity, drift time per detected 3D peak |
Files without a numeric suffix appear once per .raw directory. Files whose
names contain nnn are numbered 001-099, one per MS function.
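As an illustration of the naming scheme, the following sketch groups the per-function files of a .raw directory by function number. The helper names (`group_function_files`, `list_raw_directory`) are hypothetical, and case-insensitive matching is an assumption rather than something the format guarantees.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches per-function files such as _FUNC001.DAT or _FUNC012.IDX.
# ASSUMPTION: matching is case-insensitive; adjust if your data differs.
FUNC_FILE = re.compile(r"^_FUNC(\d{3})\.(IDX|DAT|STS)$", re.IGNORECASE)


def group_function_files(names):
    """Map function number -> {extension: filename} for an iterable of names."""
    functions = defaultdict(dict)
    for name in names:
        match = FUNC_FILE.match(name)
        if match:
            number = int(match.group(1))
            functions[number][match.group(2).upper()] = name
    return dict(functions)


def list_raw_directory(raw_dir):
    """Group the _FUNCnnn files found in an on-disk .raw directory."""
    return group_function_files(p.name for p in Path(raw_dir).iterdir())
```

Calling `list_raw_directory("sample.raw")` (a placeholder path) would return one entry per MS function, which is a convenient starting point for pairing each IDX file with its DAT file.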
Function Concept
A "function" in Waters terminology is a discrete acquisition channel. Typical experiment structures are:
| Experiment type | Functions |
|---|---|
| MS survey only | Function 1 = MS1 |
| DDA (auto-MS/MS) | Function 1 = survey, Functions 2-N = triggered MS/MS |
| IMS-MS (HDMS) | Function 1 = IMS-MS, Function 2 = reference/lock-mass |
| Lock-mass reference | Last function = calibrant channel |
An MRM experiment (triple-quadrupole) would have one function per precursor/product pair. MRM data is rare in public repositories and this format variant has not yet been observed in corpus data.
DAT Encoding Variants
Three distinct record encodings have been observed in _FUNCnnn.DAT:
| Encoding | Record size | IDX variant | Instruments | Description |
|---|---|---|---|---|
| A | 6 bytes | Variant A (22-byte IDX) | Older QTOF (Q-TOF Ultima) | flags(u8), zero(u8), intensity(u8), tof_bin(u16) |
| B | 8 bytes | Variant B (30-byte IDX) | SYNAPT G2-Si IMS | zero(u16), count(u16), dt_bin(u16), tof_bin(u16) |
| C | 8 bytes | Variant B (30-byte IDX) | Xevo G2-XS QTof | zero(u16), count(u16), sub_bin(u16), tof_bin(u16) |
All three encodings can be decoded to m/z, and Encoding B additionally to IMS
drift time, using formulas derived from _extern.inf, _FUNCTNS.INF, and the T1
calibration polynomial.
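The 8-byte layouts of Encodings B and C can be unpacked with a single `struct` format. The sketch below is a minimal illustration, assuming little-endian byte order (the byte order is not stated in these notes) and a hypothetical helper name `iter_records_8`. The third field is `dt_bin` for Encoding B and `sub_bin` for Encoding C.

```python
import struct

# ASSUMPTION: little-endian byte order. Field order follows the table above:
# zero(u16), count(u16), dt_bin or sub_bin (u16), tof_bin(u16).
RECORD_8 = struct.Struct("<HHHH")


def iter_records_8(scan_bytes):
    """Yield (count, aux, tof_bin) for each 8-byte record in one scan's DAT slice.

    aux is dt_bin for Encoding B and sub_bin for Encoding C.
    Trailing bytes that do not fill a whole record are ignored.
    """
    usable = len(scan_bytes) - len(scan_bytes) % RECORD_8.size
    for _zero, count, aux, tof_bin in RECORD_8.iter_unpack(scan_bytes[:usable]):
        yield count, aux, tof_bin
```

The slice passed in would normally be delimited by consecutive DAT offsets taken from the matching IDX file.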
IDX Variants
| Variant | Record size | DAT offset field | Observed in |
|---|---|---|---|
| A | 22 bytes | u32@0x00 | Older non-IMS QTOF |
| B | 30 bytes | u32@0x16 | SYNAPT G2-Si (IMS and non-IMS), Xevo G2-XS |
Variant B is used by both IMS and non-IMS Xevo/SYNAPT G2-generation
instruments. IDX stride alone does not distinguish IMS from non-IMS;
presence of APEXnnnD.BIN or APEXnnnDIONS.CSV is the reliable IMS indicator.
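A minimal sketch of both points: reading the DAT offset from each IDX record for a known variant, and checking for Apex3D output as the IMS indicator. The function names are hypothetical, and little-endian byte order for the u32 offset is an assumption.

```python
import re
import struct

# Record size and DAT-offset position per IDX variant, from the table above.
IDX_LAYOUT = {"A": (22, 0x00), "B": (30, 0x16)}

# Apex3D outputs: APEXnnnD.BIN or APEXnnnDIONS.CSV.
APEX_FILE = re.compile(r"^APEX\d{3}D(\.BIN|IONS\.CSV)$", re.IGNORECASE)


def read_dat_offsets(idx_bytes, variant):
    """Return the DAT offset of every scan in an IDX buffer.

    ASSUMPTION: the u32 offset field is little-endian.
    """
    record_size, field_pos = IDX_LAYOUT[variant]
    n_scans = len(idx_bytes) // record_size
    return [
        struct.unpack_from("<I", idx_bytes, i * record_size + field_pos)[0]
        for i in range(n_scans)
    ]


def looks_like_ims(names):
    """True if any Apex3D output file is present among the directory's file names."""
    return any(APEX_FILE.match(name) for name in names)
```

Because file size alone can be divisible by both 22 and 30, the variant is passed in explicitly here rather than guessed from the IDX length.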
m/z Decoding Summary
The formulas below convert each encoding's tof_bin to m/z. Encoding B additionally yields IMS drift time.
```
# Common to all (t_raw_us comes from the per-encoding formulas below):
A_us = sqrt(m_proton * Lteff_m / (2 * e * Veff)) * 1e6   # from _extern.inf
t_cal_us = c0 + c1*t_raw_us + c2*t_raw_us^2 + ... + ck*t_raw_us^k   # T1 polynomial, _HEADER.TXT
mz = (t_cal_us / A_us)^2

# Encoding A (6-byte, non-IMS QTOF):
# First record of each scan is a zero-intensity sentinel;
# sentinel.tof_bin = max TOF bin corresponding to mz_high.
t_bin_us = A_us * sqrt(mz_high) / sentinel_tof_bin   # bin width in microseconds
t_raw_us = tof_bin * t_bin_us

# Encoding B (8-byte, SYNAPT G2-Si IMS):
# bytes[2:4]=count(u16), bytes[4:6]=dt_bin(u16), bytes[6:8]=tof_bin(u16)
# tof_bin_low/high from first/last record of scan (sentinel if count=0, else first hit).
t_low_us = A_us * sqrt(mz_low)
t_high_us = A_us * sqrt(mz_high)
t_bin_us = (t_high_us - t_low_us) / (tof_bin_high - tof_bin_low)
t_raw_us = t_low_us + (tof_bin - tof_bin_low) * t_bin_us
drift_ms = dt_bin * scan_time_ms / 65536   # scan_time_ms from _FUNCTNS.INF

# Encoding C (8-byte, Xevo G2-XS):
# First record = sentinel at mz_low_bin, last = sentinel at mz_high_bin.
t_low_us = A_us * sqrt(mz_low)
t_high_us = A_us * sqrt(mz_high)
t_bin_us = (t_high_us - t_low_us) / (mz_high_bin - mz_low_bin)
frac_bin = (tof_bin - mz_low_bin) + sub_bin / 65536
t_raw_us = t_low_us + frac_bin * t_bin_us
```
where m_proton = 1.6726e-27 kg, e = 1.6022e-19 C,
Lteff_m = Lteff_mm / 1000, and mz_low/mz_high come from _FUNCTNS.INF.
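The formulas above translate fairly directly into Python. The following is a sketch rather than a reference implementation: the function names are hypothetical, the expressions are copied as written in this summary, and the caller is expected to supply Lteff, Veff, the T1 coefficients, and the mass range from the files named above.

```python
import math

M_PROTON_KG = 1.6726e-27
E_CHARGE_C = 1.6022e-19


def flight_constant_us(lteff_mm, veff):
    """A_us exactly as written in the summary (Lteff converted from mm to m)."""
    lteff_m = lteff_mm / 1000.0
    return math.sqrt(M_PROTON_KG * lteff_m / (2.0 * E_CHARGE_C * veff)) * 1e6


def apply_t1(coeffs, t_raw_us):
    """Evaluate c0 + c1*t + c2*t^2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t_raw_us + c
    return result


def t_raw_encoding_a(tof_bin, sentinel_tof_bin, mz_high, a_us):
    """Encoding A: bin width is fixed by the sentinel bin at mz_high."""
    t_bin_us = a_us * math.sqrt(mz_high) / sentinel_tof_bin
    return tof_bin * t_bin_us


def t_raw_linear(tof_bin, bin_low, bin_high, mz_low, mz_high, a_us, sub_bin=0):
    """Encodings B and C: interpolate flight time between the range ends.

    Pass sub_bin for Encoding C; leave it at 0 for Encoding B.
    """
    t_low_us = a_us * math.sqrt(mz_low)
    t_high_us = a_us * math.sqrt(mz_high)
    t_bin_us = (t_high_us - t_low_us) / (bin_high - bin_low)
    frac_bin = (tof_bin - bin_low) + sub_bin / 65536.0
    return t_low_us + frac_bin * t_bin_us


def mz_from_time(t_cal_us, a_us):
    """mz = (t_cal_us / A_us)^2."""
    return (t_cal_us / a_us) ** 2


def drift_ms(dt_bin, scan_time_ms):
    """Encoding B only: IMS drift time from the 16-bit drift bin."""
    return dt_bin * scan_time_ms / 65536.0
```

A useful sanity check on any implementation is that, with an identity calibration (`coeffs = [0.0, 1.0]`), the low and high sentinel bins map back to mz_low and mz_high respectively.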
Waters Parameter Table Format
Several binary files (_CHROMS.INF, _FUNCnnn.STS, _CHROnnnn.DAT) share
a common "parameter table" structure:
```
[32-byte preamble]
    u16@0 = data_offset (= 32 + n_desc * 48)
    u16@2 = version (always 1)
    u16@4 = record_size
    u16@6 = n_desc
[n_desc * 48-byte descriptor records, starting at 0x20]
    u16@0 = channel sequence number
    u16@2 = encoding type (0=u8, 1=i16, 2=u32, 3=f32)
    u16@4 = byte offset in data record
    bytes[6:48] = null-padded ASCII channel name
[n_records * record_size bytes of data]
```
Note that _CHROMS.INF deviates from this layout: it uses a 128-byte header followed by 85-byte records (a different stride).
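The 32-byte preamble and 48-byte descriptor layout can be parsed generically. The sketch below assumes little-endian fields (not stated in these notes), uses a hypothetical function name `parse_parameter_table`, and applies only to files that follow the 32/48 layout, not to the _CHROMS.INF stride.

```python
import struct

PREAMBLE_SIZE = 32
DESC_SIZE = 48
# Encoding type -> struct format code, following the descriptor layout above.
TYPE_CODES = {0: "B", 1: "h", 2: "I", 3: "f"}


def parse_parameter_table(buf):
    """Parse a parameter table into (channel names, list of record dicts).

    ASSUMPTION: all multi-byte fields are little-endian.
    """
    data_offset, _version, record_size, n_desc = struct.unpack_from("<4H", buf, 0)
    channels = []
    for i in range(n_desc):
        base = PREAMBLE_SIZE + i * DESC_SIZE
        _seq, enc_type, offset = struct.unpack_from("<3H", buf, base)
        raw_name = buf[base + 6:base + DESC_SIZE]
        name = raw_name.split(b"\x00", 1)[0].decode("ascii", errors="replace")
        channels.append((name, "<" + TYPE_CODES[enc_type], offset))
    records = []
    if record_size:
        # Integer division drops any trailing partial record.
        n_records = (len(buf) - data_offset) // record_size
        for r in range(n_records):
            start = data_offset + r * record_size
            records.append({
                name: struct.unpack_from(fmt, buf, start + off)[0]
                for name, fmt, off in channels
            })
    return [c[0] for c in channels], records
```

Returning each record as a dict keyed by channel name keeps the caller independent of channel order, which is convenient when different instruments expose different sets of channels.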
Known Instrument Generations
| Instrument | Notes |
|---|---|
| Waters SYNAPT G2-Si | IMS + MS; IDX Variant B; DAT Encoding B (IMS) |
| Waters Xevo G2-XS QTof | No IMS; IDX Variant B; DAT Encoding C |
| Waters Q-TOF Ultima | No IMS; IDX Variant A; DAT Encoding A |
Corpus
| Accession | Instrument | Notes |
|---|---|---|
| PXD058812 | Q-TOF (non-IMS) | 3 small files, Encoding A, 197-426 scans |
| PXD066594 | SYNAPT G2-Si IMS | WANG.raw, 590 scans, large IMS data |
| PXD068881 | SYNAPT G2-Si IMS | CtpA LC-MS, 1138 scans, has _CHROMS.INF |
| PXD075602 | Xevo G2-XS QTof | DHPR LC-MS, 3 functions, Encoding C |