Scientific Reports (Mar 2022)

StackZDPD: a novel encoding scheme for mass spectrometry data optimized for speed and compression ratio

  • Jinyin Wang,
  • Miaoshan Lu,
  • Ruiming Wang,
  • Shaowei An,
  • Cong Xie,
  • Changbin Yu

DOI
https://doi.org/10.1038/s41598-022-09432-1
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 9

Abstract

As the pervasive, standardized format for the interchange and deposition of raw mass spectrometry (MS) proteomics and metabolomics data, text-based mzML is used inefficiently on various analysis platforms because of its sheer volume and limited read/write speed. Most research on compression algorithms does not provide a flexible scheme for random file reading. Database-based solutions guarantee efficient random file reading, but their compression and third-party software support are insufficient. While preserving decompression efficiency, we propose “Stack-ZDPD”, an encoding scheme optimized for the storage of raw MS data. It is designed for “Aird”, a computation-oriented format with fast access and decoding times whose core compression algorithm is “ZDPD”. Stack-ZDPD reduces the volume of data stored in mzML format by around 80% or more, depending on the data acquisition pattern, and the compression ratio is approximately 30% compared to ZDPD for data generated using Time of Flight technology. Our approach is available in AirdPro for file conversion and in the Java API Aird-SDK for data parsing.