Clinical and Translational Discovery (Dec 2023)

An efficient large‐scale whole‐genome sequencing analyses practice with an average daily analysis of 100Tbp: ZBOLT

  • Zhichao Li,
  • Yinlong Xie,
  • Wenjun Zeng,
  • Yushan Huang,
  • Shengchang Gu,
  • Ya Gao,
  • Weihua Huang,
  • Lihua Lu,
  • Xiaohong Wang,
  • Jiasheng Wu,
  • Xiaoxu Yin,
  • Rongyi Zhu,
  • Guodong Huang,
  • Lin Lu,
  • Jingbo Tang,
  • Yunping Zheng,
  • Quan Liu,
  • Xianqiang Zhou,
  • Riqiang Shan,
  • Bo Wang,
  • Mingyan Fang,
  • Xin Jin

DOI
https://doi.org/10.1002/ctd2.252
Journal volume & issue
Vol. 3, no. 6

Abstract

Background: With the advancement of whole‐genome sequencing (WGS) technology, massively parallel sequencing (MPS) remains the mainstream approach owing to its accuracy, low cost, and high throughput. Developing analytical pipelines for MPS data has therefore always been of great importance. Increasingly large population genomics studies, as a specific type of big data research, pose new challenges for analysis solutions.

Results: Here, we introduce ZBOLT, a comprehensive analysis system that incorporates both software and hardware advancements, making it well suited to large‐scale population genomic studies that require extensive data processing. We first evaluate ZBOLT's calling accuracy using the Genome in a Bottle (GIAB) benchmark dataset. We then apply ZBOLT to a large‐scale population genomics study of 5,616 samples sequenced at high depth, totaling 1.16 Pbp (peta base pairs). The results show that ZBOLT is highly efficient and consumes little energy, processing 100 Tbp per day and using 1 kWh per 100 Gbp sequenced sample.

Conclusion: This research serves as a valuable reference for analyzing sequencing data from large population cohorts and underscores the significant potential of ZBOLT in large‐scale population genomics studies.
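To put the reported figures in proportion, the following is a minimal back-of-envelope sketch in Python. It uses only the four numbers quoted in the abstract. The variable names are illustrative, and the derived values rest on two assumptions that the abstract does not state: that the 100 Tbp/day rate holds uniformly over the whole cohort, and that energy use scales linearly with the number of bases processed.

```python
# Figures quoted in the abstract.
TOTAL_BP = 1.16e15               # 1.16 Pbp across the cohort
N_SAMPLES = 5616                 # number of samples
THROUGHPUT_BP_PER_DAY = 100e12   # 100 Tbp analysed per day
KWH_PER_100_GBP = 1.0            # 1 kWh per 100 Gbp sequenced sample

# Derived quantities (assumptions: uniform throughput, linear energy scaling).
mean_bp_per_sample = TOTAL_BP / N_SAMPLES
days_for_cohort = TOTAL_BP / THROUGHPUT_BP_PER_DAY
total_energy_kwh = TOTAL_BP / 100e9 * KWH_PER_100_GBP

print(f"Mean data per sample: {mean_bp_per_sample / 1e9:.1f} Gbp")
print(f"Time for whole cohort: {days_for_cohort:.1f} days")
print(f"Energy for whole cohort: {total_energy_kwh:.0f} kWh")
```

Under these assumptions the cohort averages roughly 207 Gbp per sample, and analysing all 1.16 Pbp would take about 11.6 days and about 11,600 kWh. These are illustrative estimates derived from the abstract, not results reported by the authors.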

Keywords