Biosafety and Health (Aug 2023)

On the collection and integration of SARS-CoV-2 genome data

  • Lina Ma,
  • Wei Zhao,
  • Tianhao Huang,
  • Enhui Jin,
  • Gangao Wu,
  • Wenming Zhao,
  • Yiming Bao

Journal volume & issue
Vol. 5, no. 4
pp. 204 – 210

Abstract

Genome data of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is essential for virus diagnosis, vaccine development, and variant surveillance. To archive and integrate worldwide SARS-CoV-2 genome data, a series of resources have been constructed, serving as fundamental infrastructure for SARS-CoV-2 research, pandemic prevention and control, and coronavirus disease 2019 (COVID-19) therapy. Here we present an overview of extant SARS-CoV-2 resources devoted to genome data deposition and integration. We review deposition resources in terms of data accessibility, metadata standardization, and data curation and annotation, and we review integrative resources in terms of data source, de-redundancy processing, data curation and quality assessment, and variant annotation. Moreover, we address issues that impede SARS-CoV-2 genome data integration, including isolate names that are low-complexity, inconsistent, or absent; sequence inconsistency; asynchronous updates of genome data; and mismatched metadata. Finally, we provide insights into data standardization consensus and data submission guidelines to promote SARS-CoV-2 genome data sharing and integration.

Keywords