Early detection and improved genomic surveillance of SARS-CoV-2 variants from deep sequencing data
Daniele Ramazzotti,
Davide Maspero,
Fabrizio Angaroni,
Silvia Spinelli,
Marco Antoniotti,
Rocco Piazza,
Alex Graudenzi
Affiliations
Daniele Ramazzotti
Department of Medicine and Surgery, University of Milan-Bicocca, Monza, Italy; Corresponding author
Davide Maspero
Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy; Institute of Molecular Bioimaging and Physiology, National Research Council (IBFM-CNR), Segrate, Milan, Italy; CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
Fabrizio Angaroni
Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
Silvia Spinelli
Department of Medicine and Surgery, University of Milan-Bicocca, Monza, Italy
Marco Antoniotti
Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy; Bicocca Bioinformatics, Biostatistics and Bioimaging Centre – B4, Milan, Italy
Rocco Piazza
Department of Medicine and Surgery, University of Milan-Bicocca, Monza, Italy; Bicocca Bioinformatics, Biostatistics and Bioimaging Centre – B4, Milan, Italy
Alex Graudenzi
Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy; Institute of Molecular Bioimaging and Physiology, National Research Council (IBFM-CNR), Segrate, Milan, Italy; Bicocca Bioinformatics, Biostatistics and Bioimaging Centre – B4, Milan, Italy; Corresponding author
Summary: A key task of genomic surveillance of infectious viral diseases is the early detection of dangerous variants. The analysis of deep sequencing data of viral samples, which are typically discarded once consensus sequences have been created, offers unexpected help to this end. Such analysis makes it possible to detect intra-host low-frequency mutations, which are a footprint of the mutational processes underlying the origination of new variants. Their timely identification may improve public-health decision-making compared with traditional approaches that rely on consensus sequences. We present the analysis of 220,788 high-quality deep sequencing SARS-CoV-2 samples, showing that many spike and nucleocapsid mutations of interest associated with the most widely circulating variants, including Beta, Delta, and Omicron, might have been intercepted several months in advance. Furthermore, we show that a refined genomic surveillance system leveraging deep sequencing data might allow one to pinpoint emerging mutation patterns, providing automated, data-driven support to virologists and epidemiologists.
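To make the idea of "intercepting" a mutation in advance concrete, the following is a minimal sketch, not the authors' pipeline. It assumes each observation carries a mutation label, a collection date, and read counts, and it labels a mutation as "minor" (intra-host, low frequency) or "consensus" from its variant frequency. The thresholds `MIN_DEPTH`, `MINOR_VF`, and `CONSENSUS_VF`, the `Observation` type, and the `lead_times` helper are all hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical thresholds, chosen only for illustration (not from the paper).
MIN_DEPTH = 100      # minimum read depth to trust a call
MINOR_VF = 0.05      # lowest variant frequency counted as a minor mutation
CONSENSUS_VF = 0.5   # frequency at which a mutation enters the consensus


@dataclass(frozen=True)
class Observation:
    mutation: str    # a mutation label; the format is an assumption
    collected: date  # sample collection date
    alt_reads: int   # reads supporting the mutation
    depth: int       # total reads covering the position


def classify(obs: Observation) -> Optional[str]:
    """Return 'consensus', 'minor', or None if the call is not trusted."""
    if obs.depth < MIN_DEPTH:
        return None
    vf = obs.alt_reads / obs.depth
    if vf >= CONSENSUS_VF:
        return "consensus"
    if vf >= MINOR_VF:
        return "minor"
    return None


def lead_times(observations: Iterable[Observation]) -> Dict[str, int]:
    """Days between first minor and first consensus sighting, per mutation.

    A positive value means the mutation was visible at low frequency
    before it first appeared in a consensus sequence.
    """
    first_seen: Dict[Tuple[str, str], date] = {}
    for obs in observations:
        label = classify(obs)
        if label is None:
            continue
        key = (obs.mutation, label)
        if key not in first_seen or obs.collected < first_seen[key]:
            first_seen[key] = obs.collected

    result: Dict[str, int] = {}
    for (mutation, label), first_minor in first_seen.items():
        consensus_key = (mutation, "consensus")
        if label == "minor" and consensus_key in first_seen:
            result[mutation] = (first_seen[consensus_key] - first_minor).days
    return result
```

Calling `lead_times` on a list of `Observation` records returns, for each mutation seen in both states, the number of days by which its first low-frequency sighting preceded its first consensus sighting. A real analysis would also need quality filtering and error modelling, which this sketch omits.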