PredictION: a predictive model to establish the performance of Oxford sequencing reads of SARS-CoV-2

David E. Valencia-Valencia; Diana Lopez-Alvarez; Nelson Rivera-Franco; Andres Castillo; Johan S. Piña; Carlos A. Pardo; Beatriz Parra

doi:10.7717/peerj.14425

PeerJ (Nov 2022)

Affiliations

David E. Valencia-Valencia: Laboratorio de Técnicas y Análisis Ómicos—TAOLab/CiBioFi, Facultad de Ciencias Naturales y Exactas, Universidad del Valle, Cali, Valle del Cauca, Colombia
Diana Lopez-Alvarez: Laboratorio de Técnicas y Análisis Ómicos—TAOLab/CiBioFi, Facultad de Ciencias Naturales y Exactas, Universidad del Valle, Cali, Valle del Cauca, Colombia
Nelson Rivera-Franco: Laboratorio de Técnicas y Análisis Ómicos—TAOLab/CiBioFi, Facultad de Ciencias Naturales y Exactas, Universidad del Valle, Cali, Valle del Cauca, Colombia
Andres Castillo: Laboratorio de Técnicas y Análisis Ómicos—TAOLab/CiBioFi, Facultad de Ciencias Naturales y Exactas, Universidad del Valle, Cali, Valle del Cauca, Colombia
Johan S. Piña: Department of Data Science, People Contact, Manizales, Caldas, Colombia
Carlos A. Pardo: Department of Neurology, Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
Beatriz Parra: Grupo VIREM—Virus Emergentes y Enfermedad, Escuela de Ciencias Básicas, Facultad de Salud, Universidad del Valle, Cali, Valle del Cauca, Colombia

Journal volume & issue: Vol. 10, p. e14425

Abstract

Optimizing research resources in developing countries requires wet-lab strategies that allow molecular biology reagents to be reused to reduce costs. In this study, we used linear regression to model coverage depth as a function of the number of MinION reads sequenced, in order to define the optimum number of reads needed to obtain >200X coverage depth with a good lineage-clade assignment of SARS-CoV-2 genomes. The research aimed to create and implement a model based on machine learning algorithms that predicts different variables (e.g., coverage depth) from the number of MinION reads produced by Nanopore sequencing, so as to maximize the yield of high-quality SARS-CoV-2 genomes, determine the best sequencing runtime, and reuse the flow cell, with its remaining available nanopores, in a new run. The best accuracy was approximately 0.98 according to the R squared performance metric of the models. A demo version is available at https://genomicdashboard.herokuapp.com/.
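The idea of predicting coverage depth from read count with linear regression can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_line`, `r_squared`, `reads_for_depth`) are hypothetical, the model is assumed to be a single-predictor ordinary least squares fit, and the read counts and depths below are made-up values chosen only to show the arithmetic.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def reads_for_depth(target_depth, slope, intercept):
    """Invert the fitted line: reads needed to reach target_depth."""
    return (target_depth - intercept) / slope


# Hypothetical data (NOT from the paper): reads per sample vs. mean depth (X).
reads = [5000, 10000, 20000, 40000, 80000]
depth = [26.0, 49.0, 101.0, 198.0, 402.0]

slope, intercept = fit_line(reads, depth)
fit_quality = r_squared(reads, depth, slope, intercept)
needed = reads_for_depth(200, slope, intercept)

print(f"depth ~ {slope:.5f} * reads + {intercept:.2f}")
print(f"R squared on the toy data: {fit_quality:.4f}")
print(f"Estimated reads for 200X: {needed:.0f}")
```

Inverting the fitted line is what turns the regression into a stopping rule: once the read count reported during a run passes the estimate for the target depth, the run could be stopped and the remaining nanopores kept for a later run.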

Published in PeerJ

ISSN: 2167-8359 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Medicine; Science: Biology (General)
Website: https://peerj.com/
