Scientific Reports (Jun 2021)
Covid-19 diagnosis by combining RT-PCR and pseudo-convolutional machines to characterize virus sequences
Abstract
The Covid-19 pandemic, a disease caused by the SARS-CoV-2 virus, has already infected more than 120 million people, of whom 70 million have recovered and 3 million have died. The high speed of infection has rapidly depleted public health resources in most countries. RT-PCR is the reference diagnostic method for Covid-19. In this work we propose a new technique for representing DNA sequences: each sequence is divided into smaller, overlapping subsequences in a pseudo-convolutional approach, and these subsequences are represented by co-occurrence matrices. This technique eliminates the need for multiple sequence alignment. With the proposed method, it is possible to identify virus sequences in a large database of 347,363 virus DNA sequences from 24 virus families and SARS-CoV-2. When comparing SARS-CoV-2 with virus families that cause similar symptoms, we obtained a sensitivity of 0.97 ± 0.03 and a specificity of 0.9919 ± 0.0005 with an MLP classifier and 30% overlap. When SARS-CoV-2 is compared with other coronaviruses and healthy human DNA sequences, we obtained a sensitivity of 0.99 ± 0.01 and a specificity of 0.9986 ± 0.0002 with an MLP classifier and 50% overlap. Therefore, the molecular diagnosis of Covid-19 can be optimized by combining RT-PCR with our pseudo-convolutional method to identify SARS-CoV-2 DNA sequences with greater specificity and sensitivity.
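The representation described in the abstract (overlapping subsequences, each summarized by a co-occurrence matrix) can be illustrated with a minimal sketch. The abstract does not specify the window length, how the overlap percentage is applied, or exactly which co-occurrences are counted, so this sketch assumes fixed-length windows, an overlap given as a fraction of the window length, and a 4x4 matrix counting adjacent nucleotide pairs (A, C, G, T). The names `split_with_overlap`, `cooccurrence` and `features` are hypothetical and are not taken from the paper.

```python
from typing import List

# Assumed alphabet; ambiguous bases (e.g. N) are simply skipped.
BASES = "ACGT"
INDEX = {b: i for i, b in enumerate(BASES)}


def split_with_overlap(seq: str, window: int, overlap: float) -> List[str]:
    """Hypothetical windowing: consecutive windows of length `window` share
    round(window * overlap) characters. A trailing remainder shorter than
    `window` is dropped."""
    if window < 2:
        raise ValueError("window must be at least 2")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, window - int(round(window * overlap)))
    if len(seq) <= window:
        return [seq]
    return [seq[s:s + window] for s in range(0, len(seq) - window + 1, step)]


def cooccurrence(sub: str) -> List[List[int]]:
    """4x4 matrix: entry [i][j] counts base i immediately followed by base j."""
    m = [[0] * 4 for _ in range(4)]
    for a, b in zip(sub, sub[1:]):
        if a in INDEX and b in INDEX:
            m[INDEX[a]][INDEX[b]] += 1
    return m


def features(seq: str, window: int = 8, overlap: float = 0.5) -> List[List[int]]:
    """One flattened 16-value co-occurrence vector per overlapping window."""
    windows = split_with_overlap(seq.upper(), window, overlap)
    return [[c for row in cooccurrence(w) for c in row] for w in windows]
```

For example, `split_with_overlap("ACGTACGTAC", 4, 0.5)` uses a step of 2 and yields the windows `["ACGT", "GTAC", "ACGT", "GTAC"]`, so `features` returns four 16-value vectors, each counting three adjacent pairs. How these per-window matrices are combined into the input of the MLP classifier is not stated in the abstract and is left open here.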