Scientific Reports (Jun 2021)

Covid-19 diagnosis by combining RT-PCR and pseudo-convolutional machines to characterize virus sequences

  • Juliana Carneiro Gomes,
  • Aras Ismael Masood,
  • Leandro Honorato de S. Silva,
  • Janderson Romário B. da Cruz Ferreira,
  • Agostinho Antônio Freire Júnior,
  • Allana Laís dos Santos Rocha,
  • Letícia Castro Portela de Oliveira,
  • Nathália Regina Cauás da Silva,
  • Bruno José Torres Fernandes,
  • Wellington Pinheiro dos Santos

DOI
https://doi.org/10.1038/s41598-021-90766-7
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 28

Abstract

The Covid-19 pandemic, caused by the SARS-CoV-2 virus, has already infected more than 120 million people, of whom 70 million have recovered and 3 million have died. The high speed of infection has rapidly depleted public health resources in most countries. RT-PCR is the reference diagnostic method for Covid-19. In this work we propose a new technique for representing DNA sequences: each sequence is divided into smaller overlapping subsequences in a pseudo-convolutional approach, and these subsequences are represented by co-occurrence matrices. This technique eliminates the need for multiple sequence alignment. With the proposed method, it is possible to identify virus sequences in a large database of 347,363 virus DNA sequences from 24 virus families and SARS-CoV-2. When comparing SARS-CoV-2 with virus families that cause similar symptoms, we obtained a sensitivity of 0.97 ± 0.03 and a specificity of 0.9919 ± 0.0005 with an MLP classifier and 30% overlap. When comparing SARS-CoV-2 with other coronaviruses and healthy human DNA sequences, we obtained a sensitivity of 0.99 ± 0.01 and a specificity of 0.9986 ± 0.0002 with MLP and 50% overlap. Therefore, the molecular diagnosis of Covid-19 can be optimized by combining RT-PCR with our pseudo-convolutional method, which identifies SARS-CoV-2 DNA sequences with greater sensitivity and specificity.
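The abstract describes the sequence representation only at a high level. What follows is a minimal sketch of the overlapping-window (pseudo-convolutional) idea combined with co-occurrence matrices. It rests on several assumptions, none of which come from the paper. It uses a 4×4 matrix counting adjacent nucleotide pairs (offset 1) over A, C, G, T, and a stride of window × (1 − overlap). It normalizes each matrix to relative frequencies and skips ambiguous symbols such as 'N'. The function names and default parameters are hypothetical and are not the authors' implementation.

```python
import numpy as np

# Assumption: only the four canonical bases are counted; other symbols are skipped.
NUCLEOTIDES = "ACGT"
INDEX = {base: i for i, base in enumerate(NUCLEOTIDES)}


def split_with_overlap(sequence: str, window: int, overlap: float) -> list:
    """Split a sequence into fixed-length windows overlapping by a fraction.

    The stride is window * (1 - overlap); e.g. overlap=0.5 shifts each window
    by half its length. A trailing fragment shorter than `window` is dropped
    (hypothetical choice).
    """
    if window < 1:
        raise ValueError("window must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, int(round(window * (1 - overlap))))
    return [sequence[start:start + window]
            for start in range(0, len(sequence) - window + 1, stride)]


def cooccurrence_matrix(fragment: str, offset: int = 1) -> np.ndarray:
    """Count how often base j appears `offset` positions after base i (4x4)."""
    matrix = np.zeros((4, 4), dtype=np.int64)
    for a, b in zip(fragment, fragment[offset:]):
        if a in INDEX and b in INDEX:  # skip ambiguous symbols such as 'N'
            matrix[INDEX[a], INDEX[b]] += 1
    return matrix


def pseudo_convolutional_features(sequence: str, window: int = 8,
                                  overlap: float = 0.5,
                                  offset: int = 1) -> np.ndarray:
    """Return one normalized co-occurrence matrix per window, shape (n, 4, 4)."""
    matrices = []
    for fragment in split_with_overlap(sequence.upper(), window, overlap):
        m = cooccurrence_matrix(fragment, offset).astype(float)
        total = m.sum()
        # Relative frequencies; an all-ambiguous window stays all zeros.
        matrices.append(m / total if total else m)
    return np.stack(matrices) if matrices else np.empty((0, 4, 4))


if __name__ == "__main__":
    feats = pseudo_convolutional_features("ACGTACGTTTGACA", window=6, overlap=0.3)
    print(feats.shape)
```

Setting `overlap=0.3` or `overlap=0.5` corresponds to the 30% and 50% overlap settings reported in the abstract. The abstract does not say how the per-window matrices are flattened or combined into a fixed-length input for a classifier such as an MLP, so the sketch stops at the stack of matrices.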