SLAS Technology (Aug 2024)

Assessment and classification of COVID-19 DNA sequence using pairwise features concatenation from multi-transformer and deep features with machine learning models

  • Abdul Qayyum,
  • Abdesslam Benzinou,
  • Oumaima Saidani,
  • Fatimah Alhayan,
  • Muhammad Attique Khan,
  • Anum Masood,
  • Moona Mazher

Journal volume & issue
Vol. 29, no. 4
p. 100147

Abstract

Read online

The 2019 novel coronavirus (renamed SARS-CoV-2, and generally referred to as the COVID-19 virus) has spread to 184 countries with over 1.5 million confirmed cases. Such a major viral outbreak demands early elucidation of taxonomic classification and origin of the virus genomic sequence, for strategic planning, containment, and treatment. The emerging global infectious COVID-19 disease by novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) presents critical threats to global public health and the economy since it was identified in late December 2019 in China. The virus has gone through various pathways of evolution. Due to the continued evolution of the SARS-CoV-2 pandemic, researchers worldwide are working to mitigate, suppress its spread, and better understand it by deploying deep learning and machine learning approaches. In a general computational context for biomedical data analysis, DNA sequence classification is a crucial challenge. Several machine and deep learning techniques have been used in recent years to complete this task with some success. The classification of DNA sequences is a key research area in bioinformatics as it enables researchers to conduct genomic analysis and detect possible diseases. In this paper, three state-of-the-art deep learning-based models are proposed using two DNA sequence conversion methods. We also proposed a novel multi-transformer deep learning model and pairwise features fusion technique for DNA sequence classification. Furthermore, deep features are extracted from the last layer of the multi-transformer and used in machine-learning models for DNA sequence classification. The k-mer and one-hot encoding sequence conversion techniques have been presented. The proposed multi-transformer achieved the highest performance in COVID DNA sequence classification. Automatic identification and classification of viruses are essential to avoid an outbreak like COVID-19. It also helps in detecting the effect of viruses and drug design.

Keywords