Sequence Alignment Using Machine Learning-Based Needleman–Wunsch Algorithm

Amr Ezz El-Din Rashed; Hanan M. Amer; Mervat El-Seddek; Hossam El-Din Moustafa

doi:10.1109/ACCESS.2021.3100408

IEEE Access (Jan 2021)

Affiliations

Amr Ezz El-Din Rashed: ORCiD; Department of Communications and Electronics, Faculty of Engineering, Mansoura University, Mansoura, Egypt
Hanan M. Amer: ORCiD; Department of Communications and Electronics, Faculty of Engineering, Mansoura University, Mansoura, Egypt
Mervat El-Seddek: Higher Institute of Engineering and Technology, Mansoura, Egypt
Hossam El-Din Moustafa: Department of Communications and Electronics, Faculty of Engineering, Mansoura University, Mansoura, Egypt

Journal volume & issue: Vol. 9
pp. 109522–109535

Abstract

Biological pairwise sequence alignment is a method for arranging the characters of two biological sequences to identify regions of similarity. This operation has attracted considerable interest because of its influence on many critical aspects of life (e.g., identifying mutations in coronaviruses). However, sequence alignment over large databases cannot yield results within reasonable time, power, and cost budgets. Although heuristic methods such as FASTA and the BLAST family have been shown to run 40 times faster than dynamic programming (DP)-based techniques such as Needleman–Wunsch, they cannot guarantee an optimal alignment. This study describes an optimized, lookup-table-based software platform for the Needleman–Wunsch (NW) algorithm, a widely used DNA sequence alignment algorithm. This global alignment algorithm is the best approach for identifying similar regions between sequences. The study presents a new application of classical machine learning (ML) to global sequence alignment, in which customized ML models are used to implement NW global alignment. An accuracy of 99.7% is achieved with a multilayer perceptron trained with the ADAM optimizer, and up to 2912 giga cell updates per second (GCUPS) are realized on two real DNA sequences with a length of 4.1 M nucleotides. The implementation is valid for both RNA and DNA sequences. The aim of this study is to use ML algorithms to parallelize the computation steps of the algorithm and thereby accelerate it. All datasets used in this study are available from https://ieee-dataport.org/documents/dna-sequence-alignment-datasets-based-nw-algorithm.
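For reference, the classical DP formulation of Needleman–Wunsch that the abstract contrasts with heuristics, and the GCUPS throughput measure it reports, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' ML-based or lookup-table implementation: the scoring scheme (match +1, mismatch -1, linear gap -1), the substitution table `SUB_TABLE`, and the helper names `needleman_wunsch` and `gcups` are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical scoring scheme (an assumption, not taken from the paper):
# match +1, mismatch -1, linear gap penalty -1. "U" is included so RNA works too.
ALPHABET = "ACGTU"
SUB_TABLE = {(a, b): (1 if a == b else -1) for a in ALPHABET for b in ALPHABET}
GAP = -1


def needleman_wunsch(x: str, y: str) -> Tuple[int, str, str]:
    """Return the optimal global alignment score and one optimal alignment of x and y."""
    m, n = len(x), len(y)
    # H[i][j] holds the best score for globally aligning x[:i] with y[:j].
    H: List[List[int]] = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        H[i][0] = i * GAP  # prefix of x aligned entirely against gaps
    for j in range(1, n + 1):
        H[0][j] = j * GAP  # prefix of y aligned entirely against gaps
    # Fill the matrix: each cell is the best of a diagonal, vertical, or horizontal move.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(
                H[i - 1][j - 1] + SUB_TABLE[(x[i - 1], y[j - 1])],
                H[i - 1][j] + GAP,
                H[i][j - 1] + GAP,
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    ax: List[str] = []
    ay: List[str] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + SUB_TABLE[(x[i - 1], y[j - 1])]:
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and H[i][j] == H[i - 1][j] + GAP:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return H[m][n], "".join(reversed(ax)), "".join(reversed(ay))


def gcups(len_x: int, len_y: int, seconds: float) -> float:
    """Giga cell updates per second: DP cells filled (len_x * len_y) per second, divided by 1e9."""
    return len_x * len_y / seconds / 1e9


if __name__ == "__main__":
    score, top, bottom = needleman_wunsch("GATTACA", "GCATGCU")
    print(score)
    print(top)
    print(bottom)
```

Under this scoring, `needleman_wunsch("GATTACA", "GCATGCU")` returns an optimal score of 0. Because the DP fills one cell per pair of positions, its cost grows as the product of the two sequence lengths, which is why throughput is naturally expressed as cells per second (GCUPS) rather than alignments per second.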

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639