IEEE Access (Jan 2020)

Simple and Efficient Pattern Matching Algorithms for Biological Sequences

  • Peyman Neamatollahi,
  • Montassir Hadi,
  • Mahmoud Naghibzadeh

DOI
https://doi.org/10.1109/ACCESS.2020.2969038
Journal volume & issue
Vol. 8
pp. 23838 – 23846

Abstract

The rapid growth of biological data motivates faster methods across many areas of computational bioinformatics. Pattern matching is a common operation in several phases of computational pipelines; for example, it lets users locate particular DNA subsequences in a single DNA sequence or in a database. Moreover, as these biological databases expand, some patterns are updated over time. Fast searches therefore require high-speed pattern matching algorithms. This paper introduces three pattern matching algorithms designed specifically to speed up searches on large DNA sequences. The proposed algorithms improve performance in two ways: they process the data a word at a time, rather than a character at a time as in previous work, and they search the sequence for the least frequent word of the pattern. Experimental results show that the presented algorithms run faster than the other algorithms simulated.
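
The idea of searching for the least frequent word of the pattern can be illustrated with a minimal Python sketch. This is not one of the paper's three algorithms, which the abstract does not describe in detail. It assumes that a "word" is a fixed-length substring of the pattern, and the function name `find_pattern` and the parameter `word_len` are hypothetical. The sketch picks the pattern word that is rarest in the text, locates that word, and verifies the full pattern around each hit.

```python
from collections import Counter


def find_pattern(text: str, pattern: str, word_len: int = 4) -> list[int]:
    """Return every start index of `pattern` in `text` (overlaps included).

    Illustrative only: `word_len` is an assumed fixed word size.
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    word_len = min(word_len, m)

    # Count every fixed-length word of the text. A real tool would
    # likely precompute this table once per sequence.
    freq = Counter(text[i:i + word_len]
                   for i in range(len(text) - word_len + 1))

    # Pick the offset whose pattern word is rarest in the text.
    best = min(range(m - word_len + 1),
               key=lambda j: freq[pattern[j:j + word_len]])
    word = pattern[best:best + word_len]

    hits = []
    pos = text.find(word)
    while pos != -1:
        start = pos - best  # where the whole pattern would have to begin
        if start >= 0 and text[start:start + m] == pattern:
            hits.append(start)
        pos = text.find(word, pos + 1)
    return hits
```

Because every occurrence of the pattern must contain the chosen word at a fixed offset, checking only the positions of the rarest word keeps the number of full comparisons small without missing matches.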

Keywords