BMC Bioinformatics (Sep 2022)

A fast and efficient path elimination algorithm for large-scale multiple common longest sequence problems

  • Changyong Yu,
  • Pengxi Lin,
  • Yuhai Zhao,
  • Tianmei Ren,
  • Guoren Wang

DOI
https://doi.org/10.1186/s12859-022-04906-5
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 19

Abstract


Background: Finding the Longest Common Subsequences (LCS) of multiple (i.e., three or more) sequences, known as the MLCS problem, is a classic but difficult problem that arises in many fields. The primary bottleneck is that state-of-the-art algorithms must construct a huge graph, a directed acyclic graph (DAG), which often exceeds the memory available on a computer. Because they consume so much time and space, existing algorithms cannot be applied to problems involving long or numerous sequences.

Results: A mini directed acyclic graph (mini-DAG) model and a novel Path Elimination Algorithm are proposed to solve large-scale MLCS problems efficiently. In the mini-DAG model, a branch-and-bound approach eliminates paths during DAG construction. This yields a much smaller DAG (the mini-DAG), which saves both memory and search time.

Conclusion: Experiments were performed on a standard benchmark set of DNA sequences. The results show that the proposed model outperforms the leading algorithms, especially on large-scale MLCS problems.
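The abstract does not describe the internals of the mini-DAG or the Path Elimination Algorithm. The sketch below therefore shows only the classic dominant-point DAG that MLCS algorithms of this family build. It is not the authors' method and includes no branch-and-bound pruning.

Each node is a "match point": a tuple holding one index per sequence, where all sequences carry the same character. Edges lead to the next common character. The graph is explored layer by layer, and dominated points are discarded. All function names (`build_successor_tables`, `successors`, `dominates`, `mlcs`) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, ...]  # one index per sequence
Tables = List[Dict[str, List[Optional[int]]]]


def build_successor_tables(seqs: List[str], alphabet: str) -> Tables:
    """tables[i][c][j] = smallest k >= j with seqs[i][k] == c, else None."""
    tables: Tables = []
    for s in seqs:
        table: Dict[str, List[Optional[int]]] = {}
        for c in alphabet:
            nxt: List[Optional[int]] = [None] * (len(s) + 1)
            for j in range(len(s) - 1, -1, -1):
                nxt[j] = j if s[j] == c else nxt[j + 1]
            table[c] = nxt
        tables.append(table)
    return tables


def successors(p: Point, tables: Tables, alphabet: str) -> List[Tuple[str, Point]]:
    """Match points reachable from p by appending one common character."""
    out: List[Tuple[str, Point]] = []
    for c in alphabet:
        q: List[int] = []
        for i, table in enumerate(tables):
            k = table[c][p[i] + 1]
            if k is None:  # c does not occur again in sequence i
                break
            q.append(k)
        else:
            out.append((c, tuple(q)))
    return out


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a <= b in every coordinate and a != b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def mlcs(seqs: List[str]) -> str:
    """Return one longest common subsequence of all strings in seqs."""
    alphabet = "".join(sorted(set(seqs[0]).intersection(*seqs[1:])))
    tables = build_successor_tables(seqs, alphabet)
    source: Point = tuple(-1 for _ in seqs)  # virtual start node of the DAG
    parent: Dict[Point, Tuple[Point, str]] = {}  # edge used to reach each point
    layer: List[Point] = [source]
    deepest: List[Point] = [source]
    while layer:
        candidates: Dict[Point, Tuple[Point, str]] = {}
        for p in layer:
            for c, q in successors(p, tables, alphabet):
                candidates.setdefault(q, (p, c))
        # A dominated point can only reach what its dominator reaches, so drop it.
        layer = [q for q in candidates
                 if not any(dominates(r, q) for r in candidates)]
        for q in layer:
            parent[q] = candidates[q]
        if layer:
            deepest = layer
    # Walk back from any point in the deepest layer to rebuild one MLCS.
    chars: List[str] = []
    p = deepest[0]
    while p != source:
        p, c = parent[p]
        chars.append(c)
    return "".join(reversed(chars))


print(mlcs(["ACGTACGT", "TACGTACG", "GTACGTAC"]))
```

The number of non-empty layers equals the MLCS length. The quadratic dominance filter is the simplest possible choice. On long or numerous sequences, the number of points kept per layer can grow quickly, which illustrates the memory pressure the abstract attributes to full DAG construction.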

Keywords