PeerJ (Sep 2024)
Automated evaluation of multiple sequence alignment methods to handle third generation sequencing errors
Abstract
Most third-generation sequencing (TGS) processing tools rely on multiple sequence alignment (MSA) methods to manage sequencing errors. Despite the broad range of MSA approaches available, a limited selection of implementations are commonly used in practice for this type of application, and no comprehensive comparative assessment of existing tools has been undertaken to date. In this context, we have developed an automatic pipeline, named MSA Limit, designed to facilitate the execution and evaluation of diverse MSA methods across a spectrum of conditions representative of TGS reads. MSA Limit offers insights into alignment accuracy, time efficiency, and memory utilization. It serves as a valuable resource for both users and developers, aiding in the assessment of algorithmic performance and assisting users in selecting the most appropriate tool for their specific experimental settings. Through a series of experiments using real and simulated data, we demonstrate the value of such exploration. Our findings reveal that in certain scenarios, popular methods may not consistently exhibit optimal efficiency and that the choice of the most effective method varies depending on factors such as sequencing depth, genome characteristics, and read error patterns. MSA Limit is an open source and freely available tool. All code and data pertaining to it and this manuscript are available at https://gitlab.cristal.univ-lille.fr/crohmer/msa-limit.
Keywords