BMC Bioinformatics (May 2012)

MergeAlign: improving multiple sequence alignment performance by dynamic reconstruction of consensus multiple sequence alignments

  • Collingridge Peter W,
  • Kelly Steven

DOI
https://doi.org/10.1186/1471-2105-13-117
Journal volume & issue
Vol. 13, no. 1
p. 117

Abstract

Background

The generation of multiple sequence alignments (MSAs) is a crucial step for many bioinformatic analyses. Thus, improving MSA accuracy and identifying potential errors in MSAs are important for a wide range of post-genomic research. We present a novel method called MergeAlign, which constructs consensus MSAs from multiple independent MSAs and assigns an alignment precision score to each column.

Results

Using conventional benchmark tests, we demonstrate that, on average, MergeAlign MSAs are more accurate than MSAs generated using any single sequence substitution matrix. We show that MergeAlign column scores are related to alignment precision and hence provide an ab initio method of estimating alignment precision in the absence of curated reference MSAs. Using two novel and independent alignment performance tests that utilise a large set of orthologous gene families, we demonstrate that increasing MSA performance leads to an increase in the performance of downstream phylogenetic analyses.

Conclusion

Using multiple tests of alignment performance, we demonstrate that this novel method has broad general application in biological research.
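
To make the idea of a per-column score concrete, the following is a minimal sketch of one simple way to score columns by their agreement across several alternative alignments of the same sequences. It is an illustration only, not the MergeAlign implementation: the abstract does not specify how MergeAlign computes its scores, and this sketch does not attempt the consensus reconstruction itself. The function names `columns` and `column_support`, and the choice of scoring each column as the fraction of input alignments that contain an identical column, are assumptions made for this example.

```python
from collections import Counter


def columns(msa):
    """Turn an MSA (list of equal-length gapped strings, '-' for gaps)
    into a list of columns. Each column is a tuple holding, for every
    sequence, the index of the residue placed there (None for a gap)."""
    next_residue = [0] * len(msa)  # next ungapped position per sequence
    cols = []
    for j in range(len(msa[0])):
        col = []
        for i, seq in enumerate(msa):
            if seq[j] == "-":
                col.append(None)
            else:
                col.append(next_residue[i])
                next_residue[i] += 1
        cols.append(tuple(col))
    return cols


def column_support(msas):
    """Hypothetical score: for each column of the first MSA, the fraction
    of all input MSAs that contain exactly the same column."""
    counts = Counter()
    for msa in msas:
        # A set, so each alignment votes at most once per distinct column.
        counts.update(set(columns(msa)))
    return [counts[col] / len(msas) for col in columns(msas[0])]


# Two toy alignments of the same two sequences ("ACG" and "ACTG").
alignment_a = ["AC-G", "ACTG"]
alignment_b = ["A-CG", "ACTG"]
scores = column_support([alignment_a, alignment_b])
print(scores)
```

In this toy case the first and last columns are identical in both alignments and receive full support, while the two middle columns, where the alignments disagree on gap placement, receive half support.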