FASEB BioAdvances (Jul 2021)
The multiple alignments of very short sequences
Abstract
Multiple sequence alignment (MSA) is an increasingly important task in bioinformatics, as we have to deal with constantly growing gene and protein sequence databases. MSA is applied in phylogenetic analysis, in discovering conserved protein domains, in the assignment of secondary and tertiary structural features in proteins, and in metagenomic sample analysis and gene discovery. Usually, the focus is on the MSA of long sequences, since these tasks appear most frequently in practice. However, the rigorous analysis of the optimal MSA of short sequences is a neglected area, and findings there may contribute to better and faster algorithms for the multiple alignment of long sequences. In the present contribution, we examine length‐1 sequences using an arbitrary metric and length‐2 sequences using the unit metric, and we show that the optimum of the MSA problem can be achieved by the trivial alignment in both cases.
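The length‐1 claim can be checked by brute force on a small instance. The sketch below is illustrative only and rests on assumptions not stated in the abstract: it takes the objective to be the sum-of-pairs cost, treats the gap symbol as one more point of the metric space, and uses a hypothetical example metric (`line_dist`, distance between positions on a line). The function names are likewise chosen here for illustration.

```python
from itertools import product

GAP = "-"

# Hypothetical example metric: symbols are points on a line, so the
# absolute difference of positions satisfies the triangle inequality.
POSITION = {"A": 0, "C": 1, GAP: 2, "G": 3, "T": 4}

def line_dist(x, y):
    return abs(POSITION[x] - POSITION[y])

def sp_cost(letters, columns, dist):
    """Sum-of-pairs cost when length-1 sequence i sits in column columns[i]."""
    total = 0
    for i in range(len(letters)):
        for j in range(i + 1, len(letters)):
            if columns[i] == columns[j]:
                # Both letters share a column: one substitution cost.
                total += dist(letters[i], letters[j])
            else:
                # Different columns: each letter faces a gap in the other row.
                # Columns where both rows hold a gap cost dist(GAP, GAP) = 0.
                total += dist(letters[i], GAP) + dist(GAP, letters[j])
    return total

def trivial_cost(letters, dist):
    """Cost of the trivial alignment: every letter in the same column."""
    return sp_cost(letters, [0] * len(letters), dist)

def best_cost(letters, dist):
    """Minimum cost over every assignment of letters to columns (brute force)."""
    n = len(letters)
    return min(sp_cost(letters, cols, dist)
               for cols in product(range(n), repeat=n))

letters = ["A", "C", "G", "T", "A"]
print(best_cost(letters, line_dist) == trivial_cost(letters, line_dist))
```

Under these assumptions the check reflects the triangle inequality: separating two letters into different columns replaces one distance d(a, b) by d(a, gap) + d(gap, b), which cannot be smaller. The brute-force search is exponential in the number of sequences and is meant only as a sanity check on tiny inputs.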