SLALOM, a flexible method for the identification and statistical analysis of overlapping continuous sequence elements in sequence- and time-series data

Roman Prytuliak; Friedhelm Pfeiffer; Bianca Hermine Habermann

doi:10.1186/s12859-018-2020-x

BMC Bioinformatics (Jan 2018)

Affiliations

Roman Prytuliak: Computational Biology Group, Max Planck Institute of Biochemistry
Friedhelm Pfeiffer: Computational Biology Group, Max Planck Institute of Biochemistry
Bianca Hermine Habermann: Computational Biology Group, Max Planck Institute of Biochemistry

DOI: https://doi.org/10.1186/s12859-018-2020-x
Journal volume & issue: Vol. 19, no. 1
pp. 1 – 19

Abstract

Background: Protein or nucleic acid sequences contain a multitude of associated annotations representing continuous sequence elements (CSEs). These CSEs must be compared whenever we want to match identical annotations or integrate distinct ones. Currently, no ready-to-use software provides a comprehensive statistical readout for comparing two annotations of the same type in a way that can be adapted to the application logic of the scientific question.

Results: We have developed a method, SLALOM (for StatisticaL Analysis of Locus Overlap Method), to perform comparative analysis of sequence annotations in a highly flexible way. SLALOM implements six major operation modes and a number of additional options that can answer a variety of statistical questions about a pair of input annotations of a given sequence collection. We demonstrate the results of SLALOM on three different examples from biology and economics and compare our method to existing software. We discuss the importance of carefully choosing the application logic to address specific scientific questions.

Conclusion: SLALOM is a highly versatile, command-line based method for comparing annotations in a collection of sequences, with a statistical readout for performance evaluation and benchmarking of predictors and gene annotation pipelines. Abstraction from sequence content even allows SLALOM to compare other kinds of positional data, for example data coming from time series.
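To make the idea of comparing two annotations of the same sequence concrete, the following is a minimal Python sketch of a position-wise comparison. It is not SLALOM's implementation and does not reproduce any of its operation modes or command-line options. The function names (`covered_positions`, `positionwise_overlap`), the 1-based inclusive coordinate convention, and the choice of precision and recall as the readout are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed convention for this sketch: 1-based, inclusive (start, end) coordinates.
Interval = Tuple[int, int]


def covered_positions(intervals: Iterable[Interval], sequence_length: int) -> Set[int]:
    """Return the set of positions covered by at least one interval.

    Overlapping intervals within one annotation are merged implicitly,
    because each position is counted only once.
    """
    positions: Set[int] = set()
    for start, end in intervals:
        if start < 1 or end > sequence_length or start > end:
            raise ValueError(f"invalid interval: ({start}, {end})")
        positions.update(range(start, end + 1))
    return positions


def positionwise_overlap(
    reference: Iterable[Interval],
    query: Iterable[Interval],
    sequence_length: int,
) -> Dict[str, float]:
    """Compare a query annotation against a reference, position by position."""
    ref = covered_positions(reference, sequence_length)
    qry = covered_positions(query, sequence_length)

    tp = len(ref & qry)  # positions annotated in both
    fp = len(qry - ref)  # positions annotated only in the query
    fn = len(ref - qry)  # positions annotated only in the reference
    tn = sequence_length - tp - fp - fn  # positions annotated in neither

    # Ratios are undefined when their denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")

    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "tn": tn,
        "precision": precision,
        "recall": recall,
    }


if __name__ == "__main__":
    # Hypothetical data: the reference contains two overlapping elements.
    stats = positionwise_overlap(
        reference=[(1, 10), (5, 15)],
        query=[(11, 20)],
        sequence_length=30,
    )
    print(stats)
```

Counting each covered position once is only one possible application logic. Counting whole elements, or weighting positions by how many elements overlap them, would give different statistics for the same input, which is why the abstract stresses choosing the logic that matches the scientific question.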

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
