AIMS Medical Science (Dec 2017)

Bacteria classification using minimal absent words

  • Gabriele Fici,
  • Alessio Langiu,
  • Giosuè Lo Bosco,
  • Riccardo Rizzo

DOI
https://doi.org/10.3934/medsci.2018.1.23
Journal volume & issue
Vol. 5, no. 1
pp. 23 – 32

Abstract

Bacteria classification has been deeply investigated with different tools for many purposes, such as early diagnosis, metagenomics, and phylogenetics. Classification methods based on ribosomal DNA sequences are considered a reference in this area. We present a new classifier for bacteria species based on a dissimilarity measure of purely combinatorial nature. This measure is based on the notion of Minimal Absent Words, a combinatorial definition that recently found applications in bioinformatics. We can therefore incorporate this measure into a probabilistic neural network in order to classify bacteria species. Our approach is motivated by the fact that there is a vast literature on the combinatorics of Minimal Absent Words in relation to the degree of repetitiveness of a sequence. We ran our experiments on a public dataset of ribosomal RNA sequences from the 16S complex. Our approach achieved very high classification accuracy, showing that our method is comparable with the standard tools available for the automatic classification of bacteria species.
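
To make the notion of Minimal Absent Words concrete, the following is a minimal brute-force sketch, not the authors' implementation. It uses the standard definition: a word is a minimal absent word of a sequence if it does not occur in the sequence while all of its proper factors do. The function names, the `max_len` cap, and the Jaccard-style dissimilarity are assumptions introduced here for illustration only; the abstract does not specify the measure actually used in the paper.

```python
def factors(s, max_len):
    """Return all non-empty substrings of s with length at most max_len."""
    return {s[i:i + k]
            for k in range(1, max_len + 1)
            for i in range(len(s) - k + 1)}


def minimal_absent_words(s, alphabet="ACGT", max_len=6):
    """Brute-force set of minimal absent words of s up to length max_len.

    A word w = a + u + b is a minimal absent word when a + u and u + b
    both occur in s but w itself does not. Letters of the alphabet that
    never occur in s are minimal absent words of length 1.
    """
    facts = factors(s, max_len)
    maws = {a for a in alphabet if a not in facts}
    for left in facts:                  # left plays the role of a + u
        if len(left) >= max_len:
            continue                    # extending it would exceed the cap
        for b in alphabet:
            w = left + b
            # w[1:] is u + b; it must occur in s while w must not
            if w not in facts and w[1:] in facts:
                maws.add(w)
    return maws


def maw_dissimilarity(s1, s2, alphabet="ACGT", max_len=6):
    """Hypothetical dissimilarity: Jaccard distance between the two MAW sets.

    This is an illustrative assumption, not the measure from the paper.
    """
    m1 = minimal_absent_words(s1, alphabet, max_len)
    m2 = minimal_absent_words(s2, alphabet, max_len)
    union = m1 | m2
    if not union:
        return 0.0
    return len(m1 ^ m2) / len(union)


if __name__ == "__main__":
    # Textbook example over the alphabet {a, b}
    print(sorted(minimal_absent_words("abaab", alphabet="ab")))
    print(maw_dissimilarity("ACGTACGT", "AACCGGTT"))
```

A pairwise dissimilarity of this kind could in principle serve as the distance fed to a classifier such as a probabilistic neural network; the brute-force enumeration shown here is only practical for short sequences and small `max_len`.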

Keywords