Information (Mar 2019)

Machine Learning Models for Error Detection in Metagenomics and Polyploid Sequencing Data

  • Milko Krachunov
  • Maria Nisheva
  • Dimitar Vassilev

DOI
https://doi.org/10.3390/info10030110
Journal volume & issue
Vol. 10, no. 3
p. 110

Abstract

Metagenomics studies, as well as genomics studies of polyploid species such as wheat, deal with the analysis of high-variation data. Such data contain sequences from similar but distinct genetic chains, which presents an obstacle to analysis and research. In particular, instrumentation errors introduced during the digitalization of the sequences may be hard to detect, as they can be indistinguishable from the real biological variation in the digital data. This can prevent the determination of the correct sequences and, at the same time, make variant studies significantly more difficult. This paper details a collection of ML-based models used to distinguish a real variant from an erroneous one. The focus is on using these models directly, but experiments are also done in combination with other predictors that isolate a pool of error candidates.

Keywords