PLoS ONE (Jan 2012)

Is a genome a codeword of an error-correcting code?

  • Luzinete C B Faria,
  • Andréa S L Rocha,
  • João H Kleinschmidt,
  • Márcio C Silva-Filho,
  • Edson Bim,
  • Roberto H Herai,
  • Michel E B Yamagishi,
  • Reginaldo Palazzo

DOI
https://doi.org/10.1371/journal.pone.0036644
Journal volume & issue
Vol. 7, no. 5
p. e36644

Abstract

Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction.
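To make concrete what it means for a sequence to be "a codeword of a cyclic Hamming code", the following minimal sketch tests membership in the binary (7,4) Hamming code. It is a toy illustration only: the abstract concerns sequences over the four-letter nucleotide alphabet, and the authors' construction is not reproduced here. The generator polynomial g(x) = x^3 + x + 1, the binary alphabet, and the function names `poly_mod2_remainder` and `is_codeword` are assumptions chosen for the example. In a cyclic code, a word is a codeword exactly when its polynomial is divisible by g(x).

```python
def poly_mod2_remainder(word, gen):
    """Remainder of word(x) divided by gen(x) over GF(2).

    Both arguments are sequences of 0/1 coefficients, highest degree first.
    """
    rem = list(word)
    # Long division: cancel each leading 1 by XOR-ing in a shifted copy of gen.
    for i in range(len(rem) - len(gen) + 1):
        if rem[i]:
            for j, g in enumerate(gen):
                rem[i + j] ^= g
    # The remainder occupies the last deg(gen) positions.
    return rem[-(len(gen) - 1):]


def is_codeword(word, gen=(1, 0, 1, 1)):
    """True if word is a codeword of the cyclic code generated by gen.

    The default gen encodes g(x) = x^3 + x + 1 (an assumption for this sketch),
    which generates the binary (7,4) Hamming code when words have length 7.
    """
    return not any(poly_mod2_remainder(word, gen))


if __name__ == "__main__":
    valid = [0, 0, 0, 1, 0, 1, 1]      # g(x) itself, padded to length 7
    shifted = [0, 1, 1, 0, 0, 0, 1]    # a cyclic shift of the word above
    corrupted = [0, 0, 0, 1, 0, 1, 0]  # the first word with its last bit flipped
    for w in (valid, shifted, corrupted):
        print(w, is_codeword(w))
```

In this toy setting, a cyclic shift of a codeword is again a codeword, while a single flipped symbol leaves a nonzero remainder, which is the property that lets such codes detect and correct errors.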