PLoS ONE (Jan 2012)
Is a genome a codeword of an error-correcting code?
Abstract
Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction.