Is BCH Code Useful to DNA Classification as an Alignment-Free Method?

Milena M. Arruda; Francisco M. De Assis; Taciana A. De Souza

doi:10.1109/access.2021.3078138

IEEE Access (Jan 2021)

Affiliations

Milena M. Arruda: Department of Electrical Engineering, Federal University of Campina Grande, Campina Grande, Brazil
Francisco M. De Assis: Department of Electrical Engineering, Federal University of Campina Grande, Campina Grande, Brazil
Taciana A. De Souza: Department of Mathematics, Federal Institute of Paraíba, Cajazeiras, Brazil

DOI: https://doi.org/10.1109/access.2021.3078138
Journal volume & issue: Vol. 9
pp. 68552 – 68560

Abstract

Similarities between biological and digital communication systems have been investigated, since biology also uses a discrete alphabet to represent and transmit information. The genetic information of an organism is encoded in DNA molecules by units called bases. However, there is no definitive model, and the question of which error-correcting code underlies DNA sequences remains an open problem. Recent works show that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as BCH codes. We propose improvements to the code construction process that result in a novel algorithm for searching for BCH codes whose codewords differ from a given DNA sequence (mapped to the finite field $\mathbb{F}_{4}$) in at most one symbol. The most important improvement is to replace brute-force decoding with syndrome decoding. Using a statistical analysis, we then verify whether, in a collection of sequences with the same taxonomic rank, there is a code that identifies most of these sequences, called the dominant code. Furthermore, we check whether the dominant code can provide biological information for DNA classification as an alignment-free method. Finally, we show that the probability of a DNA sequence of odd length $n$ being identified by a BCH code tends to the analytical probability of the same code identifying a random vector.
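The "differs in at most one symbol" test via syndrome decoding can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the base-to-symbol mapping `BASE_TO_GF4`, the toy parity-check matrix `H` (a small linear code over $\mathbb{F}_{4}$ standing in for a real BCH code), and the helper names. The function `random_vector_probability` uses the standard sphere-counting argument for a code with minimum distance at least 3; it may not be the exact expression derived by the authors.

```python
# GF(4) = {0, 1, a, a^2} encoded as the integers 0, 1, 2, 3, with a^2 = a + 1.
EXP = [1, 2, 3]              # a^0, a^1, a^2
LOG = {1: 0, 2: 1, 3: 2}     # discrete logarithm of each nonzero element


def gf4_add(x, y):
    """Addition in GF(4): characteristic 2, so it is a bitwise XOR."""
    return x ^ y


def gf4_mul(x, y):
    """Multiplication in GF(4) through the log/antilog tables."""
    if x == 0 or y == 0:
        return 0
    return EXP[(LOG[x] + LOG[y]) % 3]


# Hypothetical mapping of bases to field symbols (one of 24 possible choices).
BASE_TO_GF4 = {"A": 0, "C": 1, "G": 2, "T": 3}

# Toy 3 x 5 parity-check matrix (assumption). No column is a scalar multiple
# of another, so the code it defines has minimum distance at least 3.
H = [
    [1, 0, 0, 1, 1],
    [0, 1, 0, 1, 2],
    [0, 0, 1, 1, 3],
]


def syndrome(H, r):
    """Return the syndrome H * r^T over GF(4) as a tuple."""
    s = []
    for row in H:
        acc = 0
        for h, x in zip(row, r):
            acc = gf4_add(acc, gf4_mul(h, x))
        s.append(acc)
    return tuple(s)


def within_one_symbol(H, r):
    """True if r is a codeword or differs from one in exactly one symbol."""
    s = syndrome(H, r)
    if not any(s):
        return True  # zero syndrome: r is itself a codeword
    for j in range(len(H[0])):
        col = [row[j] for row in H]
        # A single error of value e at position j yields the syndrome e * col.
        for e in (1, 2, 3):
            if tuple(gf4_mul(e, c) for c in col) == s:
                return True
    return False


def random_vector_probability(n, k):
    """Chance that a uniform vector in GF(4)^n lies within Hamming distance 1
    of a codeword of an [n, k] code with minimum distance >= 3."""
    return (1 + 3 * n) / 4 ** (n - k)
```

With this toy matrix, `within_one_symbol(H, [BASE_TO_GF4[b] for b in "CAAAA"])` is true (one symbol away from the all-zero codeword), while the same call on `"CCAAA"` is false. The syndrome test replaces a search over all codewords with a comparison against at most $3n$ candidate syndromes.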

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
