IEEE Access (Jan 2019)

Frequent Patterns Mining in DNA Sequence

  • Na Deng
  • Xu Chen
  • Desheng Li
  • Caiquan Xiong

DOI
https://doi.org/10.1109/ACCESS.2019.2933044
Journal volume & issue
Vol. 7
pp. 108400 – 108410

Abstract


DNA sequences, a common type of biological sequence, carry important information. Discovering frequent patterns in DNA sequences can help in studying the evolution, function and variation of genes. Such findings are of great significance for genetic and mutation analysis, for identifying the causes of diseases, and for their treatment. Traditional frequent-pattern discovery methods must scan DNA sequences multiple times. To overcome this shortcoming, this article proposes a new method for discovering frequent patterns in DNA sequences. The method is based on a two-level nested hash table data structure and set operations, and it finds all frequent patterns and their positions in DNA sequences with only a single scan of the sequences. Experimental results show that the method correctly recognizes all frequent patterns in DNA sequences and their locations. The method can also be applied to discover frequent patterns in RNA, protein or other biological sequences.
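The abstract does not spell out the algorithm. The following hypothetical Python sketch shows one way that a single scan plus set operations can yield every frequent pattern together with its positions. It is not the authors' method. It assumes the following:

- a two-level nested dictionary (outer key: sequence index; inner key: nucleotide; value: set of start positions)
- contiguous patterns
- support measured as the total number of occurrences across all sequences, with a threshold `min_support`

The names `build_index` and `frequent_patterns` are illustrative only.

```python
from collections import defaultdict


def build_index(seqs):
    """Single scan: index[seq_id][symbol] -> set of 0-based positions.

    Hypothetical two-level nested hash table (outer: sequence, inner: symbol).
    """
    index = {}
    for sid, seq in enumerate(seqs):
        inner = defaultdict(set)
        for pos, ch in enumerate(seq):
            inner[ch].add(pos)
        index[sid] = dict(inner)
    return index


def frequent_patterns(seqs, min_support, max_len=None):
    """Return {pattern: {seq_id: sorted start positions}} for every contiguous
    pattern whose total occurrence count is >= min_support (assumed support
    definition). The sequences are read only once, inside build_index."""
    index = build_index(seqs)
    alphabet = sorted({ch for inner in index.values() for ch in inner})

    # Length-1 candidates: each symbol with its per-sequence position sets.
    current = {}
    for ch in alphabet:
        occ = {sid: inner[ch] for sid, inner in index.items() if ch in inner}
        if sum(len(p) for p in occ.values()) >= min_support:
            current[ch] = occ

    result = {}
    length = 1  # every pattern in `current` has this length
    while current:
        for pat, occ in current.items():
            result[pat] = {sid: sorted(p) for sid, p in occ.items()}
        if max_len is not None and length >= max_len:
            break
        nxt = {}
        for pat, occ in current.items():
            for ch in alphabet:
                ext = {}
                for sid, starts in occ.items():
                    # pat+ch starts at s iff pat starts at s and ch occurs at
                    # s + length: shift the start set, then intersect.
                    ch_pos = index[sid].get(ch, set())
                    shifted = {s + length for s in starts}
                    hits = {p - length for p in shifted & ch_pos}
                    if hits:
                        ext[sid] = hits
                if sum(len(h) for h in ext.values()) >= min_support:
                    nxt[pat + ch] = ext
        current = nxt
        length += 1
    return result


if __name__ == "__main__":
    pats = frequent_patterns(["ACGTACGT", "TACG"], min_support=3)
    for pat, occ in sorted(pats.items(), key=lambda kv: (len(kv[0]), kv[0])):
        print(pat, occ)
```

A pattern can occur at most as often as its prefix. Extending only frequent patterns one symbol to the right therefore never misses a frequent pattern. After the single indexing pass, every extension is computed by intersecting position sets rather than by rescanning the sequences.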

Keywords