Efficient Mining of Interesting Patterns in Large Biological Sequences

Md. Mamunur Rashid; Md. Rezaul Karim; Byeong-Soo Jeong; Ho-Jin Choi

doi:10.5808/GI.2012.10.1.44

Genomics & Informatics (Mar 2012)

Affiliations

Md. Mamunur Rashid: Department of Computer Engineering, College of Electronics and Information, Kyung Hee University, Yongin 446-701, Korea.
Md. Rezaul Karim: Department of Computer Engineering, College of Electronics and Information, Kyung Hee University, Yongin 446-701, Korea.
Byeong-Soo Jeong: Department of Computer Engineering, College of Electronics and Information, Kyung Hee University, Yongin 446-701, Korea.
Ho-Jin Choi: Department of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea.

Journal volume & issue: Vol. 10, no. 1
pp. 44 – 50

Abstract

Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. In most approaches to date, the number of occurrences is the main measure for deciding whether a pattern is interesting. In computational biology, however, a pattern that is not frequent may still be highly informative if its actual support frequency exceeds its prior expectation by a large margin. In this paper, we propose a new interestingness measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time.
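The abstract does not give the exact form of the proposed measure or index, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that "prior expectation" means the expected count of a length-k pattern under an independent-base (zero-order Markov) background model, and that interestingness is the ratio of observed to expected count. A plain hash map from each k-mer to its start positions stands in for the paper's index structure. The function names `build_kmer_index`, `expected_count`, and `interesting_patterns` are hypothetical.

```python
from collections import Counter, defaultdict
from math import prod


def build_kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Hypothetical index: map every length-k substring to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def expected_count(pattern: str, base_freq: dict[str, float], n_windows: int) -> float:
    """Expected occurrences, ASSUMING bases are independent (zero-order background)."""
    return prod(base_freq[b] for b in pattern) * n_windows


def interesting_patterns(seq: str, k: int, min_ratio: float):
    """Return (pattern, observed, expected, ratio) tuples whose
    observed/expected ratio is at least min_ratio, highest ratio first.
    The ratio is an ASSUMED stand-in for the paper's interestingness measure."""
    n = len(seq)
    base_freq = {b: c / n for b, c in Counter(seq).items()}
    n_windows = n - k + 1
    result = []
    for pattern, positions in build_kmer_index(seq, k).items():
        observed = len(positions)
        expected = expected_count(pattern, base_freq, n_windows)
        ratio = observed / expected  # expected > 0: every base occurs in seq
        if ratio >= min_ratio:
            result.append((pattern, observed, expected, ratio))
    return sorted(result, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    for row in interesting_patterns("ACGTACGT", k=4, min_ratio=60.0):
        print(row)
```

In this toy run, `ACGT` occurs twice while each other 4-mer occurs once. Under uniform base frequencies, all 4-mers have the same expected count, so only `ACGT` clears the threshold. The point of a ratio-based score is that a pattern can be rare in absolute terms yet still rank highly when its background probability is small. A real implementation would likely use a richer background model and a more compact index than a Python dictionary.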

Published in Genomics & Informatics

ISSN: 1598-866X (Print); 2234-0742 (Online)
Publisher: Korea Genome Organization
Country of publisher: Korea, Republic of
LCC subjects: Science: Biology (General): Genetics
Website: https://genominfo.org/