Identification of cancer related genes using feature selection and association rule mining

Consolata Gakii; Richard Rimiru

Informatics in Medicine Unlocked (Jan 2021)

Affiliations

Consolata Gakii (corresponding author): School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, P.O. Box 62000, Nairobi, Kenya; Department of Mathematics, Computing and Information Technology, University of Embu, Embu, Kenya
Richard Rimiru: School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, P.O. Box 62000, Nairobi, Kenya

Journal volume & issue: Vol. 24, p. 100595

Abstract

High-throughput sequencing generates large volumes of high-dimensional data, and identifying informative features in such data remains a challenge. Feature selection reduces complex data to a smaller number of variables while preserving as much information as possible. In this study, we used DaMiRseq, DESeq2, edgeR and Limma + voom to identify differentially expressed genes between 79 small cell lung cancer (SCLC) samples and 7 normal controls. A gene network was then used to identify coexpressed genes, and association rule mining was used to identify associations between connected genes in the network. Limma + voom identified the highest number of differentially expressed genes, but 81 genes were common to all four differential gene expression analysis methods. After filtering out all nodes with a degree less than 5, the final network had 43 nodes and 63 edges. Association rule mining on the coexpressed genes generated 263 rules. The genes common to the rules were SLC34A2, CAV2, EPAS1, CTSH, AQP1 and LRRK2, all of which have been associated with various types of cancer. Therefore, feature selection using differential gene expression analysis, co-expression networks and association rule mining could help infer relationships among genes and the possibility that they share a biological function.
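
The three steps described in the abstract (intersecting the gene lists from several methods, dropping low-degree network nodes, and mining rules between genes) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the single-pass degree filter, the restriction to one-to-one rules, and the idea of treating each sample as a "transaction" of genes are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def common_genes(*gene_sets):
    """Return the genes reported by every method (set intersection)."""
    return set.intersection(*map(set, gene_sets))


def filter_by_degree(edges, min_degree):
    """Drop nodes whose degree in the input network is below min_degree.

    Assumption: a single filtering pass; degrees are not recomputed
    after nodes are removed.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    return [(u, v) for u, v in edges if u in keep and v in keep]


def pair_rules(transactions, min_support, min_confidence):
    """Mine one-to-one rules A -> B as (A, B, support, confidence).

    Assumption: each transaction is a hypothetical set of genes observed
    together in one sample. support = P(A and B), confidence = P(B | A).
    """
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    rules = []
    for (a, b), together in sorted(pair_count.items()):
        support = together / n
        if support < min_support:
            continue
        # Test both directions of the pair as candidate rules.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```

With the study's settings, `common_genes` would be called on the four gene lists and `filter_by_degree` with `min_degree=5`. The support and confidence thresholds are not given in the abstract, so any values passed to `pair_rules` are placeholders.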

Published in Informatics in Medicine Unlocked

ISSN: 2352-9148 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.journals.elsevier.com/informatics-in-medicine-unlocked/
