Majallah-i Dānishgāh-i ̒Ulūm-i Pizishkī-i Qum (Dec 2015)
A Study of Expression Level of Genes Causing Lymphoma Cancer Using Fuzzy-rough Set Classifier Model
Abstract
Background and Objectives: Cancer is one of the major causes of mortality in today's world and is considered one of the most important health problems in societies. Most of the proposed methods for classifying cancer from gene expression data act as a black box and lack biological interpretability. The aim of this study was to introduce an optimal approach that preserves the interpretability of gene expression. Methods: In this study, a combined filter-wrapper feature selection method was used to select a subset of cancer-causing genes; this method significantly reduced the number of genes relative to the number of samples. Data discretization, rule generation and reduction, and evaluation of results were performed by combining fuzzy clustering methods, rough set theory, and K-set validation. Accordingly, a new method with biological interpretability and meaning extraction from gene expression data was introduced, called “Fuzzy Rough Set Classification”. Results: Using the filter-wrapper feature selection method on the lymphoma microarray, 6 genes were selected from 4029 genes. In the fuzzy-rough set classifier method, two rules were generated to develop a classifier model with interpretability of gene expression. Conclusion: In this method, the most important fuzzy rules were selected using ranking functions, which both yields an efficient model and makes the gene expression data interpretable. Another prominent feature of this method was that the proposed filter-wrapper feature selection method successfully resolved the disproportion between the number of samples and the number of genes in microarrays.
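The filter-wrapper idea described in the Methods can be illustrated with a minimal sketch. The abstract does not state which filter statistic or wrapper classifier was used, so the signal-to-noise filter score, the nearest-centroid classifier, the leave-one-out evaluation, and all function names below (`snr_score`, `loo_accuracy`, `filter_wrapper`, `make_demo_data`) are assumptions chosen for illustration, and the data are synthetic rather than the lymphoma microarray.

```python
import random
import statistics


def snr_score(values, labels):
    """Filter score (assumed): |mean0 - mean1| / (sd0 + sd1) for one gene."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = statistics.pstdev(a) + statistics.pstdev(b)
    return abs(statistics.fmean(a) - statistics.fmean(b)) / (spread + 1e-12)


def loo_accuracy(X, labels, genes):
    """Wrapper score (assumed): leave-one-out accuracy of a nearest-centroid classifier."""
    classes = sorted(set(labels))
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for cls in classes:
            rows = [X[j] for j in range(len(X)) if j != i and labels[j] == cls]
            centroids[cls] = [statistics.fmean([r[g] for r in rows]) for g in genes]
        point = [X[i][g] for g in genes]
        pred = min(
            classes,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += int(pred == labels[i])
    return correct / len(X)


def filter_wrapper(X, labels, n_filter=20, max_genes=6):
    """Filter: keep the n_filter top-ranked genes. Wrapper: greedy forward selection."""
    n_genes = len(X[0])
    ranked = sorted(
        range(n_genes),
        key=lambda g: snr_score([row[g] for row in X], labels),
        reverse=True,
    )
    candidates = ranked[:n_filter]
    selected, best = [], 0.0
    while len(selected) < max_genes:
        scored = [
            (loo_accuracy(X, labels, selected + [g]), g)
            for g in candidates
            if g not in selected
        ]
        if not scored:
            break
        # Prefer higher accuracy; break ties by the better filter rank.
        acc, gene = max(scored, key=lambda t: (t[0], -candidates.index(t[1])))
        if acc <= best:  # stop when no candidate improves the wrapper score
            break
        selected.append(gene)
        best = acc
    return selected, best


def make_demo_data(n_samples=40, n_genes=200, informative=(0, 1, 2), seed=0):
    """Synthetic two-class expression matrix; only the 'informative' genes differ by class."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    X = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 4.0 * y
        X.append(row)
    return X, labels


if __name__ == "__main__":
    X, labels = make_demo_data()
    genes, acc = filter_wrapper(X, labels)
    print("selected genes:", genes, "leave-one-out accuracy:", acc)
```

The cap of six genes only mirrors the size of the subset reported in the Results; in this sketch the search stops earlier whenever adding a gene no longer improves the wrapper score.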