Journal of Big Data (Oct 2017)
Dimensionality reduction and class prediction algorithm with application to microarray Big Data
Abstract
Abstract The recent technology development in the concern of microarray experiments has provided many new potentialities in terms of simultaneous measurement. But new challenges have arisen from these massive quantities of information qualified as Big Data. The challenge consists to extract the main information containing the sense from the data. To this end researchers are using various techniques as “hierarchical clustering”, “mutual information” and “self-organizing maps” to name a few. However, the management and analysis of the millions resulting dataset haven’t yet reached a satisfactory level, and there is no clear consensus about the best method/methods revealing patterns of gene expression. Thus, many efforts are required to strengthen the methodologies for optimal analysis of Big Data. In this paper, we propose a new processing approach which is structured on feature extraction and selection. The feature extraction, is based on correlation and rank analysis and leads to a reduction of the number of variables. The feature selection, consists in eliminating redundant or irrelevant variables, using some adapted techniques of discriminant analysis. Our approach is tested on three type of cancer gene expression microarray and compared with concurrent other approaches. It performs well, in terms of prediction results, computation and processing time.
Keywords