AIMS Bioengineering (Mar 2017)

Dimension reduction methods for microarray data: a review

  • Rabia Aziz,
  • C.K. Verma,
  • Namita Srivastava

DOI
https://doi.org/10.3934/bioeng.2017.1.179
Journal volume & issue
Vol. 4, no. 1
pp. 179 – 197

Abstract

Read online

Dimension reduction has become inevitable for pre-processing of high dimensional data. “Gene expression microarray data” is an instance of such high dimensional data. Gene expression microarray data displays the maximum number of genes (features) simultaneously at a molecular level with a very small number of samples. The copious numbers of genes are usually provided to a learning algorithm for producing a complete characterization of the classification task. However, most of the times the majority of the genes are irrelevant or redundant to the learning task. It will deteriorate the learning accuracy and training speed as well as lead to the problem of overfitting. Thus, dimension reduction of microarray data is a crucial preprocessing step for prediction and classification of disease. Various feature selection and feature extraction techniques have been proposed in the literature to identify the genes, that have direct impact on the various machine learning algorithms for classification and eliminate the remaining ones. This paper describes the taxonomy of dimension reduction methods with their characteristics, evaluation criteria, advantages and disadvantages. It also presents a review of numerous dimension reduction approaches for microarray data, mainly those methods that have been proposed over the past few years.

Keywords