IEEE Access (Jan 2022)

Gene Selection in Binary Classification Problems Within Functional Genomics Experiments via Robust Fisher Score

  • Muhammad Hamraz,
  • Zardad Khan,
  • Dost Muhammad Khan,
  • Naz Gul,
  • Amjad Ali,
  • Saeed Aldahmani

DOI
https://doi.org/10.1109/ACCESS.2022.3172281
Journal volume & issue
Vol. 10
pp. 51682 – 51692

Abstract


This study proposes a supervised feature selection technique for classification in high-dimensional binary-class problems by adding robustness to the conventional Fisher Score. The proposed method uses a more robust measure of location, the median, and a robust measure of dispersion, the Rousseeuw and Croux statistic ($Q_{n}$). Initially, a minimal subset of genes is identified by a greedy search approach; this subset is then combined with the top-ranked genes obtained via the proposed Robust Fisher Score (RFish). Finally, to remove redundancy among the selected genes, the Least Absolute Shrinkage and Selection Operator (LASSO) is applied. The proposed method is validated on five publicly available datasets. Its results are compared with those of six well-known feature selection methods based on prediction performance via Random Forest (RF), Support Vector Machine (SVM) and $k$-Nearest Neighbour ($k$-NN) classifiers. Box plots and bar plots of the results of the proposed method and all the other methods considered in the manuscript are also given. The results show that the proposed method (RFish) performs better than all the other methods in the majority of cases. A detailed simulation study is also provided to further assess the proposed method.
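
As a rough illustration of the scoring idea, the sketch below ranks genes by a robust analogue of the Fisher score. The abstract does not state the RFish formula, so the form used here is an assumption: the usual two-class Fisher ratio with each class mean replaced by the class median and each class standard deviation replaced by $Q_{n}$. The $Q_{n}$ estimator is computed naively from all pairwise differences with the asymptotic consistency constant 2.2219 and no finite-sample correction. The function names `qn_scale`, `robust_fisher_score` and `rank_genes` are hypothetical, and the greedy search and LASSO steps are not shown.

```python
from itertools import combinations
from math import comb
from statistics import median

# Asymptotic consistency constant of Qn for Gaussian data (no finite-sample correction).
QN_CONSISTENCY = 2.2219


def qn_scale(values):
    """Naive O(n^2) Rousseeuw-Croux Qn: scaled k-th smallest pairwise distance."""
    n = len(values)
    if n < 2:
        raise ValueError("Qn needs at least two observations")
    h = n // 2 + 1
    k = comb(h, 2)  # rank of the order statistic to pick
    diffs = sorted(abs(a - b) for a, b in combinations(values, 2))
    return QN_CONSISTENCY * diffs[k - 1]


def robust_fisher_score(class0, class1):
    """Assumed form: (median0 - median1)^2 / (Qn0^2 + Qn1^2)."""
    num = (median(class0) - median(class1)) ** 2
    den = qn_scale(class0) ** 2 + qn_scale(class1) ** 2
    if den == 0:
        # Both classes have zero robust spread: perfectly separated or identical.
        return float("inf") if num > 0 else 0.0
    return num / den


def rank_genes(X, y):
    """Return gene (column) indices sorted from highest to lowest score.

    X is a list of samples, each a list of expression values; y holds 0/1 labels.
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        c0 = [row[g] for row, label in zip(X, y) if label == 0]
        c1 = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append(robust_fisher_score(c0, c1))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
```

Calling `rank_genes(X, y)` on an expression matrix with binary labels returns column indices ordered by score, from which a top-ranked subset could be taken. For large sample sizes, a faster $Q_{n}$ implementation from a statistics library would be preferable to this quadratic version.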

Keywords