IEEE Access (Jan 2018)

Robust Sparse Logistic Regression With the Lq (0 < q < 1) Regularization for Feature Selection Using Gene Expression Data

  • Ziyi Yang,
  • Yong Liang,
  • Hui Zhang,
  • Hua Chai,
  • Bowen Zhang,
  • Cheng Peng

DOI
https://doi.org/10.1109/ACCESS.2018.2880198
Journal volume & issue
Vol. 6
pp. 68586 – 68595

Abstract


Microarray technology is a popular technique that has been extensively applied in cancer diagnosis. Many studies have used high-dimensional microarray data to identify informative features for classifying cancer types, yet the numerous irrelevant features present in microarray data may introduce noise and decrease classification accuracy. Regularization techniques are common methods for feature selection; they can be used to reduce the number of irrelevant features and avoid overfitting. In recent years, various regularization methods have been proposed. Theoretically, an Lq (0 < q < 1) penalty function with a lower value of q yields sparser solutions. In addition, the loss function in most regression models is based on least-squares minimization. However, the least-squares method is sensitive to noise and lacks robustness, especially when the error follows a heavy-tailed distribution. It is well known that least absolute deviation regression is the most common method for robust regression and can overcome the problem of large noise. Microarray data generally contain a high level of noise, which hinders the development of microarray technology. To address these problems, we propose a robust logistic regression based on Lq (0 < q < 1) regularization, which is a feasible and effective approach for feature selection in microarray classification. Lq (0 < q < 1) regularization leads to a non-convex optimization problem that is difficult to solve. In this paper, we use a genetic algorithm based on a global search strategy to obtain an optimal solution.
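The abstract combines three ideas: a robust (absolute-deviation) loss, a non-convex Lq penalty with 0 < q < 1, and a genetic algorithm (GA) used as a global search in place of gradient methods. The Python sketch below illustrates how these fit together. It is not the authors' method: the abstract does not state the exact robust logistic loss, so the mean absolute deviation between labels and predicted probabilities used here is an assumption, and the GA operators (tournament selection, uniform crossover, Gaussian mutation, a sparsity-promoting "zero-out" mutation, elitism) and all function names and parameter values (`lq_penalty`, `robust_logistic_objective`, `genetic_lq_fit`, `lam`, `q`, `zero_prob`, ...) are hypothetical choices for illustration.

```python
import numpy as np


def lq_penalty(beta, q):
    """Sum of |beta_j|**q; for 0 < q < 1 this is non-convex and favours sparse beta."""
    return float(np.sum(np.abs(beta) ** q))


def robust_logistic_objective(beta, X, y, lam, q):
    """ASSUMED robust loss: mean absolute deviation between labels y in {0, 1}
    and sigmoid(X @ beta), plus lam times the Lq penalty (illustrative only)."""
    z = np.clip(X @ beta, -500.0, 500.0)  # clip to avoid overflow in exp
    p = 1.0 / (1.0 + np.exp(-z))
    return float(np.mean(np.abs(y - p)) + lam * lq_penalty(beta, q))


def genetic_lq_fit(X, y, lam=0.05, q=0.5, pop_size=60, n_gen=200,
                   mut_sigma=0.3, zero_prob=0.1, seed=0):
    """Hypothetical GA minimising robust_logistic_objective over beta."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Random initial population; row 0 is the all-zero (fully sparse) model.
    pop = rng.normal(0.0, 1.0, size=(pop_size, n_features))
    pop[0] = 0.0
    fitness = np.array([robust_logistic_objective(b, X, y, lam, q) for b in pop])

    for _ in range(n_gen):
        children = np.empty_like(pop)
        children[0] = pop[np.argmin(fitness)]  # elitism: keep the best unchanged
        for i in range(1, pop_size):
            # Binary tournament selection for each parent.
            a, b = rng.integers(pop_size, size=2)
            p1 = pop[a] if fitness[a] < fitness[b] else pop[b]
            a, b = rng.integers(pop_size, size=2)
            p2 = pop[a] if fitness[a] < fitness[b] else pop[b]
            # Uniform crossover: each gene comes from p1 or p2 with prob. 1/2.
            child = np.where(rng.random(n_features) < 0.5, p1, p2)
            # Gaussian mutation on roughly one gene per child.
            mutate = rng.random(n_features) < 1.0 / n_features
            child = child + mutate * rng.normal(0.0, mut_sigma, n_features)
            # Sparsity-promoting mutation: set some coefficients exactly to zero.
            child[rng.random(n_features) < zero_prob] = 0.0
            children[i] = child
        pop = children
        fitness = np.array([robust_logistic_objective(b, X, y, lam, q) for b in pop])

    best = np.argmin(fitness)
    return pop[best], fitness[best]


# Usage on synthetic data: 20 features, only the first two are informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
true_beta = np.zeros(20)
true_beta[:2] = [2.0, -2.0]
y = (rng.random(100) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
y[:5] = 1.0 - y[:5]  # flip a few labels to mimic noisy samples
beta_hat, obj = genetic_lq_fit(X, y)
print("selected features:", np.flatnonzero(beta_hat))
print("objective:", obj)
```

Because the all-zero vector is seeded into the first generation and the best individual is always carried over, the returned objective can never be worse than that of the empty model; a GA offers no guarantee of reaching the global optimum, only a search that is not confined to a single local basin as gradient descent on a non-convex Lq objective would be.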

Keywords