Mathematics (Jan 2021)

Iterative Variable Selection for High-Dimensional Data: Prediction of Pathological Response in Triple-Negative Breast Cancer

  • Juan C. Laria
  • M. Carmen Aguilera-Morillo
  • Enrique Álvarez
  • Rosa E. Lillo
  • Sara López-Taruella
  • María del Monte-Millán
  • Antonio C. Picornell
  • Miguel Martín
  • Juan Romo

DOI
https://doi.org/10.3390/math9030222
Journal volume & issue
Vol. 9, no. 3
p. 222

Abstract

Over the last decade, regularized regression methods have offered alternatives for performing multi-marker analysis and feature selection in a whole genome context. The process of defining a list of genes that will characterize an expression profile remains unclear. It currently relies upon advanced statistics and can use an agnostic point of view or include some a priori knowledge, but overfitting remains a problem. This paper introduces a methodology to deal with the variable selection and model estimation problems in the high-dimensional set-up, which can be particularly useful in the whole genome context. Results are validated using simulated data and a real dataset from a triple-negative breast cancer study.

Keywords