IEEE Access (Jan 2024)

Sparse Variable Selection on High Dimensional Heterogeneous Data With Tree Structured Responses

  • Hui Liu,
  • Xiang Liu,
  • Jing Diao,
  • Wenting Ye,
  • Xueling Liu,
  • Dehui Wei

DOI
https://doi.org/10.1109/ACCESS.2024.3384309
Journal volume & issue
Vol. 12
pp. 50779 – 50791

Abstract

We consider the problem of sparse variable selection on high-dimensional heterogeneous data sets, which has attracted renewed interest due to the growth of biological and medical data sets with complex, non-i.i.d. structures and large numbers of response variables. The heterogeneity is likely to confound the association between explanatory variables and responses, resulting in many false discoveries when Lasso or its variants are naïvely applied. Developing effective confounder correction methods has therefore become a growing research focus. However, directly applying recent confounder correction methods yields poor performance because they ignore the complex interdependency among response variables. To improve on current variable selection methods, we introduce the tree-guided sparse linear mixed model, which uses the dependency information from multiple responses to capture the cluster structure among the responses and to select the active variables from heterogeneous data. Through extensive experiments on synthetic and real data sets, we show that our proposed model outperforms existing methods and achieves the highest area under the ROC curve.

Keywords