IEEE Access (Jan 2021)
Efficient Distributed Learning for Large-Scale Expectile Regression With Sparsity
Abstract
High-dimensional datasets often display heterogeneity due to heteroskedasticity or other forms of non-location-scale covariate effects. When such datasets become very large, it may be infeasible to store the entire dataset on a single machine, let alone keep it in memory. In this paper, we consider penalized expectile regression with smoothly clipped absolute deviation (SCAD) and adaptive LASSO penalties, which can effectively detect the heteroskedasticity of high-dimensional data. We propose a communication-efficient approach to distributed sparsity learning in which observations are randomly partitioned across machines. We show that, with appropriately chosen tuning parameters, the proposed estimators enjoy the oracle property. Extensive numerical experiments on both synthetic and real data validate the theoretical results and demonstrate the superior performance of the proposed method.
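To make the core object concrete: expectile regression replaces the symmetric squared loss with an asymmetrically weighted one, which is what lets the fitted coefficients vary across expectile levels and thereby reveal heteroskedasticity. The sketch below is illustrative only, not the paper's distributed or penalized estimator; it shows the asymmetric squared loss and a standard weighted-mean fixed-point iteration for the sample τ-expectile (function names are our own).

```python
import numpy as np

def expectile_loss(r, tau):
    # Asymmetric squared loss: residuals above zero get weight tau,
    # residuals below zero get weight 1 - tau. At tau = 0.5 this is
    # (half) the ordinary squared loss.
    w = np.where(r >= 0, tau, 1.0 - tau)
    return np.mean(w * r ** 2)

def sample_expectile(y, tau, n_iter=200):
    # The tau-expectile mu solves a weighted-mean fixed point:
    #   mu = sum(w_i * y_i) / sum(w_i),  w_i = tau if y_i >= mu else 1 - tau.
    # Iterating this (a simple IRLS scheme) converges to the expectile.
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    for _ in range(n_iter):
        w = np.where(y >= mu, tau, 1.0 - tau)
        mu = np.sum(w * y) / np.sum(w)
    return mu
```

At τ = 0.5 the weights are constant and the expectile reduces to the mean; larger τ pulls the expectile toward the upper tail, which is why a family of expectile fits across τ exposes non-constant error variance.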
Keywords