Scientific Reports (Jul 2024)
Modified sparse regression to solve heterogeneity and hybrid models for increasing the prediction accuracy of seaweed big data with outliers
Abstract
Linear regression is critical for data modelling, especially for scientists. However, with the abundance of high-dimensional data, some data sets have more explanatory variables than observations. In such circumstances, traditional approaches fail. This paper proposes a modified sparse regression model that addresses the problem of heterogeneity, using seaweed big data as a use case. Modified heterogeneity models for ridge, LASSO, and Elastic Net regression were used to model the data, together with the robust estimators M Bi-Square, M Hampel, M Huber, MM, and S. Based on the results, the hybrid model of sparse regression (before, after, and modified heterogeneity) with robust regression, using the 45 high-ranking variables and a 2-sigma limit, can be used efficiently and effectively to reduce the outliers. The results confirm that the hybrid model of the modified sparse LASSO with the M Bi-Square estimator for the 45 high-ranking parameters performed better than the other existing methods.
Keywords