Complexity (Jan 2021)
Comparing the Forecast Performance of Advanced Statistical and Machine Learning Techniques Using Huge Big Data: Evidence from Monte Carlo Experiments
Abstract
This research compares factor models based on principal component analysis (PCA) and partial least squares (PLS) with Autometrics, elastic smoothly clipped absolute deviation (E-SCAD), and minimax concave penalty (MCP) under different simulated schemes like multicollinearity, heteroscedasticity, and autocorrelation. The comparison is made with varying sample size and covariates. We found that in the presence of low and moderate multicollinearity, MCP often produces superior forecasts in contrast to small sample case, whereas E-SCAD remains better. In the case of high multicollinearity, the PLS-based factor model remained dominant, but asymptotically the prediction accuracy of E-SCAD significantly enhances compared to other methods. Under heteroscedasticity, MCP performs very well and most of the time beats the rival methods. In some circumstances under large samples, Autometrics provides a similar forecast as MCP. In the presence of low and moderate autocorrelation, MCP shows outstanding forecasting performance except for the small sample case, whereas E-SCAD produces a remarkable forecast. In the case of extreme autocorrelation, E-SCAD outperforms the rival techniques under both the small and medium samples, but further augmentation in sample size enables MCP forecast more accurate comparatively. To compare the predictive ability of all methods, we split the data into two halves (i.e., data over 1973–2007 as training data and data over 2008–2020 as testing data). Based on the root mean square error and mean absolute error, the PLS-based factor model outperforms the competitor models in terms of forecasting performance.