Mitigating the Multicollinearity Problem and Its Machine Learning Approach: A Review

Jireh Yi-Le Chan; Steven Mun Hong Leow; Khean Thye Bea; Wai Khuen Cheng; Seuk Wai Phoong; Zeng-Wei Hong; Yen-Lin Chen

doi:10.3390/math10081283

Mathematics (Apr 2022)

Affiliations

Jireh Yi-Le Chan: Faculty of Business and Finance, Universiti Tunku Abdul Rahman, Kampar 31900, Malaysia
Steven Mun Hong Leow: Faculty of Business and Finance, Universiti Tunku Abdul Rahman, Kampar 31900, Malaysia
Khean Thye Bea: Faculty of Business and Finance, Universiti Tunku Abdul Rahman, Kampar 31900, Malaysia
Wai Khuen Cheng: Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman, Kampar 31900, Malaysia
Seuk Wai Phoong: Department of Management, Faculty of Business and Economics, Universiti Malaya, Kuala Lumpur 50603, Malaysia
Zeng-Wei Hong: Department of Information Engineering and Computer Science, Feng Chia University, Taichung 407102, Taiwan
Yen-Lin Chen: Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei 106344, Taiwan

DOI: https://doi.org/10.3390/math10081283
Journal volume & issue: Vol. 10, no. 8, p. 1283

Abstract

Advances in technology have driven big data collection across many fields, such as genomics and business intelligence, resulting in a significant increase in the number of variables and data points (observations) collected and stored. Although this presents opportunities to better model the relationship between predictors and response variables, it also causes serious problems during data analysis, one of which is multicollinearity. The two main approaches used to mitigate multicollinearity are variable selection methods and modified estimator methods. However, variable selection methods may negate efforts to collect more data, as new data may eventually be dropped from modeling, while recent studies suggest that optimization approaches via machine learning handle data with multicollinearity better than statistical estimators. Therefore, this study reviews the chronological development of methods for mitigating the effects of multicollinearity and offers up-to-date recommendations for mitigating it more effectively.

Published in Mathematics

ISSN: 2227-7390 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/mathematics
