Advances in Civil Engineering (Jan 2023)
Hybridization of Machine Learning Algorithms and an Empirical Regression Model for Predicting Debris-Flow-Endangered Areas
Abstract
Accurate delineation of debris-flow-endangered areas (e.g., the maximum runout distance) is a necessary prerequisite for debris-flow risk assessment and countermeasure design. Recently, machine-learning models have proven to be effective tools for predicting debris-flow parameters. However, existing machine-learning models are generally developed from very limited observational data, which may cause the predictive model to overfit or underfit. Developing a robust model for accurately forecasting debris-flow-endangered areas remains a difficult task. This paper proposes a hybrid method for predicting debris-flow hazard zones by integrating machine-learning algorithms with an empirical regression model. The proposed method takes the maximum runout distance calculated by the empirical model as a supplementary input, increasing the amount of training data available for constructing hybrid machine-learning models. Three commonly used machine-learning models (i.e., multivariate adaptive regression splines (MARS), random forest (RF), and support vector machine (SVM)) are developed based on the training datasets of a specific debris basin. Then, these three machine-learning models are combined with an empirical relationship developed using the same training datasets to generate the corresponding hybrid models. Finally, the performance metrics (i.e., coefficient of determination (R²), root-mean-square error (RMSE), and mean absolute error (MAE)) of the proposed hybrid models are investigated and compared with those of the single predictive models (i.e., MARS, RF, SVM, and the empirical model) under fivefold cross-validation. The proposed method is illustrated using 134 channelized debris-flow events in Sichuan Province, China.
Results show that, compared with the three individual machine-learning models, hybridizing the machine-learning algorithms with the empirical model improves R², RMSE, and MAE by 70.5%, 32.9%, and 41.1%, respectively. Relative to the empirical model, the R², RMSE, and MAE values of the proposed hybrid models are improved by 29.6%, 22.3%, and 32.5%, respectively. The proposed hybrid models generally perform better than the individual machine-learning models and the empirical model, providing a promising tool for accurately forecasting debris-flow-endangered areas.
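The hybridization step and the three performance metrics can be illustrated with a minimal sketch. It reflects one possible reading of the method, in which the runout distance estimated by the empirical model is appended to each sample as an extra input before a machine-learning regressor is trained. The abstract does not give the form of the empirical relationship, so the power law L = a·V^b used here is an assumption, as are all function names and the toy values. Training the MARS, RF, or SVM model and the fivefold cross-validation are not shown.

```python
import math

def fit_power_law(volumes, runouts):
    """Fit the ASSUMED empirical form L = a * V**b by least squares in log-log space."""
    xs = [math.log(v) for v in volumes]
    ys = [math.log(l) for l in runouts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

def augment_with_empirical(features, volumes, a, b):
    """Append the empirical runout estimate to each feature row (hypothetical helper)."""
    return [list(row) + [a * v ** b] for row, v in zip(features, volumes)]

def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root-mean-square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

# Toy values for illustration only; these are NOT data from the study.
toy_volumes = [1e3, 1e4, 1e5]                   # debris volume (assumed predictor)
toy_runouts = [2.0 * v ** 0.5 for v in toy_volumes]
toy_features = [[0.10], [0.20], [0.30]]         # e.g. a channel-slope feature (assumed)

a, b = fit_power_law(toy_volumes, toy_runouts)
hybrid_inputs = augment_with_empirical(toy_features, toy_volumes, a, b)
# hybrid_inputs would then be passed to a regressor such as MARS, RF, or SVM.
```

In a cross-validated setting, the empirical relationship would need to be refitted on the training folds only, so that the appended estimate does not leak information from the held-out fold.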