Ecological Indicators (Sep 2024)

FC-StackGNB: A novel machine learning modeling framework for forest fire risk prediction combining feature crosses and model fusion algorithm

  • Ye Su,
  • Longlong Zhao,
  • Xiaoli Li,
  • Hongzhong Li,
  • Yuankai Ge,
  • Jinsong Chen

Journal volume & issue
Vol. 166
p. 112577

Abstract


Forest fire risk prediction is a crucial link in maintaining forest ecological security. Owing to its powerful non-linear modeling capabilities, machine learning (ML) has been widely applied in forest fire risk prediction research. However, when constructing the feature space, existing studies often focus on the direct information provided by multiple environmental factor features while overlooking the deeper information conveyed by feature cross-correlations. Additionally, fire risk prediction relies predominantly on single-model forecasting, so the resulting models show somewhat limited generalization and stability. Model fusion algorithms (MFA) can combine the advantages of multiple models to compensate for this limitation. This study proposes FC-StackGNB, an ML framework that combines feature crosses (FC) and model fusion. The framework uses the FC method to analyze the temporal trends of the environmental factors influencing fire occurrence and constructs multiple seasonal cross features (SCFs) that effectively capture the non-linear relationship between environmental factors and time. Moreover, the framework develops a stacking MFA optimized with Gaussian Naive Bayes (GNB) to fully leverage the strengths of different ML algorithms. Results demonstrate that introducing SCFs effectively enhances the prediction performance of six ML models, with the mean values of five evaluation metrics (Accuracy, Precision, Recall, F1-score, and ROC_AUC) increasing by 1.58% to 6.30%. The fusion model built with the StackGNB algorithm effectively handles the multicollinearity of features and achieves significantly better prediction performance than single models, particularly in Recall (an increase of around 3% and 5% over the top two ranked single models, respectively), which reflects the model's ability to identify positive samples (i.e., high-risk fire areas).
The proposed modeling framework effectively enhances the robustness and prediction performance of the models and offers new modeling insights for subsequent research. This study is significant for improving forest fire risk early warning.
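As a rough illustration of the two components named in the abstract, the Python sketch below (pandas and scikit-learn) crosses each environmental factor with a season indicator and feeds the augmented features to a stacking ensemble whose meta-learner is Gaussian Naive Bayes. The abstract does not say how SCFs are built, which six base models are used, or how the stacking is configured. The month-to-season mapping, the "factor × one-hot season" product, the helper name `add_seasonal_cross_features`, the three base learners, and the synthetic data are therefore all hypothetical. Any printed metrics describe only this toy data, not the study's results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical month -> meteorological season mapping (Northern Hemisphere).
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}
SEASONS = ["spring", "summer", "autumn", "winter"]


def add_seasonal_cross_features(df, factor_cols, month_col="month"):
    """Hypothetical SCF sketch: add factor * 1[season == s] for every factor and season."""
    out = df.copy()
    season = out[month_col].map(SEASON_OF_MONTH)
    for s in SEASONS:
        indicator = (season == s).astype(float)
        for col in factor_cols:
            out[f"{col}_x_{s}"] = out[col] * indicator  # zero outside season s
    return out


# Synthetic, purely illustrative data: three environmental factors plus month.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "temperature": rng.normal(20, 8, n),
    "humidity": rng.uniform(10, 90, n),
    "wind_speed": rng.gamma(2.0, 2.0, n),
    "month": rng.integers(1, 13, n),
})
dry_season = df["month"].isin([3, 4, 5, 9, 10, 11]).astype(float)
logit = (0.1 * (df["temperature"] - 20) - 0.05 * (df["humidity"] - 50)
         + 0.2 * df["wind_speed"] + dry_season - 1.0)
p_fire = 1.0 / (1.0 + np.exp(-logit.to_numpy()))
y = (rng.uniform(size=n) < p_fire).astype(int)  # 1 = fire (positive sample)

factors = ["temperature", "humidity", "wind_speed"]
X = add_seasonal_cross_features(df, factors).drop(columns=["month"])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Stacking: base learners' out-of-fold probabilities feed a GaussianNB meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ],
    final_estimator=GaussianNB(),
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

pred = stack.predict(X_test)
proba = stack.predict_proba(X_test)[:, 1]
metrics = {
    "Accuracy": accuracy_score(y_test, pred),
    "Precision": precision_score(y_test, pred),
    "Recall": recall_score(y_test, pred),
    "F1-score": f1_score(y_test, pred),
    "ROC_AUC": roc_auc_score(y_test, proba),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With `cv=5`, scikit-learn trains the meta-learner on out-of-fold base-model probabilities rather than on in-sample predictions, which limits leakage from the base learners. Choosing `GaussianNB` as `final_estimator` keeps the meta-learner very simple. Whether this matches the paper's "GNB-optimized" stacking is an assumption.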

Keywords