Gong-kuang zidonghua (Dec 2020)

Multi-decision tree prediction model for coal seam floor water inrush based on cost-sensitive theory

  • LI Yanmin,
  • ZHOU Chenyang,
  • LI Fenglian

DOI
https://doi.org/10.13272/j.issn.1671-251x.2020060071
Journal volume & issue
Vol. 46, no. 12
pp. 76 – 83

Abstract

Coal seam floor water inrush prediction generally distinguishes two states: the safe state and the water inrush state. The state data are imbalanced, whereas existing coal seam floor water inrush prediction models are mainly suited to balanced data. When applied to imbalanced data sets, these models often produce 'one-sided' results: the prediction accuracy for the safe state is significantly higher than that for the water inrush state, so the overall prediction performance is low. To address this problem, a multi-decision tree prediction model for coal seam floor water inrush based on cost-sensitive theory is established. In this model, each decision tree takes a different water inrush factor as its root node, and the node attribute selection criterion of each single decision tree combines cost-sensitive theory with the Gini index. This increases the penalty for misclassifying water inrush data (the minority class) and improves the prediction performance for water inrush. A rule set is obtained for each single decision tree water inrush prediction model, and the rule set of the multi-decision tree model is formed by combining the rule sets of all the single decision tree models. The combined rule set yields multiple predictions for each water inrush sample, and the final prediction is determined by majority voting. The experimental results show that, as the penalty factor of the model increases, the true positive rate first increases and then decreases.
With a data imbalance rate of 2 and a classification error penalty factor of 4, the proposed model reaches a true positive rate of 93.06%, a true negative rate of 97.85%, and an accuracy of 96.25%, outperforming the single decision tree water inrush prediction model based on the classification and regression tree (CART) algorithm. When the data imbalance rate is increased to 6 and the classification error penalty factor is set to 20, the true positive rate of both models reaches 100%. The true negative rate of the proposed model is 99.37% and its accuracy is 99.47%, which is still better than the performance of the CART-based water inrush prediction model. The experimental results validate the effectiveness of the proposed model.
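
The abstract does not give the exact formula that combines the penalty factor with the Gini index, so the following Python sketch is only one plausible reading, not the authors' method. It assumes that each class count is scaled by its misclassification cost before the Gini impurity is computed, and that the per-tree predictions are merged by a simple majority vote. The function names, the cost dictionary, and the label encoding (0 = safe, 1 = water inrush) are all hypothetical.

```python
from collections import Counter


def cost_weighted_gini(labels, costs):
    """Gini impurity with each class count scaled by its cost.

    Assumption: a class missing from `costs` has cost 1.0, so an empty
    dictionary reproduces the ordinary Gini index.
    """
    counts = Counter(labels)
    weighted = {c: n * costs.get(c, 1.0) for c, n in counts.items()}
    total = sum(weighted.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in weighted.values())


def split_impurity(left, right, costs):
    """Size-weighted impurity of a binary split; lower is better."""
    n = len(left) + len(right)
    return (len(left) * cost_weighted_gini(left, costs)
            + len(right) * cost_weighted_gini(right, costs)) / n


def majority_vote(predictions):
    """Return the label predicted by most trees.

    Ties go to the label seen first, so an odd number of trees is
    assumed for a two-class problem.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical node with four safe samples (0) and one inrush sample (1).
node = [0, 0, 0, 0, 1]
plain = cost_weighted_gini(node, {})           # ordinary Gini index
penalized = cost_weighted_gini(node, {1: 4.0})  # penalty factor 4 on inrush
final = majority_vote([1, 0, 1])                # votes from three trees
```

Under these assumptions, the node looks fairly pure with the ordinary Gini index but maximally impure once the inrush class carries a penalty factor of 4, so a tree grown with the weighted criterion is pushed to keep splitting until the minority samples are isolated.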

Keywords