Minerals (Dec 2022)

Automated Hyperparameter Optimization of Gradient Boosting Decision Tree Approach for Gold Mineral Prospectivity Mapping in the Xiong’ershan Area

  • Mingjing Fan,
  • Keyan Xiao,
  • Li Sun,
  • Shuai Zhang,
  • Yang Xu

DOI
https://doi.org/10.3390/min12121621
Journal volume & issue
Vol. 12, no. 12
p. 1621

Abstract


Weak-classifier ensemble algorithms based on the decision tree model mainly comprise bagging (e.g., random forest, RF) and boosting (e.g., gradient boosting decision tree, GBDT, and eXtreme gradient boosting). Bagging lowers the overall generalization error by reducing variance, whereas boosting does so by reducing bias. Because the underlying idea is straightforward, these methods are prevalent in mineral prospectivity mapping (MPM). However, applying them inevitably requires hyperparameter tuning, which is laborious and time-consuming, so the selection of hyperparameters suited to a specific task is worth investigating. In this paper, a tree-structured Parzen estimator (TPE)-based GBDT model (TPE-GBDT) is introduced for tuning hyperparameters such as the loss criterion, n_estimators, learning_rate, max_features, subsample, max_depth, and min_impurity_decrease. Geological data on gold deposits in the Xiong’ershan area were then used to create training data for MPM and to compare the training results of TPE-GBDT and random search-GBDT. For the same parameter space, the TPE-GBDT model reached higher accuracy than random search-GBDT in a shorter time, indicating that the algorithm is superior to random search in principle and better suited to complex hyperparameter tuning. Subsequently, five-fold cross-validation, the confusion matrix, and success rate curves were used to evaluate the overall performance of the hyperparameter-optimized models, and the predictive models scored well on these measures. Finally, the maximum Youden index was used as the threshold to separate prospective from non-prospective areas. The high-prospectivity area delineated by the TPE-GBDT model accounts for 10.22% of the total study area, contains > 90% of the known deposits, and provides a preferred range for future exploration work.
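The thresholding step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard definition of the Youden index, J = TPR - FPR, evaluated at each candidate threshold on the predicted prospectivity scores. The function name `youden_threshold` and the toy labels and scores are hypothetical and are not taken from the paper; the authors' actual implementation is not described here.

```python
from typing import List, Tuple


def youden_threshold(labels: List[int], scores: List[float]) -> Tuple[float, float]:
    """Return (threshold, J) that maximizes Youden's J = TPR - FPR.

    A cell is predicted prospective when its score >= threshold.
    On ties, the highest threshold reaching the maximum J is kept.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")

    best_threshold, best_j = max(scores), -1.0
    # Try every distinct score as a candidate threshold, highest first.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        j = tp / positives - fp / negatives  # TPR - FPR
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical toy data: 1 = known deposit cell, 0 = non-deposit cell.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]

threshold, j = youden_threshold(labels, scores)
print(f"threshold={threshold}, J={j}")
```

In this toy case two thresholds reach the same J, and the sketch keeps the higher one, which yields the smaller prospective area. That tie-breaking rule is a design choice of the sketch, not something stated in the source.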

Keywords