Heliyon (Jan 2024)

Prediction of gross calorific value from coal analysis using decision tree-based bagging and boosting techniques

  • Tanveer Alam Munshi,
  • Labiba Nusrat Jahan,
  • M. Farhad Howladar,
  • Mahamudul Hashan

Journal volume & issue
Vol. 10, no. 1
p. e23395

Abstract

The calorific value of a fuel is one of the crucial parameters for grading its burning capability. The bomb calorimeter has historically been used to measure coal's gross calorific value (GCV). However, for many years, engineers and scientists have tried to estimate coal's GCV without a bomb calorimeter, using only laboratory-derived ultimate and/or proximate analyses, to avoid tedious and time-consuming laboratory work. In this study, Extra trees, Bagging, Decision tree, and Adaptive boosting are applied to coal GCV modeling for the first time. In addition, the prediction and computational efficiency of previously applied decision tree-based algorithms, namely Random forest, Gradient boosting, and XGBoost, are investigated. Well-established empirical models, namely those of Schuster, Mazumdar, Channiwala and Parikh, Parikh et al., and the Central Fuel Research Institute of India, are examined to compare their efficiency with that of the newly developed algorithms. Proximate and ultimate analysis parameters are ranked by their significance in GCV modeling. The studied models are tuned using an exhaustive grid search. Statistical indexes, namely explained variance (EV), mean absolute error (MAE), coefficient of determination (R2), mean squared error (MSE), maximum error, minimum error, and mean absolute percentage error (MAPE), are used to evaluate these models. To accomplish these goals, 7430 data points containing ten coal features (ash, moisture, fixed carbon, volatile matter, hydrogen, carbon, sulfur, nitrogen, oxygen, and GCV) are selected from the U.S. Geological Survey Coal Quality (COALQUAL) database. It is found that, owing to their simplicity and location-specific constraints, the empirical models could not correlate proximate and/or ultimate analyses with GCV. The bagging and boosting techniques tested here performed well, with a coefficient of determination (R2) of over 0.97.
The XGBoost model outperforms the other tree-based algorithms, with the highest coefficient of determination (R2 of 0.9974) and the lowest error values (MSE of 14703.3, max_error of 1027.2, MAE of 89.2, MAPE of 0.009). The ranking of the studied models by performance (highest to lowest) is XGBoost, Extra trees, Random forest, Bagging, Gradient boosting, Decision tree, and Adaptive boosting. The correlation heatmap and scatterplots used here clearly indicate that oxygen and carbon are the most significant parameters for GCV modeling, whereas volatile matter and sulfur are the least significant. The strategy suggested in this research can help engineers and operators obtain a rapid and accurate determination of GCV from a few coal features, thus reducing complicated, tedious, expensive, and time-consuming laboratory work.
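
The statistical indexes named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code: the function name `regression_metrics` and the sample GCV values are hypothetical, "minimum error" is assumed to mean the smallest absolute residual, and MAPE is assumed to be reported as a fraction rather than a percentage (consistent with the reported value of 0.009).

```python
from statistics import fmean


def regression_metrics(y_true, y_pred):
    """Compute the error indexes listed in the abstract for paired values.

    Assumes y_true is not constant (otherwise EV and R2 are undefined)
    and contains no zeros (otherwise MAPE is undefined).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    abs_res = [abs(r) for r in residuals]

    mean_true = fmean(y_true)
    mean_res = fmean(residuals)

    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    ss_res = sum(r ** 2 for r in residuals)               # residual sum of squares
    ss_dev = sum((r - mean_res) ** 2 for r in residuals)  # residual spread about its mean

    return {
        "EV": 1.0 - ss_dev / ss_tot,   # explained variance
        "R2": 1.0 - ss_res / ss_tot,   # coefficient of determination
        "MAE": fmean(abs_res),
        "MSE": ss_res / len(residuals),
        "max_error": max(abs_res),
        "min_error": min(abs_res),     # assumption: smallest absolute residual
        "MAPE": fmean(abs(r) / abs(t) for r, t in zip(residuals, y_true)),
    }


# Hypothetical measured and predicted GCV values, for illustration only.
measured = [12000.0, 11000.0, 13000.0, 10000.0]
predicted = [12100.0, 10900.0, 13000.0, 10200.0]

for name, value in regression_metrics(measured, predicted).items():
    print(f"{name}: {value:.4f}")
```

EV and R2 differ only when the residuals have a non-zero mean, so a gap between the two points to a systematic bias in the predictions.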

Keywords