Environmental Sciences Europe (Sep 2024)

Application of bagging and boosting ensemble machine learning techniques for groundwater potential mapping in a drought-prone agriculture region of eastern India

  • Krishnagopal Halder,
  • Amit Kumar Srivastava,
  • Anitabha Ghosh,
  • Ranajit Nabik,
  • Subrata Pan,
  • Uday Chatterjee,
  • Dipak Bisai,
  • Subodh Chandra Pal,
  • Wenzhi Zeng,
  • Frank Ewert,
  • Thomas Gaiser,
  • Chaitanya Baliram Pande,
  • Abu Reza Md. Towfiqul Islam,
  • Edris Alam,
  • Md Kamrul Islam

DOI
https://doi.org/10.1186/s12302-024-00981-y
Journal volume & issue
Vol. 36, no. 1
pp. 1 – 28

Abstract

Read online

Abstract Groundwater is a primary source of drinking water for billions worldwide. It plays a crucial role in irrigation, domestic, and industrial uses, and significantly contributes to drought resilience in various regions. However, excessive groundwater discharge has left many areas vulnerable to potable water shortages. Therefore, assessing groundwater potential zones (GWPZ) is essential for implementing sustainable management practices to ensure the availability of groundwater for present and future generations. This study aims to delineate areas with high groundwater potential in the Bankura district of West Bengal using four machine learning methods: Random Forest (RF), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), and Voting Ensemble (VE). The models used 161 data points, comprising 70% of the training dataset, to identify significant correlations between the presence and absence of groundwater in the region. Among the methods, Random Forest (RF) and Extreme Gradient Boosting (XGBoost) proved to be the most effective in mapping groundwater potential, suggesting their applicability in other regions with similar hydrogeological conditions. The performance metrics for RF are very good with a precision of 0.919, recall of 0.971, F1-score of 0.944, and accuracy of 0.943. This indicates a strong capability to accurately predict groundwater zones with minimal false positives and negatives. Adaptive Boosting (AdaBoost) demonstrated comparable performance across all metrics (precision: 0.919, recall: 0.971, F1-score: 0.944, accuracy: 0.943), highlighting its effectiveness in predicting groundwater potential areas accurately; whereas, Extreme Gradient Boosting (XGBoost) outperformed the other models slightly, with higher values in all metrics: precision (0.944), recall (0.971), F1-score (0.958), and accuracy (0.957), suggesting a more refined model performance. The Voting Ensemble (VE) approach also showed enhanced performance, mirroring XGBoost's metrics (precision: 0.944, recall: 0.971, F1-score: 0.958, accuracy: 0.957). This indicates that combining the strengths of individual models leads to better predictions. The groundwater potentiality zoning across the Bankura district varied significantly, with areas of very low potentiality accounting for 41.81% and very high potentiality at 24.35%. The uncertainty in predictions ranged from 0.0 to 0.75 across the study area, reflecting the variability in groundwater availability and the need for targeted management strategies. In summary, this study highlights the critical need for assessing and managing groundwater resources effectively using advanced machine learning techniques. The findings provide a foundation for better groundwater management practices, ensuring sustainable use and conservation in Bankura district and beyond.

Keywords