Hydrology and Earth System Sciences (Jan 2022)

Preprocessing approaches in machine-learning-based groundwater potential mapping: an application to the Koulikoro and Bamako regions, Mali

  • V. Gómez-Escalonilla
  • P. Martínez-Santos
  • M. Martín-Loeches

DOI
https://doi.org/10.5194/hess-26-221-2022
Journal volume & issue
Vol. 26
pp. 221 – 243

Abstract

Groundwater is crucial for domestic supplies in the Sahel, where the strategic importance of aquifers will increase in the coming years due to climate change. Groundwater potential mapping is a valuable tool to underpin water management in the region and, hence, to improve drinking water access. This paper presents a machine learning method to map groundwater potential, illustrated through its application in two administrative regions of Mali. A set of explanatory variables for the presence of groundwater is developed first. Scaling methods (standardization, normalization, maximum absolute value and max–min scaling) are used to avoid the pitfalls associated with reclassification. Noisy, collinear and counterproductive variables are identified and excluded from the input dataset. A total of 20 machine learning classifiers are then trained and tested on a large borehole database (n = 3345) in order to find meaningful correlations between the presence or absence of groundwater and the explanatory variables. Maximum absolute value and standardization proved the most efficient scaling techniques, while tree-based algorithms (accuracy > 0.85) consistently outperformed other classifiers. The borehole flow rate data were then used to calibrate the results beyond standard machine learning metrics, thereby adding robustness to the predictions. The southern part of the study area offers the best groundwater prospects, which is consistent with the geological and climatic setting. Outcomes lead to three major conclusions: (1) picking the best performers out of a large number of machine learning classifiers is recommended as good methodological practice, (2) standard machine learning metrics should be complemented with additional hydrogeological indicators whenever possible and (3) variable scaling helps minimize expert bias.
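To make the scaling step concrete, the following is a minimal Python sketch of three of the scaling methods named in the abstract, applied to a single explanatory variable. It is an illustration under stated assumptions, not the authors' implementation: the function names and the sample slope values are hypothetical, and "normalization" is left out because the abstract does not state which definition the paper uses.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant variable carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def max_abs_scale(values):
    """Divide by the largest absolute value, so results lie in [-1, 1]."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak for v in values]


def min_max_scale(values):
    """Map the minimum to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values of one explanatory variable (e.g. slope in degrees).
slope = [0.0, 2.0, 4.0, 6.0, 8.0]

scaled = {
    "standardization": standardize(slope),
    "max_abs": max_abs_scale(slope),
    "min_max": min_max_scale(slope),
}
```

Each function keeps the variable continuous rather than binning it into expert-defined classes, which is the sense in which scaling sidesteps reclassification.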