Industrial Artificial Intelligence (Nov 2024)
Employing machine learning in water infrastructure management: predicting pipeline failures for improved maintenance and sustainable operations
Abstract
This study explores techniques for managing class imbalance in predictive models that forecast water pipe failures, using XGBoost and logistic regression. Water pipeline failures cause service disruptions, costly repairs, and environmental hazards, so robust models that anticipate them are needed. Using a dataset covering 2015 to 2022 with features such as pipe age, material, diameter, and maintenance history, the study applies random oversampling and undersampling to improve model performance. XGBoost outperforms logistic regression in recall (0.795 vs. 0.683), a critical metric for managing water infrastructure. Although logistic regression has slightly better precision (0.695), XGBoost achieves a higher Matthews correlation coefficient (MCC) and F1 score, indicating a better balance between precision and recall. By offering a framework for handling large-scale datasets and showing how accurate predictions can reduce maintenance costs and water wastage, the study contributes to more efficient and sustainable water infrastructure management.
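To make the resampling step and the reported metrics concrete, the following is a minimal Python sketch, not the study's implementation. It shows one common form of random oversampling (duplicating minority-class rows until the classes are balanced) and computes precision, recall, F1, and MCC from a binary confusion matrix. The function names `random_oversample` and `binary_metrics`, the toy feature rows, and the toy labels are all hypothetical and chosen for illustration only; they are not taken from the study's data.

```python
import math
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class (illustrative helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred):
    """Precision, recall, F1, and MCC for labels in {0, 1}, where 1 = failure."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical toy data: [pipe age in years, diameter in mm]; 1 = failed.
rows = [[50, 150], [10, 200], [60, 100], [5, 300], [8, 250]]
labels = [1, 0, 1, 0, 0]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))

# Hypothetical predictions, used only to exercise the metric formulas.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

Resampling of this kind is normally applied to the training split only, so that the evaluation metrics reflect the original class distribution. MCC uses all four confusion-matrix cells, which is why it is often reported alongside F1 on imbalanced data.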
Keywords