BIO Web of Conferences (Jan 2024)
Optimizing water quality classification using random forest and machine learning
Abstract
Water is the most precious and essential of all natural resources. Over recent decades, growing industrialization and human activity have adversely affected water resources, and effective water quality monitoring has become a priority for cities worldwide. Modern technologies such as cloud computing, artificial intelligence, remote sensing, and the Internet of Things offer new ways to strengthen water resource monitoring systems. This paper applies a random forest model to classify water quality from chemical attributes. The study comprises three experiments: training on the full feature set, excluding the pH feature, and using only the three most significant features. The random forest model trained on the full feature set achieved 100% accuracy. Excluding pH reduced accuracy to 76%, which shows that pH is an important predictor but that the remaining parameters can partly compensate for its absence. Using only the three most significant features (pH, conductivity, and nitrate), the model again achieved 100% accuracy. These results indicate that feature selection without a significant loss of accuracy is a promising way to improve water quality monitoring and assessment: measuring fewer parameters reduces data collection time and cost while maintaining high predictive accuracy. The findings confirm that machine learning, and random forest models in particular, can be used effectively for water quality classification, supporting better management and conservation of water resources.
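The three-experiment design described above (full feature set, pH removed, top three features only) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the labelling rule is invented, and `turbidity` and `temperature` are hypothetical extra features added only so that feature ranking has something to discard. The abstract does not say how feature significance was measured; here the impurity-based `feature_importances_` of the full model is assumed as one plausible choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical feature set: only pH, conductivity and nitrate are named in the abstract.
FEATURES = ["pH", "conductivity", "nitrate", "turbidity", "temperature"]


def make_synthetic_data(n_samples=600, seed=0):
    """Generate synthetic water samples with a made-up labelling rule (not real data)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.uniform(5.5, 9.5, n_samples),   # pH
        rng.uniform(100, 1500, n_samples),  # conductivity, uS/cm
        rng.uniform(0, 80, n_samples),      # nitrate, mg/L
        rng.uniform(0, 10, n_samples),      # turbidity, NTU (pure noise in this sketch)
        rng.uniform(5, 30, n_samples),      # temperature, deg C (pure noise in this sketch)
    ])
    # Assumed rule: class 1 ("acceptable") only if pH, conductivity and nitrate are all in range.
    y = ((X[:, 0] >= 6.5) & (X[:, 0] <= 8.5) & (X[:, 1] < 1000) & (X[:, 2] < 50)).astype(int)
    return X, y


def evaluate(X, y, columns, seed=0):
    """Train a random forest on the chosen columns; return test accuracy and the fitted model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X[:, columns], y, test_size=0.25, random_state=seed, stratify=y)
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test)), model


def run_experiments():
    X, y = make_synthetic_data()
    all_cols = list(range(len(FEATURES)))

    # Experiment 1: full feature set.
    acc_full, model_full = evaluate(X, y, all_cols)

    # Experiment 2: same setup with the pH column removed.
    no_ph = [i for i in all_cols if FEATURES[i] != "pH"]
    acc_no_ph, _ = evaluate(X, y, no_ph)

    # Experiment 3: keep the three features ranked highest by the full model's
    # impurity-based importances (learned on the training split only).
    top3 = [int(i) for i in np.argsort(model_full.feature_importances_)[::-1][:3]]
    acc_top3, _ = evaluate(X, y, top3)

    return {
        "full": acc_full,
        "no_pH": acc_no_ph,
        "top3": acc_top3,
        "top3_features": [FEATURES[i] for i in top3],
    }


if __name__ == "__main__":
    for name, value in run_experiments().items():
        print(name, value)
```

Because the invented label depends directly on pH, removing that column lowers accuracy in this sketch, mirroring the direction (not the numbers) of the reported result. Accuracies obtained here say nothing about the paper's dataset; the example only shows how the three experiments can share one train/test split so that their scores are comparable.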