Fundamental Research (Jul 2024)
An ensemble machine learning model to uncover potential sites of hazardous waste illegal dumping based on limited supervision experience
Abstract
With the soaring generation of hazardous waste (HW) during industrialization and urbanization, illegal HW dumping remains an intractable global issue. Particularly in developing regions with lax regulations, it has become a major source of soil and groundwater contamination. A dominant challenge in supervising illegal HW dumping is that dumping sites are hidden, which makes them difficult to detect and leads to long-term adverse impacts on the environment. A key problem is therefore how to use limited historical supervision records to screen for potential dumping sites across an entire region. In this study, a novel machine learning model based on the positive-unlabeled (PU) learning algorithm is proposed to address this problem, using an ensemble method that iteratively mines the features of the limited historical cases. Validation of the random forest-based PU model showed that the top 30% of areas ranked by predicted risk covered 68.1% of newly reported cases in the studied region, indicating that the model's predictions are reliable. The framework is also promising for other environmental management scenarios in which numerous unknown samples must be assessed on the basis of limited prior experience.
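The abstract does not specify the PU algorithm in detail, so the following is only a minimal sketch of one common ensemble approach to PU learning (bagging over random subsets of the unlabeled samples), not the authors' implementation. To stay dependency-free it uses a simple nearest-centroid scorer as a stand-in for the random forest base learner used in the study. All function names, `n_rounds`, `seed`, and the bag size are illustrative assumptions. The second function shows how a "top 30% covers X% of new cases" figure could be computed from such scores.

```python
import random
from statistics import fmean


def centroid(rows):
    """Column-wise mean of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(positives, unlabeled, n_rounds=50, seed=0):
    """Average out-of-bag risk score for every unlabeled sample.

    Each round draws a bootstrap bag of unlabeled samples, treats it as
    provisional negatives, fits the base scorer, and scores only the
    unlabeled samples left out of the bag.
    """
    rng = random.Random(seed)
    n = len(unlabeled)
    bag_size = len(positives)  # assumption: balance positives and provisional negatives
    pos_center = centroid(positives)
    sums = [0.0] * n
    counts = [0] * n
    for _ in range(n_rounds):
        bag = [rng.randrange(n) for _ in range(bag_size)]  # sampled with replacement
        in_bag = set(bag)
        neg_center = centroid([unlabeled[i] for i in bag])
        for i, x in enumerate(unlabeled):
            if i in in_bag:
                continue
            # Stand-in base scorer (not a random forest): higher means
            # closer to the known positives than to the provisional negatives.
            sums[i] += sq_dist(x, neg_center) - sq_dist(x, pos_center)
            counts[i] += 1
    # Samples that were never out of bag get the lowest possible rank.
    return [s / c if c else float("-inf") for s, c in zip(sums, counts)]


def top_fraction_coverage(scores, new_case_ids, fraction=0.3):
    """Share of newly reported cases that fall in the top-`fraction` scored units."""
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:n_top])
    return sum(1 for i in new_case_ids if i in top) / len(new_case_ids)
```

In use, `positives` would hold the feature vectors of historically confirmed dumping sites and `unlabeled` those of all other spatial units; `top_fraction_coverage(scores, new_case_ids)` then reports the fraction of later-confirmed cases captured by the highest-ranked 30% of units. Averaging only out-of-bag scores keeps a unit from being scored by a model that was trained on it as a provisional negative.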