iScience (Feb 2024)
Machine learning coupled with causal inference to identify COVID-19 related chemicals that pose a high concern to drinking water
Abstract
Summary: Various synthetic substances were used in large quantities during the recent coronavirus (COVID-19) pandemic, and some of these chemicals could enter drinking water sources. Persistent, mobile, and toxic (PMT) substances have been recognized as a threat to drinking water resources, yet it has not been assessed how many COVID-19 related substances qualify as PMT substances. One reason is the lack of high-quality experimental data for identifying PMT substances. To address this gap, we applied a machine learning model to identify PMT substances among COVID-19 related chemicals. The optimal model achieved an accuracy of 90.6% on an external test set. Model interpretation and causal inference indicated that our approach captured causal relationships between molecular descriptors and PMT properties. Notably, the screening results showed that over 60% of the COVID-19 chemicals considered are candidate PMT substances, which should be prioritized to prevent undue pollution of water resources.
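The summary reports two quantities: classification accuracy on an external test set and the share of screened chemicals flagged as candidate PMT substances. The abstract does not name the model or the descriptors, so the following is only a minimal sketch under stated assumptions. It substitutes a plain logistic-regression classifier trained by gradient descent, uses two hypothetical, pre-scaled descriptor values per chemical, and runs on invented toy data. None of the numbers correspond to the paper's results.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy data: each row holds two scaled molecular descriptors.
# Label 1 = PMT, 0 = non-PMT. These values are invented for illustration only.
X_TRAIN = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.2], [0.8, 0.9], [0.9, 0.8], [0.7, 0.9]]
Y_TRAIN = [0, 0, 0, 1, 1, 1]
X_TEST = [[0.9, 0.9], [0.1, 0.1]]      # stands in for an "external" test set
Y_TEST = [1, 0]
X_SCREEN = [[0.9, 0.9], [0.1, 0.1], [0.95, 0.85], [0.05, 0.1]]  # unlabeled chemicals


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear score w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic regression with full-batch gradient descent on log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        gw = [0.0] * n_features
        gb = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # gradient of log loss w.r.t. score
            for j in range(n_features):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / m for wj, gj in zip(w, gw)]
        b -= lr * gb / m
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float], threshold: float = 0.5) -> int:
    """Return 1 (candidate PMT) if the predicted probability reaches the threshold."""
    return int(sigmoid(score(w, b, x)) >= threshold)


def accuracy(w: Sequence[float], b: float, X: List[List[float]], y: List[int]) -> float:
    """Fraction of correctly classified chemicals."""
    correct = sum(predict(w, b, x) == t for x, t in zip(X, y))
    return correct / len(y)


def pmt_fraction(w: Sequence[float], b: float, X: List[List[float]]) -> float:
    """Share of screened chemicals predicted to be candidate PMT substances."""
    return sum(predict(w, b, x) for x in X) / len(X)


if __name__ == "__main__":
    w, b = train_logistic(X_TRAIN, Y_TRAIN)
    print("test accuracy:", accuracy(w, b, X_TEST, Y_TEST))
    print("fraction flagged as PMT:", pmt_fraction(w, b, X_SCREEN))
```

In a real study the descriptors would come from a cheminformatics toolkit and the model would typically be a stronger learner; the sketch only shows how the two reported metrics are computed once a classifier exists.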