Ecotoxicology and Environmental Safety (Sep 2022)
Database examination, multivariate analysis, and machine learning: Predictions of vapor intrusion attenuation factors
Abstract
Traditional soil vapor intrusion (VI) models usually rely on preset conceptual scenarios, which simplifies the influence of limiting environmental covariates on indoor attenuation factors relative to subsurface sources. To overcome this limitation, this study proposes a technical framework and applies it to predict VI attenuation factors from site-specific parameters recorded in the VI databases of the United States Environmental Protection Agency (USEPA) and the California Environmental Protection Agency (CalEPA). We examined the databases with multivariate analysis of variance to identify effective covariates, which were then used to develop VI models with three machine learning algorithms. The multivariate analysis shows that the effective covariates include soil texture, source depth, foundation type, lateral separation, surface cover, and land use. Based on these covariates, the attenuation factors predicted by the new models are generally within one order of magnitude of the observations recorded in the databases. The developed models were then used to generate generic indoor air attenuation factors relative to subsurface vapor (i.e., the 95th percentile of the selected dataset). These values differ by one order of magnitude between the USEPA and CalEPA databases, although they are comparable to the recommendations of the USEPA and the literature, respectively. Such a difference may reflect significant regional disparity in factors such as building structures or operational conditions (e.g., indoor air exchange rates), which necessitates generating generic VI attenuation factors on a state-specific basis. This study provides an alternative for site-specific VI risk screening, especially in states with a good collection of data. Although the proposed framework is applied here to VI databases, it can equally be applied to other environmental science problems.
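The two quantitative criteria mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard definition of the attenuation factor as the ratio of indoor air concentration to subsurface vapor concentration, takes the generic factor as the 95th percentile of a dataset, and checks whether a prediction lies within one order of magnitude of an observation. The function names and the sample values are hypothetical and are not taken from the USEPA or CalEPA databases; the multivariate analysis and machine learning steps of the study are not reproduced here.

```python
import math


def attenuation_factor(indoor_conc, subsurface_conc):
    """Indoor air concentration divided by subsurface vapor concentration.

    Both concentrations must be in the same units (assumed definition).
    """
    if subsurface_conc <= 0:
        raise ValueError("subsurface concentration must be positive")
    return indoor_conc / subsurface_conc


def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def within_one_order(predicted, observed):
    """True if predicted and observed differ by at most a factor of ten."""
    return abs(math.log10(predicted) - math.log10(observed)) <= 1.0


# Made-up (indoor, subsurface) concentration pairs, for illustration only.
samples = [(0.5, 1000.0), (2.0, 1000.0), (0.1, 500.0), (3.0, 10000.0), (1.0, 200.0)]
factors = [attenuation_factor(indoor, sub) for indoor, sub in samples]

# Generic attenuation factor taken as the 95th percentile of the dataset.
generic_af = percentile(factors, 95)
```

With real data, `factors` would be filtered by the effective covariates (for example soil texture or foundation type) before the percentile is taken, which is one plausible way a state-specific generic value could be derived.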