PLoS ONE (Jan 2024)
A novel maximum likelihood based probabilistic behavioral data fusion algorithm for modeling residential energy consumption.
Abstract
This research aims to make better use of multiple disparate data sources by proposing a novel maximum likelihood based probabilistic data fusion approach for modeling residential energy consumption. To demonstrate the data fusion algorithm, we take energy usage by fuel type (electricity and natural gas) in residential dwellings as the dependent variable of interest, drawn from the Residential Energy Consumption Survey (RECS) data. The National Household Travel Survey (NHTS) dataset is used to incorporate additional variables that are not available in the RECS data. With a focus on improving the model of residential energy use by fuel type, the proposed approach provides a probabilistic mechanism for appropriately fusing records from the NHTS data with the RECS data. Specifically, instead of strictly matching records on only the common attributes, we propose a flexible probabilistic differential weighting method based on attribute similarity (or dissimilarity) across the attributes common to the two datasets. The fused dataset is employed to develop an updated model of residential energy use with additional independent variables contributed by the NHTS dataset. The newly estimated energy use model is compared with models estimated using RECS data exclusively to determine whether the newly fused variables offer any improvement. In our analysis, the model fit measures provide strong evidence of model improvement via fusion as well as weighted contribution estimation, highlighting the applicability of the proposed fusion algorithm. The analysis is further augmented by a validation exercise, which provides evidence that the proposed algorithm offers enhanced explanatory power and predictive capability for modeling energy use.
Our proposed data fusion approach can be widely applied in various sectors, including the use of location-based smartphone data to analyze mobility and ride-hailing patterns that are likely to influence energy consumption with increasing electric vehicle (EV) adoption.
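
The differential weighting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the functional form of the weights, so everything specific here is an assumption made for illustration only: the exponential decay in the number of mismatched attributes, the `penalty` parameter, the helper names `match_weights` and `weighted_log_likelihood`, and the attribute names in the toy records. It is not the authors' implementation.

```python
import math

def match_weights(target, candidates, common_attrs, penalty=1.0):
    """Return normalized weights for candidate donor records.

    Assumed form (not from the paper): a candidate's weight decays
    exponentially with the number of common attributes on which it
    disagrees with the target record.
    """
    scores = []
    for cand in candidates:
        mismatches = sum(1 for a in common_attrs if target[a] != cand[a])
        scores.append(math.exp(-penalty * mismatches))
    total = sum(scores)
    return [s / total for s in scores]

def weighted_log_likelihood(weights, densities):
    """Log of a weighted mixture of per-candidate likelihood values."""
    return math.log(sum(w * d for w, d in zip(weights, densities)))

# Hypothetical records: one RECS household and three NHTS candidates.
recs_record = {"household_size": 3, "income_band": 2, "urban": 1}
nhts_candidates = [
    {"household_size": 3, "income_band": 2, "urban": 1, "vehicles": 2},
    {"household_size": 3, "income_band": 1, "urban": 1, "vehicles": 1},
    {"household_size": 1, "income_band": 4, "urban": 0, "vehicles": 0},
]
common = ["household_size", "income_band", "urban"]

# Candidates that agree on more common attributes receive larger weights.
weights = match_weights(recs_record, nhts_candidates, common)
```

In this sketch, no candidate is discarded outright: an exact match on the common attributes gets the largest weight, while partial matches still contribute with smaller weights. In a likelihood-based setting, a parameter such as `penalty` could in principle be estimated together with the energy use model coefficients, though that design choice is an assumption here.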