Environmental Data Science (Jan 2024)
Informing synthetic passive microwave predictions through Bayesian deep learning with uncertainty decomposition
Abstract
Space-borne passive microwave (PMW) data provide rich information on atmospheric state, including cloud structure and underlying surface properties. However, because PMW data are collected from low Earth orbit, they are sparse and sample the Earth system only coarsely. This study demonstrates that Bayesian deep learning (BDL) is a promising technique for predicting synthetic microwave (MW) data and their uncertainties from more ubiquitously available geostationary infrared observations. Our BDL models decompose predicted uncertainty into aleatoric (irreducible) and epistemic (reducible) components, providing insights into the origin of the uncertainty and guiding model improvement. Low and high aleatoric uncertainty values are characteristic of clear-sky and cloudy regions, respectively, suggesting that expanding the input feature vector to allow richer information content could improve model performance. The initially high average epistemic uncertainty quantified by most models indicates that training would benefit from a greater data volume, which would improve performance at most studied MW frequencies. By using quantified epistemic uncertainty to select the most useful additional training data (a training dataset size increase of 3.6%), the study reduced the mean absolute error and root mean squared error by 1.74% and 1.38%, respectively. The broader impact of this study is the demonstration of how predicted epistemic uncertainty can be used to select targeted training data. This enables the curation of smaller, better-optimized training datasets and opens the way to future active learning studies.
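The abstract does not state how the uncertainty decomposition is computed. As a minimal sketch, assuming the common law-of-total-variance approach (an assumption here, not confirmed as the study's method), a model that returns a predicted mean and variance on each of several stochastic forward passes can be decomposed as follows: the average of the predicted variances serves as the aleatoric term, and the variance of the predicted means serves as the epistemic term. The function names `decompose_uncertainty` and `select_most_uncertain`, and the array layout, are hypothetical and chosen only for illustration.

```python
import numpy as np

def decompose_uncertainty(means, variances):
    """Split predictive variance into aleatoric and epistemic parts.

    means, variances: arrays of shape (n_passes, n_points) holding the
    predicted mean and predicted variance from each stochastic forward
    pass of a (hypothetical) Bayesian model.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    aleatoric = variances.mean(axis=0)  # average predicted noise variance
    epistemic = means.var(axis=0)       # spread of predicted means across passes
    total = aleatoric + epistemic       # law of total variance
    return aleatoric, epistemic, total

def select_most_uncertain(epistemic, k):
    """Return indices of the k candidates with the largest epistemic variance."""
    epistemic = np.asarray(epistemic, dtype=float)
    return np.argsort(epistemic)[::-1][:k]

# Toy example: 2 forward passes, 2 candidate points.
means = [[1.0, 2.0], [3.0, 2.0]]
variances = [[0.5, 0.1], [1.5, 0.3]]
aleatoric, epistemic, total = decompose_uncertainty(means, variances)
chosen = select_most_uncertain(epistemic, k=1)
```

In this toy case the two passes disagree only on the first point, so it carries all of the epistemic variance and is the one selected as a candidate for additional training data.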
Keywords