Nature Communications (Nov 2023)
Assessing the value of integrating national longitudinal shopping data into respiratory disease forecasting models
Abstract
Abstract The COVID-19 pandemic led to unparalleled pressure on healthcare services. Improved healthcare planning in relation to diseases affecting the respiratory system has consequently become a key concern. We investigated the value of integrating sales of non-prescription medications commonly bought for managing respiratory symptoms, to improve forecasting of weekly registered deaths from respiratory disease at local levels across England, by using over 2 billion transactions logged by a UK high street retailer from March 2016 to March 2020. We report the results from the novel AI (Artificial Intelligence) explainability variable importance tool Model Class Reliance implemented on the PADRUS model (Prediction of Amount of Deaths by Respiratory disease Using Sales). PADRUS is a machine learning model optimised to predict registered deaths from respiratory disease in 314 local authority areas across England through the integration of shopping sales data and focused on purchases of non-prescription medications. We found strong evidence that models incorporating sales data significantly out-perform other models that solely use variables traditionally associated with respiratory disease (e.g. sociodemographics and weather data). Accuracy gains are highest (increases in R2 (coefficient of determination) between 0.09 to 0.11) in periods of maximum risk to the general public. Results demonstrate the potential to utilise sales data to monitor population health with information at a high level of geographic granularity.