PLoS ONE (Jan 2022)

Inclusion of environmentally themed search terms improves Elastic net regression nowcasts of regional Lyme disease rates

  • Eric Kontowicz,
  • Grant Brown,
  • James Torner,
  • Margaret Carrel,
  • Kelly K. Baker,
  • Christine A. Petersen

Journal volume & issue
Vol. 17, no. 3

Abstract

Lyme disease is the most widely reported vector-borne disease in the United States. About 95% of confirmed human cases are reported in the Northeast and upper Midwest (25,778 of 27,203 total confirmed US cases). Human cases typically occur in the spring and summer months, when an infected nymphal Ixodid tick takes a blood meal. Current federal surveillance strategies report data on an annual basis, leading to a lag of nearly a year in national data reporting. These reporting lags make it difficult for public health agencies to assess and plan for the current burden of Lyme disease. Implementation of a nowcasting model, which uses historical data to predict current trends, provides a means for public health agencies to evaluate current Lyme disease burden and make timely priority-based budgeting decisions. The objective of the study was to develop and compare the performance of nowcasting models using free data from Google Trends and Centers for Disease Control and Prevention surveillance reports. We developed two sets of elastic net models for five regions of the United States: (1) using only monthly proportional hit data from the 21 disease symptom and tick-related terms, and (2) using monthly proportional hit data from terms identified via Google Correlate together with the disease symptom and vector terms. Elastic net models using the full term list were highly accurate (root mean square error: 0.74, mean absolute error: 0.52, R²: 0.97) for four of the five regions of the United States and improved accuracy 1.33-fold while reducing error 0.5-fold compared with predictions from models using disease symptom and vector terms alone. Many of the terms included and found to be important for model performance were environmentally related. These models can be implemented to help local and state public health agencies accurately monitor Lyme disease burden during times of reporting lag from federal public health reporting agencies.
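
For readers unfamiliar with the accuracy measures quoted in the abstract, the following is a minimal sketch of how root mean square error, mean absolute error, and R2 are conventionally computed, together with the regional case share derived from the two case counts given above. It is not the authors' code: the function names and the `observed` and `predicted` series are hypothetical illustrations, and only the two case counts come from the abstract.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Case counts quoted in the abstract: Northeast and upper Midwest vs. whole US.
regional_cases = 25_778
total_cases = 27_203
regional_share = regional_cases / total_cases  # roughly 0.95

# Hypothetical monthly rates, invented purely to exercise the functions;
# they are NOT data from the study.
observed = [1.0, 2.0, 4.0, 8.0, 5.0, 2.0]
predicted = [1.5, 2.0, 3.5, 7.0, 5.5, 2.5]

if __name__ == "__main__":
    print(f"Regional share of confirmed cases: {regional_share:.1%}")
    print(f"RMSE: {rmse(observed, predicted):.2f}")
    print(f"MAE:  {mae(observed, predicted):.2f}")
    print(f"R2:   {r_squared(observed, predicted):.2f}")
```

Lower RMSE and MAE indicate predictions closer to the observed values, while R2 closer to 1 indicates that the model explains more of the variation in the observed series.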