PLoS ONE (Jan 2020)

Machine learning for syndromic surveillance using veterinary necropsy reports.

  • Nathan Bollig,
  • Lorelei Clarke,
  • Elizabeth Elsmo,
  • Mark Craven

DOI
https://doi.org/10.1371/journal.pone.0228105
Journal volume & issue
Vol. 15, no. 2
p. e0228105

Abstract

Read online

The use of natural language data for animal population surveillance represents a valuable opportunity to gather information about potential disease outbreaks, emerging zoonotic diseases, or bioterrorism threats. In this study, we evaluate machine learning methods for conducting syndromic surveillance using free-text veterinary necropsy reports. We train a system to detect if a necropsy report from the Wisconsin Veterinary Diagnostic Laboratory contains evidence of gastrointestinal, respiratory, or urinary pathology. We evaluate the performance of several machine learning algorithms including deep learning with a long short-term memory network. Although no single algorithm was superior, random forest using feature vectors of TF-IDF statistics ranked among the top-performing models with F1 scores of 0.923 (gastrointestinal), 0.960 (respiratory), and 0.888 (urinary). This model was applied to over 33,000 necropsy reports and was used to describe temporal and spatial features of diseases within a 14-year period, exposing epidemiological trends and detecting a potential focus of gastrointestinal disease from a single submitting producer in the fall of 2016.