International Journal of Digital Earth (Jan 2019)

An ontology-based framework for extracting spatio-temporal influenza data using Twitter

  • Udaya K. Jayawardhana,
  • Pece V. Gorsevski

DOI
https://doi.org/10.1080/17538947.2017.1411535
Journal volume & issue
Vol. 12, no. 1
pp. 2 – 24

Abstract

Early detection of influenza outbreaks is a key priority at the national level for preparedness and planning. This study presents the design and implementation of Fluwitter, a spatio-temporal web-based prototype framework for pseudo real-time detection of influenza outbreaks from Twitter. Specifically, the framework integrates a PostgreSQL database server with the PostGIS spatial extension, a Twitter streaming client, a pre-processor, a tagger, and a similarity calculator for semantic information extraction (IE). The IE of tagged terms is supported by Natural Language Processing (NLP) techniques, DBpedia Spotlight, and WordNet Similarity for Java (WS4J), while data analytics, visualization, and mapping are supported by GeoServer and other GIS Free Open Source Software (FOSS). The prototype was calibrated to maximize detection of influenza using rules developed from ontology-based semantic similarity scores. The Twitter-generated influenza cases were validated against weekly hospitalization records issued by the Ohio Department of Health (ODH). The optimized rule produced a final F-measure of 0.72 and an accuracy (ACC) of 94.4%. The validation suggested moderate correlations at the beginning of the time period for the Southeast region (r = 0.52), the Northwestern region (r = 0.38), and the Central region (r = 0.33), and weak correlations over the entire time period. The potential strengths and benefits of the prototype are shown through spatio-temporal assessment and visualization of influenza potential in Ohio.
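For readers unfamiliar with the reported metrics, the following is a minimal sketch of how an F-measure, an accuracy value, and a Pearson correlation coefficient r are conventionally computed. It is not the authors' implementation: the function names and all input numbers are hypothetical, chosen only for illustration, and do not reproduce the study's data or results.

```python
from math import sqrt

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (F1) from confusion counts."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def accuracy(tp, tn, fp, fn):
    """Share of all classified items that were labeled correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical counts: tweets classified as influenza-related or not.
f1 = f_measure(tp=8, fp=2, fn=2)
acc = accuracy(tp=8, tn=88, fp=2, fn=2)

# Hypothetical weekly series: tweet-derived cases vs. hospitalization records.
r = pearson_r([1, 2, 3], [2, 4, 6])
```

Accuracy can be high while the F-measure stays moderate when non-influenza tweets dominate the sample, since true negatives raise accuracy but do not enter the F-measure.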

Keywords