IEEE Access (Jan 2020)

A Hybrid Machine Learning Pipeline for Automated Mapping of Events and Locations From Social Media in Disasters

  • Chao Fan,
  • Fangsheng Wu,
  • Ali Mostafavi

DOI
https://doi.org/10.1109/ACCESS.2020.2965550
Journal volume & issue
Vol. 8
pp. 10478 – 10490

Abstract

The objective of this study is to propose and test a hybrid machine learning pipeline that uncovers how disaster events unfold at different locations, using social media posts made during disasters. Effective disaster response and recovery require a comprehensive understanding of the disaster situation, i.e., the unfolding of disaster events and the geographic distribution of disruptions. Existing studies have employed machine learning methods to conduct coarse-grained event detection and to analyze geographic location information from geotagged social media data. However, only a very small fraction of social media data is geotagged, and the geotag may not correspond to the events described in the content of a post. In addition, the coarse-grained information detected by existing approaches is token-based, which does not provide sufficient information for situation awareness. Hence, detecting locations and finer-grained event information could significantly improve the utility, credibility, and interpretability of social media data for situation awareness. To address these limitations, this study proposes a hybrid machine learning pipeline that makes use of all relevant tweets to uncover the evolution of disaster events across different locations. The pipeline integrates Named Entity Recognition to detect locations mentioned in the posts, a location fusion approach to extract the coordinates of those locations and remove noise, a fine-tuned BERT model to classify posts into humanitarian categories, and graph-based clustering to identify credible situational information. The pipeline is demonstrated on a data set collected from Twitter during Hurricane Harvey in Houston in 2017. The results show that the proposed hybrid pipeline can automatically map events across time and space from social media posts with considerable accuracy. The findings also suggest the potential for forensic analysis of disasters based on the mapped events, their evolution, and the variation of social media attention to different locations. Hence, this method could provide a useful tool to support emergency managers, public officials, residents, first responders, and other stakeholders in rapid situation awareness across time and space.
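
The four stages named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a small hypothetical gazetteer stands in for Named Entity Recognition and location fusion, hypothetical keyword rules stand in for the fine-tuned BERT classifier, and connected components of a token-overlap (Jaccard) graph stand in for the graph-based clustering. All names, thresholds, category labels, and coordinates (approximate) are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical gazetteer (stand-in for NER + location fusion).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "houston": (29.76, -95.37),
    "katy": (29.79, -95.82),
}

# Hypothetical keyword rules (stand-in for the fine-tuned BERT classifier).
CATEGORY_KEYWORDS = {
    "rescue": {"rescue", "trapped", "stranded"},
    "infrastructure_damage": {"flooded", "closed", "damage"},
}


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return {w.strip(".,!?#@").lower() for w in text.split()} - {""}


def find_locations(tokens):
    """Return the gazetteer place names mentioned in a post."""
    return sorted(t for t in tokens if t in GAZETTEER)


def classify(tokens):
    """Return the first category whose keywords appear, else 'other'."""
    for category, keywords in CATEGORY_KEYWORDS.items():
        if tokens & keywords:
            return category
    return "other"


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(indices, token_sets, threshold):
    """Connected components of the graph linking posts whose
    Jaccard similarity is at least `threshold` (union-find)."""
    parent = {i: i for i in indices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for pos, a in enumerate(indices):
        for b in indices[pos + 1:]:
            if jaccard(token_sets[a], token_sets[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for i in indices:
        groups[find(i)].append(i)
    return list(groups.values())


def map_events(posts, threshold=0.5, min_support=2):
    """Group posts by (location, category), then keep only clusters
    supported by at least `min_support` similar posts."""
    token_sets = [tokenize(p) for p in posts]
    by_key = defaultdict(list)
    for i, tokens in enumerate(token_sets):
        category = classify(tokens)
        for place in find_locations(tokens):
            by_key[(place, category)].append(i)

    events = []
    for (place, category), indices in by_key.items():
        for members in cluster(indices, token_sets, threshold):
            if len(members) >= min_support:
                events.append({
                    "location": place,
                    "coords": GAZETTEER[place],
                    "category": category,
                    "posts": members,
                })
    return events
```

Calling `map_events` on a list of post strings returns one record per retained cluster, with the place name, its coordinates, the assigned category, and the indices of the supporting posts. Requiring several mutually similar posts before reporting an event is one simple way to approximate the "credible situational information" criterion; the clustering method and thresholds in the paper itself may differ.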

Keywords