Machine Learning with Applications (Mar 2022)
Comparative analysis of contextual and context-free embeddings in disaster prediction from Twitter data
Abstract
Twitter is a social media site where people post personal experiences, opinions, and news. Because this data is ubiquitously available in real time, many rescue agencies monitor it regularly to identify disasters, reduce risk, and save lives. However, humans cannot manually check such a massive amount of data and identify disasters in real time. To address this, many studies have proposed representing words in machine-understandable form and applying machine learning methods to these word representations to identify the sentiment of a text. Earlier methods produce a single vector representation, or embedding, for each word in a given document. In contrast, the more recent contextual embedding method BERT (Bidirectional Encoder Representations from Transformers) constructs different vectors for the same word in different contexts. BERT embeddings have been used successfully in various Natural Language Processing (NLP) tasks, yet there is no concrete analysis of how helpful these representations are for analyzing disaster-related tweets. This study explores the efficacy of BERT embeddings for predicting disasters from Twitter data and compares them with traditional context-free word embedding methods. We provide both quantitative and qualitative results. The results show that contextual embeddings outperform traditional word embeddings on the disaster prediction task. Furthermore, we discuss the opportunities and challenges of contextual embeddings for sentiment analysis of Twitter data.
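The distinction between context-free and contextual embeddings can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the 2-dimensional vectors, the two example sentences, and the `contextual` function, which simply blends each word vector with the sentence mean. It is a crude stand-in for what a model such as BERT does with attention, not the method used in this study.

```python
# Toy illustration only: hand-made 2-d vectors, not real word2vec/GloVe/BERT output.
from math import sqrt

# Context-free: one fixed vector per word, regardless of the sentence.
STATIC = {
    "forest": [1.0, 0.0],
    "on": [0.5, 0.5],
    "fire": [0.8, 0.2],
    "this": [0.4, 0.6],
    "song": [0.0, 1.0],
    "is": [0.5, 0.5],
}


def context_free(tokens):
    """Look up each token independently of its neighbours."""
    return [STATIC[t] for t in tokens]


def contextual(tokens, self_weight=0.5):
    """Blend each token's static vector with the mean vector of its sentence.

    Hypothetical simplification: the same word now gets a different vector
    in a different sentence, which is the property contextual models provide.
    """
    vecs = context_free(tokens)
    dim = len(vecs[0])
    mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return [
        [self_weight * v[i] + (1 - self_weight) * mean[i] for i in range(dim)]
        for v in vecs
    ]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


literal = ["forest", "on", "fire"]            # disaster-related use of "fire"
figurative = ["this", "song", "is", "fire"]   # non-disaster use of "fire"

# Compare the vector for "fire" (last token) across the two sentences.
static_sim = cosine(context_free(literal)[-1], context_free(figurative)[-1])
ctx_sim = cosine(contextual(literal)[-1], contextual(figurative)[-1])
print(f"context-free similarity: {static_sim:.3f}")
print(f"contextual similarity:   {ctx_sim:.3f}")
```

In this sketch the context-free vectors for "fire" are identical in both sentences, whereas the contextual vectors diverge, which is the kind of signal a downstream disaster classifier could exploit.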