Public Awareness and Sentiment Analysis of COVID-Related Discussions Using BERT-Based Infoveillance

Tianyi Xie; Yaorong Ge; Qian Xu; Shi Chen

doi:10.3390/ai4010016

AI (Mar 2023)

Affiliations

Tianyi Xie: Department of Software and Information Systems, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
Yaorong Ge: Department of Software and Information Systems, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
Qian Xu: School of Communications, Elon University, Elon, NC 27244, USA
Shi Chen: Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, NC 28223, USA

Journal volume & issue: Vol. 4, no. 1
pp. 333 – 347

Abstract

Understanding different aspects of public concerns and sentiments during large health emergencies, such as the COVID-19 pandemic, is essential for public health agencies to develop effective communication strategies, deliver up-to-date and accurate health information, and mitigate the potential impacts of emerging misinformation. Current infoveillance systems generally use discussion intensity (i.e., the number of relevant posts) as a proxy for public awareness, while largely ignoring the rich and diverse textual information that reveals granular, varying public concerns and sentiments. In this study, we address this grand challenge by developing a novel natural language processing (NLP) infoveillance workflow based on Bidirectional Encoder Representations from Transformers (BERT). We first used a smaller sample of COVID-19 tweets to develop a content classification and sentiment analysis model using COVID-Twitter-BERT. Classification accuracy ranged from 0.77 to 0.88 across the five identified topics. In the three-class sentiment analysis task (positive/negative/neutral), BERT achieved a decent accuracy of 0.7. We then applied the content topic and sentiment classifiers to a much larger dataset of more than 4 million tweets spanning a 15-month period, focusing specifically on the non-pharmaceutical intervention (NPI) and social issue content topics. Public awareness of, and sentiment towards, COVID-19 overall, NPIs, and social issues differed significantly across time and space. In addition, we identified key events associated with abrupt changes in sentiment towards NPIs and social issues. This novel NLP-based AI workflow can be readily adopted for real-time, granular content topic and sentiment infoveillance beyond the health context.
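The abstract does not describe how classifier outputs are aggregated over time. The sketch below shows one possible approach and should not be read as the authors' method. It assumes each tweet has already been given a topic label and a positive/negative/neutral label by the classifiers. Using only the Python standard library, it computes per-day post counts (discussion intensity) and a net sentiment score for one topic. It then flags days whose sentiment departs sharply from the preceding days. The LabeledTweet type, the +1/0/-1 scoring, the 7-day window, and the 2-standard-deviation threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean, pstdev

# Hypothetical record: one tweet after both classifiers have labeled it.
@dataclass
class LabeledTweet:
    day: date
    topic: str      # topic label, e.g. "NPI" or "social issue"
    sentiment: str  # "positive", "negative" or "neutral"

# Assumed scoring: a day's net sentiment is the mean of +1 / 0 / -1 over its posts.
SCORE = {"positive": 1, "neutral": 0, "negative": -1}

def daily_topic_summary(tweets, topic):
    """Return {day: (post_count, net_sentiment)} for one topic, ordered by day."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for t in tweets:
        if t.topic == topic:
            counts[t.day] += 1
            totals[t.day] += SCORE[t.sentiment]
    return {d: (counts[d], totals[d] / counts[d]) for d in sorted(counts)}

def flag_abrupt_changes(summary, window=7, threshold=2.0):
    """Flag days whose net sentiment lies more than `threshold` population
    standard deviations from the mean of the preceding `window` observed
    days (a simple z-score rule, used here only for illustration)."""
    days = sorted(summary)
    flagged = []
    for i in range(window, len(days)):
        history = [summary[d][1] for d in days[i - window:i]]
        sd = pstdev(history)
        if sd == 0:
            continue  # flat history: z-score undefined, skip this day
        z = (summary[days[i]][1] - mean(history)) / sd
        if abs(z) > threshold:
            flagged.append(days[i])
    return flagged

if __name__ == "__main__":
    # Synthetic toy data: each day has two "NPI" tweets and one off-topic tweet.
    start = date(2020, 3, 1)
    tweets = []
    for i in range(10):
        day = start + timedelta(days=i)
        if i == 9:
            pair = ("negative", "negative")  # sudden drop on the last day
        elif i % 2 == 0:
            pair = ("positive", "neutral")
        else:
            pair = ("positive", "negative")
        tweets += [LabeledTweet(day, "NPI", s) for s in pair]
        tweets.append(LabeledTweet(day, "social issue", "positive"))
    summary = daily_topic_summary(tweets, "NPI")
    print(flag_abrupt_changes(summary))
```

On this toy data, the rule flags only the final day, when all NPI posts turn negative. A real pipeline might instead use a dedicated change-point detection method. The z-score rule is used here only to keep the example short.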

Published in AI

ISSN: 2673-2688 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/ai