COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter

Martin Müller; Marcel Salathé; Per E. Kummervold

doi:10.3389/frai.2023.1023281

Frontiers in Artificial Intelligence (Mar 2023)

Affiliations

Martin Müller: Digital Epidemiology Lab, EPFL, Geneva, Switzerland
Marcel Salathé: Digital Epidemiology Lab, EPFL, Geneva, Switzerland
Per E. Kummervold: FISABIO-Public Health, Vaccine Research Department, Valencia, Spain

DOI: https://doi.org/10.3389/frai.2023.1023281
Journal volume & issue: Vol. 6

Abstract

Introduction: This study presents COVID-Twitter-BERT (CT-BERT), a transformer-based model pre-trained on a large corpus of COVID-19 related Twitter messages. CT-BERT is designed for COVID-19 content, particularly from social media, and can be used for natural language processing tasks such as classification, question-answering, and chatbots. This paper evaluates the performance of CT-BERT on different classification datasets and compares it with BERT-LARGE, its base model.

Methods: CT-BERT is evaluated on five classification datasets, including one in the target domain. Its performance is compared with that of its base model, BERT-LARGE, to measure the marginal improvement. The authors also provide detailed information on the training process and the technical specifications of the model.

Results: CT-BERT outperforms BERT-LARGE with a marginal improvement of 10-30% on all five classification datasets. The largest improvements are observed in the target domain. The authors provide detailed performance metrics and discuss the significance of these results.

Discussion: The study demonstrates the potential of pre-trained transformer models, such as CT-BERT, for COVID-19 related natural language processing tasks. The results indicate that CT-BERT can improve classification performance on COVID-19 related content, especially on social media. These findings have implications for applications such as monitoring public sentiment and developing chatbots that provide COVID-19 related information. The study also highlights the importance of using domain-specific pre-trained models for specific natural language processing tasks.
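
The abstract reports a "marginal improvement" of 10-30% without defining the term. One plausible reading, assumed here rather than taken from the abstract, is the share of the base model's remaining headroom (1 minus its score) that the new model closes. The function name and the example scores below are hypothetical and are not results from the paper.

```python
def marginal_improvement(baseline_score: float, new_score: float) -> float:
    """Fraction of the baseline's remaining headroom closed by the new model.

    Assumed definition (not stated in the abstract):
        (new_score - baseline_score) / (1 - baseline_score)
    Both scores are expected on a 0..1 scale, e.g. an F1 score.
    """
    if not 0.0 <= baseline_score < 1.0:
        raise ValueError("baseline_score must be in [0, 1)")
    if not 0.0 <= new_score <= 1.0:
        raise ValueError("new_score must be in [0, 1]")
    return (new_score - baseline_score) / (1.0 - baseline_score)


if __name__ == "__main__":
    # Hypothetical scores for illustration only; not taken from the paper.
    base_f1 = 0.80
    domain_f1 = 0.85
    gain = marginal_improvement(base_f1, domain_f1)
    print(f"marginal improvement: {gain:.1%}")
```

Under this reading, a small absolute gain on an already strong baseline can still be a sizeable marginal improvement, because it is measured against the error that remains rather than against the full score range.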

Published in Frontiers in Artificial Intelligence

ISSN: 2624-8212 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.frontiersin.org/journals/artificial-intelligence#
