COVID-19 Tweets Classification Based on a Hybrid Word Embedding Method

Yosra Didi; Ahlam Walha; Ali Wali

doi:10.3390/bdcc6020058

Big Data and Cognitive Computing (May 2022)

Affiliations

Yosra Didi: Department of Computer Science, Umm Al-Qura University, Makkah 24243, Saudi Arabia
Ahlam Walha: Department of Computer Science, Umm Al-Qura University, Makkah 24243, Saudi Arabia
Ali Wali: REsearch Groups in Intelligent Machines (REGIM-Lab), National Engineering School of Sfax, University of Sfax, Sfax 3038, Tunisia

DOI: https://doi.org/10.3390/bdcc6020058
Journal volume & issue: Vol. 6, no. 2, p. 58

Abstract

In March 2020, the World Health Organization declared COVID-19 a pandemic. The virus spread rapidly and affected many countries around the world. During the outbreak, social media platforms such as Twitter generated large volumes of valuable data that can inform health-related decision making. We therefore propose analysing users' sentiments with supervised machine learning approaches in order to predict disease prevalence and provide early warnings. The collected tweets were preprocessed and categorised as negative, positive, or neutral. In the second phase, features were extracted from the posts with several widely used techniques: TF-IDF, Word2Vec, GloVe, and FastText. The novelty of this study lies in hybrid feature extraction, in which syntactic features (TF-IDF) are combined with semantic features (FastText and GloVe) to represent posts more accurately and thereby improve classification. Experimental results show that FastText combined with TF-IDF performed best when paired with SVM. SVM achieved the highest accuracy, 88.72%, followed by XGBoost with 85.29%. These results indicate that the hybrid methods extract informative features from tweets and increase classification performance.
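
The hybrid representation described above can be illustrated with a minimal sketch. The abstract does not state how the syntactic and semantic features are merged, so this example assumes simple concatenation of a TF-IDF vector with the mean of the word vectors in each tweet. The tweets, the two-dimensional `toy_embeddings` table, and all function names are hypothetical stand-ins; in practice the word vectors would come from a pretrained FastText or GloVe model.

```python
import math
from collections import Counter

# Hypothetical example tweets (not taken from the study's dataset).
tweets = [
    "vaccine rollout gives hope",
    "lockdown again feeling sad",
    "cases reported today",
]

# Hypothetical 2-dimensional word vectors standing in for pretrained
# FastText or GloVe embeddings, which are typically 100-300 dimensional.
toy_embeddings = {
    "vaccine": [0.9, 0.1],
    "hope": [0.8, 0.3],
    "sad": [-0.7, 0.2],
    "lockdown": [-0.5, 0.4],
    "cases": [0.0, 0.6],
}
EMB_DIM = 2


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_vocab(docs):
    """Sorted list of all distinct tokens in the corpus."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def idf_weights(docs, vocab):
    """Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}


def tfidf_vector(doc, vocab, idf):
    """Syntactic features: length-normalised term frequency times IDF."""
    counts = Counter(tokenize(doc))
    total = sum(counts.values()) or 1
    return [counts[tok] / total * idf[tok] for tok in vocab]


def mean_embedding(doc, embeddings, dim):
    """Semantic features: average of the known word vectors in the tweet."""
    vecs = [embeddings[t] for t in tokenize(doc) if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def hybrid_features(doc, vocab, idf, embeddings, dim):
    """Assumed combination: concatenate TF-IDF and mean-embedding vectors."""
    return tfidf_vector(doc, vocab, idf) + mean_embedding(doc, embeddings, dim)


vocab = build_vocab(tweets)
idf = idf_weights(tweets, vocab)
features = [
    hybrid_features(t, vocab, idf, toy_embeddings, EMB_DIM) for t in tweets
]
```

Each row of `features` has one entry per vocabulary term followed by the embedding dimensions, and could then be passed to a classifier such as SVM or XGBoost. Concatenation is only one plausible way to combine the two feature types; weighting the word vectors by their TF-IDF scores would be another.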

Published in Big Data and Cognitive Computing

ISSN: 2504-2289 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology
Website: http://www.mdpi.com/journal/BDCC
