Analysis of sentiments on the onset of Covid-19 using Machine Learning Techniques

Vishakha Arya; Amit Kumar Mishra; Alfonso González-Briones

doi:10.14201/adcaij.27348

Advances in Distributed Computing and Artificial Intelligence Journal (Jun 2022)

Affiliations

Vishakha Arya: School of Computing, DIT University, Dehradun
Amit Kumar Mishra: School of Computing, DIT University, Dehradun
Alfonso González-Briones: Research Group on Agent-Based, Social and Interdisciplinary Applications (GRASIA), Complutense University of Madrid

DOI: https://doi.org/10.14201/adcaij.27348
Journal volume & issue: Vol. 11, no. 1
pp. 45 – 63

Abstract

The novel coronavirus (Covid-19) pandemic has struck the whole world and is one of the most discussed topics on social media platforms. Social media has seen an outpouring of sentiment, with users sharing thoughts, opinions, and emotions about the Covid-19 disease and expressing how they currently feel. Analyzing these sentiments helps to yield better results. Data can be gathered from platforms such as Facebook, Twitter, Weibo, YouTube, and Instagram, of which Twitter is the largest repository. Videos, text, and audio were also collected from repositories. Sentiment analysis uses opinion mining to capture users' sentiments and categorizes them as positive, negative, or neutral. Analytical and machine learning classification is applied to 3586 tweets collected in different time frames. In this paper, sentiment analysis was performed on tweets accumulated during the Covid-19 pandemic. Tweets are collected from the Twitter database using Hydrator, a web-based application. Data preprocessing removes noise and outliers from the raw data. The Natural Language Toolkit (NLTK) is used for text classification for sentiment analysis and to calculate the subjective polarity score, counts, and sentiment distribution. N-grams, used in text mining and Natural Language Processing to represent contiguous sequences of words in a text or document, are applied as uni-grams, bi-grams, and tri-grams for statistical computation. Term frequency-inverse document frequency (TF-IDF) is a feature extraction technique that converts textual data into numeric form. The vectorized data is fed to our model to obtain insights from the linguistic data. Linear SVC, MultinomialNB, GBM, and Random Forest classifiers are applied with TF-IDF features in our proposed model. Linear Support Vector classification performs better than the other two classifiers. Results depict that RF performs better.
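
The n-gram and TF-IDF steps mentioned in the abstract can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names `ngrams` and `tfidf`, the whitespace tokenization, and the three example tweets are all assumptions made for illustration. The weighting is the textbook unsmoothed form, tf = count / document length and idf = log(N / df); library vectorizers often apply smoothing and normalization, so their values may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Compute unsmoothed TF-IDF weights per document over n-gram terms.

    tf  = occurrences of the term / number of terms in the document
    idf = log(number of documents / number of documents containing the term)
    """
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    term_lists = [ngrams(doc.lower().split(), n) for doc in docs]

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))

    n_docs = len(docs)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical example tweets, not taken from the paper's dataset.
tweets = [
    "stay home stay safe",
    "vaccine news gives hope",
    "stay safe everyone",
]

bigrams = ngrams(tweets[0].split(), 2)  # bi-grams of the first tweet
weights = tfidf(tweets, n=1)            # uni-gram TF-IDF weights per tweet
```

Each dictionary in `weights` maps an n-gram to its numeric weight for one tweet; such vectors are the kind of numeric features a classifier like those named in the abstract would consume.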

Published in Advances in Distributed Computing and Artificial Intelligence Journal

ISSN: 2255-2863 (Online)
Publisher: Ediciones Universidad de Salamanca
Country of publisher: Spain
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://revistas.usal.es/index.php/2255-2863/
