Journal of Informatics and Web Engineering (Mar 2022)
Performance of Sentiment Classification on Tweets of Clothing Brands
Abstract
Social media such as Facebook, Instagram, LinkedIn, and Twitter ease the sharing of ideas, thoughts, videos, and photos and information through the building of virtual networks and communities. This has allowed companies and products to reach a wider audience in terms of marketing and advertising, and to gauge feedback from the public. This research investigates clothing brand mentions on Twitter to perform sentiment analysis on users’ thoughts on three clothing brands, namely Asos, Uniqlo and Topshop. The data is collected by applying python libraries, Tweepy to access data from the Twitter streaming API. Following that, data pre-processing such as tokenization, filtering, stemming, and case normalization are performed to remove outliers. Then, the TextBlob algorithm is applied to label the tweet data into three classes; Positive, Negative and Neutral based on the polarity of the tweets. Word embeddings are also created using Word2Vec with TF-IDF. The word embeddings are fed into classification models namely Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF), Logistic Regression (LR) and Multilayer Perceptron (MLP) by comparing their accuracy performances. The models went through training and testing process on a curated tweet dataset comprising 24000 records with three clothing brands (Asos, Uniqlo, Topshop). The classification process was carried out by SVM, NB, RF, LR and MLP with a ratio of 50-50 and 70-30 train-test splits. Hyperparameter tuning was implemented by GridSearchCV to find the best parameters of classification models in order to optimize the best results. The evaluation of performance was measured with accuracy, precision, recall and F1-Score. In the 50-50 train-test splits, LR achieved the highest accuracy by scoring 82%, 87% and 87% on Asos, Uniqlo and Topshop respectively. In the 70-30 train-test splits, LR also achieved highest accuracy by scoring 85%, 90% and 90% for the three clothing brands respectively.
Keywords