Urdu Sentiment Analysis With Deep Learning Methods

Lal Khan; Ammar Amjad; Noman Ashraf; Hsien-Tsung Chang; Alexander Gelbukh

doi:10.1109/ACCESS.2021.3093078

IEEE Access (Jan 2021)

Affiliations

Lal Khan: Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan
Ammar Amjad: Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan
Noman Ashraf: Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional, Ciudad de México, Mexico
Hsien-Tsung Chang: Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan
Alexander Gelbukh: Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional, Ciudad de México, Mexico

DOI: https://doi.org/10.1109/ACCESS.2021.3093078
Journal volume & issue: Vol. 9, pp. 97803–97812

Abstract

Although over 169 million people in the world are familiar with the Urdu language and a large quantity of Urdu data is generated daily on social websites, few research efforts have been made to build language resources for Urdu and to examine user sentiment. The objective of this study is twofold: (1) to develop a benchmark dataset for sentiment analysis in the resource-deprived Urdu language and (2) to evaluate various machine learning and deep learning algorithms for sentiment analysis. To find the best technique, we compare two modes of text representation: a count-based one, where the text is represented using word $n$-gram feature vectors, and one based on fastText pre-trained word embeddings for Urdu. We consider a set of machine learning classifiers (RF, NB, SVM, AdaBoost, MLP, LR) and deep learning classifiers (1D-CNN and LSTM) and run the experiments for all the feature types. Our study shows that the combination of word $n$-gram features with LR outperformed the other classifiers on the sentiment analysis task, obtaining the highest F1 score of 82.05% using a combination of features.
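To make the count-based representation concrete, the following is a minimal sketch of word $n$-gram count vectors using only the Python standard library. It is not the authors' implementation: the function names (`word_ngrams`, `build_vocabulary`, `vectorize`), the whitespace tokenization, the unigram-plus-bigram range, and the two romanized toy sentences are all assumptions made for illustration and are not taken from the paper or its dataset.

```python
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max as space-joined strings.

    Tokenization is plain whitespace splitting (an assumption for this sketch).
    """
    tokens = text.split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocabulary(texts, n_min=1, n_max=2):
    """Map each distinct n-gram seen in `texts` to a column index."""
    vocab = {}
    for text in texts:
        for gram in word_ngrams(text, n_min, n_max):
            if gram not in vocab:
                vocab[gram] = len(vocab)
    return vocab


def vectorize(text, vocab, n_min=1, n_max=2):
    """Turn one text into a count vector over the vocabulary.

    N-grams that are not in the vocabulary are ignored.
    """
    counts = Counter(word_ngrams(text, n_min, n_max))
    vec = [0] * len(vocab)
    for gram, count in counts.items():
        idx = vocab.get(gram)
        if idx is not None:
            vec[idx] = count
    return vec


# Hypothetical toy sentences (romanized), not drawn from the paper's dataset.
docs = ["film bohat achi hai", "film achi nahi hai"]
vocab = build_vocabulary(docs)
X = [vectorize(d, vocab) for d in docs]
```

Each row of `X` is a feature vector of the kind that could be passed to a classifier such as logistic regression; the classifier itself is left out of the sketch.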

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
