Deep Learning Based Cross Domain Sentiment Classification for Urdu Language

Amna Altaf; Muhammad Waqas Anwar; Muhammad Hasan Jamal; Sana Hassan; Usama Ijaz Bajwa; Gyu Sang Choi; Imran Ashraf

doi:10.1109/ACCESS.2022.3208164

IEEE Access (Jan 2022)

Affiliations

Amna Altaf: Department of Computer Science, COMSATS University Islamabad–Lahore Campus, Lahore, Pakistan
Muhammad Waqas Anwar: Department of Computer Science, COMSATS University Islamabad–Lahore Campus, Lahore, Pakistan
Muhammad Hasan Jamal: Department of Computer Science, COMSATS University Islamabad–Lahore Campus, Lahore, Pakistan
Sana Hassan: Department of Computer Science, COMSATS University Islamabad–Lahore Campus, Lahore, Pakistan
Usama Ijaz Bajwa: Department of Computer Science, COMSATS University Islamabad–Lahore Campus, Lahore, Pakistan
Gyu Sang Choi: Department of Information and Communication Engineering, Yeungnam University, Gyeongsan, South Korea
Imran Ashraf: Department of Information and Communication Engineering, Yeungnam University, Gyeongsan, South Korea

DOI: https://doi.org/10.1109/ACCESS.2022.3208164
Journal volume & issue: Vol. 10
pp. 102135 – 102147

Abstract

Sentiment analysis is a widely researched area because of its applications in customer service, brand monitoring, and market research. Automatic sentiment classification is an important but challenging task. Unlike English, sentiment analysis for low-resource languages such as Urdu remains under-explored. Most work on Urdu sentiment analysis is domain-dependent: models are trained and tested on the same dataset, covering a limited set of domains. However, sentiment is expressed differently across domains, and manually annotating datasets for every possible domain is infeasible. Training a sentiment classifier on annotated data from one domain and testing it on another results in poor performance, because terms that appear in the source domain (training data) might not appear in the target domain (testing data). In this paper, we present a baseline method for cross-domain sentiment analysis in Urdu using two different domains. Feature extraction is performed using n-grams and word embedding techniques. Sentiment classification is performed using machine learning and deep learning classifiers. The proposed method achieves accuracy, precision, recall, and F1 scores of 0.77, 0.83, 0.68, and 0.75, respectively.
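
The vocabulary mismatch described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `word_ngrams`, `oov_rate`, and `f1_score` are hypothetical, and the toy sentences are English placeholders rather than data from the paper. The sketch measures what fraction of target-domain word n-grams never occur in the source domain, and recomputes F1 from the reported precision and recall.

```python
from collections import Counter


def word_ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as tuples."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def oov_rate(source_docs, target_docs, n_max=2):
    """Fraction of target n-gram occurrences never seen in the source domain."""
    source_vocab = {
        gram for doc in source_docs for gram in word_ngrams(doc.split(), n_max)
    }
    target_counts = Counter(
        gram for doc in target_docs for gram in word_ngrams(doc.split(), n_max)
    )
    total = sum(target_counts.values())
    unseen = sum(c for gram, c in target_counts.items() if gram not in source_vocab)
    return unseen / total if total else 0.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy reviews standing in for two domains (not the paper's data).
source = ["the plot was great", "the acting was poor"]
target = ["the battery was great", "the screen was poor"]

print(f"unseen target n-grams: {oov_rate(source, target):.2f}")
print(f"F1 from reported precision/recall: {f1_score(0.83, 0.68):.2f}")
```

With the reported precision of 0.83 and recall of 0.68, the harmonic mean rounds to 0.75, which agrees with the F1 score stated in the abstract.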

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
