IEEE Access (Jan 2022)

Deep Learning Based Cross Domain Sentiment Classification for Urdu Language

  • Amna Altaf,
  • Muhammad Waqas Anwar,
  • Muhammad Hasan Jamal,
  • Sana Hassan,
  • Usama Ijaz Bajwa,
  • Gyu Sang Choi,
  • Imran Ashraf

DOI: https://doi.org/10.1109/ACCESS.2022.3208164
Journal volume & issue: Vol. 10, pp. 102135–102147

Abstract


Sentiment analysis is a widely researched area because of its many applications in customer service, brand monitoring, and market research. Automatic sentiment classification is an important but challenging task. Unlike for English, sentiment analysis for low-resource languages such as Urdu remains under-explored. Most work on Urdu sentiment analysis is domain-dependent: models are trained and tested on the same dataset, covering only a few domains. However, sentiment is expressed differently across domains, and manually annotating datasets for every possible domain is infeasible. A sentiment classifier trained on annotated data from one domain performs poorly when tested on another, because terms that appear in the source domain (training data) may not appear in the target domain (testing data). In this paper, we present a baseline method for cross-domain sentiment analysis in Urdu using two different domains. Features are extracted with n-grams and word embedding techniques, and sentiment classification is performed with machine learning and deep learning classifiers. The proposed method achieves an accuracy, precision, recall, and F1 score of 0.77, 0.83, 0.68, and 0.75, respectively.
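The cross-domain problem described above (source-domain terms missing from the target domain) and the reported metrics can be illustrated with a short sketch. This is not the authors' pipeline. It assumes plain whitespace tokenisation (real Urdu text needs a proper tokeniser), uses toy English sentences as stand-ins for Urdu reviews, and the function names (`extract_features`, `oov_rate`, `binary_metrics`) are hypothetical. It covers only the n-gram features and the evaluation step, not word embeddings or the classifiers themselves.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, max_n=2):
    """Bag of 1..max_n-grams; whitespace tokenisation is an assumption."""
    tokens = text.split()
    feats = Counter()
    for n in range(1, max_n + 1):
        feats.update(ngrams(tokens, n))
    return feats


def oov_rate(source_docs, target_docs, max_n=2):
    """Fraction of target-domain n-gram occurrences never seen in the source domain."""
    vocab = set()
    for doc in source_docs:
        vocab.update(extract_features(doc, max_n))  # adds the n-gram keys
    total = unseen = 0
    for doc in target_docs:
        for gram, count in extract_features(doc, max_n).items():
            total += count
            if gram not in vocab:
                unseen += count
    return unseen / total if total else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary sentiment task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


if __name__ == "__main__":
    # Toy "source" and "target" domains (placeholders, not the paper's data).
    source = ["the food was great", "service was slow"]
    target = ["the movie was great"]
    print(round(oov_rate(source, target), 3))  # 3 of 7 target n-grams unseen -> 0.429

    # F1 is the harmonic mean of precision and recall; with P = 0.83 and
    # R = 0.68 it comes to about 0.7475, which rounds to the reported 0.75.
    print(round(2 * 0.83 * 0.68 / (0.83 + 0.68), 2))  # 0.75
```

A model trained only on the source features has no learned weight for "movie", "the movie", or "movie was", which is the vocabulary gap the abstract identifies as the cause of poor cross-domain performance. The last line also checks that the reported F1 score is consistent with the reported precision and recall.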

Keywords