Cross Lingual Sentiment Analysis: A Clustering-Based Bee Colony Instance Selection and Target-Based Feature Weighting Approach

Mohammed Abbas Mohammed Almansor; Chongfu Zhang; Wasiq Khan; Abir Hussain; Naji Alhusaini

doi:10.3390/s20185276

Sensors (Sep 2020)

Affiliations

Mohammed Abbas Mohammed Almansor: School of Information and Communication Engineering, Zhongshan Institute, University of Electronic Science and Technology of China, Chengdu 611731, China
Chongfu Zhang: School of Information and Communication Engineering, Zhongshan Institute, University of Electronic Science and Technology of China, Chengdu 611731, China
Wasiq Khan: Department of Computer Science, Liverpool John Moores University, Liverpool L3 3AF, UK
Abir Hussain: Department of Computer Science, Liverpool John Moores University, Liverpool L3 3AF, UK
Naji Alhusaini: Department of Computer Science, School of Computer Science and Technology, University of Science and Technology of China (USTC), Hefei 230026, China

Journal volume & issue: Vol. 20, no. 18, p. 5276

Abstract

The lack of sentiment resources in resource-poor languages poses challenges for machine-learning-based sentiment analysis. Cross-lingual and semi-supervised learning approaches are the most common ways to overcome this issue. However, the performance of existing methods degrades due to the poor quality of translated resources, data sparseness and, more specifically, language divergence. We propose an integrated learning model that combines semi-supervised and ensemble learning while utilizing the available sentiment resources to tackle issues related to language divergence. Additionally, to reduce the impact of translation errors and to handle the instance selection problem, we propose a clustering-based bee-colony sample selection method for the optimal selection of the most distinguishing features representing the target data. To evaluate the proposed model, various experiments are conducted on an English-Arabic cross-lingual data set. Simulation results demonstrate that the proposed model outperforms the baseline approaches in terms of classification performance. Furthermore, the statistical outcomes indicate the advantages of the proposed training data sampling and target-based feature selection in reducing the negative effect of translation errors. These results show that the proposed approach achieves performance close to that of in-language supervised models.

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
