Tsetlin Machine for Sentiment Analysis and Spam Review Detection in Chinese

Xuanyu Zhang; Hao Zhou; Ke Yu; Xiaofei Wu; Anis Yazidi

doi:10.3390/a16020093

Algorithms (Feb 2023)

Affiliations

Xuanyu Zhang: School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
Hao Zhou: School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
Ke Yu: School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
Xiaofei Wu: School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
Anis Yazidi: Department of Computer Science, Oslo Metropolitan University, 0176 Oslo, Norway

DOI: https://doi.org/10.3390/a16020093
Journal volume & issue: Vol. 16, no. 2, p. 93

Abstract

In Natural Language Processing (NLP), deep-learning neural networks achieve superior performance, but their black-box nature poses barriers to transparency and explainability and therefore undermines trust. Classical machine learning techniques, on the other hand, are intuitive and easy to understand but often fail to perform satisfactorily. Recent studies indicate that a newly introduced model, the Tsetlin Machine (TM), offers reliable performance while inherently providing human-level interpretability, which makes it a promising way to balance effectiveness and interpretability. However, nearly all related work so far has concentrated on English, and research on other languages is relatively scarce. We therefore propose a novel TM-based method for Chinese NLP tasks in which the learning process is transparent and easily understandable. Our model learns semantic information in Chinese through clauses. For evaluation, we conducted experiments in two domains: sentiment analysis and spam review detection. The experimental results showed that, in both domains, our method achieved higher accuracy and a higher F1 score than complex but non-transparent deep-learning models such as BERT and ERNIE.

Published in Algorithms

ISSN: 1999-4893 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/algorithms
