Constructing Japanese Bullying Expression Dictionary for Automated Cyberbullying Detection on Twitter

Jianwei Zhang; Lin Li; Shinsuke Nakajima

doi:10.1142/s2196888822500373

Vietnam Journal of Computer Science (May 2023)

Affiliations

Jianwei Zhang: Faculty of Science and Engineering, Iwate University, Morioka 020-8551, Japan
Lin Li: School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, P. R. China
Shinsuke Nakajima: Faculty of Information Science and Engineering, Kyoto Sangyo University, Kyoto 603-8555, Japan

DOI: https://doi.org/10.1142/s2196888822500373
Journal volume & issue: Vol. 10, no. 02, pp. 135–158

Abstract

Cyberbullying has become a serious problem with the spread of personal computers, smartphones and social networking services (SNS). In this paper, for automated cyberbullying detection on Twitter, we construct a Japanese bullying expression dictionary, which registers bullying words together with the degree to which each word is related to bullying. The words registered in the dictionary are those that appear in the collected bullying-related tweets, and the bullying degree attached to each word is calculated using Semantic Orientation Using Pointwise Mutual Information (SO-PMI). We also construct models that automatically classify tweets as bullying or non-bullying by extracting multiple features, including those drawn from the bullying expression dictionary, and combining them with multiple machine learning algorithms. We evaluate the classification performance of the constructed models on bullying and non-bullying tweets. The experimental results show that the bullying expression dictionary contributes to cyberbullying detection for most of the machine learning algorithms and that the best model achieves an F-measure exceeding 0.9. We further investigate whether the period in which a bullying expression dictionary is constructed affects the classification performance. The experimental results indicate that the number of registered words has a more immediate impact on classification performance than the period of dictionary construction does.
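
To illustrate how an SO-PMI score could be attached to a word, the following is a minimal sketch of the commonly used co-occurrence form of SO-PMI, not the authors' implementation. The abstract does not give the seed words, corpus or smoothing, so everything concrete here is an assumption for illustration: the function name `so_pmi`, the toy tweets, the English placeholder seed words, tweet-level co-occurrence counting, and the small smoothing constant `eps`.

```python
import math


def so_pmi(word, docs, bully_seeds, normal_seeds, eps=0.01):
    """Score `word` by its association with bullying vs. non-bullying seeds.

    docs: list of token sets, one set per tweet (assumed pre-tokenized).
    A positive score means the word co-occurs more with bullying seeds.
    eps is an assumed smoothing constant that avoids division by zero.
    """
    def count(pred):
        # Number of tweets for which the predicate holds.
        return sum(1 for d in docs if pred(d))

    hits_bully = count(lambda d: bool(d & bully_seeds))
    hits_normal = count(lambda d: bool(d & normal_seeds))
    hits_w_bully = count(lambda d: word in d and bool(d & bully_seeds))
    hits_w_normal = count(lambda d: word in d and bool(d & normal_seeds))

    # SO-PMI(w) = PMI(w, bullying seeds) - PMI(w, non-bullying seeds),
    # which simplifies to a single log ratio of co-occurrence counts.
    return math.log2(
        ((hits_w_bully + eps) * (hits_normal + eps))
        / ((hits_w_normal + eps) * (hits_bully + eps))
    )


# Hypothetical toy corpus with placeholder English tokens.
docs = [
    {"idiot", "loser"},
    {"idiot", "loser", "go"},
    {"thanks", "friend"},
    {"thanks", "friend", "go"},
]
bully_seeds = {"idiot"}
normal_seeds = {"thanks"}

scores = {w: so_pmi(w, docs, bully_seeds, normal_seeds)
          for w in ("loser", "friend", "go")}
```

In this toy corpus, "loser" receives a positive score, "friend" a negative one, and "go" scores zero because it appears equally often with both seed sets. Under this reading, a dictionary would register words together with such scores, which a classifier could then use as features.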

Published in Vietnam Journal of Computer Science

ISSN: 2196-8888 (Print); 2196-8896 (Online)
Publisher: World Scientific Publishing
Country of publisher: Singapore
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.worldscientific.com/worldscinet/vjcs
