Does Part of Speech Have an Influence on Cyberbullying Detection?

Jingxiu Huang; Ruofei Ding; Yunxiang Zheng; Xiaomin Wu; Shumin Chen; Xiunan Jin

doi:10.3390/analytics3010001

Analytics (Dec 2023)

Affiliations

Jingxiu Huang: School of Information Technology in Education, South China Normal University, Guangzhou 510631, China
Ruofei Ding: School of Information Technology in Education, South China Normal University, Guangzhou 510631, China
Yunxiang Zheng: School of Information Technology in Education, South China Normal University, Guangzhou 510631, China
Xiaomin Wu: School of Information Technology in Education, South China Normal University, Guangzhou 510631, China
Shumin Chen: School of Information Technology in Education, South China Normal University, Guangzhou 510631, China
Xiunan Jin: School of Information Technology in Education, South China Normal University, Guangzhou 510631, China

DOI: https://doi.org/10.3390/analytics3010001
Journal volume & issue: Vol. 3, no. 1, pp. 1–13

Abstract

With the development of the Internet, cyberbullying on social media has gained significant attention. Cyberbullying is often expressed in text, and a growing number of machine learning methods have been proposed to identify such text, most of which rely on extracted part-of-speech (POS) tags to improve performance. However, current studies have chosen POS tags that seemed reasonable without investigating whether those tags actually improve cyberbullying detection. In other words, the effectiveness of different POS tags in automatic cyberbullying detection has not been demonstrated. This study investigated the parts of speech in statements related to cyberbullying and explored how sensitive three classification models (random forest, naive Bayes, and support vector machine) are to parts of speech when detecting cyberbullying. We also examined which POS combinations are most appropriate for these models. Our experiments showed that the predictive performance of the models differs depending on which POS tags are used as inputs. Random forest showed the best predictive performance, followed by naive Bayes and then support vector machine. Across the models, sensitivity to POS tags was consistent: greater towards nouns, verbs, and measure words, and lower towards adjectives and pronouns. We also found that the combination of parts of speech used as inputs influenced predictive performance. This study will help researchers determine which combination of POS categories is appropriate for improving the accuracy of cyberbullying detection.
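As a rough illustration of what comparing part-of-speech combinations as model inputs can look like, the following minimal sketch filters pre-tagged posts by POS category and enumerates the candidate combinations. It is not the authors' pipeline: the sample posts, the tag names (`n`, `v`, `a`, `r`), and the helper functions `keep_pos`, `pos_combinations`, and `build_inputs` are all hypothetical, and the tagging and classification steps are assumed to happen elsewhere.

```python
from itertools import combinations

# Hypothetical pre-tagged posts: each token is a (word, pos_tag) pair.
# The tag names are illustrative only, not the tagset used in the paper.
POSTS = [
    [("you", "r"), ("are", "v"), ("stupid", "a"), ("loser", "n")],
    [("nice", "a"), ("photo", "n"), ("today", "n")],
]


def keep_pos(tagged_tokens, allowed_tags):
    """Return only the words whose POS tag is in allowed_tags."""
    return [word for word, tag in tagged_tokens if tag in allowed_tags]


def pos_combinations(tags, max_size):
    """Yield every non-empty combination of tags up to max_size, in sorted order."""
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(tags), size):
            yield combo


def build_inputs(posts, tags, max_size=2):
    """Map each POS combination to the filtered token lists it produces."""
    return {
        combo: [keep_pos(post, set(combo)) for post in posts]
        for combo in pos_combinations(tags, max_size)
    }


# One candidate input set per POS combination (singles and pairs here).
inputs = build_inputs(POSTS, {"n", "v", "a"}, max_size=2)
```

Each entry of `inputs` could then be vectorized and passed to a classifier of the reader's choice, so that predictive performance can be compared across POS combinations.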

Published in Analytics

ISSN: 2813-2203 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Science: Mathematics: Probabilities. Mathematical statistics
Website: https://www.mdpi.com/journal/analytics
