Applied Sciences (May 2023)
Comparison between Machine Learning and Deep Learning Approaches for the Detection of Toxic Comments on Social Networks
Abstract
The way we communicate has been revolutionised by the widespread use of social networks. Any kind of online message can reach anyone in the world almost instantly. The speed with which information spreads is undoubtedly the strength of social networks, but any user of these platforms can also see how toxic messages about a person or entity spread alongside likes, comments and ratings. In such cases, the rapid spread leaves the victim feeling even more helpless and defenceless. For this reason, we have implemented an automatic detector of toxic messages on social media, which makes it possible to stop toxicity in its tracks and protect victims. The aim of the study is to demonstrate that traditional Machine Learning methods of Natural Language Processing (NLP) perform on par with Deep Learning methods, represented here by a Transformer architecture and characterised by a higher computational cost. Specifically, the paper describes the results obtained by testing different supervised Machine Learning classifiers (Logistic Regression, Random Forest and Support Vector Machine) combined with two NLP topic-modelling techniques (Latent Semantic Analysis and Latent Dirichlet Allocation). A pre-trained Transformer named BERTweet was also tested. All models performed well in this task, achieving F1 scores close to or above 90%. The best result, 91.40%, was achieved by the BERTweet Transformer, but it is not impressive in this context, as the performance gain is too small relative to the additional computational overhead.
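For readers less familiar with the evaluation metric named above, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch only: the function name, the toy labels and the choice of the toxic class as the positive class are assumptions made for illustration, and the abstract does not state which averaging scheme the reported figures use.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive prediction: F1 is taken as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration: 1 = toxic, 0 = non-toxic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2%}")
```

On this toy data there are three true positives, one false positive and one false negative, so precision and recall are both 0.75 and the F1 score is 75%.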
Keywords