Machine Learning Techniques for the Detection of Inappropriate Erotic Content in Text

Gonzalo Molpeceres Barrientos; Rocío Alaiz-Rodríguez; Víctor González-Castro; Andrew C. Parnell

doi:10.2991/ijcis.d.200519.003

International Journal of Computational Intelligence Systems (Jun 2020)

Journal volume & issue: Vol. 13, no. 1

Abstract

Nowadays, children access the Internet on a regular basis. Just like the real world, the Internet has many unsafe locations where kids may be exposed to inappropriate content in the form of obscene, aggressive, erotic or rude comments. In this work, we address the problem of detecting erotic/sexual content in text documents using Natural Language Processing (NLP) techniques. Following a Machine Learning approach, we assessed twelve models resulting from the combination of three text encoders (Bag of Words, Term Frequency-Inverse Document Frequency (TF-IDF) and Word2vec) with four classifiers (Support Vector Machines (SVMs), Logistic Regression, k-Nearest Neighbors and Random Forests). We evaluated these alternatives on a newly created dataset extracted from public data on the Reddit website. The best performance was achieved by combining the TF-IDF text encoder with a linear-kernel SVM classifier, which reached an accuracy of 0.97 and an F-score of 0.96 (precision 0.96, recall 0.95). This study demonstrates that it is possible to detect erotic content in text documents and, therefore, to develop filters for minors or filters tailored to user preferences.
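
The twelve-model grid and the TF-IDF encoding mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper tfidf_vectors, the whitespace tokenizer, the idf formula (ln(N/df) + 1 with L2 normalization, one common textbook variant) and the toy corpus are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from collections import Counter
from itertools import product

# Encoder and classifier names come from the abstract; the grid is simply
# their Cartesian product (3 encoders x 4 classifiers = 12 models).
ENCODERS = ["Bag of Words", "TF-IDF", "Word2vec"]
CLASSIFIERS = ["SVM", "Logistic Regression", "k-Nearest Neighbors", "Random Forest"]
MODEL_GRID = list(product(ENCODERS, CLASSIFIERS))


def tfidf_vectors(docs):
    """Encode whitespace-tokenized documents as L2-normalized TF-IDF vectors.

    Assumption: idf = ln(N / df) + 1. The abstract does not say which
    TF-IDF weighting the authors used.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / df[tok]) + 1.0 for tok in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        vectors.append([w / norm for w in raw])
    return vocab, vectors


# Hypothetical toy corpus, for illustration only (not the Reddit dataset).
toy_docs = [
    "the recipe needs flour and sugar",
    "the match ended in a draw",
    "sugar and butter for the cake",
]
vocab, vectors = tfidf_vectors(toy_docs)
```

In this sketch a term that occurs in every document, such as "the", receives a lower weight than a term confined to one document, such as "flour". That down-weighting of uninformative terms is the property that distinguishes TF-IDF from plain Bag of Words counts. The resulting vectors could then be passed to any of the four classifiers listed above.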

Published in International Journal of Computational Intelligence Systems

ISSN: 1875-6891 (Print); 1875-6883 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.springer.com/journal/44196
