IEEE Access (Jan 2022)

Detection of Anorexic Girls-In Blog Posts Written in Hebrew Using a Combined Heuristic AI and NLP Method

  • Yaakov Hacohen-Kerner,
  • Natan Manor,
  • Michael Goldmeier,
  • Eytan Bachar

DOI
https://doi.org/10.1109/ACCESS.2022.3162685
Journal volume & issue
Vol. 10
pp. 34800 – 34814

Abstract

In this study, we aim to detect girls who are suspected of being anorexic from social media texts written in Hebrew. We constructed a dataset containing 100 blog posts written by females who are probably anorexic and 100 blog posts written by females who are likely to be non-anorexic. The construction of this dataset was supervised and approved by an international expert on anorexia. We tested several text classification (TC) methods, using various feature sets (content-based and style-based), five machine learning (ML) methods, three RNN models, four BERT models, three basic preprocessing methods, three feature filtering methods, and parameter tuning. Several insights emerged. A set of 50 word n-grams (mostly word unigrams) supplied by an expert proved to be a good basic detector. A heuristic process based on the random forest ML method overcame a combinatorial explosion and led to a significant improvement over a baseline result at a level of p = 0.01. An iterative process that tests combinations of "k out of n′", where n′ < n (n is the number of feature sets), led to a result of 90.63%, using a combination of 300 features from ten feature sets.
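
As a rough illustration of why restricting the search to "k out of n′" combinations keeps it tractable, the sketch below compares the number of all non-empty subsets of n feature sets with the number of k-element combinations drawn from a reduced pool of n′ sets. It is not the authors' procedure: the function names are hypothetical, and taking the first n′ entries of a list as the reduced pool is an assumption made only for the example.

```python
from itertools import combinations
from math import comb


def count_all_subsets(n):
    """Number of non-empty subsets of n feature sets (equals 2**n - 1)."""
    return sum(comb(n, k) for k in range(1, n + 1))


def k_out_of_n_prime(feature_sets, n_prime, k):
    """Yield each combination of k feature sets from a reduced pool.

    Assumption: the pool is simply the first n_prime entries of
    feature_sets, e.g. after some prior ranking of the sets.
    """
    pool = feature_sets[:n_prime]
    yield from combinations(pool, k)


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show the difference in scale.
    names = [f"set_{i}" for i in range(20)]
    print(count_all_subsets(len(names)))
    print(len(list(k_out_of_n_prime(names, n_prime=8, k=3))))
```

With these made-up sizes, an exhaustive search over 20 feature sets would face more than a million subsets, while "3 out of 8" yields only 56 candidates to evaluate.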

Keywords