Machine Learning and Knowledge Extraction (Mar 2022)

Comparison of Text Mining Models for Food and Dietary Constituent Named-Entity Recognition

  • Nadeesha Perera,
  • Thi Thuy Linh Nguyen,
  • Matthias Dehmer,
  • Frank Emmert-Streib

DOI
https://doi.org/10.3390/make4010012
Journal volume & issue
Vol. 4, no. 1
pp. 254 – 275

Abstract

Biomedical Named-Entity Recognition (BioNER) has become an essential part of text mining because digital archives of biological and medical articles are continuously growing. While there are many well-performing BioNER tools for entities such as genes, proteins, diseases or species, there is very little research into food and dietary constituent named-entity recognition. For this reason, in this paper, we study seven BioNER models for food and dietary constituent recognition. Specifically, we study a dictionary-based model, a conditional random fields (CRF) model and a new hybrid model, called FooDCoNER (Food and Dietary Constituents Named-Entity Recognition), which we introduce by combining the former two models. In addition, we study four deep language models: BERT, BioBERT, RoBERTa and ELECTRA. We find that FooDCoNER not only leads to the overall best results, comparable with those of the deep language models, but is also much more efficient with respect to run time and the sample size required for training. The latter has been identified by studying learning curves. Overall, our results not only provide a new tool for food and dietary constituent NER but also shed light on the differences between classical machine learning models and recent deep language models.

Keywords