CafeteriaFCD Corpus: Food Consumption Data Annotated with Regard to Different Food Semantic Resources

Gordana Ispirova; Gjorgjina Cenikj; Matevž Ogrinc; Eva Valenčič; Riste Stojanov; Peter Korošec; Ermanno Cavalli; Barbara Koroušić Seljak; Tome Eftimov

doi:10.3390/foods11172684

Foods (Sep 2022)

Affiliations

Gordana Ispirova: Computer Systems Department, Jožef Stefan Institute, 1000 Ljubljana, Slovenia
Gjorgjina Cenikj: Computer Systems Department, Jožef Stefan Institute, 1000 Ljubljana, Slovenia
Matevž Ogrinc: Computer Systems Department, Jožef Stefan Institute, 1000 Ljubljana, Slovenia
Eva Valenčič: Computer Systems Department, Jožef Stefan Institute, 1000 Ljubljana, Slovenia
Riste Stojanov: Faculty of Computer Science and Engineering, “Ss. Cyril and Methodius” University in Skopje, 1000 Skopje, North Macedonia
Peter Korošec: Computer Systems Department, Jožef Stefan Institute, 1000 Ljubljana, Slovenia
Ermanno Cavalli: Resources and Support Department, European Food Safety Authority, 43126 Parma, Italy
Barbara Koroušić Seljak: Computer Systems Department, Jožef Stefan Institute, 1000 Ljubljana, Slovenia
Tome Eftimov: Computer Systems Department, Jožef Stefan Institute, 1000 Ljubljana, Slovenia

Journal volume & issue: Vol. 11, no. 17
p. 2684

Abstract

Despite the numerous studies involving food and nutrition data in the last decade, this domain remains low-resourced. Annotated corpora are very useful tools for researchers and domain experts, as well as for data scientists performing analyses. In this paper, we present the annotation process of food consumption data (recipes) with semantic tags from different semantic resources: the Hansard taxonomy, the FoodOn ontology, the SNOMED CT terminology and the FoodEx2 classification system. FoodBase is a corpus of recipes annotated with food entities, and it includes a curated version of 1000 instances that is considered a gold standard. In this study, we use the curated version of FoodBase and two different annotation approaches: the NCBO annotator (for the FoodOn and SNOMED CT annotations) and the semi-automatic StandFood method (for the FoodEx2 annotations). The end result is a new version of the FoodBase gold standard, called the CafeteriaFCD (Cafeteria Food Consumption Data) corpus. This corpus contains food consumption data (recipes) annotated with semantic tags from the four aforementioned external semantic resources. With these annotations, data interoperability is achieved between five semantic resources from different domains. This resource can be further utilized to develop and train different information extraction pipelines using state-of-the-art NLP approaches for tracing knowledge about food safety applications.
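The interoperability claim rests on a simple idea: when the same food entity in a recipe carries tags from several resources, the shared entity acts as a bridge between those resources. The following is a minimal sketch under stated assumptions. The `FoodEntity` class, the `span_matches` and `link_resources` helpers, the example sentence and the tag values ("H1", "F1", "O2", ...) are all hypothetical illustrations. They are not the CafeteriaFCD file format, the authors' code, or real Hansard, FoodOn or FoodEx2 codes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FoodEntity:
    """One food mention in a recipe, with tags from several semantic resources.

    Hypothetical structure for illustration; not the CafeteriaFCD file format.
    """
    text: str   # surface form as it appears in the recipe
    start: int  # character offset of the first character
    end: int    # character offset one past the last character
    # resource name -> semantic tags (tag values used below are placeholders)
    tags: dict[str, list[str]] = field(default_factory=dict)


def span_matches(recipe: str, entity: FoodEntity) -> bool:
    """Check that the stored offsets point at the stored surface form."""
    return recipe[entity.start:entity.end] == entity.text


def link_resources(
    entities: list[FoodEntity], source: str, target: str
) -> dict[str, set[str]]:
    """Map each `source` tag to the `target` tags co-assigned to the same entities.

    Entities without a `source` tag contribute nothing; a `source` tag whose
    entities have no `target` tag maps to an empty set.
    """
    mapping: dict[str, set[str]] = {}
    for entity in entities:
        for tag in entity.tags.get(source, []):
            mapping.setdefault(tag, set()).update(entity.tags.get(target, []))
    return mapping


# Hypothetical recipe sentence with two annotated food entities.
recipe = "Mix the flour with milk."
entities = [
    FoodEntity("flour", 8, 13, {"Hansard": ["H1"], "FoodEx2": ["F1"]}),
    FoodEntity("milk", 19, 23, {"Hansard": ["H2"], "FoodEx2": ["F2"], "FoodOn": ["O2"]}),
]

# Crosswalk from (placeholder) Hansard tags to (placeholder) FoodEx2 codes.
crosswalk = link_resources(entities, "Hansard", "FoodEx2")
```

Here `crosswalk` pairs each placeholder Hansard tag with the FoodEx2 code assigned to the same mention. Because every resource is keyed on the same entity span, adding tags from another resource to an entity extends the possible crosswalks without changing the linking code.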

Published in Foods

ISSN: 2304-8158 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/foods