Zero-Shot Food Image Detection Based on Transformer

Jingru SONG; Weiqing MIN; Pengfei ZHOU; Quanrui RAO; Guorui SHENG; Yancun YANG; Lili WANG; Shuqiang JIANG

doi:10.13386/j.issn1002-0306.2024030027

Shipin gongye ke-ji (Nov 2024)

Affiliations

Jingru SONG: School of Information and Electrical Engineering, Ludong University, Yantai 264025, China
Weiqing MIN: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Pengfei ZHOU: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Quanrui RAO: School of Information and Electrical Engineering, Ludong University, Yantai 264025, China
Guorui SHENG: School of Information and Electrical Engineering, Ludong University, Yantai 264025, China
Yancun YANG: School of Information and Electrical Engineering, Ludong University, Yantai 264025, China
Lili WANG: School of Information and Electrical Engineering, Ludong University, Yantai 264025, China
Shuqiang JIANG: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

DOI: https://doi.org/10.13386/j.issn1002-0306.2024030027
Journal volume & issue: Vol. 45, no. 22
pp. 18 – 26

Abstract

As a fundamental task in food computing, food detection plays a crucial role in locating and identifying food items in input images, particularly in applications such as intelligent canteen settlement and dietary health management. However, food categories are constantly updated in practical scenarios, making it difficult for food detectors trained on fixed categories to accurately detect previously unseen food categories. To address this issue, this paper proposes a zero-shot food image detection method. First, a Transformer-based food primitive generator is constructed, in which each primitive contains fine-grained attributes relevant to food categories. These primitives can be selectively assembled according to food characteristics to synthesize new food features. Second, a visual feature disentanglement enhancement component is proposed to impose more constraints on the visual features of unseen food categories. The visual features of food images are decomposed into semantically related and semantically unrelated features, so that semantic knowledge of food categories transfers better to their visual features. The proposed method is evaluated on the ZSFooD and UEC-FOOD256 datasets through extensive experiments and ablation studies. Under the zero-shot detection (ZSD) setting, the best average precision on unseen classes reaches 4.9% and 24.1%, respectively, demonstrating the effectiveness of the proposed approach. Under the generalized zero-shot detection (GZSD) setting, the harmonic mean of seen and unseen classes reaches 5.8% and 22.0%, respectively, further validating the effectiveness of the proposed method.
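
The abstract does not spell out how the GZSD harmonic mean is computed. The sketch below assumes the definition commonly used in generalized zero-shot work, HM = 2·S·U / (S + U), where S and U are the scores on seen and unseen classes. The function name and the input values are illustrative assumptions and are not taken from the paper.

```python
def harmonic_mean(seen: float, unseen: float) -> float:
    """Harmonic mean of a seen-class score and an unseen-class score.

    Assumes the common GZSD definition HM = 2*S*U / (S + U); the paper's
    exact formulation is not given in the abstract.
    """
    if seen + unseen == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * seen * unseen / (seen + unseen)


# Hypothetical scores (percent), chosen only to illustrate the formula.
print(harmonic_mean(40.0, 10.0))  # 16.0
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector cannot score well on this metric by performing well on seen classes alone.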

Published in Shipin gongye ke-ji

ISSN: 1002-0306 (Print)
Publisher: The editorial department of Science and Technology of Food Industry
Country of publisher: China
LCC subjects: Technology: Chemical technology: Food processing and manufacture
Website: http://www.spgykj.com/indexen.htm
