Sensors (Aug 2024)

Semantic Interaction Meta-Learning Based on Patch Matching Metric

  • Baoguo Wei,
  • Xinyu Wang,
  • Yuetong Su,
  • Yue Zhang,
  • Lixin Li

DOI
https://doi.org/10.3390/s24175620
Journal volume & issue
Vol. 24, no. 17
p. 5620

Abstract

Metric-based meta-learning methods have demonstrated remarkable success in few-shot image classification. However, their performance is strongly contingent on the choice of metric and on the feature representation of the support classes. Current approaches, which predominantly rely on holistic image features, may inadvertently disregard critical details needed for novel tasks, a phenomenon known as “supervision collapse”. Moreover, relying solely on visual features to characterize support classes can be insufficient, particularly when sample sizes are limited. In this paper, we introduce a framework named Patch Matching Metric-based Semantic Interaction Meta-Learning (PatSiML), designed to overcome these challenges. To counteract supervision collapse, we develop a patch matching metric strategy based on the Transformer architecture that transforms input images into a set of distinct patch embeddings. A graph convolutional network then generates task-specific embeddings, which are used to formulate precise matching metrics between the support classes and the query image patches. To incorporate semantic knowledge, we also introduce a label-assisted channel semantic interaction strategy, which merges word embeddings with patch-level visual features across the channel dimension, using a language model to combine semantic understanding with visual information. Our empirical findings across four diverse datasets show that PatSiML improves classification accuracy by 0.65% to 21.15% over existing methods, underscoring its robustness and efficacy.
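
The abstract describes two components: a channel-wise fusion of class word embeddings with patch-level visual features, and a patch matching metric between support-class patches and query patches. The following is a minimal PyTorch sketch of how such components could look; it is an illustration under assumptions, not the authors' implementation. The module names (ChannelSemanticInteraction, PatchMatchingMetric), the gating-based fusion, and the cosine-similarity matching are hypothetical simplifications, and the graph-convolutional task-adaptation step described in the abstract is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelSemanticInteraction(nn.Module):
    """Hypothetical sketch: fuse class word embeddings with patch-level
    visual features along the channel dimension via a learned gate."""

    def __init__(self, dim: int, word_dim: int):
        super().__init__()
        self.proj = nn.Linear(word_dim, dim)                     # map word embedding to visual channel dim
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, patches: torch.Tensor, word_emb: torch.Tensor) -> torch.Tensor:
        # patches: (N, P, D) support patch embeddings; word_emb: (N, W) class label embeddings
        sem = self.proj(word_emb).unsqueeze(1).expand_as(patches)   # broadcast semantics to every patch
        g = self.gate(torch.cat([patches, sem], dim=-1))            # channel-wise mixing weights in [0, 1]
        return g * patches + (1 - g) * sem                          # fused patch representation


class PatchMatchingMetric(nn.Module):
    """Hypothetical sketch: score a query image by matching its patch
    embeddings against per-class support patch embeddings."""

    def forward(self, support: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # support: (C, K, D) patches per class; query: (Q, P, D) patches per query image
        s = F.normalize(support, dim=-1)
        q = F.normalize(query, dim=-1)
        # cosine similarity between every query patch and every support patch
        sim = torch.einsum("qpd,ckd->qcpk", q, s)        # (Q, C, P, K)
        # best support match per query patch, averaged over query patches
        return sim.max(dim=-1).values.mean(dim=-1)       # (Q, C) class scores


if __name__ == "__main__":
    # Toy 5-way task: 5 classes, 9 patches of dim 64 per image, 3 query images.
    fuse, metric = ChannelSemanticInteraction(dim=64, word_dim=300), PatchMatchingMetric()
    support_patches = torch.randn(5, 9, 64)
    class_word_emb = torch.randn(5, 300)
    query_patches = torch.randn(3, 9, 64)
    enriched_support = fuse(support_patches, class_word_emb)
    logits = metric(enriched_support, query_patches)
    print(logits.argmax(dim=-1))                          # predicted class per query
```

The sigmoid gate is one plausible way to realize "interaction across the channel dimension"; the paper itself may use a different fusion or a learned matching metric rather than plain cosine similarity.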

Keywords