Applied Sciences (Nov 2023)

Boosting Lightweight Sentence Embeddings with Knowledge Transfer from Advanced Models: A Model-Agnostic Approach

  • Kadir Gunel,
  • Mehmet Fatih Amasyali

DOI
https://doi.org/10.3390/app132312586
Journal volume & issue
Vol. 13, no. 23
p. 12586

Abstract


In this study, we investigate knowledge transfer between two distinct sentence embedding models: a computationally demanding, highly performant model and a lightweight model derived from word vector averaging. Our objective is to augment the representational power of the lightweight model by exploiting the sophisticated features of the robust model. Diverging from traditional knowledge distillation methods that align the logits or hidden states of teacher and student models, our approach uses only the output sentence vectors of the teacher model to align with the student model's word vector representations. We implement two minimization techniques for this purpose: distance minimization, and distance and perplexity minimization. Our methodology uses WMT datasets for training, and the enhanced embeddings are validated via Google's Analogy tasks and Meta's SentEval datasets. We found that our proposed models intriguingly retained and conveyed information in a model-specific fashion.
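The abstract's core idea is to pull a lightweight averaging-based student toward a strong teacher using only the teacher's output sentence vectors. As a minimal, non-authoritative sketch of the distance-minimization variant (assuming a PyTorch setup, hypothetical dimensions, an MSE distance, and teacher sentence vectors precomputed and projected to the student's embedding dimension; none of these specifics are fixed by the abstract):

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the abstract does not specify vocabulary or embedding dimensions.
VOCAB_SIZE, EMB_DIM = 30_000, 300


class AveragingStudent(nn.Module):
    """Lightweight student: a sentence vector is the mean of its word vectors."""

    def __init__(self, vocab_size: int, emb_dim: int):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)

    def forward(self, token_ids: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        vecs = self.word_emb(token_ids)                  # (batch, seq, dim)
        summed = (vecs * mask.unsqueeze(-1)).sum(dim=1)  # zero out padding positions
        return summed / mask.sum(dim=1, keepdim=True)    # mean over real tokens


student = AveragingStudent(VOCAB_SIZE, EMB_DIM)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
distance = nn.MSELoss()  # one plausible distance; the paper's exact choice is not stated here


def distance_minimization_step(token_ids, mask, teacher_sentence_vecs):
    """One update: move the student's averaged sentence vectors toward the
    teacher's output sentence vectors (only teacher outputs are used, no logits
    or hidden states)."""
    optimizer.zero_grad()
    student_vecs = student(token_ids, mask)
    loss = distance(student_vecs, teacher_sentence_vecs)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The second variant described in the abstract would add a perplexity term to this objective; its exact formulation is given in the paper itself, not in the abstract.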

Keywords