Benefits from Variational Regularization in Language Models

Cornelia Ferner; Stefan Wegenkittl

doi:10.3390/make4020025

Machine Learning and Knowledge Extraction (Jun 2022)

Affiliations

Cornelia Ferner: Information Technology and Systems Management, Salzburg University of Applied Sciences, Urstein Sued 1, 5412 Puch/Hallein, Austria
Stefan Wegenkittl: Information Technology and Systems Management, Salzburg University of Applied Sciences, Urstein Sued 1, 5412 Puch/Hallein, Austria

DOI: https://doi.org/10.3390/make4020025
Journal volume & issue: Vol. 4, no. 2
pp. 542 – 555

Abstract

Representations from common pre-trained language models have been shown to suffer from the degeneration problem, i.e., they occupy a narrow cone in latent space. This problem can be addressed by enforcing isotropy in latent space. In analogy with variational autoencoders, we suggest applying a token-level variational loss to a Transformer architecture and optimizing the standard deviation of the prior distribution in the loss function as a model parameter to increase isotropy. The resulting latent space is complete and interpretable: any given point is a valid embedding and can be decoded into text again. This allows for text manipulations such as paraphrase generation directly in latent space. Surprisingly, features extracted at the sentence level also show competitive results on benchmark classification tasks.
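
The abstract does not spell out the loss itself. As a rough sketch only, and assuming a diagonal Gaussian posterior per token and a zero-mean isotropic Gaussian prior (the usual variational autoencoder setup, not necessarily the authors' exact formulation), the token-level regularization term could be a KL divergence in which the prior standard deviation is a free quantity. The names `gaussian_kl`, `token_level_kl`, and `prior_sigma` are hypothetical and chosen for illustration.

```python
import math


def gaussian_kl(mu, sigma, prior_sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, prior_sigma^2 * I) ) for one token.

    mu, sigma: per-dimension posterior mean and standard deviation.
    prior_sigma: standard deviation of the prior (assumed shared by all
    dimensions; in a trained model this would be a learnable parameter).
    """
    if prior_sigma <= 0 or any(s <= 0 for s in sigma):
        raise ValueError("standard deviations must be positive")
    return sum(
        math.log(prior_sigma / s)
        + (s * s + m * m) / (2.0 * prior_sigma * prior_sigma)
        - 0.5
        for m, s in zip(mu, sigma)
    )


def token_level_kl(mus, sigmas, prior_sigma):
    """Average the per-token KL term over all tokens of a sequence."""
    per_token = [gaussian_kl(m, s, prior_sigma) for m, s in zip(mus, sigmas)]
    return sum(per_token) / len(per_token)
```

In this sketch the term vanishes when a token's posterior matches the prior exactly, and it grows as the posterior mean moves away from the origin relative to `prior_sigma`. In an actual model, `prior_sigma` would be updated by the optimizer together with the other weights rather than passed in by hand.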

Published in Machine Learning and Knowledge Extraction

ISSN: 2504-4990 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware
Website: https://www.mdpi.com/journal/make
