Machine Learning and Knowledge Extraction (Jul 2023)

Efficient Latent Space Compression for Lightning-Fast Fine-Tuning and Inference of Transformer-Based Models

  • Ala Alam Falaki,
  • Robin Gras

DOI
https://doi.org/10.3390/make5030045
Journal volume & issue
Vol. 5, no. 3
pp. 847 – 867

Abstract


This paper presents a technique to reduce the number of parameters in a transformer-based encoder–decoder architecture by incorporating autoencoders. To discover the optimal compression, we trained different autoencoders on the embedding space (the encoder's output) of several pre-trained models. The experiments reveal that reducing the embedding size can dramatically decrease GPU memory usage while speeding up inference. The proposed architecture was integrated into the BART model and tested on summarization, translation, and classification tasks. The summarization results show that a 60% reduction in decoder size (from 96 M to 40 M parameters) makes inference twice as fast and uses less than half the GPU memory during fine-tuning, with only a 4.5% drop in R-1 score. The same trend holds for translation and partially for classification. Our approach reduces the GPU memory usage and processing time of large-scale sequence-to-sequence models for both fine-tuning and inference. The implementation and checkpoints are available on GitHub.
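The core idea described in the abstract — training an autoencoder on the encoder's output so that a smaller latent representation is passed to a reduced decoder — can be illustrated with a minimal PyTorch sketch. This is not the authors' released implementation (which is on GitHub); the linear autoencoder, the hidden size of 768, and the latent size of 256 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BottleneckAutoencoder(nn.Module):
    """Sketch: compress encoder hidden states to a smaller latent dimension
    and project them back. The latent states would feed a smaller decoder's
    cross-attention; the exact architecture here is an assumption."""
    def __init__(self, hidden_size=768, latent_size=256):
        super().__init__()
        self.compress = nn.Linear(hidden_size, latent_size)
        self.reconstruct = nn.Linear(latent_size, hidden_size)
        self.activation = nn.ReLU()

    def forward(self, hidden_states):
        latent = self.activation(self.compress(hidden_states))
        return self.reconstruct(latent), latent


def reconstruction_step(autoencoder, encoder_outputs, optimizer):
    """One training step on frozen encoder outputs, minimizing
    reconstruction error before the compressed latent is wired into
    the downstream decoder."""
    reconstructed, _ = autoencoder(encoder_outputs)
    loss = nn.functional.mse_loss(reconstructed, encoder_outputs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Toy usage with random "encoder outputs" of shape (batch, seq_len, hidden).
    ae = BottleneckAutoencoder()
    opt = torch.optim.Adam(ae.parameters(), lr=1e-4)
    fake_encoder_outputs = torch.randn(8, 128, 768)
    print(reconstruction_step(ae, fake_encoder_outputs, opt))
```

In this reading, the memory and speed gains come from the decoder operating on the smaller latent dimension rather than the original hidden size; the reconstruction head is only needed while training the autoencoder itself.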

Keywords