IEEE Access (Jan 2023)

How Does a Transformer Learn Compression? An Attention Study on Huffman and LZ4

  • Beomseok Seo,
  • Albert No

DOI
https://doi.org/10.1109/ACCESS.2023.3341512
Journal volume & issue
Vol. 11
pp. 140559 – 140568

Abstract

Transformers have excelled in natural language processing and vision. This success raises an intriguing question: can Transformers be adapted to a more general framework, such as learning arbitrary finite state machines? To explore this, we trained a Transformer network on compression algorithms such as Huffman coding and LZ4, viewing them as stepping stones toward mastering general finite state machines. Our analysis indicates that Transformers can internalize these methods and replicate their essential states, with attention mechanisms that admit a human-interpretable reading. This provides evidence of their capability to learn practical finite state machines. Examining the attention maps offers insight into how the Transformer handles these compression techniques. For Huffman coding, the encoder predominantly attends to input statistics to determine the present state, which the decoder then uses to produce output bits. Notably, with a 2nd-order Markov input, the encoder’s attention concentrates on the previous two symbols, underscoring its ability to summarize input statistics. The cross-attention maps further reveal the exact association between input symbols and output bits. For the more complex LZ4 compression, the attention maps clearly show how the Transformer processes input sequences and how closely the input is linked to the resulting output bit stream. This study underscores the Transformer’s ability to learn compression algorithms and to capture input statistics, suggesting that its attention maps offer substantial interpretability.
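As a concrete reference point for the Huffman side of the task, a minimal encoder/decoder can be sketched as follows. This is an illustration of standard Huffman coding, not the paper’s training pipeline; the toy source string and all function names here are our own assumptions. The sketch builds a prefix-free code from symbol frequencies, encodes a skewed source so that frequent symbols get shorter codewords, and decodes the bit stream back.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(freqs):
    """Build a Huffman code (symbol -> bit string) from symbol frequencies."""
    tick = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tick), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # merge the two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tick), merged))
    return heap[0][2]

# A skewed toy source (hypothetical example data): frequent symbols
# receive shorter codewords, so the bit stream is shorter than 2 bits/symbol.
text = "aaaaaaaaabbbccd"
code = huffman_code(Counter(text))
bits = "".join(code[s] for s in text)

# Decode by walking the bit stream with the inverse map; prefix-freeness
# guarantees each codeword is recognized as soon as it is complete.
inv = {b: s for s, b in code.items()}
decoded, buf = [], ""
for bit in bits:
    buf += bit
    if buf in inv:
        decoded.append(inv[buf])
        buf = ""
assert "".join(decoded) == text
```

A Transformer trained on such (input sequence, output bit stream) pairs must implicitly track the coder’s state, which is what the paper’s attention-map analysis probes.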

Keywords