Fractal and Fractional (Sep 2024)
Fractal Self-Similarity in Semantic Convergence: Gradient of Embedding Similarity across Transformer Layers
Abstract
This paper presents a mathematical analysis of semantic convergence in transformer-based language models, drawing inspiration from the concept of fractal self-similarity. We introduce and prove a novel theorem characterizing the gradient of embedding similarity across layers. Specifically, we establish the existence of a monotonically increasing function that lower-bounds the rate at which the average cosine similarity between token embeddings at each layer and those at the final layer increases from one layer to the next. This result captures a fundamental property: the semantic alignment of token representations increases consistently through the network, exhibiting a pattern of progressive refinement analogous to fractal self-similarity. The key challenge addressed is quantifying and generalizing semantic convergence across model architectures and input contexts. To validate our findings, we conduct experiments on BERT and DistilBERT, analyzing embedding similarities for diverse input types. Although our experiments are limited to these two models, they demonstrate consistent semantic convergence within both architectures. Quantitatively, the average rates of semantic convergence are approximately 0.0826 for BERT and 0.1855 for DistilBERT. We also observe that the rate of convergence varies with token frequency and model depth, with rare words showing slightly higher similarities (differences of approximately 0.0167 for BERT and 0.0120 for DistilBERT). This work advances our understanding of the internal mechanisms of transformer models and provides a mathematical framework for comparing and optimizing model architectures.
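As an illustration of the measurement described in the abstract, the following is a minimal sketch of how layer-wise similarity to the final layer could be computed with the Hugging Face transformers library. The exact metric (per-token cosine similarity between each layer's hidden states and the final layer's, averaged over tokens, with the convergence rate taken as the mean layer-to-layer difference) and the checkpoint names are assumptions made for illustration, not the authors' precise protocol.

```python
# Minimal sketch: average cosine similarity of each layer's token embeddings to the
# final layer, plus a crude convergence-rate estimate as the mean forward difference.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # assumed checkpoint; "distilbert-base-uncased" works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).hidden_states  # tuple: (embedding layer, layer 1, ..., layer L)

final_layer = hidden_states[-1]  # shape: (1, seq_len, hidden_dim)
layer_sims = [
    torch.nn.functional.cosine_similarity(h, final_layer, dim=-1).mean().item()
    for h in hidden_states
]

# Forward differences approximate the layer-to-layer gain in similarity.
rates = [b - a for a, b in zip(layer_sims, layer_sims[1:])]
print("per-layer similarity to final layer:", [round(s, 4) for s in layer_sims])
print("average convergence rate:", round(sum(rates) / len(rates), 4))
```

Averaging these per-sentence rates over a corpus of diverse inputs would yield model-level figures comparable in spirit to the 0.0826 (BERT) and 0.1855 (DistilBERT) values reported above.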
Keywords