Extending Context Window in Large Language Models with Segmented Base Adjustment for Rotary Position Embeddings

Rongsheng Li; Jin Xu; Zhixiong Cao; Hai-Tao Zheng; Hong-Gee Kim

doi:10.3390/app14073076

Applied Sciences (Apr 2024)


Affiliations

Rongsheng Li: Shenzhen International Graduate School, Tsinghua University, Shenzhen 518071, China
Jin Xu: Shenzhen International Graduate School, Tsinghua University, Shenzhen 518071, China
Zhixiong Cao: Shenzhen International Graduate School, Tsinghua University, Shenzhen 518071, China
Hai-Tao Zheng: Shenzhen International Graduate School, Tsinghua University, Shenzhen 518071, China
Hong-Gee Kim: School of Dentistry, Seoul National University, Seoul 03080, Republic of Korea

DOI: https://doi.org/10.3390/app14073076
Journal volume & issue: Vol. 14, no. 7, article 3076

Abstract

For large language models (LLMs), extending the context window is crucial for processing long texts effectively. This paper introduces SBA-RoPE (Segmented Base Adjustment for Rotary Position Embeddings), a novel approach that efficiently extends the context window by adjusting the base of rotary position embeddings (RoPE) segment by segment. Unlike existing methods such as Position Interpolation (PI), NTK, and YaRN, SBA-RoPE modifies the RoPE base differently across dimensions, optimizing how positional information is encoded for extended sequences. We demonstrate the effectiveness of SBA-RoPE on the Pythia model, particularly for texts longer than the original training length: we fine-tuned Pythia-2.8B on the PG-19 dataset and evaluated it with passkey retrieval and perplexity (PPL) experiments on the Proof-pile dataset. The results show that SBA-RoPE maintains or improves model performance when the context window is extended, especially on longer sequences. Compared with the other methods, SBA-RoPE achieves superior or comparable performance across sequence lengths and tasks, highlighting its potential as an effective technique for context window extension in LLMs.
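
The abstract contrasts SBA-RoPE with PI and NTK-style base scaling but does not state the segmentation rule itself. The following Python sketch (assuming NumPy) shows the three ideas at the level of RoPE inverse frequencies. PI rescales positions. One common form of NTK-aware scaling enlarges a single base for all dimensions. A hypothetical segmented variant gives each contiguous group of frequency indices its own base. The function names, segment boundaries, per-segment bases, head dimension of 64, and lengths 2048 and 8192 are illustrative assumptions, not the paper's formula or settings.

```python
import numpy as np


def rope_inv_freq(dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE inverse frequencies: theta_i = base ** (-2i / dim)."""
    return base ** (-np.arange(0, dim, 2, dtype=np.float64) / dim)


def ntk_aware_inv_freq(dim: int, scale: float, base: float = 10000.0) -> np.ndarray:
    """One common form of NTK-aware scaling: enlarge a single base for all dimensions."""
    return rope_inv_freq(dim, base * scale ** (dim / (dim - 2)))


def segmented_base_inv_freq(dim: int, segment_ends: list[int],
                            segment_bases: list[float]) -> np.ndarray:
    """HYPOTHETICAL segmented scheme (not the paper's formula): frequency
    indices [start, end) of each segment use their own base."""
    if len(segment_ends) != len(segment_bases) or segment_ends[-1] != dim // 2:
        raise ValueError("segments must cover all dim // 2 frequency indices")
    exponents = np.arange(0, dim, 2, dtype=np.float64) / dim  # 2i / dim
    inv_freq = np.empty(dim // 2)
    start = 0
    for end, seg_base in zip(segment_ends, segment_bases):
        inv_freq[start:end] = seg_base ** (-exponents[start:end])
        start = end
    return inv_freq


def apply_rope(x: np.ndarray, positions: np.ndarray, inv_freq: np.ndarray) -> np.ndarray:
    """Rotate each pair (x[..., 2i], x[..., 2i+1]) by angle position * inv_freq[i]."""
    angles = np.outer(positions, inv_freq)  # shape (seq_len, dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out


# Illustrative values only (assumptions, not taken from the paper).
dim, train_len, target_len = 64, 2048, 8192
scale = target_len / train_len  # 4.0
positions = np.arange(target_len, dtype=np.float64)
x = np.random.default_rng(0).standard_normal((target_len, dim))

pi = apply_rope(x, positions / scale, rope_inv_freq(dim))       # Position Interpolation
ntk = apply_rope(x, positions, ntk_aware_inv_freq(dim, scale))  # single enlarged base
seg = apply_rope(x, positions, segmented_base_inv_freq(          # hypothetical segments
    dim, segment_ends=[8, 24, 32], segment_bases=[10000.0, 40000.0, 160000.0]))

# Rotations leave vector norms unchanged for every variant.
print(np.allclose(np.linalg.norm(seg, axis=-1), np.linalg.norm(x, axis=-1)))
```

All three variants change only the rotation angles. What distinguishes them is how each frequency band is stretched. PI compresses every band by the same factor. The NTK-aware base leaves the highest-frequency band untouched and stretches the lowest-frequency band by roughly the full scale factor. A segmented scheme lets the stretch of each band be chosen separately.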

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
