IEEE Access (Jan 2024)

Skipformer: Evolving Beyond Blocks for Extensively Searching On-Device Language Models With Learnable Attention Window

  • Matthew Bodenham,
  • Jaeha Kung

DOI
https://doi.org/10.1109/ACCESS.2024.3420232
Journal volume & issue
Vol. 12
pp. 124428–124439

Abstract


Deployment of language models to resource-constrained edge devices is an uphill battle against their ever-increasing size. The task transferability of language models makes deployment to the edge an attractive application. Prior neural architecture search (NAS) works have produced hardware-efficient transformers, but they often overlook some architectural features in favor of an efficient search. We propose a novel evolutionary NAS with a large and flexible search space to encourage exploration of previously unexplored transformer architectures. Our search space allows architectures to vary in depth and to use skip connections that transfer information anywhere inside the architecture; Skipformer, the top searched model, exhibits these novel architectural features. To further increase Skipformer's efficiency, we learn a CUDA-accelerated attention window size at each self-attention layer during training. Skipformer achieves a 23.3% speedup and requires 19.2% less memory on an NVIDIA Jetson Nano, with negligible accuracy loss on the GLUE benchmark compared to GPT-2 Small.
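The abstract does not detail how the learnable attention window is realized, so the following is a minimal PyTorch sketch of the general idea only: a per-head window size stored as a trainable parameter, applied as a differentiable (sigmoid-shaped) mask over attention scores so it can be optimized jointly with the rest of the model. The class name WindowedSelfAttention and the soft-mask formulation are illustrative assumptions, not the authors' CUDA implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WindowedSelfAttention(nn.Module):
    """Causal self-attention with a learnable window size per head (illustrative sketch)."""

    def __init__(self, d_model: int, n_heads: int, max_len: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        # One learnable window size per head, initialized to the full context length.
        self.window = nn.Parameter(torch.full((n_heads,), float(max_len)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5   # (B, H, T, T)

        # Distance of each key position behind each query position (causal direction).
        pos = torch.arange(T, device=x.device)
        dist = (pos[:, None] - pos[None, :]).float()            # (T, T)
        causal = dist >= 0

        # Soft window: weights fade out for keys farther back than the learned
        # window; the sigmoid keeps the mask differentiable during training.
        win = self.window.clamp(min=1.0).view(1, self.n_heads, 1, 1)
        soft_mask = torch.sigmoid(win - dist)                   # ~1 inside window, ~0 outside

        scores = scores.masked_fill(~causal, float("-inf"))
        attn = F.softmax(scores, dim=-1) * soft_mask
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        out = (attn @ v).transpose(1, 2).reshape(B, T, C)
        return self.proj(out)
```

At inference time, a trained window smaller than the full context lets an optimized kernel restrict each query to a band of keys, which is the source of the latency and memory savings the abstract reports on the Jetson Nano.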

Keywords