IEEE Access (Jan 2024)

Skipformer: Evolving Beyond Blocks for Extensively Searching On-Device Language Models With Learnable Attention Window

  • Matthew Bodenham,
  • Jaeha Kung

DOI
https://doi.org/10.1109/ACCESS.2024.3420232
Journal volume & issue
Vol. 12
pp. 124428–124439

Abstract


Deployment of language models to resource-constrained edge devices is an uphill battle against their ever-increasing size. The task transferability of language models makes deployment to the edge an attractive application. Prior neural architecture search (NAS) works have produced hardware-efficient transformers, but they often overlook some architectural features in favor of an efficient search. We propose a novel evolutionary NAS with a large and flexible search space to encourage exploration of previously unexplored transformer architectures. Our search space allows architectures to vary in depth and to use skip connections that transfer information anywhere inside the architecture; Skipformer, the top searched model, exhibits these novel architectural features. To further increase Skipformer's efficiency, we learn a CUDA-accelerated attention window size at each self-attention layer during training. Skipformer achieves a 23.3% speedup and requires 19.2% less memory on an NVIDIA Jetson Nano, with negligible accuracy loss on the GLUE benchmark compared to GPT-2 Small.
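The abstract does not detail how the learnable attention window is realized, so the following is a minimal PyTorch sketch of the general idea only: a per-head window size stored as a trainable parameter, applied as a differentiable (sigmoid-shaped) mask over attention scores so it can be optimized jointly with the rest of the model. The class name WindowedSelfAttention and the soft-mask formulation are illustrative assumptions, not the authors' CUDA implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WindowedSelfAttention(nn.Module):
    """Causal self-attention with a learnable window size per head (illustrative sketch)."""

    def __init__(self, d_model: int, n_heads: int, max_len: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        # One learnable window size per head, initialized to the full context length.
        self.window = nn.Parameter(torch.full((n_heads,), float(max_len)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5   # (B, H, T, T)

        # Distance of each key position behind each query position (causal direction).
        pos = torch.arange(T, device=x.device)
        dist = (pos[:, None] - pos[None, :]).float()            # (T, T)
        causal = dist >= 0

        # Soft window: weights fade out for keys farther back than the learned
        # window; the sigmoid keeps the mask differentiable during training.
        win = self.window.clamp(min=1.0).view(1, self.n_heads, 1, 1)
        soft_mask = torch.sigmoid(win - dist)                   # ~1 inside window, ~0 outside

        scores = scores.masked_fill(~causal, float("-inf"))
        attn = F.softmax(scores, dim=-1) * soft_mask
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        out = (attn @ v).transpose(1, 2).reshape(B, T, C)
        return self.proj(out)
```

At inference time, a trained window smaller than the full context lets an optimized kernel restrict each query to a band of keys, which is the source of the latency and memory savings the abstract reports on the Jetson Nano.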

Keywords