Efficient visual transformer transferring from neural ODE perspective

Hao Niu; Fengming Luo; Bo Yuan; Yi Zhang; Jianyong Wang

doi:10.1049/ell2.70015

Electronics Letters (Sep 2024)

Affiliations

Hao Niu: College of Computer Science, Sichuan University, Chengdu, China
Fengming Luo: Department of Pulmonary and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, China
Bo Yuan: General Practice Medical Center, West China Hospital, Sichuan University, Chengdu, China
Yi Zhang: College of Computer Science, Sichuan University, Chengdu, China
Jianyong Wang: College of Computer Science, Sichuan University, Chengdu, China

Journal volume & issue: Vol. 60, no. 17

Abstract

Recently, the Vision Transformer (ViT) has revolutionized many areas of computer vision. Transferring ViT models pre-trained on large-scale datasets has proven to be a promising approach for downstream tasks. However, traditional transfer methods introduce numerous additional parameters into the transformer blocks, which poses new challenges for learning downstream tasks. To address this issue, this article proposes an efficient transfer method from the perspective of neural ordinary differential equations (ODEs). On the one hand, the residual connections in the transformer layers can be interpreted as numerical integration of differential equations, so each transformer block can be described as two explicit Euler steps. Dynamically learning the step size of these explicit Euler steps yields a highly lightweight way to transfer the transformer blocks. On the other hand, a new learnable neural memory ODE block is proposed, inspired by the self-inhibition mechanism in neural systems. It diversifies the dynamical behaviour of the neurons, transferring the head block efficiently while also enhancing non-linearity. Experimental results on image classification demonstrate that the proposed approach effectively transfers ViT models and outperforms state-of-the-art methods.
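The abstract's central observation, that a residual update x + f(x) is one explicit Euler step with unit step size, can be illustrated in a few lines of plain Python. What follows is a minimal, hypothetical sketch under stated assumptions, not the authors' implementation. The attention and MLP sub-layers are replaced by fixed random maps tanh(Wx), layer normalization is omitted, and each block is written as y = x + h1·attn(x) followed by z = y + h2·mlp(y), with scalar step sizes h1 and h2. Gradients are estimated by central finite differences instead of backpropagation. The names make_frozen_branch, euler_block, mse and fit_step_sizes, and the target step sizes 0.5 and 1.5, are all invented for this example.

```python
import math
import random

# Hypothetical stand-ins for the frozen, pre-trained sub-layers of one ViT
# block: each is f(x) = tanh(W x) with a fixed random W (not the paper's code).
def make_frozen_branch(dim, seed):
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
         for _ in range(dim)]

    def branch(x):
        return [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]

    return branch


def euler_block(x, attn, mlp, h1, h2):
    """One transformer block written as two explicit Euler steps.

    y = x + h1 * attn(x)  and  z = y + h2 * mlp(y).
    With h1 = h2 = 1 this reduces to the usual residual block.
    """
    y = [xi + h1 * ai for xi, ai in zip(x, attn(x))]
    return [yi + h2 * mi for yi, mi in zip(y, mlp(y))]


def mse(params, data, attn, mlp):
    """Mean squared error of the block output over all samples and features."""
    h1, h2 = params
    total, count = 0.0, 0
    for x, target in data:
        z = euler_block(x, attn, mlp, h1, h2)
        total += sum((zi - ti) ** 2 for zi, ti in zip(z, target))
        count += len(z)
    return total / count


def fit_step_sizes(data, attn, mlp, lr=0.2, steps=400, eps=1e-5):
    """Train only the two scalar step sizes; attn and mlp stay frozen.

    Gradients come from central finite differences to keep the sketch
    dependency-free; a real implementation would use autodiff.
    """
    params = [1.0, 1.0]  # start from the pre-trained residual block (h = 1)
    for _ in range(steps):
        grads = []
        for k in range(len(params)):
            plus, minus = params[:], params[:]
            plus[k] += eps
            minus[k] -= eps
            diff = mse(plus, data, attn, mlp) - mse(minus, data, attn, mlp)
            grads.append(diff / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


# Toy transfer task: targets come from the same frozen branches run with
# different (hypothetical) step sizes, which the fit should recover.
DIM, N = 8, 16
attn = make_frozen_branch(DIM, seed=0)
mlp = make_frozen_branch(DIM, seed=1)
rng = random.Random(42)
inputs = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N)]
data = [(x, euler_block(x, attn, mlp, 0.5, 1.5)) for x in inputs]

h1, h2 = fit_step_sizes(data, attn, mlp)
print(f"learned h1={h1:.3f}, h2={h2:.3f}")
```

In this sketch, adapting a block to the new targets trains just two scalars while every weight matrix stays frozen, which is one way to read the abstract's claim that learning the step size gives a highly lightweight transfer.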

Published in Electronics Letters

ISSN: 0013-5194 (Print); 1350-911X (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ietresearch.onlinelibrary.wiley.com/journal/1350911X
