SGLFormer: Spiking Global-Local-Fusion Transformer with high performance

Han Zhang; Chenlin Zhou; Liutao Yu; Liwei Huang; Zhengyu Ma; Xiaopeng Fan; Huihui Zhou; Yonghong Tian

doi:10.3389/fnins.2024.1371290

Frontiers in Neuroscience (Mar 2024)

Affiliations

Han Zhang: AI Department, Peng Cheng Laboratory, Shenzhen, China
Han Zhang: Faculty of Computing, Harbin Institute of Technology, Harbin, China
Chenlin Zhou: AI Department, Peng Cheng Laboratory, Shenzhen, China
Liutao Yu: AI Department, Peng Cheng Laboratory, Shenzhen, China
Liwei Huang: AI Department, Peng Cheng Laboratory, Shenzhen, China
Liwei Huang: National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China
Zhengyu Ma: AI Department, Peng Cheng Laboratory, Shenzhen, China
Xiaopeng Fan: AI Department, Peng Cheng Laboratory, Shenzhen, China
Xiaopeng Fan: Faculty of Computing, Harbin Institute of Technology, Harbin, China
Huihui Zhou: AI Department, Peng Cheng Laboratory, Shenzhen, China
Yonghong Tian: AI Department, Peng Cheng Laboratory, Shenzhen, China
Yonghong Tian: National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China

DOI: https://doi.org/10.3389/fnins.2024.1371290
Journal volume & issue: Vol. 18

Abstract

Introduction: Spiking Neural Networks (SNNs), inspired by brain science, offer low energy consumption and high biological plausibility thanks to their event-driven nature. However, current SNNs still suffer from insufficient performance.

Methods: Inspired by the brain's ability to process information across varied scenarios using complex neuronal connections within and across regions, as well as specialized neuronal architectures for specific functions, we propose a Spiking Global-Local-Fusion Transformer (SGLFormer) that significantly improves the performance of SNNs. This novel architecture enables efficient information processing on both global and local scales by integrating transformer and convolution structures in SNNs. In addition, we uncover the problem of inaccurate gradient backpropagation caused by Maxpooling in SNNs and address it by developing a new Maxpooling module. Furthermore, we adopt a spatio-temporal block (STB) in the classification head instead of global average pooling, facilitating the aggregation of spatial and temporal features.

Results: SGLFormer demonstrates superior performance on static datasets such as CIFAR10, CIFAR100, and ImageNet, as well as on dynamic vision sensor (DVS) datasets including CIFAR10-DVS and DVS128-Gesture. Notably, on ImageNet, SGLFormer achieves a top-1 accuracy of 83.73% with 64 M parameters, outperforming the current state-of-the-art (SOTA) directly trained SNNs by a margin of 6.66%.

Discussion: With its high performance, SGLFormer can support more computer vision tasks in the future. The code for this study is available at https://github.com/ZhangHanN1/SGLFormer.

Published in Frontiers in Neuroscience

ISSN: 1662-4548 (Print); 1662-453X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry
Website: http://www.frontiersin.org/neuroscience
