IEEE Access (Jan 2023)

Re-Introducing BN Into Transformers for Vision Tasks

  • Xue-Song Tang,
  • Xian-Lin Xie

DOI
https://doi.org/10.1109/ACCESS.2023.3283612
Journal volume & issue
Vol. 11
pp. 58462 – 58469

Abstract


In recent years, Transformer-based models have achieved significant advances over previous models in natural language processing and vision tasks. This powerful methodology has also been extended to the 3D point cloud domain, where it can mitigate the inherent difficulties posed by the irregular and unordered nature of point clouds. However, the attention mechanism within the Transformer poses challenges for applying Batch Normalization (BN), as reliable statistics cannot be extracted efficiently across the batch. This study therefore proposes a novel residual structure, ResBN, which handles 3D data effectively. In addition, to replace BN in the Transformer for 2D image processing, we introduce the Patch Normalization (PN) technique. ResBN and PN are evaluated through statistical experiments on 3D point cloud and 2D image datasets, respectively, demonstrating their efficacy in improving classification performance.
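As background for the normalization issue the abstract raises, the following minimal numpy sketch contrasts how BN and Layer Normalization (the Transformer default) compute their statistics on a batch of token sequences. This is an illustration of the general BN-vs-LN distinction only, not the paper's ResBN or PN; the tensor shape and function names are assumptions for demonstration.

```python
import numpy as np

# Illustrative only: not the paper's ResBN/PN. Input shaped
# (batch, tokens, channels), as in a Transformer encoder.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=(4, 16, 8))

def batch_norm(x, eps=1e-5):
    # BN pools statistics over the batch and token axes, per channel,
    # so its quality depends on batch-level statistics being meaningful.
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # LN computes statistics per token, over the channel axis,
    # independently of the rest of the batch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

bn = batch_norm(x)   # each channel has ~zero mean over the whole batch
ln = layer_norm(x)   # each individual token has ~zero mean over channels
```

The key contrast: BN's normalization of any one token depends on every other sample in the batch, whereas LN depends only on that token's own channels, which is part of why LN became the default inside attention-based models.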

Keywords