Sensors (Feb 2024)

FV-MViT: Mobile Vision Transformer for Finger Vein Recognition

  • Xiongjun Li,
  • Jin Feng,
  • Jilin Cai,
  • Guowen Lin

DOI
https://doi.org/10.3390/s24041331
Journal volume & issue
Vol. 24, no. 4
p. 1331

Abstract

Read online

In addressing challenges related to high parameter counts and limited training samples for finger vein recognition, we present the FV-MViT model. It serves as a lightweight deep learning solution, emphasizing high accuracy, portable design, and low latency. The FV-MViT introduces two key components. The Mul-MV2 Block utilizes a dual-path inverted residual connection structure for multi-scale convolutions, extracting additional local features. Simultaneously, the Enhanced MobileViT Block eliminates the large-scale convolution block at the beginning of the original MobileViT Block. It converts the Transformer’s self-attention into separable self-attention with linear complexity, optimizing the back end of the original MobileViT Block with depth-wise separable convolutions. This aims to extract global features and effectively reduce parameter counts and feature extraction times. Additionally, we introduce a soft target center cross-entropy loss function to enhance generalization and increase accuracy. Experimental results indicate that the FV-MViT achieves a recognition accuracy of 99.53% and 100.00% on the Shandong University (SDU) and Universiti Teknologi Malaysia (USM) datasets, with equal error rates of 0.47% and 0.02%, respectively. The model has a parameter count of 5.26 million and exhibits a latency of 10.00 milliseconds from the sample input to the recognition output. Comparison with state-of-the-art (SOTA) methods reveals competitive performance for FV-MViT.

Keywords