IEEE Access (Jan 2024)

Liveness Detection in Computer Vision: Transformer-Based Self-Supervised Learning for Face Anti-Spoofing

  • Arman Keresh,
  • Pakizar Shamoi

DOI
https://doi.org/10.1109/ACCESS.2024.3513795
Journal volume & issue
Vol. 12
pp. 185673 – 185685

Abstract


Face recognition systems are increasingly used in biometric security because of their convenience and effectiveness. However, they remain vulnerable to spoofing attacks, in which attackers use photos, videos, or masks to impersonate legitimate users. This research addresses these vulnerabilities by exploring the Vision Transformer (ViT) architecture, fine-tuned with the DINO framework on CelebA-Spoof, CASIA-SURF, and a proprietary dataset. The DINO framework enables self-supervised learning, allowing the model to learn discriminative features from unlabeled data. We compared the proposed DINO-fine-tuned ViT model against traditional models, including the CNN model EfficientNet-B2, EfficientNet-B2 (Noisy Student), and MobileViT, on the face anti-spoofing task. Numerous tests on standard datasets show that the ViT model outperforms the other models in accuracy and in robustness to different spoofing methods. Its performance in APCER (1.6%), the most critical metric in this domain, underscores its improved ability to detect spoofing relative to the other models. Additionally, we collected our own dataset from a biometric application to further validate our findings. This study highlights the strength of transformer-based architectures in identifying complex spoofing cues, leading to significant advances in biometric security.
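APCER (Attack Presentation Classification Error Rate) is the proportion of attack presentations that a system wrongly accepts as bona fide; it is usually reported alongside BPCER (the proportion of bona fide presentations wrongly rejected as attacks) and ACER, their mean. The short Python sketch below illustrates how these rates can be computed from binary labels and predictions. It is not the authors' evaluation code. It assumes the label convention 1 = attack and 0 = bona fide, and the function name `pad_error_rates` is hypothetical. It also pools all attack types together. ISO/IEC 30107-3 defines APCER per attack instrument species and reports the worst case, so results from this pooled version may differ.

```python
from typing import Dict, Sequence


def pad_error_rates(labels: Sequence[int], preds: Sequence[int]) -> Dict[str, float]:
    """Compute pooled APCER, BPCER and ACER (hypothetical helper).

    Assumed convention: 1 = attack (spoof), 0 = bona fide (live).
    All attack types are pooled into a single APCER value.
    """
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")

    # Split predictions by ground-truth class.
    attack_preds = [p for y, p in zip(labels, preds) if y == 1]
    bona_fide_preds = [p for y, p in zip(labels, preds) if y == 0]
    if not attack_preds or not bona_fide_preds:
        raise ValueError("need at least one attack and one bona fide sample")

    # APCER: attacks wrongly classified as bona fide (predicted 0).
    apcer = sum(1 for p in attack_preds if p == 0) / len(attack_preds)
    # BPCER: bona fide samples wrongly classified as attacks (predicted 1).
    bpcer = sum(1 for p in bona_fide_preds if p == 1) / len(bona_fide_preds)
    # ACER: simple average of the two error rates.
    acer = (apcer + bpcer) / 2
    return {"APCER": apcer, "BPCER": bpcer, "ACER": acer}


if __name__ == "__main__":
    # Toy example: 4 attacks (one accepted) and 4 bona fide (one rejected).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
    print(pad_error_rates(y_true, y_pred))
```

On the toy data, one of four attacks is accepted and one of four bona fide samples is rejected, so APCER, BPCER and ACER are all 0.25. A low APCER, such as the 1.6% reported above, means very few spoof attempts get through. This is why APCER is often treated as the security-critical number.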

Keywords