LOTR: Face Landmark Localization Using Localization Transformer

Ukrit Watchareeruetai; Benjaphan Sommana; Sanjana Jain; Pavit Noinongyao; Ankush Ganguly; Aubin Samacoits; Samuel W. F. Earp; Nakarin Sritrakool

doi:10.1109/ACCESS.2022.3149380

IEEE Access (Jan 2022)

Affiliations

Ukrit Watchareeruetai: Sertis Vision Laboratory, Bangkok, Thailand
Benjaphan Sommana: Sertis Vision Laboratory, Bangkok, Thailand
Sanjana Jain: Sertis Vision Laboratory, Bangkok, Thailand
Pavit Noinongyao: Sertis Vision Laboratory, Bangkok, Thailand
Ankush Ganguly: Sertis Vision Laboratory, Bangkok, Thailand
Aubin Samacoits: Sertis Vision Laboratory, Bangkok, Thailand
Samuel W. F. Earp: Sertis Vision Laboratory, Bangkok, Thailand
Nakarin Sritrakool: Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Pathum Wan, Bangkok, Thailand

Journal volume & issue: Vol. 10
pp. 16530 – 16543

Abstract


This paper presents a novel Transformer-based facial landmark localization network named Localization Transformer (LOTR). The proposed framework is a direct coordinate regression approach leveraging a Transformer network to better utilize the spatial information in a feature map. An LOTR model consists of three main modules: 1) a visual backbone that converts an input image into a feature map, 2) a Transformer module that improves the feature representation from the visual backbone, and 3) a landmark prediction head that directly predicts landmark coordinates from the Transformer’s representation. Given cropped-and-aligned face images, the proposed LOTR can be trained end-to-end without requiring any post-processing steps. This paper also introduces a loss function named smooth-Wing loss, which addresses the gradient discontinuity of the Wing loss, leading to better convergence than standard loss functions such as L1, L2, and Wing loss. Experimental results on the JD landmark dataset provided by the First Grand Challenge of 106-Point Facial Landmark Localization indicate the superiority of LOTR over the existing methods on the leaderboard and two recent heatmap-based approaches. On the WFLW dataset, the proposed LOTR framework demonstrates promising results compared with several state-of-the-art methods. Additionally, we report an improvement in the performance of state-of-the-art face recognition systems when using our proposed LOTRs for face alignment.
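The abstract motivates smooth-Wing loss by the gradient discontinuity of the Wing loss. The minimal sketch below shows only that motivation, using the standard Wing loss of Feng et al. (CVPR 2018) applied to a single coordinate residual. The width w = 10 and curvature eps = 2 are assumed illustrative values, and the function names `wing_loss` and `wing_grad` are hypothetical. This is not the paper's smooth-Wing loss, whose exact form is not given on this page.

```python
import math


def wing_loss(x: float, w: float = 10.0, eps: float = 2.0) -> float:
    """Standard Wing loss (Feng et al., 2018) of one residual x = prediction - target.

    w and eps are assumed illustrative defaults, not values taken from LOTR.
    """
    ax = abs(x)
    if ax < w:
        # Logarithmic region: gives small and medium errors a large gradient.
        return w * math.log(1.0 + ax / eps)
    # Linear region. The constant c makes both pieces meet at |x| = w.
    c = w - w * math.log(1.0 + w / eps)
    return ax - c


def wing_grad(x: float, w: float = 10.0, eps: float = 2.0) -> float:
    """Derivative of wing_loss with respect to x, defined only for x != 0."""
    if x == 0.0:
        raise ValueError("Wing loss is not differentiable at x = 0")
    sign = 1.0 if x > 0 else -1.0
    if abs(x) < w:
        return sign * w / (eps + abs(x))
    return sign


if __name__ == "__main__":
    tiny = 1e-6
    # The one-sided derivatives at 0 approach +w/eps and -w/eps (+5 and -5 here),
    # so the gradient jumps by 2 * w / eps across the origin.
    print(wing_grad(tiny), wing_grad(-tiny))
    # The loss itself is continuous where the two pieces meet at |x| = w.
    print(wing_loss(10.0 - 1e-9), wing_loss(10.0))
```

Near a correct prediction, the gradient does not shrink toward zero as an L2 gradient (2x) would. Instead, it flips between roughly -w/eps and +w/eps. The derivative also changes from w/(eps + w) to 1 at |x| = w. A smoothed variant would need a derivative that passes continuously through zero.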

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
