Head pose estimation with particle swarm optimization‐based contrastive learning and multimodal entangled GCN

Yuanfeng Lian; Yinliang Shi; Zhaonian Liu; Bin Jiang; Xingtao Li

doi:10.1049/ipr2.13142

IET Image Processing (Sep 2024)

Affiliations

Yuanfeng Lian: College of Artificial Intelligence, China University of Petroleum, Beijing, China
Yinliang Shi: College of Artificial Intelligence, China University of Petroleum, Beijing, China
Zhaonian Liu: Research Institute Ltd., China National Offshore Oil Corporation, Beijing, China
Bin Jiang: Research Institute Ltd., China National Offshore Oil Corporation, Beijing, China
Xingtao Li: China National Oil and Gas Exploration and Development Co., China National Petroleum Corporation, Beijing, China

DOI: https://doi.org/10.1049/ipr2.13142
Journal volume & issue: Vol. 18, no. 11
pp. 2899 – 2917

Abstract

Head pose estimation is an especially challenging task because of the complex nonlinear mapping from the 2D feature space to the 3D pose space. To address this issue, this paper presents a novel and efficient head pose estimation framework based on particle swarm optimized contrastive learning and a multimodal entangled graph convolution network. First, a new network, the region and difference‐aware feature pyramid network (RD‐FPN), is proposed for 2D keypoint detection to alleviate background interference and enhance feature expressiveness. Then, particle swarm optimized contrastive learning is constructed to alternately match 2D and 3D keypoints; it takes the multimodal keypoint matching accuracy as the optimization objective and treats the similarity of cross‐modal positive and negative sample pairs from contrastive learning as a local contrastive constraint. Finally, a multimodal entangled graph convolution network is designed to strengthen the modeling of geometric relationships between keypoints and head pose angles based on second‐order bilinear attention, in which point‐edge attention is introduced to improve the representation of geometric features between multimodal keypoints. Compared with other methods, the average error of our method is reduced by 8.23%, indicating the accuracy, generalization, and efficiency of our method on the 300W‐LP, AFLW2000, and BIWI datasets.
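
As a rough illustration of the particle swarm step mentioned in the abstract, the sketch below runs a textbook particle swarm optimizer on a toy 2D/3D keypoint matching error. It is not the authors' method. The function names (`pso_minimize`, `matching_error`), the scale-plus-shift projection model, the hyperparameters, and the synthetic keypoints are all assumptions made for this example, and the contrastive constraint described in the abstract is not modeled.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Textbook global-best PSO (illustrative hyperparameters)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Synthetic 3D keypoints (hypothetical data, not from any dataset).
pts3d = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.5), (0.0, 1.0, 0.2),
         (1.0, 1.0, 0.8), (0.5, 0.2, 0.0)]
true_params = (2.0, 1.0, -0.5)  # scale, x shift, y shift (assumed model)
pts2d = [(true_params[0] * x + true_params[1],
          true_params[0] * y + true_params[2]) for x, y, _ in pts3d]


def matching_error(params):
    """Mean squared distance between 2D keypoints and projected 3D keypoints."""
    s, tx, ty = params
    total = 0.0
    for (x, y, _), (u, v) in zip(pts3d, pts2d):
        total += (s * x + tx - u) ** 2 + (s * y + ty - v) ** 2
    return total / len(pts3d)


best, err = pso_minimize(matching_error,
                         bounds=[(0.1, 5.0), (-3.0, 3.0), (-3.0, 3.0)])
print("estimated (scale, tx, ty):", best, "matching error:", err)
```

In this toy setting the swarm searches only three alignment parameters. In a fuller pipeline of the kind the abstract describes, the objective would presumably also include a contrastive similarity term over cross‐modal sample pairs.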

Published in IET Image Processing

ISSN: 1751-9659 (Print); 1751-9667 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Photography; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519667
