IEEE Access (Jan 2019)

Spatial-Transformed Regional Quality Estimation Network for Large-Variance Person Re-Identification

  • Ming Jiang,
  • Biao Leng,
  • Zhijun Meng,
  • Guanglu Song

DOI
https://doi.org/10.1109/ACCESS.2019.2935097
Journal volume & issue
Vol. 7
pp. 118101–118111

Abstract

Video-based person re-identification aims to match a query video against a large video gallery. The main obstacles to this task are image misalignment and partial noise caused by detection errors, occlusion, blur, and illumination changes. Misalignment between the frames of a video, caused by excessive background or missing body parts, can seriously hamper pedestrian matching, and partial noise further degrades matching performance. Different spatial regions of a single frame vary in quality, and the quality of the same region also varies across the frames of a tracklet. A good way to address the problem is therefore to aggregate complementary information from all frames in a sequence, using better regions from other frames to compensate for an image region of poor quality. To achieve this, we propose a novel Spatial-transformed Regional Quality Estimation Network (SRQEN), in which a carefully designed spatial-transformed unit automatically learns alignment from the identification procedure, and a dedicated training mechanism enables the network to effectively extract complementary region-based information across frames. Visual examples indicate that pedestrians are better aligned by SRQEN and that the proposed method learns complementary information. Extensive experiments show that, compared with other feature extraction methods, SRQEN achieves comparable results of 93.5%, 79.8%, and 74.85% on PRID 2011, iLIDS-VID, and MARS, respectively.
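The aggregation idea described in the abstract can be illustrated with a short sketch: each frame's feature map is first warped by a learned affine transform (a standard spatial-transformer step), each spatial region receives a quality score, and the sequence-level feature is a quality-weighted sum over frames. The following is a minimal, hypothetical PyTorch sketch; the names SpatialTransformAlign and aggregate_by_region_quality, and the softmax weighting over frames, are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformAlign(nn.Module):
    # Hypothetical sketch (not the paper's code): predict an affine
    # transform per frame and warp the feature map so the pedestrian
    # is spatially aligned across the tracklet.
    def __init__(self, in_channels):
        super().__init__()
        self.loc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_channels, 6),
        )
        # Start from the identity transform so early training does no warping.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

    def forward(self, feat):                    # feat: (T, C, H, W) frames
        theta = self.loc(feat).view(-1, 2, 3)   # per-frame affine parameters
        grid = F.affine_grid(theta, feat.size(), align_corners=False)
        return F.grid_sample(feat, grid, align_corners=False)

def aggregate_by_region_quality(region_feats, region_scores):
    # region_feats:  (T, R, D)  T frames, R spatial regions, D-dim features
    # region_scores: (T, R)     predicted quality of each region per frame
    # Softmax over the frame axis: for every region, frames where that
    # region is high quality get large weight, compensating for frames
    # where the same region is occluded or blurred.
    weights = torch.softmax(region_scores, dim=0)
    return (weights.unsqueeze(-1) * region_feats).sum(dim=0)   # (R, D)

# Example usage with random conv features for an 8-frame tracklet:
frames = torch.randn(8, 256, 16, 8)
aligned = SpatialTransformAlign(256)(frames)

Normalizing the quality scores over the time dimension means a region that is occluded in one frame contributes little, while the same region from a cleaner frame dominates the aggregated feature, which is the compensation behavior the abstract describes.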

Keywords