IET Computer Vision (Sep 2020)
Spatial–temporal representation for video re‐identification via key images
Abstract
Video‐based person re‐identification aims to verify a pedestrian's identity from image sequences captured by cameras at different locations and times. Existing methods remain limited under occlusion and pose variation. To address these problems, this study proposes a new two‐stage framework that extracts a key‐image‐based fusion spatial–temporal feature (KISTF) of the pedestrian from the video. The image‐level features at all timestamps are aggregated into a sequence‐level representation of the video using a long short‐term memory (LSTM) network. Additionally, the concept of a key image is defined for the image sequence, and a frame‐level feature of the pedestrian is extracted from these key images. The proposed spatial–temporal feature, KISTF, is obtained by fusing the sequence‐level and frame‐level features, and is designed to represent pedestrians effectively on small video data sets. Experiments on the iLIDS‐VID and PRID2011 data sets demonstrate that the proposed approach outperforms state‐of‐the‐art video‐based re‐identification methods.
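The fusion scheme outlined in the abstract could be sketched as follows. This is a minimal NumPy illustration only: the feature dimensions, the choice of the final LSTM hidden state as the sequence‐level feature, the averaging of key‐image features, and the concatenation‐style fusion are all assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_aggregate(frames, Wx, Wh, b):
    """Aggregate per-frame (image-level) features into one sequence-level
    feature by running a single-layer LSTM and returning the final hidden
    state. frames: (T, feat_dim); Wx: (4d, feat_dim); Wh: (4d, d); b: (4d,)."""
    d = Wh.shape[1]
    h = np.zeros(d)
    c = np.zeros(d)
    for x in frames:
        z = Wx @ x + Wh @ h + b            # gate pre-activations, shape (4d,)
        i, f, g, o = np.split(z, 4)        # input, forget, cell, output gates
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)         # update cell state
        h = o * np.tanh(c)                 # update hidden state
    return h                               # sequence-level feature, shape (d,)

def kistf(frames, key_idx, Wx, Wh, b):
    """Hypothetical KISTF-style fusion: concatenate the LSTM sequence-level
    feature with a frame-level feature averaged over the key images."""
    seq_feat = lstm_aggregate(frames, Wx, Wh, b)
    frame_feat = frames[key_idx].mean(axis=0)  # frame-level feature from key images
    return np.concatenate([seq_feat, frame_feat])
```

In this sketch the fused descriptor simply has dimension `d + feat_dim`; any learned fusion layer or key‐image selection rule from the actual method would replace the averaging and concatenation shown here.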
Keywords