IEEE Access (Jan 2025)
Complementarity-Oriented Feature Fusion for Face-Phone Trajectory Matching
Abstract
CCTV cameras and telecom base stations act as sensors that collect massive amounts of face- and phone-related data. When used for person localization and trajectory characterization, the two modalities present quite different spatiotemporal characteristics: CCTV yields slowly sampled face-ID trajectories with a spatial resolution of approximately 20 meters, while telecom readings provide rapidly sampled phone-ID trajectories with a spatial uncertainty of a few hundred meters. Both the face and the phone trajectory can be viewed as observations of the real trajectory of a moving pedestrian, so identifying the correspondence between face and phone trajectories makes it possible to reconstruct the trajectories of moving persons. To this end, we propose a complementarity-oriented feature fusion mechanism (COFFM) that models and exploits the common embedding and complementarity of the two measurement modalities. Specifically, a Cycle Heterogeneous Trajectory Translation Network (CHTTN) is proposed to realize a Trajectory Feature Extractor (TFE) that captures the latent translation relationships between the face and phone modalities. The latent features from both translation directions are concatenated in the Feature Unifying (FU) module and fed into a binary face-phone trajectory pair matching discriminator (FPTPMD), which infers whether a face-phone trajectory pair corresponds to the same underlying motion trajectory. We evaluated our method on a large real-world face-phone trajectory dataset, achieving an accuracy of 97.1%, which exceeds comparable similarity-based methods. The developed principle and framework generalize well to other multi-modality trajectory matching tasks.
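The pipeline summarized in the abstract (extract latent features in both translation directions, concatenate them, then classify the pair) can be illustrated with a minimal sketch. All dimensions, function names, and the simple tanh/logistic layers below are illustrative assumptions standing in for the paper's trained networks, not the actual COFFM architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative feature dimensions (assumptions, not from the paper).
FACE_DIM, PHONE_DIM, LATENT_DIM = 8, 8, 4

# Stand-ins for the two translation directions of the CHTTN: each maps one
# modality's trajectory features into a shared latent space.
W_face2phone = rng.standard_normal((FACE_DIM, LATENT_DIM))
W_phone2face = rng.standard_normal((PHONE_DIM, LATENT_DIM))

def tfe(face_traj, phone_traj):
    """Trajectory Feature Extractor: latent codes from both directions."""
    z_fp = np.tanh(face_traj @ W_face2phone)   # face -> phone direction
    z_pf = np.tanh(phone_traj @ W_phone2face)  # phone -> face direction
    return z_fp, z_pf

def feature_unify(z_fp, z_pf):
    """FU module: concatenate the latent features from both directions."""
    return np.concatenate([z_fp, z_pf])

# Toy logistic layer standing in for the binary matching discriminator.
w_disc = rng.standard_normal(2 * LATENT_DIM)

def discriminator(fused):
    """Score in (0, 1): probability the pair shares one motion trajectory."""
    return 1.0 / (1.0 + np.exp(-(fused @ w_disc)))

face = rng.standard_normal(FACE_DIM)    # hypothetical face-ID trajectory features
phone = rng.standard_normal(PHONE_DIM)  # hypothetical phone-ID trajectory features
score = discriminator(feature_unify(*tfe(face, phone)))
```

A real implementation would learn the translation networks with a cycle-consistency objective and train the discriminator on labeled face-phone pairs; the sketch only shows how the three stages compose.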
Keywords