Sensors (Nov 2024)

A Tensor Space for Multi-View and Multitask Learning Based on Einstein and Hadamard Products: A Case Study on Vehicle Traffic Surveillance Systems

  • Fernando Hermosillo-Reynoso,
  • Deni Torres-Roman

DOI
https://doi.org/10.3390/s24237463
Journal volume & issue
Vol. 24, no. 23
p. 7463

Abstract

Since multi-view learning leverages complementary information from multiple feature sets to improve model performance, a tensor-based data fusion layer for neural networks, called Multi-View Data Tensor Fusion (MV-DTF), is used. It fuses M feature spaces X_1, ⋯, X_M, referred to as views, into a new latent tensor space S of order P and dimensions J_1 × ⋯ × J_P, defined in the space of affine mappings composed of a translation and a multilinear map T: X_1 × ⋯ × X_M → S, the latter represented as the Einstein product between a (P+M)-order tensor A and a rank-one tensor X = x^(1) ⊗ ⋯ ⊗ x^(M), where x^(m) ∈ X_m is the m-th view. Unfortunately, as the number of views increases, the number of parameters that determine the MV-DTF layer grows exponentially, and consequently so does its computational complexity. To address this issue, we enforce low-rank constraints on certain subtensors of A via canonical polyadic decomposition, from which M other tensors U^(1), ⋯, U^(M), here called Hadamard factor tensors, are obtained. We found that the Einstein product A ⊛_M X can be approximated by a sum of R Hadamard products of M Einstein products of the form U^(m) ⊛_1 x^(m), where R is related to the decomposition rank of the subtensors of A: the lower the rank, the more computationally efficient the approximation. To the best of our knowledge, this relationship has not previously been reported in the literature. As a case study, we present a multitask vehicle traffic surveillance model for occlusion detection and vehicle-size classification tasks, built with a low-rank MV-DTF layer, achieving up to 92.81% and 95.10% in the normalized weighted Matthews correlation coefficient on the individual tasks, improvements of 6% and 7%, respectively, over the single-task, single-view models.
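The identity at the heart of the abstract (the Einstein product A ⊛_M X rewritten as a sum of R Hadamard products of per-view products U^(m) ⊛_1 x^(m)) can be sketched numerically. The following is a minimal NumPy sketch, not code from the paper: the sizes, the assumed factor layout R × J_1 × ⋯ × J_P × I_m, and the random values are illustrative only; in the actual model the U^(m) would come from CP decompositions of subtensors of A (or be learned directly), so here only the computation pattern and parameter counts are demonstrated.

```python
import numpy as np

# Hypothetical small sizes (assumptions, not taken from the paper).
M, P, R = 2, 2, 4                      # number of views, latent order, CP rank
I = [32, 48]                           # view dimensions I_1, ..., I_M
J = [4, 4]                             # latent dimensions J_1, ..., J_P

rng = np.random.default_rng(0)
x = [rng.standard_normal(i) for i in I]   # one feature vector x^(m) per view

# Full MV-DTF weight tensor A of order P + M,
# with shape J_1 x ... x J_P x I_1 x ... x I_M.
A = rng.standard_normal((*J, *I))

# Einstein product A *_M X with the rank-one tensor X = x^(1) o ... o x^(M):
# contract the last M modes of A against the views, one at a time.
S_full = A
for xm in reversed(x):
    S_full = np.tensordot(S_full, xm, axes=([-1], [0]))  # ends at shape J_1 x ... x J_P

# Hadamard factor tensors U^(m), assumed layout R x J_1 x ... x J_P x I_m.
# Random values here; real factors would come from CP decomposition of A.
U = [rng.standard_normal((R, *J, i)) for i in I]

# Each mode-1 Einstein product U^(m) *_1 x^(m) removes the view mode,
# leaving an R x J_1 x ... x J_P tensor per view.
factors = [np.tensordot(Um, xm, axes=([-1], [0])) for Um, xm in zip(U, x)]

# Sum over r of the M-way Hadamard (elementwise) product of the factors.
S_approx = np.prod(np.stack(factors), axis=0).sum(axis=0)  # shape J_1 x ... x J_P

print(S_full.shape, S_approx.shape)  # (4, 4) (4, 4)
# Parameter counts: full tensor vs. factored form.
print(np.prod(J) * np.prod(I))       # 24576 = prod(J) * prod(I)
print(R * np.prod(J) * np.sum(I))    # 5120  = R * prod(J) * (I_1 + ... + I_M)
```

With these illustrative sizes, the full weight tensor carries prod(J)·prod(I) = 24,576 parameters, while the factored form needs only R·prod(J)·(I_1 + I_2) = 5,120, mirroring the complexity reduction the abstract claims for low rank values.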

Keywords