IEEE Access (Jan 2020)

A Multimodal Clustering Framework With Cross Reconstruction Autoencoders

  • Qianli Zhao,
  • Linlin Zong,
  • Xianchao Zhang,
  • Yuangang Li,
  • Xiaorui Tang

DOI
https://doi.org/10.1109/ACCESS.2020.3040644
Journal volume & issue
Vol. 8
pp. 218433–218443

Abstract

Multimodal clustering algorithms partition a multimodal dataset into disjoint clusters. Common feature extraction is a key component of multimodal clustering algorithms. Recently, deep neural networks have shown strong performance on latent feature extraction. However, existing works have not fully exploited deep neural networks to explore cross-modal distribution similarity. We present a deep multimodal clustering framework with cross reconstruction. During feature extraction, global cross reconstruction and local cross reconstruction are applied to enforce early fusion among the different modalities. Analysis shows that both cross reconstruction networks reduce the Wasserstein distance between the latent feature distributions, which indicates that the proposed framework ensures the distribution similarity of the common latent features. Experimental results on benchmark datasets demonstrate the superiority of the proposed framework over existing works.
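The abstract does not give architectural details, so the following PyTorch sketch only illustrates the general cross-reconstruction idea for two modalities: each modality is encoded into a latent code, and each decoder reconstructs its modality from both its own latent code (self-reconstruction) and the other modality's latent code (cross reconstruction), which pulls the two latent distributions toward each other. The class name, layer sizes, and the loss weight `lam` are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossReconAE(nn.Module):
    """Two-modality autoencoder with self- and cross-reconstruction (illustrative sketch)."""
    def __init__(self, dim_a, dim_b, latent_dim=64):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Linear(dim_a, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.enc_b = nn.Sequential(nn.Linear(dim_b, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.dec_a = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, dim_a))
        self.dec_b = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, dim_b))

    def forward(self, x_a, x_b):
        z_a, z_b = self.enc_a(x_a), self.enc_b(x_b)
        # Self-reconstruction: each modality decoded from its own latent code.
        rec_a, rec_b = self.dec_a(z_a), self.dec_b(z_b)
        # Cross reconstruction: each modality decoded from the other modality's latent code.
        cross_a, cross_b = self.dec_a(z_b), self.dec_b(z_a)
        return z_a, z_b, rec_a, rec_b, cross_a, cross_b

def reconstruction_loss(x_a, x_b, outputs, lam=1.0):
    """Sum of self-reconstruction and (weighted) cross-reconstruction errors."""
    z_a, z_b, rec_a, rec_b, cross_a, cross_b = outputs
    self_loss = F.mse_loss(rec_a, x_a) + F.mse_loss(rec_b, x_b)
    cross_loss = F.mse_loss(cross_a, x_a) + F.mse_loss(cross_b, x_b)
    return self_loss + lam * cross_loss
```

In a full pipeline of this kind, the common latent codes `z_a` and `z_b` would then be fused (for example, concatenated or averaged) and passed to a clustering step such as k-means; the paper itself should be consulted for the exact global/local cross-reconstruction losses and clustering objective.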

Keywords