IEEE Access (Jan 2022)
A Multimedia Graph Collaborative Filter
Abstract
Multimedia recommendation has long been an active research field within personalized recommendation, and the core of a multimedia recommender is learning multimodal representations of users and items. Traditional multimedia recommenders first extract multimodal (e.g., visual, acoustic, and textual) features with pre-trained networks and then combine these features with ID embeddings to enrich the representations of both users and items. However, these methods exploit only the multimodal information of the items a user has interacted with directly. Recent graph-based efforts instead exploit the high-order connectivity and message-passing mechanism of graphs: the multimodal information of multi-hop items is propagated along high-order paths in the graph and aggregated to enrich user and item representations, for example by adopting a parallel graph structure that models user preferences on each modality separately. However, users' preferences across the different modalities are not known in advance, and a bipartite graph built over all modalities jointly should differ from one built for a single modality. In this work, we devise a new multimedia recommendation framework, the Multimedia Graph Collaborative Filter (MGCF). In MGCF, a light graph framework with a fusion component is developed to integrate and distill useful multimodal information. Moreover, an attention mechanism is adopted to emphasize the most important aggregated information. Extensive experiments are conducted on two public datasets, Tiktok and MovieLens. The results show that our model outperforms several state-of-the-art multimedia recommenders. Further analysis demonstrates the importance of modeling multimodal information for learning better user and item representations. The code will be released soon.
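To make the propagate-fuse-attend idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes (i) that the "light graph framework" behaves like LightGCN-style propagation (neighbour sums weighted by 1/sqrt(|N_u||N_i|), with no weight matrices or nonlinearities); (ii) that fusion simply adds the mean of the per-modality item features, already projected to the ID-embedding size, to the item ID embedding; and (iii) that attention is a softmax over each node's per-layer embeddings, using the node's layer-0 embedding as the query. All function names (`fuse_item_features`, `propagate`, `attend`, `final_embedding`) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def fuse_item_features(id_emb: Vector, modal_feats: Dict[str, Vector]) -> Vector:
    """Hypothetical fusion: add the mean of the per-modality features
    (assumed already projected to the ID-embedding size) to the item ID embedding."""
    n = len(modal_feats)
    fused = list(id_emb)
    for feat in modal_feats.values():
        for d in range(len(fused)):
            fused[d] += feat[d] / n
    return fused

def propagate(user_emb: Dict[str, Vector], item_emb: Dict[str, Vector],
              edges: List[Tuple[str, str]], num_layers: int):
    """LightGCN-style propagation (assumed): each layer sums neighbour embeddings
    weighted by 1/sqrt(|N_u|*|N_i|), with no weight matrices or nonlinearities.
    Returns the list of per-layer user and item embedding tables."""
    deg_u: Dict[str, int] = {}
    deg_i: Dict[str, int] = {}
    for u, i in edges:
        deg_u[u] = deg_u.get(u, 0) + 1
        deg_i[i] = deg_i.get(i, 0) + 1
    dim = len(next(iter(user_emb.values())))
    layers_u, layers_i = [user_emb], [item_emb]
    for _ in range(num_layers):
        prev_u, prev_i = layers_u[-1], layers_i[-1]
        new_u = {u: [0.0] * dim for u in prev_u}
        new_i = {i: [0.0] * dim for i in prev_i}
        for u, i in edges:
            w = 1.0 / math.sqrt(deg_u[u] * deg_i[i])
            for d in range(dim):
                new_u[u][d] += w * prev_i[i][d]  # item -> user message
                new_i[i][d] += w * prev_u[u][d]  # user -> item message
        layers_u.append(new_u)
        layers_i.append(new_i)
    return layers_u, layers_i

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def attend(layer_embs: List[Vector], query: Vector) -> Vector:
    """Softmax attention over a node's per-layer embeddings, scored by the
    dot product with a query vector."""
    scores = [dot(e, query) for e in layer_embs]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [x / z for x in exps]
    return [sum(w * e[d] for w, e in zip(weights, layer_embs)) for d in range(len(query))]

def final_embedding(layers: List[Dict[str, Vector]], node: str) -> Vector:
    # Assumption: the node's own layer-0 embedding serves as the attention query.
    return attend([layer[node] for layer in layers], layers[0][node])

if __name__ == "__main__":
    users = {"u0": [0.1, 0.2], "u1": [0.3, -0.1]}
    item_ids = {"i0": [0.0, 0.1], "i1": [0.2, 0.0]}
    item_modal = {
        "i0": {"visual": [0.5, 0.1], "acoustic": [0.1, 0.1], "textual": [0.3, 0.4]},
        "i1": {"visual": [0.2, 0.6], "acoustic": [0.0, 0.2], "textual": [0.1, 0.1]},
    }
    items = {i: fuse_item_features(item_ids[i], item_modal[i]) for i in item_ids}
    edges = [("u0", "i0"), ("u0", "i1"), ("u1", "i1")]
    lu, li = propagate(users, items, edges, num_layers=2)
    for i in items:
        print(i, dot(final_embedding(lu, "u0"), final_embedding(li, i)))
```

The toy values above are illustrative only. Because the propagation step has no trainable weights, the only parameters in such a design would be the ID embeddings and whatever projects the raw modality features; this is the usual motivation for "light" graph recommenders, and it is assumed here rather than taken from the paper.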
Keywords