IEEE Access (Jan 2022)

A Multimedia Graph Collaborative Filter

  • Jingzhou Sun,
  • He Chang,
  • Wenxuan Zhao,
  • Youjian Yu,
  • Lifang Yang,
  • Xianglin Huang

DOI
https://doi.org/10.1109/ACCESS.2022.3174212
Journal volume & issue
Vol. 10
pp. 50892–50902

Abstract

Multimedia recommendation has long been an active research field within personalized recommendation, and the core of a multimedia recommender is learning multimodal representations for users and items. Traditional multimedia recommenders first extract multimodal features (such as visual, acoustic, and textual) with pre-trained networks, and then combine ID embeddings with these features to enrich the representations of both users and items. However, these methods exploit only the multimodal information of directly interacting items. Recent graph-based efforts exploit high-order connectivities and the message-passing mechanism of the graph: the multimodal information of high-hop items is propagated along high-order connectivities and aggregated to enrich user and item representations, for example by adopting a parallel graph structure to model user preferences on different modalities. However, users’ preferences for different modalities are unknown, and the bipartite graph structure for all modalities combined should differ from that for a single modality. In this work, we devise a new multimedia recommendation framework, the Multimedia Graph Collaborative Filter (MGCF). In MGCF, a light graph framework with a fusion component is developed to integrate and distill useful multimodal information. Moreover, an attention mechanism is adopted to weight the importance of aggregated information. Extensive experiments are conducted on two public datasets, TikTok and MovieLens. The results show that our model outperforms several state-of-the-art multimedia recommenders, and further analysis demonstrates the importance of modeling multimodal information for better user and item representations. The code will be open-sourced soon.
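The attention-weighted fusion of per-modality item features with an ID embedding, as described in the abstract, can be illustrated with a minimal NumPy sketch. This is a generic illustration of the general technique, not the paper's actual MGCF implementation; all function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_modalities(id_emb, modal_embs, query):
    """Illustrative multimodal fusion with attention (not the paper's API).

    id_emb:     (d,)  item ID embedding
    modal_embs: list of (d,) per-modality feature vectors
                (e.g. visual, acoustic, textual)
    query:      (d,)  vector that scores modality relevance
                (e.g. a user embedding)
    Returns the fused item representation and the attention weights.
    """
    M = np.stack(modal_embs)        # (num_modalities, d)
    scores = M @ query              # one relevance score per modality
    weights = softmax(scores)       # attention weights, sum to 1
    fused = weights @ M             # weighted combination, shape (d,)
    return id_emb + fused, weights
```

Under this sketch, modalities more aligned with the query receive larger weights, so the fused representation emphasizes the modalities most relevant to the user, which is the intuition behind attention-enhanced aggregation.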

Keywords