Applied Sciences (Oct 2024)

A Multimodal Recommender System Using Deep Learning Techniques Combining Review Texts and Images

  • Euiju Jeong,
  • Xinzhe Li,
  • Angela (Eunyoung) Kwon,
  • Seonu Park,
  • Qinglong Li,
  • Jaekyeong Kim

DOI
https://doi.org/10.3390/app14209206
Journal volume & issue
Vol. 14, no. 20
p. 9206

Abstract

Online reviews that consist of texts and images are an essential source of information for alleviating data sparsity in recommender system studies. Although texts and images convey different types of information, they can offer complementary or substitutive advantages. However, most studies fall short of exploiting the complementary effect between texts and images in recommender systems. Specifically, they have overlooked the informational value of images and proposed recommender systems based solely on textual representations. To address this research gap, this study proposes a novel recommender model that captures the dependence between texts and images. This study uses the RoBERTa and VGG-16 models to extract textual and visual information from online reviews and applies a co-attention mechanism to capture the complementarity between the two modalities. Extensive experiments were conducted on Amazon datasets, confirming the superiority of the proposed model. Our findings suggest that the complementarity of texts and images is crucial for enhancing recommendation accuracy and performance.
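The abstract describes fusing review-text features (from RoBERTa) and image features (from VGG-16) via a co-attention mechanism. The paper's exact architecture is not given here, so the following is only a minimal sketch of a generic bilinear co-attention step, with hypothetical feature dimensions (768 for RoBERTa token embeddings, 4096 for VGG-16 fc-layer features) and a randomly initialized affinity weight `W` standing in for learned parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(text_feats, img_feats, W):
    """Bilinear co-attention between two modalities.

    text_feats: (n_tokens, d_text)   e.g. RoBERTa token embeddings
    img_feats:  (n_regions, d_img)   e.g. VGG-16 region/fc features
    W:          (d_text, d_img)      learned affinity weights (random here)
    Returns attended cross-modal representations for each modality.
    """
    # Affinity matrix: similarity of every text token to every image region
    C = text_feats @ W @ img_feats.T            # (n_tokens, n_regions)
    # Each text token attends over image regions, and vice versa
    attn_text_to_img = softmax(C, axis=1)       # rows sum to 1
    attn_img_to_text = softmax(C.T, axis=1)     # rows sum to 1
    # Image-informed text context and text-informed image context
    text_ctx = attn_text_to_img @ img_feats     # (n_tokens, d_img)
    img_ctx = attn_img_to_text @ text_feats     # (n_regions, d_text)
    return text_ctx, img_ctx

rng = np.random.default_rng(0)
text_feats = rng.standard_normal((12, 768))     # 12 review tokens (assumed)
img_feats = rng.standard_normal((5, 4096))      # 5 image features (assumed)
W = rng.standard_normal((768, 4096)) * 0.01
text_ctx, img_ctx = co_attention(text_feats, img_feats, W)
```

In a full model, `text_ctx` and `img_ctx` would be pooled and fed to a prediction layer; here the point is only how each modality's representation is re-weighted by its affinity with the other, which is what lets the two modalities complement one another.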

Keywords