Applied Sciences (Oct 2024)

A Multimodal Recommender System Using Deep Learning Techniques Combining Review Texts and Images

  • Euiju Jeong,
  • Xinzhe Li,
  • Angela (Eunyoung) Kwon,
  • Seonu Park,
  • Qinglong Li,
  • Jaekyeong Kim

DOI
https://doi.org/10.3390/app14209206
Journal volume & issue
Vol. 14, no. 20
p. 9206

Abstract

Online reviews that consist of texts and images are an essential source of information for alleviating data sparsity in recommender system studies. Although texts and images convey different types of information, they can offer complementary or substitutive advantages. However, most studies fall short of exploiting the complementary effect between texts and images in recommender systems. Specifically, they have overlooked the informational value of images and proposed recommender systems based solely on textual representations. To address this research gap, this study proposes a novel recommender model that captures the dependence between texts and images. This study uses the RoBERTa and VGG-16 models to extract textual and visual information from online reviews and applies a co-attention mechanism to capture the complementarity between the two modalities. Extensive experiments were conducted on Amazon datasets, confirming the superiority of the proposed model. Our findings suggest that the complementarity of texts and images is crucial for enhancing recommendation accuracy and performance.
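The abstract describes fusing review-text features (from RoBERTa) and image features (from VGG-16) via a co-attention mechanism. The paper's exact architecture is not given here, so the following is only a minimal sketch of a generic bilinear co-attention step, with hypothetical feature dimensions (768 for RoBERTa token embeddings, 4096 for VGG-16 fc-layer features) and a randomly initialized affinity weight `W` standing in for learned parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(text_feats, img_feats, W):
    """Bilinear co-attention between two modalities.

    text_feats: (n_tokens, d_text)   e.g. RoBERTa token embeddings
    img_feats:  (n_regions, d_img)   e.g. VGG-16 region/fc features
    W:          (d_text, d_img)      learned affinity weights (random here)
    Returns attended cross-modal representations for each modality.
    """
    # Affinity matrix: similarity of every text token to every image region
    C = text_feats @ W @ img_feats.T            # (n_tokens, n_regions)
    # Each text token attends over image regions, and vice versa
    attn_text_to_img = softmax(C, axis=1)       # rows sum to 1
    attn_img_to_text = softmax(C.T, axis=1)     # rows sum to 1
    # Image-informed text context and text-informed image context
    text_ctx = attn_text_to_img @ img_feats     # (n_tokens, d_img)
    img_ctx = attn_img_to_text @ text_feats     # (n_regions, d_text)
    return text_ctx, img_ctx

rng = np.random.default_rng(0)
text_feats = rng.standard_normal((12, 768))     # 12 review tokens (assumed)
img_feats = rng.standard_normal((5, 4096))      # 5 image features (assumed)
W = rng.standard_normal((768, 4096)) * 0.01
text_ctx, img_ctx = co_attention(text_feats, img_feats, W)
```

In a full model, `text_ctx` and `img_ctx` would be pooled and fed to a prediction layer; here the point is only how each modality's representation is re-weighted by its affinity with the other, which is what lets the two modalities complement one another.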

Keywords