Applied Sciences (Mar 2024)

Enhancing Sequence Movie Recommendation System Using Deep Learning and KMeans

  • Sophort Siet,
  • Sony Peng,
  • Sadriddinov Ilkhomjon,
  • Misun Kang,
  • Doo-Soon Park

DOI
https://doi.org/10.3390/app14062505
Journal volume & issue
Vol. 14, no. 6
p. 2505

Abstract

The flood of available information makes it challenging for people to find and filter their favorite items. Recommendation systems (RSs) have emerged as a solution to this problem; however, traditional recommendation systems, including collaborative filtering and content-based filtering, face significant challenges such as data scalability, data scarcity, and the cold-start problem, all of which require advanced solutions. Therefore, we propose a ranking and enhancing sequence movie recommendation system that utilizes a combined deep learning model to resolve these issues. To mitigate these challenges, we design an RS model that utilizes user information (age, gender, occupation) to analyze new users and match them with others who have similar preferences. Initially, we construct sequences of user behavior to effectively predict the potential next target movie of users. We then incorporate user information and movie sequence embeddings as input features to reduce the dimensionality, before feeding them into a transformer architecture and multilayer perceptron (MLP). Our model integrates a transformer layer with positional encoding for user behavior sequences and multi-head attention mechanisms to enhance prediction accuracy. Furthermore, the system applies KMeans clustering to movie genre embeddings, grouping similar movies and integrating this clustering information with predicted ratings to ensure diversity in the personalized recommendations for target users. Evaluating our model on two MovieLens datasets (100K and 1M) demonstrated significant improvements, achieving RMSE, MAE, precision, recall, and F1 scores of 1.0756, 0.8741, 0.5516, 0.3260, and 0.4098 for the 100K dataset, and 0.9927, 0.8007, 0.5838, 0.4723, and 0.5222 for the 1M dataset, respectively.
This approach not only effectively mitigates cold-start and scalability issues but also surpasses baseline techniques in Top-N item recommendations, highlighting its efficacy in the contemporary environment of abundant data.
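
As a quick arithmetic check on the scores quoted above, the short sketch below recomputes F1 from the reported precision and recall. It assumes F1 is the standard harmonic mean of precision and recall; the abstract does not state the exact definition used, and the helper name f1_score is purely illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision, recall, and F1 exactly as quoted in the abstract.
reported = {
    "MovieLens 100K": {"precision": 0.5516, "recall": 0.3260, "f1": 0.4098},
    "MovieLens 1M": {"precision": 0.5838, "recall": 0.4723, "f1": 0.5222},
}

for name, m in reported.items():
    computed = f1_score(m["precision"], m["recall"])
    # Compare the recomputed value with the reported one at four decimals.
    print(f"{name}: computed F1 = {computed:.4f}, reported F1 = {m['f1']:.4f}")
```

Under that assumption, the recomputed values agree with the reported F1 scores to four decimal places for both datasets.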

Keywords