Jisuanji kexue yu tansuo (May 2022)
Research on User Similarity Calculation of Collaborative Filtering for Sparse Data
Abstract
User-based collaborative filtering recommends items to a target user based on the preferences of that user's nearest neighbors, so the calculation of user similarity is critical. Traditional rating similarity relies on the scores of items that both users have rated. As the user-item rating matrix becomes sparser, it becomes difficult for traditional rating similarity to measure the similarity between users accurately, and hence to select reliable nearest neighbors for the target user, which degrades the final recommendation performance. Structural similarity is another commonly used similarity measure in recommendation tasks, and is mostly measured by the proportion of items that two users have both rated. This kind of method is easy to compute and less affected by data sparsity. However, its outputs tend to be close to one another, so different user pairs cannot be clearly distinguished. To address the difficulty that data sparsity causes for similarity calculation in collaborative filtering, this paper proposes a sparse cosine similarity. First, it formulates a new structural similarity, the sparse set similarity, to divide users into two groups: high-correlation users and low-correlation users. Then, it develops different rating similarity calculation methods for the different groups of users, which eliminates the misleading results produced by traditional rating similarity when the data is sparse. Finally, the sparse cosine similarity is constructed by combining the proposed rating similarity and structural similarity.
Experimental results show that, compared with seven similarity calculation methods, the proposed sparse cosine similarity yields more accurate user similarity and improves recommendation performance, overcoming two limitations: traditional rating methods are severely affected by data sparsity, and the results produced by structural methods are not sufficiently distinct.
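To make the two baseline notions in the abstract concrete, the following is a minimal sketch of a traditional rating similarity (cosine over co-rated items) and a structural similarity (here the Jaccard proportion of co-rated items). It is not the sparse cosine similarity or the sparse set similarity proposed in the paper, whose formulas are not given in the abstract. The function names, the dictionary representation of ratings, and the example users are assumptions made purely for illustration.

```python
from math import sqrt


def corated_cosine(ratings_u, ratings_v):
    """Cosine similarity restricted to items both users rated (illustrative baseline)."""
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(ratings_u, ratings_v):
    """Structural similarity: share of co-rated items among all items either user rated."""
    union = ratings_u.keys() | ratings_v.keys()
    if not union:
        return 0.0
    return len(ratings_u.keys() & ratings_v.keys()) / len(union)


# Hypothetical sparse example: the two users share a single item, rated 5 and 1.
alice = {"a": 5, "b": 3, "c": 4}
bob = {"a": 1, "d": 5, "e": 2}

rating_sim = corated_cosine(alice, bob)
structural_sim = jaccard(alice, bob)
print(rating_sim, structural_sim)
```

With only one co-rated item, the co-rated cosine treats the two users as perfectly similar even though their ratings of that item disagree, which is the kind of misleading output under sparsity that the abstract describes. The Jaccard value depends only on item overlap and ignores the rating values, which is consistent with the abstract's remark that structural measures separate user pairs poorly.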
Keywords