IEEE Access (Jan 2020)
Sentiment Enhanced Multi-Modal Hashtag Recommendation for Micro-Videos
Abstract
Recommending hashtags for micro-videos is a challenging task for two reasons: 1) a micro-video is a unity of multiple modalities, including the visual, acoustic, and textual modalities; therefore, how to effectively extract features from these modalities and exploit them to represent the micro-video is of great significance; 2) micro-videos usually convey moods and feelings, which may provide crucial cues for recommending proper hashtags. However, most existing works have not considered the sentiment of media data for hashtag recommendation. In this paper, the senTiment enhanced multi-mOdal Attentive haShtag recommendaTion (TOAST) model is proposed for micro-video hashtag recommendation. Different from previous hashtag recommendation models, which merely consider content features, TOAST further incorporates the sentiment features of each modality to improve the recommendation performance on sentiment hashtags (e.g., #funny, #sad). Specifically, the multi-modal content features and the multi-modal sentiment features are modeled by a self-attention-based content common space learning branch and a sentiment common space learning branch, respectively. Furthermore, the varying importance of the multi-modal sentiment and content features is dynamically captured via an attention neural network, according to their consistency with the hashtag semantic embedding. Extensive experiments on a real-world dataset demonstrate the effectiveness of the proposed method compared with baseline methods. Meanwhile, the findings from the experiments may provide new insights for the future development of micro-video hashtag recommendation.
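To make the described architecture concrete, the following is a minimal PyTorch sketch of a TOAST-style model: a self-attention content branch over the three modality embeddings, a sentiment branch, and a hashtag-conditioned attention that weights content and sentiment features by their consistency with the hashtag semantic embedding. All dimensions, module names, and the dot-product consistency scoring are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToastSketch(nn.Module):
    """Illustrative TOAST-style model; details are assumptions for exposition."""

    def __init__(self, dims=(2048, 128, 300), d=128, n_hashtags=1000):
        super().__init__()
        # dims: raw feature sizes of the (visual, acoustic, textual) modalities.
        # Content branch: project each modality into a shared content space,
        # then let the modality tokens attend to one another (self-attention).
        self.content_proj = nn.ModuleList(nn.Linear(di, d) for di in dims)
        self.self_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        # Sentiment branch: project per-modality sentiment features
        # (assumed to come from external extractors) into a shared space.
        self.sent_proj = nn.ModuleList(nn.Linear(di, d) for di in dims)
        # Hashtag semantic embeddings.
        self.hashtag_emb = nn.Embedding(n_hashtags, d)

    def forward(self, content_feats, sent_feats, hashtag_ids):
        # content_feats / sent_feats: lists of (B, dim_i) tensors, one per modality.
        c = torch.stack(
            [p(x) for p, x in zip(self.content_proj, content_feats)], dim=1
        )
        c, _ = self.self_attn(c, c, c)            # (B, 3, d) content tokens
        s = torch.stack(
            [p(x) for p, x in zip(self.sent_proj, sent_feats)], dim=1
        )                                          # (B, 3, d) sentiment tokens
        feats = torch.cat([c, s], dim=1)           # (B, 6, d) content + sentiment

        h = self.hashtag_emb(hashtag_ids)          # (B, d) candidate hashtag
        # Attention weights from the consistency between each feature and the
        # hashtag embedding (scaled dot product, an assumed choice).
        alpha = F.softmax(
            (feats * h.unsqueeze(1)).sum(-1) / feats.size(-1) ** 0.5, dim=1
        )
        v = (alpha.unsqueeze(-1) * feats).sum(1)   # (B, d) fused micro-video vector
        return (v * h).sum(-1)                     # relevance score per pair


# Usage on random features; scores could feed a ranking or BCE loss.
model = ToastSketch()
vis, aco, txt = torch.randn(4, 2048), torch.randn(4, 128), torch.randn(4, 300)
scores = model([vis, aco, txt], [vis, aco, txt], torch.tensor([1, 2, 3, 0]))
```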
Keywords