Journal of Big Data (Sep 2023)

Cross-modality representation learning from transformer for hashtag prediction

  • Mian Muhammad Yasir Khalil,
  • Qingxian Wang,
  • Bo Chen,
  • Weidong Wang

DOI
https://doi.org/10.1186/s40537-023-00824-2
Journal volume & issue
Vol. 10, no. 1
pp. 1 – 18

Abstract


Hashtags are keywords that describe the theme of social media content and have become very popular in influencer marketing and trending topics. In recent years, hashtag prediction has become a hot topic in AI research, helping users through automatic hashtag recommendation by capturing the theme of a post. Most previous work focused only on textual information, yet many microblog posts contain not only text but also corresponding images. This work exploits both the image and text features of a microblog post. Inspired by the self-attention mechanism of the transformer in natural language processing, visual-linguistic pre-trained models with transfer learning also outperform earlier approaches on many downstream tasks that require image and text inputs. However, most existing models for multimodal hashtag recommendation rely on the traditional co-attention mechanism. This paper investigates the cross-modality transformer LXMERT for multimodal hashtag prediction and develops LXMERT4Hashtag, a cross-modality representation learning transformer model for hashtag prediction. It is a large-scale transformer model that consists of three encoders: a language encoder, an object encoder, and a cross-modality encoder. We evaluate the presented approach on the InstaNY100K dataset. Experimental results show that our model is competitive and achieves impressive results compared to the existing state-of-the-art baseline model: precision of 50.5% vs. 46.12%, recall of 44.02% vs. 38.93%, and F1-score of 47.04% vs. 42.22%.
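
To make the three-encoder setup described in the abstract concrete, the sketch below shows how a joint image-text representation can be obtained from the publicly available LXMERT implementation in Hugging Face transformers (LxmertTokenizer/LxmertModel) and fed to a multi-label hashtag head. This is a minimal illustration under stated assumptions, not the authors' LXMERT4Hashtag code: the placeholder region features, the detector step they stand in for, and the classifier head and hashtag vocabulary size are all hypothetical.

```python
# Hedged sketch: joint image-text encoding with LXMERT via Hugging Face transformers.
# NOTE: region features would normally come from an object detector (e.g. Faster R-CNN);
# random tensors stand in here. The hashtag classification head is a hypothetical
# add-on for illustration, not the paper's LXMERT4Hashtag implementation.
import torch
from transformers import LxmertTokenizer, LxmertModel

tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertModel.from_pretrained("unc-nlp/lxmert-base-uncased")

caption = "sunset over the brooklyn bridge"        # microblog text
inputs = tokenizer(caption, return_tensors="pt")   # language-encoder input

num_boxes = 36                                     # object regions per image
visual_feats = torch.randn(1, num_boxes, 2048)     # placeholder RoI features (object encoder)
visual_pos = torch.rand(1, num_boxes, 4)           # normalized bounding-box coordinates

outputs = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    visual_feats=visual_feats,
    visual_pos=visual_pos,
)

# Cross-modality [CLS] representation; a multi-label hashtag classifier could sit on top.
pooled = outputs.pooled_output                     # shape: (1, 768)
num_hashtags = 1000                                # hypothetical hashtag vocabulary size
classifier = torch.nn.Linear(768, num_hashtags)
hashtag_logits = classifier(pooled)
```

In such a setup the top-k logits would be taken as recommended hashtags and evaluated with precision, recall, and F1-score, as reported in the abstract.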

Keywords