IEEE Access (Jan 2020)

Forwarding Behavior Prediction Based on Microblog User Features

  • Chunlong Fu,
  • Yajun Du,
  • Binyan Lyu,
  • Qiaoyu Zhou,
  • Ruilin Hu,
  • Peng Jia,
  • Yujian Zhou

DOI
https://doi.org/10.1109/ACCESS.2020.2995411
Journal volume & issue
Vol. 8
pp. 95170 – 95187

Abstract

In microblog networks, when a user publishes a post, other users may forward it, and this forwarding drives the rapid dissemination and diffusion of information. In this paper, we propose a comprehensive and novel approach to predicting user forwarding behavior. First, we build the feature sets that affect microblog forwarding, such as interest topic, geographic location, user aggregation coefficient, and neighborhood overlap. These features are classified into four categories: user characteristics, microblog features, network structure features, and interactive behavior characteristics. Second, we establish a feature selection model based on filtering and wrapping for predicting the forwarding behavior of users. The model has three components: (1) ANOVA (analysis of variance): the values of each feature are examined by variance analysis; if a feature's variance is small, the feature provides little information. (2) $\chi^{2}$ test and point-biserial correlation analysis: these filter discrete and continuous features, respectively. (3) Wrapper analysis: to handle strong correlations between features, we use the LVW (Las Vegas Wrapper) algorithm to analyze the above feature sets and obtain the optimal feature combination. Finally, we propose a forwarding prediction model based on the AdaBoost (adaptive boosting) algorithm. Experimental results demonstrate that the model achieves higher precision and F1 score than Naive Bayes, Logistic Regression, Random Forest, and SVM (support vector machine), with an F1 score of 0.885. The proposed AdaBoost prediction model also achieves good recall and F1 scores across different topics. In addition, comparison experiments using different feature sets show that the optimal features selected in this paper are effective.
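
The wrapper step mentioned in the abstract can be illustrated with a minimal sketch of a Las Vegas Wrapper search. This is not the authors' implementation: the function name `las_vegas_wrapper`, the stopping parameter `max_stale`, and the scoring function `toy_error` are hypothetical. In particular, `toy_error` is a made-up stand-in for the cross-validated error of a real classifier, and its choice of which features are "useful" is an assumption made purely for the demonstration.

```python
import random

# Feature names taken from the abstract; the grouping below is illustrative.
FEATURES = [
    "interest_topic",
    "geographic_location",
    "aggregation_coefficient",
    "neighborhood_overlap",
]

# ASSUMPTION: which features help is invented for this toy example only.
USEFUL = {"interest_topic", "neighborhood_overlap"}


def toy_error(subset):
    """Hypothetical stand-in for a classifier's cross-validated error."""
    chosen = set(subset)
    # Each useful feature lowers the error; each extra feature adds noise.
    return 0.5 - 0.2 * len(chosen & USEFUL) + 0.01 * len(chosen - USEFUL)


def las_vegas_wrapper(features, evaluate, max_stale=200, seed=0):
    """Randomly sample feature subsets and keep the best one seen so far.

    A candidate replaces the current best if its error is lower, or if the
    error is equal and it uses fewer features. The search stops after
    `max_stale` consecutive candidates fail to improve on the best subset.
    """
    rng = random.Random(seed)
    best_subset = list(features)
    best_error = evaluate(best_subset)
    stale = 0  # consecutive draws without improvement
    while stale < max_stale:
        size = rng.randint(1, len(features))
        candidate = sorted(rng.sample(features, size))
        error = evaluate(candidate)
        improved = error < best_error or (
            error == best_error and len(candidate) < len(best_subset)
        )
        if improved:
            best_subset, best_error, stale = candidate, error, 0
        else:
            stale += 1
    return best_subset, best_error


best, err = las_vegas_wrapper(FEATURES, toy_error)
print(best, err)
```

Because the search is randomized, the result depends on `seed` and `max_stale`; in practice the evaluation callback would train and validate the downstream classifier on each candidate subset, which makes every draw considerably more expensive than in this toy setting.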

Keywords