Array (Dec 2021)

DIFCURV: A unified framework for Diffusion Curve Fitting and prediction in Online Social Networks

  • Charalambos Christoforou,
  • Kalliopi Malerou,
  • Nikolaos L. Tsitsas,
  • Athena Vakali

Journal volume & issue
Vol. 12
p. 100100

Abstract

Information propagation analysis in Online Social Networks (OSNs) attracts great interest due to its impact across different business sectors. Among the wide range of OSNs, the micro-blogging service Twitter stands out for several reasons, such as the platform’s popularity and the ease of access to its data. Activities such as retweeting constitute fundamental mechanisms for information diffusion on Twitter. The form that such cascading activities (e.g., retweets) take over time plays a crucial role in identifying the influence and the lifetime of an information source (e.g., a tweet). In this paper, we propose an integrated framework with a dual functionality: (i) to examine the effectiveness of robust mathematical models in fitting the curves produced by the number of retweets over a period of time, and (ii) to employ these mathematical models to predict the behavior of the examined retweets using only a small fraction of them as input data. The examined models stem from simple mathematical functions or are based on the Diffusion of Innovation theory, an important theory for examining spreading phenomena that has not yet been used thoroughly in OSN diffusion prediction. The proposed framework, called DIFCURV, encapsulates proper data preprocessing procedures as well as explanatory analysis augmented with visualization and statistical analysis. In the curve-fitting part of the DIFCURV framework, an optimization method that depends on the curve’s slope is applied to the tweet stories whose error exceeds a defined threshold, resulting in a significant reduction of the error. To predict the temporal evolution of retweets, the non-linear least squares curve-fitting method was selected after detailed exploration and examination of different methods. Furthermore, three methods are proposed for approximating the growth-rate variable, and Mean Growth Rate is showcased as the most suitable approach for the OSN domain. The effectiveness of the DIFCURV framework is demonstrated through the results of several numerical experiments on a large dataset comprising over two million retweets in total across all examined stories. DIFCURV prediction results were also compared with similar existing works, and the comparisons showed that the proposed framework can predict information diffusion with higher accuracy and efficiency.
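
The fit-then-predict idea described in the abstract can be illustrated with a minimal sketch. It is not the DIFCURV implementation. The abstract does not specify the models used, so a three-parameter logistic S-curve is assumed here as a stand-in. The function names, the synthetic data, and the choice of fitting on the first half of the observations are all hypothetical. The fit uses non-linear least squares via SciPy's `curve_fit`.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(t, K, r, t0):
    """Assumed S-curve for cumulative retweets: K / (1 + exp(-r * (t - t0)))."""
    return K / (1.0 + np.exp(-r * (t - t0)))


def fit_and_predict(t, y, fraction=0.5):
    """Fit the curve on the first `fraction` of points, then predict all of `t`.

    `fraction` is an illustrative parameter, not a value taken from the paper.
    """
    n = max(4, int(len(t) * fraction))
    t_fit, y_fit = t[:n], y[:n]
    # Rough, data-driven starting guess for (K, r, t0); purely heuristic.
    p0 = [2.0 * y_fit[-1], 0.1, t_fit[-1]]
    params, _ = curve_fit(logistic, t_fit, y_fit, p0=p0, maxfev=10000)
    return params, logistic(t, *params)


# Synthetic, noiseless cumulative retweet counts (hypothetical data, in hours).
t = np.arange(0.0, 48.0, 1.0)
y = logistic(t, 1000.0, 0.3, 15.0)

params, y_hat = fit_and_predict(t, y, fraction=0.5)

# Relative error of the predicted final retweet count against the full curve.
rel_err = abs(y_hat[-1] - y[-1]) / y[-1]
print("fitted (K, r, t0):", params)
print("relative error at final time:", rel_err)
```

On real retweet series the data are noisy and the early portion may not yet show the inflection of the curve, so the starting guess and the fraction of data used would need more care than in this sketch.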

Keywords