IEEE Access (Jan 2022)

Time Series Impact Through Topic Modeling

  • Julian Cendrero,
  • Julio Gonzalo,
  • Marcos Galletero,
  • Ivar Zapata

DOI
https://doi.org/10.1109/ACCESS.2022.3202960
Journal volume & issue
Vol. 10
pp. 97327 – 97347

Abstract

A time series of numerical data and a sequence of time-ordered documents are often correlated. This paper aims to model the impact that the underlying themes discussed in the text data have on the time series. To do so, we introduce an original topic model, Time Series Impact Through Topic Modeling (TSITM), that includes contextual data by coupling Latent Dirichlet Allocation (LDA) with linear regression, using an elastic net prior to set the impact of uncorrelated topics to zero. The resulting topics act as explanatory variables for the regression of the numerical time series, which allows us to understand the time series movements based on the events described in the text data. We have tested our model on two datasets: first, we used political news to explain the US president’s disapproval ratings; then, we considered a corpus of economic news to explain the financial returns of four different multinational corporations. Our experiments show that an appropriate selection of hyperparameters (via repeated random subsampling validation and Bayesian optimization) leads to significant correlations: both an intrinsic baseline and state-of-the-art methods were significantly outperformed by TSITM in MSE, MAE and out-of-sample $R^{2}$, according to our hypothesis tests. We believe that this framework can be useful in the context of reputational risk management.
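
The role of the elastic net in the regression step can be illustrated with a minimal sketch. This is not the TSITM model itself, which couples LDA and the regression in a single model; it only shows how an elastic net penalty shrinks regression weights on topic proportions and can set some of them exactly to zero. Everything here is an assumption made for illustration: the per-day topic proportions `theta` are synthetic draws rather than LDA output, the series is generated from hypothetical weights `true_w`, and the names `soft_threshold` and `elastic_net` as well as the coordinate-descent solver are not taken from the paper.

```python
import random

def soft_threshold(z, g):
    """Shrink z toward zero by g; values within [-g, g] become exactly 0."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def elastic_net(X, y, alpha, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)*||y - b - Xw||^2 + alpha*l1_ratio*||w||_1
                            + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    Returns (intercept b, weights w)."""
    n, k = len(X), len(X[0])
    # Centre columns and target so the intercept can be recovered afterwards.
    col_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    cols = [[row[j] - col_mean[j] for row in X] for j in range(k)]
    resid = [v - y_mean for v in y]
    w = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            col = cols[j]
            sq = sum(c * c for c in col) / n
            if sq == 0.0:
                continue  # constant column carries no information
            # Covariance of column j with the residual that excludes topic j.
            rho = sum(c * r for c, r in zip(col, resid)) / n + sq * w[j]
            new = soft_threshold(rho, alpha * l1_ratio) / (sq + alpha * (1 - l1_ratio))
            delta = new - w[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                w[j] = new
    intercept = y_mean - sum(wj * m for wj, m in zip(w, col_mean))
    return intercept, w

# Synthetic stand-in for per-day topic proportions (each row sums to 1).
random.seed(0)
n_days, n_topics = 200, 4
theta = []
for _ in range(n_days):
    g = [random.gammavariate(1.0, 1.0) for _ in range(n_topics)]
    total = sum(g)
    theta.append([v / total for v in g])

# Hypothetical ground truth: only the first two topics move the series.
true_w = [2.0, -1.5, 0.0, 0.0]
series = [sum(t * b for t, b in zip(row, true_w)) + random.gauss(0.0, 0.05)
          for row in theta]

intercept, weights = elastic_net(theta, series, alpha=0.01)
_, weights_strong = elastic_net(theta, series, alpha=1e6)  # very strong penalty

print("weights (alpha=0.01):", [round(w, 3) for w in weights])
print("weights (alpha=1e6): ", weights_strong)
```

With a moderate penalty, the weights of the two influential topics keep their signs but are shrunk toward zero, while a sufficiently strong penalty removes every topic from the regression. In the paper, the penalty strength is one of the hyperparameters selected by validation and Bayesian optimization.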

Keywords