Explainable Machine Learning Exploiting News and Domain-Specific Lexicon for Stock Market Forecasting

Salvatore M. Carta; Sergio Consoli; Luca Piras; Alessandro Sebastian Podda; Diego Reforgiato Recupero

doi:10.1109/ACCESS.2021.3059960

IEEE Access (Jan 2021)


Affiliations

Salvatore M. Carta: Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy
Sergio Consoli: European Commission, Joint Research Centre (DG-JRC), Ispra, Italy
Luca Piras: Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy
Alessandro Sebastian Podda: Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy
Diego Reforgiato Recupero: Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy

Journal volume & issue: Vol. 9
pp. 30193 – 30205

Abstract


In this manuscript, we propose a Machine Learning approach to a binary classification problem: predicting the magnitude (high or low) of future stock price variations for individual companies of the S&P 500 index. Sets of lexicons are generated from news articles published worldwide, with the goal of identifying the words that have the greatest impact on the market within a specific time interval and business sector. A feature engineering process is then applied to the generated lexicons, and the resulting features are fed to a Decision Tree classifier. The predicted label (high or low) indicates whether the company's stock price variation on the next day is above or below a given threshold. A performance evaluation carried out with a walk-forward strategy, against a set of strong baselines, shows that our approach clearly outperforms the competitors. Moreover, the devised Artificial Intelligence (AI) approach is explainable: we analyze the white-box model underlying the classifier and provide a set of explanations of the obtained results.
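Two steps named in the abstract, the threshold-based high/low labelling and the walk-forward evaluation, can be illustrated with a minimal Python sketch. This is not the authors' implementation. Several choices are assumptions made for illustration only: reading "magnitude" as the absolute relative close-to-close change from one day to the next, using a fixed-size (rolling) training window, and the hypothetical names `label_variations` and `walk_forward_splits`. The lexicon-based features and the Decision Tree itself are not reproduced here.

```python
from typing import Iterator, List, Sequence, Tuple


def label_variations(prices: Sequence[float], threshold: float) -> List[int]:
    """Label day t as 1 ("high") if the absolute relative change from day t
    to day t+1 exceeds `threshold`, else 0 ("low").

    Assumption (hypothetical): "magnitude" means the absolute close-to-close
    percentage change; the threshold value is a free parameter.
    """
    labels = []
    for today, tomorrow in zip(prices, prices[1:]):
        variation = abs(tomorrow - today) / today
        labels.append(1 if variation > threshold else 0)
    return labels


def walk_forward_splits(
    n_samples: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges that move forward in time.

    Each test window immediately follows its training window, so the model
    is never trained on data from after the period it is evaluated on.
    Assumption (hypothetical): a rolling, fixed-size training window.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test window
```

For example, `label_variations([100.0, 102.0, 102.5, 100.0, 100.2], 0.01)` returns `[1, 0, 1, 0]`. `walk_forward_splits(10, 4, 2)` yields three train/test pairs, each with its test window lying strictly after its training window. An expanding-window variant would instead keep the start of every training range fixed at index 0.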

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
