International Journal of Information Management Data Insights (Jun 2025)
Sentiment works in small-cap stocks: Japanese stock’s sentiment with language models
Abstract
We calculate sentiment from the Japanese Company Handbook, which provides a compact overview of Japanese companies’ business conditions and financial data, using multiple methods, including large language models. Language models such as BERT and ChatGPT are advancing the application of natural language processing (NLP) in finance. We construct multiple sentiment calculation methods based on sentiment dictionaries, models trained on existing sentiment datasets, ChatGPT, and GPT-4. Our analysis shows that stocks with higher sentiment scores tend to have higher excess returns, while those with lower scores tend to have lower excess returns. This pattern is particularly pronounced in small-cap stocks. Comparing the models, the model trained on the existing sentiment dataset shows higher returns at high sentiment, while ChatGPT shows lower returns at low sentiment. The DeBERTaV2 model trained on Economy Watchers Survey data performs best in terms of returns in the highest sentiment quantile.
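The quantile comparison described in the abstract can be illustrated with a minimal sketch. It assumes each stock is reduced to one (sentiment score, excess return) pair, that quantiles are near-equal-sized buckets ordered by sentiment, and that the bucket statistic is a simple mean. The function name `quantile_returns`, the bucket rule, and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def quantile_returns(records, n_quantiles=5):
    """Mean excess return per sentiment quantile, lowest sentiment first.

    records: list of (sentiment_score, excess_return) pairs.
    Equal-sized buckets are an assumption for illustration, not the
    paper's documented procedure.
    """
    if n_quantiles < 1 or len(records) < n_quantiles:
        raise ValueError("need at least one record per quantile")
    # Rank stocks from lowest to highest sentiment score.
    ranked = sorted(records, key=lambda r: r[0])
    n = len(ranked)
    means = []
    for q in range(n_quantiles):
        # Integer bucket boundaries keep bucket sizes within one of each other.
        lo = q * n // n_quantiles
        hi = (q + 1) * n // n_quantiles
        means.append(mean(r[1] for r in ranked[lo:hi]))
    return means


# Made-up illustrative data, not results from the paper.
toy = [(-0.9, -0.02), (-0.5, -0.01), (-0.1, 0.00),
       (0.2, 0.01), (0.6, 0.015), (0.9, 0.03)]
by_quantile = quantile_returns(toy, n_quantiles=3)
# Gap between the highest and lowest sentiment buckets.
high_minus_low = by_quantile[-1] - by_quantile[0]
```

Under these assumptions, a positive `high_minus_low` would correspond to the reported tendency for high-sentiment stocks to earn higher excess returns than low-sentiment stocks. Running the same function separately on a small-cap subset would be one way to compare the effect across size groups.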