Political Research Exchange (Dec 2022)

Dictionary-based and machine learning classification approaches: a comparison for tonality and frame detection on Twitter data

  • Maud Reveilhac,
  • Davide Morselli

DOI
https://doi.org/10.1080/2474736X.2022.2029217
Journal volume & issue
Vol. 4, no. 1

Abstract

Read online

Automated text analysis methods have made it possible to classify large corpora of text by measures such as frames and tonality, with a growing popularity in social, political and psychological science. These methods often demand a training dataset of sufficient size to generate accurate models that can be applied to unseen texts. In practice, however, there are no clear recommendations about how big the training samples should be. This issue becomes especially acute when dealing with texts skewed toward categories and when researchers cannot afford large samples of annotated texts. Leveraging on the case of support for democracy, we provide a guide to help researchers navigate decisions when producing measures of tonality and frames from a small sample of annotated social media posts. We find that supervised machine learning algorithms outperform dictionaries for tonality classification tasks. However, custom dictionaries are useful complements of these algorithms when identifying latent democracy dimensions in social media messages, especially as the method of elaborating these dictionaries is guided by word embedding techniques and human validation. Therefore, we provide easily implementable recommendations to increase estimation accuracy under non-optimal condition.

Keywords