International Journal of Qualitative Methods (Aug 2020)
Machine Learning of Concepts Hard Even for Humans: The Case of Online Depression Forums
Abstract
Social scientists working in mixed-methods research have traditionally used human annotators to classify texts according to predefined categories. The "big data" revolution, that is, the rapid growth of digitized texts in recent years, brings new opportunities but also new challenges. In our research project, we aim to examine the potential of natural language processing (NLP) techniques to understand the individual framing of depression in online forums. In this paper, we introduce a part of this project experimenting with an NLP classification (supervised machine learning) method capable of classifying large digital corpora according to various discourses on depression. Our question was whether an automated method can be applied to sociological problems that lie outside the scope of hermeneutically simpler business applications. The present article traces our learning path from the difficulties of human annotation to the hermeneutic limitations of algorithmic NLP methods. We faced our first failure when we encountered significant inter-annotator disagreement. In response, we adopted the strategy of intersubjective hermeneutics (interpretation through consensus). The second failure arose because we expected the machine to learn effectively from the human-annotated sample despite its hermeneutic limitations. The machine learning appeared to work appropriately in predicting biomedical and psychological framing, but it failed in the case of sociological framing. These results suggest that the sociological discourse about depression is not as well founded as the biomedical and psychological discourses, a conclusion that requires further empirical study in the future. An increasing share of machine learning solutions is based on human annotation of semantic interpretation tasks, and such human-machine interactions will probably define many more applications in the future.
Our paper shows the hermeneutic limitations of “big data” text analytics in the social sciences, and highlights the need for a better understanding of the use of annotated textual data and the annotation process itself.
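The inter-annotator disagreement discussed above is commonly quantified with a chance-corrected agreement statistic such as Cohen's kappa. As a minimal illustration (the annotator labels below are hypothetical, not data from the study), the following sketch computes kappa for two annotators assigning one framing label per forum post:

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[label] / n) * (freq_b[label] / n) for label in freq_a)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical framing labels ("bio"-medical, "psy"-chological, "soc"-iological)
# for ten forum posts, as judged by two annotators.
ann1 = ["bio", "bio", "psy", "soc", "psy", "bio", "soc", "psy", "bio", "psy"]
ann2 = ["bio", "psy", "psy", "soc", "psy", "bio", "psy", "psy", "bio", "bio"]
print(round(cohen_kappa(ann1, ann2), 2))  # 0.52: moderate, not strong, agreement
```

A kappa well below conventional reliability thresholds (often 0.6 to 0.8) would signal the kind of disagreement that motivated the move to consensus-based annotation described in the abstract.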