PLoS ONE (Jan 2024)

Data science and automation in the process of theorizing: Machine learning's power of induction in the co-duction cycle.

  • Daan Kolkman
  • Gwendolyn K Lee
  • Arjen van Witteloostuijn

DOI
https://doi.org/10.1371/journal.pone.0309318
Journal volume & issue
Vol. 19, no. 11
p. e0309318

Abstract


Recent calls to take up data science revolve around either the superior predictive performance associated with machine learning or the potential of data science techniques for exploratory data analysis. Many believe that these strengths come at the cost of explanatory insights, which form the basis for theorization. In this paper, we show that this trade-off is false. When used as part of a full research process that includes inductive, deductive and abductive steps, machine learning can offer explanatory insights and provide a solid basis for theorization. We present a systematic five-step theory-building and theory-testing cycle that consists of the following steps: 1. Element identification (reduction); 2. Exploratory analysis (induction); 3. Hypothesis development (retroduction); 4. Hypothesis testing (deduction); and 5. Theorization (abduction). We demonstrate the usefulness of this approach, which we refer to as co-duction, in a vignette where we study firm growth with real-world observational data.
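The abstract names the five steps of the cycle but not how each is carried out. What follows is a minimal illustrative sketch, not the authors' implementation. Every detail is an assumption made for this example: the synthetic data and its variables (`rd`, `age`, `noise`), the split into an exploration half and a held-out confirmation half, the use of simple correlation ranking in place of a machine learning importance measure, and the one-sided permutation test used for deduction. Abduction (step 5) is left to the researcher, so the sketch only returns the evidence one would reason from.

```python
import random
import statistics


def make_synthetic_firms(n, seed=0):
    """Hypothetical synthetic data; NOT the observational data used in the paper."""
    rng = random.Random(seed)
    firms = []
    for _ in range(n):
        rd = rng.gauss(0, 1)     # hypothetical "R&D intensity"
        age = rng.gauss(0, 1)    # hypothetical "firm age"
        noise = rng.gauss(0, 1)  # candidate element with no true effect
        growth = 0.8 * rd - 0.3 * age + rng.gauss(0, 1)
        firms.append({"rd": rd, "age": age, "noise": noise, "growth": growth})
    return firms


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def co_duction_sketch(firms, outcome="growth", n_perm=999, seed=1):
    rng = random.Random(seed)

    # Step 1, reduction: fix the candidate elements (all fields but the outcome).
    elements = [k for k in firms[0] if k != outcome]

    # Assumption: hold out half the sample so step 4 tests on data
    # that the exploration in step 2 never saw.
    half = len(firms) // 2
    explore, confirm = firms[:half], firms[half:]

    # Step 2, induction: rank elements by |correlation| with the outcome
    # (a simple stand-in for an ML variable-importance measure).
    y_exp = [f[outcome] for f in explore]
    scores = {e: pearson([f[e] for f in explore], y_exp) for e in elements}
    top = max(scores, key=lambda e: abs(scores[e]))

    # Step 3, retroduction: turn the strongest pattern into a directional hypothesis.
    sign = 1 if scores[top] > 0 else -1
    hypothesis = {"element": top, "sign": sign}

    # Step 4, deduction: one-sided permutation test on the held-out half.
    x_conf = [f[top] for f in confirm]
    y_conf = [f[outcome] for f in confirm]
    observed = sign * pearson(x_conf, y_conf)
    y_perm = list(y_conf)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if sign * pearson(x_conf, y_perm) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)

    # Step 5, abduction: not automated; return the evidence for the researcher.
    return {"scores": scores, "hypothesis": hypothesis,
            "observed_r": observed, "p_value": p_value}


if __name__ == "__main__":
    result = co_duction_sketch(make_synthetic_firms(400))
    print(result["hypothesis"], round(result["p_value"], 4))
```

The held-out split is one design choice among several: it keeps the hypothesis suggested by induction from being "confirmed" on the very data that produced it.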