SSM: Population Health (Dec 2020)

Application of machine learning to understand child marriage in India

  • Anita Raj,
  • Nabamallika Dehingia,
  • Abhishek Singh,
  • Lotus McDougal,
  • Julian McAuley

Journal volume & issue
Vol. 12
p. 100687

Abstract

Background: Prior research documents that India has the greatest number of girls married as minors of any nation in the world, which increases social and health risks for both these young wives and their children. While the prevalence of child marriage has declined in the nation, more work is needed to accelerate this decline and reduce the negative consequences of the practice. Expanding the targets for intervention requires better identification of those targets. Machine learning can offer insight into novel factors associated with child marriage that can serve as targets for intervention.

Methods: We applied machine learning methods to retrospective cross-sectional data from the nationally representative National Family Health Survey, a survey on demographics and health conducted in India in 2015–16. We analyzed the data using a traditional logistic regression model, with child marriage as the dependent variable and 4000+ variables from the survey as the independent variables. We also used three commonly used machine learning algorithms: L1-regularized logistic regression, or Least Absolute Shrinkage and Selection Operator (lasso), models; L2-regularized logistic regression, or ridge, models; and neural network models. Finally, we developed and applied a novel and rigorous approach in which experts qualitatively reviewed and coded the variables generated from an iterative series of regularized models, in order to assess thematically the key variable groupings associated with child marriage.

Findings: Regularized logistic and neural network applications demonstrated better accuracy and lower error rates than traditional logistic regression, with a greater number of features and variables generated. Regularized models highlighted higher fertility and contraception, longer duration of marriage, and geographic and socioeconomic vulnerabilities as key correlates, findings also shown in prior research. However, our novel method, which combined expert qualitative coding of variables generated from iterative regularized models with the resulting thematic generation, offered clarity on variables not focused upon in prior research, specifically non-utilization of health system benefits related to nutrition for mothers and infants.

Interpretation: Machine learning appears to be a valid means of identifying key correlates of child marriage in India and, via our innovative iterative thematic approach, can be useful for identifying novel variables associated with this outcome. Findings related to low nutritional service uptake also demonstrate the need for more focus on public health outreach for nutritional programs tailored to this population.
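As an illustration of the difference between the lasso (L1) and ridge (L2) penalties named in the Methods, the following is a minimal sketch only. It does not reproduce the authors' implementation, which the abstract does not describe. The data are synthetic rather than drawn from the National Family Health Survey, and the function names (`fit_logistic`, `make_toy_data`), the penalty strength, and the learning rate are all assumptions chosen for the example.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrink w toward zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_logistic(X, y, penalty, lam, lr=0.5, steps=300):
    # Hypothetical helper: regularized logistic regression fitted by
    # gradient descent ("l2", ridge) or proximal gradient descent ("l1", lasso).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        if penalty == "l2":
            # Ridge: the penalty gradient lam * w shrinks weights smoothly.
            w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        else:
            # Lasso: gradient step, then soft-thresholding, which can set
            # a weight to exactly zero.
            w = [soft_threshold(wj - lr * gj, lr * lam)
                 for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b  # the intercept is left unpenalized
    return w, b

def make_toy_data(n=200, seed=0):
    # Synthetic data (an assumption for illustration): five features,
    # of which only the first influences the binary outcome.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(5)]
        X.append(x)
        y.append(1 if rng.random() < sigmoid(2.0 * x[0]) else 0)
    return X, y

X, y = make_toy_data()
w_l1, _ = fit_logistic(X, y, "l1", lam=0.1)
w_l2, _ = fit_logistic(X, y, "l2", lam=0.1)
print("lasso weights:", [round(w, 3) for w in w_l1])
print("ridge weights:", [round(w, 3) for w in w_l2])
```

In a sketch like this, the lasso fit tends to drive the weights of uninformative features to exactly zero, while the ridge fit shrinks them without eliminating them. That selection behavior is one reason L1-regularized models can be convenient for screening a very large set of candidate variables.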

Keywords