BMC Psychiatry (Apr 2018)

Improving risk prediction accuracy for new soldiers in the U.S. Army by adding self-report survey data to administrative data

  • Samantha L. Bernecker
  • Anthony J. Rosellini
  • Matthew K. Nock
  • Wai Tat Chiu
  • Peter M. Gutierrez
  • Irving Hwang
  • Thomas E. Joiner
  • James A. Naifeh
  • Nancy A. Sampson
  • Alan M. Zaslavsky
  • Murray B. Stein
  • Robert J. Ursano
  • Ronald C. Kessler

DOI
https://doi.org/10.1186/s12888-018-1656-4
Journal volume & issue
Vol. 18, no. 1
pp. 1 – 12

Abstract

Background
High rates of mental disorders, suicidality, and interpersonal violence early in the military career have raised interest in implementing preventive interventions with high-risk new enlistees. The Army Study to Assess Risk and Resilience in Servicemembers (STARRS) developed risk-targeting systems for these outcomes based on machine learning methods using administrative data predictors. However, administrative data omit many risk factors, raising the question whether risk targeting could be improved by adding self-report survey data to prediction models. If so, the Army may gain from routinely administering surveys that assess additional risk factors.

Methods
The STARRS New Soldier Survey was administered to 21,790 Regular Army soldiers who agreed to have survey data linked to administrative records. As reported previously, machine learning models using administrative data as predictors found that small proportions of high-risk soldiers accounted for high proportions of negative outcomes. Other machine learning models using self-report survey data as predictors were developed previously for three of these outcomes: major physical violence and sexual violence perpetration among men and sexual violence victimization among women. Here we examined the extent to which this survey information increases prediction accuracy, over models based solely on administrative data, for those three outcomes. We used discrete-time survival analysis to estimate a series of models predicting first occurrence, assessing how model fit improved and concentration of risk increased when adding the predicted risk score based on survey data to the predicted risk score based on administrative data.

Results
The addition of survey data improved prediction significantly for all outcomes. In the most extreme case, the percentage of reported sexual violence victimization among the 5% of female soldiers with highest predicted risk increased from 17.5% using only administrative predictors to 29.4% adding survey predictors, a 67.9% proportional increase in prediction accuracy. Other proportional increases in concentration of risk ranged from 4.8% to 49.5% (median = 26.0%).

Conclusions
Data from an ongoing New Soldier Survey could substantially improve accuracy of risk models compared to models based exclusively on administrative predictors. Depending upon the characteristics of interventions used, the increase in targeting accuracy from survey data might offset survey administration costs.
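The two quantities reported in the results, concentration of risk and its proportional increase, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the toy data are hypothetical, and the sketch assumes that concentration of risk means the percentage of all observed outcomes falling among the top fraction of individuals ranked by predicted risk. Recomputing from the rounded figures in the abstract gives roughly 68.0% rather than the reported 67.9%; the small gap presumably reflects the use of unrounded values in the paper.

```python
def proportional_increase(baseline_pct, improved_pct):
    """Relative change, in percent, from baseline_pct to improved_pct."""
    return (improved_pct - baseline_pct) / baseline_pct * 100.0


def concentration_of_risk(scores, outcomes, top_fraction=0.05):
    """Percent of all observed outcomes (coded 0/1) that occur among the
    top_fraction of individuals ranked by predicted risk score.

    Assumed definition for illustration; the paper may compute it differently.
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    total = sum(outcomes)
    if total == 0:
        raise ValueError("no observed outcomes to concentrate")
    # Size of the high-risk group, at least one person.
    n_top = max(1, round(len(scores) * top_fraction))
    # Rank individuals from highest to lowest predicted risk.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    captured = sum(outcome for _, outcome in ranked[:n_top])
    return captured / total * 100.0


# Rounded figures quoted in the abstract (administrative only vs. with survey).
increase = proportional_increase(17.5, 29.4)
print(f"Proportional increase from rounded inputs: {increase:.1f}%")

# Hypothetical toy data: 20 individuals, 4 observed outcomes.
toy_scores = [i / 20 for i in range(20)]
toy_outcomes = [0] * 20
for idx in (0, 3, 10, 19):
    toy_outcomes[idx] = 1
print(concentration_of_risk(toy_scores, toy_outcomes, top_fraction=0.05))
```

In the toy data, the top 5% is a single individual who accounts for one of the four outcomes, so the sketch reports a concentration of 25%.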

Keywords