BMJ Open Sport & Exercise Medicine (Mar 2024)
Integrative data analysis to identify persistent post-concussion deficits and subsequent musculoskeletal injury risk: project structure and methods
Abstract
Concussions are a serious public health problem, with significant healthcare costs and risks. One of the most serious complications of concussions is an increased risk of subsequent musculoskeletal injuries (MSKI). However, there is currently no reliable way to identify which individuals are at highest risk for post-concussion MSKIs. This study proposes a novel data analysis strategy for developing a clinically feasible risk score for post-concussion MSKIs in student-athletes. The data set consists of one-time tests (eg, mental health questionnaires), relevant information on demographics, health history (including details regarding the concussion such as day of the year and time lost) and athletic participation (current sport and contact level) that were collected at a single time point as well as multiple time points (baseline and follow-up time points after the concussion) of the clinical assessments (ie, cognitive, postural stability, reaction time and vestibular and ocular motor testing). The follow-up time point measurements were treated as individual variables and as differences from the baseline. Our approach used a weight-of-evidence (WoE) transformation to handle missing data and variable heterogeneity and machine learning methods for variable selection and model fitting. We applied a training-testing sample splitting scheme and performed variable preprocessing with the WoE transformation. Then, machine learning methods were applied to predict the MSKI indicator prediction, thereby constructing a composite risk score for the training-testing sample. This methodology demonstrates the potential of using machine learning methods to improve the accuracy and interpretability of risk scores for MSKI.