International Journal of Population Data Science (Jul 2022)
Lessons in Linkage: Combining Administrative Data Using Deterministic Linkage for Surveillance of Sports and Recreation Injuries in Florida, United States
Abstract
Objectives
Previous and ongoing epidemiological surveillance of sports and recreation injuries (SRI) has been cross-sectional, has relied on the subset of injuries captured where athletic trainers are available, or has focused on elite and professional athletes. In the United States, surveillance is often prohibitively expensive and is poorly funded by national organisations and agencies, except for some professional and elite sports. This paper details the methodology, barriers, and successes of using deterministic linkage on a single identifier to combine emergency department and hospitalisation data for surveillance of sports injuries among persons aged 5 to 18 years.

Design
Data linkage of a population cohort.

Methods
We performed deterministic linkage of administrative emergency department and hospitalisation data from the state of Florida in the US, acquired from the Florida Agency for Health Care Administration. Because privacy restrictions limited the identifiers available, we combined data across multiple years using a near-universal identifier. We identified sports and recreation injuries using a modified External Cause of Injury Morbidity Matrix, searching ICD codes across all available diagnosis fields. Finally, we computed descriptive statistics for records that were and were not successfully linked to assess whether the two groups were similar.

Results
We identified 384,731 visits for SRI over a seven-year period. Approximately 70% of the records could be linked using the single identifier. Linked and unlinked records differed significantly by age, sex, payer, and race/ethnicity.

Conclusions
This study is significant because, although similar methods have been used to examine other conditions (e.g. asthma), few studies have linked multiple types of administrative data to examine sports and recreation injuries, especially with almost no identifiers available.
This method proved useful for identifying injuries over time among the same individuals seeking care in emergency departments or hospital inpatient settings, though future work will need to address its limitations. If health surveillance is to move forward as its budgets become even more limited, we must develop and improve methods that require fewer resources, including methods that make use of data with substantial limitations.

Keywords
deterministic linkage; epidemiological methods; epidemiology; surveillance; public health; data cleaning; data linkage; reformatting; SAS; Python; injury surveillance; disease surveillance; health surveillance; biostatistics
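The Methods above combine three steps: flagging SRI visits from ICD codes in every diagnosis field, grouping records deterministically on one identifier, and comparing linked with unlinked records. The following is a minimal Python sketch of those mechanics only, under stated assumptions. It is not the authors' code. The field names (`person_id`, `dx_codes`, `sex`, `age`), the placeholder values treated as invalid identifiers, and the single `"Y936"` prefix are all hypothetical stand-ins. In particular, `"Y936"` is not the study's modified External Cause of Injury Morbidity Matrix. The sketch also assumes that "linked" means "carries a valid identifier", which the abstract does not define, and it uses a Pearson chi-square test as one possible significance test, since the abstract does not name the test used. The records are fabricated toy data.

```python
import math
from collections import defaultdict

# Hypothetical placeholder values treated as "no usable identifier".
INVALID_IDS = {"", "000000000", "999999999"}

# Illustrative prefix only (ICD-10-CM Y93.6x activity codes, dots removed);
# NOT the study's modified External Cause of Injury Morbidity Matrix.
SRI_PREFIXES = ("Y936",)


def is_sri(record):
    """True if ANY diagnosis field starts with a sports/recreation prefix."""
    return any(code.startswith(SRI_PREFIXES) for code in record["dx_codes"])


def link_records(records):
    """Deterministic linkage: records sharing an exact, valid person_id
    (across ED and inpatient files and years) are grouped as one person."""
    persons = defaultdict(list)
    unlinked = []
    for rec in records:
        pid = rec["person_id"].strip()
        if pid in INVALID_IDS:
            unlinked.append(rec)  # no usable identifier: cannot be linked
        else:
            persons[pid].append(rec)
    return dict(persons), unlinked


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].
    Assumes all row/column totals are non-zero. With df = 1,
    P(X > x) = erfc(sqrt(x / 2))."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


def count_sex(recs, sex):
    """Number of records with the given sex value."""
    return sum(1 for r in recs if r["sex"] == sex)


# Toy records with illustrative codes, fabricated for this sketch (not study data).
records = [
    {"person_id": "A1", "setting": "ED", "year": 2015, "age": 12, "sex": "M", "dx_codes": ["S52501A", "Y9361"]},
    {"person_id": "A1", "setting": "IP", "year": 2017, "age": 14, "sex": "M", "dx_codes": ["S82201A", "Y9364"]},
    {"person_id": "B2", "setting": "ED", "year": 2016, "age": 9, "sex": "F", "dx_codes": ["S93401A", "Y9367"]},
    {"person_id": "", "setting": "ED", "year": 2016, "age": 16, "sex": "F", "dx_codes": ["S060X0A", "Y9361"]},
    {"person_id": "C3", "setting": "ED", "year": 2018, "age": 30, "sex": "M", "dx_codes": ["Y9364"]},
    {"person_id": "D4", "setting": "ED", "year": 2018, "age": 11, "sex": "F", "dx_codes": ["J45909"]},
]

# Restrict to SRI visits among persons aged 5 to 18 years, then link.
cohort = [r for r in records if 5 <= r["age"] <= 18 and is_sri(r)]
persons, unlinked = link_records(cohort)
n_linked = sum(len(v) for v in persons.values())
print(f"linked {n_linked}/{len(cohort)} records ({n_linked / len(cohort):.0%}) into {len(persons)} persons")
# → linked 3/4 records (75%) into 2 persons

# Compare linked vs unlinked records by sex. The toy counts are far too small
# for a valid chi-square test; this only shows the mechanics.
linked_recs = [r for recs in persons.values() for r in recs]
stat, p = chi_square_2x2(
    count_sex(linked_recs, "M"), count_sex(linked_recs, "F"),
    count_sex(unlinked, "M"), count_sex(unlinked, "F"),
)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

The linkage here is deterministic because records join only on an exact identifier match, with no probabilistic scoring. Every record without a usable identifier falls into the unlinked group. For that reason, comparing the two groups (here by sex; in the study also by age, payer, and race/ethnicity) is how one checks whether the linked subset is representative.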