JMIR Public Health and Surveillance (May 2022)

Identifying Cases of Shoulder Injury Related to Vaccine Administration (SIRVA) in the United States: Development and Validation of a Natural Language Processing Method

  • Chengyi Zheng,
  • Jonathan Duffy,
  • In-Lu Amy Liu,
  • Lina S Sy,
  • Ronald A Navarro,
  • Sunhea S Kim,
  • Denison S Ryan,
  • Wansu Chen,
  • Lei Qian,
  • Cheryl Mercado,
  • Steven J Jacobsen

DOI
https://doi.org/10.2196/30426
Journal volume & issue
Vol. 8, no. 5
p. e30426

Abstract

Background: Shoulder injury related to vaccine administration (SIRVA) accounts for more than half of all claims received by the National Vaccine Injury Compensation Program. However, due to the difficulty of finding SIRVA cases in large health care databases, population-based studies are scarce.

Objective: The goal of the research was to develop a natural language processing (NLP) method to identify SIRVA cases from clinical notes.

Methods: We conducted the study among members of a large integrated health care organization who were vaccinated between April 1, 2016, and December 31, 2017, and had subsequent diagnosis codes indicative of shoulder injury. Based on a training data set with a chart review reference standard of 164 cases, we developed an NLP algorithm to extract shoulder disorder information, including prior vaccination, anatomic location, temporality and causality. The algorithm identified 3 groups of positive SIRVA cases (definite, probable, and possible) based on the strength of evidence. We compared NLP results to a chart review reference standard of 100 vaccinated cases. We then applied the final automated NLP algorithm to a broader cohort of vaccinated persons with a shoulder injury diagnosis code and performed manual chart confirmation on a random sample of NLP-identified definite cases and all NLP-identified probable and possible cases.

Results: In the validation sample, the NLP algorithm had 100% accuracy for identifying 4 SIRVA cases and 96 cases without SIRVA. In the broader cohort of 53,585 vaccinations, the NLP algorithm identified 291 definite, 124 probable, and 52 possible SIRVA cases. The chart-confirmation rates for these groups were 95.5% (278/291), 67.7% (84/124), and 17.3% (9/52), respectively.

Conclusions: The algorithm performed with high sensitivity and reasonable specificity in identifying positive SIRVA cases. The NLP algorithm can potentially be used in future population-based studies to identify this rare adverse event, avoiding labor-intensive chart review validation.
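
The chart-confirmation rates reported in the Results are simple proportions of confirmed cases among NLP-identified cases in each evidence group. The following is a minimal sketch of that arithmetic only, using the counts given in the abstract; the variable and function names are illustrative assumptions and are not taken from the study's software.

```python
# Counts taken from the abstract: (chart-confirmed, NLP-identified) per group.
# The dictionary layout and names are hypothetical, chosen for illustration.
confirmation_counts = {
    "definite": (278, 291),
    "probable": (84, 124),
    "possible": (9, 52),
}


def confirmation_rate(confirmed: int, identified: int) -> float:
    """Return the chart-confirmation rate as a percentage."""
    if identified <= 0:
        raise ValueError("identified must be a positive count")
    return 100.0 * confirmed / identified


# Round to one decimal place, matching the precision used in the abstract.
rates = {
    group: round(confirmation_rate(confirmed, identified), 1)
    for group, (confirmed, identified) in confirmation_counts.items()
}

for group, rate in rates.items():
    confirmed, identified = confirmation_counts[group]
    print(f"{group}: {rate}% ({confirmed}/{identified})")
```

The rates fall sharply from the definite to the possible group, which is consistent with the abstract's description of the groups as ordered by strength of evidence.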