Clinical Epidemiology (Oct 2018)
A phenotyping algorithm to identify acute ischemic stroke accurately from a national biobank: the Million Veteran Program
Abstract
Tasnim F Imran,1–3,* Daniel Posner,1,4,* Jacqueline Honerlaw,1 Jason L Vassy,1,2 Rebecca J Song,1 Yuk-Lam Ho,1 Steven J Kittner,5 Katherine P Liao,1,2 Tianxi Cai,1,6 Christopher J O’Donnell,1,2 Luc Djousse,1,2 David R Gagnon,1,4 J Michael Gaziano,1,2 Peter WF Wilson,7,8 Kelly Cho1,2
On behalf of the VA Million Veteran Program

1 Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Cooperative Studies Program, VA Boston Healthcare System, Boston, MA, USA
2 Department of Medicine, Division of Aging, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
3 Department of Medicine, Cardiology Section, Boston Medical Center, Boston University School of Medicine, Boston, MA, USA
4 Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
5 Department of Neurology, Baltimore VA Medical Center and University of Maryland School of Medicine, Baltimore, MD, USA
6 Harvard T. H. Chan School of Public Health, Boston, MA, USA
7 Atlanta VA Medical Center, Decatur, GA, USA
8 Department of Medicine, Division of Cardiovascular Disease, Emory University School of Medicine, Atlanta, GA, USA
* These authors contributed equally to this work

Background: Large databases provide an efficient way to analyze patient data. A challenge with these databases is the inconsistency of ICD codes and the potential for inaccurate ascertainment of cases. The purpose of this study was to develop and validate a reliable protocol to identify cases of acute ischemic stroke (AIS) from a large national database.

Methods: Using the national Veterans Affairs electronic health-record system, Centers for Medicare and Medicaid Services, and National Death Index data, we developed an algorithm to identify cases of AIS. Using a combination of inpatient and outpatient ICD9 codes, we selected cases of AIS and controls from 1992 to 2014. Diagnoses determined after medical-chart review were considered the gold standard.
We used a machine-learning algorithm and a neural network approach to identify AIS from ICD9 codes and electronic health-record information and compared it with a previous rule-based stroke-classification algorithm.

Results: We reviewed administrative hospital data, ICD9 codes, and medical records of 268 patients in detail. Compared with the gold standard, this AIS algorithm had a sensitivity of 91%, specificity of 95%, and positive predictive value of 88%. A total of 80,508 highly likely cases of AIS were identified using the algorithm in the Veterans Affairs national cardiovascular disease-risk cohort (n=2,114,458).

Conclusion: Our algorithm had high specificity for identifying AIS in a nationwide electronic health-record system. This approach may be utilized in other electronic health databases to accurately identify patients with AIS.

Keywords: acute ischemic stroke, algorithm, large databases, big data, administrative health data, cerebrovascular accident
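The validation measures reported in the Results (sensitivity, specificity, and positive predictive value) all follow from a two-by-two comparison of algorithm output against chart review. The abstract does not give that table, so the following is a minimal sketch using hypothetical counts chosen only for illustration; the names `ConfusionCounts`, `validation_metrics`, and `identified_fraction` are assumptions, not part of the authors' method. The last helper simply divides the reported number of identified cases by the reported cohort size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Algorithm result vs. chart review (gold standard)."""
    true_pos: int   # algorithm says AIS, chart review confirms AIS
    false_pos: int  # algorithm says AIS, chart review says not AIS
    false_neg: int  # algorithm says not AIS, chart review confirms AIS
    true_neg: int   # algorithm says not AIS, chart review says not AIS


def validation_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, and PPV as fractions in [0, 1]."""
    if c.true_pos + c.false_neg == 0:
        raise ValueError("no gold-standard positives; sensitivity undefined")
    if c.true_neg + c.false_pos == 0:
        raise ValueError("no gold-standard negatives; specificity undefined")
    if c.true_pos + c.false_pos == 0:
        raise ValueError("no algorithm positives; PPV undefined")
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
    }


def identified_fraction(cases: int, cohort_size: int) -> float:
    """Fraction of a cohort flagged by the algorithm."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return cases / cohort_size


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT the study's data.
    example = ConfusionCounts(true_pos=90, false_pos=5, false_neg=10, true_neg=95)
    for name, value in validation_metrics(example).items():
        print(f"{name}: {value:.1%}")

    # Figures reported in the abstract: 80,508 cases in a cohort of 2,114,458.
    print(f"identified fraction: {identified_fraction(80_508, 2_114_458):.2%}")
```

Positive predictive value depends on how common AIS is in the reviewed sample, so a value measured on the 268 reviewed patients need not carry over unchanged to a cohort with a different case mix.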