Cancer Informatics (Jan 2007)

Development of Query Strategies to Identify a Histologic Lymphoma Subtype in a Large Linked Database System

  • Christopher R. Flowers,
  • Leroy Hill,
  • Ashley Hilliard,
  • Susan G. Moore,
  • Rochelle Victor,
  • Michael Graiser,
  • Michael S. Keehan

Journal volume & issue
Vol. 3
pp. 149 – 158

Abstract

Background: Large linked databases (LLDB) represent a novel resource for cancer outcomes research. However, accurately identifying a patient population of interest within these LLDBs can be challenging. Our research group developed a fully integrated platform that provides a means of combining independent legacy databases into a single cancer-focused LLDB system. We compared the sensitivity and specificity of several SQL-based query strategies for identifying a histologic lymphoma subtype in this LLDB to determine the most accurate legacy data source for identifying a specific cancer patient population.

Methods: Query strategies were developed to identify patients with follicular lymphoma from an LLDB of cancer registry data, electronic medical records (EMR), laboratory, administrative, pharmacy, and other clinical data. Queries were performed using common diagnostic codes (ICD-9), cancer registry histology codes (ICD-O), and text searches of EMRs. We reviewed medical records and pathology reports to confirm each diagnosis and calculated the sensitivity and specificity for each query strategy.

Results: Together the queries identified 1538 potential cases of follicular lymphoma. Review of pathology and other medical reports confirmed 415 cases of follicular lymphoma, 300 pathology-verified and 115 verified from other medical reports. The query using ICD-O codes was highly specific (96%). Queries using text strings varied in sensitivity (range 7–92%) and specificity (range 86–99%). Queries using ICD-9 codes were both less sensitive (34–44%) and less specific (35–87%).

Conclusions: Queries of linked cancer databases that include cancer registry data should utilize ICD-O codes or employ structured free-text searches to identify patient populations with a precise histologic diagnosis.

Abbreviations: LLDB: Large Linked Database; SEER: Surveillance, Epidemiology, and End Results; EMR: Electronic Medical Record; ICD-9: International Classification of Diseases (9th revision); ICD-O: International Classification of Diseases for Oncology; AP: Anatomical Pathology; WHO: World Health Organization.
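
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the table name, column names, and toy rows are hypothetical, and the code filters (ICD-9 202.0x, ICD-O histology 9690/9691/9695/9698) are illustrative assumptions, since the abstract does not list the codes or text strings actually used. The sketch also assumes that specificity is computed over the pool of candidates returned by any query, with chart review as the reference standard.

```python
import sqlite3

# Hypothetical toy schema and rows; none of these values come from the study.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients ("
    " patient_id INTEGER PRIMARY KEY,"
    " icd9 TEXT, icdo TEXT, emr_text TEXT,"
    " confirmed INTEGER)"  # 1 = follicular lymphoma confirmed on chart review
)
rows = [
    (1, "202.00", "9690", "Follicular lymphoma, grade 1", 1),
    (2, "202.80", "9695", "follicular lymphoma involving nodes", 1),
    (3, "202.00", "9680", "Diffuse large B-cell lymphoma", 0),
    (4, "202.80", None, "r/o follicular lymphoma; biopsy reactive", 0),
    (5, "202.80", "9698", "FL grade 3 per outside report", 1),
    (6, "202.00", None, "nodular lymphoma history unclear", 0),
]
conn.executemany("INSERT INTO patients VALUES (?, ?, ?, ?, ?)", rows)

# One illustrative query per strategy (assumed filters, not the study's).
QUERIES = {
    "icd9": "SELECT patient_id FROM patients WHERE icd9 LIKE '202.0%'",
    "icdo": "SELECT patient_id FROM patients "
            "WHERE icdo IN ('9690', '9691', '9695', '9698')",
    "text": "SELECT patient_id FROM patients "
            "WHERE LOWER(emr_text) LIKE '%follicular lymphoma%'",
}


def ids(sql):
    """Run a query and return the set of patient IDs it selects."""
    return {row[0] for row in conn.execute(sql)}


def sens_spec(hits, confirmed, pool):
    """Sensitivity and specificity of one query against chart review."""
    tp = len(hits & confirmed)          # flagged and confirmed
    fn = len(confirmed - hits)          # confirmed but missed
    fp = len(hits - confirmed)          # flagged but not confirmed
    tn = len(pool - hits - confirmed)   # neither flagged nor confirmed
    return tp / (tp + fn), tn / (tn + fp)


hits = {name: ids(sql) for name, sql in QUERIES.items()}
pool = set().union(*hits.values())      # all candidates from any query
confirmed = ids("SELECT patient_id FROM patients WHERE confirmed = 1") & pool
results = {name: sens_spec(h, confirmed, pool) for name, h in hits.items()}

for name, (sens, spec) in results.items():
    print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy data the free-text query misses a confirmed case recorded only as "FL" and flags a "r/o" (rule-out) note, which hints at why text-string queries can vary so widely in sensitivity and why structured free-text searches may be preferable.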

Keywords