Journal of Pain Research (Jun 2015)

Identification of a potential fibromyalgia diagnosis using random forest modeling applied to electronic medical records

  • Emir B,
  • Masters ET,
  • Mardekian J,
  • Clair A,
  • Kuhn M,
  • Silverman SL

Journal volume & issue
Vol. 2015
pp. 277 – 288

Abstract

Birol Emir,1 Elizabeth T Masters,1 Jack Mardekian,1 Andrew Clair,1 Max Kuhn,2 Stuart L Silverman3

1Pfizer Inc., New York, NY; 2Pfizer Inc., Groton, CT; 3Cedars-Sinai Medical Center, Los Angeles, CA, USA

Background: Diagnosis of fibromyalgia (FM), a chronic musculoskeletal condition characterized by widespread pain and a constellation of symptoms, remains challenging and is often delayed.

Methods: Random forest modeling of electronic medical records was used to identify variables that may facilitate earlier FM identification and diagnosis. Cases were subjects aged ≥18 years with two or more listings of the International Classification of Diseases, Ninth Revision (ICD-9) code for FM (ICD-9 729.1) ≥30 days apart during the 2012 calendar year. They were drawn from subjects who were associated with an integrated delivery network and who had one or more health care provider encounters in the Humedica database in calendar years 2011 and 2012. Controls were subjects without the FM ICD-9 code. Seventy-two demographic, clinical, and health care resource utilization variables were entered into a random forest model with downsampling to account for cohort imbalances (<1% of subjects had FM). The top ten variables were ranked by importance, normalized so that the variable whose omission from the model caused the largest loss in predictive performance scored 100%. Since random forest is a complex prediction method, a set of simple rules was derived to help understand what factors drive individual predictions.

Results: The ten variables identified by the model were: number of visits where laboratory/non-imaging diagnostic tests were ordered; number of outpatient visits excluding office visits; age; number of office visits; number of opioid prescriptions; number of medications prescribed; number of pain medications excluding opioids; number of medications administered/ordered; number of emergency room visits; and number of musculoskeletal conditions. On an independent test set, a receiver operating characteristic curve confirmed the model's predictive accuracy (area under the curve, 0.810). To enhance interpretability, nine rules were developed that could identify, with good predictive probability, subjects with an FM diagnosis and subjects without FM.

Conclusion: Random forest modeling may help to quantify the predictive probability of an FM diagnosis. Rules can be developed to simplify interpretability. Further validation of these models may facilitate earlier diagnosis and enhance management.

Keywords: fibromyalgia, random forest, predictive modeling, electronic medical records, health care resource utilization, real-world data
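
Illustrative sketch: The abstract mentions two preprocessing and reporting steps that are easy to misread: downsampling the controls to offset the class imbalance, and rescaling variable importance so that the top variable scores 100%. The Python sketch below shows one plausible reading of each step. It is not the authors' code. The function names, the variable names, and the raw scores are hypothetical, and the raw scores stand in for whatever loss in predictive performance a fitted model would report.

```python
import random


def downsample_majority(labels, seed=0):
    """Return sorted row indices in which the larger class has been randomly
    reduced to the size of the smaller class (0 = control, 1 = FM case)."""
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted([cases, controls], key=len)
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


def scale_importance(loss_by_variable):
    """Rescale raw importance scores so the largest equals 100, then
    return (variable, score) pairs from most to least important."""
    top = max(loss_by_variable.values())
    if top <= 0:
        raise ValueError("at least one variable must have a positive score")
    scaled = {name: 100.0 * loss / top for name, loss in loss_by_variable.items()}
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical cohort: 5 cases among 1,000 subjects (under 1% with FM).
    labels = [1] * 5 + [0] * 995
    balanced = downsample_majority(labels)
    print(len(balanced), sum(labels[i] for i in balanced))

    # Hypothetical raw scores (loss in performance when a variable is omitted).
    raw = {"lab_test_visits": 8.0, "age": 4.0, "opioid_prescriptions": 2.0}
    for name, score in scale_importance(raw):
        print(f"{name}: {score:.0f}%")
```

In this sketch the balanced index set would be used to train the model, while evaluation would still use an untouched test set, in line with the independent test set reported in the Results.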