BMC Medical Research Methodology (Jun 2023)
Use of electronic health data to identify patients with moderate-to-severe osteoarthritis of the hip and/or knee and inadequate response to pain medications
Abstract
Background No algorithms exist to identify important osteoarthritis (OA) patient subgroups (i.e., moderate-to-severe disease, inadequate response to pain treatments) in electronic healthcare data, possibly because these characteristics are complex to define and because these data sources lack relevant measures. We developed and validated algorithms intended for use with claims and/or electronic medical records (EMR) to identify these patient subgroups.
Methods We obtained claims, EMR, and chart data from two integrated delivery networks. Chart data were used to identify the presence or absence of the three relevant OA-related characteristics (OA of the hip and/or knee, moderate-to-severe disease, inadequate/intolerable response to at least two pain-related medications); the resulting classification served as the benchmark for algorithm validation. We developed two sets of case-identification algorithms: one based on a literature review and clinical input (predefined algorithms), and another using machine learning (ML) methods (logistic regression, classification and regression tree, random forest). Patient classifications based on these algorithms were compared and validated against the chart data.
Results We sampled and analyzed 571 adult patients, of whom 519 had OA of the hip and/or knee, 489 had moderate-to-severe OA, and 431 had inadequate response to at least two pain medications. Individual predefined algorithms had high positive predictive values (all PPVs ≥ 0.83) for identifying each of these OA characteristics, but low negative predictive values (all NPVs between 0.16 and 0.54) and sometimes low sensitivity; their sensitivity and specificity for identifying patients with all three characteristics were 0.95 and 0.26, respectively (NPV 0.65, PPV 0.78, accuracy 0.77). ML-derived algorithms performed better in identifying this patient subgroup (range: sensitivity 0.77–0.86, specificity 0.66–0.75, PPV 0.88–0.92, NPV 0.47–0.62, accuracy 0.75–0.83).
Conclusions Predefined algorithms adequately identified OA characteristics of interest, but more sophisticated ML-based methods better differentiated between levels of disease severity and identified patients with inadequate response to analgesics. The ML methods performed well, yielding high PPV, NPV, sensitivity, specificity, and accuracy using either claims or EMR data. Use of these algorithms may expand the ability of real-world data to address questions of interest in this underserved patient population.
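The validation measures reported in this abstract (sensitivity, specificity, PPV, NPV, accuracy) all follow from a two-by-two comparison of an algorithm's classification against the chart-based benchmark. The following is a minimal illustrative sketch of that arithmetic, not the authors' code: the names `ConfusionCounts`, `confusion_counts`, and `validation_metrics` are hypothetical, and the counts in the usage lines are invented for illustration and are not study data.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Two-by-two table of algorithm result versus chart benchmark (hypothetical helper)."""
    tp: int  # algorithm positive, chart positive
    fp: int  # algorithm positive, chart negative
    fn: int  # algorithm negative, chart positive
    tn: int  # algorithm negative, chart negative


def confusion_counts(algorithm: Sequence[bool], chart: Sequence[bool]) -> ConfusionCounts:
    """Tally agreement between per-patient algorithm flags and chart flags."""
    if len(algorithm) != len(chart):
        raise ValueError("one algorithm flag and one chart flag per patient are required")
    tp = fp = fn = tn = 0
    for predicted, reference in zip(algorithm, chart):
        if predicted and reference:
            tp += 1
        elif predicted:
            fp += 1
        elif reference:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def validation_metrics(c: ConfusionCounts) -> Dict[str, Optional[float]]:
    """Compute the five validation measures from the two-by-two counts."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # chart positives that the algorithm flags
        "specificity": _ratio(c.tn, c.tn + c.fp),  # chart negatives that the algorithm clears
        "ppv": _ratio(c.tp, c.tp + c.fp),          # algorithm positives confirmed by chart
        "npv": _ratio(c.tn, c.tn + c.fn),          # algorithm negatives confirmed by chart
        "accuracy": _ratio(c.tp + c.tn, total),    # all patients classified correctly
    }


# Invented counts for illustration only; these are NOT taken from the study.
example = ConfusionCounts(tp=80, fp=10, fn=20, tn=40)
for name, value in validation_metrics(example).items():
    print(name, value)
```

Because PPV and NPV depend on how common the characteristic is in the sample, a cohort in which most patients are chart-positive, as in this study, can show a high PPV alongside a low NPV even when sensitivity is high.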
Keywords