Dimension reduction and shrinkage methods for high dimensional disease risk scores in historical data

Hiraku Kumamaru; Sebastian Schneeweiss; Robert J. Glynn; Soko Setoguchi; Joshua J. Gagne

doi:10.1186/s12982-016-0047-x

Emerging Themes in Epidemiology (Apr 2016)

Affiliations

Hiraku Kumamaru: Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School
Sebastian Schneeweiss: Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School
Robert J. Glynn: Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School
Soko Setoguchi: Duke Clinical Research Institute, Duke University
Joshua J. Gagne: Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School

DOI: https://doi.org/10.1186/s12982-016-0047-x
Journal volume & issue: Vol. 13, no. 1
pp. 1 – 10

Abstract

Background Multivariable confounder adjustment in comparative studies of newly marketed drugs can be limited by small numbers of exposed patients and even fewer outcomes. Disease risk scores (DRSs) developed in historical comparator drug users before the new drug entered the market may improve adjustment. However, in a high dimensional data setting, empirical selection of hundreds of potential confounders and modeling of the DRS, even in the historical cohort, can lead to over-fitting and reduced predictive performance in the study cohort. We propose combining dimension reduction and shrinkage methods to overcome this problem, and we compared the performance of these modeling strategies for implementing high dimensional (hd) DRSs from historical data in two empirical examples of newly marketed drugs versus comparator drugs after the new drugs’ market entry: dabigatran versus warfarin for the outcome of major hemorrhagic events, and cyclooxygenase-2 inhibitors (coxibs) versus nonselective non-steroidal anti-inflammatory drugs (nsNSAIDs) for gastrointestinal bleeds.

Results In the dabigatran example, historical hdDRSs that included predefined and empirical outcome predictors with dimension reduction (principal component analysis; PCA) and shrinkage (lasso and ridge regression) approaches had higher c-statistics in the warfarin users (0.66 for the PCA model, 0.64 for the PCA + ridge model, and 0.65 for the PCA + lasso model) than an unreduced model (c-statistic, 0.54). The odds ratio (OR) from PCA + lasso hdDRS-stratification [OR, 0.64; 95% confidence interval (CI) 0.46–0.90] was closer to the benchmark estimate (0.93) from a randomized trial than that from the model without empirical predictors (OR, 0.58; 95% CI 0.41–0.81). In the coxibs example, c-statistics of the hdDRSs in the nsNSAID initiators were 0.66 for the PCA model, 0.67 for the PCA + ridge model, and 0.67 for the PCA + lasso model; these were higher than for the unreduced model (c-statistic, 0.45) and comparable to the demographics + risk score model (c-statistic, 0.67).

Conclusions Implementing hdDRSs using historical data with dimension reduction and shrinkage was feasible and improved confounding adjustment in two studies of newly marketed medications.
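
The c-statistics reported above measure how well a risk score separates patients who had the outcome from those who did not. As a minimal sketch of the general definition (not the authors' code), the function below computes it as the proportion of case/non-case pairs in which the case has the higher score, counting ties as one half. The function name `c_statistic` and the toy data are hypothetical and chosen only for illustration.

```python
def c_statistic(scores, outcomes):
    """Concordance (c) statistic for a binary outcome.

    scores   -- predicted risk scores, one per patient
    outcomes -- 1 if the patient had the outcome, 0 otherwise
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_score in cases:
        for non_case_score in non_cases:
            if case_score > non_case_score:
                concordant += 1.0   # case ranked above non-case
            elif case_score == non_case_score:
                concordant += 0.5   # tie counts as half
    return concordant / (len(cases) * len(non_cases))


# Toy data (hypothetical): two cases and two non-cases.
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_outcomes = [0, 0, 1, 1]
print(c_statistic(toy_scores, toy_outcomes))
```

A value of 0.5 corresponds to ranking no better than chance, and a value below 0.5, such as the 0.45 reported for the unreduced model in the coxibs example, means the score ranked non-cases above cases more often than the reverse.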

Published in Emerging Themes in Epidemiology

ISSN: 1742-7622 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Infectious and parasitic diseases
Website: http://ete-online.biomedcentral.com
