BMC Medical Research Methodology (Dec 2023)
Developing non-response weights to account for attrition-related bias in a longitudinal pregnancy cohort
Abstract
Background
Prospective cohorts may be vulnerable to bias due to attrition. Inverse probability weights have been proposed as a method to help mitigate this bias. The current study used the “All Our Families” longitudinal pregnancy cohort of 3351 maternal-infant pairs. It aimed to develop inverse probability weights from logistic regression models that predict study continuation versus drop-out from baseline to the three-year data collection wave.
Methods
Two methods of variable selection were compared: a knowledge-based a priori approach and the Least Absolute Shrinkage and Selection Operator (LASSO). For both approaches, each model's ability to predict continued participation was evaluated in terms of discrimination, using the area under the receiver operating characteristic curve (AUROC), and calibration, using calibration plots. Stabilized inverse probability weights were generated from the predicted probabilities. Weight performance was assessed using standardized differences in baseline characteristics between those who continued in the study and those who did not, computed both with and without weights (the unweighted values serving as unadjusted estimates).
Results
The a priori and LASSO prediction models had fair and good discrimination, with AUROCs of 0.69 (95% Confidence Interval [CI]: 0.67–0.71) and 0.73 (95% CI: 0.71–0.75), respectively. Calibration plots and non-significant Hosmer-Lemeshow goodness-of-fit tests indicated that both the a priori (p = 0.329) and LASSO (p = 0.242) models were well calibrated. Unweighted results showed large (> 10%) standardized differences in 15 demographic variables (range: 11–29%) when comparing those who continued in the study with those who did not. Weights derived from the a priori and LASSO models reduced standardized differences relative to the unadjusted estimates, with the largest remaining differences being 13% and 5%, respectively.
Additionally, when the same LASSO variable selection method was applied to develop weights for later data collection waves, standardized differences remained below 10% for every demographic variable.
Conclusion
The LASSO variable selection approach produced robust weights that reduced non-response bias more effectively than the knowledge-driven approach. These weights can be applied to analyses across multiple longitudinal waves of data collection to reduce bias.
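The weighting and balance-checking steps described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' code. It assumes predicted probabilities p_hat from an already-fitted model (for example an a priori or LASSO logistic regression). It also assumes that drop-outs are weighted by P(C = 0) / (1 - p_hat), so that both groups can be compared in the balance check, as in propensity-score diagnostics. The function names and the toy data are invented for illustration.

```python
from math import sqrt
from statistics import fmean


def stabilized_weights(continued, p_hat):
    """Stabilized inverse probability weights for study continuation.

    continued: 0/1 indicators (1 = still participating at follow-up).
    p_hat: predicted P(continue | baseline covariates) from a fitted model.
    ASSUMPTION: drop-outs are weighted by P(C=0) / (1 - p_hat) so that both
    groups can be compared in a balance check; analyses of follow-up data
    would use only the continuers' weights.
    """
    p_marg = fmean(continued)  # marginal probability of continuing (numerator)
    return [
        p_marg / p if c == 1 else (1 - p_marg) / (1 - p)
        for c, p in zip(continued, p_hat)
    ]


def _weighted_mean_var(x, w):
    """Weighted mean and normalized weighted variance (divides by sum of w)."""
    total = sum(w)
    m = sum(wi * xi for wi, xi in zip(w, x)) / total
    v = sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / total
    return m, v


def standardized_difference(x, continued, weights=None):
    """(mean_cont - mean_drop) / sqrt((var_cont + var_drop) / 2).

    For a 0/1 variable this equals the usual proportion-based formula.
    weights=None gives the unadjusted (unweighted) value.
    """
    if weights is None:
        weights = [1.0] * len(x)
    stats = {}
    for g in (1, 0):
        xs = [xi for xi, c in zip(x, continued) if c == g]
        ws = [wi for wi, c in zip(weights, continued) if c == g]
        stats[g] = _weighted_mean_var(xs, ws)
    (m1, v1), (m0, v0) = stats[1], stats[0]
    pooled_sd = sqrt((v1 + v0) / 2)
    return (m1 - m0) / pooled_sd if pooled_sd > 0 else 0.0


def auroc(continued, p_hat):
    """P(random continuer's p_hat > random drop-out's p_hat); ties count 1/2."""
    pos = [p for c, p in zip(continued, p_hat) if c == 1]
    neg = [p for c, p in zip(continued, p_hat) if c == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))


# Hypothetical toy data: one binary baseline covariate, where p_hat equals the
# observed continuation rate within each covariate level (0.4 and 0.8).
x = [1] * 10 + [0] * 10
continued = [1] * 4 + [0] * 6 + [1] * 8 + [0] * 2
p_hat = [0.4] * 10 + [0.8] * 10

w = stabilized_weights(continued, p_hat)
for label, wts in (("unweighted", None), ("weighted", w)):
    d = standardized_difference(x, continued, wts)
    # |d| > 0.10 is the conventional "large imbalance" threshold (10%)
    print(f"{label}: d = {d:.3f}, large = {abs(d) > 0.10}")
print(f"AUROC = {auroc(continued, p_hat):.3f}")
```

Because p_hat here equals the within-level continuation rates, the weighted standardized difference is essentially zero while the unweighted one exceeds 10%. With real data and a regression model, balance is only approximate, as the remaining maxima of 13% and 5% reported above illustrate.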
Keywords