Patterns (Jun 2020)

gfoRmula: An R Package for Estimating the Effects of Sustained Treatment Strategies via the Parametric g-formula

  • Sean McGrath,
  • Victoria Lin,
  • Zilu Zhang,
  • Lucia C. Petito,
  • Roger W. Logan,
  • Miguel A. Hernán,
  • Jessica G. Young

Journal volume & issue
Vol. 1, no. 3
p. 100008

Abstract


Summary: Researchers are often interested in estimating the causal effects of sustained treatment strategies, i.e., of (hypothetical) interventions involving time-varying treatments. When using observational data, estimating those effects requires adjustment for confounding. However, conventional regression methods cannot appropriately adjust for confounding in the presence of treatment-confounder feedback. In contrast, estimators derived from Robins's g-formula may correctly adjust for confounding even if treatment-confounder feedback exists. The package gfoRmula implements in R one such estimator: the parametric g-formula. This estimator can estimate the effects of binary or continuous time-varying treatments, including contrasts defined by static or dynamic, deterministic or random interventions, as well as by interventions that depend on the natural value of treatment. The package accommodates survival outcomes as well as binary or continuous outcomes measured at the end of follow-up. This paper describes the gfoRmula package, along with motivating background, features, and examples.

The Bigger Picture: Causal inference is a core task of data science. When data from randomized experiments are not available, data analysts often rely on nonexperimental (observational) data to estimate causal effects. The parametric g-formula is a statistical method to estimate the causal effects of sustained treatment strategies from observational data with time-varying treatments, confounders, and outcomes. Although this methodology was introduced in the 1980s, it has not been widely used due to the lack of open-source software. This article presents the gfoRmula package, an implementation of the parametric g-formula in R. The aim of this software is to facilitate the application of the parametric g-formula to complex observational data to answer causal questions. The package also offers a way to compare the performance of the parametric g-formula with that of other methods in the causal inference literature.
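To make "treatment-confounder feedback" and the g-formula idea concrete, here is a minimal sketch in Python. It is not the gfoRmula implementation or API. It uses two time points with binary variables. The covariate L1 is affected by the earlier treatment A0 and also confounds the later treatment A1. A single regression that adjusts for L1 blocks part of A0's effect, while one that omits L1 leaves A1 confounded.

The sketch follows the usual three steps of the g-formula:

1. Model the covariate and the outcome given the past.
2. Simulate histories with treatment set by the strategy.
3. Average the predicted outcome.

Several parts are assumptions made for illustration:

- The data-generating process is hypothetical.
- The function names `simulate_observational`, `fit_saturated`, and `g_formula_risk` are hypothetical.
- Saturated cell-proportion models replace the parametric regression models a real analysis would fit.

```python
import random
from collections import defaultdict


def simulate_observational(n, rng):
    """Hypothetical data with treatment-confounder feedback.

    Each row is (L0, A0, L1, A1, Y), all binary. L1 is affected by the
    earlier treatment A0 and also influences the later treatment A1 and Y,
    so it is both a mediator of A0 and a confounder of A1.
    """
    rows = []
    for _ in range(n):
        l0 = int(rng.random() < 0.5)
        a0 = int(rng.random() < 0.2 + 0.5 * l0)
        l1 = int(rng.random() < 0.2 + 0.3 * l0 + 0.3 * a0)
        a1 = int(rng.random() < 0.2 + 0.5 * l1)
        y = int(rng.random() < 0.1 + 0.2 * l1 + 0.1 * a0 + 0.2 * a1)
        rows.append((l0, a0, l1, a1, y))
    return rows


def fit_saturated(rows, history, target):
    """Estimate P(row[target] = 1 | history(row)) by cell proportions.

    A saturated model standing in for the regression models that a real
    parametric g-formula analysis would fit (an assumption of this sketch).
    """
    cells = defaultdict(lambda: [0, 0])  # history -> [events, count]
    for r in rows:
        cell = cells[history(r)]
        cell[0] += r[target]
        cell[1] += 1
    return {h: events / count for h, (events, count) in cells.items()}


def g_formula_risk(rows, strategy, n_sim, rng):
    """Monte Carlo g-formula estimate of P(Y = 1) under `strategy`.

    `strategy(t, l)` returns the treatment at time t given the current
    covariate value l, so both static and dynamic rules fit this form.
    """
    # Step 1: model the time-varying covariate and the outcome given the past.
    p_l1 = fit_saturated(rows, lambda r: (r[0], r[1]), target=2)
    p_y = fit_saturated(rows, lambda r: (r[0], r[1], r[2], r[3]), target=4)
    # Step 2: simulate covariate histories with treatment set by the strategy.
    total = 0.0
    for _ in range(n_sim):
        l0 = rng.choice(rows)[0]  # resample baseline covariate from the data
        a0 = strategy(0, l0)
        l1 = int(rng.random() < p_l1[(l0, a0)])  # L1 responds to assigned A0
        a1 = strategy(1, l1)
        total += p_y[(l0, a0, l1, a1)]  # predicted outcome probability
    # Step 3: average over simulated individuals.
    return total / n_sim


rng = random.Random(2020)
data = simulate_observational(50_000, rng)
always = g_formula_risk(data, lambda t, l: 1, 50_000, rng)
never = g_formula_risk(data, lambda t, l: 0, 50_000, rng)
dynamic = g_formula_risk(data, lambda t, l: l, 50_000, rng)  # treat only if L = 1
print(f"always={always:.3f} never={never:.3f} dynamic={dynamic:.3f}")
print(f"risk difference (always - never) = {always - never:.3f}")
```

Under this simulated process, the true risks are 0.53 for "always treat", 0.17 for "never treat", and 0.35 for the dynamic rule "treat only when L = 1". The estimates should therefore fall close to those values. Swapping the saturated models for fitted regressions is what makes the method "parametric", and it is what lets it scale to many time points and continuous covariates.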

Keywords