BMC Medical Research Methodology (Jun 2022)

A progressive three-state model to estimate time to cancer: a likelihood-based approach

  • Eddymurphy U. Akwiwu,
  • Thomas Klausch,
  • Henriette C. Jodal,
  • Beatriz Carvalho,
  • Magnus Løberg,
  • Mette Kalager,
  • Johannes Berkhof,
  • Veerle M. H. Coupé

DOI
https://doi.org/10.1186/s12874-022-01645-2
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 16

Abstract

Background: To optimize colorectal cancer (CRC) screening and surveillance, information on the time-dependent risk that advanced adenomas (AA) develop into CRC is crucial. However, since AA are removed after diagnosis, the time from AA to CRC cannot be observed in an ethically acceptable manner. We propose a statistical method to infer this time indirectly from surveillance data, using a progressive three-state disease model.

Methods: Sixteen models were specified, with and without covariates. Parameters of the parametric time-to-event distributions from the adenoma-free state (AF) to AA and from AA to CRC were estimated simultaneously by maximizing the likelihood function. Model performance was assessed via simulation. The methodology was applied to a random sample of 878 individuals from a Norwegian adenoma cohort.

Results: Estimates of the parameters of the time distributions are consistent, and the 95% confidence intervals (CIs) have good coverage. For the Norwegian sample (AF: 78%, AA: 20%, CRC: 2%), a Weibull model for both transition times was selected as the final model based on information criteria. Among those who progressed to CRC within 50 years of AA onset, the mean time from AA to CRC was estimated to be 4.80 years (95% CI: 0; 7.61). The 5-year and 10-year cumulative incidence of CRC from AA was 13.8% (95% CI: 7.8%; 23.8%) and 15.4% (95% CI: 8.2%; 34.0%), respectively.

Conclusions: The time-dependent risk from AA to CRC is crucial to explain differences in the outcomes of microsimulation models used for the optimization of CRC prevention. Our method allows such models to be improved by including data-driven time distributions.
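The two kinds of quantity reported in the Results, a cumulative incidence at a fixed horizon and a mean transition time conditional on the transition occurring within 50 years, can both be read off a Weibull time-to-event distribution. The sketch below is a minimal illustration, assuming the standard shape/scale Weibull parameterization; the function names and the parameter values are hypothetical and are not the fitted values from the paper, and the likelihood used for estimation is not reproduced here.

```python
import math


def weibull_cdf(t, shape, scale):
    """Cumulative incidence F(t) = 1 - exp(-(t / scale) ** shape) for t >= 0."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def mean_time_given_event_by(horizon, shape, scale, steps=20000):
    """E[T | T <= horizon], computed as the integral over [0, horizon]
    of (F(horizon) - F(t)) / F(horizon), using the midpoint rule."""
    f_h = weibull_cdf(horizon, shape, scale)
    width = horizon / steps
    area = sum(
        f_h - weibull_cdf((i + 0.5) * width, shape, scale)
        for i in range(steps)
    )
    return area * width / f_h


# Hypothetical parameters, for illustration only (NOT the paper's estimates).
SHAPE, SCALE = 0.5, 200.0

for years in (5, 10):
    incidence = weibull_cdf(years, SHAPE, SCALE)
    print(f"{years}-year cumulative incidence: {incidence:.3f}")

conditional_mean = mean_time_given_event_by(50, SHAPE, SCALE)
print(f"mean time given progression within 50 years: {conditional_mean:.2f}")
```

Conditioning on the event occurring within the horizon is why such a mean can be much shorter than the horizon itself, even when the cumulative incidence is low.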

Keywords