BMC Medicine (Jul 2021)

Validity of observational evidence on putative risk and protective factors: appraisal of 3744 meta-analyses on 57 topics

  • Perrine Janiaud,
  • Arnav Agarwal,
  • Ioanna Tzoulaki,
  • Evropi Theodoratou,
  • Konstantinos K. Tsilidis,
  • Evangelos Evangelou,
  • John P. A. Ioannidis

DOI
https://doi.org/10.1186/s12916-021-02020-6
Journal volume & issue
Vol. 19, no. 1
pp. 1 – 17

Abstract


Background: The validity of observational studies and their meta-analyses is contested. Here, we aimed to appraise thousands of meta-analyses of observational studies using a pre-specified set of quantitative criteria that assess the significance, amount, consistency, and bias of the evidence. We also aimed to compare results from meta-analyses of observational studies against meta-analyses of randomized controlled trials (RCTs) and Mendelian randomization (MR) studies.

Methods: We retrieved from PubMed (last update, November 19, 2020) umbrella reviews including meta-analyses of observational studies assessing putative risk or protective factors, regardless of the nature of the exposure and health outcome. We extracted information on 7 quantitative criteria that reflect the level of statistical support, the amount of data, the consistency across different studies, and hints pointing to potential bias. These criteria were level of statistical significance (pre-categorized according to 10⁻⁶, 0.001, and 0.05 p-value thresholds), sample size, statistical significance for the largest study, 95% prediction intervals, between-study heterogeneity, and the results of tests for small study effects and for excess significance.

Results: 3744 associations (in 57 umbrella reviews) assessed by a median number of 7 (interquartile range 4 to 11) observational studies were eligible. Most associations were statistically significant at P < 0.05 (61.1%, 2289/3744). Only 2.6% of associations had P < 10⁻⁶, ≥1000 cases (or ≥20,000 participants for continuous factors), P < 0.05 in the largest study, 95% prediction interval excluding the null, and no large between-study heterogeneity, small study effects, or excess significance. Across the 57 topics, large heterogeneity was observed in the proportion of associations fulfilling various quantitative criteria. The quantitative criteria were mostly independent from one another. Across 62 associations assessed in both RCTs and in observational studies, 37.1% had effect estimates in opposite directions and 43.5% had effect estimates differing beyond chance in the two designs. Across 94 comparisons assessed in both MR and observational studies, such discrepancies occurred in 30.8% and 54.7%, respectively.

Conclusions: Acknowledging that no gold-standard exists to judge whether an observational association is genuine, statistically significant results are common in observational studies, but they are rarely convincing or corroborated by randomized evidence.
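
The combined rule reported in the Results (the one met by only 2.6% of associations) can be read as a simple conjunction of checks. The following is a minimal sketch, not the authors' implementation. The `Association` container and its field names are hypothetical, and the 50% I² cutoff for "large" heterogeneity is an assumption, since the abstract does not state one. The tests for small study effects and excess significance are taken as precomputed yes/no flags.

```python
from dataclasses import dataclass


@dataclass
class Association:
    """Summary of one meta-analysed association (hypothetical container)."""
    p_value: float             # p-value of the summary effect
    largest_study_p: float     # p-value of the largest included study
    pi_excludes_null: bool     # 95% prediction interval excludes the null
    i_squared: float           # between-study heterogeneity, in percent
    small_study_effects: bool  # True if the small-study-effects test flagged bias
    excess_significance: bool  # True if the excess-significance test flagged bias
    n_cases: int = 0           # total cases (binary factors)
    n_participants: int = 0    # total participants (continuous factors)
    continuous: bool = False   # True when the factor is continuous


def meets_all_criteria(a: Association, large_i2: float = 50.0) -> bool:
    """Return True when every criterion listed in the abstract is satisfied.

    `large_i2` is an ASSUMED cutoff for "large" heterogeneity; the abstract
    does not give a numeric threshold.
    """
    # Amount of evidence: >=1000 cases, or >=20,000 participants if continuous.
    if a.continuous:
        enough_data = a.n_participants >= 20_000
    else:
        enough_data = a.n_cases >= 1_000

    return (
        a.p_value < 1e-6
        and enough_data
        and a.largest_study_p < 0.05
        and a.pi_excludes_null
        and a.i_squared <= large_i2
        and not a.small_study_effects
        and not a.excess_significance
    )


# Illustrative values only, not data from the paper.
example = Association(
    p_value=1e-8,
    largest_study_p=0.01,
    pi_excludes_null=True,
    i_squared=30.0,
    small_study_effects=False,
    excess_significance=False,
    n_cases=2500,
)
print(meets_all_criteria(example))  # True

# The reported share significant at P < 0.05 follows from the counts given.
print(round(100 * 2289 / 3744, 1))  # 61.1
```

Because the criteria are joined by `and`, failing any single check (for example, a summary p-value of 0.03) is enough to exclude an association from the top tier, which is consistent with so few associations qualifying.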

Keywords