BMC Medical Research Methodology (Sep 2020)

Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise

  • Zad Rafi,
  • Sander Greenland

DOI
https://doi.org/10.1186/s12874-020-01105-9
Journal volume & issue
Vol. 20, no. 1
pp. 1 – 13

Abstract

Read online

Abstract Background Researchers often misinterpret and misrepresent statistical outputs. This abuse has led to a large literature on modification or replacement of testing thresholds and P-values with confidence intervals, Bayes factors, and other devices. Because the core problems appear cognitive rather than statistical, we review some simple methods to aid researchers in interpreting statistical outputs. These methods emphasize logical and information concepts over probability, and thus may be more robust to common misinterpretations than are traditional descriptions. Methods We use the Shannon transform of the P-value p, also known as the binary surprisal or S-value s = −log2(p), to provide a measure of the information supplied by the testing procedure, and to help calibrate intuitions against simple physical experiments like coin tossing. We also use tables or graphs of test statistics for alternative hypotheses, and interval estimates for different percentile levels, to thwart fallacies arising from arbitrary dichotomies. Finally, we reinterpret P-values and interval estimates in unconditional terms, which describe compatibility of data with the entire set of analysis assumptions. We illustrate these methods with a reanalysis of data from an existing record-based cohort study. Conclusions In line with other recent recommendations, we advise that teaching materials and research reports discuss P-values as measures of compatibility rather than significance, compute P-values for alternative hypotheses whenever they are computed for null hypotheses, and interpret interval estimates as showing values of high compatibility with data, rather than regions of confidence. Our recommendations emphasize cognitive devices for displaying the compatibility of the observed data with various hypotheses of interest, rather than focusing on single hypothesis tests or interval estimates. We believe these simple reforms are well worth the minor effort they require.

Keywords