Computational Ecology and Software (Sep 2022)

p-value based statistical significance tests: Concepts, misuses, critiques, solutions and beyond

  • WenJun Zhang

Journal volume & issue
Vol. 12, no. 3
pp. 80 – 122

Abstract

The p-value is at the heart of statistical significance testing, an issue central to the role of statistical inference in advancing scientific discovery. Over the past few decades, p-value based statistical significance tests have been widely used in most statistics-related research papers, textbooks, and statistical software around the world. Numerous scientists in various disciplines regard the p-value as the gold standard for statistical significance. In recent years, however, p-value based statistical significance tests have faced unprecedented criticism, mainly on the grounds that the paradigm of significance testing is wrong, the p-value is too sensitive, the p-value is a dichotomous and subjective index, and statistical significance depends on sample size. Scientific research can only be falsified, not confirmed. p-value based statistical significance tests are one source of false conclusions and of the research reproducibility crisis. For this reason, many statisticians advocate abandoning p-value based statistical significance tests and replacing them with effect sizes, Bayesian methods, meta-analysis, etc. Scientific inference that combines statistical testing with multiple types of evidence is the basis for producing reliable conclusions. Reliable scientific inference requires appropriate experimental design, sampling design, and sample size; it also requires full control of the research process. For complex and time-varying problems, network or systematic methods should be used instead of reductionist methods to obtain and analyze data. To change the scientific research paradigm, multiple repeated experiments and multi-sample testing should be adopted, and multiple parties should verify each other's results to improve their authenticity and reproducibility.
In addition to writing, publishing, and adopting new statistical monographs and textbooks, the most urgent task is to revise statistical software and distribute new versions based on the new statistics. Until the new statistics are widely adopted, what we can do is improve data quality, apply stricter p-value thresholds in statistical significance tests, use more reasonable analysis methods or testing standards, and combine statistical analysis with mechanism analysis.
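
The abstract's point that statistical significance is related to sample size can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a one-sample two-sided z-test with known unit variance, and the function name `two_sided_p_value` and the standardized effect of 0.1 are hypothetical choices made only for illustration.

```python
import math


def two_sided_p_value(effect_size: float, n: int) -> float:
    """Two-sided p-value of a one-sample z-test (assumed known unit variance).

    effect_size is the observed standardized mean difference (hypothetical input).
    """
    # Test statistic: the standardized effect scaled by the square root of n.
    z = effect_size * math.sqrt(n)
    # 2 * (1 - Phi(|z|)) expressed through the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    effect_size = 0.1  # assumed fixed, small standardized effect
    for n in (10, 100, 400, 1000, 10000):
        print(f"n={n:>5}  p={two_sided_p_value(effect_size, n):.4g}")
```

Under these assumptions the observed effect stays the same while the p-value shrinks as n grows, so the same effect is "non-significant" at n = 100 and "significant" at n = 1000 against a 0.05 threshold. Reporting the effect size alongside the p-value makes this dependence visible.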

Keywords