Conclusions from: Statistical Rituals: The Replication Delusion and How We Got There
Original by Gerd Gigerenzer at link below.
Conclusions of Gerd Gigerenzer’s “Statistical Rituals: The Replication Delusion and How We Got There”
(Advances in Methods and Practices in Psychological Science, 2018)
1. The Replication Crisis Is Not Only About Incentives
While much commentary attributes the replication crisis in psychology and the biomedical sciences to external factors such as academic incentives, Gigerenzer argues that this explanation is insufficient. He introduces a complementary one: many researchers have internalized a “statistical ritual” that substitutes mechanical inference for scientific judgment. This internalization creates systematic delusions about the meaning of statistical results, especially p-values.
2. The “Null Ritual” Is a Hybrid Misconstruction
What Gigerenzer terms the null ritual (automatically setting a null hypothesis of zero effect, using a 5% significance threshold, and interpreting results dichotomously) is not found in statistical theory proper. It is a hybrid of Fisher's significance testing and Neyman-Pearson decision theory, combined without acknowledgment of their incompatible philosophical foundations. This ritualistic procedure:
Eliminates researcher judgment.
Encourages automatic interpretation of p-values.
Becomes detached from actual hypotheses or scientific reasoning.
3. Pervasive Delusions About p-values Undermine Scientific Understanding
Empirical studies show that researchers and students widely misinterpret p-values, including:
Believing that p = .01 implies a 99% chance of replication (replication delusion).
Believing that statistical significance proves a hypothesis is true or the null is false (illusion of certainty).
Believing that p is the probability that the null hypothesis is true, or that 1 – p is the probability that the alternative hypothesis is true (Bayesian wishful thinking).
These delusions are common among both instructors and students, and also appear among physicians and medical researchers, where such misunderstandings can have life-threatening consequences.
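A small calculation can make these misreadings concrete. The sketch below is not taken from Gigerenzer's article; it assumes a simplified setting with one point null, one point alternative, a fixed significance level, and a fixed power, and the function names and example values are hypothetical. Under those assumptions, Bayes' rule shows that the probability of the null after a significant result depends on the prior and on power, not on the p-value alone, and that the chance of a successful replication cannot exceed the power of the replication study.

```python
def prob_null_given_significant(prior_null, alpha, power):
    """P(H0 | significant result) by Bayes' rule.

    Assumes a simple two-hypothesis world: H0 is true with probability
    prior_null, a true H0 yields significance with probability alpha,
    and a true H1 yields significance with probability power.
    """
    false_pos = prior_null * alpha
    true_pos = (1.0 - prior_null) * power
    return false_pos / (false_pos + true_pos)


def prob_replication(prior_null, alpha, power):
    """P(an identical second study is also significant | first one was)."""
    p_null = prob_null_given_significant(prior_null, alpha, power)
    return p_null * alpha + (1.0 - p_null) * power


if __name__ == "__main__":
    # Hypothetical priors and power values, chosen only for illustration.
    for prior_null in (0.5, 0.9):
        for power in (0.8, 0.35):
            post = prob_null_given_significant(prior_null, 0.05, power)
            rep = prob_replication(prior_null, 0.05, power)
            print(f"P(H0)={prior_null:.1f} power={power:.2f} "
                  f"P(H0|sig)={post:.3f} P(replication)={rep:.3f}")
```

In this toy setting the replication probability is a weighted average of alpha and power, so it stays well below the 99% that the replication delusion would read off a p-value of .01.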
4. Statistical Power Is Neglected
Despite the critical role of power in detecting real effects and ensuring replicability:
The average statistical power in psychology remains below 50%, so a typical study has less than a coin flip's chance of detecting a real effect.
There has been no noticeable improvement in power over five decades, despite repeated warnings.
This suggests that researchers are not designing studies optimally, even though doing so would serve their strategic interests, which supports the notion of an internalized ritual rather than strategic opportunism.
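To see what power below 50% means in practice, power can be approximated before any data are collected. The following is a minimal sketch, assuming a two-sided, two-group comparison with equal group sizes and a normal approximation to the t-test; the function name, the effect size d = 0.5, and the sample sizes are illustrative assumptions, not figures from the article.

```python
from statistics import NormalDist


def approx_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    Uses the normal approximation: the test statistic is treated as
    normal with unit variance and mean d * sqrt(n / 2) under H1.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size_d * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


if __name__ == "__main__":
    for n in (20, 64):
        print(f"d=0.5, n per group={n}: power ~ {approx_power(0.5, n):.2f}")
```

Under these assumptions, 20 participants per group give a power of roughly 0.35 for a medium effect, while about 64 per group are needed to reach roughly 0.8.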
5. Questionable Research Practices Are Widespread
Anonymous surveys indicate that a vast majority of psychologists admit to practices such as:
Selectively reporting significant outcomes.
Collecting more data until significance is achieved.
Rounding p-values down so that they cross the threshold (for example, reporting p = .054 as p < .05).
These practices reflect the dominance of significance as a surrogate goal, overriding concern for methodological rigor or theoretical clarity.
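The cost of one of these practices, collecting more data until significance is achieved, can be illustrated with a simulation. This is a hedged sketch rather than an analysis from the article: it assumes a one-sample z-test with known standard deviation 1, data generated under a true null, and a hypothetical schedule of interim looks; all names and parameter values are illustrative.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_start, n_max, step, n_sims=2000, alpha=0.05, seed=1):
    """Share of null simulations declared significant at any look.

    The researcher tests after n_start observations, then after every
    further `step` observations, and stops at the first p < alpha or
    when n_max observations have been collected.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        while n < n_max:
            batch = min(n_start if n == 0 else step, n_max - n)
            total += sum(rng.gauss(0.0, 1.0) for _ in range(batch))
            n += batch
            # z statistic for mean 0 and known sd 1: sum / sqrt(n).
            if abs(total) / n ** 0.5 > z_crit:
                hits += 1
                break
    return hits / n_sims


if __name__ == "__main__":
    fixed = false_positive_rate(n_start=50, n_max=50, step=10)
    peeking = false_positive_rate(n_start=10, n_max=50, step=10)
    print(f"single test at n=50:   {fixed:.3f}")
    print(f"test after every 10:   {peeking:.3f}")
```

With a single planned test, the false positive rate stays near the nominal 5%. With five looks at the data it rises well above that, even though no real effect exists in the simulated data.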
6. The Replication Crisis Is Rooted in Surrogate Scientific Practices
Gigerenzer argues that statistical significance has replaced scientific judgment as the main criterion of value in research. This reflects a broader trend:
Surrogate indicators (p-values, h-indices, publication counts) have displaced direct evaluation of scientific merit.
Administrators and reviewers often rely on these numerical proxies, undermining substantive evaluation and fostering ritualized publication behavior.
7. Proposed Reforms to Restore Scientific Judgment
Gigerenzer proposes four reforms aimed at breaking the ritualistic cycle:
Stop labeling results as “significant” or “non-significant”; report exact p-values instead.
Clearly separate exploratory from confirmatory research; p-values should only appear in the latter.
Encourage competitive hypothesis testing rather than null-hypothesis testing.
Teach a statistical toolbox in psychology (covering Fisher, Neyman-Pearson, Bayesian inference, exploratory data analysis, and meta-analysis), with emphasis on judgment, context, and assumptions rather than rote procedures.
8. Final Message: Return to Judgment-Based Science
The replication crisis reflects a deeper epistemological failure: the replacement of reasoning with routine. To address this, science must:
Reassert judgment, replication, transparency, and model-based reasoning as primary virtues.
Abandon the ritual of significance testing as a universal gateway to publication and credibility.
Gigerenzer concludes that meaningful scientific progress requires fewer but better studies, guided not by mechanical thresholds, but by clear hypotheses, methodological care, and openness to uncertainty.

