Statistical Rituals: The Replication Delusion and How We Got There
Gerd Gigerenzer gigerenzer@mpib-berlin.mpg.de
https://doi.org/10.1177/2515245918771329
Note: Although I was trained in statistics for experimental psychologists in graduate school, I now question the utility of those methods. I have been looking into this over the last few months. Many of the critiques are very technical and beyond my pay grade. Given the endemic use of flawed statistical methods in areas of critical importance such as medicine, I think the scientific world needs to wake up (however unlikely that may be). Here is one perspective that I think is mostly correct:
Summary by ChatGPT 4.0
Gerd Gigerenzer's 2018 article, "Statistical Rituals: The Replication Delusion and How We Got There," delves into the replication crisis in psychological science, attributing it to both external incentives and internal methodological practices.
The Replication Crisis and External Incentives
Gigerenzer begins by highlighting the replication crisis, where numerous scientific findings, especially in psychology and biomedical sciences, have failed to be reproduced. He acknowledges that external factors, such as the "publish or perish" culture and the emphasis on metrics like the h-index, have incentivized researchers to prioritize quantity over quality. This environment encourages practices that may compromise the integrity of scientific research.
The Null Ritual and Its Consequences
Beyond external pressures, Gigerenzer introduces the concept of the "null ritual," a methodological approach prevalent in psychological research. This ritual involves:
Setting up a null hypothesis of no effect.
Using a 5% significance level to decide whether to reject the null hypothesis.
Concluding that a significant result confirms the research hypothesis.
He argues that this ritualistic adherence eliminates critical judgment at points where statistical theories demand it, leading to several misconceptions:
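Replication Delusion: The mistaken belief that a p-value directly indicates the probability of replicating a result. For instance, a p-value of 0.01 is misread as a 99% chance that a replication will succeed. A p-value is computed under the assumption that the null hypothesis is true, so it says nothing direct about how often a repeated study would again reach significance. Gigerenzer reports that this delusion exists among 20% of faculty teaching statistics, 39% of professors and lecturers, and 66% of students.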
Illusion of Certainty: The false assumption that statistical significance proves the existence of an effect, disregarding the possibility of Type I errors or the influence of sample size.
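Bayesian Wishful Thinking: The incorrect interpretation that the probability of the alternative hypothesis being true is 1 minus the p-value. The p-value is the probability of obtaining data at least as extreme as those observed if the null hypothesis is true; it is not the probability of any hypothesis given the data, which would require a prior probability and Bayes' rule.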
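The first and third misconceptions can be checked with a little arithmetic. The sketch below is not from Gigerenzer's article; it assumes a simple two-group z-test with unit variance, and the function names, effect sizes, prior, and power values are all hypothetical choices made for illustration. It shows that the chance of a significant replication is set by the true effect and the sample size, and that the probability of the research hypothesis after a significant result comes from Bayes' rule rather than from 1 minus p.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def replication_probability(true_effect, n_per_group, alpha=0.05):
    """Chance that a fresh two-group study is significant in the positive direction.

    Assumes a two-sided z-test on two groups with unit variance; true_effect is
    the real mean difference in standard-deviation units (hypothetical setup).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected z statistic: true difference divided by its standard error sqrt(2/n).
    expected_z = true_effect * sqrt(n_per_group / 2)
    return 1 - STD_NORMAL.cdf(z_crit - expected_z)


def posterior_h1_given_significant(prior_h1, power, alpha=0.05):
    """P(H1 | significant result) by Bayes' rule, for an assumed prior and power."""
    true_positives = prior_h1 * power
    false_positives = (1 - prior_h1) * alpha
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Whatever p-value the original study reported, replication depends on the
    # true effect and the sample size of the new study.
    for effect in (0.0, 0.2, 0.5, 0.8):
        prob = replication_probability(effect, n_per_group=20)
        print(f"true effect {effect:.1f}: replication probability {prob:.3f}")

    # Illustrative values only: 10% of tested hypotheses true, 50% power.
    print(f"P(H1 | significant) = {posterior_h1_given_significant(0.1, 0.5):.3f}")
```

If there is no true effect, the replication probability under these assumptions is alpha / 2 = 0.025, nowhere near the 99% that the replication delusion reads off p = 0.01. With the illustrative prior of 0.1 and power of 0.5, the probability of the research hypothesis given a significant result is 10/19, or about 0.53.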
Historical Context and Educational Implications
Gigerenzer traces the origins of these statistical rituals to the mid-20th century, noting that they have become deeply embedded in the training and practice of psychologists. He emphasizes that these rituals are not inherent to statistical theory but have been perpetuated through education and practice without sufficient critical examination.
Recommendations for Reform
To address these issues, Gigerenzer advocates for:
Educational Reform: Transitioning from teaching statistical rituals to fostering genuine statistical thinking. This involves encouraging critical judgment and a deeper understanding of statistical concepts rather than rote application of procedures.
Editorial Policy Changes: Urging journal editors to move away from accepting manuscripts based solely on the dichotomy of 'significant' or 'not significant' results. Instead, a more nuanced interpretation of data should be encouraged, considering effect sizes, confidence intervals, and the broader context of findings.
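As a rough illustration of what reporting beyond 'significant' or 'not significant' could look like, the sketch below computes a mean difference, a standardized effect size (Cohen's d), and an approximate confidence interval for two groups. It is an assumption-laden example rather than a procedure taken from the article: the function name and the data are made up, and the interval uses a large-sample normal approximation, so a t-based interval would be wider for samples this small.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def effect_summary(treatment, control, confidence=0.95):
    """Mean difference, Cohen's d, and a normal-approximation confidence interval."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    diff = mean(treatment) - mean(control)

    # Cohen's d: the difference expressed in pooled standard-deviation units.
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    cohens_d = diff / pooled_sd

    # Standard error of the difference and the matching normal quantile.
    std_error = sqrt(s1**2 / n1 + s2**2 / n2)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    return {
        "mean_difference": diff,
        "cohens_d": cohens_d,
        "ci": (diff - z * std_error, diff + z * std_error),
    }


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    summary = effect_summary([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])
    low, high = summary["ci"]
    print(f"difference = {summary['mean_difference']:.2f}")
    print(f"Cohen's d  = {summary['cohens_d']:.2f}")
    print(f"95% CI     = ({low:.2f}, {high:.2f})")
```

A summary of this kind tells the reader how large the difference is and how precisely it was estimated, which a bare 'p < .05' does not.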
Conclusion
Gigerenzer concludes that the replication crisis is not solely a product of external incentives but is also deeply rooted in internal methodological practices that have become ritualized over time. By challenging these rituals and promoting a more thoughtful and critical approach to statistical analysis, the scientific community can enhance the reliability and validity of research findings.
For a comprehensive understanding, the full article is available at https://doi.org/10.1177/2515245918771329