Understanding the World: Statistical Inference?
Could just as likely be somebody's misunderstanding, quite possibly mine. I learned to do the calculations but that does not equate to understanding does it? I don't know if I am brave or misguided.
Author’s Preface
I wade into realms where I am arguably under-powered. Under-powered, of course, is a term co-opted by statisticians and given a technical meaning, but here it is a metaphor. However, my opinions here may be suspect. They may stem from my inadequate training in statistical thinking, which was geared towards application rather than conceptual foundations, and which happened a very long time ago. They may just be an aspect of my cognitive decline. Still, let’s have at it:
The null hypothesis is a default assumption that there is no effect, no difference, or no relationship in the context of a study. Statistical significance testing evaluates whether observed data are unlikely under this assumption. If the data deviate sufficiently from what the null hypothesis predicts—based on a chosen probability threshold (e.g., 5%)—the null hypothesis is rejected, suggesting the data are inconsistent with H₀. However, this rejection does not prove any specific alternative; it merely indicates that H₀ may not adequately explain the observed results.
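The mechanics just described can be sketched in a few lines of Python using only the standard library. This is a hedged illustration, not anyone's canonical procedure: the coin-flip scenario, the numbers (61 heads in 100 flips), and the 5% threshold are all hypothetical, chosen purely to show what the procedure does and does not deliver.

```python
from math import comb

def binomial_two_sided_p(k, n, p0=0.5):
    """Exact two-sided binomial p-value: the probability, computed under H0
    (success probability p0), of any outcome no more likely than the one
    observed."""
    pmf = [comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(n + 1)]
    observed = pmf[k]
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))

# Hypothetical study: H0 says the coin is fair; we observe 61 heads in 100 flips.
p_value = binomial_two_sided_p(61, 100)
reject = p_value < 0.05  # "reject H0" at the conventional 5% threshold

# Note what the procedure does NOT deliver: rejecting H0 asserts only that the
# data would be improbable if H0 were true; it affirms no specific alternative.
```

Here the p-value comes out at roughly 0.035, so H₀ is "rejected" at the 5% level, yet the calculation says nothing at all about which alternative, if any, is true.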
In logic, we have the assertion: A or B; not A; therefore, B, or alternatively, not B; therefore, A. That's just simple, traditional logic. However, frequentist statisticians seem to operate in a world that is illogical—at the very least, I would say illogical and incoherent with regards to testing a null hypothesis—failing to affirm that rejection of the null hypothesis implies that some other hypothesis is true. We don't have to specify a hypothesis; we just have to say that some other hypothesis is true.
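The contrast with ordinary logic can be written out explicitly. Disjunctive syllogism licenses a definite conclusion; significance testing licenses only a probability statement conditioned on H₀, which entails no alternative:

```latex
% Disjunctive syllogism: a definite conclusion follows.
(A \lor B),\ \neg A \ \vdash\ B
% Significance testing: "rejection" at level \alpha asserts only that the
% observed data would be improbable were H_0 true ...
p = \Pr(\text{data at least this extreme} \mid H_0) < \alpha
% ... and that assertion does not entail any alternative hypothesis:
p < \alpha \ \not\vdash\ H_1
```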
And saying that the truths are only probabilistic has nothing to do with the incoherence of their position. So, they are not operating in the world of logic; they're operating in some other realm only known to statisticians and not understood by me, since I can't make heads or tails of it. I speculate that the statisticians who embrace that illogic are not clear-headed at all. They're living in some sort of Baroque fantasy world where up is down and left is right, and their assertions don't hold water, probabilities or not. Their assertions make no sense. Either that, or they are equivocating in that statistical null hypothesis rejection does not mean what rejection means in the real world. It must mean something else, like eating ice cream or going to the dentist, as opposed to rejection. On the other hand, I may just be an idiot.
It is not just real-world reasoning we're talking about; we're talking about the real-world meaning of words. And the statisticians have used an equivocation in meaning without people recognizing that's what they're doing. Their use of the word "rejection" is actually highly idiosyncratic, to say the least—entirely misleading, in fact—and people haven't noticed just how misleading the word "rejection" is. Or maybe statisticians know, but they're not letting the rest of the world know. And certainly, those people studying statistics without being mathematicians don't understand that that's not what statisticians mean by the word "rejection."
So, in their probabilistic fantasy world, their model is conceptually flawed. I don't mean mathematically flawed; I mean the concepts—the mapping to the real world—make no sense. They're incoherent, and no amount of hand-waving by statisticians will make them coherent.
We have this tripartite division of ways of looking at statistics:
1 – Deductive truth: There's mathematics, a formal system, which is true.
2 – Logical coherence of meaning: There's the application of that formal system to the real world, which is a conceptual mapping—others such as Box have written about this with far greater clarity than I can.
3 – Fitness for purpose: We have the application of mathematical tools for dealing with problems. That's a third, separate realm, entirely different.
Those are three different realms, and you need to look at all three, because we can compare all three pairwise:
Deductive truth to logical coherence of meaning: The mathematics will be deductively valid, but the explanations could be conceptually unsound.
Deductive truth to fitness for purpose: The success, or fitness for purpose, that the techniques have in the real world must be empirically verified, or else merely hypothesized; many of these claims cannot be empirically verified, because of the complexity of the world, if nothing else.
Logical coherence of meaning to fitness for purpose: The logical coherence of meaning has nothing to do with success or failure in the real world. That's empirical.
So, people seem to conflate those three realms without understanding that they've done so, because they don't make a clear distinction between the mathematics itself and how we conceptually explain what it's trying to do in the real world (mapping, if you will).
Of course it is a linguistic consideration; that is the point. Statisticians misuse language, making their meaning problematic (incoherent would be a better word). The goal of language is to communicate (unless one is a politician). If the conceptual foundations are to be logically sound in their application to the real world, then statisticians need to use something other than the word rejection, since that is a very technical and misleading term. We could come up with alternative wording, but, for instance, "incompatibility with the null hypothesis" still implies that there is compatibility with some unspecified alternative hypothesis, so this is still linguistic sleight of hand. Ditto for "non-retention": this implies, in any system of logic, that retention must otherwise be the case. Still linguistic incoherence. The mathematics may work, but the flaw is one of logic and conceptual incoherence. I know I am not the first person to make that observation.
The defense on practical grounds is conflated with the conceptual foundations; these are independent. Moreover, success cannot be conclusively demonstrated in complex realms, since this requires empiricism, and not just simulation either. Such demonstration is not even possible in many complex cases. It is taken as a matter of faith that the inference techniques work in complex non-linear domains, based on no good evidence at all. It is just supposition: faith-based, not proven, and not provable.
ChatGPT Goes to Work on My Arguments (with a bit of coaching – such a sophist!)
Introduction
Statistics occupies a unique space where mathematical rigor meets real-world application. However, its foundational frameworks, such as Null Hypothesis Testing (NHT), often provoke debates about their conceptual soundness, logical coherence, and practical utility.
This discussion seeks to critically evaluate whether the frequentist reliance on NHT aligns with logical principles, adequately maps to real-world phenomena, and fulfills its intended purposes, particularly in complex domains where empirical validation is elusive. While acknowledging limitations in formal training, this critique raises questions central to the effective and coherent application of statistical inference.
Issues:
Conceptual Soundness: Statistical assertions must have linguistic and logical coherence with respect to real-world phenomena. Null hypothesis testing, as structured in frequentist statistics, fails this standard because its terminology (e.g., "rejection") is equivocal and does not logically map to real-world implications or reasoning. The terminology often creates ambiguity and is fundamentally incoherent about what is being demonstrated or implied.
Empirical Validation in Complex Domains: In complex, non-linear systems, it is frequently impossible to empirically validate inference techniques. Challenges such as evolving conditions, confounding factors, and lack of replicability in these sorts of cases make definitive testing infeasible, not to mention numerous practical issues. Without empirical validation, claims of success remain speculative.
Faith-Based Assumptions: The continued reliance on statistical methods in complex systems often rests on assumptions rather than demonstrated evidence. These assumptions are not empirically justified but are instead taken on faith, relying on tradition, intuition, or the perceived success of these methods in simpler contexts. This faith-based reliance lacks rigorous proof and is fundamentally unprovable in many real-world scenarios.
My claims center on the fundamental incoherence of null hypothesis testing (NHT) and the inadequacy of explanations offered in its defense.
1. Fundamental Incoherence of NHT
My argument is that NHT, as practiced in frequentist statistics, fails a test of conceptual soundness because it does not adhere to coherent reasoning or logical principles that map effectively to the real world. Specifically:
Rejection of H₀: The rejection of the null hypothesis does not logically imply that any alternative hypothesis is true. Instead, it represents an ambiguous state where H₀ is deemed "incompatible" with the observed data at the chosen significance level, but no further positive claim is made about the truth of any other hypothesis.
Linguistic Equivocation: The term "rejection" carries misleading implications. In logic or everyday reasoning, "rejection" implies a definitive judgment—if A is rejected, not-A is true. However, in NHT, rejection does not have this logical clarity. It represents a probabilistic judgment tied to arbitrary thresholds (e.g., p < 0.05), which undermines its conceptual coherence.
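The arbitrariness of the threshold is easy to exhibit. A minimal Python sketch, with a hypothetical test statistic (z = 2.2) chosen so that the two most common conventions disagree about "rejection":

```python
from math import erfc, sqrt

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return erfc(abs(z) / sqrt(2))

z = 2.2                      # hypothetical test statistic; p comes out near 0.028
p = two_sided_p_from_z(z)

# Identical data, opposite "decisions", depending on an arbitrary convention:
reject_at_5_percent = p < 0.05   # True: H0 "rejected"
reject_at_1_percent = p < 0.01   # False: H0 "not rejected"
```

Nothing about the data changes between the two lines; only the convention does, which is the sense in which "rejection" is a probabilistic judgment tied to an arbitrary threshold rather than a logical verdict.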
My critique is not about misunderstanding probabilistic reasoning—it is about the internal inconsistency of NHT itself. By adopting terms like "rejection," statisticians create an appearance of decisiveness that is not logically or conceptually justified.
2. Inadequacy of Hand-Waving Defenses
I assert that frequentist statisticians often rely on hand-waving explanations to defend NHT, sidestepping its conceptual flaws rather than addressing them directly:
Practical Success: One common defense is that NHT "works" in practice. However, I argue that practical success is an empirical claim, requiring rigorous validation. In complex, non-linear domains, such validation is often impossible. Therefore, appeals to practical success lack the necessary empirical grounding.
Probabilistic Nature: Another defense is that NHT operates in a probabilistic rather than deterministic framework. While this is true, I argue that it does not resolve the incoherence of NHT’s conceptual foundations. Probabilistic reasoning can still be logical and coherent, but NHT fails to meet this standard because its terminology and reasoning do not align with real-world meanings or logical principles.
Faith-Based Assumptions: Defenders of NHT often rely on faith rather than evidence to assert its validity in complex systems. For example, they assume that p-values reliably indicate the strength of evidence against H₀, despite significant theoretical and practical critiques of p-values as measures of evidence.
Key Issues
1. Conceptual Incoherence
NHT's conceptual framework does not make sense when mapped to real-world reasoning or linguistic meanings. The equivocation in terms like "rejection" and the lack of logical clarity make its assertions incoherent. The probabilistic nature of NHT does not excuse this incoherence—it merely shifts the focus to a different kind of ambiguity.
2. Misleading Terminology
The use of terms like "rejection" and "non-retention" obscures the true nature of what NHT is claiming. These terms carry logical implications in everyday and formal reasoning that do not align with their statistical usage. This equivocation misleads practitioners and non-specialists into believing that NHT provides a level of certainty or decisiveness that it does not.
3. Inadequate Empirical Justification
In complex systems, where real-world validation is infeasible, the success of NHT is taken on faith rather than proven. Simulations are inadequate because they often rely on assumptions that circularly validate the methods under scrutiny. As a result, NHT's effectiveness in these domains remains speculative.
4. Conflation of Realms
Defenders of NHT often conflate:
The mathematical validity of NHT as a formal system.
The logical coherence of its conceptual framework.
The practical success of its applications.
This conflation obscures fundamental issues, as success in one realm does not imply validity in the others. For example, mathematically valid procedures can still be conceptually incoherent or practically useless.
Conclusion
My critique is not about misunderstandings of probabilistic reasoning but about the fundamental conceptual incoherence of NHT as a framework. This incoherence lies in its misuse of language, lack of logical clarity, and reliance on inadequate defenses like faith-based assumptions and hand-waving explanations.
My claim is this: Frequentist statisticians are operating in a fundamentally flawed conceptual paradigm that obscures its incoherence through equivocation, tradition, and misplaced defenses. This critique challenges not just the details of NHT but its foundational legitimacy as a tool for statistical inference.
My critique centres on both the philosophical and practical soundness of statistical inference techniques. I hope I have not misstated the case, and that my assertions are not factually incorrect.
Have I misstated the case? Is my interpretation based on incorrect assertions as to matters of fact? I really don’t know, but others more versed in these matters have come to similar views. Here is a reading list that pertains to the issues.
Readings:
Box, G. E. P. (1976). Science and Statistics. Journal of the American Statistical Association, 71(356), 791–799. Retrieved from https://mkweb.bcgsc.ca/pointsofsignificance/img/Boxonmaths.pdf
George Box's 1976 paper, "Science and Statistics," is particularly relevant to the discussion on the conceptual coherence of statistical models and their application to real-world phenomena. In this work, Box famously stated, "All models are wrong, but some are useful," emphasizing the importance of understanding the limitations and practical utility of statistical models.
Schneider, J. (2014). The Misunderstood Role of Null Hypothesis Testing in Statistical Science. arXiv. Retrieved from https://arxiv.org/abs/1402.1089
Meehl, P. E. (1967). Theory-Testing in Psychology and Physics: A Methodological Paradox. Philosophy of Science, 34(2), 103–115.
Gill, J. (1999). The Insignificance of Null Hypothesis Significance Testing. Journal of Politics. Retrieved from https://www.jstor.org/stable/pdf/449153.pdf
Rozeboom, W. W. (1960). The Fallacy of the Null-Hypothesis Significance Test. Psychological Bulletin. Retrieved from https://stats.org.uk/statistical-inference/Rozeboom1960.pdf
Tendeiro, J. N., & Kiers, H. A. L. (2019). Advantages Masquerading as Issues in Bayesian Hypothesis Testing. Behavior Research Methods. Retrieved from https://research.rug.nl/en/publications/advantages-masquerading-as-issues-in-bayesian-hypothesis-testing