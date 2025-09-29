

Varieties of Statistics

Divergent Traditions within Frequentism

Frequentist statistics is not a monolith. It has historically divided into several sub-schools, each with its own emphases and philosophical commitments. The original Fisherian approach, championed by R. A. Fisher, focused on methods such as maximum likelihood estimation, analysis of variance, and the use of p-values as indicators of significance. Fisher himself did not view the p-value as a definitive decision rule but rather as a measure of evidence against the null hypothesis.

By contrast, Jerzy Neyman and Egon Pearson developed a different framework centered on long-run error control. They introduced the concepts of Type I error (false positives) and Type II error (false negatives), as well as the notion of statistical power. Their view treated hypothesis testing as a form of decision-making under uncertainty, where one must balance the risks of different kinds of errors.
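The error trade-off Neyman and Pearson formalized can be made concrete in the simplest textbook setting. The Python sketch below assumes a two-sided one-sample z-test with a known standard deviation; the function name `z_test_power` and the effect sizes are hypothetical and purely illustrative. With no true effect, the rejection rate equals the Type I error rate alpha; with a real effect, it is the power, and 1 − power is the Type II error rate.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1

def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Rejection probability of a two-sided one-sample z-test (known sd).

    effect_size: true mean shift divided by the known standard deviation.
    n: sample size. alpha: chosen Type I error rate.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # reject when |Z| > z_crit
    shift = effect_size * n ** 0.5              # mean of Z when the effect is real
    # P(|Z| > z_crit) when Z ~ Normal(shift, 1)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

# No true effect: the rejection rate is just the Type I error rate.
print("Type I error:", round(z_test_power(0.0, n=50), 4))

# A modest true effect: power rises with n, Type II error (1 - power) falls.
for n in (20, 50, 100):
    power = z_test_power(0.3, n)
    print(f"n={n}: power={power:.3f}, Type II error={1 - power:.3f}")
```

These numbers are only as trustworthy as the model behind them (normality, known variance, independent observations), which is exactly the kind of assumption examined below.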

What emerged over time, especially in the postwar sciences, was a hybrid system that drew on both traditions. This system—now commonly called null-hypothesis significance testing (NHST)—combined Fisher’s p-value machinery with Neyman and Pearson’s binary accept/reject framework. The result was a method that became standardized in psychology, medicine, nutrition, and economics, even though it was not quite what Fisher, Neyman, or Pearson originally envisioned.

The Meta-Assumptions Underlying Frequentism

Despite their technical trappings, all of these approaches rest on meta-assumptions—assumptions about the applicability of the model to the world. These assumptions are not proven by the mathematics itself. They are imported from outside, often implicitly.

1. The assumption that the probabilistic model applies to the real-world process.

2. The assumption that the conditions of the central limit theorem, or one of its variants, are met in the situation under study.

3. The assumption that data behave as though they are sampled from a fixed, unchanging distribution.

4. The assumption that the sample is representative of the population to which results are generalized.

5. The assumption that arbitrary thresholds (5%, 1%) meaningfully divide true from false effects.

Without these, the frequentist framework loses its link to reality. The mathematics may still be internally consistent, but it would no longer be a description of the world.

Fragility in the Soft Sciences

The weaknesses of these assumptions become most evident in the soft sciences. In psychology, medicine, nutrition, and economics, the conditions needed for tractable probability distributions rarely hold. Human beings and their environments are not like dice or coins. They change with every iteration. Situations are confounded by countless variables, many unmeasured or unmeasurable. Replication is often unstable.

And yet, frequentist statistics has been widely applied to these domains. Researchers treat human trials as if they were coin flips, relying on large-sample approximations to justify probabilistic claims. But the conditions that make dice rolls tractable—independence, identical distribution, and stable repetition—do not hold for human populations, therapies, or social behavior. This mismatch between model and reality undermines the credibility of the conclusions.

The Role of the Central Limit Theorem

Central to frequentist reasoning is the central limit theorem (actually, a family of theorems). In its classical form, it states that the standardized average of independent, identically distributed random variables with finite variance converges in distribution toward a normal distribution as the sample size increases. In statistics, these results are used as a bridge between messy data and the mathematically elegant normal curve.
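A small simulation shows what the theorem promises when its conditions are satisfied by construction. The Python sketch below averages simulated fair-die rolls, which are independent and identically distributed by design (precisely the assumption at issue); the helper name `sample_means`, the seed, and the sample sizes are illustrative choices. The spread of the averages tracks the theoretical standard error sd/√n, and the share of averages within 1.96 standard errors of 3.5 is 1.0 for single rolls (one die is not normal) but lands near the normal-curve value of about 0.95 for larger n.

```python
import random
import statistics

DIE_MEAN = 3.5
DIE_SD = (35 / 12) ** 0.5  # population sd of a single fair-die roll

def sample_means(n: int, reps: int, seed: int = 0) -> list[float]:
    """Averages of n simulated fair-die rolls, repeated reps times."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [statistics.fmean(rng.randint(1, 6) for _ in range(n)) for _ in range(reps)]

for n in (1, 5, 30):
    means = sample_means(n, reps=5000)
    se = DIE_SD / n ** 0.5  # standard error predicted by theory
    # Share of averages within 1.96 standard errors of the true mean
    within = sum(abs(m - DIE_MEAN) <= 1.96 * se for m in means) / len(means)
    print(f"n={n:>2}: sd of means={statistics.stdev(means):.3f}, "
          f"theory={se:.3f}, share within 1.96 SE={within:.3f}")
```

The simulation confirms the mathematics only for the mathematical die; whether a physical process, or a group of patients, behaves the same way is a separate, empirical question.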

But applying this theorem to the real world requires a leap of faith. One must assume that the data-generating process behaves like the random variables described in the theorem. One must assume that independence and identical distribution, though almost never literally true, hold “well enough” for the purposes of inference. These are not conclusions of the theorem; they are meta-assumptions imposed on the situation by the researcher.

The Null Hypothesis and Its Problems

From this mathematical scaffolding arises the null hypothesis—the assumption that there is no effect, no difference, no association. In many contexts, this is not only plausible but typical: the usual state of affairs is no effect at all. Most treatments, interventions, and supposed causal relationships turn out to be ineffective or weak when tested rigorously.

Yet the statistical framework does not allow us to directly estimate the probability of a real effect. Instead, it allows us only to say: if there were no effect, this is the probability of obtaining results at least as extreme as those observed. That is a very different claim. It is conditional, indirect, and easy to misinterpret. It says nothing about the probability that the effect is real; it only tells us how probable such data would be under the assumption of no effect.
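To make the conditional character of the p-value explicit, here is a Python sketch for the simplest case the essay keeps returning to, a coin. It assumes an exact binomial test of the null hypothesis that the coin is fair; the function names and the example of 60 heads in 100 flips are hypothetical. Every probability in it is computed as if the null were true, so the result is P(data at least this extreme | no effect), never P(effect is real | data).

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float = 0.5) -> float:
    """P(exactly k heads in n flips) for a coin with heads-probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p_value(heads: int, flips: int) -> float:
    """P(a head count at least as far from flips/2 as observed | fair coin)."""
    observed_gap = abs(heads - flips / 2)
    return sum(
        binomial_pmf(k, flips)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= observed_gap  # "at least as extreme" under H0
    )

p = two_sided_p_value(60, 100)
print(f"p-value for 60 heads in 100 flips: {p:.4f}")
```

Nothing in the calculation uses, or yields, a probability that the coin is biased; that quantity lies outside the frequentist computation.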

Arbitrary Cutoffs and Their Consequences

Adding to the problem is the widespread use of arbitrary thresholds: 5% or 1% significance levels. These cutoffs are conventions, not laws of nature. They were adopted historically for convenience, but they now function as rigid markers of credibility. Results just above 0.05 are dismissed as “not significant,” while results just below are celebrated as “statistically significant.” The difference, however, is trivial, often a matter of a few thousandths.

This reliance on thresholds illustrates the irrationality embedded in the system. A result at p = 0.051 and a result at p = 0.049 are practically indistinguishable, yet they are treated as categorically different.
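The arithmetic behind these two p-values can be checked directly. This Python sketch assumes the common case of a two-sided test with a normally distributed test statistic and converts each p-value back into its |z| value; the helper name `z_from_two_sided_p` is hypothetical.

```python
from statistics import NormalDist

ALPHA = 0.05  # the conventional cutoff

def z_from_two_sided_p(p: float) -> float:
    """|z| that produces two-sided p-value p under a standard normal null."""
    return NormalDist().inv_cdf(1 - p / 2)

for p in (0.049, 0.051):
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"p = {p}: |z| = {z_from_two_sided_p(p):.3f} -> {verdict}")

gap = z_from_two_sided_p(0.049) - z_from_two_sided_p(0.051)
print(f"difference in evidence: {gap:.3f} standard errors")
```

The two test statistics differ by less than two hundredths of a standard error, yet the cutoff assigns them opposite verdicts.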

From Experiments to Populations

Perhaps the most questionable step of all is the generalization from experimental groups to entire populations. Researchers often know that their samples are not representative. Psychology studies are conducted on college undergraduates; medical studies often rely on small, homogeneous groups; nutritional studies rely on self-reported data with countless confounds. Yet, from these shaky samples, researchers make sweeping claims about human populations.

The leap from sample to population is itself a meta-assumption—one rarely justified and often false. Yet it is performed routinely, because the statistical framework demands it.

A Flawed but Entrenched Practice

Thus, psychology, medicine, nutrition, and related fields are built on this edifice of shaky reasoning. The frequentist tradition, with its reliance on the central limit theorem, the null hypothesis, arbitrary cutoffs, and unjustified generalizations, has been critiqued for generations. Scholars have repeatedly pointed out its flaws. Yet it remains entrenched, partly because of institutional inertia, partly because of a lack of alternatives, and partly because of the illusion of rigor it provides.

In the end, the system persists not because it is logically or empirically sound, but because it has become the norm. As with the proverbial drunk searching for his keys under the lamppost, researchers keep using these methods not because they are appropriate to the problem, but because these are the only tools available where the light shines.

Shifting Grounds of Discourse in Applications Versus Theory

Minimal Assumptions at the Outset

When beginning from first principles, one can adopt a very spare set of assumptions about the real world. A few basic notions are required that connect in some way to lived reality: objects exist, events occur, and outcomes can be distinguished. These serve as anchor points tying abstract reasoning back to the empirical world. Such assumptions are minimal and intentionally modest. They create just enough grounding to allow probabilistic or mathematical discourse to begin, without loading the framework with unnecessary metaphysical baggage.

The Slippage Between Worlds

In practice, however, mathematicians who work with probability often slip between two very different domains of discourse:

1. Real-world events and objects – coins tossed, dice rolled, raindrops falling, or patients responding to treatments.

2. Mathematical events and objects – random variables, outcome spaces, distributions, and functions.

The transition from one to the other happens almost imperceptibly. A “fair die,” for example, begins as a physical artifact in the world, but soon it becomes an idealized object in a probability model. “Outcomes” begin as faces of a cube, but in discourse they quickly morph into elements of a mathematical set. This movement back and forth happens so smoothly that it often goes unnoticed, even by those making the arguments.

The Lack of Acknowledgment

What makes this slippage problematic is not the act of abstraction itself, but the lack of acknowledgment that it has occurred. Few mathematicians pause to say: here we are speaking of the mathematical die, not the physical one. The discourse flows seamlessly from material objects to symbolic structures, as though the two were interchangeable. But they are not. The physical die can be chipped, unbalanced, or influenced by air resistance, while the mathematical die is perfectly symmetrical and infinitely repeatable. The unannounced crossing of boundaries hides the fact that assumptions have been smuggled in during the transition.

The Demands of Discipline

To avoid this conflation would require immense discipline. Every time a shift occurred, the mathematician would need to mark it explicitly: Now we are speaking of the model; now we are returning to the real-world analog. Few, if any, do this consistently. The convenience of discourse—and the shared understanding of mathematical shorthand—encourages the blending of references. It is easier to speak as though the world is the model, rather than carefully distinguishing between the two.

Consequences of the Slippage

This shifting of grounds has consequences. When models are applied back to the real world, the borrowed clarity of mathematics lends a false impression of certainty. The rigor of the symbolic framework is mistaken for rigor in the empirical domain. Claims about real-world outcomes come to be treated as though they were as precise and necessary as the equations themselves, even though the equations apply only under idealized assumptions.

Thus, the constant but unacknowledged oscillation between theory and application is not a trivial matter. It is central to why probabilistic reasoning appears more solid than it really is. The mathematics is internally precise; the world is messy. But in discourse, the two are blurred together, and the limits of applicability are obscured.

Understanding Goes from Concrete to Abstract

The Necessity of Concrete Anchors

Mathematical reasoning, for all its apparent detachment, requires grounding in concrete examples for comprehension. Numbers on a page or symbols in an equation mean little until they are tied to something recognizable: the toss of a coin, the roll of a die, the movement of a planet, the growth of a population. Without these anchors, the symbols remain opaque, inert signs without substance. For most people, even those trained in mathematics, understanding begins not with abstraction but with a clear sense of what the numbers stand for in the world.

Abstraction as a Secondary Step

Once this grounding is secured, abstraction becomes possible. The physical coin becomes the idealized coin; the real-world dice rolls become random variables; the messy trajectories of planets become clean ellipses. At that point, the mind can operate on the abstractions themselves, manipulating them without constant recourse to the physical examples. But abstraction is not primary—it builds on prior familiarity with the concrete. The learning process, both historically and individually, appears to move from example to generalization, from the particular to the universal, from the tangible to the symbolic.

The Universality of the Path

It is doubtful that even the most advanced mathematicians are exempt from this pattern. Behind every theorem lies some image, analogy, or example that first gave it shape. Even if such aids later recede into the background, they remain the scaffolding on which the abstract edifice was constructed. A proof may ultimately be written in symbols alone, but the act of conceiving the problem almost always begins with something concrete—an intuitive picture, a familiar situation, a mental model tied to the world.

Einstein and the Role of Thought Experiments

Einstein provides a well-known case. His revolutionary insights into relativity emerged not from formal manipulation of equations alone, but from thought experiments: riding on a beam of light, watching clocks in moving trains, imagining elevators in free fall. These were not literal experiments; they were highly idealized scenarios, simplified almost to absurdity. Yet they were always grounded in physical considerations—what it means to move, to accelerate, to measure time and light. By stripping away complexities, he could follow the logic of the situation to its conclusions. The abstractions followed, but only after the concrete imaginings.

The Status of Mathematicians and Abstraction

There is a claim—though its accuracy may be debated—that Einstein was not regarded as a “high-level” mathematician by those who specialized in pure mathematics. His genius was not in manipulating symbols at the highest level of abstraction, but in crafting physical scenarios that clarified the stakes of a problem. Whether or not this assessment is fair, it illustrates the point: Einstein’s creativity came not from dwelling in pure abstraction, but from insisting that understanding must begin with something concrete, imaginable, graspable, before one ascends to the rarefied heights of formalism.

Concrete Before Abstract

The broader lesson is that abstraction without prior concreteness is hollow. Symbols must point to something. Proofs must rest on intuitions that are anchored, however loosely, in lived or imaginable experience. The concrete precedes the abstract—not always as a matter of chronology, but as a matter of intelligibility. Even when the mathematics later detaches and soars into its own autonomous domain, its roots remain in the soil of the concrete. Without that soil, the abstract withers into meaningless manipulation.

Validation of Models Against the Empirical World

Success in Simple Domains

There are certain domains where probabilistic reasoning has shown itself to be useful and reliable. These are the domains where the conditions most closely approximate the assumptions of the models: coin flips, dice rolls, roulette wheels, other games of chance, and some areas of industrial quality control such as statistical process control. In these cases, events can be repeated under relatively stable conditions. A coin can be flipped again and again; a die can be rolled thousands of times; a roulette wheel can be spun under consistent mechanical arrangements. In such contexts, empirical frequencies converge toward theoretical distributions. The law of large numbers and central limit tendencies appear to operate with some degree of success.

Limits in Complex or Soft Domains

Outside these narrow domains, however, the situation changes dramatically. In most of the fields where probability and statistics are now used—psychology, medicine, nutrition, economics, education—the conditions for repeatability and stability do not exist. Every trial is somewhat different. Populations shift, environments change, measurement instruments vary, and causes overlap and interact in ways that cannot be cleanly isolated. Replication efforts in these fields have repeatedly failed, often with success rates hovering around 60% at best, sometimes lower. Even that figure may overstate the case, since chance alone could account for part of the apparent replication. The empirical grounding for probabilistic claims is therefore far weaker in these soft domains than in the games of chance from which the statistical models were originally derived.

The Problem of Situatedness

The underlying issue is situatedness. Probabilistic reasoning works only when the conditions of the trials are sufficiently similar to be grouped together as belonging to the same class of events. Dice rolls can be considered equivalent because the die, the table, and the action of rolling are similar enough each time. But when studying human beings, ecosystems, or economic behavior, the situations differ in subtle and not-so-subtle ways that make them non-equivalent. The “class” of events becomes ill-defined. Without repeatability under similar conditions, the link between theoretical distribution and empirical outcome breaks down.

The Leap of Faith

Because of this, much of the application of probability to soft domains rests on faith rather than demonstration. The theoreticians assure us that the models apply, but replication does not bear this out consistently. Some theoreticians themselves acknowledge that the models should not be applied in such contexts, while others push ahead regardless. This creates a conundrum: the very tool—probability theory—used to justify claims of scientific rigor is one that its own practitioners sometimes admit does not apply.

Entrenched Practices and Weak Defenses

Despite these issues, the use of probabilistic models in the soft sciences continues largely unquestioned. Researchers and statisticians employ null-hypothesis testing, confidence intervals, and p-values as though these were settled standards. When challenged, the defense is often weak: what else do we have? But this is not a scientific justification—it is an appeal to convenience. It is the same logic as the drunk searching for his lost keys under the streetlamp, not because he lost them there, but because the light is better. The tools are used not because they fit the problem, but because they are available.

The Challenge of Continuous and Unstable Cases

The difficulty is compounded in cases involving continuous outcomes or unstable processes. Verification requires some stability in the phenomenon under study, enough that repeated measurements under similar conditions produce comparable results. But in many real-world contexts, conditions shift so quickly and outcomes vary so widely that stability is absent. Without stability, the idea of empirical validation collapses. Continuous measures only heighten this difficulty: whereas dice or coins yield discrete, finite outcomes, continuous variables require binning or grouping to even approximate tractability. These groupings are themselves choices, not natural facts, and they add another layer of situatedness.
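The point about binning can be illustrated. The Python sketch below uses simulated normally distributed scores as a stand-in for a continuous outcome (the data, the seed, and the helper name `bin_counts` are all hypothetical) and tabulates the same 1,000 values under three defensible grouping choices. Each choice yields a different frequency table, and nothing in the data dictates which one to use.

```python
import random
from collections import Counter

rng = random.Random(1)
# Hypothetical continuous measurements (e.g., a symptom score), simulated here.
values = [rng.gauss(50.0, 10.0) for _ in range(1000)]

def bin_counts(data: list[float], width: float, origin: float = 0.0) -> dict[float, int]:
    """Count values in half-open bins [origin + k*width, origin + (k+1)*width)."""
    # Each value is mapped to the left edge of the bin that contains it.
    counts = Counter(origin + width * ((x - origin) // width) for x in data)
    return dict(sorted(counts.items()))

# Same data, three grouping choices, three different frequency tables.
print(bin_counts(values, width=10))
print(bin_counts(values, width=10, origin=5))
print(bin_counts(values, width=2.5))
```

Shifting the bin origin by half a width, or narrowing the bins, changes which values are counted together, even though no measurement has changed.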

Convergence in Simple Cases

Nevertheless, in simple and tightly constrained domains, convergence between theoretical and empirical distributions does occur. With enough trials, the pattern begins to resemble the theoretical model. This is why probabilistic reasoning works so well for dice and coins: they provide an environment where assumptions are approximately satisfied, where large numbers of trials can be carried out, and where outcomes are clearly defined and enumerable. In these situations, the models and the world align well enough to justify the mathematics.
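Convergence of this kind is easy to demonstrate in simulation, with one caveat that matters for the essay's argument: the Python sketch below rolls a mathematical die built from pseudo-random numbers, not a physical one, so it checks the model against itself. The function name and seed are illustrative assumptions. As the number of rolls grows, the relative frequency of sixes tends to settle near 1/6, as the law of large numbers predicts.

```python
import random

def relative_frequency_of_six(rolls: int, seed: int = 42) -> float:
    """Share of simulated fair-die rolls that land on 6."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return sum(rng.randint(1, 6) == 6 for _ in range(rolls)) / rolls

for n in (10, 100, 1_000, 10_000, 100_000):
    freq = relative_frequency_of_six(n)
    print(f"{n:>7} rolls: frequency of six = {freq:.4f}, "
          f"gap from 1/6 = {abs(freq - 1 / 6):.4f}")
```

Demonstrating the same convergence for a physical die would require actually rolling one many times under stable conditions, which is the kind of empirical check the soft sciences usually cannot perform.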

The Broader Lesson

The broader lesson is that validation of probabilistic models depends entirely on context. In domains where repetition under similar conditions is possible, probabilistic reasoning is supported by empirical verification. In domains where such repetition is impossible, its application becomes speculative, a matter of faith rather than demonstration. Probabilistic reasoning, therefore, is not universally valid but situated: it holds where conditions allow and falters where conditions cannot be stabilized.

Summary

This essay explores the situated nature of reasoning, emphasizing that both causality and probabilistic descriptions are context-bound. Causality underpins all meaningful understanding of the world, yet its expression varies with scale, circumstances, and the weight of different causal factors. Probabilistic reasoning emerges only when events can be framed as repeatable within defined conditions, such as dice rolls or other constrained systems, and even then it depends on idealizations and meta-assumptions.

The central claim is that variability is the norm, causality provides coherence, and probability is a linguistic and mathematical tool we use to describe patterns within that variability. But such descriptions are always situated—shaped by decisions about what to measure, how to group outcomes, and what contexts to treat as comparable. To mistake probabilistic models for features of the world itself is incoherent; they remain abstractions layered onto a causal and variable reality.

Afterword

Range of Responses

This essay covers a wide terrain, drawing together questions of causality, probability, situatedness, measurement, and the limits of statistical reasoning. Inevitably, the claims advanced here will meet with varied responses. Some readers will find themselves in agreement, recognizing in these arguments reflections of their own doubts about the application of probability in complex or unstable domains. Others will object strongly, defending the prevailing orthodoxy of statistical practice or advancing alternative philosophical accounts of causality and randomness.

Consensus and Truth

Yet consensus itself is not the measure of truth. Agreement may reflect nothing more than shared assumptions or the inertia of institutional convention. Disagreement may arise from entrenched habits of thought or from professional incentives to preserve the status quo. The truth or falsity of the arguments does not depend on how many voices align for or against them. Consensus can lend social weight, but it cannot confer epistemic legitimacy.

Defensibility of the Position

The positions advanced here are defensible, even if they are not uniformly popular. They rest on clear logical distinctions: between variability and probability, between mathematical models and the real-world processes they claim to describe, between epistemic limitation and ontological randomness. These distinctions, once brought into focus, cannot be easily dismissed. They reveal the fragility of much contemporary reasoning and highlight the need for greater caution when applying abstract models to empirical domains.

Intellectual Precedents

Moreover, these concerns are not unique to this essay. They echo themes raised by others—philosophers of science, statisticians, methodologists—many of them far more credentialed, more widely published, and more firmly established in their disciplines. Thinkers such as Nancy Cartwright, with her account of nomological machines; John P. Ioannidis, with his exposure of the replication crisis; Gerd Gigerenzer, with his critique of statistical ritual; and even Einstein, with his skepticism about probabilistic metaphysics, have all voiced concerns that resonate with the arguments presented here.

Final Reflection

If these arguments are not popular, that is to be expected. They challenge habits of thought, institutionalized methods, and professional identities. Yet defensibility does not depend on popularity, and truth does not bend to consensus. The task, then, is to state these points clearly, to preserve the distinctions that matter, and to resist the temptation to confuse mathematical convenience with empirical reality. To do less would be to acquiesce in a set of practices whose flaws are already evident, but which persist more from inertia than from sound justification.

Readings (with annotations)

Probability, Statistics, and Their Limits

Cartwright, N. (1999). The dappled world: A study of the boundaries of science. Cambridge University Press.

— Argues against the universality of scientific laws, emphasizing “nomological machines” and local conditions for probabilistic reasoning. Central to the notion of situated knowledge and the fragility of models.

Gigerenzer, G. (2002). Reckoning with risk: Learning to live with uncertainty. Penguin.

— Critiques conventional probabilistic reasoning, showing how probability can mislead when applied to everyday and social domains. Highlights misunderstandings of statistical models in medicine and public policy.

Hacking, I. (2001). An introduction to probability and inductive logic. Cambridge University Press.

— Explores the foundations of probability, induction, and inference, clarifying conceptual issues around probabilistic claims. Useful for grounding the epistemological critique.

Howie, D. (2002). Interpreting probability: Controversies and developments in the early twentieth century. Cambridge University Press.

— Examines debates over the meaning of probability, including frequentist, Bayesian, and logical interpretations. Situates statistical traditions in historical context.

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.

— A seminal paper exposing the replication crisis in medicine and psychology, illustrating how probabilistic reasoning fails in soft domains.

Mayo, D. G. (1996). Error and the growth of experimental knowledge. University of Chicago Press.

— Examines statistical inference through error-probabilistic philosophy. Useful for analyzing whether probabilistic reasoning really grounds knowledge claims.

Taleb, N. N. (2007). The black swan: The impact of the highly improbable. Random House.

— Critiques the misuse of probability in finance and science, emphasizing rare events and unpredictability. Relevant to arguments about variability, curve fitting, and fragility of models.

Causality, Variability, and Induction

Hume, D. (1748/2007). An enquiry concerning human understanding. Oxford University Press.

— Classical treatment of causality as regularity and induction. Important for situating critiques of causal denial and the limits of “constant conjunction.”

Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd ed.). Cambridge University Press.

— Provides a formal framework for reasoning about causality, contrasting with probabilistic traditions. Essential for distinguishing causal reasoning from probabilistic regularities.

Suppes, P. (1970). A probabilistic theory of causality. North-Holland.

— Explores how probability interacts with causal reasoning. Offers a formal but problematic framework that reflects the tensions discussed in the essay.

Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford University Press.

— Presents an interventionist account of causation, where manipulation and control ground causal claims. Resonates with the essay’s emphasis on causal weight and practical control.

Epistemology and Philosophy of Science

Einstein, A. (1949). Autobiographical notes. In P. A. Schilpp (Ed.), Albert Einstein: Philosopher–scientist (pp. 1–94). Open Court.

— Reveals Einstein’s skepticism toward probabilistic metaphysics, reinforcing the claim that randomness is epistemic, not ontological.

Lakatos, I. (1976). Proofs and refutations: The logic of mathematical discovery. Cambridge University Press.

— Shows mathematics as an evolving, argumentative practice rather than a realm of Platonic certainty. Connects to the essay’s theme of proofs as persuasive arguments.

Polanyi, M. (1966). The tacit dimension. University of Chicago Press.

— Develops the idea that knowledge is situated and cannot be fully formalized. Directly relevant to measurement choices, situated reasoning, and causal weight.

Quine, W. V. O. (1969). Ontological relativity and other essays. Columbia University Press.

— Explores the indeterminacy of meaning and the relativity of conceptual frameworks. Useful for analyzing how probability depends on language and situated descriptions.

Wittgenstein, L. (1953). Philosophical investigations. Blackwell.

— Examines language-games and conceptual confusions, providing tools to analyze pseudo-problems like “is mathematics discovered or invented?”

Wittgenstein, L. (1922/1961). Tractatus logico-philosophicus. Routledge.

— Addresses the limits of language and representation. Connects to the essay’s concerns about slipping between models and the real world.

Replication, Soft Science, and Methodological Critiques

Collins, H. M. (1992). Changing order: Replication and induction in scientific practice. University of Chicago Press.

— Explores the challenges of replicating experiments, especially in soft domains, showing replication as a social and situated process.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.

— Large-scale replication study showing how probabilistic claims in psychology often fail under repeated trials.

Ziliak, S. T., & McCloskey, D. N. (2008). The cult of statistical significance: How the standard error costs us jobs, justice, and lives. University of Michigan Press.

— Critiques null-hypothesis significance testing as a ritual disconnected from practical meaning. Essential for the essay’s critique of frequentism.

Mathematics, Abstraction, and Situated Reasoning

Lakoff, G., & Núñez, R. (2000). Where mathematics comes from: How the embodied mind brings mathematics into being. Basic Books.

— Argues that mathematics arises from human cognition and is grounded in embodied conceptual metaphors. Directly relevant to claims that mathematics is linguistic and situated.

Nagel, E. (1961). The structure of science: Problems in the logic of scientific explanation. Harcourt, Brace & World.

— Classic philosophy of science text examining explanation, laws, and reduction. Helps situate probabilistic reasoning within broader explanatory strategies.

Von Mises, R. (1957). Probability, statistics and truth. Dover.

— Early foundational work arguing for a frequency interpretation of probability. Useful as a foil for critiques of frequentist reasoning.

