Date: 2025-07-01

Author’s Preface

This essay examines the foundations and limits of probabilistic reasoning, focusing particularly on the distinction between closed and open systems. It critiques the possible category mistake made when statistical tools, which were originally developed to model variability in systems with clearly defined and enumerable outcomes, are applied to open systems characterized by complexity, instability, and epistemic opacity. The argument builds upon prior essays in this series concerning the interpretive nature of proof and the fragile applicability of formal mathematical results—such as the Central Limit Theorem—in empirical domains. The discussion includes an analysis of fairness in dice, deterministic modelling, and the replication crisis in the behavioural and biomedical sciences, drawing attention to foundational assumptions that are frequently overlooked or misapplied.

Introduction

Probability theory originated in the analysis of structured systems, such as games of chance, where outcomes could be exhaustively enumerated and modelled with combinatorial logic. The resulting distributions were not empirical artifacts but idealized constructs derived from closed-world assumptions. However, these tools—designed for well-specified domains—have been widely exported to messy, open-world contexts. The central concern of this essay is the unjustified extension of probabilistic reasoning from closed systems to open systems. This move, often presented as if it were a matter of logical necessity, is better understood as a contingent extrapolation that must be empirically validated.

The argument proceeds in five parts: first, by describing the structure and empirical validation of probabilistic reasoning in closed systems; second, by questioning the application of statistical tests like the t-test or ANOVA to such domains; third, by analyzing idealization and the disjunction between mathematical models and real-world systems; fourth, by exploring the concept of fairness as a systemic rather than object-level property; and fifth, by addressing the misuse of probabilistic models in open systems and the foundational flaws this reveals in contemporary scientific practice.

Discussion

Probabilistic Reasoning in Closed Systems

Closed systems are those in which all possible outcomes are known in advance, and where the conditions under which events occur can be enumerated and stabilized. Classic examples include coin flips, dice rolls, and card draws from a standard deck. In these systems, the universe of possibilities is finite, bounded, and symmetric. The probabilities derived from such systems are not empirical in origin but are established through combinatorial reasoning.

Consider a six-sided die. The probability of rolling a three is computed not from observed outcomes over time but from the fact that there are six equally likely outcomes. This value can be tested against empirical reality, by rolling the die many times, to see whether the real-world frequencies converge toward the theoretical distribution. In general, they do, assuming no cheating and a reasonably balanced and symmetrical die. This correspondence gives these models their empirical utility in closed contexts.
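The comparison between the combinatorial value and observed frequencies can be sketched in a few lines. This is a minimal illustration, not a physical experiment: the function names `roll_frequencies` and `max_deviation` are hypothetical, and the "die" is Python's pseudo-random generator, which is itself an idealized closed system.

```python
import random
from collections import Counter

def roll_frequencies(n_rolls, seed=0):
    """Simulate n_rolls throws of an idealized six-sided die.

    Returns the relative frequency of each face. The seed is fixed so
    that the run is repeatable.
    """
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

def max_deviation(freqs):
    """Largest absolute gap between an observed frequency and 1/6."""
    return max(abs(f - 1 / 6) for f in freqs.values())

if __name__ == "__main__":
    # The gap to the theoretical value tends to shrink as rolls accumulate.
    for n in (60, 6_000, 600_000):
        print(n, round(max_deviation(roll_frequencies(n)), 4))
```

The simulation only shows what convergence looks like when the closed-system assumptions hold by construction. It says nothing about any particular physical die.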

The frequentist assertion—that probabilities represent long-run frequencies—hinges on a similarly bounded logic. The phrase “long run” is undefined, but it is usually taken to mean either a large number of trials or, in some formulations, a hypothetical infinite sequence. These assumptions are tenable only in contexts where events are stable, repeatable, and isolated from uncontrolled influences—in short, only in closed systems.

Misuse of Statistical Tools in Simple Probabilistic Domains

Statistical tools such as t-tests and ANOVA are often used to analyze differences between group means or variance components. But their foundational logic rests on assumptions appropriate to structured, closed-world models. Applying such tools to problems that are already governed by exact combinatorial probabilities—such as rolling a die or flipping a coin—could be conceptually backwards.

In domains where probabilities are already known and derivable through enumeration, using statistical inference to re-discover them is redundant at best and confused at worst. There is no need for a confidence interval around the probability of rolling a six on a fair die; it is already established as 1/6 through logical reasoning. Such tools are designed not for deduction from first principles but for estimation and hypothesis testing under uncertainty.
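To make the contrast with estimation concrete, the following sketch derives probabilities purely by enumeration, with no data and no sampling error. The helper `exact_probability` is a hypothetical name introduced for illustration, and it assumes every outcome in the enumerated space is equally likely.

```python
from fractions import Fraction
from itertools import product

def exact_probability(event, n_dice=1, faces=6):
    """Probability of `event` under equally likely outcomes, by enumeration.

    `event` is a function that takes a tuple of face values and returns
    True when the outcome counts as a success.
    """
    outcomes = list(product(range(1, faces + 1), repeat=n_dice))
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

if __name__ == "__main__":
    # One die showing a six: 1 favourable outcome out of 6.
    print(exact_probability(lambda o: o[0] == 6))
    # At least one six among two dice: 11 favourable outcomes out of 36.
    print(exact_probability(lambda o: 6 in o, n_dice=2))
```

The result is an exact fraction, so there is nothing for a confidence interval to quantify. Uncertainty enters only when one asks whether a given physical die matches the equal-likelihood assumption.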

The broader problem is the unreflective use of statistical machinery. Instead of critically examining whether these tools are appropriate to the system under study, researchers often apply them by default. This can, and probably does, lead to spurious results, false confidence, and deeply misleading interpretations.

Idealization and the Gap Between Model and World

Mathematical probability models are like maps of the world: they are constructs within formal systems, not the territory they describe. They exist as idealizations: smooth curves, closed-form distributions, and symmetrical assumptions. Their application to real-world data involves a leap from form to function.

Even if the assumptions of a model are known to hold in the real world (a condition rarely achieved), it does not follow that the model will yield valid conclusions. The contrary view, that satisfied assumptions guarantee valid conclusions, is a widely asserted supposition, not a theorem.

In the case of the Central Limit Theorem (CLT), the formal result rests on assumptions such as independence of observations, identical distribution, and finite variance, and it holds in the limit of large sample size. But the claim that the CLT accurately describes the behavior of sample means whenever these conditions hold in a real-world context is not provable within the theory. It is an assertion that the model maps reality, a claim that must be demonstrated, not assumed.
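For reference, the classical (Lindeberg-Lévy) form of the theorem can be stated as follows. This is one of several versions, and the symbols are the usual textbook ones rather than notation taken from this essay.

```latex
Let $X_1, X_2, \dots, X_n$ be independent and identically distributed
random variables with mean $\mu$ and finite variance $\sigma^2 > 0$,
and let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then
\[
  \frac{\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr)}{\sigma}
  \;\xrightarrow{\;d\;}\; \mathcal{N}(0, 1)
  \qquad \text{as } n \to \infty .
\]
```

Every object in the statement is a mathematical one. The theorem concerns a limit as n grows without bound, and it is silent on whether any particular data set was generated by variables that satisfy its hypotheses.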

Such demonstrations require empirical validation. No logical argument alone can prove that a model, no matter how elegant, applies to a given empirical domain. This is often forgotten in statistical pedagogy and practice, where theorems are presented as licenses for inference, rather than as formal constructs whose real-world applicability is always contingent.

The Misconception of Fairness and the Role of the System

Some confusion may arise from discussions of "fair dice" and other random devices. Fairness is often treated as a property of the object itself, e.g., that the die is balanced or the coin is symmetric. But fairness is not an intrinsic quality of the object. It is a systemic property that arises from the interaction of the object with its context. With respect to the object itself, fairness is merely the absence of unfairness, of bias.

In theory, a die could be perfectly shaped but thrown in such a way that one face is always more likely to appear. A coin could be flipped with a bias in motion or caught in a way that favors one side. Fairness, in this sense, is not about the object alone but about the full set of conditions governing the outcome—the system.
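A toy simulation can illustrate the point. In the sketch below the die is perfectly symmetric, and only the throwing procedure changes. The `control` parameter is a hypothetical stand-in for the fraction of throws in which the thrower's technique fixes the result; it is an assumption made for illustration, not a model of real throwing mechanics.

```python
import random
from collections import Counter

FACES = (1, 2, 3, 4, 5, 6)  # the die itself is perfectly symmetric

def throw(rng, control=0.0, favoured=6):
    """One throw of the symmetric die by a given throwing procedure.

    With probability `control` the procedure forces the favoured face;
    otherwise the die tumbles and every face is equally likely.
    """
    if rng.random() < control:
        return favoured
    return rng.choice(FACES)

def frequency_of_face(face, n_throws, control, seed=1):
    """Relative frequency of `face` over n_throws simulated throws."""
    rng = random.Random(seed)
    counts = Counter(throw(rng, control) for _ in range(n_throws))
    return counts[face] / n_throws

if __name__ == "__main__":
    # Same die, two different systems.
    print(frequency_of_face(6, 60_000, control=0.0))  # close to 1/6
    print(frequency_of_face(6, 60_000, control=0.3))  # close to 0.3 + 0.7/6
```

The object is identical in both runs. The long-run frequency of a six differs because the system that produces the outcome differs.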

This insight has implications for how randomness is conceptualized. Randomness is not located in an object but in a system of variability. And once systems are understood as the unit of analysis, it becomes clear that even deterministic machines could, in principle, replicate the results of "random" processes if designed with sufficient control.

Deterministic Machines and the Boundary of Probabilistic Thinking

It is conceptually plausible—I suspect even practically feasible with current precision technology—to construct a deterministic machine that always produces a specific die outcome. Such a device would eliminate probabilistic variability entirely. It would show that even within systems usually thought of as random, determinism can prevail if enough control is exerted.

This thought experiment underscores the fragility of randomness as a modelling tool. If randomness depends on the lack of control, then randomness is simply a placeholder for epistemic ignorance. The more that is known and controlled, the less random the system becomes. Thus, probability models are not ontologically foundational—they are artifacts of limited knowledge.
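Software pseudo-randomness offers a loose analogy, not the physical machine described above. In this sketch the hypothetical function `die_machine` produces a sequence that looks random to anyone who does not know the seed, yet it is fully determined by that seed.

```python
import random

def die_machine(seed, n_rolls):
    """Return n_rolls 'die outcomes' fully determined by the seed."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n_rolls)]

if __name__ == "__main__":
    first = die_machine(seed=42, n_rolls=10)
    second = die_machine(seed=42, n_rolls=10)
    # With the internal state known and controlled, nothing is left to chance.
    print(first == second)
```

An observer without access to the seed might model the output with a uniform distribution over six faces. An observer who holds the seed has no use for a probability model, which is the sense in which randomness here tracks what is known rather than what is.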

The Category Mistake of Applying Probabilistic Models to Open Systems

The crux of the argument lies here. Open systems—such as psychological behaviour, economic performance, medical diagnosis, or nutritional response—are not bounded, stable, or isolable in the way closed systems are. Their variables are not fully known, their influences are not independent, and their outcomes are not reliably enumerable. Outcomes are not stable. Confounding factors abound. Complexity rules.

To import probabilistic models from closed systems into such contexts is to commit a category mistake. The structure of the model does not match the structure of the domain. This is not a minor methodological error but a foundational epistemological problem.

And yet, this extrapolation is widespread. The Central Limit Theorem and its associated tools are used to justify the use of normal distribution-based statistics in fields where assumptions are violated not just in practice but in principle. The mapping of mathematical structure onto empirical structure is assumed, not demonstrated.
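What the failure of a single formal assumption looks like can be shown in simulation. The sketch below is an illustration under stated assumptions, with hypothetical function names: it compares the spread of sample means for draws with finite variance (uniform) and for draws without it (standard Cauchy), using only the Python standard library.

```python
import math
import random
import statistics

def uniform_draw(rng):
    """One draw from Uniform(0, 1), which has finite variance."""
    return rng.random()

def cauchy_draw(rng):
    """One draw from a standard Cauchy distribution (no finite variance)."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_means(draw, n, reps=1000, seed=0):
    """Interquartile range of `reps` sample means, each over n draws."""
    rng = random.Random(seed)
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

if __name__ == "__main__":
    for name, draw in (("uniform", uniform_draw), ("cauchy", cauchy_draw)):
        small = iqr_of_means(draw, n=10)
        large = iqr_of_means(draw, n=400)
        # A ratio well below 1 means that averaging more draws helps.
        print(name, round(large / small, 3))
```

For the uniform draws the spread of sample means shrinks roughly in proportion to one over the square root of n. For the Cauchy draws it does not shrink, because the mean of n standard Cauchy variables is itself standard Cauchy. The simulation is a closed system in which the generating process is known, so it shows only how a violated assumption behaves. It cannot show whether a given open system satisfies the assumptions in the first place.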

This assumption—namely, that if the conditions of the model hold, then the conclusions of the model will be valid—is itself an unproven conjecture. Even if independence, identical distribution, and finite variance could be shown to exist in the real world, it does not logically follow that the model’s outputs reflect the world. Only empirical testing can establish such a link.

The Replication Crisis and the Ill-Posed Nature of Inference in Open Systems

The replication crisis in psychology and biomedicine is often attributed to misuse of statistical tools, poor incentives, or flawed measurement. These factors may contribute. But they may also obscure a deeper problem: that the entire framework of inference is ill-posed. If the latter is the case, then surely the other potential deficiencies become irrelevant.

If the conditions for probabilistic reasoning in closed systems are not applicable to open systems, then the conclusions drawn from those models are likely to be incorrect. This is not a matter of technique but of foundational appropriateness. A tool misapplied is no longer valid.

The claim that a model's fitness cannot be established from principle alone and must be demonstrated empirically is logically grounded and consistent with a long-standing distinction in the philosophy of science between:

Internal validity (formal coherence): Whether a model is internally consistent or follows from its axioms.

External validity (empirical applicability): Whether the model's structure and output correspond to the real-world system it purports to describe.

Even if all formal assumptions of a model (e.g., independence, identical distribution, large samples) are known to hold in a particular domain—a situation rarely achievable in practice—the claim that the model's output is accurate or informative about the world remains a conjecture. This is because:

The model itself does not contain empirical content—it is a set of mathematical relationships.

The transition from assumptions to applicability is not deductive; it is an inference from structure to function.

Therefore, fitness-for-purpose is not entailed by formal properties. It is an empirical hypothesis requiring testing against observation and outcome.

As a consequence, even a well-formed statistical model cannot be assumed to yield reliable results outside the context in which its performance has been empirically validated. This is not a minor caveat but a central epistemic constraint. It follows directly from the principle that models are representations—not the systems they represent. The map is not the territory.

Voices such as Nancy Cartwright have argued that statistical models must be tailored to the structure of the domain, not imported wholesale. Others, such as John Ioannidis, have drawn attention to institutional corruption and epistemic failure. But even these critiques often stop short of denying the applicability of the models themselves.

This essay makes a stronger claim: the fundamental tools may not be suitable at all. And when their assumptions are unverifiable or inapplicable, continued use constitutes an error not of execution but of epistemology.

Summary

Probabilistic reasoning works in closed systems because the structure of such systems supports the conditions required by the models. The events are enumerable, the conditions repeatable, and the assumptions often satisfied. But these features do not transfer to open systems. Applying statistical models to domains such as psychology, medicine, and social science requires assumptions that cannot be verified—and often do not even make sense—within those domains.

The belief that models proven valid in formal systems will yield valid results in empirical systems is a conjecture. It must be validated empirically and cannot be assumed from internal consistency alone. This is the central caution: coherence within a model does not imply truth about the world. That leap is not warranted by logic, only by evidence.

