Date: 2025-08-19

Author’s Preface

This essay continues the Reason series, a set of reflections on the limits of formal systems and the role of language in human reasoning. Earlier essays in the series examined how arguments are constructed and read, how metaphysical assumptions about mind or logic distort our understanding, and how probabilistic reasoning in particular has been misapplied in open and unstable domains. The guiding theme throughout has been that reasoning, whether deductive, probabilistic, or statistical, is never a neutral mirror of reality. It is always mediated by language, shaped by human judgment, and bounded by the assumptions that give it form.

The present essay extends this line of critique into the field of statistics. Statistics is often treated as the practical branch of probability, a neutral tool that delivers objective conclusions from data. Yet on closer inspection, it inherits all the vulnerabilities of probability and multiplies them through its application to complex domains. What began as a mathematical language for closed, enumerable systems has been generalized into methods for medicine, psychology, economics, and beyond—domains where replication is limited, assumptions fragile, and outcome spaces unstable.

The aim here is not to deny the usefulness of statistics where its conditions hold, but to place its claims in proper perspective. The authority of mathematics lends statistics an aura of necessity, but this aura conceals its dependence on conventions, simplifications, and meta-assumptions about the world. Recognizing these limits is not a rejection of the discipline but an effort to restore clarity. Like the other essays in the Reason series, this one argues that tools of reasoning must be judged not by their elegance alone but by their fit with the world they claim to describe.

Introduction

Probability and statistics occupy a curious place in the history of ideas. They began as tools for analyzing closed, highly regular systems such as dice, cards, and coins—settings where outcomes were enumerable, replicable, and bounded. From this foundation grew a vast mathematical language of ratios, distributions, and asymptotic laws. Eventually, statistics emerged as an applied branch, extending probabilistic reasoning to data drawn from the complexities of the natural and social world.

This expansion carried both promise and peril. The promise was that by sampling data and applying probabilistic reasoning, researchers could generalize from the particular to the general, from the observed to the unobserved. The peril was that methods built for closed systems were now being applied to open, unstable, and context-dependent domains such as medicine, economics, ecosystems, and human behavior. Where replication fails, where outcome spaces cannot be enumerated, and where variability is extreme, the assumptions that make probability and statistics coherent may not hold.

The following discussion explores the emergence of statistics from probability, the meta-assumptions required to apply models to reality, and a series of conceptual and practical “gotchas” that reveal the limits of statistical reasoning. It also highlights the minority but serious view that many domains of interest are better described as chaotic rather than probabilistic, raising doubts about the universality of statistical inference.

Discussion

LANGUAGE AND THE WORLD

Origins of Language – Unknown, Unknowable, but a Human Invention

The origins of language remain fundamentally uncertain. Archaeology can recover tools, bones, and traces of habitation, but speech itself leaves no fossils. Genetics identifies changes in the human genome—such as the FOXP2 gene often linked to speech capacity—but cannot pinpoint when meaningful language began. Anthropology studies present-day hunter-gatherer groups and comparative primate communication, yet these provide only indirect analogies.

Scholars therefore place estimates for the emergence of language anywhere from fifty thousand years ago to over a million years ago, when early hominins may already have possessed rudimentary proto-languages. The absence of direct evidence means that the precise timing remains unknowable. What can be said with greater confidence is that language co-evolved with human cognitive and social development. It likely expanded alongside the growth of cooperative hunting, toolmaking, and symbolic practices such as burial and ritual. As humans became increasingly dependent on cooperation, language became the central mechanism by which cultural knowledge, tradition, and innovation were transmitted.

First Things First: Versatility of Language and the Organism

Language is rooted in neurological capacities—the ability to produce and interpret sounds, control vocal apparatus, and associate symbols with meanings. Yet once it emerged, it rapidly transcended biological necessity. Unlike fixed instinctual signals among animals, human language is open-ended. A finite set of sounds and symbols can generate an infinite variety of sentences, ideas, and narratives.

This versatility is seen in its many forms:

· Dialects and registers: regional varieties, social accents, and situational shifts in speech.

· Specialized vocabularies: scientific terminology, legal jargon, technical shorthand.

· Expressive arts: poetry, song, and literature that stretch language beyond ordinary description.

· Symbolic systems: mathematics, logic, and programming languages, which operate with the precision of language while functioning as abstract notational tools.

· Invented systems: constructed languages such as Esperanto or Klingon, created deliberately for communication or artistic effect.

No other known species exhibits anything approaching this range. Animal communication—whale song, bee dances, or primate calls—remains bound to specific functions such as mating, foraging, or alarm. Human language, by contrast, is flexible enough to be bent toward almost any representational purpose.

Language as Representation, Deceit, and Nonsense

At its core, language is a representational system: a means of mapping signs onto aspects of the world. Words can denote physical objects (“tree”), processes (“running”), abstractions (“justice”), and even imaginary or counterfactual states (“a unicorn with wings”). This capacity for displacement—referring to things not present or not real—is what allows communication to extend far beyond the immediate here and now.

Representation, however, is selective. To describe is always to simplify. A map of a city, no matter how detailed, leaves out countless features; so too with language. When describing a landscape, one may emphasize its mountains and rivers but ignore its insects or microclimates. Every act of naming or describing both reveals and conceals. This selectivity is essential to communication, yet it also ensures that no linguistic representation can ever capture reality in its entirety.

Beyond factual description, language can represent:

· Fictional aspects: myths that explain origins, novels that imagine alternative lives, or speculative thought experiments.

· Deceptive aspects: propaganda designed to mislead, false testimony in court, or casual lies in daily life.

· Nonsensical aspects: utterances that sound coherent but lack semantic content, from playful riddles to deliberate obfuscation.

The striking feature is that humans can exchange all these forms, truthful or false, coherent or incoherent, without always recognizing their status. A well-phrased fabrication may sound as plausible as a fact. Advertising often exploits this, using carefully chosen words that suggest more than they prove. Religious or political rhetoric can mobilize entire populations through symbols whose factual basis may be uncertain or wholly invented. At the same time, outright nonsense, or the words of a poem, can feel meaningful even when no literal sense is present.

This dual capacity—truth and falsehood, clarity and illusion—reveals both the power and the vulnerability of language. It enables the transmission of knowledge across generations but also leaves communities open to manipulation, misunderstanding, and confusion.

Language, the Larger Implication

Language is not a neutral tool for labeling things that already exist. It actively shapes human perception, thought, and social reality. Through language, categories such as “nation,” “law,” or “marriage” are formed, and these structure institutions and guide behavior. Different languages segment reality differently: one may mark many fine distinctions among kinds of snow, another none at all. These differences influence not only how the world is described but also how it is understood.

Crucially, the ability of language to mislead is inseparable from its ability to convey truth. A courtroom oath, a scientific paper, and a piece of propaganda all rely on the same symbolic resources. The very mechanism that allows humans to affirm reality also allows them to distort or deny it.

Thus, language is the primary medium through which humans are anchored in reality while simultaneously able to construct alternative, illusory, or imaginative worlds. It is what enables shared understanding of physical survival—how to build a shelter or find food—but also collective ventures into myth, ideology, and fantasy. In this sense, language is the condition of human culture itself: both its stabilizing foundation and its source of endless invention.

MATHEMATICS, LANGUAGE AND THE PLATONIC FALLACY

Others Demystifying Language Beyond Platonism

Eugene Wigner, in his 1960 essay on the “unreasonable effectiveness” of mathematics in the natural sciences, spoke as a physicist trying to understand the relation between mathematics and the world. Others, such as Nancy Cartwright, Murray Gell-Mann, and various linguists, sociologists, and anthropologists, have offered alternative perspectives. Their explanations differ, but none rely on Platonism.

Mathematics and the Illusion of Platonic Forms

Mathematics allows us to describe forms, relations, and patterns: to count, classify, sort, and delineate boundaries. In geometry, it postulates idealized figures—points, lines, planes, circles—used to describe aspects of space and experience.

The Platonic tradition, however, treated such objects as if they existed in a special, independent realm. In this view, a circle drawn in sand is only an imperfect copy of the perfect Circle that exists beyond space and time. This idea of forms shaped philosophy, theology, and science for centuries, and traces of it persist today when mathematical entities are spoken of as though they have independent existence rather than being linguistic or symbolic constructions.

Reification and the Platonic “Third Realm”

Reification occurs when abstractions are treated as concrete entities. Speaking of mathematics as if it belongs to a separate “realm”—neither physical nor mental—is an example. This supposed Platonic realm is incoherent: it is never specified what kind of existence it would involve. It is simply an illusion produced by language, not evidence of another order of being.

Language as the Ground for Mathematics

Mathematics is, at its base, a language: a system of signs and rules used to represent aspects of the world—or, at times, purely hypothetical constructions. Its strength lies in internal consistency and its ability to map onto real patterns. Sometimes mathematics reflects reality well; other times it generates structures that are self-referential, internally coherent, but detached from any external application. Probability, statistics, and other formalisms exemplify this: they do not require a Platonic foundation but operate as symbolic systems humans devised to represent and analyze.

Coherence Versus Truth

Not all mathematical or linguistic formulations describe the world accurately. Some are coherent within themselves yet disconnected from experience; others are incoherent but persist under the authority of tradition or technical formalism. Internal coherence within a system does not guarantee truth about the world.

The Mystery Remains – Language or Platonism

Even without invoking a Platonic realm, the relationship between mathematics, language, and the world remains puzzling. Why does mathematics so often succeed in capturing features of physical reality? Why does a symbolic system created by humans align so effectively with the behavior of the universe? These remain open questions, but they do not require positing a metaphysical “third realm.” It is enough to see mathematics as a human construction that sometimes matches, and sometimes diverges from, the patterns of the world.

Nothing Platonic, Just Language in All Its Mystery

This leads back to the central point: there is no need for a Platonic realm to explain the apparent power of mathematics or probability. These are human constructions—maps, not territories—useful when they approximate regularities of the natural world, but not metaphysical necessities.

LANGUAGE AND MATHEMATICS

Mathematics: Failure and Limits of Language

Language fails often, and mathematics fails as well. Models break down, probability schemes mislead, and formalisms get applied where they do not belong. Recognizing that mathematics is a form of language makes these failures unsurprising. It situates mathematics within the broader human enterprise of symbol-making and representation rather than outside it.

Dispelling the Mystery of Mathematics

Once mathematics is seen as language, the supposed “mystery” of its power dissolves. The puzzle arises only when mathematics is treated as metaphysical rather than linguistic. Clear thinking removes the need for Platonic realms or appeals to transcendence. The real question is practical: when does a particular formalism help make sense of the world, and when does it mislead?

Mathematics as a Sublanguage

Mathematics is a constrained sublanguage: precise, rule-bound, and specialized. Its vocabulary and syntax reduce ambiguity compared to ordinary language, but both rely on symbols arranged under rules to represent aspects of the world. Mathematics counts, classifies, sorts, and delineates boundaries. It constructs geometric forms, postulates structures, and provides systematic descriptions.

Yet it has often been mistaken for more than language. In the Platonic tradition, mathematical objects were treated as eternal forms existing in a special metaphysical “third realm.” A perfect circle or line was imagined as existing beyond space, time, matter, and mind. This Platonic residue still influences thinking today. But the Platonic realm is incoherent. To reify mathematical forms as metaphysical entities is to confuse representation with reality.

The Illusory Specialness of Mathematics

The sense of mathematics being “special” comes from mistaking a linguistic tool for an ontological reality. From this perspective, the so-called “unreasonable effectiveness” of mathematics is no more mysterious than the success of a well-crafted metaphor or a map. Both capture relevant aspects of reality without implying access to an independent metaphysical realm.

There is nothing uniquely mysterious about mathematics’ ability to represent the world. Its effectiveness, like that of language generally, lies in how well symbols fit patterns of experience. When the fit fails, the formalism is revised or abandoned. The long-standing philosophical habit of treating mathematics as uniquely transcendent rests on a confusion. Seen as language, the supposed mystery evaporates.

MODELS AND TOY WORLDS AND MAPS

Mathematics Gives an Unusually “Tight” Form of Reasoning

Take a simple example:

· 12 eggs cost $6

· 1 egg costs $(6 ÷ 12) = $0.50

· 6 eggs cost $(6 ÷ 12) × 6 = $3

The conclusion feels inescapable. To deny it would require strange intellectual contortions, such as claiming that six is not half of twelve or that division does not apply here. This is why mathematics seems unusually “tight”: once the rules are grasped, the reasoning unfolds with a sense of inevitability.

Yet this inevitability depends on a shared symbolic system. Arithmetic is not self-explanatory—it must be taught, learned, and understood. Children do not discover long division spontaneously; they are initiated into the rules of the game. Once internalized, the system gives the impression of being obvious, even though it is not natural in the way that walking or speaking in everyday language is natural.

Toy Worlds and Idealization

Part of this “tightness” comes from the fact that mathematics often works in toy worlds—artificially simplified situations where assumptions are stripped down to essentials. The egg problem assumes:

· Perfect divisibility of cost.

· No variation in egg size, quality, or seller’s whim.

· A linear, proportional relation between number and price.

These assumptions may not hold in the real world, where bulk discounts, broken eggs, or variable pricing complicate matters. But within the toy world, the conclusion is unavoidable. Toy models like this illustrate how mathematics gains its strength: by ignoring irregularities, it isolates a neat structure that can be reasoned through without ambiguity.
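
The contrast between the toy world and a messier one can be sketched in a few lines of Python. The sketch is illustrative only: the function names and the seller who charges $0.60 per loose egg are assumptions introduced here, not part of the original example.

```python
from fractions import Fraction

def linear_cost(n_eggs, dozen_price=Fraction(6)):
    """Toy-world price: strictly proportional to the number of eggs."""
    return dozen_price / 12 * n_eggs

def discounted_cost(n_eggs, unit_price=Fraction(6, 10), dozen_price=Fraction(6)):
    """Hypothetical seller: full dozens at $6, loose eggs at $0.60 each."""
    dozens, loose = divmod(n_eggs, 12)
    return dozens * dozen_price + loose * unit_price

print(linear_cost(6))      # 3     (the toy-world answer, $3)
print(discounted_cost(6))  # 18/5  (that is, $3.60 under the assumed pricing)
```

Both functions are equally "tight" as arithmetic. Which one describes a given market is a question the arithmetic cannot answer.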

Maps and Representation

Mathematical models, like maps, highlight some aspects of reality while ignoring others. A road map shows highways and intersections but leaves out trees and fence lines. Similarly, the egg-price calculation captures proportionality but not spoilage, supply, or inflation. The power of mathematics lies in this selective representation: it allows reasoning to be carried out cleanly on the model, with results that can then be projected back onto the world.

The danger is forgetting the selectivity. When a toy model is mistaken for reality itself, mathematics begins to look like revelation rather than representation. But it is always a map, never the territory. Its tightness is a property of the language and the rules, not a guarantee that the world will always conform.

The Learned Nature of Mathematical Systems

Another aspect often overlooked is that even these seemingly “obvious” systems require cultural scaffolding. The decimal system, the use of symbols like “12” or “÷,” and the convention of assigning values to currency are historical inventions. Ancient societies used other numeral systems, from Roman numerals to Mayan vigesimal notation, and their methods of calculation varied widely. What feels self-evident today is the result of centuries of refinement in symbolic systems.

Thus, mathematics provides tight reasoning not because it taps into eternal truths, but because it is a carefully constructed and standardized sublanguage. Once taught, it gives an impression of inevitability. But behind that inevitability lies the artifice of toy models, the selectivity of maps, and the human work of symbol-making.

LANGUAGE, MATHEMATICS, AND PROBABILITY

The Odd Branch of Mathematics: Probability Theory

Probability is an unusual branch of mathematics. It originated in the analysis of games of chance—dice, cards, coins—where outcomes were bounded and enumerable. Probability was defined as the ratio of favorable outcomes to total possible outcomes. This worked because such systems are stable, replicable, and constrained.
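
The ratio definition can be made concrete with a short sketch. It assumes, as the games-of-chance setting does, that every outcome in the enumerated list is equally likely; the function name and the two-dice example are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

def classical_probability(outcomes, is_favorable):
    """Favorable outcomes divided by all possible outcomes (assumed equally likely)."""
    outcomes = list(outcomes)
    favorable = sum(1 for o in outcomes if is_favorable(o))
    return Fraction(favorable, len(outcomes))

# All 36 ordered results of rolling two six-sided dice.
two_dice = list(product(range(1, 7), repeat=2))

p_seven = classical_probability(two_dice, lambda roll: sum(roll) == 7)
print(p_seven)  # 1/6
```

The calculation is possible only because the outcome space can be written down in full, which is exactly the condition that open systems fail to meet.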

Over time, probability was extended to continuous systems, formalized in distributions, and developed into statistics—the applied discipline of inference. Yet probability retained distinctive features:

· It does not predict individual events.

· It relies on long-run frequencies.

· It presumes repeatability of trials.

Interpretations diverged:

· Frequentist: probability is the long-run relative frequency of events. This fits the original games-of-chance context and is grounded in observable counts.

· Bayesian/subjective: probability is a “degree of belief” assigned to propositions, constrained by axioms but ultimately psychological. This conflates numbers with states of mind and is conceptually problematic.

· Propensity (Popper and others): probability reflects tendencies or dispositions of physical systems to produce outcomes. This is metaphysically ambitious but lacks empirical traction.

The frequentist view, grounded in repeatable long-run frequencies, remains the only interpretation that avoids conflating numbers with psychology or invoking metaphysical tendencies.

Probability’s Dependence on the Natural World

Mathematical models are not free-floating abstractions. Probability theory arose because observable events could be counted, classified, and abstracted. The authority of probabilistic models depends on their tether to real-world patterns, however imperfect. Without the natural world as a reference, probability theory would have no basis or meaning.

Probability: Limits of Language in Description

Language never captures the whole of reality; it always selects, frames, and simplifies. Probability, as a formalized sublanguage, inherits these constraints. It can provide sharp predictions within carefully defined systems, but it cannot encompass the complexity of open, unstable, real-world domains.

Probability theory is therefore not a neutral mirror of reality but a humanly constructed framework, dependent on choices about boundaries, categories, and events. Its apparent objectivity conceals the interpretive work that goes into its application. Many researchers—and even statisticians—rarely acknowledge the epistemological, ontological, and linguistic assumptions underlying probability.

Applied probability always requires decisions: where to draw system boundaries, what counts as an event, and how to measure or classify outcomes. Events occur in space and time, yet they must be abstracted and simplified before being counted. This necessity reflects a general fact about language: it describes aspects of the world but never the world in its entirety.

Several core issues emerge:

· Probability is not a property of the world itself. It is a linguistic system for representing aspects of the world under constrained conditions.

· Numbers in probability codify decisions. To say “the probability is 0.5” is not to state an intrinsic fact of nature but to report a conceptual choice embedded in a model.

· Human judgment is unavoidable. Every assumption—what to count, how to define independence, what to ignore—shapes the model. The elegance of the mathematics does not remove this interpretive scaffolding.

· Applicability is conditional. Probability succeeds when variability is bounded, replication possible, and systems closed. It fails in open, unstable, non-replicable settings.

The conclusion is clear: probability is not a universal language of uncertainty but a narrow formalism with limited range. Its successes in dice and cards should not be mistaken for proof of its applicability everywhere. To treat it as metaphysical truth is to confuse a human construction with the reality it sometimes manages to describe.

Assumptions, Idealizations, and Axioms in Models

Probability models are not neutral mirrors of reality. They rest on assumptions—independence of trials, identically distributed variables, fairness of dice, stability of distributions. Idealizations are routine: we deliberately simplify, supposing conditions are more regular than they are, for the sake of tractability.

Which assumptions are chosen is not dictated by the world but by human decision. Some are treated as axioms, placed beyond questioning. This reveals the linguistic and judgmental core of probability: it is a way of speaking about the world, framed by conventions rather than necessity.

The Role of Idealization – “Just Suppose”

An idealization is a simplification: “suppose it works this way—even though we know it does not.” Dice are assumed to be perfectly fair, coin tosses perfectly symmetric, trials independent and identically distributed. These are not descriptions of reality but hypothetical constructs. They make problems solvable while sacrificing fidelity. That such models often “work well enough” does not erase their artificiality.

Hidden Assumptions in Probability

Probability is usually treated as a neutral, technical tool. Yet beneath computations lie unacknowledged assumptions:

· Epistemological: what can be known.

· Ontological: what exists and what counts as an event.

· Linguistic: how language and symbols partition the world.

These foundations are rarely articulated in applied work. Researchers often proceed as if probability were self-explanatory, when it rests on unexamined commitments.

Assumptions and Arbitrariness

Every model requires assumptions: definitions of events, rules of independence, distributions to represent variation. Out of many possible assumptions, a few are selected and treated as axioms. This arbitrariness is concealed by formalism, which gives an illusion of necessity where there is only judgment and convention.

Drawing Boundaries and Defining Systems

Applied probability requires a defined system: a bounded slice of the world where events are counted and classified. This means deciding what belongs inside or outside, and how events are individuated. A coin toss, for example, is conventionally reduced to “heads or tails,” though in reality it involves countless physical variables—spin, air currents, edge landings—excluded by agreement.

These boundary choices are not dictated by nature but imposed through description. The way a system is framed shapes the conclusions drawn from it.

Probability in Closed and Open Systems

Variability as a Precondition for Probability

Probability only has meaning when variability is present but limited.

· No variability: If outcomes are certain, probability collapses. A hammer always cracks a walnut; a stone always falls. Deterministic description suffices.

· Extreme variability: If outcomes are unstable, unbounded, or context-dependent, probability also fails. Chaotic dynamics, feedback loops, and sensitivity to initial conditions prevent replication or reliable enumeration.

· Limited variability: Probability works best where uncertainty is genuine but bounded. Dice, coins, and cards exemplify this: finite outcomes, near-perfect replication, and enumerable event spaces.

Probability’s Origins in Simple Systems

Historically, probability arose from games of chance. In the seventeenth century, Pascal, Fermat, and Huygens worked out problems about dice, cards, and coin tosses, and the ratio of favorable to possible outcomes became the classical definition of probability, later stated explicitly by Laplace. These contexts provided clear outcome spaces and stable replication, giving probability its initial foothold.

Events as Constructions

In theory, probability treats events as discrete and countable. In reality, events are processes in space and time, described differently depending on conventions. For example, a “disease case” may be defined by lab test, symptoms, or hospitalization—each producing different counts, and therefore different probabilities. What counts as an “event” is not dictated by nature but by linguistic and conceptual choice.
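
A small sketch shows how the choice of definition changes the number. The eight records below are invented purely for illustration and describe no real patients; the field names are likewise assumptions.

```python
# Hypothetical records for illustration only; not real data.
patients = [
    {"lab_positive": True,  "symptoms": True,  "hospitalized": True},
    {"lab_positive": True,  "symptoms": True,  "hospitalized": False},
    {"lab_positive": True,  "symptoms": False, "hospitalized": False},
    {"lab_positive": False, "symptoms": True,  "hospitalized": False},
    {"lab_positive": False, "symptoms": True,  "hospitalized": False},
    {"lab_positive": False, "symptoms": False, "hospitalized": False},
    {"lab_positive": False, "symptoms": False, "hospitalized": False},
    {"lab_positive": False, "symptoms": False, "hospitalized": False},
]

def case_rate(records, definition):
    """Share of records counted as a 'case' under the given definition."""
    return sum(1 for r in records if r[definition]) / len(records)

rates = {d: case_rate(patients, d)
         for d in ("lab_positive", "symptoms", "hospitalized")}
print(rates)  # {'lab_positive': 0.375, 'symptoms': 0.5, 'hospitalized': 0.125}
```

The same eight people yield three different "probabilities of disease," and nothing in the data says which definition is the right one.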

Closed vs. Open Systems

The decisive distinction is between closed and open systems:

· Closed systems: bounded, stable, and enumerable (dice rolls, lotteries, simple mechanical setups). Probability applies because events can be replicated, counted, and stabilized.

· Open systems: unbounded, unstable, and context-dependent (economies, ecosystems, medicine, human behavior). Replication fails, outcome spaces cannot be fully enumerated, and instability dominates.

Many statisticians and applied researchers extend probability to open systems as though universality were guaranteed. Critics such as Nancy Cartwright and Murray Gell-Mann argue this is untenable. Probability’s apparent universality is an illusion: a language devised for closed systems misapplied to domains where its preconditions do not hold.

Replication as the Foundation

Replication underlies probability. Long-run frequencies require repeating experiments under the same or nearly the same conditions. Dice and coin tosses allow this; human behavior, economies, and medical cases do not. Where replication fails, probability statements become conventions of modeling rather than empirical descriptions.

Probabilities, Limits, and Long-Run Behavior

A central fact about probability is that it does not predict individual events. The outcome of a single coin flip is unpredictable; what probability constrains is the aggregate pattern across many flips. In the frequentist view, probability is defined as the limit of relative frequencies as the number of trials increases. The law of large numbers formalizes this idea: for independent trials under identical conditions, the observed relative frequency converges to the underlying probability as the number of trials grows. Probability is therefore tied not to metaphysical necessity but to empirical regularities that emerge asymptotically in repeated trials.
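
The long-run behavior can be watched in a simulation. The sketch below assumes a simulated fair coin driven by Python's pseudo-random generator with a fixed seed; the function name and the sample sizes are chosen here for illustration.

```python
import random

def running_frequency(n_flips, seed=0):
    """Relative frequency of heads in n_flips simulated fair-coin tosses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# Short runs wander; long runs settle near 0.5.
for n in (10, 1_000, 100_000):
    print(n, running_frequency(n))
```

The simulated coin is itself a closed system: identical conditions on every toss, two enumerable outcomes, and unlimited replication. The convergence it displays is guaranteed by those conditions, not by anything the simulation shares with an economy or a clinic.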

From Ratios to Real Numbers

Originally, probability was expressed as ratios in discrete systems such as dice or cards: favorable outcomes divided by possible outcomes. Over time, it was generalized to continuous domains, requiring real numbers and calculus. Probability distributions became the core framework, describing the relative likelihood of outcomes across continuous or infinite spaces. Probability thus expanded from simple combinatorial counts to functions defined over domains far beyond the reach of direct enumeration.
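
In symbols, the shift can be written as follows. The notation is introduced here for illustration: Ω is the finite set of equally likely outcomes, A the set of favorable ones, X a continuous quantity, and f its assumed density function.

```latex
P(A) = \frac{|A|}{|\Omega|}
\quad\longrightarrow\quad
P(a \le X \le b) = \int_a^b f(x)\,dx,
\qquad f(x) \ge 0, \quad \int_{-\infty}^{\infty} f(x)\,dx = 1 .
```

On the left, the probability is obtained by counting. On the right, nothing is counted: the density f must be supplied as part of the model.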

The Key Point of Probability – Formalism and Application

Probability is unusual among mathematical branches because it straddles two domains:

· As formalism, it is pure mathematics—axioms, theorems, distributions.

· As application, it is a representational tool for describing uncertainty in the world.

The danger lies in conflating the two. Mathematical elegance does not guarantee empirical validity. Probabilistic models succeed only when their assumptions align with reality; when those assumptions fail, the models collapse. This dual character makes probability both powerful in closed, well-structured settings and fragile when extended to unstable or ill-defined domains.

STATISTICS

Probabilities and the Emergence of Statistics

Probability began as a way of reasoning about closed systems such as dice, cards, and coin tosses. In these cases, the set of possible outcomes could be clearly listed and replicated under nearly identical conditions. Probabilities were first expressed as ratios: favorable outcomes divided by total possible outcomes. This worked because the systems were bounded, stable, and enumerable.

Over time, these ratios were generalized into equations and probability distributions, expanding probability into a full mathematical language. Statistics emerged by applying this framework to real-world data. It offered tools such as estimation, regression, and hypothesis testing, later branching into more complex models. The central promise was that patterns observed in samples could be used to draw conclusions about larger, unseen domains.

But this promise depends on a leap. Methods developed for closed, replicable systems were extended into open systems—medicine, economics, ecosystems, human behavior—where outcomes cannot be clearly enumerated or stably replicated. Here, probability theory is applied to contexts very different from the games of chance that originally defined it. Some scholars, including Nancy Cartwright and Murray Gell-Mann, have argued that such systems may be better described as chaotic—sensitive to initial conditions, unstable, and non-replicable—rather than as probabilistic. This is a minority view, but it highlights a key point: probability’s success in closed systems does not guarantee its validity in open ones.

Meta-Assumptions

Every statistical model rests not only on assumptions inside the model but also on meta-assumptions—claims about whether the assumptions actually hold in the world.

1. Positive meta-assumption: If the assumptions do match reality (for example, if data really are independent and identically distributed), the model will give explanatory and predictive success.

2. Negative meta-assumption: If the assumptions do not match reality, the model will not work.

Both are fallible. Models may succeed for reasons unrelated to their stated assumptions, or fail even when the assumptions seem plausible. No formalism guarantees its own fit with reality; that fit must always be demonstrated.

Illustrative “Gotchas”

1. Central Limit Theorems in Medicine

The Central Limit Theorem (CLT) states that the average of a large number of independent, identically distributed observations with finite variance is approximately normally distributed, tracing the familiar bell-shaped curve. This is often invoked to justify applying standard statistical methods. Yet in real medical data, patients are not identically distributed: age, genetics, lifestyle, and countless other factors introduce instability. Treating such heterogeneous data as if it came from a single, stable distribution can produce misleading results, creating false confidence in trial outcomes.
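A rough simulation can illustrate one way this goes wrong. The numbers are invented for illustration: a hypothetical trial in which half the patients improve by about 10 points and half worsen by about 10 points. The pooled average is estimated with a small standard error, yet it describes almost no individual patient.

```python
import random
import statistics

random.seed(0)


def simulate_patient_effect():
    # Hypothetical mixture (values invented for illustration): half the
    # patients improve by about 10 points, half worsen by about 10 points.
    centre = 10.0 if random.random() < 0.5 else -10.0
    return random.gauss(centre, 1.0)


effects = [simulate_patient_effect() for _ in range(10_000)]

mean_effect = statistics.fmean(effects)
std_error = statistics.stdev(effects) / len(effects) ** 0.5

# Share of patients whose own effect lies within 2 points of the pooled mean.
near_mean = sum(abs(e - mean_effect) < 2.0 for e in effects) / len(effects)

print(f"pooled mean effect: {mean_effect:.2f} +/- {std_error:.2f}")
print(f"share of patients within 2 points of the mean: {near_mean:.4f}")
```

This is only a sketch of heterogeneity under a fixed mixture. It does not model the further instability that arises when the mix of patients itself shifts between trial and practice.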

2. Frequentist Inference and Hypothesis Testing in Psychology

Classical, or “frequentist,” inference is organized around the null hypothesis. The null hypothesis is a formal statement that there is no effect or no difference. For example, in a therapy study, the null would state that patients given the new treatment improve no more than those given no treatment at all. From this assumption, statisticians construct a theoretical probability distribution describing the outcomes one would expect if the null were true.

This distribution is not what researchers actually want to know. In practice, investigators want to know whether the therapy works and, if so, how strong the effect is. That distribution—the one reflecting the “true” state of the world—is not available. Instead, the method asks a different question: If the null hypothesis were correct, what is the probability of observing data at least as extreme as those we obtained? This probability is reported as the p-value.

Here lies the central mismatch. The procedure produces a probability of the data given the null hypothesis, yet what researchers seek is the probability of the hypothesis given the data. The two are not equivalent. By design, the statistical framework substitutes a contrived calculation about data under a no-effect assumption for the very question that motivated the study. The result is a method that is mathematically consistent but conceptually misaligned with the aims of empirical research.
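The gap between the two probabilities can be shown with simple arithmetic. The sketch below assumes a toy world with only two hypotheses (the null and a single alternative) and invented values for the prior, the significance threshold, and the power. The function name is hypothetical, and the two-hypothesis setup is itself a simplification of the kind this essay questions.

```python
def prob_null_given_significant(prior_null, alpha, power):
    """Bayes' rule in a two-hypothesis toy world (null vs. one alternative).

    alpha: probability of a "significant" result when the null is true.
    power: probability of a "significant" result when the alternative is true.
    """
    p_sig_and_null = alpha * prior_null
    p_sig_and_alt = power * (1.0 - prior_null)
    return p_sig_and_null / (p_sig_and_null + p_sig_and_alt)


# Invented inputs: 90% of tested therapies do nothing, alpha = 0.05, power = 0.8.
posterior_null = prob_null_given_significant(prior_null=0.9, alpha=0.05, power=0.8)
print(round(posterior_null, 2))  # 0.36
```

Under these assumed inputs, a result that clears the 5% threshold still leaves the null hypothesis with a probability of about 36%, far from the 5% a casual reading of the p-value suggests.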

3. Significance Testing in Economics

From this procedure comes the widespread device of significance testing. Researchers often declare a result “significant” if p < 0.05, meaning that data at least as extreme as those observed would arise less than 5% of the time if the null hypothesis were true. But “significant” here means only “unlikely under a no-effect assumption.” It does not mean the result is important, causal, or practically relevant. In economics, a trivial change in household spending may meet the statistical threshold while being meaningless in real terms.
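A short calculation shows how sample size alone can push a trivial difference across the threshold. It is a sketch under stated assumptions: a two-sample z-test with a known common standard deviation, and invented figures for the spending difference (0.50 per month) and its spread (50).

```python
import math


def two_sided_p_value(mean_diff, sd, n_per_group):
    """Two-sample z-test p-value, assuming a known common standard deviation."""
    std_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / std_error
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))


# The same tiny difference in monthly household spending, at two sample sizes.
p_small_sample = two_sided_p_value(mean_diff=0.5, sd=50.0, n_per_group=100)
p_large_sample = two_sided_p_value(mean_diff=0.5, sd=50.0, n_per_group=1_000_000)

print(f"n = 100 per group:       p = {p_small_sample:.3f}")
print(f"n = 1,000,000 per group: p = {p_large_sample:.2e}")
```

The difference is identical in both cases. Only the amount of data has changed, and with it the verdict of “significance.”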

4. Effect Size in Drug Research

What matters most in applied domains is not whether a result is “significant,” but how large and meaningful the effect is. A drug trial might produce results that are statistically significant, yet show only a 2% reduction in symptoms—clinically negligible. Traditional statistics often sidelines effect size, treating it as secondary rather than central.
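One common way to report magnitude is a standardized mean difference such as Cohen's d. The sketch below uses invented symptom scores, and the function name is hypothetical. It is one convention among several, not a method prescribed by this essay.

```python
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_var ** 0.5


# Invented symptom scores (lower is better); the treated group is 0.2 points lower.
treated = [48.0, 52.0, 50.0, 47.0, 53.0, 49.0, 51.0, 50.0]
control = [score + 0.2 for score in treated]

effect_size = cohens_d(treated, control)
print(f"Cohen's d: {effect_size:.2f}")
```

A value this close to zero signals a negligible effect no matter how small a p-value a sufficiently large trial might attach to it.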

5. Sampling in Political Polling

Statistical inference assumes that a sample can represent a population. But this depends on the sample being genuinely representative. If polling systematically underrepresents certain groups—such as rural voters, younger voters, or minorities—the generalization fails. The mathematics of inference cannot rescue conclusions built on biased data.
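The effect of an unrepresentative sample can be sketched with a simulation. Every figure is invented: a hypothetical electorate that is 40% rural and 60% urban, with different support rates in each group, and a polling method assumed to reach rural voters a quarter as often.

```python
import random

random.seed(1)

# Hypothetical electorate: 40% rural (35% support), 60% urban (60% support).
population = []
for _ in range(200_000):
    rural = random.random() < 0.40
    supports = random.random() < (0.35 if rural else 0.60)
    population.append((rural, supports))


def support_share(voters):
    """Fraction of the given voters who support the candidate."""
    return sum(supports for _, supports in voters) / len(voters)


true_support = support_share(population)

# A simple random sample of the whole electorate.
random_sample = random.sample(population, 20_000)

# A biased frame: each rural voter is reachable only 25% of the time.
reachable = [v for v in population if not v[0] or random.random() < 0.25]
biased_sample = random.sample(reachable, 20_000)

random_estimate = support_share(random_sample)
biased_estimate = support_share(biased_sample)

print(f"true support:    {true_support:.3f}")
print(f"random sample:   {random_estimate:.3f}")
print(f"biased sample:   {biased_estimate:.3f}")
```

Both samples are the same size, so their nominal margins of error are the same. The biased one is simply wrong, and nothing in its margin of error reveals that.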

6. Judgment Calls in Climate Science

Statistical models do not “speak for themselves.” They require decisions: which variables to include, how to define them, which interactions to model, which feedback loops to omit. In climate science, different modeling choices often yield different projections. The divergence reflects the judgments built into the models, not purely objective truth.

7. Ordinal Data in Education Research

Many fields in the social sciences, including education research, rely on ordinal data: information that can be ranked but not otherwise measured in any conventional sense. A familiar example is the Likert scale, which ranges from “strongly disagree” to “strongly agree.” The responses can be ordered, but the gaps between them are not measurable in the way that physical distances or weights are.

Despite this, researchers often compute averages and other statistics on such data as if the categories were interval data, where the steps between values are assumed to be equal and meaningful. This creates an illusion of numerical precision. For instance, moving from “neutral” to “agree” is unlikely to represent the same psychological or experiential shift as moving from “agree” to “strongly agree.” In fact, it may be incoherent to speak of “distance” at all in this context, since the categories are linguistic labels, not measurements of a continuous quantity.

Treating ordinal responses as though they carried interval properties gives results that look rigorous but rest on shaky conceptual ground. What appears as exactness is often nothing more than the artifact of forcing numbers onto categories that were never numerical to begin with.
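A small example shows how much the conclusion can depend on the numbers chosen for the labels. The two classes and both codings are invented. Each coding preserves the order of the categories, and the data give no reason to prefer one over the other.

```python
from statistics import fmean

LEVELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# Two order-preserving codings of the same five labels (both are assumptions).
equal_steps = dict(zip(LEVELS, [1, 2, 3, 4, 5]))
stretched_top = dict(zip(LEVELS, [1, 2, 3, 4, 10]))

# Invented survey responses from two hypothetical classes.
class_a = ["agree"] * 6
class_b = ["neutral"] * 4 + ["strongly agree"] * 2


def mean_score(responses, coding):
    """Average of the numeric codes assigned to a list of ordinal responses."""
    return fmean([coding[r] for r in responses])


for name, coding in [("equal steps", equal_steps), ("stretched top", stretched_top)]:
    a, b = mean_score(class_a, coding), mean_score(class_b, coding)
    print(f"{name}: class A = {a:.2f}, class B = {b:.2f}")
```

Under the first coding class A scores higher on average; under the second class B does. The ranking of the two classes is an artifact of the coding, not a property of the responses.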

8. Bayesian Normalization in Epidemiology

Bayesian methods attempt to improve on frequentist inference by allowing prior knowledge to be combined with data. Yet the procedure requires specifying all possible hypotheses and assigning them probabilities, which are then normalized to sum to one. In fields such as epidemiology, this is unrealistic. The list of possible hypotheses is indefinite—new virus variants, new interventions, new environmental factors can always arise. To pretend the hypothesis space is complete may be mathematically convenient but conceptually unsound.
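The dependence on a closed list of hypotheses can be sketched numerically. The hypothesis names, priors, and likelihoods below are all invented for illustration. The function simply multiplies each prior by its likelihood and rescales so the results sum to one.

```python
def normalized_posterior(priors, likelihoods):
    """Multiply priors by likelihoods, then rescale so the results sum to one."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}


# Closed list: only two explanations for an outbreak are considered.
closed = normalized_posterior(
    priors={"known_variant": 0.5, "seasonal_effect": 0.5},
    likelihoods={"known_variant": 0.02, "seasonal_effect": 0.01},
)

# Widened list: a previously unlisted explanation that fits the data far better.
widened = normalized_posterior(
    priors={"known_variant": 1 / 3, "seasonal_effect": 1 / 3, "unlisted_cause": 1 / 3},
    likelihoods={"known_variant": 0.02, "seasonal_effect": 0.01, "unlisted_cause": 0.20},
)

print(f"closed list:  P(known_variant) = {closed['known_variant']:.3f}")
print(f"widened list: P(known_variant) = {widened['known_variant']:.3f}")
```

In the closed list the leading hypothesis receives about two-thirds of the probability even though it explains the data poorly in absolute terms. Normalization guarantees that some listed hypothesis looks credible, whether or not the right one was ever on the list.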

Probability or Chaos? Two Views of Real-World Systems

Statistics extends probability from closed, enumerable systems into open, unstable ones. Its authority comes from the precision of mathematics, but that precision conceals fragility. In closed systems like dice rolls, probability works flawlessly. In open systems—economies, ecosystems, medicine, psychology—its assumptions often fail. Without bounded outcomes and replicability, statistical inference can become more rhetorical than descriptive, projecting the authority of mathematics onto systems it cannot truly capture.

The Mainstream View: Probability as a Universal Framework

The dominant position in science and applied research is that probability provides a general framework for reasoning under uncertainty. From this standpoint, statistical inference is a universal tool:

· In medicine, randomized controlled trials are analyzed with probability models to estimate drug effects and risks.

· In psychology, survey responses and behavioral data are fitted into distributions to test hypotheses about cognition and behavior.

· In economics, probabilistic models underpin risk analysis, forecasts, and market behavior.

· In climate science, probabilities are used to predict the likelihood of storms, droughts, or long-term temperature change.

The assumption behind this view is that all systems can, at least in principle, be represented as probabilistic: that outcome spaces exist, even if vast; that replication is meaningful, even if approximate; and that randomness and regularity can be combined within distributions. In this view, statistics is a universal science of inference, extending probability from dice games to every domain where uncertainty arises.

The Minority View: Many Domains Are Chaotic, Not Probabilistic

A contrasting view, articulated by scholars such as Nancy Cartwright, Murray Gell-Mann, Edward Lorenz, Benoît Mandelbrot, and C.S. Holling, questions this universality. They argue that many real-world domains are not probabilistic systems stretched thin but qualitatively different kinds of systems: open, unstable, sensitive to initial conditions, and marked by sudden, nonlinear change.

· Nancy Cartwright emphasizes that probabilistic models only work in “nomological machines,” environments that have been engineered or stabilized to make assumptions hold. Outside these protected conditions, the fit between models and the world collapses.

· Murray Gell-Mann highlights that complex adaptive systems, from economies to ecosystems, are driven by feedback loops and emergent behavior, often more fruitfully studied through the lens of complexity theory than probability.

· Edward Lorenz, in meteorology, showed that weather systems are chaotic: deterministic but unpredictable beyond short horizons. Probability can describe near-term ensembles, but not long-term stability.

· Benoît Mandelbrot, in finance, revealed that market returns often follow “fat-tailed” distributions, meaning extreme events are far more common than Gaussian models predict. Treating returns as normally distributed underestimates volatility and risk.

· C.S. Holling, in ecology, described how ecosystems undergo sudden shifts—lakes flipping from clear to eutrophic, forests collapsing after pest outbreaks—changes not well captured by probabilistic smoothness but by concepts of resilience and tipping points.
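Sensitivity to initial conditions, the property Lorenz identified, can be illustrated with a much simpler system than his weather equations. The sketch below swaps in the logistic map with parameter r = 4, a standard textbook example of chaos rather than Lorenz's own model. The starting values are arbitrary.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the deterministic rule x -> r * x * (1 - x) from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two starting points that differ by one part in ten billion.
a = logistic_trajectory(0.2, 100)
b = logistic_trajectory(0.2 + 1e-10, 100)
gaps = [abs(x - y) for x, y in zip(a, b)]

print(f"gap after 5 steps:      {gaps[5]:.2e}")
print(f"largest gap in 100 steps: {max(gaps):.2f}")
```

The rule contains no randomness at all, yet the two trajectories become unrelated within a few dozen steps. No measurement of the starting point could be precise enough to forecast far ahead.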

The Core Contrast

· Mainstream probability/statistics assumes that uncertainty in the world can be captured through distributions, inference, and long-run frequencies. Its strength lies in closed systems, controlled experiments, and engineered environments.

· The minority view argues that many real systems are fundamentally not like dice games. They cannot be replicated in controlled fashion, their outcome spaces are undefined, and their behavior may be chaotic rather than probabilistic. For such systems, statistics is not universal but parochial—useful where assumptions fit, misleading where they do not.

Implication

This contrast reframes statistics not as a universal science of inference but as a specialized language. Its range of validity is narrower than its practitioners often claim. Where the world is bounded, stable, and enumerable, statistics shines. Where the world is open, unstable, and nonlinear, other conceptual tools—chaos theory, complexity studies, or narrative and case-based reasoning—may describe reality more faithfully.

Summary

Probability and statistics, though powerful, are not universal solvents for uncertainty. Probability theory developed in the narrow setting of closed systems where outcomes are discrete, bounded, and repeatable. Statistics extended this framework to open systems, where assumptions of stability, replication, and enumerability often fail.

Statistical models rest not only on internal assumptions but also on meta-assumptions about how well those assumptions fit reality. When these align, models can yield predictive and explanatory success. When they do not, results are fragile or misleading. Illustrative problems include overuse of the Central Limit Theorem in unstable domains, the conceptual mismatch of null hypothesis testing, reliance on arbitrary significance thresholds, neglect of effect size, unwarranted generalizations from biased samples, heavy dependence on judgment calls in model building, the misuse of ordinal data as though it were numeric, and the questionable premise in Bayesian statistics that all hypotheses can be enumerated and assigned probabilities.

The broader lesson is that statistics is a humanly constructed language, not a mirror of reality. It works well in carefully bounded situations but can become misleading when extended to systems that are unstable, non-replicable, or chaotic. Its authority stems from mathematical precision, but that precision conceals fragility. Recognizing the limits of statistics does not diminish its practical value; rather, it situates it properly—as a specialized set of tools, useful when its assumptions fit, and unreliable when they do not.

Annotated Readings by Theme with Notes

1. Statistical Critiques and NHST

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003. https://doi.org/10.1037/0003-066X.49.12.997

Cohen exposes the logical weakness of significance testing. He shows that reliance on p < .05 creates a false sense of discovery, while what researchers truly want—effect sizes and practical importance—remains unaddressed. This links directly to the essay’s critique of statistical “gotchas,” where numerical thresholds obscure the actual reasoning process.

Gigerenzer, G. (2004). Mindless statistics. Journal of Socio-Economics, 33(5), 587–606. https://doi.org/10.1016/j.socec.2004.09.033

Gigerenzer argues that null hypothesis significance testing (NHST) has become a ritual. Researchers apply it mechanically without considering its assumptions or limitations. His analysis connects to the essay’s claim that statistical formalism often substitutes for genuine inference, producing spurious certainty.

Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66(1), 195–244. https://doi.org/10.2466/pr0.1990.66.1.195

Meehl demonstrates that psychological research produces vast literatures of statistical findings that do not converge on truth because the underlying models are underspecified. This ties to the essay’s discussion of open systems, where probabilistic inference fails when the assumptions are not stable.

Michell, J. (1999). Measurement in psychology: A critical history of a methodological concept. Cambridge University Press.

Michell critiques the assumption that psychological constructs are measurable in the same way as physical quantities. This supports the essay’s point about ordinal data being treated as interval data, creating a “veneer of precision” where none exists.

Yarkoni, T. (2020). The generalizability crisis. Behavioral and Brain Sciences, 43, e1. https://doi.org/10.1017/S0140525X20001685

Yarkoni highlights how statistical inference collapses when applied to non-representative samples and unstable constructs, creating results that cannot generalize. This directly parallels the essay’s concern that probabilistic reasoning assumes closed systems but is misapplied in open, drifting domains.

Note: These readings collectively show that NHST and allied practices are not just minor technical flaws but systematic misapplications of probability. They reinforce the essay’s critique that statistics, when built on unstable constructs, generates spurious certainty.

2. Open-System and Chaos Critiques

Altmann, E. G., Portela, J. S. E., & Tél, T. (2012). Leaking chaotic systems. Physical Review Letters, 111(14), 144101. https://doi.org/10.1103/PhysRevLett.111.144101

Altmann and colleagues show that chaotic systems do not conserve probability in the way classical models assume, undermining statistical inference. This illustrates the essay’s point that open systems “leak” and resist probabilistic representation.

Boffetta, G., Cencini, M., Falcioni, M., & Vulpiani, A. (2001). Predictability: A way to characterize complexity. Physics Reports, 356(6), 367–474. https://doi.org/10.1016/S0370-1573(01)00025-4

This work demonstrates that predictability is sharply limited in complex systems. It connects with the essay’s claim that many real-world domains are chaotic rather than probabilistic, highlighting the fragility of inference in unstable conditions.

Lorenz, E. N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20(2), 130–141. https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2

Lorenz’s classic study of weather chaos shows that small differences in initial conditions make long-term predictions impossible. This is foundational for the essay’s contrast between closed systems (dice) and open systems (weather, ecosystems).

Mandelbrot, B. B. (2004). The (mis)behavior of markets: A fractal view of financial turbulence. Basic Books.

Mandelbrot argues that financial systems follow fractal, fat-tailed distributions rather than Gaussian curves. His work reinforces the essay’s point that probabilistic models often underestimate extreme events in open systems.

Smith, L. A. (2007). Chaos: A very short introduction. Oxford University Press.

Smith provides a clear exposition of why chaos theory challenges statistical reasoning. His treatment grounds the essay’s assertion that some systems are chaotic rather than probabilistic in character.

Taleb, N. N. (2020). Statistical consequences of fat tails: Real world preasymptotics, epistemology, and applications. STEM Academic Press.

Taleb argues that under fat tails, classical results such as the law of large numbers take hold too slowly for standard statistical tools to be reliable in practical domains. His analysis strengthens the essay’s case that probabilistic reasoning is not universally valid.

Velasquéz, T. (2009). Chaos theory and fractal analysis applied to financial markets. International Journal of Financial Studies, 7(4), 55–67.

Velasquéz shows how financial markets exhibit chaotic rather than probabilistic structure, further undermining classical models of risk. This supports the essay’s claim that many domains resist probabilistic treatment.

Note: These works show that instability, nonlinearity, and fat tails break the foundation of classical probability. They illustrate the minority view that chaos, not probability, better describes many open systems.

3. Modeling Limits and Causality

Box, G. E. P., & Draper, N. R. (1987). Empirical model-building and response surfaces. Wiley.

Box and Draper famously note that “all models are wrong, but some are useful.” This underscores the essay’s point that models depend on assumptions and must be judged by their fit to reality, not their elegance.

Freedman, D. A. (2010). Statistical models and causal inference: A dialogue with the social sciences. Cambridge University Press.

Freedman emphasizes that causal inference requires more than statistical association. This ties directly to the essay’s critique of treating statistical inference as if it captured causal truth.

Oreskes, N., Shrader-Frechette, K., & Belitz, K. (1994). Verification, validation, and confirmation of numerical models in the earth sciences. Science, 263(5147), 641–646. https://doi.org/10.1126/science.263.5147.641

This paper shows that models cannot be fully “verified” but only tested in limited ways. This connects to the essay’s claim that meta-assumptions about applicability are unavoidable.

Peters, O. (2019). The ergodicity problem in economics. Nature Physics, 15(12), 1216–1221. https://doi.org/10.1038/s41567-019-0732-0

Peters shows that the average across many possible states (the ensemble average) is often mistaken for the average a single system experiences along its own trajectory over time; the two coincide only when the process is ergodic. This problem reinforces the essay’s critique of long-run probability reasoning in non-replicable systems.

Saltelli, A., et al. (2020). Five ways to ensure that models serve society: A manifesto. Nature, 582(7813), 482–484. https://doi.org/10.1038/d41586-020-01812-9

Saltelli and colleagues argue that models often overreach, misleading policy when assumptions are hidden. Their critique fits the essay’s theme that inference depends on fit, not elegance.

Weisberg, M. (2013). Simulation and similarity: Using models to understand the world. Oxford University Press.

Weisberg shows that models work by similarity, not by identity, with reality. This links to the essay’s discussion of models as selective representations, shaped by language and choice.

Note: These readings reinforce the essay’s critique of assumptions and meta-assumptions. They show that models fail when transplanted into domains that lack stability or ergodicity.

4. Philosophy of Science and Probability

Cartwright, N. (1983). How the laws of physics lie. Oxford University Press.

Cartwright argues that scientific laws only hold in specific contexts, not universally. This supports the essay’s view that probability is not a timeless structure but a local tool.

Cartwright, N. (1999). The dappled world: A study of the boundaries of science. Cambridge University Press.

Cartwright extends her critique, showing that science is patchwork, not seamless. This strengthens the essay’s claim that probability applies in narrow contexts.

Funtowicz, S., & Ravetz, J. (1993). Science for the post-normal age. Futures, 25(7), 739–755. https://doi.org/10.1016/0016-3287(93)90022-L

They argue that uncertainty and value-ladenness undermine “normal” science. This links to the essay’s theme that statistics conceals assumptions.

Gell-Mann, M. (1994). The quark and the jaguar: Adventures in the simple and the complex. W. H. Freeman.

Gell-Mann highlights complexity and emergence. His work supports the essay’s contrast between probabilistic and chaotic systems.

Hacking, I. (1975). The emergence of probability. Cambridge University Press.

Hacking traces the historical origins of probability as a human invention. This supports the essay’s framing of probability as a language.

Hacking, I. (2009). Scientific reason. Cambridge University Press.

Hacking examines the structures of reasoning in science. This grounds the essay’s claim that probability is a reasoning tool, not an ontological reality.

Keynes, J. M. (1921). A treatise on probability. Macmillan.

Keynes presents probability as logical relations among propositions, not objective frequencies. This links to the essay’s critique of probability as a metaphysical truth.

Polanyi, M. (1966). The tacit dimension. Doubleday.

Polanyi shows that knowledge always includes tacit, unformalized elements. This reinforces the essay’s point that statistical practice is saturated with judgment calls.

Wigner, E. P. (1960). The unreasonable effectiveness of mathematics in the natural sciences. Communications on Pure and Applied Mathematics, 13(1), 1–14. https://doi.org/10.1002/cpa.3160130102

Wigner’s famous essay treats mathematics as mysteriously effective. This contrasts with the essay’s counterclaim that effectiveness is a byproduct of language fit, not metaphysics.

Note: These works locate probability in philosophy and history. They frame it as contingent, local, and socially shaped—echoing the essay’s claim that statistics is a linguistic tool, not a universal science.

5. Language, Cognition, and Social Construction

Douglas, M. (1986). How institutions think. Syracuse University Press.

Douglas shows that social institutions structure thought. This supports the essay’s claim that statistical practices are collective conventions.

Hayakawa, S. I. (1949). Language in thought and action. Harcourt, Brace & Co.

Hayakawa emphasizes that language frames perception. This parallels the essay’s framing of probability as linguistic representation, not objective fact.

Jaynes, E. T. (2003). Probability theory: The logic of science. Cambridge University Press.

Jaynes advocates Bayesianism as an extension of logic, showing how probability is language-like reasoning. His work ties directly to the essay’s theme of probability as linguistic rather than ontological.

Lakoff, G., & Núñez, R. (2000). Where mathematics comes from: How the embodied mind brings mathematics into being. Basic Books.

They argue that mathematics arises from embodied metaphor and cognition. This supports the essay’s claim that probability is constructed, not discovered.

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. Harvard University Press.

Latour shows how science is a social practice of persuasion. This connects to the essay’s view that statistical conventions are stabilized by communities, not metaphysics.

Note: These works emphasize the linguistic and cognitive basis of probability. They reinforce the essay’s central claim that statistics is a human construction, not a window onto a Platonic realm.

6. Quality Control and Applied Statistics

Shewhart, W. A. (1931). Economic control of quality of manufactured product. D. Van Nostrand.

Shewhart insists that stability must be established before applying probability models. His work grounds the essay’s critique of applying probability indiscriminately, showing that even in industry probability is conditional, not universal.

Note: Shewhart’s emphasis on stability anticipates the essay’s argument that probabilistic reasoning only works in closed systems.