Modern research relies heavily on statistics and the expectation that findings can be repeated. These ideas are often treated as routine foundations of scientific work. This discussion examines the limits of those assumptions, focusing on stable experimental setups, the meaning of probability, and the growing concerns raised by replication failures across many fields.

The goal is to explore how probability depends on clearly defined situations, why stable conditions matter, and why ongoing debates about statistical practice continue among specialists.

On the Likely Controversy

Debates about the limits of statistics and replication often trigger strong responses because they reach into the basic habits of modern research. In many areas, statistical testing is a routine requirement for publication, funding, and career progress. Entire research traditions, training programs, and professional paths rely on these methods. As a result, raising questions about their proper role can seem like challenging the legitimacy of large portions of scientific work.

Within statistics and the philosophy of science, however, such disagreements are long-standing. For many decades, specialists have debated what probability really means, when statistical methods are appropriate, and how far conclusions can be extended beyond tightly controlled conditions. Different schools of statistical thought openly disagree, and concerns about replication, bias, and the misuse of statistical tools appear regularly in academic writing. For experts, the existence of these disputes is not surprising.

The tension arises because everyday research practice often presents statistical results as settled and dependable, while specialist discussions portray a more cautious and divided picture. When ideas from those specialist debates are expressed in plain language, they can sound radical to audiences used to treating statistical findings as routine evidence. This gap between daily practice and methodological debate helps explain why such ideas may seem controversial in many research settings while remaining familiar to scholars who study the foundations of statistics and scientific method.

Personal Positioning

My graduate training began in experimental psychology and later expanded into self-study in philosophy and the philosophy of science. The perspective offered here has been shaped by many decades of study. Similar criticisms have been raised by philosophers of science, statisticians, and mathematicians, and comparable arguments have likely been expressed by others with deeper expertise.

Distinction Between Chance and Probability

Chance refers to the basic fact that some events occur more often, some less often, some never occur, and some are unavoidable. Probability is the numerical way of describing and measuring those patterns of chance.

Classical Probability as Structured Counting

Classical probability starts with situations that have clear rules and well-defined boundaries. There is a known set of possible outcomes, and those outcomes can be listed and counted. Because the situation is structured in this way, arithmetic can be used to work out how often different results should occur. These calculations are often expressed as odds. In this approach, probability depends on the way the situation is set up and on how the outcomes are counted, rather than being something that exists independently in the world.
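
The counting described above can be made concrete with a short sketch. It assumes two fair six-sided dice whose 36 ordered outcomes are treated as equally likely. The dice, the variable names, and the helper `odds_against` are illustrative choices, not part of the original argument.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed setup: two fair six-sided dice, every ordered pair equally likely.
faces = range(1, 7)
outcomes = list(product(faces, repeat=2))  # the full list of 36 ordered pairs

# Count how many listed outcomes produce each total.
counts = Counter(a + b for a, b in outcomes)

# Classical probability: favorable outcomes divided by all listed outcomes.
probability = {total: Fraction(n, len(outcomes)) for total, n in counts.items()}


def odds_against(p):
    """Express a probability as odds against, e.g. 1/6 becomes (5, 1)."""
    ratio = (1 - p) / p
    return ratio.numerator, ratio.denominator


print(probability[7])                # 1/6
print(odds_against(probability[7]))  # (5, 1)
```

Changing the setup, for example by using a different number of faces, changes the list of outcomes and therefore every number that follows from it.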

Conditions Required for Probability

If the situation changes, the probabilities change, and the way they are measured changes as well. In fact, only some situations can even be treated in terms of probability. These require clear setups, limited variation, defined outcomes, and rules that can be repeated under similar conditions.

Such situations also include decisions about what counts and what does not. If a die falls off the table, the result is usually ignored. If a coin lands on the floor, a decision has to be made: does the toss count, or is it void? These choices show that probability depends on how events are defined and counted. Anyone claiming that probability is simply a built-in feature of the universe has some explaining to do.
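
A small sketch can show how a counting rule alters the resulting number. The record of ten tosses below is invented purely for illustration, and the function name `heads_frequency` is hypothetical. The same record is counted under two different rules about floor landings.

```python
from fractions import Fraction

# Hypothetical record of ten coin tosses: (face shown, where the coin landed).
tosses = [
    ("H", "table"), ("T", "table"), ("H", "floor"), ("H", "table"), ("T", "table"),
    ("H", "floor"), ("T", "table"), ("H", "table"), ("T", "floor"), ("T", "table"),
]


def heads_frequency(record, count_floor):
    """Fraction of counted tosses showing heads under a given counting rule."""
    kept = [face for face, place in record if count_floor or place == "table"]
    return Fraction(kept.count("H"), len(kept))


rule_a = heads_frequency(tosses, count_floor=True)   # every toss counts
rule_b = heads_frequency(tosses, count_floor=False)  # floor landings are void

print(rule_a)  # 1/2
print(rule_b)  # 3/7
```

Nothing about the coin changed between the two results. Only the decision about which events count did.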

Variability and Scientific Laws

All measurements show some variation, even in cases that appear highly regular, such as Ohm’s law. That law holds only within certain limits, such as specific kinds of conductors, temperature ranges, and voltage ranges, and within those boundaries it serves as an approximation. Even there, small differences appear because conditions can shift slightly. In such a carefully controlled setup, results from one trial to the next are very similar, so the small remaining variation can often be ignored.
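
The point about small residual variation can be sketched with a simulated measurement. Everything numerical here is an assumption made for illustration: a 100 ohm resistor, a 5 volt supply, and a relative measurement noise of 0.1 percent drawn from a normal distribution. It is a toy model, not a description of any real instrument.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

R_TRUE = 100.0   # ohms (assumed)
VOLTAGE = 5.0    # volts (assumed)
NOISE = 0.001    # assumed relative noise on each current reading


def measure_current():
    """Ideal current from Ohm's law, disturbed by a small random error."""
    ideal = VOLTAGE / R_TRUE
    return ideal * (1 + rng.gauss(0, NOISE))


# Estimate the resistance from 1000 repeated trials of the same setup.
estimates = [VOLTAGE / measure_current() for _ in range(1000)]

mean_r = statistics.mean(estimates)
spread = statistics.stdev(estimates)

print(f"mean resistance: {mean_r:.3f} ohms")
print(f"relative spread: {spread / mean_r:.5f}")
```

Under these assumptions the trial-to-trial spread is a tiny fraction of the measured value, which is what allows it to be ignored in practice.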

Gravity provides another example. The familiar equation for a falling body applies most cleanly in a vacuum. In everyday conditions, air resistance, wind, and other factors affect the measured speed of falling objects. When these outside influences are removed, the variation becomes extremely small. These examples show that even physical laws operate within constrained setups where variability exists but may be minimal.

Nomological Machines (Stable Setups)

The philosopher Nancy Cartwright uses the term “nomological machine” to describe a stable, clearly defined experimental or observational setup. The term may sound technical, but it simply refers to the kind of controlled situation already discussed: a setup in which certain conditions are kept fixed so that outcomes can be counted and compared.

If the setup changes, the way probability is calculated changes as well. In other words, the “nomological machine” changes. Probability therefore applies only within a particular stable setup. The idea of such a setup assumes that some factors are treated as constant and that outcomes and events are counted in agreed ways. At the same time, some variation must remain; without variation there would be certainty, and probability would no longer be needed.

Probability Applies to Events in Stable Systems

Probability always refers to events and to the outcomes that follow from those events. Some events happen quickly, others unfold slowly, but the idea remains the same: probability concerns what happens when events occur and how often different outcomes appear.

This leads to the idea of a stable system. Although the word “stable” can sound technical, the meaning is simple. Stability means that the important features of a situation remain the same from one occasion to the next. When those essential conditions stay consistent, probability can be applied in a meaningful way. Current research practice often tries to apply probability to situations that are not stable, and this creates serious difficulties.

Probability only makes sense within the limits of a clearly defined setup with known characteristics. It requires variability, but the variability must be limited and controlled. It also requires defined outcomes and clearly described events that produce those outcomes. In other words, there must be rules governing what happens and how results are counted.

A stable setup can be run once, run again, and run many times while remaining the same in its essential features. Of course, no situation is ever perfectly identical from one trial to the next; some variation always remains. Stability simply means that the most important characteristics do not change over time. This makes repeated trials possible and allows similar results to appear again when the setup is repeated.
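
The contrast between a stable and an unstable setup can be sketched in a simulation. The model is an assumption introduced only for illustration: the stable setup keeps one fixed chance of success across runs, while the unstable one is given a different underlying chance in each run. The function names and the particular values are hypothetical.

```python
import random


def observed_frequency(chance, trials, rng):
    """Relative frequency of successes in one run of the setup."""
    return sum(rng.random() < chance for _ in range(trials)) / trials


def spread(values):
    """Gap between the largest and smallest observed frequency."""
    return max(values) - min(values)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
TRIALS = 10_000

# Stable setup: the essential condition (a 0.5 chance) is the same in every run.
stable_runs = [observed_frequency(0.5, TRIALS, rng) for _ in range(5)]

# Unstable setup (assumed model): the underlying condition differs from run to run.
unstable_runs = [
    observed_frequency(chance, TRIALS, rng) for chance in (0.2, 0.35, 0.5, 0.65, 0.8)
]

print("stable spread:  ", round(spread(stable_runs), 3))
print("unstable spread:", round(spread(unstable_runs), 3))
```

In the stable case the runs agree closely even though no two are identical. In the unstable case repetition produces visibly different results, because the thing being repeated is not the same setup.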

Debate About Probability in Unstable Systems

Some statisticians and philosophers argue that statistical methods should not be applied to unstable situations. In fields such as the social sciences and economics, however, statistical analysis is still widely used despite the lack of stable and clearly defined setups. Certain mathematicians and philosophers of science hold that statistical frequencies cannot be meaningfully applied where outcomes cannot be clearly listed, where conditions cannot be described in detail, and where a proper stable setup cannot be created.

From this perspective, the foundations of statistics—stable setups, predictable conditions, and clearly defined outcomes—are often missing in large areas of modern research. A number of scholars have therefore questioned whether current statistical techniques are appropriate in these contexts. Many researchers who rely on statistical methods do not specialize in the foundations of statistics, which contributes to ongoing debate about the reliability of findings in fields such as psychology and biomedical research.

Competing Interpretations of Probability

There is tension, even among modern statisticians, about what probability actually means. Some Bayesians treat probability as subjective belief, a position rejected here as nonsensical. Frequentist approaches, in contrast, focus on counts: how often outcomes occur and how frequently different counts appear. From these counts, the patterns of outcomes are described. Historically, attention was first given to the set of possible outcomes in a defined situation. Later, the focus shifted to how those outcomes appear over many repeated trials. This became the standard frequentist idea of the long run: after a great many trials, what overall pattern appears and what probabilities emerge? In this sense, the long-run view expands earlier ideas about simple odds rather than replacing them.

Even within the frequentist camp, disagreements are common. There are several distinct versions of frequentist thinking. One early approach is associated with Fisher. Another is linked to Neyman and Pearson, who disagreed strongly with Fisher. A later approach, often called null hypothesis significance testing, blends ideas from both traditions. Many people use this hybrid method without examining its origins, while some mathematically oriented critics reject it and criticize it repeatedly. The debates are often technical and difficult to follow without advanced training. Even those trained in applied statistics may find many of these arguments hard to understand.

From an outside perspective, the disputes can appear so deep that the reaction becomes, “a pox on all their houses.” Confidence in statistics as a unified discipline becomes difficult to maintain, and despite its wide use, there is no complete agreement among statisticians about what counts as proper practice.

Meanings of “Distribution”

The word “distribution” is often used in a general sense, but in statistics it has several distinct meanings that are easily confused, especially by readers without technical training. It can refer to a table of numbers that lists how often outcomes occur. It can also refer to a plot or graph that shows those numbers visually. In other cases, it refers to real-world data that have been collected from observations or measurements. Finally, it can refer to a theoretical curve or formula that represents how outcomes are expected to behave in an idealized situation.
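
The four senses can be set side by side in a brief sketch. The example is an assumption chosen for simplicity: the number of heads in four tosses of a fair coin, repeated 200 times, with the data simulated instead of collected. All names are illustrative.

```python
import random
from collections import Counter
from math import comb

rng = random.Random(2)          # fixed seed so the sketch is repeatable
N_TOSSES, N_REPEATS = 4, 200    # assumed sizes

# Collected data (simulated here): heads counted in each set of four tosses.
data = [sum(rng.random() < 0.5 for _ in range(N_TOSSES)) for _ in range(N_REPEATS)]

# A table listing how often each outcome occurred.
table = Counter(data)

# A plot of those numbers (a crude text histogram, one mark per two observations).
for heads in range(N_TOSSES + 1):
    print(heads, "#" * (table[heads] // 2))


# A theoretical formula for an idealized situation (the binomial formula).
def binomial_pmf(k, n=N_TOSSES, p=0.5):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


theory = [binomial_pmf(k) for k in range(N_TOSSES + 1)]
print(theory)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```

The list `data`, the table, the printed picture, and the formula are four different objects, yet each might be called "the distribution" in ordinary statistical talk.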

Internal Disputes Within Frequentism

Frequentist statistics is often presented as a single, unified approach, but in practice it includes several competing traditions. One early approach is associated with Fisher, who developed methods focused on measuring how surprising a result would be if nothing unusual were happening. Another approach was developed by Neyman and Pearson, who emphasized rules for decision-making, such as setting clear standards for when to accept or reject a claim. These two traditions did not agree with each other, and their disagreements shaped much of twentieth-century statistical thinking.
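
The difference between the two traditions can be sketched on a single invented result: 16 heads in 20 coin tosses, tested against the assumption of a fair coin. The data, the one-sided test, and the threshold of 0.05 are all assumptions chosen for illustration, and the sketch simplifies both positions considerably.

```python
from math import comb


def upper_tail(k, n, p=0.5):
    """Chance of k or more heads in n tosses when each toss has chance p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


N, HEADS = 20, 16  # hypothetical data

# Fisher-style: report how surprising the result would be if the coin were fair.
p_value = upper_tail(HEADS, N)

# Neyman-Pearson-style: fix a standard in advance, derive the cutoff, then decide.
ALPHA = 0.05
critical = next(k for k in range(N + 1) if upper_tail(k, N) <= ALPHA)
decision = "reject" if HEADS >= critical else "accept"

print(f"p-value: {p_value:.5f}")  # 0.00591
print(f"cutoff: {critical} heads, decision: {decision}")  # cutoff: 15 heads, decision: reject
```

The first approach yields a graded measure of surprise. The second yields only a decision under a rule set beforehand. The hybrid practice described in the text typically reports both at once.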

Modern research practice often uses what is called null hypothesis significance testing, which blends ideas from both traditions. This hybrid approach became widely adopted in many scientific fields, often without much attention to its mixed origins. As a result, everyday statistical practice draws from methods that were originally developed within competing frameworks.

Because of these different traditions and their unresolved disagreements, there is no complete consensus among statisticians about the best way to use frequentist methods. The appearance of unity in textbooks and research papers can therefore hide a long history of debate within the field itself.

Probability Distributions from Repeated Events

Probability distributions emerge when many events are observed and their outcomes are recorded over time. As the number of events grows, patterns begin to appear in how often different outcomes occur. These long-run patterns form the basis of the frequentist view of probability.
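
The emergence of a long-run pattern can be sketched with a simulated die. The die is assumed fair and the rolls are generated by a pseudo-random number generator, so this illustrates the idea and is not evidence about any physical die. The function name and the checkpoints are illustrative.

```python
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable
CHECKPOINTS = (10, 100, 1_000, 10_000, 100_000)


def running_frequency(n_rolls, target=6):
    """Relative frequency of `target` recorded at selected points in the run."""
    hits = 0
    recorded = {}
    for i in range(1, n_rolls + 1):
        hits += rng.randint(1, 6) == target
        if i in CHECKPOINTS:
            recorded[i] = hits / i
    return recorded


frequencies = running_frequency(100_000)
for rolls, freq in frequencies.items():
    print(f"{rolls:>7} rolls: frequency of a six = {freq:.4f}")
```

Early values tend to wander, while later ones settle near one in six. This settling is what the frequentist long run refers to, and it presupposes that the same setup is being repeated throughout.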

Working out the numbers behind these patterns can begin with simple counting, but the arithmetic can quickly become complicated when many outcomes and combinations are involved. For readers without strong mathematical training, these calculations can become difficult and confusing.
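
A short sketch shows how quickly the counting grows. Both examples are standard illustrations chosen here as assumptions: totals on three dice, which can still be listed, and five-card hands from a 52-card deck, which cannot reasonably be listed by hand.

```python
from collections import Counter
from itertools import product
from math import comb

# Three dice: 6**3 = 216 ordered outcomes, still small enough to list in full.
totals = Counter(sum(roll) for roll in product(range(1, 7), repeat=3))
print(totals[9], totals[10])  # 25 27

# Five-card hands from a 52-card deck: counted by formula instead of by listing.
hands = comb(52, 5)
print(hands)  # 2598960
```

Totals of 9 and 10 look equally easy to reach, yet they arise in different numbers of ways, which is the kind of detail that simple intuition tends to miss.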

Replication Crisis as Evidence of Limits

The difficulty of repeating many research findings has raised doubts about how probability is being used in areas where conditions are unstable. Some scholars argue that these failures cannot be explained only by poor methods or by using the wrong statistical tools. Instead, they suggest that the deeper problem may be the assumption that probability applies in these situations at all. According to this view, treating such domains as suitable for probability may itself be a mistaken assumption.

Risk Factors vs Probabilities

In medicine and public health, the language of “risk factors” is often presented in a way that sounds like probability. Statements about increased risk appear to suggest that the chance of a future outcome has been measured in a precise and reliable way. However, some mathematicians and statisticians draw a sharp distinction between simple frequency counts and true probability. From this stricter perspective, the numbers used to describe risk factors are not probabilities in the classical sense.

Probability in the strict sense grew out of carefully defined situations such as games of chance. In those settings, the rules are clear, the possible outcomes are known in advance, and the same setup can be repeated again and again. Dice, cards, and coins are standard examples because the boundaries are fixed and the outcomes can be counted in a precise way. When outcomes are counted within such a controlled setup, probability has a clear meaning.

Risk factors in medicine are very different. They usually come from observing large populations over time and counting how often certain outcomes appear. For example, a group of people with one characteristic may develop a disease more often than a group without that characteristic. These counts can be useful and informative, but they are not created within a tightly controlled and repeatable setup. Real people live in changing environments, experience many influences at once, and cannot be treated as identical copies of one another.

Because of this complexity, some scholars argue that these numbers are better understood as frequency summaries rather than true probabilities. They describe how often something has happened in a particular set of observations, but they do not arise from a stable system with clearly defined outcomes and repeatable conditions. From this viewpoint, presenting risk factors as probabilities may give a stronger sense of certainty than the underlying evidence can support.
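
What a frequency summary amounts to can be sketched with a few lines of arithmetic. The cohort counts below are invented for illustration only and do not come from any study. The names are hypothetical.

```python
from fractions import Fraction

# Hypothetical cohort counts, invented for illustration only.
with_trait = {"cases": 30, "people": 1000}
without_trait = {"cases": 10, "people": 1000}


def observed_frequency(group):
    """How often the outcome occurred in this particular set of observations."""
    return Fraction(group["cases"], group["people"])


freq_with = observed_frequency(with_trait)
freq_without = observed_frequency(without_trait)
frequency_ratio = freq_with / freq_without

print(freq_with, freq_without, frequency_ratio)  # 3/100 1/100 3
```

The ratio of 3 records what was counted in these two groups. Reading it as the chance that a given person will develop the disease adds an assumption that the counts themselves do not supply.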

This distinction is central to criticism of modern medical research. If probability only makes strict sense within stable and repeatable setups, then applying the language of probability to complex and changing human populations becomes controversial. According to critics, this does not mean the data are useless, but it does mean that the numbers should be interpreted with caution and with a clear understanding of what they truly represent.

Skepticism About Replication in Soft Sciences

The same concerns apply to fields such as the social sciences, psychology, medicine, and economics. Replication is frequently attempted in these areas, yet the underlying situations are rarely stable or even closely similar from one study to the next. The number of unknown influences is large, interactions between variables are complex, and many important factors are difficult to define or measure clearly. Under these conditions, strong claims about replication become highly doubtful.

The issue here is not statistical process control in tightly managed industrial settings. The concern is the use of statistical reasoning in complex and unstable environments. Some mathematicians, statisticians, and philosophers of science have argued that theory itself predicts serious difficulties in applying probability under such conditions.

To claim replication in these fields is to claim that the underlying setup—the stable system that produces the results—is similar enough across studies to allow meaningful comparison. Yet in many cases, most of the important features of that setup are unknown or only partly understood. Claims of replication therefore rest on assumptions that may be difficult to justify.

Those who argue that replication is possible in these domains carry the burden of explaining how the essential conditions remain sufficiently similar across studies. Even when repeated studies appear to produce similar results, it remains an open question whether this reflects genuine similarity of conditions or chance alone.

In contrast, in simple situations such as rolling dice, the setup can be treated as similar enough across trials to allow meaningful repetition. The conditions are never perfectly identical, but they are close enough in their essential features to support the idea of replication. In complex human and biological systems, the ability to make this claim is far less clear. Without demonstrating that the essential conditions remain sufficiently similar, statements about replication risk becoming empty or unsupported.

Burden of Proof on Replication Claims

Claims of successful replication raise an important question: how strong is the evidence that replication truly works in unstable research settings? The replication crisis has led some observers to argue that the evidence for reliable replication in such conditions remains weak. When the underlying conditions shift from study to study, the very foundation needed for replication may be unstable.

This places a burden of proof on those who claim that replication has occurred. It is not enough to report similar results; there must also be a convincing case that the essential features of the setup remained sufficiently similar across studies. Without this, apparent agreement between studies may reflect coincidence rather than genuine repetition of the same conditions.

Under this view, some reported successes at replication could be the result of chance rather than evidence that the same underlying process has been reproduced. Distinguishing true replication from coincidence therefore becomes a central challenge in fields where research conditions are difficult to stabilize.
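
How agreement can arise by chance alone can be sketched in a simulation. The model is an assumption that borrows the very probabilistic framing under dispute: two groups of 30 are drawn from the same normal distribution, so no real effect exists, and a study counts as "significant" when a simple z-test on the difference in means exceeds 1.96 in absolute value. All names and sizes are illustrative.

```python
import random
import statistics

rng = random.Random(4)  # fixed seed so the sketch is repeatable
N = 30                  # assumed participants per group
Z_CRIT = 1.96           # assumed threshold, roughly the conventional 0.05 level


def null_study():
    """One study with no real effect; True if it still looks 'significant'."""
    a = [rng.gauss(0, 1) for _ in range(N)]
    b = [rng.gauss(0, 1) for _ in range(N)]
    z = (statistics.fmean(a) - statistics.fmean(b)) / (2 / N) ** 0.5
    return abs(z) > Z_CRIT


STUDIES = 20_000
originals = sum(null_study() for _ in range(STUDIES))      # false positives
replicated = sum(null_study() for _ in range(originals))   # one retry for each

print(f"'significant' originals: {originals} of {STUDIES}")
print(f"'successful' replications: {replicated} of {originals}")
```

Under these assumptions a small share of retries appear to succeed even though there is nothing to find, so a reported replication is not by itself proof that the same process was reproduced.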

Critique of Statistical Practice

A common criticism is that statistical methods are often used in ways that resemble “cargo cult statistics.” In this view, the techniques are applied because they are expected or required, rather than because the underlying conditions truly justify their use. Statistical tools may be used most heavily in areas where their suitability is uncertain.

Large numbers of research papers rely on statistical methods in fields where the conditions are complex and difficult to stabilize. The mere availability of a tool does not guarantee that it is appropriate for every situation. According to this perspective, widespread use can create the appearance of rigor even when the underlying assumptions are weak.

The replication crisis is often cited as supporting evidence for this concern. If probability methods depend on stable and repeatable conditions, then their heavy use in unstable research settings raises serious questions about how well they apply.

Doubts About Scientific Incentives and Bias

Concerns are often raised about how incentives and funding may influence research. When careers, grants, and institutional goals depend on producing positive results, questions arise about whether this pressure can shape the way studies are designed, conducted, and reported. In areas where large financial interests are involved, such as biomedical research, these concerns become especially prominent.

From this perspective, the reliability of the evidence for successful replication and for the sound use of statistics is itself open to question. Some observers argue that the quality of this evidence may be lower than commonly assumed, and that institutional pressures can make it difficult to assess research findings with full confidence.

Summary

This discussion examined the relationship between probability, stable experimental setups, and the practice of replication. Probability was presented as a numerical way of describing patterns of chance that arise within clearly defined and repeatable situations. These situations require rules, defined outcomes, and limited variation. When the setup changes, the probabilities change as well.

The idea of stable systems—sometimes called nomological machines—was used to explain why probability works well in tightly controlled settings but becomes controversial in complex and changing environments. Even well-known physical laws operate within constrained conditions, where small variations exist but are often negligible. In less controlled fields, the essential conditions may be unknown or constantly shifting.

Debates within statistics show that there is no single agreed meaning of probability. Different traditions offer competing interpretations, and modern practice often blends ideas from these traditions. At the same time, failures to replicate research findings have raised doubts about the use of probability and statistics in unstable domains such as the social sciences and medicine.

The discussion also highlighted criticisms of statistical practice, including concerns about misuse, incentives, and the difficulty of distinguishing true replication from coincidence. Overall, the central theme is that probability and replication depend on stable, repeatable setups, and their application becomes increasingly uncertain when those conditions are not present.

