Date: 2025-06-28

Author's Preface

The Core Absurdity: Measuring the Inherently Unmeasurable

Some members of the soft sciences research community, and some statisticians possibly, have convinced themselves of an absurdity—that you can measure that which is inherently unmeasurable. By analogy with the concrete world, where you can count, you can compare dimensions, they think you can do the same thing with unstable psychological phenomena and social phenomena that you can't feel or touch or manipulate, and they've come up with methods of doing it that probably have very little validity, although they believe otherwise.

Proxies, Scales, and the Illusion of Order

To do so, they introduce proxies and operational definitions in the form of Likert scales. They attempt to describe things that are contextual and unstable. Take, for instance, judgment (reaching conclusions), as in the common Likert scale of disagree to agree, with maybe five or seven points along the scale. But it's not really a scale, except by fiat, I guess. It's not measurement, although numbers can be assigned. The numbers are surrogates for categories. But they get treated as numbers. And it's asserted that there's inherent order to the categories.

But does this conform to internal order? Is there actually a linearity to this phenomenon of reaching conclusions? Do you have degrees of reaching conclusions, degrees of belief? I'm not sure that it makes sense. It makes sense to people who are enculturated in the paradigm. But when you look at it, it doesn't make a whole lot of sense—at best a vague sense. But it's quite clear that we're talking about categories that are unstable.

The Construction of a Survey Instrument: An Example

So you look at how you'd actually construct and use one of these things and respond to it. So let's say you decide that you want to assess human happiness, contentment, or well-being. So you say, okay, I'm a pretty smart guy. I'm going to come up with 20 questions, and we'll scale them with a Likert scale. And you think a bit, and you come up with 20 different assertions, and you ask respondents to indicate their level of agreement. I won't call it measurement—the indication of agreement.

Now, where did those assertions come from? Well, first of all, it's language. It's assumptions about the world, about the mental world, about how people work. And you try to put it into language. And a different individual might put it in a different language, and a different language might treat it totally differently. So already you're working with great ambiguity.

And when you come up with this set of 20 assertions, and then you feed it to your victims, and they look at it and they read the first assertion, they wonder, What the hell does this mean? And they say, Oh, well, maybe I know what it means. And then they look at the scale and they think, Okay, well, do I agree? Do I disagree? Do I strongly agree? I don't know, they think.

The Instability and Whimsy of Responses

The response is almost whimsical. And it varies from time to time, reading to reading, upon reflection. And there's nothing stable in the least about it. So they give an answer, and they wonder if they gave the right answer: I don't know if I really feel that way; maybe I need to revise it. So they rub it out and revise it. And they do that to all the items on the list.

So they come up with these very whimsical, arbitrary answers that have some vague—arguably a very vague—reflection on some underlying concept, which itself is rather vague and tenuous and may pertain to nothing in the human nervous system or the human mind.

From Arbitrary Labels to False Calculation

Then the researcher takes that and says, Well, there's some numbers here. I can do some computations. So they ignore all the caveats they were given in introductory stats about using ordinal or categorical data with parametric statistics. And they assume, Well, they're numbers, so I can calculate with them. So they think that it makes sense to add them or subtract them or do any other mathematical operations. But they're not numbers. They're just responses tagged as numbers, labeled as numbers. In this case, they represent conclusions.

The Misuse of Probability and the Bell Curve

Well, don't ever get me started on statistics and the use of probabilistic models that may apply to dice rolls and similar closed systems, but arguably have nothing to do with human variability. They give theoretical models called distributions, which may work under some assumptions in dealing with those closed systems in the long run (we will leave undefined what the long run actually means). But the claim that these statistical distributions apply to human psychology or society is unproven, probably unprovable, and more than unlikely for various reasons.

It's often asserted to be OK, though, by researchers and others who should know better. It's only a few brave scholars who cry out that the emperor has no clothes.

Enculturation into Statistical Faith

I was taught in statistics and psychology that the bell curve accurately describes all kinds of phenomena, and that's why we use it. And also that if I collected a bunch of data and graphed it, I would always see something approximating a bell curve. That became an article of faith among people teaching me statistics.

Graphing Numbers That Aren’t Numbers

We have a problem here, of course, when we're trying to graph numbers that aren't numbers. And that makes no computational sense. I guess you can still graph frequencies of categories.

So the claim would have to reduce to, We can take categories and graph their frequencies, and they'll come out as a bell curve in the long run, maybe, with repeated observations or with sufficient observations. And we seem to see that in some results, but maybe it's helped along by a little bit of sleight of hand, based on the manipulation of the data. I'm not sure how it all works.

Introduction

The Pretense of Measurement and the Failure of Method

This essay challenges the foundational assumption underlying vast areas of soft science research: that human thoughts, feelings, judgments, and attitudes can be meaningfully measured. Across psychology, education, and the social sciences, researchers routinely assign numbers to internal states, treat those numbers as if they represent quantities, and apply statistical models to them as if they were observations of a stable, physical system. This practice, widely accepted and institutionally reinforced, is not just methodologically weak—it is conceptually incoherent.

The central claim here is that most so-called measurement in these fields is not measurement at all. It is a linguistic performance: a process of taking verbal responses to vague prompts, assigning them numerical labels, and treating those labels as if they reflect real, measurable properties. Tools like Likert scales are presented as scientific instruments, but they are simply categorization devices that convert subjective impressions into pseudo-numerical form. These numbers do not correspond to physical quantities. They do not have defined units, equal intervals, or a true zero point. They are not measurements. They are placeholders—context-bound, unstable, and deeply interpretive.

Despite this, such data are treated mathematically. Researchers average them, graph them, and subject them to probabilistic analysis. They invoke the bell curve, assume normal distributions, and calculate statistical significance. But these operations rely on assumptions that do not hold. The phenomena being studied are not stable, not linear, and not reducible to units of anything. The appearance of rigor is achieved by adopting the external trappings of science while bypassing its internal standards.

This essay will follow the structure of the author’s preface precisely, elaborating on each point with expanded argument and grounded illustration. It will show how entire research programs have been constructed on a foundation of category errors—errors that convert language into numbers, numbers into data, and data into false certainty. The goal is not to nitpick poor applications of statistical method, but to expose a deeper failure: the belief that human variability can be domesticated through tools that were never designed for that purpose.

In doing so, the essay takes aim not just at flawed practice, but at an intellectual posture that refuses to confront its own assumptions. This is a critique of method at its root. It asks what it means to measure, what it means to quantify, and what happens when those terms are stretched beyond their limits. The answer, as will be shown, is that scientific appearance is achieved at the cost of epistemic integrity.

Discussion

The argument unfolds in several parts: (1) the core absurdity of measurement claims in soft sciences, (2) the role of proxy variables and operational definitions, (3) the instability and linguistic arbitrariness of survey-based responses, (4) the misuse of numbers and computation on category data, (5) the unwarranted extension of statistical models to unstable phenomena, and (6) the role of enculturation and institutional self-deception.

1. The Core Absurdity: Claiming to Measure the Unmeasurable

The preface opens by stating that some researchers in the soft sciences—and presumably others—have embraced the belief that one can measure that which is inherently unmeasurable. The analogy to physical dimensions like length or weight underscores the categorical error being committed: just because one can count chairs or measure rainfall, it does not follow that one can measure subjective states like belief, judgment, or emotional well-being in the same way.

Physical quantities have standards. A kilogram, a second, a liter—these are agreed upon, stable, and physically instantiated. Mental states are not. They are internally variable, context-sensitive, and linguistically entangled. To treat them as if they are simply more complex versions of weight or distance is to mistake metaphor for method.

What makes this absurdity dangerous is that it is not merely conceptual; it guides research, informs policy, and fills academic journals. Systems of research and publication have been built on the pretense that these intangible, unstable inner states can be quantified, stored, and manipulated as data.

2. Operational Definitions and Proxy Scales: The Likert Illusion

To make measurement appear possible, researchers use proxies. Chief among these are operational definitions, which redefine a concept in terms of how it is “measured.” For example, happiness might be defined operationally as the average score on a 20-item Likert scale questionnaire. But this is not a discovery; it is a tautology. The researcher simply declares that agreement with certain statements constitutes happiness.
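As a minimal sketch of what such an operational definition amounts to in practice, the Python snippet below scores a hypothetical 20-item questionnaire. The label-to-number mapping, the "neutral" midpoint label, and the function name are assumptions made for illustration and are not taken from any published instrument. The "happiness score" is simply whatever the scoring rule returns.

```python
# Hypothetical coding scheme: assigning 1..5 to the labels is a convention, not a discovery.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def operational_happiness(responses):
    """Return the mean numeric code of the responses (the operational 'score')."""
    codes = [LIKERT_CODES[label] for label in responses]
    return sum(codes) / len(codes)

# One imagined respondent: 10 items marked "agree", 10 marked "neutral".
answers = ["agree"] * 10 + ["neutral"] * 10
score = operational_happiness(answers)
print(score)  # 3.5
```

Changing the dictionary changes the score, which suggests that the number reflects the coding decision at least as much as anything about the respondent.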

Likert scales—a 5- or 7-point range from “strongly disagree” to “strongly agree”—are the most common method. These are not scales in the traditional sense. A physical scale measures things such as mass in known units. A Likert scale assigns numbers to subjective agreement levels, then assumes (without evidence) that the intervals between them are equal and that the responses are stable and meaningful.

This is a sleight of hand: words are assigned numbers, and the numbers are then treated as if they possess mathematical properties that the underlying words do not. The numbers serve as labels for categories, but they are interpreted as if they were measurements. This process is linguistic alchemy—turning vague judgments into pseudonumbers and treating them as data.

3. Instability and Whimsy: The Nature of Human Response

A key portion of the preface outlines the inherent instability of survey responses. The hypothetical example is a researcher creating 20 questions intended to assess happiness or well-being. These questions are constructed in language. But language is not neutral. It reflects the beliefs, assumptions, and worldview of the writer.

Moreover, when these items are presented to a participant—the term “victim” is pointedly used—the experience is one of uncertainty and second-guessing. A respondent may read a statement and wonder, “What does this mean?” They may think they understand, then reconsider. When they finally choose a response, it may reflect a momentary impression more than any settled attitude. Upon reflection, they may even erase and revise their answer.

This is not an incidental problem. It strikes at the heart of the validity of the method. If the same person can respond differently to the same item at different times—or doesn’t know what their response even means—then the response cannot be taken as a stable indicator of a latent trait. It is not reliable, not precise, and not repeatable. At best, it offers a vague and flickering glimpse into a mental process that is not well-defined and is not amenable to quantification.

The instability is not just temporal. It is conceptual. The underlying construct—like happiness—is itself vague, culturally shaped, and subject to redefinition across time, place, and language. The survey format pretends that it is possible to standardize this instability into fixed points of agreement. That pretense is not merely flawed; it is epistemically dishonest.

4. Numerical Abuse: Treating Labels as Numbers

The preface then describes a crucial leap—from response to computation. Once the participant has chosen their responses, the researcher assigns numerical values and proceeds to apply arithmetic operations. Totals are computed. Averages are drawn. Comparisons are made. And all of this is treated as if it were working with real numbers.

But the numbers assigned in Likert scales do not behave like real quantities. Adding "strongly agree" and "disagree" makes no conceptual sense. Averaging such responses assumes that the intervals between options are equal—which they are not. The pretense here is that the labeled categories are points along a linear continuum. In truth, they are fuzzy judgments with no known interval spacing, no known zero point, and no guarantee that one person’s “4” corresponds in any meaningful way to another person’s “4.”
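The dependence of averages on the assumed spacing can be made concrete. The following is a small sketch in Python, using invented responses for two hypothetical groups and two order-preserving codings of the same five categories (the groups and both codings are assumptions chosen for illustration). If the categories are only ordered, either coding is equally defensible, yet the comparison of means reverses between them while the comparison of medians does not.

```python
from statistics import mean, median

# Two order-preserving codings of the same five response categories (both hypothetical).
EQUAL_STEPS = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
STRETCHED_TOP = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}

# Invented responses, recorded as category positions 1..5.
group_a = [3, 3, 3, 3, 3]
group_b = [1, 1, 1, 5, 5]

def recode(responses, coding):
    """Translate category positions into the numeric values of a given coding."""
    return [coding[r] for r in responses]

def gap(stat, coding):
    """Statistic for group A minus the same statistic for group B."""
    return stat(recode(group_a, coding)) - stat(recode(group_b, coding))

for name, coding in [("equal steps", EQUAL_STEPS), ("stretched top", STRETCHED_TOP)]:
    print(name, "| mean gap:", round(gap(mean, coding), 2),
          "| median gap:", gap(median, coding))
```

Under the first coding group A has the higher mean; under the second, group B does. The median gap is the same in both, because the median depends only on the order of the categories and not on the distances assumed between them.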

Yet the field proceeds as if the data are valid. Computations proceed. Charts are drawn. Articles are published. The categorical nature of the responses—nominal or at best loosely ordinal—is ignored. The statistical methods used often require assumptions of interval data and normality, both of which are unjustified. The entire process amounts to computing with labels and pretending they are quantities.

This is not just a technical error—it is a conceptual failure. And it is widespread enough to have become the standard practice in many areas of psychology, education, and social research.

5. Distributions and Dice: Misapplied Statistical Models

The preface shifts focus to statistical modeling—specifically, the belief that probabilistic models, such as those involving bell curves, apply to human variability in the same way they apply to closed systems like dice rolls. But dice are simple, stable, physical systems. The probabilities involved are well-defined. The rules are known, the sample space is fixed, and the outcomes are discrete and limited.

Human behavior and mental states are none of those things. They are not discrete, not stable, not bounded, and not predictable in the same way. Yet statistical distributions like the normal curve are routinely applied to them. Researchers assert, often without evidence, that traits like intelligence, happiness, or satisfaction follow a bell curve. This becomes an article of faith. The model is assumed before the data are even examined.

In some cases, data are manipulated to fit the model. If responses are collected on a Likert scale, then frequencies of each response can be plotted. These frequencies may show a rough bell shape. But this is an artifact of the instrument and the cultural pressures to avoid extremes—not evidence of an underlying Gaussian distribution.
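A toy illustration of how a response habit alone could produce a peaked histogram: the Python sketch below starts from perfectly flat counts across five categories and applies one assumed rule, that a fixed fraction of respondents in each non-central category picks the neighboring category closer to the midpoint. The counts, the fraction, and the rule itself are all hypothetical and are not drawn from any data set.

```python
def apply_extreme_avoidance(counts, shift_fraction):
    """Move a fixed fraction of each non-central category one step toward the midpoint.

    Every shift is computed from the original counts, so nobody moves more than one step.
    """
    mid = len(counts) // 2
    shifted = [0] * len(counts)
    for i, count in enumerate(counts):
        if i == mid:
            shifted[i] += count
            continue
        moved = int(count * shift_fraction)
        step = 1 if i < mid else -1
        shifted[i] += count - moved
        shifted[i + step] += moved
    return shifted

flat = [100, 100, 100, 100, 100]          # no underlying peak at all
observed = apply_extreme_avoidance(flat, 0.5)

for label, n in zip(["SD", "D", "N", "A", "SA"], observed):
    print(f"{label:>2} {'#' * (n // 10)} {n}")
# observed == [50, 100, 200, 100, 50]
```

The resulting histogram is symmetric and peaked in the middle even though the starting counts were flat, so in this toy setting a bell-like shape says nothing about the shape of whatever lies behind the responses.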

The reliance on distributions imported from physical systems is not innocent. It allows researchers to borrow the language and appearance of science while sidestepping its demands. The central limit theorem is invoked as if it guaranteed normality in the “long run,” but the theorem concerns sums and averages of independent observations, not the individual responses themselves, and no one defines what the long run means in practice. The result is a pseudo-statistical approach that creates the appearance of rigor while relying on unjustified assumptions.
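It may help to see what the central limit theorem does and does not describe. The theorem is about sums and averages of independent draws, not about the individual draws. The sketch below, in Python, uses fair dice as the closed system and counts exactly how many ways each total can occur; the function name is an illustrative choice.

```python
from collections import Counter

def sum_distribution(n_dice, faces=6):
    """Exact number of ways to reach each total when n_dice fair dice are rolled."""
    dist = Counter({0: 1})
    for _ in range(n_dice):
        nxt = Counter()
        for total, ways in dist.items():
            for face in range(1, faces + 1):
                nxt[total + face] += ways
        dist = nxt
    return dict(sorted(dist.items()))

one_die = sum_distribution(1)
two_dice = sum_distribution(2)

print(one_die)   # every face equally likely: flat, not bell-shaped
print(two_dice)  # totals pile up around 7
```

A single die stays flat however many times it is rolled; only the distribution of totals (or averages) becomes peaked as more dice are combined. The bell shape therefore belongs to the aggregate under stated conditions (independent, identically distributed draws), and nothing in the theorem implies that individual responses are themselves normally distributed.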

6. Enculturation, Self-Deception, and Institutional Ritual

The preface concludes by noting that these practices are taught, accepted, and passed on without critical reflection. The author recounts being taught that the bell curve applies to psychological traits, with instructors pointing to graphs and asserting that bell-shaped curves appear reliably. This teaching becomes ritual. The tools are used because they are what everyone uses. The assumptions are accepted because they are what everyone assumes.

This is not science—it is tradition masquerading as method. The field becomes enculturated into a worldview where vague categories are treated as measurements, and where statistical procedures are applied by default. The researchers are not necessarily dishonest. They believe in what they are doing. But the belief system rests on a foundation that has not been questioned, and when questioned, often responds with dismissal rather than reconsideration.

Only a few dissenting voices point out the emperor has no clothes. And these voices are often marginalized, dismissed as troublemakers, or simply ignored. The system is built to reinforce itself. The illusion is stable because everyone agrees to pretend that it is not an illusion.

Methodological Drift

A final point, embedded throughout the preface but worth extracting explicitly, concerns the nature of drift—from observation to abstraction, from judgment to computation, from language to algebra. The fundamental problem is not merely with the tools, but with the category error involved in using them. The tools of measurement, quantification, and probability were developed for physical systems, where they work. They were never designed for measuring mental states or interpreting subjective expressions. When imported into psychology or sociology, they are not adapted—they are imposed. The result is not clarification but distortion.

Summary

Measuring the Unmeasurable and the Collapse of Method in Soft Science

This essay has examined and expanded in full detail the critique presented in the Author’s Preface: that research practices in the soft sciences, particularly in psychology and social science, rely on the pretense of measurement where no true measurement is possible. The argument unfolds around a central theme—that the tools used to quantify physical reality have been misapplied to unstable and linguistically mediated mental phenomena, resulting in a system that produces numbers without meaning and calculations without coherence.

At the foundation of the critique is the observation that psychological constructs such as belief, judgment, happiness, or agreement are not measurable in the same way that mass, temperature, or time are. They lack physical units, stable referents, and definable standards. To circumvent this, researchers introduce operational definitions and proxy instruments, particularly Likert-type agreement scales. These instruments assign numbers to verbal judgments but do not—and cannot—establish that these numbers correspond to real quantities. The numbers are arbitrary labels, not measurements.

Despite this, the numbers are processed mathematically, as though they were genuine quantities. Responses are summed, averaged, and subjected to statistical analysis, often using techniques that assume equal intervals and stable distributions. But the data do not meet these requirements. They are derived from language, interpreted subjectively, and fluctuate from moment to moment and person to person. The use of mathematical models, especially the assumption of bell curve distributions, is not grounded in the data themselves but in institutional habits and unexamined tradition.

The essay emphasizes that this problem is not marginal or technical. It strikes at the core of the epistemic foundations of large swaths of soft science research. What is labeled as “data” often consists of vague, unstable, culturally variable responses to linguistic prompts. When such data are treated as if they were numerical facts about the world, the result is not knowledge, but a ritualistic simulation of science. The procedures are followed, the models are applied, and the conclusions are drawn—but the connection between the numbers and the human realities they claim to represent is weak, often nonexistent.

The essay further observes that the persistence of these practices is not due to deception or conspiracy, but to enculturation. Researchers are trained into a methodological tradition that rewards conformity and discourages foundational questioning. The system reproduces itself through education, publication, and peer review. Only a few dissenting voices raise the alarm that the system rests on incoherent assumptions—and those voices are rarely heeded.

In conclusion, the essay argues that much of what passes for measurement in psychology and the social sciences is better understood as a linguistic and cultural artifact, not a scientific achievement. The numbers produced do not measure internal states; they record how people respond to language under constrained conditions. Treating such data as if they were the output of a physical measuring instrument is not just mistaken; it is a category error that undermines the legitimacy of the conclusions drawn. Until this is acknowledged and addressed, the field will continue to generate outputs that are mathematically sophisticated but conceptually hollow.

Readings List

Each author below offers a deep, foundational critique of measurement, methodology, or conceptual structure in psychology and the social sciences—rejecting the prevailing paradigm, rather than attempting to repair it cosmetically.

Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88(3), 355–383. https://doi.org/10.1111/j.2044-8295.1997.tb02641.x

Argues that psychology has never met the conditions for genuine measurement, and that its numerical methods are epistemologically unfounded.

Gergen, K. J. (1985). The social constructionist movement in modern psychology. American Psychologist, 40(3), 266–275. https://doi.org/10.1037/0003-066X.40.3.266

Rejects the idea of an objective, measurable inner life. Emphasizes that psychological concepts are linguistically and culturally constructed.

Greenfield, T. B. (1986). The decline and fall of science in educational administration. Interchange, 17(2), 57–80. https://doi.org/10.1007/BF01807035

A scathing critique of the misuse of scientific language and method in education and social science, calling out the fetishization of measurement.

Smith, J. K., & Smith, L. M. (1991). The case for judgment-based evaluation. SAGE Publications.

Critiques the reliance on standardized metrics in human evaluation. Argues for the irreducibility of subjective judgment.

Coulter, J. (1979). The social construction of mind: Studies in ethnomethodology and linguistic philosophy. Macmillan.

Demonstrates that our concepts of the mind are entirely language-bound and shaped by social context, not discovered by measurement.

Porter, T. M. (1995). Trust in numbers: The pursuit of objectivity in science and public life. Princeton University Press.

Explores how numerical reasoning became a surrogate for judgment, even in contexts where it does not fit. Especially relevant to psychology and policy.

Lowe, R. (1998). Psychology as pseudo-science: Avoiding the grand delusion. The Psychologist, 11(12), 600–601.

Directly challenges the idea that psychology is a science, pointing to its conceptual vagueness and methodological errors.

Midgley, M. (1992). Science as salvation: A modern myth and its meaning. Routledge.

Challenges the idea that science can explain or solve everything. Particularly critical of reductive models applied to human life.

Cromby, J. (2012). Feeling the way: Qualitative clinical research and the affective turn. Qualitative Research in Psychology, 9(1), 88–98. https://doi.org/10.1080/14780887.2012.630831

Critiques the quantification of subjective experience, arguing that language, affect, and embodiment are not reducible to numerical models.

Shotter, J. (1993). Cultural politics of everyday life: Social constructionism, rhetoric and knowing of the third kind. Open University Press.

Emphasizes that knowledge, including psychological knowledge, is constructed through language, power, and cultural practices. Offers a radical alternative to measurement-based epistemology.

Feyerabend, P. (1975). Against method: Outline of an anarchistic theory of knowledge. Verso.

Attacks the very idea of a single, rational scientific method. Suggests that what counts as knowledge is always local, interpretive, and pluralistic.

Ryle, G. (1949). The concept of mind. Hutchinson.

Dismantles the idea of the mind as a separate entity or system that can be observed or measured. Argues that most mental concepts are category mistakes.

Illich, I. (1976). Limits to medicine: Medical nemesis, the expropriation of health. Marion Boyars.

While focused on medicine, this book critiques the conversion of human conditions into measurable pathologies. The critique extends to all "soft" systems of knowledge that claim measurement without foundation.

Polanyi, M. (1966). The tacit dimension. University of Chicago Press.

Shows that all knowledge is rooted in tacit, unformalizable human judgment. Undermines the idea that psychological or social knowledge can be fully captured through formal models.

Woolgar, S. (1988). Science: The very idea. Tavistock.

Explores how science constructs its own objects of study. Applies especially to how psychology invents and reifies mental states as if they were empirical entities.

