Reason: Misapplication of Ordinal Scales
I'm taking a closer look at ordinal scaling, whose use across many contexts seems both endemic and frequently misapplied. I believe I can find scholars to support this view, with appropriate cherry-picking.
Introduction
Measurement is a fundamental requirement in science, engineering, and practical decision-making. Yet, not all forms of measurement are equal, nor are they equally meaningful. Among the systems of classification proposed in the last century, the ordinal scale remains one of the most misunderstood and systematically abused. Though S. S. Stevens proposed a neat taxonomy of four types of scales — nominal, ordinal, interval, and ratio — the distinctions he outlined have not been fully appreciated in practice. Worse, his schema conceals internal problems that become evident under scrutiny.
Stevens’ system has been normalized in psychology, sociology, business research, and even everyday consumer interactions. However, the ordinal scale sits in a precarious position: numbers are assigned to categories to show order, yet these numbers are not quantities. Researchers and practitioners, seduced by the apparent familiarity of numbers, treat them as if they were quantities, applying arithmetic and statistical tools inappropriately. This essay examines these errors systematically, showing how ordinal scales, though ubiquitous, are deeply problematic — not only in application but even in their underlying conceptual foundations.
The Taxonomy of Scales: Where Ordinal Fits and Where It Breaks
Stevens’ classification of scales is often treated as a settled matter:
Nominal: Labels, categories with no order (e.g., species, blood types).
Ordinal: Ordered categories, but undefined intervals (e.g., rankings).
Interval: Ordered with defined intervals but no true zero (e.g., Celsius temperature).
Ratio: Ordered, defined intervals, and an absolute zero (e.g., mass, length, Kelvin).
For nominal and ratio scales, the distinctions are reasonably clear. Interval scales, too, have internal consistency, as intervals are meaningfully equivalent even without a true zero. However, ordinal scales occupy a twilight zone. Order is acknowledged, but the magnitude of difference between items is undefined. This creates ambiguity: the scale asserts direction (more or less), yet offers no valid way to compute differences, sums, or means. Worse, many ordinal scales implicitly suggest magnitude, even though they deny explicit quantification.
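The four scale types listed above can be summarized by the properties each one claims. The following is a minimal sketch that encodes only that list; the names `Scale` and `meaningful_operations` are illustrative assumptions, not an established API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scale:
    """One of the four scale types, described by the properties it claims."""
    name: str
    ordered: bool          # categories have a direction (more / less)
    equal_intervals: bool  # the gap between adjacent values is defined
    true_zero: bool        # zero means absence of the attribute


NOMINAL = Scale("nominal", ordered=False, equal_intervals=False, true_zero=False)
ORDINAL = Scale("ordinal", ordered=True, equal_intervals=False, true_zero=False)
INTERVAL = Scale("interval", ordered=True, equal_intervals=True, true_zero=False)
RATIO = Scale("ratio", ordered=True, equal_intervals=True, true_zero=True)


def meaningful_operations(scale: Scale) -> list[str]:
    """List the comparisons that the scale's claimed properties can support."""
    ops = ["equality"]                       # same / different category
    if scale.ordered:
        ops.append("rank comparison")        # greater / less
    if scale.equal_intervals:
        ops.append("differences and means")  # subtraction is defined
    if scale.true_zero:
        ops.append("ratios")                 # "twice as much" is defined
    return ops


for s in (NOMINAL, ORDINAL, INTERVAL, RATIO):
    print(f"{s.name:8s} -> {', '.join(meaningful_operations(s))}")
```

In this sketch the ordinal scale stops at rank comparison: nothing in its definition licenses subtraction, which is the gap the rest of this essay is concerned with.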
This gray area is rarely confronted directly. Stevens’ scheme papers over the issue by treating ordinal scales as a distinct category without admitting their internal heterogeneity. In truth, ordinal scales are not uniform. Some types assert magnitude implicitly; others do so only metaphorically. Some are purely arbitrary. The scheme lacks the granularity required to handle these distinctions.
Proposed Subtypes of Ordinal Scales: More Complex Than Advertised
There is not a single, unified form of ordinal data. Rather, ordinal scales seem to span several subtypes, each with distinct characteristics regarding magnitude and stability:
Ordinal by Fiat (Military Ranks):
Created by institutional assignment. Military ranks (private, corporal, sergeant, etc.) establish an order by authority, not measurement. They are nominal categories forced into order by decision, not discovered by measurement. Magnitude exists only metaphorically: "higher" rank implies superiority in command, not measurable quantity.
Ordinal by Judgment (Service Ratings, Academic Grades):
Subjective judgments expressed through order. For example, "poor," "fair," "good," "very good," "excellent." These imply magnitude, yet intervals are undefined. Whether the gap between "poor" and "fair" equals the gap between "good" and "very good" is unknowable. The assignment of numbers (1 to 5) disguises the instability of judgment behind a veil of numerical regularity. See: Appendix A - Hotel Guest Satisfaction Survey
Ordinal by Felt State (Pain Scales):
Internal, subjective experiences reported numerically (e.g., pain ratings from 1 to 10). Here, magnitude is more than metaphorical; it is felt as real. One feels "less" or "more" pain, yet the scale lacks precision. The intervals between ratings are volatile, varying by moment, memory, or interpretation. Pain is notoriously unstable and poorly understood, rendering such scales epistemically fragile. See: Appendix B - Pain Measurement and the Instability of Ordinal Scales
Ordinal by Subjective Probability ("Almost certain," "Likely," "Possible"):
Degrees of belief or forecasted likelihood. While mathematical probability is a formal system, subjective probability is a felt estimate. Here, numbers assigned to beliefs (e.g., 10% chance, 70% chance) are guesswork. The Bayesian claim that these numbers reflect degrees of belief is problematic, as human estimates are unstable and swayed by framing, emotion, or context. The assertion of magnitude is implicit but unreliable.
All of these subtypes rely on ordinal ordering, yet they differ significantly in their conceptual structure. Some carry an implicit sense of magnitude (pain, subjective probability), while others carry only metaphorical or formal sequence (military ranks). This variation within ordinal scales demonstrates that Stevens’ schema is too coarse to handle the complexity of real-world application.
The Illusion of Arithmetic and the Category Error of Aggregation
The core error arises when ordinal symbols — mere rank indicators — are mistaken for numerical quantities. This mistake occurs because ordinal scales employ familiar numerals ("1," "2," "3," etc.), leading to a false sense of quantitative rigor. The visual familiarity of numbers tempts users into applying arithmetic operations as if the data were interval or ratio scale.
Statistical techniques exacerbate this illusion. While rank-based tests such as Spearman’s rho are legitimate for ordinal data, researchers frequently apply parametric methods, such as Pearson’s correlation or mean computations, assuming that intervals are equal. This is a clear category error.
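The contrast between rank-based and parametric methods can be sketched in a few lines. The ratings and the alternative coding below are made up for illustration, and the helper names (`pearson`, `average_ranks`, `spearman`) are local to this sketch. It relabels the same five ordered categories with a different order-preserving set of numerals and recomputes both coefficients.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation; uses the numeric values themselves."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks only."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired ratings on a 1-5 ordinal scale (illustrative data).
service = [1, 2, 2, 3, 4, 4, 5, 5]
overall = [2, 1, 3, 3, 4, 5, 4, 5]

# An equally valid order-preserving relabeling of the same five categories.
recode = {1: 1, 2: 2, 3: 3, 4: 10, 5: 50}
service_alt = [recode[v] for v in service]
overall_alt = [recode[v] for v in overall]

print("Pearson, codes 1-5:      ", round(pearson(service, overall), 3))
print("Pearson, relabeled codes:", round(pearson(service_alt, overall_alt), 3))
print("Spearman, codes 1-5:     ", round(spearman(service, overall), 3))
print("Spearman, relabeled:     ", round(spearman(service_alt, overall_alt), 3))
```

With these made-up ratings, Pearson's coefficient falls from roughly 0.84 to roughly 0.49 under the relabeling, while Spearman's rho is unchanged because it depends only on the order of the responses.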
Even more severe is the appeal to the law of large numbers, which presupposes arithmetic quantities. Ordinal data consists of numerals, not numbers in the mathematical sense. Aggregating ordinal data does not resolve instability or imprecision; it amplifies the original conceptual error. There is no "law of large numerals." The belief that statistical aggregation rescues ordinal data from its non-numerical foundation is a fundamental misunderstanding.
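One reading of this point can be illustrated with a small simulation. The two response distributions and the two codings below are assumptions chosen for the example, not data from any survey. Larger samples make each mean more stable, but the value it settles on, and even which group appears "higher", still depends on the arbitrary choice of numerals.

```python
import random

CATEGORIES = [1, 2, 3, 4, 5]  # rank labels only: 1 = lowest, 5 = highest

# Hypothetical response distributions for two groups (assumed, not real data).
WEIGHTS_A = [0.25, 0.05, 0.05, 0.05, 0.60]  # polarized
WEIGHTS_B = [0.00, 0.05, 0.15, 0.60, 0.20]  # clustered near the top

# Two codings that both preserve the order of the five categories.
EQUAL_SPACING = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
STRETCHED_TOP = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}


def mean_score(responses, coding):
    """Arithmetic mean of the numerals a coding assigns to each response."""
    return sum(coding[r] for r in responses) / len(responses)


def mean_gap(n, coding, seed=0):
    """Mean of group A minus mean of group B for samples of size n."""
    rng = random.Random(seed)  # fixed seed: both codings see the same draws
    a = rng.choices(CATEGORIES, weights=WEIGHTS_A, k=n)
    b = rng.choices(CATEGORIES, weights=WEIGHTS_B, k=n)
    return mean_score(a, coding) - mean_score(b, coding)


for n in (1_000, 100_000):
    print(n,
          round(mean_gap(n, EQUAL_SPACING), 2),
          round(mean_gap(n, STRETCHED_TOP), 2))
```

Under the assumed distributions, group B has the higher mean with equal spacing and group A has the higher mean with the stretched coding, however large the sample. Sample size reduces sampling noise; it does not supply the interval information that the scale never had.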
Instability and Whimsy: The Epistemic Fragility of Ordinal Measures
Ordinal ratings are highly unstable, particularly when they rest on internal states or subjective judgments. Responses to ordinal scales vary over time due to:
Fluctuations in mood or health.
Social desirability bias.
Poor understanding of scale items.
Changes in context or memory distortion.
Pain ratings exemplify this instability. A patient might rate pain as "2" in the morning and "7" in the afternoon, with no objective means of calibration. Service satisfaction scores suffer similar problems, especially when customers lack clear criteria for evaluation.
Further instability arises from the demand characteristics of surveys. Respondents often adjust answers based on perceived expectations or to avoid offending others. In small business contexts, for example, individuals may inflate ratings to avoid harming reputations, regardless of actual sentiment. Likert scales are a common instance of this pattern, and they are problematic both in concept and in use.
Such data inhabits an epistemic gray zone: part feeling, part metaphor, part guesswork. It is uncertain whether such ratings possess predictive validity or merely reflect transient impressions. Assertions of predictive power for aggregated ordinal data remain unproven and arguably unfalsifiable.
Convenience, Software, and Institutional Inertia
The normalization of ordinal misuse is perpetuated by convenience. Survey tools and statistical packages expect numeric input. Users, prioritizing ease over conceptual rigor, comply with these expectations by treating ordinal categories as numbers. Convenience, however, is not justification. It explains practice but does not defend it.
Institutional inertia compounds the problem. Fields such as psychology and social sciences have long employed ordinal scales as though they supported arithmetic operations. These habits have calcified into orthodoxy, rarely questioned by practitioners trained to follow established protocols.
The presence of software does not convert ordinal data into interval data. Convenience merely cloaks category errors beneath layers of automation.
Summary
The misuse of ordinal scales reveals a deep conceptual malaise in research practice. Ordinal scales, while useful for expressing order, do not support arithmetic treatment. The presence of numerals does not imply numerical validity. Attempts to apply arithmetic or rely on aggregation are category errors of the first order.
Moreover, Stevens' taxonomy, though pioneering, lacks sufficient granularity to account for the varied nature of ordinal scales. Some ordinal measures imply magnitude (pain, subjective probability), while others carry only metaphorical or institutional meaning (military ranks). The internal instability of these measures, combined with external pressures of convenience and software design, has led to the endemic misuse of ordinal data across disciplines.
Correction requires a clear-eyed acknowledgment of these conceptual flaws and a return to rigor in the treatment of measurement scales. Without such reform, data practices will continue to propagate meaningless numbers dressed in the garb of quantification — a masquerade of measurement.
Readings
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103(2684), 677–680. https://doi.org/10.1126/science.103.2684.677
Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88(3), 355–383. https://doi.org/10.1111/j.2044-8295.1997.tb02641.x
Jamieson, S. (2004). Likert scales: how to (ab)use them. Medical Education, 38(12), 1217–1218. https://doi.org/10.1111/j.1365-2929.2004.02012.x
Norman, G. (2010). Likert scales, levels of measurement and the “laws” of statistics. Advances in Health Sciences Education, 15(5), 625–632. https://doi.org/10.1007/s10459-010-9222-y
Carifio, J., & Perla, R. J. (2007). Ten common misunderstandings, misconceptions, persistent myths and urban legends about Likert scales and Likert response formats and their antidotes. Journal of Social Sciences, 3(3), 106–116. https://doi.org/10.3844/jssp.2007.106.116
Appendix A - Hotel Guest Satisfaction Survey
A 10-item questionnaire using ordinal scaling, of the kind one would find on a hotel satisfaction survey.
Hotel Guest Satisfaction Survey
Please rate the following aspects of your recent stay at our hotel. For each item, select the response that best reflects your experience.
Scale:
1 — Very Dissatisfied
2 — Dissatisfied
3 — Neutral
4 — Satisfied
5 — Very Satisfied
Items:
Cleanliness of your room upon arrival
[ ] 1 — Very Dissatisfied
[ ] 2 — Dissatisfied
[ ] 3 — Neutral
[ ] 4 — Satisfied
[ ] 5 — Very Satisfied
Comfort of your bed and bedding
[ ] 1 — Very Dissatisfied
[ ] 2 — Dissatisfied
[ ] 3 — Neutral
[ ] 4 — Satisfied
[ ] 5 — Very Satisfied
Friendliness of the hotel staff
[ ] 1 — Very Dissatisfied
[ ] 2 — Dissatisfied
[ ] 3 — Neutral
[ ] 4 — Satisfied
[ ] 5 — Very Satisfied
Speed of check-in process
[ ] 1 — Very Dissatisfied
[ ] 2 — Dissatisfied
[ ] 3 — Neutral
[ ] 4 — Satisfied
[ ] 5 — Very Satisfied
Quality of food and beverage service
[ ] 1 — Very Dissatisfied
[ ] 2 — Dissatisfied
[ ] 3 — Neutral
[ ] 4 — Satisfied
[ ] 5 — Very Satisfied
Noise level in your room
[ ] 1 — Very Dissatisfied
[ ] 2 — Dissatisfied
[ ] 3 — Neutral
[ ] 4 — Satisfied
[ ] 5 — Very Satisfied
Overall cleanliness of hotel facilities
[ ] 1 — Very Dissatisfied
[ ] 2 — Dissatisfied
[ ] 3 — Neutral
[ ] 4 — Satisfied
[ ] 5 — Very Satisfied
Value for money of your stay
[ ] 1 — Very Dissatisfied
[ ] 2 — Dissatisfied
[ ] 3 — Neutral
[ ] 4 — Satisfied
[ ] 5 — Very Satisfied
Availability of hotel amenities (gym, pool, etc.)
[ ] 1 — Very Dissatisfied
[ ] 2 — Dissatisfied
[ ] 3 — Neutral
[ ] 4 — Satisfied
[ ] 5 — Very Satisfied
Overall satisfaction with your stay
[ ] 1 — Very Dissatisfied
[ ] 2 — Dissatisfied
[ ] 3 — Neutral
[ ] 4 — Satisfied
[ ] 5 — Very Satisfied
Notes:
This is typical industry formatting.
The scale looks "numerical" because it uses numbers, but as analyzed in the essay, this is an ordinal scale, not interval or ratio.
Practitioners often compute a mean of these scores, which is a mathematical error, as discussed.
Many systems include an optional comment box below the items.
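A summary that stays within what an ordinal scale supports could report the count in each category and the median category instead of a mean. The sketch below assumes a handful of made-up responses to the "Overall satisfaction" item; the function names are illustrative only.

```python
from collections import Counter

# Scale labels in their stated order, lowest to highest.
SCALE = ["Very Dissatisfied", "Dissatisfied", "Neutral",
         "Satisfied", "Very Satisfied"]

# Hypothetical responses to the "Overall satisfaction" item (made-up data).
responses = ["Satisfied", "Very Satisfied", "Neutral", "Satisfied",
             "Very Dissatisfied", "Satisfied", "Very Satisfied",
             "Dissatisfied", "Satisfied"]


def frequency_table(answers):
    """Count responses per category, listed in scale order."""
    counts = Counter(answers)
    return [(label, counts[label]) for label in SCALE]


def median_category(answers):
    """Middle response after sorting by rank (lower middle if n is even)."""
    ordered = sorted(answers, key=SCALE.index)
    return ordered[(len(ordered) - 1) // 2]


for label, count in frequency_table(responses):
    print(f"{label:18s} {count}")
print("Median category:", median_category(responses))
```

Both summaries use the labels only for counting and ordering, so they give the same answer whatever numerals the survey tool happens to attach to the five categories.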
Appendix B - Pain Measurement and the Instability of Ordinal Scales
1. Pain as an Example of an Elusive Construct
Pain is commonly assessed using ordinal scales:
Numeric Rating Scale (NRS): 0 to 10 scale (“no pain” to “worst pain imaginable”).
Visual Analog Scale (VAS): A line from “no pain” to “worst pain,” where patients mark their current level.
Verbal Descriptor Scale (VDS): Words such as mild, moderate, severe, excruciating.
These methods appear numerical, but they are fundamentally ordinal — they reflect subjective rank orderings, not quantified intervals of experience.
2. Instability of Pain Ratings
Pain is not stable:
Fluctuates from day to day, hour to hour, even minute to minute.
Changes depending on mood, stress, distraction, attention, medication, posture, and even weather (for some conditions).
Two reports given moments apart by the same patient can differ by several points.
Patients themselves often express difficulty:
"I don’t know what number to pick. The pain is always there, but it changes."
What is reported is often an impression, not a measurement.
3. Unknown Physiology, Unstable Measurement
Even after decades of research, pain science is incomplete:
Mechanisms of chronic pain are not fully understood.
Neuropathic pain, phantom limb pain, fibromyalgia, and other conditions challenge explanatory models.
The relationship between tissue damage and reported pain is inconsistent.
This means ordinal pain ratings are anchored to an unstable phenomenon, compounding the already unstable nature of ordinal scales.
4. Ordinal Scale + Unstable Construct = Epistemic Fog
When you combine:
An unstable subjective experience (pain), with
An unstable measurement method (ordinal scaling),
The result is epistemological opacity:
What is the reading truly telling us?
Does a "5" today correspond to the same pain as a "5" tomorrow?
What is the real difference between "4" and "6" on a 0–10 scale?
This becomes an exercise in symbolic reporting, not quantitative measurement.
5. Illusions of Aggregation and Analysis
Despite these challenges:
Pain ratings are averaged.
Changes of one point are treated as clinically meaningful.
Interventions are judged "effective" based on reductions in reported scores, often without real understanding of underlying variability.
This reinforces the earlier critique:
Numerals are not numbers, and unstable experiences reported through ordinal scales do not become stable quantities by aggregation.
Summary
Factor: Status
Nature of pain: Unstable, fluctuating subjective experience
Measurement method: Ordinal scale (not interval or ratio)
Scale stability: Low — responses vary within and between subjects
Physiological understanding: Incomplete; mechanisms of pain not fully explained
Aggregation utility: Weak — does not resolve instability or measurement error
Concluding Note
Pain measurement is a textbook example of the broader argument:
A deeply subjective, unstable phenomenon.
Measured by ordinal scales that present the illusion of numerical precision.
Interpreted and acted upon as if it were objective, reliable data.
Yet, it rests on foundations of conceptual fragility.
You are perfectly correct: people, scientists included, often build complex models and systems on assumptions that amount to a leap of faith, applying mathematical tools to situations they were not designed for. When the models break and start spewing obvious garbage, nobody understands why, because the faulty assumption was introduced hundreds of steps back (and you are lucky if there was only one).