Date: 2025-08-12

Author’s Preface

The concept of general intelligence—symbolized as g—was first proposed by Charles Spearman in the early 20th century to explain why people who score well on one type of cognitive test tend to score well on others. He interpreted this statistical regularity as evidence of a single underlying cognitive resource that contributes to performance across many domains. The idea has since become central to psychometric theory and the design of intelligence tests, including IQ tests, and remains widely cited in psychology, education, and popular science.

This essay examines g both as a statistical abstraction and as a purported mental faculty. It argues that g cannot be measured in any legitimate sense, for two fundamental reasons:

1. Measurement category error: IQ and similar cognitive tests produce ordinal or categorical scores that are treated as if they were interval-scale measurements, enabling statistical procedures that are conceptually invalid for such data.

2. Population–individual misapplication: The widely cited “percent variance explained” figures for g are aggregate properties of datasets and models. Applying them to describe the structure of an individual’s mind is a textbook ecological fallacy.

To make these points concrete, the essay includes a worked example showing how ordinal test scores are turned into a “percent variance explained” statistic and then misused to make illegitimate individual-level claims.

Introduction

In psychometrics, g is extracted as the first principal factor from the correlation matrix of multiple cognitive tests. Its numerical value is not observed directly; it is inferred from statistical relationships in test data. IQ tests are designed to provide a standardized measure of this g factor, but the tests themselves are bound by format constraints: they must be scored consistently, administered under uniform conditions, and made resistant to obvious cultural loading.

The measurement of g is thus an indirect, model-dependent process, not a direct reading of a mental property. The validity of any claim about g depends on the validity of the test scores, the appropriateness of the statistical methods used, and the soundness of the interpretation. This essay argues that the first and last of these conditions fail decisively, and that the failure of the first undermines the second as well: methods that presuppose interval-scale data are applied to scores that are at best ordinal.

1. The Statistical Origins of g

Spearman’s original insight came from observing the “positive manifold”—the pattern that people who perform well on one kind of mental test often perform well on others. By applying early factor analysis, he extracted a “general factor” that accounted for a large portion of the shared variance among test scores. This became known as g.

Over the decades, psychometricians refined the models, often placing g at the top of a hierarchy above specific abilities such as verbal comprehension or spatial visualization. Yet in every case, g is a statistical artifact: a factor defined by the model and dataset, not an independently verified structure in the brain or mind.

2. The Measurement Category Error: Ordinal Data Treated as Interval

The first major flaw in g measurement lies in how IQ test results are generated and handled.

2.1 Nature of the data

IQ tests are scored by counting correct answers, weighting items by difficulty, or converting raw scores into scaled scores. Each subtest produces a score that is at best ordinal: it indicates the rank order of performance, but not that the “distance” between scores is consistent or comparable across the scale.

An ordinal score tells us that Person A outperformed Person B, but not by how much in any physical sense. The gap between a score of 8 and 9 on a vocabulary subtest does not necessarily represent the same ability difference as the gap between 4 and 5.

2.2 Illegitimate statistical operations

Pearson correlation, linear regression, and conventional factor analysis are interpreted on the assumption that data are at least on an interval scale, where equal differences in scores represent equal differences in the underlying quantity. Applying these methods to ordinal scores creates outputs that are arithmetically correct for the assigned numbers but have no clear meaning as measurement. Rescoring the same test with a different order-preserving scheme would leave every ranking intact yet change the results.

This is a form of spurious quantification—treating convenience numbers as if they were measurements of a physical magnitude, like weight or length, when they are not.
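The point can be checked numerically. The sketch below uses two short, made-up score lists (hypothetical values, not data from any real test) and rescores one of them with a cubic, which preserves every rank order. The helper names `pearson`, `average_ranks`, and `spearman` are illustrative, and only the Python standard library is assumed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five people on two tests.
test_a = [3, 5, 6, 7, 8]
test_b = [4, 3, 6, 5, 9]

# An order-preserving rescoring of test A: same ranking, different gaps.
rescored_a = [score ** 3 for score in test_a]

print("Pearson, original scores: ", round(pearson(test_a, test_b), 3))
print("Pearson, rescored test A: ", round(pearson(rescored_a, test_b), 3))
print("Spearman, original scores:", round(spearman(test_a, test_b), 3))
print("Spearman, rescored test A:", round(spearman(rescored_a, test_b), 3))
```

With these made-up numbers the Pearson coefficient moves from roughly 0.76 to roughly 0.86 after rescoring, while the rank correlation stays at 0.8, because only the rank correlation depends on order alone.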

3. The Population–Individual Fallacy in “Percent Variance Explained”

3.1 What the statistic actually means

Psychometricians often report that g “explains” a certain percentage of the variance in intelligence test scores—for example, 40% or 50%. In statistical terms, this means:

> In this dataset, differences in g scores account for that proportion of the variability in observed test scores across the sample.

This is a property of the group’s score distribution under a specific model, not of any one person’s mind.
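Written out, under the common convention that each of the $p$ tests is standardized to unit variance and $\lambda_j$ denotes the loading of test $j$ on the first factor (symbols introduced here for illustration), the reported proportion is:

```latex
\[
  \text{proportion of variance attributed to } g
  \;=\; \frac{1}{p} \sum_{j=1}^{p} \lambda_j^{2}
\]
```

Every term is a sample-level quantity. The loadings are estimated from correlations computed across people, so the ratio has no counterpart inside a single person.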

3.2 Why it cannot apply to individuals

The leap from population statistic to individual description is illegitimate for several reasons:

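1. Population-level nature of the statistic: Percent variance explained, like R², is calculated from variation across individuals. A single person contributes only one score per test and therefore no between-person variance, so the figure has no meaning when applied to one person in isolation.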

2. No causal decomposition: Even at the population level, the figure describes a statistical association, not a causal share of ability. g is not a substance occupying a percentage of a person’s mental capacity.

3. Ecological fallacy: Inferring individual characteristics from aggregate data is a known statistical error.

4. Model dependence: The percentage changes if the set of tests, the sample, or the extraction method changes, undermining any claim of universality.

4. How These Two Errors Combine: A Worked Example

Step 1 — Raw scores (ordinal)

Imagine a short test with three subtests: Vocabulary, Pattern Matching, and Memory Span, each scored from 0 to 10.

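| Person | Vocabulary | Pattern Matching | Memory Span |
|--------|------------|------------------|-------------|
| A      | 8          | 9                | 7           |
| B      | 5          | 4                | 6           |
| C      | 7          | 8                | 5           |
| D      | 3          | 3                | 4           |
| E      | 6          | 5                | 6           |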

These are ordinal: higher scores mean better performance, but the gaps are not guaranteed to be equal in meaning.

Step 2 — Correlation matrix (treating ordinal as interval)

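Psychometric analysis treats these as interval data and computes Pearson correlations. For the five rows above, rounded to two decimals:

|                  | Vocabulary | Pattern Matching | Memory Span |
|------------------|------------|------------------|-------------|
| Vocabulary       | 1.00       | 0.94             | 0.75        |
| Pattern Matching | 0.94       | 1.00             | 0.56        |
| Memory Span      | 0.75       | 0.56             | 1.00        |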

This assumes that a one-point change anywhere along the scale is equivalent in ability terms—an assumption without basis.

Step 3 — Factor extraction

Extracting the first principal component from the correlation matrix of these scores, used here as a simple stand-in for g, gives loadings of about 0.99 for Vocabulary, 0.92 for Pattern Matching, and 0.83 for Memory Span. This component accounts for about 84% of the total variance in the dataset.

Step 4 — The illegitimate claim

From here, a common but incorrect interpretation is:

> “84% of an individual’s intelligence comes from g.”

This makes two errors at once:

Measurement category error: The original scores are ordinal, so the loadings and the percentage have no valid metric meaning.

Ecological fallacy: The 84% figure describes a group-level statistic, not a decomposition of one person’s mind.

Step 5 — The correct interpretation

The correct, modest statement is:

> “In this sample, for this particular set of tests, about 84% of the variability in scores can be summarized by a single statistical factor.”

It says nothing about individual composition, causation, or the biology of intelligence.
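Steps 2 and 3 can be reproduced from the Step 1 table with a few lines of code. The sketch below is one possible implementation under stated assumptions: it computes Pearson correlations, then uses power iteration to obtain the first principal component of the correlation matrix as a simple stand-in for a first factor. A principal-axis or maximum-likelihood factor analysis would give somewhat different loadings. The function names are illustrative, and only the Python standard library is used.

```python
from math import sqrt

# Raw scores from the Step 1 table (persons A to E, in order).
scores = {
    "Vocabulary":       [8, 5, 7, 3, 6],
    "Pattern Matching": [9, 4, 8, 3, 5],
    "Memory Span":      [7, 6, 5, 4, 6],
}


def pearson(x, y):
    """Pearson correlation; treats the scores as interval data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Square matrix of pairwise Pearson correlations."""
    return [[pearson(a, b) for b in columns] for a in columns]


def first_component(matrix, iterations=200):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    vector = [1.0] * len(matrix)
    for _ in range(iterations):
        product = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
        norm = sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    eigenvalue = sum(
        vector[i] * sum(m * v for m, v in zip(matrix[i], vector))
        for i in range(len(vector))
    )
    return eigenvalue, vector


names = list(scores)
corr = correlation_matrix([scores[name] for name in names])
eigenvalue, vector = first_component(corr)

# Loadings are the eigenvector scaled by the square root of the eigenvalue.
loadings = [sqrt(eigenvalue) * v for v in vector]
# Each standardized test has variance 1, so total variance equals the test count.
share = eigenvalue / len(names)

for name, row in zip(names, corr):
    print(f"{name:17s}", [round(r, 2) for r in row])
print("loadings:", dict(zip(names, (round(l, 2) for l in loadings))))
print(f"share of total variance: {share:.0%}")
```

Running it prints the pairwise correlations, the loadings, and the share of total variance carried by the first component. Every one of these numbers is computed across the five people in the table; none of them is defined for person A or person D alone.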

5. Model Dependence and Instability

Both the measurement problem and the misinterpretation of variance explained are amplified by model dependence. Change the test battery, the scoring scheme, or the factor extraction method, and the resulting g loadings and variance percentages change. This instability is inconsistent with the idea of g as a fixed, measurable property of individuals.
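A small check of this instability, using the scores from the Step 1 table. For a battery of just two standardized tests with correlation r, the first principal component carries (1 + |r|) / 2 of the total variance, so the share can be computed without an eigenvalue routine. The choice of two-test batteries and the names below are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

# Raw scores from the Step 1 table (persons A to E, in order).
scores = {
    "Vocabulary":       [8, 5, 7, 3, 6],
    "Pattern Matching": [9, 4, 8, 3, 5],
    "Memory Span":      [7, 6, 5, 4, 6],
}


def pearson(x, y):
    """Pearson correlation; treats the scores as interval data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# The 2x2 correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and
# 1 - |r|, so the first component's share of total variance is (1 + |r|) / 2.
shares = {}
for first, second in combinations(scores, 2):
    r = pearson(scores[first], scores[second])
    shares[(first, second)] = (1 + abs(r)) / 2

for battery, share in shares.items():
    print(f"{' + '.join(battery):32s} {share:.0%}")
```

With these five rows, the share ranges from about 78% to about 97% depending on which pair of subtests is treated as the battery, although the people and their answers are the same in every case.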

6. Broader Critiques and Alternatives

Beyond these two core methodological flaws, g-based IQ testing is narrow in scope, culturally influenced, and omits major dimensions of human cognitive ability such as creativity, social understanding, and practical reasoning. Alternative models—such as multiple-intelligence theories and network models—avoid the fiction of reducing human intellect to a single number, though they bring their own empirical and definitional challenges.

Summary

General intelligence began as a statistical summary of correlated cognitive test scores. Over time, it became treated as a measurable trait, often linked to IQ scores. Yet the process of estimating g from IQ tests rests on a double error:

1. Spurious quantification—treating ordinal or categorical data as if they were interval measures.

2. Ecological fallacy—applying a population-level “percent variance explained” statistic to individuals.

The worked example here shows how these two errors combine to produce an attractive but illusory precision: a number that looks authoritative yet rests on invalid assumptions. The conclusion is stark: g may be a useful shorthand in statistical modelling, but it is not a legitimate individual measure of intelligence in any robust scientific sense.

Afterword

Group-level “percent variance explained” is a property of a statistical model describing how scores vary across a population; it is not a decomposition of an individual’s mind or abilities. Applying it to an individual is the ecological fallacy—inferring personal proportions from aggregate data.

Heritability of IQ differs from heritability of traits like height because:

IQ scores are constructed from ordinal, culturally bound tasks, not direct physical measurements.

Environmental and cultural influences on test performance are large, varied, and interact with genetics in complex, non-additive ways.

The g factor extracted from IQ tests is a narrow statistical abstraction, not total competence. It omits many skills and abilities that matter in real-world functioning.

Thus, heritability estimates for IQ are methodologically fragile, culturally contingent, and conceptually narrower than for traits grounded in direct, physical measurement.

The concepts of IQ and g (general intelligence) introduce distortions into how society thinks about intelligence, ability, and human worth.

The Conceptual Roots of the Problem

The trouble begins with the early 20th-century marriage of statistical convenience and psychological theorizing. Spearman’s g was not a discovery of a new mental organ; it was a statistical device, a single factor summarizing correlations among cognitive test scores. IQ test scores came to be used as numerical estimates of g. Over time, the statistical abstraction was reified: g became “general intelligence” in everyday speech, and IQ became its supposed numeric expression.

This reification—the transformation of a statistical index into a presumed inner essence—anchors the most pernicious misunderstandings. A statistical summary was turned into a thing, and that “thing” was treated as if it were a stable, measurable property of individual minds.

Known Problems with IQ and g

1. Narrowness of Domain

IQ tests measure a limited subset of cognitive performance—pattern recognition, abstract reasoning, certain forms of memory, and vocabulary. They omit:

Creativity and divergent thinking

Social and emotional understanding

Practical, real-world problem solving

Tacit, experience-based knowledge

By conflating IQ with “intelligence,” public discourse promotes the idea that these omitted capacities are secondary or irrelevant. This narrows our collective notion of human ability.

2. Measurement Illusion

IQ scores are derived from ordinal or categorical data treated as if they were interval measurements. This produces an illusion of precision: a two- or three-digit score that looks like a thermometer reading but is nothing of the kind. It invites false comparisons such as “This person is 10 IQ points smarter than that one,” as if intelligence could be measured like height.

3. Ecological Fallacy and Heritability Claims

Heritability estimates for IQ—often reported as “X% of intelligence is genetic”—are population-level statistics, not individual diagnoses. Applying them to individuals is logically invalid. Worse, the public interprets these as fixed proportions within each person, reinforcing deterministic and essentialist thinking about human potential.
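The standard quantitative-genetics definition makes the population-level character explicit. In conventional notation (the symbols are the usual textbook ones, not drawn from this essay), broad-sense heritability is a ratio of variances within a given population and range of environments:

```latex
\[
  H^{2} \;=\; \frac{\mathrm{Var}(G)}{\mathrm{Var}(P)}
\]
```

Here $\mathrm{Var}(P)$ is the variance of the observed trait across individuals and $\mathrm{Var}(G)$ is the part of that variance associated with genetic differences among them. Both terms are variances across people, so the ratio is undefined for a single person and can change when the range of environments changes.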

4. Cultural and Contextual Bias

Test design inevitably reflects cultural assumptions about knowledge, logic, and problem-solving. While psychometricians attempt to minimize bias, they cannot remove it entirely. The resulting scores partially reflect cultural familiarity rather than pure cognitive capacity, but the cultural component is obscured when scores are presented as universal measures of intelligence.

5. Reification and Social Policy Distortion

Because IQ and g are presented as real, measurable properties, they are often used to:

Gatekeep educational and occupational opportunities

Justify inequality as “natural” or “merit-based”

Shape immigration and criminal justice policy

In each case, the authority of a number conceals the fragility and partiality of what that number actually represents. This is not just a technical error; it shapes policy in ways that amplify existing structural inequalities.

6. Equating IQ with Total Competence

Even when experts acknowledge that g is not total competence, the shorthand use of IQ as “smarts” in everyday language blurs the distinction. This fosters a belief that someone with a high IQ is more capable in all areas, and someone with a low IQ less capable in all areas, even though the correlations behind g are far from perfect and real-world performance depends heavily on domain and context.

How This Warps Collective Understanding

The combined effect of these problems is a deep public misunderstanding of intelligence.

Reductionism: Human intellect is reduced to a single number, erasing complexity and diversity of abilities.

Fatalism: Heritability statistics are misread as destiny, dampening belief in the value of education, training, or social change.

Legitimization of hierarchy: IQ scores lend a veneer of scientific inevitability to social stratification, framing inequality as a reflection of natural differences rather than structural conditions.

Policy distortion: When narrow, flawed measures are treated as the gold standard of intelligence, institutions reward those measures, creating a feedback loop where what is measured becomes what matters.

The Deeper Issue

The g and IQ framework carries a conceptual Trojan horse: the idea that intelligence is a single, stable, measurable commodity like gold in a vault. This notion is unsupported by measurement theory, contradicted by evidence on the diversity of abilities, and corrosive in its social effects. By embedding this fiction in public discourse, the IQ paradigm channels collective thinking toward deterministic, hierarchical, and impoverished views of human capacity.

Readings

Michell, J. (1999). Measurement in Psychology: A Critical History of a Methodological Concept. Cambridge University Press.

A detailed history of how psychology misapplied the concept of measurement from the physical sciences to subjective, non-quantifiable attributes.

Michell, J. (2008). Is psychometrics pathological science? Measurement: Interdisciplinary Research and Perspectives, 6(1–2), 7–24.

A direct argument that psychometrics rests on unfounded assumptions about the measurability of psychological constructs, including intelligence.

Gould, S. J. (1996). The Mismeasure of Man (Revised ed.). W. W. Norton & Company.

A critical history of intelligence testing, focusing on reification and the misuse of statistical constructs such as g.

Deary, I. J. (2012). Intelligence. Annual Review of Psychology, 63, 453–482. https://doi.org/10.1146/annurev-psych-120710-100353

A balanced overview of intelligence research, discussing both the empirical findings and the methodological controversies.