Date: 2025-09-10

Author’s Preface

The Reason Series explores the foundations of reasoning—how human beings make sense of the world, how they succeed, and how they mislead themselves. Each essay examines a domain where language, thought, and knowledge intersect, often exposing confusions that arise when words outpace clarity. This installment addresses the rise of large language model artificial intelligence (AI) and its role in research and writing.

The challenges at issue are epistemological. Traditional scholarship has always been grounded in identifiable sources: books to be read carefully, lectures to be weighed, journals to be scrutinized, and conversations with teachers or peers whose authority can be assessed. This grounding allows one to trace claims back to their origins and to weigh competing arguments in context.

Large language models bypass these practices. They generate text that appears authoritative but is untraceable. The user receives eloquent language without accountable knowledge. The predicament is peculiar: the prose can sound like scholarship, yet it cannot be checked in the ways scholarship demands.

The difficulty is compounded by the character of the training data itself. The corpus of human writing is not a purified record of truth but a typical human production: it contains insight and rigor, but also error, contradiction, propaganda, distortion, and superficiality. Large language models inherit this mixture wholesale. By design, they default to what is statistically most common in the data. What is most common, however, is often what is most superficial, conventional, or mistaken. The result is a system that reproduces familiar assertions fluently, but without the ability to separate durable knowledge from fashionable error.

The purpose of this essay is therefore not to reject AI out of hand, nor to praise it uncritically, but to examine its strengths and weaknesses. When used with discipline, AI can accelerate the scaffolding of arguments, broaden horizons, and stimulate unexpected connections. When used without care, it can mislead, flatten subtlety, and erode the habits of scholarship. The responsibility remains with the human writer: to distinguish appearance from substance, rhetoric from knowledge, and fluency from truth.

Introduction

The availability of large language models such as ChatGPT, Claude, and Gemini has transformed how people produce text. With a short prompt, one can obtain a polished essay on Aristotle, a summary of quantum mechanics, or an annotated bibliography on climate change.

The promise is obvious: speed, breadth, and a kind of instant fluency. But this promise conceals fundamental problems:

Provenance is hidden. Users cannot know where a claim originated.

Fluency deceives. The most elegant prose may rest on falsehoods.

Contradictions proliferate. AI reproduces centuries of conflicting assertions without resolution.

Anthropomorphic metaphors mislead. We say the model “thinks” or “knows,” though it does neither.

The central question is how such tools can be used responsibly in scholarship without mistaking their outputs for knowledge.

Bypassing Traditional Scholarship

For centuries, scholarship has meant grappling with sources. Students and researchers listen to authorities, read books and articles, ask whether the author is credible, weigh one authority against another, and check references. This creates a chain of accountability: every claim can be traced back to a source.

A historian citing Edward Gibbon’s Decline and Fall of the Roman Empire can show exactly which passage supports an argument.

A scientist referencing Darwin’s On the Origin of Species points to a specific edition and page.

A philosopher debating Immanuel Kant’s Critique of Pure Reason engages with both the original text and its commentators.

AI short-circuits this process. It does not “read” Darwin or Kant. Instead, it draws on fragments of text that others have written about them. The product is a plausible simulacrum of scholarship but not the thing itself.

This is not unlike medieval glosses—commentaries upon commentaries—except here the chain has been cut. There is no original anchor the user can readily verify.

The Problem of Provenance

Provenance—the ability to track a claim back to its source—is the lifeblood of scholarship. Without it, knowledge becomes hearsay.

Large language models obscure provenance. They provide fluent statements without the chain of evidence.

Example 1: Misattributed Quotations

Einstein is frequently saddled with sayings he never uttered: “Insanity is doing the same thing over and over and expecting different results.” This line has circulated widely and is often reproduced by AI. In fact, its earliest known appearance is in a Narcotics Anonymous pamphlet from the 1980s, decades after Einstein’s death. Without provenance, the misattribution gains authority through repetition.

Example 2: Fabricated Sources

In 2023, early versions of ChatGPT routinely generated bibliographies filled with journal articles that never existed. Titles, authors, publishers, and even DOI numbers looked plausible but were invented. One academic noted receiving references to “Smith, J. (1997). Quantum Dynamics and Ontology,” a phantom publication.

Example 3: Historical Documents

Ask an AI about the U.S. Constitution, and it may confidently describe “amendments” that do not exist. Unless one checks the National Archives or a legal edition, the falsehood might pass unnoticed.

By contrast, traditional scholarship demands accountability. If one says that Nancy Cartwright critiques universal laws, one can cite How the Laws of Physics Lie (1983). The link between assertion and source is secure.

The Language Trap

One of the most subtle problems in speaking about large language models lies in the words we use to describe them. Technical terminology exists: engineers speak of “transformer architectures,” “gradient descent,” and “probabilistic token prediction.” These terms are precise, but they mean nothing to most people outside computer science. Of necessity, we resort to everyday metaphors drawn from human cognition. We say the AI “knows,” “believes,” “thinks,” or even “hallucinates.” There is no workable alternative.

At first glance, this seems harmless. After all, anthropomorphic metaphors make complex technology more approachable. But these words are misleading. A large language model does not know anything, does not believe, and does not think. It does not have intentions, goals, or awareness. What it does is calculate the statistical likelihood of what sequence of words should come next, based on patterns learned from vast training data. It produces fluent continuations, not knowledge.
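The idea of calculating what word is likely to come next can be made concrete with a deliberately crude sketch. The example below is an assumption-laden toy, not a description of how any real system is built: real models use neural networks over sub-word tokens, while this one merely counts which word follows which in a four-sentence corpus invented for the purpose. The function names are likewise hypothetical.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
CORPUS = (
    "aristotle wrote about tragedy . "
    "aristotle wrote about catharsis . "
    "aristotle wrote about tragedy and mimesis . "
    "tragedy produces catharsis ."
)

def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def next_word_probabilities(counts, word):
    """Turn the follow-counts for `word` into relative frequencies."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    return {w: n / total for w, n in followers.items()}

model = train_bigrams(CORPUS)
probs = next_word_probabilities(model, "about")
# "tragedy" follows "about" twice and "catharsis" once,
# so their relative frequencies are 2/3 and 1/3.
print(probs)
```

Nothing in the table of counts records what tragedy is or whether Aristotle was right about it. The sketch ranks continuations by frequency and nothing else, which is the point the paragraph above makes about fluent continuations.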

This matters because language shapes perception. If one says, “The AI thinks Aristotle was wrong about tragedy,” the phrasing suggests that the system has weighed Aristotle’s writings, considered counterarguments, and reached a conclusion. In reality, the model has done nothing of the sort. It has merely recognized that the words “Aristotle” and “tragedy” frequently occur alongside terms like “catharsis” or “mimesis,” and it strings them together in statistically probable order.

The trap becomes even clearer with other examples:

When an AI is described as “hallucinating,” the word evokes a human mind having sensory illusions. (“Confabulation” would be a better word, of course, but we are stuck with “hallucination.”) In fact, the system is simply generating outputs unconstrained by factual verification. It is not perceiving anything at all.

When people say an AI “understands language,” they project comprehension where there is only correlation. The model has no concept of what “cat” or “justice” or “hydrogen atom” actually refers to; it has only learned the patterns in which those words appear.

When headlines claim that “AI wrote a poem” or “AI argued for X,” the metaphor suggests creativity and intention. But what has occurred is recombination of forms, not original thought.

This linguistic trap creates the illusion that AI is a conversational partner, when in reality it is a predictive engine for text. The danger is subtle but profound: once we speak as if the machine thinks, we are tempted to treat its outputs as if they carry authority.

History offers parallels. In the 19th century, telegraphs and phonographs were sometimes described as if they had voices or messages of their own, when in fact they only transmitted or recorded human inputs. The difference now is that the outputs of AI are so polished and contextually appropriate that they mimic the form of genuine dialogue.

The challenge, then, is twofold: to resist, as much as possible, the seduction of anthropomorphic language, while also admitting a blunt reality. Calls for new, precise ways of describing what AI “really” does are largely futile. The technical terminology is specialized, inaccessible to most, and unlikely ever to enter ordinary speech. It is mostly irrelevant to our interactions in any case. To expect people to say “the model predicts” instead of “the AI thinks,” or “the system has been trained to associate” instead of “the AI believes,” is wishful thinking. Language does not work that way. People reach for metaphors, not technicalities.

In practice, anthropomorphic phrasing will persist, because it is simple, familiar, and intuitive. But unless we remember that such phrasing is metaphor, we risk mistaking appearance for reality. We risk speaking as if AI were an epistemic agent, when it is only a mirror of human text, polished and accelerated by algorithms, not a mind that knows or believes anything.

Example: Aristotle’s Poetics

If prompted about Aristotle, the model predicts words like “tragedy,” “catharsis,” or “mimesis.” The result looks like thought but is only statistical patterning.

The metaphor shapes perception. If one says, “The AI thinks Aristotle valued catharsis,” it implies reasoning. In reality, the model has simply detected patterns where “Aristotle” and “catharsis” often co-occur.

This linguistic trap encourages users to treat AI as a partner in dialogue, when it is not a being at all.

Confabulation and Hallucination

Confabulation—the confident generation of false information—is one of AI’s greatest hazards.

Case Study: The Legal Brief

In 2023, two lawyers in New York used ChatGPT to help write a legal brief. The document looked professional, but it included six court cases that simply did not exist. When the judge discovered this, he fined the lawyers $5,000 and issued a formal reprimand.

Case Study: Academic References

In university settings, students have submitted essays with AI-generated bibliographies. Some contained journal articles with plausible titles and even invented volume and page numbers. Libraries reported being asked to locate works that were pure invention.

Case Study: Historical Anecdotes

Asked about Galileo, AI sometimes produces anecdotes that never occurred—for example, describing him dropping weights from Pisa’s tower to disprove Aristotle. While this tale is widely repeated, historians note there is no evidence Galileo performed such an experiment; it is likely apocryphal, later woven into his reputation.

In each case, the writing is polished, persuasive, and wrong. Without knowledge of the subject, the reader cannot detect the error.

Assertions and Contradictions

Human beings never stop talking, writing, and broadcasting their views. Every day, in conversations, editorials, sermons, classrooms, podcasts, newspapers, blogs, radio shows, and social media posts, countless assertions are made about how the world works and how it ought to work. These assertions range from the trivial (“this food tastes best with salt”) to the ultimate (“the universe has a purpose”), and they cover every possible domain: politics, economics, science, art, religion, family life, sports, and beyond.

The striking fact is that contradiction is not the exception but the rule. For every assertion, there is usually a counter-assertion, sometimes several. People disagree about the health value of coffee, the justice of a war, the meaning of a poem, the morality of a law, the role of government, the cause of inflation, or the interpretation of a historical event. Conflicting opinion is not an anomaly in human discourse—it is its very fabric.

This problem extends across all domains:

Everyday life: People insist that exercise is essential for longevity; others argue it wears down the body. One person says a neighbor is generous; another claims the same neighbor is selfish.

Politics: Entire political campaigns are built on diametrically opposed economic prescriptions. What one party claims will ensure prosperity, another insists will bring ruin.

Religion: Competing traditions and sects put forward incompatible accounts of ultimate reality, each proclaimed as non-negotiable truth.

Media and commentary: The same event—a protest, an election, a scientific discovery—will be presented in one outlet as evidence of progress and in another as evidence of decline.

Scholarship and science: Even in the most rigorous domains, disputes are constant. Medicine has cycled through theories of humors, miasmas, germs, genes, and environments. Physics has shifted from Newton to Einstein to quantum mechanics. Historians regularly reinterpret causes of wars or revolutions, overturning older orthodoxies.

The key point is that conflict is not restricted to specialized scholarship but endemic to all human speech. It is not an occasional misstep but the normal condition of discourse. Assertions pile upon assertions, contradicting one another endlessly, without a universal mechanism for resolution.

This is the fundamental problem of knowledge. Human beings live in a storm of competing claims. Some are better supported than others, but in the general case there is no final arbiter that can settle them all. Certain conflicts can be managed in limited domains—science can run experiments, courts can render judgments, historians can weigh documents—but as a whole, the mass of human discourse is inherently contradictory and unstable.

Implications for AI

Large language models inherit this condition wholesale. They do not filter it, correct it, or adjudicate it. By default, they tend to surface what is statistically most common, which is often the most superficial or fashionable claim rather than the most accurate. They reproduce the noise of human discourse without judgment, giving fluent voice to the confusion that has always been with us. The contradictions of human history and daily talk flow unfiltered into their outputs, and no algorithm can fully resolve them, because the problem lies not in the tool but in the very nature of knowledge itself.

Pitfalls and Rewards

Pitfalls

False Authority – Fluency masks error. The best prose may contain the worst mistakes.

Weak Citation – References may be fabricated or loosely tied to assertions.

Intellectual Dependency – Writers may stop engaging with primary texts.

Epistemic Drift – Essays risk becoming rhetoric without foundation.

Confabulated Prose – Polished but baseless passages abound.

Conflicting Assertions – AI reproduces contradictions without evaluation.

Rewards

Scaffolding – AI can outline arguments quickly.

Breadth – It can surface perspectives from diverse fields.

Acceleration – Drafts and revisions can be produced rapidly.

Exploration – It can suggest connections otherwise missed.

The tool is valuable when treated as scaffolding. It is dangerous when treated as authority.

Prompt, Dataset, and Randomness

The interaction between a user’s prompt and the large language model’s training corpus is complex, opaque, and ultimately unreliable. A prompt does not retrieve facts in the way that a query brings back a library catalogue entry. Instead, it activates statistical associations in the model’s parameters. The same prompt, entered twice, may yield different answers because a deliberate element of randomness is built into the software. This prevents the system from repeating identical phrases and creates the appearance of variety, but it also undermines reproducibility. One cannot be sure that an argument or example can be retrieved again in the same form.
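The built-in randomness can be illustrated with a small sketch of temperature sampling, a common technique for choosing the next word from a set of scored candidates. Everything specific here is an assumption made for illustration: the three candidate words, their scores, and the function names are invented, and no claim is made that any particular product samples in exactly this way.

```python
import math
import random

# Hypothetical next-word scores for a single prompt; the numbers are invented.
SCORES = {"tragedy": 2.0, "catharsis": 1.5, "mimesis": 1.0}

def softmax(scores, temperature=1.0):
    """Convert scores to probabilities; a lower temperature sharpens the peak."""
    scaled = {w: s / temperature for w, s in scores.items()}
    top = max(scaled.values())  # subtract the maximum for numerical stability
    exps = {w: math.exp(v - top) for w, v in scaled.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

def sample_word(scores, temperature, rng):
    """Draw one word at random, weighted by its probability."""
    probs = softmax(scores, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

def run(seed, n=10, temperature=1.0):
    """Sample n words using a random generator started from `seed`."""
    rng = random.Random(seed)
    return [sample_word(SCORES, temperature, rng) for _ in range(n)]

same_seed_a = run(seed=1)
same_seed_b = run(seed=1)
print(same_seed_a == same_seed_b)  # True: a fixed seed reproduces the sequence
print(run(seed=None))              # unseeded: the sequence usually changes per run
```

In this sketch, reproducibility depends entirely on knowing the seed and the temperature. Someone reading only the output has neither, which is the position of a writer who tries to recover an earlier answer by retyping the same prompt.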

The exact wording of a prompt has a disproportionate effect. Minor differences in phrasing can summon different patterns from within the training data. Unless a user has skill in targeted prompting, the model defaults to the most common response statistically available. This is not necessarily the most accurate, rigorous, or well-thought-out answer, but merely the one most frequently encountered in the corpus. The “majority view” in a dataset often reflects the most superficial or conventional account, repeated widely, rather than the most careful analysis.
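The claim that frequency can outvote accuracy is easy to show in miniature. The sketch below assumes a made-up collection of ten documents, each attributing the insanity quotation discussed earlier to some source. The counts are invented and the function name is hypothetical. It stands in for the tendency described above and does not model how any real system weighs its training data.

```python
from collections import Counter

# Invented toy data: one attribution per document. The counts are made up.
ATTRIBUTIONS = (
    ["Albert Einstein"] * 7
    + ["Narcotics Anonymous pamphlet"] * 2
    + ["unknown"] * 1
)

def most_common_claim(claims):
    """Return the most frequent claim and its share of all claims."""
    counts = Counter(claims)
    claim, n = counts.most_common(1)[0]
    return claim, n / len(claims)

claim, share = most_common_claim(ATTRIBUTIONS)
# The misattribution wins with a 0.7 share because it is repeated most often.
print(claim, share)
```

A procedure that selects by frequency has no way to notice that the minority attribution is the better-documented one. That judgment requires going back to the sources, which is exactly the step that counting skips.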

Another limitation is the model’s indifference to chronology. By default, it does not distinguish between a source written two centuries ago and one written yesterday. Without explicit prompting, a statement from 1825 may be treated as if it carried the same authority as a peer-reviewed article from 2022. In fields that evolve rapidly—medicine, physics, economics—this failure to weight knowledge by time is a serious distortion.

Recent versions of ChatGPT add further unpredictability. At times, the system supplements its training data with live access to the internet. At other times, it appears to engage in a deeper mode of “reasoning,” producing extended and more coherent arguments. Yet there is no obvious way for the user to predict when these features will be activated, or on what basis. The result is a system whose outputs vary not only by prompt, but also by hidden internal processes invisible to the person relying on them.

Examples of Prompt Effects

Newton’s Gravity

Prompt A: “Explain Newton’s theory of gravity.”

→ Likely returns a conventional schoolbook explanation: the apple falling, inverse-square law, universal attraction.

Prompt B: “Summarize Newton’s Principia treatment of gravity in historical context.”

→ May draw in commentary from 18th- or 19th-century summaries, complete with archaic phrasing.

Prompt C: “What do modern physicists say about Newtonian gravity compared to relativity?”

→ Surfaces 20th- and 21st-century contrasts with Einstein, sometimes oversimplifying relativity as a “correction.”

Each answer comes from the same model, but the wording of the prompt selects for different strata of the dataset: popular clichés, antiquated commentary, or modern textbooks.

Darwin’s Evolution

Prompt A: “What is Darwin’s theory of evolution?”

→ Produces the familiar textbook formula: natural selection, survival of the fittest, gradual adaptation.

Prompt B: “Describe how Darwin explained species variation in the first edition of On the Origin of Species.”

→ May surface 19th-century reviews, excerpts, or paraphrases, sometimes repeating outdated terminology.

Prompt C: “How has Darwin’s theory been modified by modern genetics?”

→ Produces the “modern synthesis” of evolution plus genetics, reflecting 20th-century integration.

Again, the model reproduces whichever interpretation is statistically most likely given the phrasing, not what is most accurate or current.

Economic Example: Adam Smith

Prompt A: “What did Adam Smith say about capitalism?”

→ Surfaces simplified slogans: “invisible hand,” free markets, often detached from context.

Prompt B: “Summarize Smith’s discussion of division of labor in The Wealth of Nations.”

→ Surfaces specific passages (e.g., the pin factory), sometimes mediated by old textbooks.

Prompt C: “How do modern economists critique Adam Smith’s arguments?”

→ Produces references to Keynesian, Marxist, or neoliberal commentaries, depending on what is common in the dataset.

In each case, the output is determined less by the complexity of the topic than by the wording of the prompt, the statistical frequency of certain explanations, and the absence of temporal weighting.

In sum, the prompt–dataset interaction exposes three vulnerabilities: unreproducibility, superficiality, and unpredictability. Together, they make the model a poor foundation for scholarship, though it may still be useful for scaffolding exploration when its limitations are kept firmly in mind.

The Limits of Training and the Fantasy of General AI

The contradictions and errors that fill human discourse have grave implications for any vision of a “general artificial intelligence,” whatever that term is supposed to mean. The idea is often put forward as if an AI system could transcend present limitations and achieve something like universal reasoning, common sense, or even consciousness. Yet if such a system is trained on human discourse—and how else could it function?—it will inherit all the flaws, distortions, and conflicts of that discourse.

No matter how refined the curation, the training data will always be human in origin: books, articles, conversations, recordings, websites, social media posts. These sources are riddled with error and contradiction. The AI cannot escape this condition because it has no independent access to reality. It does not observe the world, perform experiments, or test its claims. It only processes what has already been said.

This means the rule of GIGO—garbage in, garbage out—is unavoidable. If the input is a tangle of half-truths, misstatements, outdated theories, propaganda, and superficial commonplaces, then the outputs will reflect that same mixture. The most polished algorithms cannot conjure certainty from confusion, or knowledge from noise. They can only recombine what has been given.

This is not a technical problem awaiting a fix. It is a structural limit. Even if the training corpus were curated with great care, it would still consist of human discourse, and human discourse is inherently contradictory and unstable. To imagine that an “artificial general intelligence” could somehow rise above this is to indulge in fantasy. Any AI built on language will always be bounded by the limits of language itself.

Thus the rhetoric of “general AI” distracts from the real issue. The problem is not whether machines can one day think like humans. The problem is that humans themselves cannot reconcile the contradictions of their own discourse, and any machine trained on that discourse will inherit the same irresolvable condition.

Summary

Large language models are powerful but unreliable. They generate text that is fluent and persuasive, yet often unmoored from identifiable sources. They replicate contradictions, hallucinate citations, and misattribute quotations.

At the heart of the problem is the training material itself. The vast body of text from which LLMs are built is riddled with errors, distortions, half-truths, and outright falsehoods—just as human discourse has always been. Scientific retractions, journalistic mistakes, urban legends, and popular misquotations all circulate alongside careful scholarship. A model trained on this mixture cannot rise above it. No matter how refined the algorithm, it cannot conjure reliable truth out of unreliable data. It can only recombine what is already there.

In principle, one could imagine mechanisms for improvement. A model might be linked to curated databases, live fact-checking tools, or external sources that filter out falsehoods. Hybrid systems that combine pattern recognition with explicit reasoning or authoritative reference checking could, at least in theory, perform better than the raw data they ingest. But a pure LLM, operating only on its statistical training corpus, cannot. Its outputs will always reflect the quality of its inputs, errors and all.

The danger is that eloquence disguises this fragility. The prose is so smooth that error becomes invisible. The opportunity lies only in scaffolding: LLMs can accelerate drafting, framing, and exploration. But the responsibility remains with the human writer: to separate appearance from substance, rhetoric from knowledge, and repetition from truth.

Annotated Reading List

Computer Science and AI Critiques

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623.

A landmark critique of large language models. Shows why they produce fluent but ungrounded prose.

Marcus, G., & Davis, E. (2019). Rebooting AI: Building Artificial Intelligence We Can Trust. New York: Pantheon.

Argues that current AI is brittle and lacks reasoning. Explains why models cannot judge between contradictory assertions.

Mitchell, M. (2019). Artificial Intelligence: A Guide for Thinking Humans. New York: Farrar, Straus and Giroux.

Accessible explanation of AI’s mechanics and limitations. Useful for understanding the problem of anthropomorphic metaphors.

Philosophy and Epistemology

Floridi, L. (2014). The Fourth Revolution: How the Infosphere Is Reshaping Human Reality. Oxford: Oxford University Press.

Shows how digital technologies transform knowledge. Connects directly to AI’s role in reshaping how assertions circulate.

Goldman, A. I. (1999). Knowledge in a Social World. Oxford: Oxford University Press.

Explores how testimony and authority function in knowledge. Highlights why AI, which lacks accountability, cannot be treated as testimony.

Fricker, M. (2007). Epistemic Injustice: Power and the Ethics of Knowing. Oxford: Oxford University Press.

Discusses credibility and trust. Helps explain the misplaced authority sometimes granted to AI outputs.

Philosophy of Science and Information Reliability

Cartwright, N. (1999). The Dappled World: A Study of the Boundaries of Science. Cambridge: Cambridge University Press.

Argues that scientific laws are local, not universal. Reinforces the essay’s warning about treating AI outputs as universal truths.

Hacking, I. (1983). Representing and Intervening. Cambridge: Cambridge University Press.

Stresses experiment as the ground of knowledge. Contrasts with AI, which cannot intervene in the world.

Oreskes, N., & Conway, E. M. (2010). Merchants of Doubt. New York: Bloomsbury.

A history of misinformation in science. Shows how AI, by reproducing all assertions indiscriminately, risks amplifying error.

Information Science and Library Studies

Buckland, M. (1991). Information as thing. Journal of the American Society for Information Science, 42(5), 351–360.

Defines information in multiple senses. Distinguishes text-as-thing from knowledge-as-justified-belief.

Case, D. O., & Given, L. M. (2016). Looking for Information: A Survey of Research on Information Seeking, Needs, and Behavior (4th ed.). Bingley: Emerald Group.

Comprehensive review of how humans search for and evaluate information. Essential context for understanding how AI alters information behavior.

Dervin, B. (1998). Sense-making theory and practice: An overview of user interests in knowledge seeking and use. Journal of Knowledge Management, 2(2), 36–46.

Emphasizes the active role of humans in interpreting fragmented information. Highlights AI’s inability to “make sense.”

Closing Note

The problems of AI writing—hallucination, provenance, contradiction—are not new. They are extensions of ancient difficulties in human knowledge, now magnified by machines that produce them at scale and with polish. The works in the reading list anchor this critique across computer science, philosophy, epistemology, and library science. They remind us that reasoning remains a human responsibility.