Large Language Model AI: Simulated Intelligence or Simulated Idiocy
Strange tool, this one. Beware. It can give you some good stuff. It can give you a lot of crap. And sometimes the crap is too subtle to pick up easily unless you're an expert in the field.
Note: I use ChatGPT to rat itself out.
Author's Preface
I've been using large-language model AI for a while now, since shortly after such models became publicly available—mostly ChatGPT—finding it useful at times and horrendous at others. Sometimes it provides completely incorrect information, especially on detailed or technical topics. I suppose where its dataset lacks the right information, it just makes things up. The developers call this "hallucination." It could just as easily be called "confabulation," but in any case, it's wrong.
That said, it does some things amazingly well—rewording text, translating languages—but it also does some very annoying things, particularly in areas that may not be well understood in popular writing. Large-language models depend on the text they're trained on, which makes them subject to the old GIGO phenomenon: garbage in, garbage out. And given the nature of human information, there's a lot of garbage out there.
Some of what it generates is correct. Some is arguably correct. Some is arguably wrong. And some is outright propaganda or lies. It’s impossible for ChatGPT to assess the worth of the statements it produces, even within interpretive bounds. So you get a lot of nonsense.
It’s biased toward what’s most frequently written about, not necessarily what’s written with the greatest authority or comprehension. This often results in silly misinterpretations, reflecting what’s commonly said rather than what’s actually correct. Its default mode is not truth-seeking; it can’t discern truth. Instead, it defaults to what’s most commonly cited.
With clever prompting, you can sometimes get it to acknowledge flaws in the information. Presumably, criticisms of those flaws are also in the dataset—it’s just a matter of extracting them. The trick is to craft your prompt carefully enough to get them out. Unfortunately, the initial response may be ludicrous. ChatGPT can’t recognize that on its own.
I run into this problem all the time with philosophy. It misstates things that could easily be settled by referencing the correct texts. And I’m sure it’s equally prone to error in other fields that I don’t explore.
Large-language model AI is a strange tool. It’s powerful, but unreliable. It demands scrutiny.
So, for instance, I had a discussion recently—well, I've had several discussions over the last few days—about the Sapir-Whorf hypothesis and how language influences thought. The idea is that language shapes, to some degree, the way we think and governs the concepts we use for thinking. Now, this is undeniably true. I don't think anybody who's seriously thought about the issue could deny it.
But there's been a straw man argument circulating that claims Sapir and Whorf advocated strict linguistic determinism—that language completely locks us into a fixed way of thinking. That's just not true, at least on my reading of their texts. Granted, it's been years, but I did read material on this topic back in my student days. A lot of people have misinterpreted the hypothesis, maybe deliberately, or maybe because they didn't read carefully enough, and now the misreading has been repeated so often online that it's taken as fact.
ChatGPT initially repeated that misunderstanding. It asserted that Sapir and Whorf were proponents of strict linguistic determinism. But when I challenged it—when I told it to reconsider its conclusions and factor in that Sapir and Whorf were much more nuanced—it was able to adjust. It revised its response, presumably drawing on other material in its dataset, and offered a more balanced view.
This example illustrates the broader issue. The AI’s first response is often conventional, simplistic, and sometimes outright wrong. It tends to parrot the most commonly cited views, which are often misunderstandings or distortions. But with careful prompting, it can recover and incorporate better information, presumably because criticisms of those distortions are also part of its dataset. Still, that recovery depends entirely on the user’s ability to identify and correct the errors, which is no small task if you aren’t already knowledgeable about the topic.
I encounter this problem routinely when discussing subtle topics, particularly in philosophy and psychology. These are areas where the AI has a lot of information—some correct, some incorrect, some conjectural—but it doesn’t distinguish among them. Unless cleverly prompted, it defaults to the most popular interpretation, whether that interpretation is right or wrong.
That’s the nature of large-language model AI. It’s problematic because people assume it’s intelligent. It’s not. But it’s not an idiot either. It occupies this strange middle ground between simulated intelligence and simulated idiocy.
I called it simulated intelligence, which most people would agree with. But I also called it simulated idiocy, which some might object to. Yet it fits. The AI often generates ridiculous, even ludicrous, statements—sometimes phrased so confidently that they seem like God’s own truth. But they’re not. And it takes real work to separate the sense from the nonsense.
Introduction
Large-language model AI, such as ChatGPT, has become a prominent tool in recent years, capable of generating human-like text, rephrasing ideas, and assisting with translations. Its abilities have impressed many, but its limitations remain poorly understood by the general public. While it can produce useful information, it is equally prone to generating errors, distortions, and even outright falsehoods.
The underlying mechanism of these systems relies on statistical patterns derived from vast datasets, reflecting the information—accurate or inaccurate—that exists in human discourse. This design creates a tension between their apparent intelligence and their frequent lapses into incoherence. The result is an AI that appears intelligent in some contexts while behaving idiotically in others, depending on how well the subject matter is represented in its training data.
This paper explores the dual nature of large-language models: their ability to simulate intelligence and, at the same time, their tendency to simulate idiocy. Drawing on examples from exchanges about the Sapir-Whorf hypothesis and related philosophical interpretations, it highlights the strengths and weaknesses of this tool, emphasizing the need for skepticism and careful prompting to achieve meaningful results.
Discussion
Large-language models operate by predicting, one token at a time, the most likely continuation of a text, based on statistical patterns found in their training data. While this process can produce coherent and insightful responses, it also leads to systematic failures when that data is incomplete, contradictory, or biased toward popular misconceptions.
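To make that mechanism concrete, here is a deliberately tiny sketch of frequency-based next-word prediction in Python. It is nothing like a production model, which uses a neural network trained on an enormous corpus, but it shows the property that matters here: the output follows whatever patterns dominate the training text, right or wrong. The miniature corpus and its repeated misconception are invented purely for illustration.

    import random
    from collections import defaultdict, Counter

    # A toy "training set". The inaccurate claim appears more often than
    # the accurate one; frequency, not accuracy, is what gets learned.
    corpus = [
        "whorf argued language determines thought",
        "whorf argued language determines thought",
        "whorf argued language determines thought",
        "whorf argued language influences thought",
    ]

    # Count which word follows which (a bigram table).
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1

    def next_word(prev):
        """Sample the next word in proportion to how often it was seen."""
        counter = follows[prev]
        return random.choices(list(counter), weights=list(counter.values()))[0]

    # Generate a continuation. "determines" wins three times out of four,
    # regardless of which version is actually correct.
    word, output = "whorf", ["whorf"]
    for _ in range(4):
        word = next_word(word)
        output.append(word)
    print(" ".join(output))

In this toy setting the wrong continuation wins simply because it is repeated more often, which is the GIGO problem discussed below, in miniature.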
Case Study: The Sapir-Whorf Hypothesis
A recent interaction with ChatGPT on the Sapir-Whorf hypothesis illustrates this issue. The hypothesis broadly suggests that language shapes thought and governs the concepts we use for thinking. This view, while nuanced, has often been misrepresented as strict linguistic determinism—the claim that language completely determines thought, leaving no room for flexibility or alternative perspectives.
ChatGPT initially repeated this distorted interpretation, reflecting the dominant narrative circulating online rather than the more balanced views presented in the original texts of Sapir and Whorf. When prompted to revisit its conclusions, the AI was able to refine its answer, acknowledging that the original hypothesis allows for subtler relationships between language and thought.
This example highlights the AI’s tendency to default to commonly cited but misleading views. Its initial response lacked critical evaluation, echoing the most frequent interpretations in its dataset. Only through careful prompting did it incorporate more accurate perspectives, demonstrating that its apparent competence often relies on external guidance rather than intrinsic understanding.
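The exchange above took place in the ChatGPT web interface, but the same challenge-and-revise pattern can be sketched programmatically. The following is a minimal sketch assuming the OpenAI Python client; the model name and the prompt wording are illustrative stand-ins, not a transcript of the actual conversation.

    # Challenge-and-revise: ask, then push back on the likely oversimplification.
    # Assumes the OpenAI Python client and an API key in the environment.
    from openai import OpenAI

    client = OpenAI()
    history = [
        {"role": "user",
         "content": "Summarize the Sapir-Whorf hypothesis."},
    ]
    first = client.chat.completions.create(model="gpt-4o", messages=history)
    history.append({"role": "assistant",
                    "content": first.choices[0].message.content})

    # The crucial step: the user, not the model, supplies the correction.
    history.append({"role": "user",
                    "content": ("You described strict linguistic determinism. "
                                "Reconsider: did Sapir and Whorf actually claim that "
                                "language completely determines thought, or something "
                                "more nuanced?")})
    second = client.chat.completions.create(model="gpt-4o", messages=history)
    print(second.choices[0].message.content)

The structure, not the wording, is the point: nothing in the first call flags the oversimplification; the more accurate answer appears only because the user challenges the first one.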
Bias and the GIGO Problem
The Sapir-Whorf example also illustrates the broader issue of "garbage in, garbage out" (GIGO). Large-language models reflect the biases, errors, and ambiguities present in their training data. Given that much of human discourse contains misinformation, half-truths, and ideological distortions, AI outputs often mirror these flaws.
For instance, the AI’s reliance on popular rather than authoritative sources means it tends to prioritize repetition over accuracy. Misconceptions repeated often enough can drown out more accurate but less-cited views. This bias explains why ChatGPT initially echoed the flawed interpretation of Sapir and Whorf—it simply reflected the dominant discourse rather than evaluating it critically.
The same issue applies across fields. In philosophy, psychology, and other domains rich with interpretive debates, AI responses often fail to capture nuances, oversimplify complex arguments, or reinforce popular but flawed interpretations. Without domain expertise, users may struggle to detect these errors, mistaking confident phrasing for authoritative knowledge.
Simulated Intelligence and Simulated Idiocy
The behavior of large-language models oscillates between impressive intelligence and staggering idiocy. On one hand, they can rephrase complex ideas, translate languages, and summarize lengthy texts with remarkable fluency. On the other hand, they can produce nonsensical or misleading responses, particularly when dealing with subtleties that require deeper conceptual understanding.
This duality arises because the AI does not “know” anything in the human sense. It lacks comprehension and instead processes language statistically, predicting patterns without evaluating truth. Its failures—what developers term “hallucinations”—are better described as confabulations, where plausible-sounding but false information is generated to fill gaps.
Critically, these errors are often subtle, making them difficult to detect unless the user has expertise in the field. In areas like philosophy and psychology, where competing interpretations abound, the AI’s tendency to parrot dominant views can reinforce errors rather than clarify them. Users must therefore approach its output with skepticism, verifying claims and prompting corrections where necessary.
Summary
Large-language model AI represents a paradox. It can simulate intelligence effectively enough to assist with tasks like translation, summarization, and rephrasing, yet it frequently simulates idiocy when confronted with subtle or poorly understood topics. Its reliance on statistical patterns rather than comprehension makes it prone to errors, especially when dealing with nuanced or contested ideas.
The Sapir-Whorf example illustrates this duality. Initially repeating a widespread misinterpretation of the hypothesis, the AI required careful prompting to acknowledge a more accurate and nuanced view. This case underscores the importance of skepticism and critical engagement when using AI tools, particularly in fields rich with debate and ambiguity.
The AI’s performance reflects the biases and errors embedded in its training data, reproducing dominant narratives rather than prioritizing accuracy. While it can revise and refine its outputs when prompted, this process depends on the user’s ability to identify and correct errors—a skill that requires prior knowledge and expertise.
Ultimately, large-language models are neither intelligent nor idiotic in the human sense. They are tools—strange ones at that. They demand vigilance, skillful prompting, and careful interpretation to extract useful insights while filtering out misinformation. Users must approach them not as authorities but as assistants prone to error, requiring constant oversight and correction.