Date: 2024-08-19

Hidden Away in the Darnedest Places

Understanding the World: The World Corpus of Misinformation and LLM AI

And Not Just LLM AI

The World Corpus: What It Is

The world corpus (from the Latin corpus, "body") is an immense collection of information accumulated over millennia, too vast for anyone to comprehend in full, and much of it exists only in people's minds or has been irretrievably lost to history. This corpus is not merely a record of facts; it also includes misinformation, disinformation, and contradictions. The body of human knowledge is replete with inaccuracies, and when opinions conflict irreconcilably, it is possible that none of them is correct (Pritchard, 2016).

The Nature of Inaccurate Information

Significant portions of human knowledge have never been recorded. Of what was recorded, much has been lost to posterity through natural disasters such as fires, floods, and earthquakes, or through human actions such as vandalism and war. Many surviving records remain inaccessible, hidden in private collections, monasteries, or caves. What we can know is therefore severely constrained by the fragility of historical records (Buckland, 1991; McKemmish & Piggott, 2013).

Moreover, information varies greatly in type and quality. Library science and information studies have long recognized that the quality of information (its accuracy, reliability, and relevance) affects its value and utility (Buckland, 1991). Every stage of documenting, preserving, and accessing knowledge introduces its own difficulties, which shape both what gets recorded and what can later be retrieved, further complicating our understanding of the world (Hjørland, 2007).

The Problem of Bias and Misinterpretation

Bias and misinterpretation compound the difficulty of grasping the truth. Human beings, with their cognitive limitations, biases, and agendas, play a significant role in interpreting, evaluating, and selecting the materials that make up the world corpus. This flawed process affects not only the training of large language model (LLM) AI systems but also the broader transmission of knowledge among people. Consequently, AI systems are often trained on data that contain too little reliable information, a reflection of the broader epistemological and practical problems inherent in human knowledge itself (Fricker, 2007; Machlup & Mansfield, 1983).

The Limits of Knowledge

Understanding these challenges requires recognizing that they are not merely technical problems but fundamentally epistemological and practical ones. They are rooted in how knowledge is accumulated, preserved, and transmitted across generations. Because the world corpus is imperfect, both AI and human knowledge rest on an unstable foundation, where the line between truth and misinformation is often blurred (Dretske, 1981; McKemmish, 2001).

Conclusion

The world corpus of information, with its inextricable blend of fact and falsehood, presents significant challenges both for human understanding and for the development of AI systems such as LLMs. The problems of misinformation and bias are not limited to AI; they are deeply rooted in the history of human knowledge itself. While it is tempting to blame technological limitations, it is more accurate to view these issues as reflections of broader, unsolvable problems that have always plagued human attempts to understand the world. This is a skeptical, yet realistic, appraisal of the state of human knowledge and its implications for the future of AI.

