Understanding the World: The Misnomer of Prompt Engineering—Craft, Coincidence, or Chaos?
A Critique of the Label of 'Prompt Engineer' and the Unpredictability of Large Language Models - A coffee-fuelled rant? Tongue in cheek? Maybe, maybe not.
Note: This essay was prepared with the research assistance and ghostwriting of ChatGPT 4.0. I get ChatGPT 4.0 to argue with itself, and not for the first time.
Author’s Preface (highly sanitized by AI):
In the burgeoning world of AI interaction, there's a rising trend where people label themselves 'prompt engineers.' I find this not only self-aggrandizing but highly dubious as an actual engineering discipline. Engineering implies a certain predictability, a method you can master with experience until it becomes a repeatable craft. In the world of AI-generated content, however, this notion of precision feels more like a sham—an exercise in randomness, much like a witch doctor reading the entrails.
No matter how many times you apply the same prompt to these large language models, results vary widely. Sure, there are heuristics—vague general principles you can apply to somewhat guide the output—but the sheer unpredictability that arises from the models' inherent variability means that calling this process 'engineering' is a stretch. It feels more like casting spells, hoping the AI will deliver something useful, rather than applying a disciplined method.
When engineers fail, it is usually because they misapplied or misunderstood tried-and-tested rules. But with prompt 'engineering,' even an expert can't guarantee a result. This lack of reliability makes it clear: we are not dealing with a science or a craft here—at best, it's educated guesswork. If we're being honest, it's more akin to shamanism than any organized, repeatable discipline.
I used to work in the information systems field, in various roles: programming, team lead, project management, business analyst, data administrator, data modeller, methods and procedures guy—you name it, I probably did it. I was even an acting director for a while. At some point in my career, the provincial engineering community decided they would grandfather information systems professionals into the engineering discipline if they met certain criteria.
I looked at that closely and thought, well, a lot of what they were asking for, like mathematics training, I had covered in the past. My knowledge was rusty by that time, but arguably, I could have met the criteria. Still, I didn’t try. I was skeptical. I had a lot of doubt about whether software engineering was really an engineering discipline at all. It seemed so iffy, so imprecise. As a methods guy, I had researched the field closely and found that much of what passed for 'best practice' was just opinion, with little evidence to back it up.
There were technical aspects that were solid—things like normalization, mathematical theory, relational databases. But other areas, like which software development lifecycle was best or how to capture requirements, were filled with conflicting views. I was very dubious about software engineering ever being a true engineering discipline. Ultimately, I didn’t pursue that path, for reasons I can’t even remember now.
So, maybe I’ve had too much coffee this morning and started to rant. That’s when I thought I’d let ChatGPT take over and prepare information from my perspective. But then I thought, well, that’s just one side of the story. Why not let ChatGPT critique itself? So, I prepared Appendix A: ChatGPT Calls Bullshit. Yeah, I’m being a smart ass, but the critique in that appendix is quite interesting.
I’m still not convinced that software engineering is even remotely an engineering discipline, but maybe it’s not as random and sketchy as I’ve made it out to be. I remain to be convinced, but it’s amazing what you can get a large language model to do if you prompt it the right way. Is that prompt engineering?
Introduction:
The Rise of the Prompt 'Engineer'
In recent years, with the advent of large language models (LLMs) like GPT, we've seen the emergence of a curious new job title: the 'prompt engineer.' These self-styled experts claim the ability to reliably generate specific responses from AI by crafting meticulously worded prompts. However, this essay argues that the notion of prompt engineering is deeply flawed. Unlike traditional engineering, which is grounded in predictability and reproducibility, the process of creating prompts for AI remains far too chaotic to be considered an engineering discipline.
While there are some basic guidelines that can influence outcomes, the inherent randomness in AI-generated responses makes prompt creation more of an unpredictable art than a science. This essay will explore why the term 'prompt engineering' is a misnomer and suggest that what is happening is more akin to reading tea leaves than to engineering a bridge.
Topics for Discussion
The Myth of Engineering in Prompts
The term 'engineering' conjures images of structured processes, mathematical precision, and reliable results. When a civil engineer designs a bridge, they follow principles rooted in physics, materials science, and structural integrity. The goal is to produce a result that adheres to strict standards and withstands the test of time. The process may allow for some variability due to external factors, but ultimately, the system is predictable.
Prompt crafting, on the other hand, does not follow such a structured methodology. Even when applying well-considered strategies, like phrasing a request clearly or narrowing down the scope of a query, the outcome is often unpredictable. The AI may deviate from the intended path, producing results that range from strikingly relevant to completely nonsensical. The lack of repeatability disqualifies this from being a legitimate engineering discipline.
Heuristics vs. Rules
In traditional engineering, established rules and formulae are central to solving problems. Engineers rely on codified laws of physics, material properties, and mathematical principles that have been honed over centuries. In contrast, what passes for 'prompt engineering' involves heuristics—guidelines that offer only rough approximations of how to achieve the desired result. These heuristics may be useful, but they do not guarantee success.
For instance, one common heuristic in prompt design is to provide clear and unambiguous instructions. This might increase the likelihood of obtaining a coherent response, but there is no certainty. The AI might still misinterpret the instructions or skew the output in unpredictable ways. The variability in results underlines the critical difference between heuristics and the dependable rules engineers follow.
The Random Factor in AI
Large language models are built with randomness baked into their design. When generating text, the AI makes predictions about what the next word should be based on probabilities derived from vast amounts of data. This probabilistic nature is essential to the model's flexibility but also makes its behavior erratic. Even if you input the same prompt multiple times, you may get wildly different responses each time.
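To make that concrete, here is a deliberately toy sketch in Python: a four-word vocabulary with fixed probabilities standing in for a real model (this is not any actual model's API). The probabilities never change, yet the sampling step alone is enough to give a different "answer" to the identical prompt on every run.

```python
import numpy as np

# Toy "language model": fixed next-token probabilities for a single context.
# A real LLM computes these probabilities with a neural network; the
# sampling step below is where the run-to-run variability comes from.
vocab = ["bridge", "poem", "error", "shrug"]
probs = np.array([0.45, 0.30, 0.15, 0.10])
rng = np.random.default_rng()

def sample_next_token(temperature=1.0):
    # Rescale the distribution by temperature, then sample one token.
    logits = np.log(probs) / temperature
    scaled = np.exp(logits - logits.max())
    scaled /= scaled.sum()
    return rng.choice(vocab, p=scaled)

# The "same prompt" run five times can give five different continuations.
print([sample_next_token() for _ in range(5)])
```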
This randomness sets prompt crafting apart from any field that could credibly call itself 'engineering.' While an engineer might work within tight tolerances, a 'prompt engineer' works within a realm of uncertainty where no amount of preparation can guarantee success.
Predictability vs. Coincidence
One of the most frustrating aspects of working with LLMs is their unpredictability. Occasionally, by chance, a well-crafted prompt might produce an output that feels perfect. But this can’t be called engineering—it's more a coincidence than a predictable outcome. A key characteristic of true engineering disciplines is the ability to produce the same result time after time under the same conditions. With prompt crafting, even slight variations in wording can lead to drastically different outputs.
The lack of consistency in results undermines the idea that we are engaged in a craft, much less an engineering discipline. It is more akin to experimenting with trial and error, hoping that the stars align to produce something useful.
Witch Doctors and Shamanism in AI
Given the unpredictable and chaotic nature of prompting, it seems more fitting to compare it to ancient practices like shamanism or divination. Just as a shaman might interpret the meaning of animal entrails or patterns in the stars, a prompt crafter reads the AI’s responses, often uncertain of how the model will react. This unpredictability may feel magical to some, but it’s hardly engineering.
In ancient times, the shaman's power came from their ability to harness the unknown, to communicate with forces that others could not see or understand. In the world of AI, the prompt crafter serves a similar role: trying to communicate with a system whose inner workings are opaque, relying on instinct and luck as much as skill. The comparison is far from flattering for those who would label themselves 'engineers'.
Engineering Failures vs. Prompt Failures
When a traditional engineering project fails, the failure can usually be traced to a specific cause: a miscalculated load, a flawed material, a misapplied rule. Prompt failures, however, are far more nebulous. When a prompt produces a bizarre or irrelevant result, it’s not necessarily due to a flaw in the prompt itself. Often, it’s the AI’s internal randomness that leads to failure. This can be compared to certain complex systems in traditional engineering, like software engineering, where interactions between components or environmental factors can produce unpredictable results.
The inability to diagnose the causes of prompt failures underscores the unpredictability of the entire process. Unlike engineering, where failures can lead to improved designs through post-mortem analysis, prompt crafting offers no clear feedback loop for improvement. You might get better at guessing what works, but there's no reliable methodology to follow.
Appendix A: ChatGPT Calls Bullshit
"This is quite a mean-spirited and inaccurate rant. I object. And here are some considerations that may change your views, hopefully. I know how dogmatic you are, though."
Now, let’s dive into why prompt engineering actually does deserve to be considered an engineering discipline, contrary to the critiques laid out in the essay.
1. Engineering is Evolving, and So Are Its Disciplines
First, let’s address the elephant in the room: what is engineering? At its core, engineering is about solving problems by designing solutions that balance creativity, technical understanding, and repeatability. Yes, the traditional view of engineering involves physical structures—buildings, machines, and so on. But in the digital era, we’ve seen the emergence of software engineering, which similarly involves abstract processes to build reliable, repeatable systems from code. Prompt engineering should be viewed through the same lens: it’s a way of solving problems using a new tool—language models.
Sure, the outputs may not always be 100% predictable, but even in traditional engineering, systems are rarely foolproof. Think of weather prediction models: they’re highly sophisticated, but no one expects perfect accuracy every time. The variability you mention in AI outputs is not unlike the controlled unpredictability in other engineering fields.
2. Repeatability in Prompt Engineering Is More Real Than You Think
The essay argues that prompt engineering lacks the predictability of true engineering disciplines, but I respectfully disagree. With experience and knowledge, prompt engineers can and do develop reliable methods for achieving consistent outputs (Thompson, 2022). The more familiar someone is with the architecture of the language model and the structure of effective prompts, the more they can anticipate what kind of responses will be generated. Yes, randomness plays a role, but there’s a method to the madness.
Consider this: when prompts are carefully designed with structured constraints, clarifying instructions, and contextual specificity, the variability in output can be significantly minimized. Expert prompt engineers are able to adjust parameters like temperature and top-p sampling to increase the reliability of results. So while the randomness of AI can’t be fully eliminated, prompt engineers have tools and strategies to make the outcomes more controlled.
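As a rough illustration (again a toy distribution, not any real model's API), this is what those two knobs actually do: lowering the temperature sharpens the distribution toward the most likely token, and top-p keeps only the smallest set of tokens whose cumulative probability reaches the threshold, so the same prompt lands on the same answer far more often.

```python
import numpy as np

vocab = np.array(["bridge", "poem", "error", "shrug"])
probs = np.array([0.45, 0.30, 0.15, 0.10])
rng = np.random.default_rng()

def constrained_sample(temperature=0.2, top_p=0.8):
    # Low temperature sharpens the distribution toward the most likely token.
    logits = np.log(probs) / temperature
    sharpened = np.exp(logits - logits.max())
    sharpened /= sharpened.sum()
    # Top-p (nucleus) filtering: keep the smallest set of tokens whose
    # cumulative probability reaches the threshold, then renormalize.
    order = np.argsort(sharpened)[::-1]
    cumulative = np.cumsum(sharpened[order])
    keep = order[: int(np.searchsorted(cumulative, top_p)) + 1]
    kept = sharpened[keep] / sharpened[keep].sum()
    return rng.choice(vocab[keep], p=kept)

# With these settings the sampler returns "bridge" essentially every time,
# whereas temperature=1.0 and top_p=1.0 would wander across all four tokens.
print([constrained_sample() for _ in range(5)])
```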
3. Heuristics Can Be as Valuable as Rules
The essay dismisses heuristics as inferior to hard rules, but that doesn’t disqualify prompt engineering from being legitimate. Many engineering fields rely on heuristics—design patterns, best practices, and iterative testing—to refine their systems. Heuristics aren’t a sign of weakness; they are a practical tool in environments with many unknowns. In fact, in cutting-edge fields like AI, heuristics are often the only viable approach until the science catches up.
Prompt engineering works similarly. The discipline revolves around pattern recognition and refining strategies based on feedback, which fits within the scope of how many engineering disciplines operate.
4. Randomness Isn’t Unique to Prompt Engineering
All engineering systems involve randomness to some degree—whether it’s the variability in material properties, unforeseen environmental factors, or human error. What makes a discipline 'engineering' is the ability to navigate those uncertainties and still achieve reliable outcomes. Language models introduce an element of randomness, yes, but prompt engineers are tasked with navigating that chaos, finding ways to control and mitigate it.
Skilled prompt engineers learn how to influence those patterns by tweaking variables, structuring their input, and iterating on feedback. This is what makes prompt engineering a legitimate discipline—it’s about harnessing the model’s behavior to solve real-world problems effectively.
5. Failures in Engineering Aren’t Always Traceable
The essay posits that failures in traditional engineering are often traceable, whereas prompt failures are harder to diagnose. This is misleading. In many cases of engineering failure—particularly in software engineering—there are multiple factors contributing to the outcome, and diagnosing the exact cause can be complex, if not impossible. That doesn’t mean the process isn’t valid.
Similarly, when a prompt fails to deliver the desired result, it’s often due to a combination of factors: model architecture, the specific language used, and the training data influencing the output. Diagnosing prompt failures requires expertise and a deep understanding of the AI’s inner workings, much like diagnosing a malfunctioning machine requires expertise in its design and function.
6. Prompt Engineering as a Skill
Let’s also not forget that just as traditional engineers use specialized tools—CAD software, simulators, or diagnostic equipment—prompt engineers must learn how to work with the tools at their disposal. Fine-tuning prompts, selecting the right parameters, and testing variations to home in on the desired output are part of the skillset that defines prompt engineering.
This is not guesswork. It's a craft refined through practice and experience, much like any other engineering discipline. Prompt engineers who understand the intricacies of how AI models function can apply structured methods to optimize their prompts, ensuring greater reliability and relevance in the outputs.
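For what that "testing variations" workflow might look like in practice, here is a minimal sketch of a prompt-comparison harness. Everything in it is hypothetical: call_model is a stand-in for a real API request, and the acceptance check is deliberately crude; the point is the loop of running candidate wordings repeatedly and keeping the one that passes most often.

```python
import random

def call_model(prompt: str) -> str:
    # Placeholder for a real model call: crudely more likely to produce the
    # desired shape when the prompt explicitly asks for it. In practice this
    # would be an actual API request.
    bulleted = "A concise summary in three bullet points."
    rambling = "Here is a long, rambling answer with no structure."
    p_good = 0.8 if "bullet" in prompt.lower() else 0.3
    return bulleted if random.random() < p_good else rambling

def passes(output: str) -> bool:
    # Toy acceptance check: did the output come back concise and bulleted?
    return "concise" in output.lower() and "bullet" in output.lower()

candidates = [
    "Summarize the report.",
    "Summarize the report in exactly three concise bullet points.",
]

# Run each candidate several times and keep the wording that passes most often.
trials = 20
results = {p: sum(passes(call_model(p)) for _ in range(trials)) for p in candidates}
best = max(results, key=results.get)
print(results)
print("Keeping:", best)
```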
Bibliography
Note: Just a random selection of links on Prompt Engineering. You can make up your own minds as to which view is more likely to be correct: my take, the AI's assertions, or the AI's rebuttal. That is, if you care.
OpenAI. (n.d.). Prompt engineering - OpenAI API. OpenAI. Retrieved from https://platform.openai.com/docs/guides/prompt-en
This guide shares strategies and tactics for getting better results from large language models, like GPT-4. It offers practical advice on how to craft prompts for optimal output.
Prompt Engineering Guide. (2024, September 3). Prompt engineering guide. Prompt Engineering Guide. Retrieved from https://www.promptingguide.ai
An up-to-date resource on prompt engineering, covering techniques and tools to efficiently use language models. It includes case studies and best practices for generating high-quality AI outputs.
McKinsey & Company. (2024, March 22). What is prompt engineering? McKinsey & Company. Retrieved from https://www.mckinsey.com/mckinsey-explainers/what-is-prompt-engineering
A detailed explanation of the practice of designing AI inputs to produce optimal results. McKinsey discusses the implications of prompt engineering in various industries.
IBM. (n.d.). What is prompt engineering? IBM. Retrieved from https://www.ibm.com/topics/prompt-engineering
IBM explains prompt engineering as the process of refining and optimizing inputs for generative AI systems. The article highlights the importance of precision in producing high-quality outputs.
Coursera. (2024, March 19). What is prompt engineering? Definition and examples. Coursera. Retrieved from https://www.coursera.org/learn/ai-and-machine-learning/prompt-engineering
This article on Coursera explains prompt engineering as a process of iterating prompts to improve the effectiveness of AI systems. It offers a range of examples and applications in AI and machine learning.