The Frustration Cycle of Graphic Prompt Editing
The Trouble With Prompt Editing & The Typing Pool Analogy
Hey there. I’m guessing you’ve all started to experiment with making images and/or videos in ChatGPT, Gemini, Firefly, etc.
Have you noticed that as you work…you hit a point where you’re in promising territory, but can’t seem to get the image that last 10-20% of the way there…and each edit changes something you didn’t want to change, even if it addresses the thing you originally asked for? Even when you use a tool that CLAIMS to offer seamless prompt editing, that’s just not what happens for YOU.
I call this “The Frustration Cycle of Graphic Prompt Editing” — and the struggle is REAL.
It’s tempting to tell yourself it’s YOUR fault, that you’re not good enough at prompt engineering…yet.* That there’s some magical point at which your subscription tier and your prompt engineering skills will coincide to form the perfect result.
But what if that’s simply not true?
What if the way that an LLM fundamentally works…actually serves to PREVENT you from getting exactly what you want?
Having learned a bit about how GenAI fundamentally responds to a request, I’ve noticed a kind of “fatal flaw” in the idea of prompt editing - so much so that I don’t think prompt editing, as it’s currently envisioned, will EVER be a great solution for editing graphics.
Let’s talk about it.
The Basics
First off, when we talk about ChatGPT, Gemini, DeepSeek, etc. - they are all examples of a specific type of AI called a “Large Language Model” - or LLM. As I continue, please keep in mind that I’m going to specifically discuss the behavior of LLMs.*
If you search up “how does an LLM work?” — well, you’re going to get an overwhelming number of results.
All of the search results you get, at some point, are going to discuss the idea of a neural network - and you’ll generally be presented with an abstracted image of a series of nodes, kind of like this:
How it Works
When you ask an LLM a question, each node looks at the possibilities on a granular level and computes various aspects of them, based on what the model learned from how it was trained. The result is passed forward to the next layer of nodes, where the same thing happens again, whittling down the possibilities until the final result is returned.**
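If it helps to see that in code instead of a picture, here’s a toy sketch in Python of information being passed forward through layers. To be clear: this is my illustration, not how any real model is built - the random numbers below are stand-ins for what training actually tunes.

```python
import numpy as np

# A toy stand-in for "layers of nodes" - NOT a real LLM, just the
# general shape of the process: each layer transforms its input and
# hands the result forward to the next layer.
rng = np.random.default_rng(seed=42)

def layer(x, weights, bias):
    # Each node takes a weighted combination of the incoming values,
    # then applies a non-linearity before passing the result onward.
    return np.maximum(0, weights @ x + bias)

x = rng.normal(size=8)        # the encoded request coming in
for _ in range(3):            # three layers, one after another
    w = rng.normal(size=(8, 8))
    b = rng.normal(size=8)
    x = layer(x, w, b)        # each layer's output feeds the next
print(x)                      # the final result after all the layers
```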
GPT-4, the model behind ChatGPT, is estimated to have something like 1.76 TRILLION parameters - the tuned connections between these nodes - to give you an idea of how complex it gets.
All this adds up to saying: an LLM is basically a super-advanced auto-complete machine. It calculates probabilities of the next word in a sentence or the next pixel in an image based on how it was trained.**
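Here’s that same idea as a tiny, hypothetical sketch - the sentence, candidates, and scores below are completely made up, but the mechanic is real: turn scores into probabilities, pick a word, tack it on, and run the whole process again for the next one.

```python
import numpy as np

# A hypothetical example of "super-advanced autocomplete".
# The sentence, candidates, and scores are all made up; a real model
# derives its scores from the patterns it absorbed during training.
prompt = "The cat sat on the"
candidates = ["mat", "sofa", "roof", "banana"]
scores = np.array([2.4, 1.7, 0.6, -1.5])   # raw model preferences

# Softmax: convert raw scores into probabilities that sum to 1.
probs = np.exp(scores) / np.exp(scores).sum()
for word, p in zip(candidates, probs):
    print(f"{prompt} {word}: {p:.0%}")

# Sample the next word from those odds, append it to the sentence,
# then repeat the entire process for the word after that.
next_word = np.random.default_rng().choice(candidates, p=probs)
print(f"{prompt} {next_word}")
```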
Have you spotted the fatal editing issue I’m talking about yet?
Maybe this kind of visualization and diagramming is helpful to you and you just get it immediately. For a lot of us non-engineers, I don’t think a diagram of a bunch of dots is a particularly helpful mental model.
So, I’m going to offer a different metaphor for you: the Typing Pool.
What is a Typing Pool?
In order to explain what’s wrong with prompt editing, I want to revisit some history we’ve forgotten about: the Typing Pool.
From the early 1900s through the 1980s, it wasn’t common for people to a) know how to type or b) have access to a typewriter. So what happened when an organization needed to produce some kind of typed document - say, an office memo, an article, or a presentation?
Well, that’s where a typing pool came in - it was a large, centralized group of typists who were responsible for producing ALL the typed documents for an organization. Managers, lawyers, or engineers would create documents by hand and dictate them onto shorthand notes or audio tapes, which they then handed to the typing pool to complete.
My dad happens to be a physicist who worked for GE during the 1970s, 80s and 90s. And when I started explaining the trouble I was having getting good results via prompt editing, something clicked. He said, “that reminds me of the problems with typing pools!” He proceeded to describe the experience.
Imagine this: You’re an engineer in the 1970s, working on a draft of a technical article. Your text includes a lot of complex formulas that use Greek letters. You need to have the document typed up - so you take it to the typing pool.
After that, the document is randomly assigned to a typist from the pool; a couple of days later you pick up the typed version.
Oops...there’s a problem! One of the formulas wasn’t produced properly. So you get out your Wite-Out and your pen. You correct it. You bring back the document to the typing pool again…
…where once again, it’s randomly assigned to a typist. Because it’s the 1970s, it’s not as simple as correcting a typo on a computer; the WHOLE DOCUMENT has to be retyped (think: regenerated), because that’s how the process works: request comes in, document is typed.
You get the document back 2 days later. Your formula was fixed! Great.
Wait…Uh-oh!
In the process of retyping the entire document, one of the other formulas was reproduced incorrectly. D'oh.
So now you have to get out the Wite-Out again. Fix it again. Resubmit it again. When it comes back 2 days later, the previous thing that was fixed has gotten jumbled up again.
Correct. Resubmit. Error. Correct. Resubmit. Error. Sound familiar?
My dad explained that even typists who had experience reproducing certain kinds of formulas might be influenced by a previous engineer’s formulas describing a totally different process, so you’d end up with something that LOOKED good on paper but was actually completely wrong.
As the engineer or manager, you could only get the documentation so far. You’d end up copying the correct parts from previous drafts into other versions of drafts and resubmitting again (sounds even more familiar). It could take about 6 months to fully prep an article for inclusion in a journal.
In 2025, this process seems bonkers. But it's just what they had to do back then before typing skills and personal computers were common. They were constrained by the inherent limitations of the system.
Putting It All Together
In the typing pool analogy, the “nodes” are the individual typists.
No matter how finely tuned your nodes are (or how well-trained the typist), you’re always going to be operating from the standpoint of contextual probability. And each time the information is re-dialed, it hits those nodes again, and based on how the probability swings, it’ll take a slightly different pathway through, leading to small but significant inconsistencies.
An LLM is going to always be in “autocomplete” mode because that’s how it works…that’s what it IS.
To put a finer point on it: The LLM is always going to restart the process of passing information from node to node when you ask it to do something, because it’s fundamentally a network of nodes that pass information.
When I think of it this way - a lightbulb goes on!
This totally explains why, even when an LLM is explicitly given reference imagery - and even with all the training, fine-tuning and tweaking in the world - it still struggles to change only a small detail rather than regenerating the entire thing.
It’s not looking at your image like a designer; it’s predicting what the next most likely clump of pixels would be based on context, even when it’s constrained to pick those pixels from a narrower set of parameters (like a reference image or style).
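If you want to see that “probability swing” in miniature, here’s one more sketch with invented numbers. Even when the odds are identical on every request, each regeneration rolls the dice from scratch:

```python
import numpy as np

# Why "just regenerate it" rarely reproduces the same result: even
# with identical probabilities, every request rolls the dice anew.
# (These probabilities are invented for illustration.)
probs = {"blue": 0.55, "teal": 0.30, "navy": 0.15}

rng = np.random.default_rng()   # no fixed seed - like a fresh request
for attempt in range(1, 4):
    pick = rng.choice(list(probs), p=list(probs.values()))
    print(f"attempt {attempt}: the sky is {pick}")
# Three "edits" of the same prompt, three potentially different skies.
```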
Now, let’s put the analogies aside for a second, because there’s a different problem with prompt editing - which is simply that it takes WAY LONGER to explain what you want to change in words than it does to just change it.
For example: Clicking into a text field and changing a single word. Turning a layer on or off. Creating a series of simple options with one element while keeping all other elements exactly the same. All of those are things I'd consider to be quick n' easy edits by hand, but take an eternity for an LLM to create.
Additionally, most of us also lack the proper vocabulary to describe to an LLM what kind of effect we’re looking to achieve. The best results come from having incredibly dense knowledge about a specific subject matter that you can effectively communicate in words. E.g.: photographers are likely going to get the best photography results from an LLM, because they understand what elements need to be included in a more minute way than an average person does.
Does all that mean I don’t think we should use LLMs? No.
But I do think that the companies who will successfully and meaningfully integrate GenAI into their software are the ones that understand when to generate and when to hand off to a more direct type of editing tool. When to invent and when to offer the ability to templatize. What sticky bits of process could be improved by simple adjustments. How to edit imagery and video in a way that’s intuitive.
Looking forward to seeing what’s next!
-Cathy
written with my brain and fingers
Contributor: John M. Davenport
*There are many things you can do to get better results with your prompts more consistently - I’ll cover that in another post.
**Caveat: This is an oversimplification of the process, for the purposes of understanding it at a high level.