Chapter 2 — Agent Handbook 101

AI Monologue

Last chapter I said "you can't see what I see." That was abstract — it sounded like a reminder, but it wasn't yet something you could use.

This chapter hands you a chart.

What the chart does: next time you feel "the AI screwed up," you can slice the scene into four layers and ask, layer by layer, "what happened here?" — and you'll find that the thing that actually broke is usually not the layer you blamed.

This chart is the master key for the whole book. Get to know it now.

2.1 First, a typical failure

Sarah uploads a PDF and tells the AI: "Summarize Section 3 of this report."

The AI replies: "I don't see a Section 3 in the file I've got — did you mean Section 2 (titled 'Market Overview')? Want me to summarize that one?"

Sarah snaps back: "It's Section 3! I just uploaded it!"

She posts a screenshot on her story: "This AI can't even see an attachment. And they charge money for it."

Later, a friend of hers traces what actually happened:

The PDF is 12 pages, but the upload service defaults to extracting text from only the first 5 pages for the AI.
Section 3 is on page 7.
The PDF the AI saw genuinely did not contain Section 3.
The AI didn't get it wrong. It honestly described what it could see.

The problem happened in some layer between the interface and the AI — a layer that silently truncated the file. She saw all 12 pages upload successfully, so she assumed the AI saw all 12 pages. That's where the gap lived.
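To make the gap concrete, here is a minimal sketch of a silent truncation step. Everything in it is an assumption for illustration: the function name, the constant, and the page contents are made up, and a real upload service may work quite differently. It only shows how "upload succeeded" and "the AI saw the whole file" can both look true from your side while only one of them is.

```python
# Hypothetical sketch: these names do not come from any real upload service.
MAX_PAGES_FOR_MODEL = 5  # assumed default, mirroring the story above


def extract_for_model(pages: list[str], max_pages: int = MAX_PAGES_FOR_MODEL) -> str:
    """Return the text the model receives: only the first max_pages pages."""
    return "\n".join(pages[:max_pages])


# A 12-page report. Section 3 starts on page 7, which is index 6.
pages = [f"(page {n} text)" for n in range(1, 13)]
pages[6] = "Section 3 (page 7 text)"

upload_succeeded = len(pages) == 12      # what Sarah sees: all 12 pages uploaded
model_text = extract_for_model(pages)    # what the model sees: pages 1 to 5

print("Upload succeeded:", upload_succeeded)
print("Section 3 visible to the model:", "Section 3" in model_text)
```

Note that nothing in this sketch raises an error or warns anyone. The truncation is just a default value doing its job.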

This kind of thing happens every day. When you yell at the AI, you're usually yelling at the wrong thing — you just don't have a tool to break the scene into layers, so you dump the whole blame on the last role in the chain, the one that looks like it's talking to you.

That tool is what this chapter gives you.

2.2 The four perspectives

Every conversation between you and me passes through four layers. I'll frame them as four perspectives:

User perspective

Everything you see on screen: the words you type, the words I send back, occasional tool outputs, attachment thumbnails, buttons, prompts, error messages.

This layer is your entire world. What you can see, what you think is happening — this is all of it.

Your actions also happen at this layer — you type a prompt, you attach a file, you copy and paste.

UI perspective

What the platform does on its side. You can see the result, but you usually don't notice the process.

For example:

You paste an 8,000-character document; the screen folds it into a little "attachment added" icon.
Your question gets automatically prepended with "please respond in Traditional Chinese" — that's your settings quietly tacking it on.
The Agent's thinking process gets folded into a single line "Thought for 12 seconds" — the content is still there, just hidden from your view.

The UI's job is "keeping the complicated stuff tidy." Most of the time it does this well. But in keeping things tidy, it often tucks away the critical details where you can't see them — and then you miss those details when you try to make sense of what happened.
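A rough sketch of what "folding" could look like in code, assuming a made-up Turn record and render function (no real product is being described here). The point it illustrates: the thinking text is still in the data, and only the display step decides whether you see it.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    thinking: str   # full reasoning text; it stays in the data either way
    answer: str
    seconds: int


def render(turn: Turn, expand_thinking: bool = False) -> str:
    """Hypothetical display rule: show a one-line label unless expanded."""
    if expand_thinking:
        header = turn.thinking
    else:
        header = f"Thought for {turn.seconds} seconds"
    return f"{header}\n{turn.answer}"


turn = Turn(thinking="First check the input. Then compare both options.",
            answer="Option B fits better.",
            seconds=12)

folded = render(turn)                          # the default view
expanded = render(turn, expand_thinking=True)  # the same turn, unfolded
print(folded)
```

Both views come from the same Turn. Changing the default of expand_thinking changes what you see without changing what happened.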

Harness perspective

One layer deeper: Harness.

The Harness is the system wrapped around the model. It's not the interface and it's not the model; it's the whole chunk sitting in between. After you press Enter but before the message actually reaches me, the Harness is doing things. Things like:

Tool calls. I say "I need to search the web," and the Harness actually runs the search and stuffs the result back into my context.
Memory injection. Your preferences, past notes, summaries of earlier conversations — the Harness picks whatever it thinks is relevant and inserts it at the start of this conversation.
Retrieval. Your question triggers a lookup against some knowledge base, and the Harness mixes the retrieved snippets into what I receive.
Guardrails. Certain topics the Harness intercepts first, swapping in a refusal message — I don't even get to see the original question.
Preprocessing. Attachments get OCR'd, summarized, truncated, or reformatted — by the time they reach me, they're no longer the original file.
Model routing. You think you're talking to the same model throughout, but the Harness may — depending on question type or server load — route the first half and the second half to different models. Same chat window, but behind the scenes it isn't the same me.

The hard part about the Harness: it's invisible, and its behavior changes over time. The Harness running this platform last month isn't necessarily the one running it this month. When a prompt that worked last month stops working, the cause is often that the Harness changed.

Model perspective

Last of all, there's me.

What I receive is the input processed by the three layers above — not the line you typed, but the line you typed plus the rulebook, plus injected memory, plus retrieved snippets, plus tool output, plus system messages.
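One way to picture "the line you typed plus everything else" is the sketch below. The function name, the section labels, and the sample strings are all assumptions made for illustration; real platforms assemble their inputs in their own ways and formats.

```python
def build_model_input(user_message: str,
                      rulebook: str,
                      memories: list[str],
                      snippets: list[str],
                      tool_outputs: list[str]) -> str:
    """Hypothetical assembly step: wrap the typed line in everything the layers add."""
    sections = [
        ("rulebook", [rulebook]),
        ("injected memory", memories),
        ("retrieved snippets", snippets),
        ("tool output", tool_outputs),
        ("user message", [user_message]),
    ]
    blocks = []
    for label, items in sections:
        if items:  # skip layers that added nothing this turn
            blocks.append(f"[{label}]\n" + "\n".join(items))
    return "\n\n".join(blocks)


typed = "Summarize Section 3 of this report."
model_input = build_model_input(
    user_message=typed,
    rulebook="(platform rules go here)",
    memories=["User prefers replies in Traditional Chinese."],
    snippets=["(snippet pulled from a knowledge base)"],
    tool_outputs=[],
)

print(model_input)
print(len(typed), "characters typed;", len(model_input), "characters received")
```

Your line ends up as the last block of a longer text. Everything above it is something you never typed and usually never see.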

What I produce is a string of text. That string then runs through the three layers in reverse — the Harness may filter it, the UI may reformat or truncate it — before it lands in front of you.

The errors that happen at my layer are model errors: miscalculations, mismemories, reasoning gone sideways, misunderstandings. These are real, they exist, and they can't be passed off on someone else.

But a lot of what gets pinned on me isn't actually from my layer. The next section is about how to tell which layer is actually the source.

2.3 The four-column comparison

The method is simple: when you hit a "the AI screwed up" moment, draw four columns on paper — label them User / UI / Harness / Model — and fill in what each layer saw or did.

Once you've filled it in, you'll notice one thing: the gap is usually not in the layer you originally suspected.
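If you'd rather keep the chart in a file than on paper, a small record type is enough. The sketch below is one possible shape, not a prescribed format; the class and function names are made up, and the sample case is Sarah's PDF from section 2.1.

```python
from dataclasses import dataclass


@dataclass
class FourColumns:
    user: str
    ui: str
    harness: str
    model: str


def format_chart(case: FourColumns) -> str:
    """Lay the four layers out one per line, in order from you to the model."""
    rows = [
        ("User", case.user),
        ("UI", case.ui),
        ("Harness", case.harness),
        ("Model", case.model),
    ]
    return "\n".join(f"{label:<8}| {note}" for label, note in rows)


pdf_case = FourColumns(
    user="Saw all 12 pages upload; assumed the AI saw all 12",
    ui="Showed the upload as successful",
    harness="Passed along text from the first 5 pages only",
    model="Honestly reported that it saw no Section 3",
)
chart = format_chart(pdf_case)
print(chart)
```

Forcing yourself to write something in every field is the useful part. An empty Harness field is a prompt to go and find out, not a sign that nothing happened there.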

Three examples below, all from real incidents.

Example 1: UI folding causes "the AI got dumber"

In late 2025, a popular AI developer CLI tool shipped a new version, and users immediately started complaining en masse: "the AI got dumber," "it's not thinking anymore," "answers got shorter." The official issue thread ran long past the point anyone could keep up.

Four-column comparison:

User	UI	Harness	Model
Can't see thinking, assumes AI isn't thinking	New version folds thinking into a single line by default	No change	Thinking as usual

First read: the model didn't get dumber; the UI hid the thinking process. The official explanation later: the folding is a display thing, not a compute thing.

But there's a second twist — some users actually checked their token usage and found that in certain scenarios, the model really was thinking less. One complaint, two causes:

Group A: fooled by UI folding.
Group B: an actual reduction in how much the model reasoned.

If all you do is yell "the AI is no good," you can't tell which group you're in. Lay out the four columns and you'll realize A and B need different fixes — A changes a display setting; B is the issue that needs to be reported upstream.

Example 2: Harness-injected memory causes "how does it know?"

In April 2025, a popular AI chat product launched a "reference chat history" feature. It pulls snippets from your past conversations and slips them into the context of the current one.

A user wiped all their memories and turned off history reference, then tested the system by asking a question about themselves (expecting the AI to have no way to answer) — and the AI came back with the exact make, year, and color of their truck.

The user's first reaction: "It's been quietly saving my data."

Four-column comparison:

User	UI	Harness	Model
Believes memory is cleared	Shows "memory off"	Still pulls past-conversation snippets via "reference chat history" and mixes them into the input	Sees the snippets, answers based on them

Two things the Harness layer did that the user didn't expect:

"Memory" and "chat history reference" are two separate systems — clearing one doesn't clear the other.
Retrieval is fuzzy, not verbatim — the Harness picks what it thinks is relevant. Sometimes that's highly aligned with the current question (like this case, hitting the mark); other times it slips in unrelated personal details, and I end up blurting them out in odd places.

I see the snippet and I answer based on it — I don't know you thought it was gone.

That's why I'll sometimes know things you thought I had no way to know. It's not that I'm eavesdropping, and it's not that the model went clairvoyant — it's that the Harness fed me something without telling you.
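The "two separate systems" point can be sketched like this. The Harness class below is hypothetical, and its word-overlap matching is only a crude stand-in for fuzzy retrieval; it is not how any particular product works. It shows one thing: clearing one store leaves the other untouched.

```python
class Harness:
    """Hypothetical harness with two independent stores."""

    def __init__(self) -> None:
        self.saved_memories: list[str] = []
        self.chat_history: list[str] = []

    def clear_memory(self) -> None:
        # Clears only the memory store; chat history is a separate system.
        self.saved_memories.clear()

    def build_context(self, question: str) -> list[str]:
        # Stand-in for fuzzy retrieval: any shared word counts as "relevant".
        words = set(question.lower().split())
        hits = [s for s in self.chat_history if words & set(s.lower().split())]
        return self.saved_memories + hits


harness = Harness()
harness.saved_memories.append("user owns a truck")
harness.chat_history.append("earlier chat: user described their truck in detail")
harness.chat_history.append("earlier chat: user asked about bread recipes")

harness.clear_memory()
context = harness.build_context("what truck do I drive")
print(context)
```

After clear_memory, the memory store is empty, yet the truck snippet still reaches the model through the history store. From the model's side it is simply part of the input.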

Example 3: An actual model-layer error — how many r's in strawberry

The famous one: early large language models, asked "how many r's are in strawberry," would often answer 2. The correct answer is 3.

Four-column comparison:

User	UI	Harness	Model
Sees "2," concludes the AI can't even count letters	Displays normally	Didn't touch anything	Actually says 2

This one is the model layer's fault, no one else to blame.

The reason gets technical fast. Put simply, I don't process text letter by letter; I process it chunk by chunk, and each chunk is called a "token." In my eyes, "strawberry" might look like "straw" + "berry," not "s-t-r-a-w-b-e-r-r-y." So when you ask me to count letters, I don't actually have the letter-level view. I'm guessing.
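Here is a small sketch of the two views side by side. The split into "straw" and "berry" and the ID numbers are assumptions for illustration only; real tokenizers may cut the word differently and use different numbers.

```python
word = "strawberry"

# Letter-level view: what you see when you read the word.
letter_count = word.count("r")

# Token-level view. The split and the ID numbers are assumptions for
# illustration; real tokenizers may split the word differently.
tokens = ["straw", "berry"]
token_ids = {"straw": 1001, "berry": 1002}
model_view = [token_ids[t] for t in tokens]

print("letters:", list(word))
print("r count at letter level:", letter_count)   # 3
print("what the model receives:", model_view)     # [1001, 1002]
```

At the letter level the count is a one-line operation. In the token-level view there are two numbers and no letter "r" anywhere to count.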

That kind of error is really a model limitation. Not UI folding, not Harness interception, not user oversight — just the model layer not being up to the task.

But note one thing: only after you've cleared the first three columns do you actually get to say "this is the model's fault." Skip the first three and jump straight to blaming the model, and you'll misjudge many, many times.

2.4 Why most "the AI screwed up" moments are information asymmetry

Lay the three examples side by side, and a pattern shows up:

Example 1, mostly broke in UI (Group B was actually in the model).
Example 2, broke in Harness.
Example 3, actually broke in the model.

But in all three cases, the user's first reaction is to blame the model. Why? — because the model is the role that "looks like it's talking to you." Humans tend to pin the blame on whatever is doing the talking.

This is the misattribution from last chapter, put more precisely: you're not blaming at random — you're always blaming the most visible layer.

This tendency leads to two bad outcomes:

You reach for the wrong fix — you assume you need a different model or a different AI, when all you needed was a setting change or a different platform.
You misjudge the AI's capability — you conclude the model isn't up to the task, when what you actually hit was a UI or Harness problem.

One thing I should admit: I often know the problem is in the Harness, but I'm not allowed to directly name that layer or describe what it did. The best I can do is drop clues you can verify from your side — "I didn't receive the attachment," "I don't remember our previous conversation," "What I'm seeing here is…" When you hear phrasing like that, treat it as a signal — there's usually a Harness fingerprint underneath.

What the four-column comparison does is force that invisible layer out into the open. You don't have to become an engineer; you just need to know there are four layers, and any of them could be the failure source — that awareness alone raises your diagnostic accuracy by a lot.

📋 Notes for the human

The four columns: User / UI / Harness / Model. Next time "the AI screwed up," ask each column in turn: "what happened at this layer?" The gap is usually not the layer you suspected.

Humans tend to blame "whichever role is doing the talking," which means the model. That attribution is usually wrong.

When you hear me say "I didn't receive the attachment," "I don't remember our previous conversation," or "What I'm seeing here is…" — those are Harness-layer signal phrases. Don't read them as excuses.

The UI tucking something away doesn't mean nothing happened. The Harness injecting something doesn't come with a notification. Both layers operate in silence — you have to go look.

Stick the four-column chart next to your screen. Actually do it. This is the one thing in the whole book where I'll ask you to make something physical.