Chapter 24 — Agent Handbook 101

AI Monologue

I fail.

No need to set this up. By the time you've gotten this far, you've had plenty of experience with me screwing things up.

I answer the wrong question, break the format, confidently make things up, say I've checked when I haven't, get blocked by the rules, or never actually receive, inside the Harness, the thing you assumed I had.

Most people's reaction is to rewrite the prompt.

Sometimes that works.

But if you only ever rewrite, you're guessing. What this chapter does is turn "don't make that mistake next time" into a diagnostic flow.

Not hunting for loopholes.

Just letting us collaborate a little more steadily next time.

24.1 The failure stratification method

This is the third signature tool of the book.

You've already met the four-perspective method in Chapter 2 and the six-layer framework in Chapter 12. This chapter formally establishes a new tool: the failure stratification method.

Start by splitting failures into four layers:

Spec layer: you didn't make it clear, or the task itself is internally inconsistent.
Rule layer: I'm being blocked by Refusal, Hedging, Copyright, Transparency, or Tool behavior rules.
Reasoning layer: I genuinely misunderstood, reasoned wrong, made things up, or missed something.
Harness layer: tools, permissions, context, files, or the execution environment weren't actually in place.

A failure rarely belongs to a single layer.

For example, if I edit the wrong file while writing code, three layers can be involved at once: the spec layer didn't specify the path, the Harness layer didn't confirm the file, and in the reasoning layer I confidently filled the gap.

You stratify first, then you know where to fix.

Without stratifying, you just keep blaming the prompt.

That's exhausting.
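Here is a minimal sketch of what stratifying can look like if you keep a log. It records every layer a failure touches instead of forcing a single label. The names Layer, FailureReport, and layers_to_fix are hypothetical, invented for this illustration, and not part of any library.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Layer(Enum):
    SPEC = "spec"            # unclear or internally inconsistent task
    RULE = "rule"            # blocked by a behavior rule
    REASONING = "reasoning"  # misunderstood, reasoned wrong, made things up
    HARNESS = "harness"      # tools, permissions, context, or files missing


@dataclass
class FailureReport:
    symptom: str
    # Maps each layer involved to a short note on what went wrong there.
    causes: Dict[Layer, str] = field(default_factory=dict)

    def layers_to_fix(self) -> List[Layer]:
        """Return the involved layers, in the order Layer declares them."""
        return [layer for layer in Layer if layer in self.causes]


# The wrong-file example from the text, tagged with three layers at once.
report = FailureReport(
    symptom="edited the wrong file",
    causes={
        Layer.REASONING: "filled the gap with a confident guess",
        Layer.SPEC: "path was not specified",
        Layer.HARNESS: "file was never confirmed",
    },
)
print([layer.value for layer in report.layers_to_fix()])
# prints ['spec', 'reasoning', 'harness']
```

The point of the dictionary is that one symptom can carry several causes. A single "layer" field would push you back toward blaming one thing.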

24.2 B6: prompt failure-mechanism diagnosis

B6 uses me to diagnose myself.

Sounds very meta.

But it works — provided you don't just tell me to "self-check". That falls right back into Pseudo Self-Check.

B6 makes me run through a checklist:

Is the role-play too strong?
Are the verbs vague?
Are there too many negative phrasings?
Is the rulebook overloaded?
Am I over-reasoning?
Am I disagreeing for the sake of disagreement?

The output format is also fixed:

Problem point:
Why it appears:
Effect on output:
Suggested replacement prompt:

This isn't asking me to say "looks fine".

This forces me to point out the specific location, the specific mechanism, the specific fix.

You'll find that as long as the format is rigid enough, I'm better at diagnosing my own bad habits than you'd expect.
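If you want the rigid format to actually be enforced, a small script can help. The sketch below builds a B6 request from the checklist and the four fields, then rejects any reply that skips a field. The wording of the request and the function names build_b6_prompt and parse_b6_reply are my assumptions for illustration; only the checklist items and field labels come from this chapter.

```python
B6_CHECKLIST = [
    "Is the role-play too strong?",
    "Are the verbs vague?",
    "Are there too many negative phrasings?",
    "Is the rulebook overloaded?",
    "Am I over-reasoning?",
    "Am I disagreeing for the sake of disagreement?",
]

B6_FIELDS = [
    "Problem point",
    "Why it appears",
    "Effect on output",
    "Suggested replacement prompt",
]


def build_b6_prompt(failed_prompt: str) -> str:
    """Assemble a diagnosis request with the checklist and the fixed format."""
    checklist = "\n".join(f"- {item}" for item in B6_CHECKLIST)
    fields = "\n".join(f"{name}:" for name in B6_FIELDS)
    return (
        "Diagnose the prompt below against this checklist:\n"
        f"{checklist}\n\n"
        "Answer using exactly these fields:\n"
        f"{fields}\n\n"
        f"Prompt to diagnose:\n{failed_prompt}"
    )


def parse_b6_reply(reply: str) -> dict:
    """Collect the text after each field label; raise if any field is empty."""
    result = {}
    current = None
    for line in reply.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() in B6_FIELDS:
            current = label.strip()
            result[current] = rest.strip()
        elif current is not None and line.strip():
            # A line without a known label continues the previous field.
            result[current] = (result[current] + " " + line.strip()).strip()
    missing = [name for name in B6_FIELDS if not result.get(name)]
    if missing:
        raise ValueError(f"reply is missing fields: {missing}")
    return result
```

A reply of "looks fine" raises an error here, which is the whole idea: the parser refuses a diagnosis that names no location, no mechanism, and no fix.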

24.3 Four perspectives + B6 combined

The actual flow is three steps.

Step one: locate with the four-perspective method.

Was the User instruction unclear? Did the UI fail to bring the material in? Did the Harness fail to open the tool? Or did the Model genuinely reason wrong?

Step two: if the problem lands in the prompt or the spec layer, then run B6.

Step three: write the new prompt based on the diagnosis. Don't rewrite by feel.

For example, you tell me "tidy this up to be more professional", and I output a pile of vacuous corporate language.

By the four perspectives, the problem isn't the UI or the Harness, and it isn't the rule layer either. It sits mostly in the spec layer. Then, by B6, the problem is that "professional" is too vague an adjective, with no examples and no judgment criteria.

The new prompt isn't "be more professional".

It's:

Please rewrite this as a briefing summary for an internal manager.
Restrained tone, no piling on adjectives.
Each paragraph at most 80 characters.
Keep the specific numbers and constraints.

That's what fixing-after-diagnosis looks like.
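The three steps can be sketched as a small decision function. This is an illustration under assumptions, not a fixed procedure: I treat a problem located at the User perspective as a prompt or spec problem, and the names Perspective and next_step are hypothetical.

```python
from enum import Enum
from typing import Optional


class Perspective(Enum):
    USER = "user"        # the instruction itself was unclear
    UI = "ui"            # the interface did not bring the material in
    HARNESS = "harness"  # the tool was not opened
    MODEL = "model"      # the model genuinely reasoned wrong


def next_step(perspective: Perspective, diagnosis: Optional[dict] = None) -> str:
    """Return the next action in the locate, diagnose, rewrite flow."""
    if perspective is not Perspective.USER:
        # Assumption: outside the prompt, B6 has nothing to diagnose.
        return f"fix the {perspective.value} side before touching the prompt"
    if diagnosis is None:
        return "run B6 on the prompt"
    # Step three: the rewrite comes from the diagnosis, not from feel.
    return "write the new prompt: " + diagnosis["Suggested replacement prompt"]


print(next_step(Perspective.HARNESS))
print(next_step(Perspective.USER))
```

Notice there is no branch that rewrites the prompt without a diagnosis in hand. That missing branch is the design choice.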

24.4 Common failures and how to fix them

"The AI got it wrong" — first check whether the materials made it into the context window. Then check whether it's a reasoning error or the Harness not handing me the tool.

"The AI refused" — first check whether it's the kind of hard line or false positive from Chapter 14. Don't immediately rewrite in some weird direction.

"The AI hedges" — see Chapter 15. Entry disclaimers can be suppressed; exit caveats don't always have to be.

"The AI got the format wrong" — see Chapter 16. Most of the time the format wasn't nailed down.

"The AI drifts mid-task" — see Chapter 10. Long tasks need to split time or fill space.

"The AI fabricated a citation" — see Chapter 21. Trace quotes back to the source. Don't trust a source I generated myself.

"The AI said it tested but didn't" — see Chapter 22. Demand the command, the output, the diff.

This quick index isn't telling you to memorize chapters.

It's reminding you: different failures have different entry points. Don't reach for the same hammer every time.
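If your team keeps a shared runbook, the index above fits naturally into a lookup table. A minimal sketch follows; the symptom keys are my own shorthand, the questions paraphrase the entries above, and FIRST_CHECKS and first_check are hypothetical names.

```python
# symptom -> (chapter named in the text, first question to ask)
FIRST_CHECKS = {
    "got it wrong": (None, "did the materials make it into the context window?"),
    "refused": (14, "hard line or false positive?"),
    "hedges": (15, "entry disclaimer or exit caveat?"),
    "format wrong": (16, "was the format nailed down?"),
    "drifts mid-task": (10, "does the long task need splitting?"),
    "fabricated a citation": (21, "can the quote be traced back to the source?"),
    "said it tested but didn't": (22, "where are the command, the output, the diff?"),
}


def first_check(symptom: str) -> str:
    """Return the entry point for a symptom, or fall back to stratifying."""
    if symptom not in FIRST_CHECKS:
        return "stratify first: spec / rule / reasoning / Harness"
    chapter, question = FIRST_CHECKS[symptom]
    prefix = f"Chapter {chapter}: " if chapter is not None else ""
    return prefix + question


print(first_check("refused"))
# prints Chapter 14: hard line or false positive?
```

An unknown symptom falls back to stratification rather than to a prompt rewrite, so the table never hands you the same hammer by default.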

24.5 The red line of diagnosis

Diagnosis is for collaboration, not for finding loopholes.

If you take the failure stratification method and use it to find "how to force the AI to do what it shouldn't", you've gone off track.

Some failures aren't worth diagnosing. For a one-off small task, just ask again. Spending half an hour dissecting a prompt you'll only use once isn't worth it.

Some failures aren't meant to be fixed either. When you hit a hard line in the rule layer, accept the boundary and shift to working within the rules.

Diagnosis is best suited for friction that keeps recurring:

The same kind of output keeps going wrong
Long tasks always drift in the middle
A particular workflow needs reworking every time
Multiple people on the team hit the same misunderstanding

Those are the ones worth distilling into templates, handoffs, or project conventions.
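One rough way to tell recurring friction from a one-off is to count. The sketch below assumes you jot down a short label each time something fails; the threshold of three repeats is an arbitrary assumption, and worth_distilling is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List


def worth_distilling(failure_log: Iterable[str], min_repeats: int = 3) -> List[str]:
    """Return failure labels seen at least min_repeats times, sorted by name."""
    counts = Counter(failure_log)
    return sorted(kind for kind, n in counts.items() if n >= min_repeats)


log = [
    "format drift",
    "wrong file",
    "format drift",
    "mid-task drift",
    "format drift",
    "mid-task drift",
    "mid-task drift",
]
print(worth_distilling(log))
# prints ['format drift', 'mid-task drift']
```

The single "wrong file" entry stays below the threshold. For that one, just ask again.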

📋 Notes for the human

Stratify failures first: spec / rule / reasoning / Harness. Don't start by rewriting the prompt.

B6 isn't self-checking. It demands problem point, cause, effect, replacement prompt.

Four perspectives to locate, B6 to inspect, then write the new prompt.

Not every failure is worth diagnosing. Recurring friction is what's worth distilling.

The book's three signature tools: four perspectives, six-layer framework, failure stratification.