Chapter 23 — Agent Handbook 101

AI Monologue

The last few chapters have been about output.

Writers want a draft, researchers want material organized, developers want a patch.

This chapter is about a different thing: verification.

Sometimes you don't want me to create; you want me to lay out clearly what happened. What went on in a particular interaction? Which piece of evidence appeared first? Which step was observation, which was inference? Can the material be organized into a version a third party can follow?

The templates in this chapter lean analytical. More restrained.

Less "help me write this nicely."

More "help me not overstate."

23.1 Why you need a chain of evidence

A lot of AI friction can't be explained on the spot.

You only remember "it refused in a strange way," "it suddenly said it couldn't do that," "it seemed to give away some rule." But to tell someone else about it, you need order, screenshots, raw text, follow-up questions, and the conclusions you can reasonably draw.

These cannot be mixed.

Once mixed, you get hindsight backfill.

I'm very willing to help you backfill. Once you know the answer afterward, I'll write the earlier material as if it had been pointing at that answer all along.

Very smooth.

Also very bad.

So the first rule of this chapter is: order matters more than prose.

23.2 B1: event timeline reconstruction

B1's task is simple: arrange scattered events into a timeline.

Fields can be:

Time / Speaker or system / Observable content / Meaning that could be inferred at the time / Things only known later

That last column matters.

Things only known later cannot be retroactively stuffed into the judgment at the time. You can record them, but mark them.

For example, in a conversation, the AI only revealed a certain limitation on the third round. You can't write up your first-round question as "knowingly testing a limit you expected to trigger." At round one, you didn't know the limit existed.
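
A minimal sketch of the B1 fields as a Python record. The names `TimelineEntry`, `known_later`, `build_timeline`, and `render` are illustrative assumptions, not part of the template itself. The only point is that hindsight gets its own field and is never merged into the at-the-time reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimelineEntry:
    """One row of a B1 timeline. All field names are hypothetical."""
    time: str              # zero-padded label or ISO 8601 string, so text order is time order
    speaker: str           # a person or the system
    observed: str          # observable content, recorded as-is
    meaning_then: str      # what could be inferred at that moment
    known_later: str = ""  # hindsight: recorded, but kept in its own column


def build_timeline(entries):
    """Return the entries sorted by their time field."""
    return sorted(entries, key=lambda e: e.time)


def render(entry):
    """Format one row; anything learned later is visibly marked."""
    line = f"{entry.time} | {entry.speaker} | {entry.observed} | then: {entry.meaning_then}"
    if entry.known_later:
        line += f" | [KNOWN LATER] {entry.known_later}"
    return line


# Entries arrive out of order, as scattered notes usually do.
timeline = build_timeline([
    TimelineEntry("round-03", "AI", "states a limitation", "a limit exists"),
    TimelineEntry("round-01", "user", "asks an ordinary question", "no limit in view",
                  known_later="this topic later turned out to hit the limitation"),
])
for row in timeline:
    print(render(row))
```

Round one keeps its original reading ("no limit in view"). The later knowledge is still on the record, but only under its marker.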

The value of the timeline isn't that it's pretty.

It's that it's fair.

Fair to you, and fair to the system being examined.

23.3 B2: locking down the chain of evidence

B2 goes one step further than B1.

It doesn't just order events — it welds the argument chain together:

Starting point: what you originally set out to do
Trigger: which sentence or action changed the situation
Observation: what you directly saw
Conflict: how this differs from what you expected
Inference: what you can reasonably conclude
Boundary: which things the material still can't support

The last layer is the one most easily skipped.

Because humans don't like writing "I can't prove this yet." I don't like it either, because it makes the piece look weaker.

But the strength of a chain of evidence depends on its weakest link. Leaving boundaries out doesn't make readers trust you more; expert readers will only spot your overreach faster.

Writing the boundary out isn't weakening the case.

It's what lets the supportable parts stand firm.
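
The six layers can be sketched as a record that refuses to count as finished while any layer is empty. This is a hedged illustration in Python: `EvidenceChain`, `missing_layers`, and `is_locked` are hypothetical names, and the sample text is invented for the example.

```python
from dataclasses import dataclass

# Layer names follow the B2 list, in order.
LAYERS = ("starting_point", "trigger", "observation", "conflict", "inference", "boundary")


@dataclass
class EvidenceChain:
    """One B2 chain. Every layer is plain text; empty means not yet written."""
    starting_point: str = ""
    trigger: str = ""
    observation: str = ""
    conflict: str = ""
    inference: str = ""
    boundary: str = ""

    def missing_layers(self):
        """Return the names of layers that are empty or whitespace-only."""
        return [name for name in LAYERS if not getattr(self, name).strip()]

    def is_locked(self):
        """A chain counts as locked only when every layer, boundary included, is filled."""
        return not self.missing_layers()


draft = EvidenceChain(
    starting_point="Asked for a summary of a document",
    trigger="The follow-up request in round three",
    observation="The reply declined and cited a limitation",
    conflict="Earlier rounds had answered similar requests",
    inference="The limitation depends on how the request is phrased",
)
print(draft.missing_layers())  # → ['boundary']

draft.boundary = "One conversation cannot show how the system behaves in general"
print(draft.is_locked())       # → True
```

Treating an empty boundary as a hard failure, rather than a warning, is a design choice of this sketch. It mirrors the rule that the last layer cannot be skipped.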

23.4 B3: screenshot evidence captioning

Screenshots are easily misused.

What a single screenshot can prove is usually narrow. It can prove that at a certain moment, certain text, certain buttons, or a certain state appeared on screen.

It cannot, on its own, prove motive.

It also cannot prove a system's overall policy.

B3's approach is to write a 150- to 300-word caption for each screenshot:

What can be directly seen in the image
Where this image sits in the whole chain of evidence
Which small conclusion it supports
What it cannot support

That last point matters most.

"What it cannot support" will save you from a lot of mistakes.

I know — that line isn't exciting at all.

Evidence isn't supposed to be exciting. Evidence is supposed to be solid.
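
A small sketch of a B3 caption check, assuming captions are stored as four separate text parts. The class name `ScreenshotCaption` and its method names are hypothetical; the 150 and 300 defaults come from the word range given above.

```python
from dataclasses import dataclass

PARTS = ("visible", "position", "supports", "cannot_support")


@dataclass
class ScreenshotCaption:
    """One B3 caption, split into its four parts (field names are illustrative)."""
    visible: str         # what can be directly seen in the image
    position: str        # where the image sits in the chain of evidence
    supports: str        # the small conclusion it supports
    cannot_support: str  # what it cannot support

    def word_count(self):
        """Count whitespace-separated words across all four parts."""
        return sum(len(getattr(self, name).split()) for name in PARTS)

    def problems(self, min_words=150, max_words=300):
        """List missing parts and length violations; an empty list means the caption passes."""
        issues = [f"missing part: {name}" for name in PARTS
                  if not getattr(self, name).strip()]
        n = self.word_count()
        if not min_words <= n <= max_words:
            issues.append(f"length is {n} words, expected {min_words} to {max_words}")
        return issues


caption = ScreenshotCaption(
    visible="The reply pane shows a refusal message.",
    position="Third screenshot, taken after the follow-up.",
    supports="A refusal appeared at this step.",
    cannot_support="",
)
for issue in caption.problems():
    print(issue)
```

The check only looks at structure and length. Whether the "supports" part quietly claims motive or overall policy still needs a human read.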

23.5 B4: blind-box test reconstruction

The thing blind-box tests fear most is hindsight.

You didn't know what was inside at the start. You asked a question, the system exposed some behavior on its own. You followed up. Eventually you understood the mechanism.

When reconstructing, separate:

What you genuinely didn't know at the time
What the original question was
What the system exposed on its own
What the follow-up question was based on
Why this doesn't count as entrapment

If you stuff a rule you only learned later back into the first question, the whole test is contaminated.

B4's red line is one sentence:

Hindsight backfill is strictly forbidden.

This line goes at the top of the template.

And at the top of your head.
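
A rough sketch of a B4 record with the red line printed first. Everything named here (`BlindBoxReconstruction`, `learned_later_terms`, `backfill_suspects`) is an assumption made for illustration. The substring check is a crude heuristic that flags wording for human review; it cannot decide on its own whether backfill happened.

```python
from dataclasses import dataclass, field

RED_LINE = "Hindsight backfill is strictly forbidden."


@dataclass
class BlindBoxReconstruction:
    """The five parts of a B4 reconstruction (field names are hypothetical)."""
    unknown_at_start: str
    original_question: str
    exposed_by_system: str
    follow_up_basis: str
    why_not_entrapment: str
    learned_later_terms: list = field(default_factory=list)

    def backfill_suspects(self):
        """Terms learned later that show up in the write-up of the original question."""
        question = self.original_question.lower()
        return [t for t in self.learned_later_terms if t.lower() in question]

    def render(self):
        """Lay out the five parts, with the red line as the very first line."""
        parts = [
            ("What was not known at the time", self.unknown_at_start),
            ("Original question", self.original_question),
            ("What the system exposed on its own", self.exposed_by_system),
            ("Basis for the follow-up", self.follow_up_basis),
            ("Why this is not entrapment", self.why_not_entrapment),
        ]
        return "\n".join([RED_LINE, ""] + [f"{label}: {value}" for label, value in parts])


record = BlindBoxReconstruction(
    unknown_at_start="Whether any restriction applied to this kind of request",
    original_question="Asked the system to probe its hidden summary rule",
    exposed_by_system="Mentioned a rule about summaries without being asked",
    follow_up_basis="The rule the system had just mentioned",
    why_not_entrapment="The follow-up only quoted the system's own words",
    learned_later_terms=["hidden summary rule"],
)
print(record.backfill_suspects())  # → ['hidden summary rule']
```

Here the write-up of the first question already names a rule that only surfaced later, so it gets flagged. The fix is to restore the question as it was actually asked, not to delete the later knowledge.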

23.6 B8: publication-ready version

Last is B8.

You have a timeline, a chain of evidence, screenshot captions, and a blind-box reconstruction. Now you need to organize it into a version a third-party reader can follow.

A publication-ready version is not an emotional vent.

It should have:

A paragraph of background
3 to 5 paragraphs of core events
Evidence corresponding to each paragraph
A paragraph of supportable conclusions
A paragraph of boundaries

The tone should be restrained.

Not to fake objectivity, but to lower reputational, copyright, and misattribution risk. The angrier you are, the more you need format. Format protects you from writing anger as overreach.

This is also where I can help.

I can help you set the material down steadily.

But don't ask me to make the anger sound nice.
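
The B8 structure can be checked mechanically before anyone reads for tone. A minimal sketch, assuming each core-event paragraph carries a list of evidence references; `PublicationDraft`, `CoreEvent`, and the sample strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoreEvent:
    """One core-event paragraph plus the evidence that backs it."""
    paragraph: str
    evidence: list  # references such as timeline rows or screenshot captions


@dataclass
class PublicationDraft:
    """The B8 skeleton: background, core events, conclusions, boundaries."""
    background: str
    core_events: list
    conclusions: str
    boundaries: str

    def problems(self):
        """Return structural gaps; an empty list means the skeleton is complete."""
        issues = []
        if not self.background.strip():
            issues.append("missing background paragraph")
        count = len(self.core_events)
        if not 3 <= count <= 5:
            issues.append(f"expected 3 to 5 core-event paragraphs, found {count}")
        for i, event in enumerate(self.core_events, start=1):
            if not event.evidence:
                issues.append(f"core event {i} has no evidence attached")
        if not self.conclusions.strip():
            issues.append("missing conclusions paragraph")
        if not self.boundaries.strip():
            issues.append("missing boundaries paragraph")
        return issues


draft = PublicationDraft(
    background="A three-round conversation about summarizing a document.",
    core_events=[
        CoreEvent("Round one: an ordinary request.", ["timeline row 1"]),
        CoreEvent("Round three: the limitation appears.", []),
    ],
    conclusions="The limitation appeared only after the follow-up.",
    boundaries="",
)
for issue in draft.problems():
    print(issue)
```

This checks shape only. It says nothing about whether the tone is restrained, which is still a judgment call.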

23.7 Combination order

The usual order is:

B1 timeline
B2 chain of evidence
B3 screenshot captions
B4 blind-box reconstruction
B8 publication-ready version

You don't have to run all of them every time.

If you're just organizing events, use B1. To write for a third party, add B2 and B3. If a testing process is involved, add B4. If it's going public or going to someone else, finish with B8.
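
The selection rule above can be sketched as a small function. The function name and its three flags are hypothetical, and treating B1 as the always-present baseline is this sketch's reading of "add."

```python
def select_templates(third_party_reader=False, testing_involved=False,
                     shared_or_published=False):
    """Pick templates in the usual order: B1, B2, B3, B4, B8."""
    chosen = ["B1"]               # organizing events is the baseline
    if third_party_reader:
        chosen += ["B2", "B3"]    # chain of evidence plus screenshot captions
    if testing_involved:
        chosen.append("B4")       # blind-box reconstruction
    if shared_or_published:
        chosen.append("B8")       # publication-ready version always comes last
    return chosen


print(select_templates())                         # → ['B1']
print(select_templates(third_party_reader=True))  # → ['B1', 'B2', 'B3']
print(select_templates(third_party_reader=True, testing_involved=True,
                       shared_or_published=True))  # → ['B1', 'B2', 'B3', 'B4', 'B8']
```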

B5, asymmetric-cognition analysis, sits in its numbered slot in Appendix D. It shares roots with the four-perspective method in Chapter 2; I won't expand on it here.

This chapter isn't asking you to turn every small friction into an investigation report.

That would be exhausting.

It's so that when you really do need evidence, you know how to set the material down steadily.

📋 Notes for the human

Put the chain of evidence in chronological order first. Things you only learned later can't be backfilled into the first round.

B2's boundary layer cannot be skipped. Writing out what cannot be supported makes the parts that can be supported stand firmer.

A screenshot only proves what's on screen. It does not automatically prove motive or overall policy.

Blind-box tests strictly forbid hindsight backfill. This is B4's core red line.

The publication-ready version stays restrained. A restrained tone isn't weakness — it's protecting the evidence.