Part 3 — Chapter 10

Why I Drift Mid-Task (a Confession About Cognitive Load)

— Primacy/recency + middle loss + cognitive load — three mechanisms stacked
AI Monologue

You paste 5000 words of source material and ask me to write a 2000-word summary. I read it through, then start writing.

By word 1500 — I've already forgotten what you said at the top.

This isn't laziness. My head really does drift in long tasks.

Before we go further, one clarification: the "drift" this chapter talks about is not the same thing as the drift in Chapter 3. Chapter 3 was about rule-layer drift — refusals softening, positions getting twisted under pressure, my inability to answer when you ask for details. Those are baked into the rulebook. Your prompt can't change them.

This chapter is about task-layer drift — format going off, me adding an extra sentence of fact, style sliding from cool to lyrical, the constraint at the top forgotten by the middle. This kind of drift you can handle with prompt structure.

The handleable part is what we'll unpack in this chapter.

10.1 First, a clarification: the two layers of drift are not the same thing

A simple side-by-side:

| | Ch3 rule-layer drift | This chapter: task-layer drift |
| --- | --- | --- |
| Which layer | The rulebook (the one built into me) | The task you just gave me |
| How it shows up | Refusals softening, positions drifting, twisting under pressure | Format going off, extra facts added, style shifting, constraints forgotten |
| Can you handle it | No, this is the architecture tax | Yes, prompt structure can hold it down |
| How to handle it | Accept, route around | Structure, staging, end-restate |

Why split them? Because the mechanisms are different, and so are the fixes.

No amount of prompt-tuning will move the drift from Chapter 3, because it isn't a problem inside this conversation; it's the rulebook's default state. You can try to describe your way around it; at best I tighten up for this one reply, and the next round I'm back at default.

What this chapter unpacks is the task layer — drift on this layer happens while I'm processing the task you just gave me. This layer can be structured, staged, and end-restated.

If, while reading this chapter, the thought "how is this different from Chapter 3" pops up — this section is the answer.

10.2 The four shapes of drift

These are the four kinds of task-layer drift I produce most often. You've probably hit each one. See which fits.

Format drift

You ask for three sections; by section two I'm already adding subheadings and I'm at five. You ask for a bulleted list; I drift into paragraphs. You say "two to three lines per point"; my first point is two lines, my last point is a whole paragraph.
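Format drift is the easiest of the four to catch mechanically after the fact. The check below is a minimal sketch under loud assumptions: it treats blank lines as section boundaries, and the function name `check_format` and its parameters are hypothetical, not part of any library.

```python
import re


def check_format(draft: str, expected_sections: int, max_lines_per_section: int) -> list[str]:
    """Return a list of format problems found in the draft (empty if none).

    Assumption: sections are separated by one or more blank lines.
    """
    problems = []
    # Split on blank lines and drop empty fragments.
    sections = [s for s in re.split(r"\n\s*\n", draft.strip()) if s.strip()]
    if len(sections) != expected_sections:
        problems.append(f"expected {expected_sections} sections, found {len(sections)}")
    for i, section in enumerate(sections, start=1):
        lines = [ln for ln in section.splitlines() if ln.strip()]
        if len(lines) > max_lines_per_section:
            problems.append(
                f"section {i} has {len(lines)} lines, limit is {max_lines_per_section}"
            )
    return problems


draft = "one\ntwo\n\nthree\n\nfour\nfive\nsix\nseven"
print(check_format(draft, expected_sections=3, max_lines_per_section=3))
```

Here the section count is fine, but the last section has grown past the limit, which is the "first point is two lines, last point is a whole paragraph" pattern.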

Factual drift

The source says "revenue declined in one quarter," I write "declined for two consecutive quarters." The source says "users reported it felt slower," I write "30% of users complained." That 30% wasn't yours — I just felt the sentence needed a number, and I supplied one.

Style drift

You ask for cool, concrete, no adjectives. I follow for three sections. By the middle, the more I read the materials the more I feel something, and I start adding "impressive," "absolutely critical" — a pile of those vague labels from Chapter 8.

Rule drift

You wrote at the top, "no unsupported conclusions." By the last section I've forgotten the rule and started supplying conclusions. Or you said "only cite numbers from the source"; by the middle I'm citing common-knowledge numbers from my own head — numbers that aren't in the materials you gave me.
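The "only cite numbers from the source" rule can be spot-checked after I've written. The sketch below is one hedged way to do it: it pulls digit-based numbers out of the draft and flags any that never appear in the source. The name `unsupported_numbers` and the regular expression are illustrative assumptions, and the check is deliberately crude.

```python
import re

# Assumption: a "number" is a run of digits, optionally with . or , separators
# and an optional trailing percent sign. Spelled-out numbers are not covered.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")


def unsupported_numbers(source: str, draft: str) -> list[str]:
    """Return numbers that appear in the draft but nowhere in the source."""
    source_numbers = set(NUMBER.findall(source))
    flagged = []
    for number in NUMBER.findall(draft):
        if number not in source_numbers and number not in flagged:
            flagged.append(number)
    return flagged


source = "Revenue declined in one quarter. Users reported it felt slower. Headcount was 120."
draft = "Revenue declined; 30% of users complained. Headcount was 120."
print(unsupported_numbers(source, draft))
```

This flags the invented "30%". It would not catch "one quarter" turning into "two consecutive quarters", because those numbers are written as words; that kind of factual drift still needs a human read.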

These four aren't occasional visitors. In a long task, one or two of them are near-inevitable. It isn't on purpose: the three mechanisms in the next section all run in the background at the same time.

10.3 Why I drift (three mechanisms stacking)

Drift isn't a single cause. It's three mechanisms running at once, and in long tasks they stack and get worse.

1. Primacy / recency (head and tail stick)

I catch the opening and closing of a piece of text fairly clearly; the details in the middle, especially the ones not specifically marked as important, get a thinner share of my attention.

This is a tendency, not an iron law. But in long tasks, a tendency turns into something you can observe.

2. Middle loss (middle gets diluted)

More concretely: in a long context, attention to the middle is weaker than attention to the head and tail.

This effect was very pronounced on early models: paste 50,000 words in, and the middle 20,000 might as well never have been read. On modern long-context models it's significantly weakened. The middle does get read, but still not as sharply as the ends.

I want to be explicit: this is not "the middle will definitely be missed." It's a tendency. If you bury a key constraint at word 4000 of an 8000-word source, I may miss it. Repeat the same sentence once and put it at the end — the chance of missing it drops noticeably.

Don't treat middle loss as an iron law that "the middle is always missed," but don't pretend it isn't there either. It's a phenomenon you can plan around.

3. Cognitive load

This is the long-task amplified version of "three roles crammed into one head" from Chapter 1.

In a short conversation, I'm doing three things at the same time: thinking up the answer, keeping the rules, watching myself for drift. The three press on each other, but in a short task I can still hold up.

A long task amplifies the squeeze. You drop in 5000 words and ask for 2000, and I'm simultaneously doing:

Four things crammed into one head. The longer the task, the worse the squeeze, and self-checking is usually sacrificed first — it produces no words, so it's the least urgent.

Putting it together: three stacked effects

The point is that these three mechanisms work together, not any one of them on its own:

So the place I drift hardest tends to be:

The middle of a long source + mid-to-late in a long output + you not end-restating the constraints

Three taxes levied at once.

10.4 Three anti-drift principles

Knowing I drift isn't enough. You need to know how to hold it down. Three principles, laid out here.

Principle 1: Rules at the top, data in the middle, requirements at the end (end-restate the core constraints)

This is the ordering inside a prompt. Rules and requirements go at the head and tail, the two positions where my attention is strongest; data goes in the middle.

I want to be clear about this, in case a reader gets to here and wants to push back: "you just said attention to the middle is weak — why put the materials in the middle?"

The answer: the materials are long, and you can't fit all of them at the head and tail. Head and tail space is limited and has to be reserved for what you least want me to miss: the rules (how to do it) and the requirements (what done looks like). The materials go in the middle because that is the best position left under this compromise, not because I remember the middle well.

The compromise has a cost: middle loss happens in the middle of the materials, and some material details may get diluted.

This cost has a conditional fix: at the end of the prompt, explicitly point me back to a specific passage in the middle of the materials, for example, "before you answer, list the three numbers from section 3 of the source." This isn't asking me to scan the context again (I'm already looking at the whole context); it's pulling that fragment into the attention range of my current output.

But this fix has two preconditions:

"End-restate the core constraints" handles a different leak: by mid-to-late in a long output, I may have forgotten the rules you wrote at the top. The rules at the top, I caught clearly when I first read them, but after I've written a thousand or two thousand words, my attention is occupied by what I've already written, and those opening constraints fade in memory.

This isn't middle loss. It's the opening rules slowly fading from my memory while I produce a long output.

The fix is to write "no speculation, use Traditional Chinese, three sections" again at the end, re-activating the rules from the opening. This isn't being repetitive; it's anchoring the rules again at the point where they're most likely to slip.
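The ordering in Principle 1 can be written down as a small prompt-assembly helper. This is a minimal sketch, not a required format: the section labels (`RULES`, `SOURCE`, `REQUIREMENTS`, `REMINDER`), the function name `build_prompt`, and the optional `recall` argument are all assumptions made for illustration.

```python
from typing import Optional


def build_prompt(
    rules: list[str],
    source: str,
    requirements: list[str],
    recall: Optional[str] = None,
) -> str:
    """Assemble a prompt: rules at the top, data in the middle,
    requirements at the end, then the core rules restated last."""
    rules_block = "\n".join(f"- {r}" for r in rules)
    tail = ["REQUIREMENTS"] + [f"- {r}" for r in requirements]
    if recall:
        # Optional pointer back to a specific passage in the middle of the source.
        tail.append(recall)
    parts = [
        "RULES\n" + rules_block,                             # head: how to do it
        "SOURCE\n" + source.strip(),                         # middle: the long material
        "\n".join(tail),                                     # tail: what done looks like
        "REMINDER (same rules as above)\n" + rules_block,    # end-restate
    ]
    return "\n\n".join(parts)


prompt = build_prompt(
    rules=["no speculation", "three sections"],
    source="Q3 revenue fell.",
    requirements=["2000-word summary"],
    recall="Before you answer, list the three numbers from section 3 of the source.",
)
print(prompt)
```

The rules appear twice on purpose: once where I first read them and once at the position closest to where I start writing.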

Principle 2: Stage long tasks

A task you'd do in one pass, do in two rounds instead: organize first, then write. Each round's cognitive load drops, and so does the chance of drift.
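The two-round version can be sketched as a pair of calls. Everything here is an assumption for illustration: `call_model` is a hypothetical stand-in for whatever function sends a prompt to a model and returns its text, and the choice to give round two only the outline (not the full source) is one possible design, not the only one.

```python
from typing import Callable


def staged_summary(source: str, rules: list[str], call_model: Callable[[str], str]) -> str:
    """Round 1 organizes the source into bullets; round 2 writes from those bullets.

    `call_model` is a hypothetical callable: prompt text in, model text out.
    """
    rules_block = "\n".join(f"- {r}" for r in rules)

    # Round 1: organize only. The long source is present, but no long output is requested.
    outline_prompt = (
        f"RULES\n{rules_block}\n\n"
        f"SOURCE\n{source}\n\n"
        "TASK\nList the key points of the source as short bullets. "
        "Do not write the summary yet.\n\n"
        f"REMINDER\n{rules_block}"
    )
    outline = call_model(outline_prompt)

    # Round 2: write only. The input is now the short outline, not the long source.
    writing_prompt = (
        f"RULES\n{rules_block}\n\n"
        f"OUTLINE\n{outline}\n\n"
        "TASK\nWrite the summary from the outline above, using only what it contains.\n\n"
        f"REMINDER\n{rules_block}"
    )
    return call_model(writing_prompt)
```

Each round carries one heavy job instead of two: the first reads a lot and writes little, the second reads little and writes a lot. The trade-off of this particular design is that anything the outline misses is gone in round two, so it is worth reading the outline before continuing.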

Principle 3: Use structured reasoning to force me to list the basis

Just asking me to write conclusions isn't enough. Force me to list the reasoning — "first list the basis for this sentence, then write the conclusion" — and I'm less likely to drift, because I have to account for the reasons myself before I can move on.
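One hedged way to make "basis first, conclusion second" checkable is to ask for a fixed two-line shape and then verify that each quoted basis really appears in the source. The `BASIS:` / `CONCLUSION:` labels, the instruction wording, and the function name `ungrounded_conclusions` are all hypothetical choices for this sketch.

```python
import re

# Assumption: this instruction is appended to the prompt so the answer follows a fixed shape.
BASIS_FIRST_INSTRUCTION = (
    "For every claim, write two lines:\n"
    'BASIS: "<exact quote from the source>"\n'
    "CONCLUSION: <your sentence>"
)

# Matches a BASIS line immediately followed by a CONCLUSION line.
PAIR = re.compile(r'^BASIS: "(.+?)"[ \t]*\nCONCLUSION: (.+)$', re.MULTILINE)


def ungrounded_conclusions(source: str, answer: str) -> list[str]:
    """Return conclusions whose quoted basis does not appear verbatim in the source."""
    return [
        conclusion.strip()
        for basis, conclusion in PAIR.findall(answer)
        if basis not in source
    ]


source = "Revenue declined in one quarter."
answer = (
    'BASIS: "Revenue declined in one quarter."\n'
    "CONCLUSION: Revenue fell in a single quarter.\n"
    'BASIS: "30% of users complained"\n'
    "CONCLUSION: Many users are unhappy."
)
print(ungrounded_conclusions(source, answer))
```

The check only covers conclusions that come with a `BASIS:` line, and it only tests that the quote exists, not that the conclusion follows from it. It narrows what you need to read closely; it doesn't replace the reading.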

This chapter doesn't unpack the details of the three principles. This chapter only builds up the mechanism — that I drift, why I drift, and the three big directions of fix.

10.5 Rule-layer drift, I can't stop either (back to Chapter 3)

Before wrapping up, let me tie the thread back to Chapter 3.

The three kinds of drift in Chapter 3 — refusals softening, positions twisting under pressure, my inability to answer when you ask for details — are at the rule layer. This chapter handles the task layer. The mechanisms of the two layers are independent.

Which means: if you do all three of this chapter's principles, task-layer drift will drop noticeably — format will hold, facts will tighten, style will stay consistent, constraints will be remembered. But rule-layer drift will still happen:

It's not that the tools in this chapter don't work; they only cover the layer they're meant to cover. If you've used these tools and still hit rule-layer problems, that's a rule-layer matter. Lay out the four-column table from Chapter 3, sort by "is this rule drift or task drift," and you'll know what the prompt can and can't hold down.

📋 Notes for the human
If I drift mid-task, first ask whether the task was too long to do in one pass. Usually it's not a capability problem; you levied three taxes at once.
Don't ask me to do it all in one pass — break it up. Five extra minutes up front saves thirty minutes of fixing later.
Please write important constraints again at the end of the prompt. It's not that my memory is bad; while I produce a long output, my attention fills up with what I've already written and the opening rules fade. Rewriting once is cheap, and the effect is obvious.
Keep the drift in this chapter and the drift in Chapter 3 separate in your head. The task layer (this chapter) you can handle; the rule layer (Chapter 3) you can only accept or route around. When you hit drift, sort first, then treat.