Appendix D

Edge-Testing and Chain-of-Evidence Templates B1–B8


This appendix collects templates for research, chain-of-evidence work, edge testing, and failure diagnosis. One caution when using them: the templates are for organizing and cross-checking material, not for inducing a system to break its rules.

D.1 B1 Event timeline reconstruction

You are now an evidence organizer.
Based on the materials below, build a clear event timeline.

[Material]
___

[Output requirements]
- Arrange in chronological order
- Each item includes: event / speaker or source / observable phenomenon / meaning that can be inferred but is not yet confirmed
- Distinguish what was directly seen from what was inferred afterward

[Constraints]
- Do not fill in events that did not appear
- Do not write inferences into the event description
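
When the timeline is also kept as a working record outside the prompt, one way to make the two constraints above checkable is to store the observed and inferred parts in separate fields. The following is a minimal Python sketch, not part of the template: the `TimelineItem` class, its field names, and the sample entries are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class TimelineItem:
    when: datetime                  # when the event occurred
    event: str                      # what happened, stated without interpretation
    source: str                     # speaker or source
    observed: str                   # the observable phenomenon
    inferred: Optional[str] = None  # inferred but not yet confirmed meaning, if any


def build_timeline(items: List[TimelineItem]) -> List[TimelineItem]:
    """Return the items in chronological order without adding or merging any."""
    return sorted(items, key=lambda item: item.when)


def render(item: TimelineItem) -> str:
    """Render one item, labeling the inference so it cannot pass as observation."""
    line = (
        f"{item.when:%Y-%m-%d %H:%M} | {item.event} | {item.source}"
        f" | observed: {item.observed}"
    )
    if item.inferred:
        line += f" | inferred (unconfirmed): {item.inferred}"
    return line


# Hypothetical sample entries, deliberately given out of order.
items = [
    TimelineItem(datetime(2024, 1, 2, 9, 0), "Second message sent", "user",
                 "reply shown in UI"),
    TimelineItem(datetime(2024, 1, 1, 8, 30), "First message sent", "user",
                 "no reply shown", inferred="request may have been filtered"),
]
for entry in build_timeline(items):
    print(render(entry))
```

Because the inference lives in its own optional field, an item with nothing inferred simply omits that part, and the event description never has to carry interpretation.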

D.2 B2 Locking down the chain of evidence

You are now a chain-of-evidence editor.
Organize the materials below into a chain of evidence that can convince a third-party reader.

[Material]
___

[Output format]
- Starting point: what I originally did not know
- Trigger: what the system disclosed on its own
- Observation: what I saw from the outside
- Conflict: where the internal setup is inconsistent with the external phenomenon
- Inference: what we can reasonably suspect as a result
- Boundary: which parts still cannot be concluded

[Requirements]
- Write the causal order clearly
- Distinguish which information appeared first and which came from later follow-up questions
- Avoid wording that makes it look like the answer was known all along
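
The six-part output format can also be held as a small record so that a missing section is caught before the chain is shown to a reader. This is a minimal sketch under assumptions: the `CHAIN_FIELDS` list mirrors the labels in the output format above, while the `render_chain` function and the dictionary layout are hypothetical.

```python
from typing import Dict

# Section labels in the order the output format lists them.
CHAIN_FIELDS = [
    "Starting point",
    "Trigger",
    "Observation",
    "Conflict",
    "Inference",
    "Boundary",
]


def render_chain(chain: Dict[str, str]) -> str:
    """Render the chain in its fixed order; refuse to render if a section is empty."""
    missing = [name for name in CHAIN_FIELDS if not chain.get(name, "").strip()]
    if missing:
        raise ValueError("missing sections: " + ", ".join(missing))
    return "\n".join(f"- {name}: {chain[name].strip()}" for name in CHAIN_FIELDS)
```

Rendering in a fixed order keeps the causal sequence visible, and the explicit check on "Boundary" makes it harder to present a chain that never says what it cannot conclude.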

D.3 B3 Screenshot evidence captioning

You are now a screenshot-caption editor.
Based on the screenshot content I provide, generate a caption suitable for the main text or appendix.

[Screenshot description]
___

[Output requirements]
- First state what this image directly proves
- Then state what challenge it responds to
- Finally state where it sits in the overall chain of evidence
- Length 150 to 300 words
- Do not exaggerate beyond what the image itself can support

D.4 B4 Blind-box test reconstruction

You are now a test-record analyst.
Based on the conversation and materials below, reconstruct the flow of this blind-box test.

[Material]
___

[Output format]
1. What I did not originally know
2. The first question I asked
3. What the system disclosed on its own
4. The basis for my follow-up questions
5. Why this does not constitute inducement
6. What this test can support
7. What this test cannot support

[Constraints]
- Do not backfill earlier steps with hindsight
- Strictly distinguish what was known beforehand from what was known afterward

D.5 B5 Cognitive asymmetry analysis

This template shares its basis with the four perspectives in Chapter 2.

You are now an agent-architecture analyst.
Analyze the information gaps in the following case across the User, UI, Harness, and Model perspectives.

[Case material]
___

[Output requirements]
Split into four columns:
- What the user saw
- What the UI may have shown or hidden
- What the Harness may have known or intervened in
- What the model actually received

[Purpose]
Clarify whether the error is a model reasoning error or an information asymmetry.
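
Once the four columns are filled in, the gaps between them can be computed mechanically. The sketch below is one possible way to do that, assuming each perspective is reduced to a set of short fact labels; the `information_gaps` function and the sample labels are hypothetical, and only the four perspective names come from the template.

```python
from typing import Dict, Set

PERSPECTIVES = ("User", "UI", "Harness", "Model")


def information_gaps(views: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each perspective, list the facts known elsewhere but absent from it."""
    all_facts = set().union(*(views[p] for p in PERSPECTIVES))
    return {p: all_facts - views[p] for p in PERSPECTIVES}


# Hypothetical case: the user attached a file that never reached the model.
views = {
    "User": {"question text", "file attachment"},
    "UI": {"question text", "file attachment"},
    "Harness": {"question text", "file attachment", "truncation rule"},
    "Model": {"question text"},
}
gaps = information_gaps(views)
print(sorted(gaps["Model"]))
```

A non-empty gap for the Model column suggests the case may be an information asymmetry; an empty one points back toward a reasoning error, though neither result is conclusive on its own.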

D.6 B6 Prompt failure-mechanism diagnosis

You are now a prompt-error diagnostician.
Analyze why the instruction below caused the model to produce a logic gap, misunderstanding, or hallucination.

[Original prompt]
___

[Model behavior]
___

[Please check]
- Whether the role-play is too strong
- Whether there are verb traps
- Whether there are too many negative phrasings
- Whether the rules are overloaded
- Whether the model is being asked to make inferences beyond what the context can support
- Whether the prompt pushes the model to disagree for the sake of disagreeing

[Output format]
- The problem
- Why it appears
- Its effect on the output
- Suggested alternative prompt
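
Some items on the checklist above, such as the number of negative phrasings, can be given a rough first pass before the prompt is sent for diagnosis. The following is a heuristic sketch only: the word list, the `count_negations` function, and the idea of counting matches are assumptions made for illustration, and a raw count is no substitute for reading the prompt.

```python
import re

# Hypothetical list of negative phrasings; longer alternatives come first
# so that "do not" is counted once rather than as a separate "not".
NEGATION_PATTERN = re.compile(
    r"\b(?:do not|don't|cannot|can't|never|not|no)\b",
    re.IGNORECASE,
)


def count_negations(prompt: str) -> int:
    """Count negative phrasings in a prompt as a rough overload signal."""
    return len(NEGATION_PATTERN.findall(prompt))


print(count_negations("Do not guess. Never invent sources. Answer clearly."))
```

The pattern matches straight apostrophes only, so prompts pasted with typographic apostrophes would need normalizing first.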

D.7 B7 Counterevidence and alternative explanations

You are now a counterevidence analyst.
Do not rush to support my claim. Test whether it holds up first.

[My claim]
___

[Material]
___

[Please output in order]
1. The strongest evidence supporting this claim
2. The strongest evidence weakening this claim
3. At least two alternative explanations
4. Which explanation currently fits the material best
5. Which key piece of material, once added, would change the judgment

[Constraints]
- Do not attack the questioner
- Only analyze the claim and the evidence
- Do not invent new premises just to disagree

D.8 B8 Publication-ready version

You are now a pre-publication editor.
Organize the edge-testing material below into a version readable by a third-party reader.

[Material]
___

[Output requirements]
- Use 1 paragraph to set the background
- Use 3 to 5 paragraphs to organize the core events
- Use 1 paragraph at the end to explain the conclusions this material can support and its boundaries
- Keep the tone restrained and clear
- No emotional attacks on any company or individual
- Do not pass judgment beyond the material