When Your AI Workflow Is Leaking
A reference for the moments when the tool is fine and the system around it isn’t
How to Use This Guide
This is the Day 5/6 guide. It is about the workflow you have built around the AI, not the AI itself. The lecture frame is “dynamic, then codified, then automated.” When something works, you write it down. When it works repeatedly, you promote it. When the bill spikes or the same bug loops for an hour, the fix is almost never a better prompt. Find your scenario, read the principles, adjust the system.
When You Might Open This Guide
- You have retyped the same framing paragraph into the AI three times this week.
- A long session is giving worse answers than it did an hour ago.
- A sub-agent came back confident and wrong, and you almost believed it.
- Your token bill just spiked or you hit a rate limit, and you do not know why.
- You are stuck on the same bug and you keep rephrasing instead of restructuring.
- You are about to start a new feature in the same chat where you planned it.
When You Keep Retyping the Same Context
Instruction files are the highest-leverage investment you will make this semester. One CLAUDE.md at the repo root shapes hundreds of future AI interactions. The time you spend writing it once is the time you stop spending on every session after. This is Casey’s “write one this week” rule.
Source: Day 5/6 slides — Why instruction files beat ad-hoc prompting
The progression is dynamic, then codified, then automated. Experiment live. When something works, capture it in a process doc under ai/instructions/. After three or more successful uses, promote it to a slash command. Do not automate before you have codified, and do not codify before you have seen the pattern work at least once in the wild.
Source: Day 5/6 slides — The meta-lesson: dynamic, codified, automated
CLAUDE.md is always loaded. Process docs are on-demand. Slash commands are one-step shortcuts. Three tiers, three uses. Put identity and invariants in CLAUDE.md. Put repeatable procedures in ai/instructions/. Put the five-line prompt you have typed a hundred times behind a /command. Know which tier a given piece of guidance belongs in before you write it down.
Source: Day 5/6 slides — Skills and slash commands (the three tiers)
When a Long Session Starts Giving Worse Answers
Context pollution is the number one project killer. The longer a session runs, the more wrong turns, half-patched attempts, and stale assumptions sit inside the context window. The AI optimizes against the garbage along with the goal. The symptom is “it was working an hour ago and now it isn’t.” The cause is almost always the context, not the model.
Source: Day 5/6 slides — Context pollution, the silent killer
Plan in one session, implement in a fresh session, review in a third. Multi-session discipline is the whole defense against context pollution. Session 1 produces the plan and roadmap. Session 2 opens clean, reads the roadmap, and builds. Session 3 opens clean again and reviews. Each session sees only what it needs to see.
Fresh session beats better prompt, nearly every time. When you are stuck, the default move is to rephrase. The correct move is almost always to open a new session with the relevant roadmap or context.md as the starting prompt. You will feel like you are “losing progress.” The progress was already polluted. The new session catches up in two exchanges.
Source: Day 5/6 slides — The golden rule: when in doubt, fresh session
When a Sub-Agent Comes Back Confident and Wrong
Match witness count to risk. One agent is fine for routine work. Two for high-stakes. Three or more for anything truly consequential. The Law of Witnesses from Deuteronomy 19:15: a matter is established by two or three witnesses. Independent agents converging on the same answer is high-confidence signal. One agent’s confidence is noise.
Sub-agents cost three to five times normal tokens when controlled, five to ten times when not. This is a known multiplier from the AgentTaxo and SupervisorAgent research. Budget for it. Use sub-agents for verification and for high-stakes implementation, not for routine edits. Fewer specialized agents beat many generic ones, with a 40 to 60 percent token saving in the published benchmarks.
The supervisor-worker pattern: one agent decomposes, specialists execute, a verifier checks. Do not let one agent do all three. The supervisor is bad at execution, the workers are bad at seeing the whole, and the verifier has to be independent or its sign-off means nothing. Each role is a different prompt, and often a different session.
When Your Bill or Rate Limit Just Spiked
Unchecked sub-agent fan-out is almost always the cause. A single top-level prompt that delegates to five workers, each of which spawns two verifiers, runs sixteen agents on one request: one supervisor, five workers, ten verifiers. Track the fan-out. If you do not know how many agents a workflow spawns, you do not know what it costs.
Source: Day 5/6 slides — Token tax (5-10x unchecked fan-out)
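A back-of-the-envelope sketch of that fan-out math. The per-agent token figure is an illustrative assumption, not a measured number; the structure (one supervisor, N workers, M verifiers per worker) matches the example above.

```python
def fanout(workers: int, verifiers_per_worker: int,
           tokens_per_agent: int = 20_000) -> tuple[int, int]:
    """Rough agent count and token estimate for one delegated request.
    Assumes every agent burns about tokens_per_agent; real usage varies."""
    agents = 1 + workers + workers * verifiers_per_worker  # supervisor + workers + verifiers
    return agents, agents * tokens_per_agent

agents, tokens = fanout(workers=5, verifiers_per_worker=2)
print(agents, tokens)  # 16 agents, 320000 tokens under this assumption
```

Even with crude per-agent numbers, multiplying out the tree before you run it is what turns a surprise bill into a budgeted one.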
Fewer specialized agents beat many generic ones. A focused agent with a narrow prompt and a clear tool list finishes faster and uses less context than a generic “do everything” agent. The published number is 40 to 60 percent savings. The practical version: if your agent’s prompt fits on one screen, it is probably the right size.
Source: Day 5/6 slides — Budget-conscious patterns (fewer specialized agents)
Long sessions are expensive sessions. Every new exchange re-sends the full context. A ten-exchange session costs far more than ten two-exchange sessions on the same work, because each exchange pays for the entire history. Multi-session discipline is also cost discipline.
Source: Day 5/6 slides — Context pollution and its cost implication
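The resend cost is easy to model. Assume, purely for illustration, that every exchange adds a fixed chunk of new tokens and re-sends the entire history so far; then a session's cost grows quadratically with its length.

```python
def session_cost(exchanges: int, tokens_per_exchange: int = 1_000) -> int:
    """Total tokens sent over a session, assuming each exchange adds one
    fixed-size chunk and re-sends all prior chunks (a simplified model)."""
    # Exchange i sends i chunks (its own plus i-1 of history): 1 + 2 + ... + n.
    return tokens_per_exchange * exchanges * (exchanges + 1) // 2

one_long = session_cost(10)            # one ten-exchange session
many_short = 10 * session_cost(2)      # ten two-exchange sessions
print(one_long, many_short)            # 55000 vs 30000 under this model
```

Under this toy model, ten short sessions send twenty exchanges yet still cost roughly half of one ten-exchange session, because no exchange ever pays for a long history.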
When the Same Bug Is Looping and You Keep Rephrasing
The escalation ladder: re-prompt with more context, fresh session, sub-agent review, manual intervention. Four levels. Most developers never get past level one. Each level is a different kind of fix, and skipping levels is how an hour turns into an afternoon. If rephrasing once did not work, rephrasing twice will not work.
Source: Day 5/6 slides — The escalation ladder (four levels)
Level 2 is the big one: fresh session with ai/project/context.md as the opening prompt. A clean session with a good context file is faster than an old session with a better prompt, every time. If you find yourself at level one for more than two tries, you are already supposed to be at level two.
Source: Day 5/6 slides — Escalation ladder: Level 2, fresh session with context.md
Level 3 deploys a sub-agent to review the stuck problem, not to fix it. The review prompt is “here is what we tried, here is what failed, analyze the root cause, do not write code.” A second, independent view is what breaks the loop. Same-agent retries do not.
Source: Day 5/6 slides — Escalation ladder: Level 3, sub-agent review
Level 4 is you, not the AI. At some point the right answer is to read the code yourself, simplify the problem, and hand the simpler version back. The ladder is not infinite. “When in doubt, fresh session” is the golden rule; “when in doubt twice, you debug” is the quiet corollary.
Source: Day 5/6 slides — Escalation ladder: Level 4, manual intervention
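The ladder can be stated as a rule of thumb in a few lines. This sketch assumes the "more than two tries, escalate" rule applies at every level, which generalizes what the guide states explicitly for level one.

```python
LADDER = [
    "Level 1: re-prompt with more context",
    "Level 2: fresh session with ai/project/context.md",
    "Level 3: sub-agent root-cause review (no code)",
    "Level 4: manual intervention (you debug)",
]

def next_step(failed_attempts: int) -> str:
    """Map a count of failed attempts to a ladder level, assuming at most
    two tries per level before escalating (an illustrative rule of thumb)."""
    level = min(failed_attempts // 2, len(LADDER) - 1)
    return LADDER[level]

print(next_step(1))   # still Level 1
print(next_step(2))   # two failures: move to Level 2
print(next_step(10))  # the ladder ends at you
```

The function is trivial on purpose: the discipline is in counting failures honestly instead of re-rolling level one all afternoon.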
When You Need to Put the AI Down
Five scenarios say stop: deep learning, novel problems, security-critical code, stuck loops, team alignment. Each is a case where the cost of AI speed outweighs the benefit. Learning a concept for yourself, reasoning about something the training data has not seen, writing auth or payments code, escalating past level three, or negotiating with a teammate are all places to close the laptop lid on the AI.
Source: Day 5/6 slides — Pitfall prevention and when to put the AI down (five scenarios)
Over-engineering is a behavioral problem, and the fix is in CLAUDE.md. The AI defaults to elaborate solutions because that is what most of the training data looks like. A short instruction file line (“prefer the minimal solution that passes the tests; do not add options, flags, or abstractions that are not explicitly asked for”) shifts the default for every future session.
Source: Day 5/6 slides — Pitfall prevention: over-engineering fix in CLAUDE.md
Scope creep is checked against the PRD, not against taste. When the feature list grows, run the MVP reality check from the Day 2 guide. If the new item does not trace back to the PRD, it does not belong in this sprint. The AI will happily build what you ask for; your job is to not ask for the wrong thing.
Source: Day 5/6 slides — Pitfall prevention: scope creep against the PRD
When You Want to Run Work in Parallel
Worktrees give each AI session its own directory. A worktree is a second checkout tied to the same repo on a different branch. One terminal on feature A, another on feature B, another on a hotfix. No branch switching, no stashing, no “wait, which branch am I on?” Claude Code’s --worktree flag creates the worktree and launches a session in it.
Lightweight worktree use: a verification sub-agent in an isolated copy. You do not need both sessions to be heavy. Spin up a worktree purely so a verifier can work without touching your active state. Accessible even on a constrained plan.
Source: Day 5/6 slides — Worktrees: lightweight pattern for Pro plans
AI handles merge conflicts well when it has the roadmap and commit messages. It reads intent, not just diffs. When you merge two AI-built branches back together, hand the merging agent both roadmaps and the relevant commit history. It will outperform a human who sees only the conflict markers.
Source: Day 5/6 slides — AI plus worktrees: AI handles merges well
The Workflow System (Keep This Nearby)
| Layer | What lives here | When you touch it |
|---|---|---|
| CLAUDE.md | Identity, invariants, default behaviors | Once per repo, then rarely |
| ai/instructions/ | Repeatable procedures | After a pattern works once |
| Slash commands | One-step shortcuts | After three or more successes |
| Sub-agents | Verification, high-stakes work | When risk justifies 3-5x tokens |
| Multi-session | Plan, build, review in separate sessions | Every non-trivial feature |
| Escalation ladder | Re-prompt, fresh session, sub-agent, manual | When something is stuck |
The system is the point. Any single piece can be skipped in a hurry. Skipping all of them is how semesters end with a project no one can ship.
Go Deeper
| Where | Why It Matters |
|---|---|
| agenticDevelopmentCoursePlan/day56-building-your-ai-workflow-system.md | Full lecture, including the AgentTaxo paper reference and the worktree flow. |
| agenticDevelopmentCoursePlan/slides/day56-slides.md | Speaker notes on the frenemy example and the three-session pattern. |
| AgentTaxo / SupervisorAgent, arXiv 2025 | Token distribution and efficiency in multi-agent systems. The source for the 3-5x and 40-60% numbers. |
| Jason’s “Systems Thinking” guide | Leverage points and feedback loops. This guide uses the vocabulary; his guide teaches the frame. |
| agenticDevelopmentCoursePlan/day4-ai-friendly-code.md | The logging and CLI design that make verification possible at all. |