When You Are Writing Code AI Has to Work With

A reference for designing your code so AI can run it, read it, and fix it


How to Use This Guide

This is the Day 4 guide. It is about the code itself, not the prompts. Your app has to be the kind of thing an AI can interact with on its own, or the whole autonomous loop breaks. Open this guide when you are about to ship a feature, wire up logging, write a test, or paste a secret into a prompt.


When You Might Open This Guide


When You Are Designing the Interface AI Will Use

If AI can run a command, AI can test your app. If it can’t, you’re the bottleneck. AI cannot easily click buttons, cannot easily navigate a visual interface, and cannot see what your app looks like without Playwright. A CLI surface is the only thing that lets AI read, diagnose, fix, and retest without you in the loop.

Source: Day 4 slides — CLI-First: The Problem

Understand AI-as-test-runner vs AI-as-tester. A test-runner executes the scripts you already wrote. A tester dynamically probes the running system with curl, queries, log inspection, and edge cases you did not think of. Most tutorials teach only the first. The second is the real power of CLI-first design, and it is why the Explore phase matters.

Source: Day 4 slides — Two Modes of AI Testing

Four CLI best practices enable the autonomous loop: JSON output, a --help flag, proper exit codes, and stderr-vs-stdout separation. Those four. If one is missing, the loop stalls because the AI cannot tell whether the command worked, what the arguments should have been, or which output was the error.

Source: Day 4 slides — CLI Best Practices: 4 Principles
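
Here is a minimal sketch of all four practices in one script, in plain Node/TypeScript; the healthcheck purpose and the --url flag are invented for illustration:

```ts
#!/usr/bin/env node
const args = process.argv.slice(2);

// 1. --help flag: lets the AI orient itself without reading source.
if (args.includes("--help")) {
  console.log("usage: healthcheck --url <url>  (prints JSON status to stdout)");
  process.exit(0);
}

const urlIndex = args.indexOf("--url");
if (urlIndex === -1 || !args[urlIndex + 1]) {
  // 2. Errors go to stderr, never stdout.
  console.error("error: --url is required (see --help)");
  // 3. Nonzero exit code so the loop can gate on failure.
  process.exit(1);
}

fetch(args[urlIndex + 1])
  .then((res) => {
    // 4. JSON on stdout so the AI can parse the result.
    console.log(JSON.stringify({ ok: res.ok, status: res.status }));
    process.exit(res.ok ? 0 : 1);
  })
  .catch((err) => {
    console.error(`error: ${err.message}`);
    process.exit(1);
  });
```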

Put test credentials in .testEnvVars, not .env. .env is for the application; .testEnvVars is explicitly for AI and testing. The separation is not cosmetic. AI will source .testEnvVars on its own as part of the loop, and you do not want that mixed with your production environment file.

Source: Day 4 slides — Environment Variables: .testEnvVars
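
A sketch of what the file might hold; every name and value below is a placeholder except OPENAI_API_KEY, which the secrets section later in this guide uses as its example:

```
# .testEnvVars: sourced during testing, never by the running app.
TEST_DATABASE_URL=postgres://localhost:5432/myapp_test
TEST_USER_EMAIL=qa@example.com
OPENAI_API_KEY=sk-placeholder-not-a-real-key
```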


When You Are Deciding What to Log

If AI can see what happened, AI can fix it. That is the whole case for structured logging. The old debugging workflow (breakpoints, stepping through code, inspecting variables) assumes a human driver. Logs that are machine-parseable assume an AI that can read the full run and reason about it.

Source: Day 4 slides — The Debugging Revolution

Write structured JSON logs, not prose. “Error occurred in user service” is useless. A JSON line with level, service, action, error, and timestamp is searchable, filterable, and understandable by a model that has never touched your codebase before. Use Pino, structlog, or your language’s equivalent; do not reach for console.log or print.

Source: Day 4 slides — Multi-Language Logging Tools
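
A minimal Pino sketch; the field names are illustrative, and Pino itself adds level, time, and msg:

```ts
import pino from "pino";

const logger = pino(); // writes one JSON object per line to stdout

// Instead of: console.log("Error occurred in user service")
logger.error(
  { service: "user-service", action: "createUser", err: "duplicate email" },
  "createUser failed"
);
// => {"level":50,"time":...,"service":"user-service","action":"createUser","err":"duplicate email","msg":"createUser failed"}
```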

Always log: function entry with inputs, function exit with results, errors with full context, external API calls, database queries in dev and test. That is the whole list. Every item has a reason; skipping any one of them creates a gap where the AI cannot reconstruct what happened.

Source: Day 4 slides — What to Log
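
What that list looks like in practice, as a sketch; db.findUser stands in for whatever data layer you actually have:

```ts
import pino from "pino";

const logger = pino();

// Stand-in for your real data layer; findUser here is hypothetical.
declare const db: { findUser(id: string): Promise<object | null> };

async function getUser(id: string) {
  // Entry with inputs: so the AI can reconstruct the call
  logger.info({ action: "getUser", input: { id } }, "enter");
  try {
    const user = await db.findUser(id);
    // Exit with results: so the AI can see what came back
    logger.info({ action: "getUser", found: user !== null }, "exit");
    return user;
  } catch (err) {
    // Errors with full context, not just the message
    logger.error({ action: "getUser", input: { id }, err }, "getUser failed");
    throw err;
  }
}
```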

Document the log setup in ai/guides/testing.md. The AI has to know where logs live, how to tail them, and how to filter by level. If you do not write that doc once, you re-explain it in every session.

Source: Day 4 slides — Document Logging Setup
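
A sketch of what the logging section of that doc might say, assuming Pino's default numeric levels and a logs/app.log path (both are assumptions):

```md
## Logs
- App logs: logs/app.log, one JSON object per line (Pino defaults)
- Tail a run: tail -f logs/app.log
- Errors only: grep '"level":50' logs/app.log  (50 is Pino's error level)
```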


When You Are Deciding Who Writes the Test and Who Runs It

TDD for units. Explore-then-codify for the system. Two complementary strategies, not one replacing the other. TDD works inside out: define the contract, write failing tests, implement to pass. Explore-then-codify works outside in: probe the running system, discover emergent behavior, capture what you found as repeatable integration tests. Unit tests catch logic bugs; integration tests catch wiring bugs.

Source: Day 4 slides — When to Use Which Strategy

Red, Green, Refactor, and verify the Red phase is really red. AI will write tests that accidentally pass because existing code already satisfies them. Step 4 of the TDD workflow is “run the tests and confirm they all fail.” Do not skip it. A test that cannot fail is a test that cannot catch anything.

Source: Day 4 slides — TDD Workflow Step 4: Verify Tests Fail
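
A Red-phase sketch, assuming Vitest; the slugify contract and paths are invented for illustration. The point of step 4 is that this run must fail before src/slugify.ts exists; if it passes, something already satisfies the contract and the test proves nothing:

```ts
import { describe, it, expect } from "vitest";
import { slugify } from "../src/slugify"; // does not exist yet: Red phase

describe("slugify", () => {
  it("lowercases and hyphenates", () => {
    expect(slugify("Hello World")).toBe("hello-world");
  });
  it("replaces characters that are not URL-safe", () => {
    expect(slugify("a/b?c")).toBe("a-b-c");
  });
});
```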

Ask the AI to review its own tests before implementation. “Review the tests you just wrote. What cases are missing? What assumptions did you make?” AI is excellent at identifying gaps in a test suite it just produced, as long as you ask. Skipping the review is how you end up with comprehensive-looking suites that cover three of the five real cases.

Source: Day 4 slides — TDD Workflow Step 2: Review Generated Tests

In the Explore phase, your role is to watch and learn. The AI is doing manual QA faster than you could, with more systematic coverage. Your job is to occasionally suggest areas to probe deeper (concurrent requests, long inputs) and to notice anything surprising. Then you direct the AI to codify the discoveries into scripts/test-integration.sh so they are not lost when the session ends.

Source: Day 4 slides — Explore → Codify (Phase 1: Explore)
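
The course captures these discoveries in scripts/test-integration.sh; here is the same idea for one codified probe (concurrent requests), sketched in TypeScript for consistency with the other examples, with the endpoint invented:

```ts
// fetch is global in Node 18+; top-level await requires an ES module.
const responses = await Promise.all(
  Array.from({ length: 10 }, () => fetch("http://localhost:3000/api/items"))
);
const failed = responses.filter((r) => !r.ok);
if (failed.length > 0) {
  console.error(`concurrency probe: ${failed.length}/10 requests failed`);
  process.exit(1);
}
console.log(JSON.stringify({ check: "concurrent-requests", ok: true }));
```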


When You Are About to Paste a Secret Into a Prompt

Never paste secrets directly into an AI prompt. They may be logged. Use the environment variable name, not the value. “Use the API key from .testEnvVars (OPENAI_API_KEY) to call the service” is the pattern. Once a secret is in a prompt it is out of your control.

Source: Day 4 slides — Handling Secrets in Prompts

Verify .gitignore before the first commit, not after. Secrets leak on the first commit, not the tenth. .env, .env.local, .env.production, .testEnvVars, *.key, *.pem, secrets/, and the ai/ folder all belong in .gitignore before the repo touches GitHub.

Source: Day 4 slides — .gitignore Security
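
The list from the paragraph above, as a .gitignore block:

```
# Verify these are ignored BEFORE the first commit
.env
.env.local
.env.production
.testEnvVars
*.key
*.pem
secrets/
ai/
```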

Three patterns to spot in AI-generated code: SQL injection through string interpolation, hardcoded API keys, and user input treated as prompt instructions. AI generates these patterns confidently. Your job is to catch them. Parameterized queries, environment variables, and sanitized-plus-bounded prompts fix all three.

Source: Day 4 slides — Hands-On: Spot the Vulnerability
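
A before/after sketch of the first two fixes, assuming the node-postgres client (any driver with parameterized queries works the same way); the third fix, treating user input as data rather than instructions, is a design rule more than a one-liner:

```ts
import { Client } from "pg";

const db = new Client({ connectionString: process.env.DATABASE_URL });
await db.connect();

// What AI often generates, confidently:
//   const rows = await db.query(`SELECT * FROM users WHERE email = '${email}'`); // SQL injection
//   const apiKey = "sk-live-abc123";                                             // hardcoded key

// The fix: parameterized query, key sourced from the environment
async function findUser(email: string) {
  const result = await db.query("SELECT * FROM users WHERE email = $1", [email]);
  return result.rows[0];
}
const apiKey = process.env.OPENAI_API_KEY;
```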

The confidence trap: AI makes you faster and more confident, and the confidence can be dangerous when you skip verification. The Stanford finding (Perry et al., 2023) is that developers using AI assistants produce more security vulnerabilities and simultaneously report higher confidence in their security. Independent verification (sub-agent review, fresh-session PR review) is the fix. The process is the defense.

Source: Day 4 slides — The Confidence Trap


When the Test-Log-Fix Loop Is Running

The loop is: implement, run tests, read logs, analyze, fix, re-test, repeat. The AI can run the whole thing without you after the first prompt. That is the whole point; if you are intervening at every step, the loop is not happening.

Source: Day 4 slides — The Autonomous Cycle

Share errors fully. “It doesn’t work” gives the AI no context. Full error with stack trace, what you were trying to do, expected behavior, actual behavior, relevant code, relevant logs, exit code. That list is the debug-prompt template. If you hand the AI less, you get less.

Source: Day 4 slides — Error Sharing Best Practices
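
A sketch of a filled-in debug prompt following that template; the endpoint and file names are invented:

```
The order-creation test fails with exit code 1.

Trying to do: POST /api/orders with a valid payload (integration test 3)
Expected: 201 and the created order as JSON
Actual: 500 with an empty body

Stack trace:
<full trace pasted here>

Relevant logs (JSON lines from the failing request):
<log lines pasted here>

Relevant code: src/routes/orders.ts, createOrder()
```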

When AI gets stuck, the fix is not another prompt. It is widening the view. Bounded rationality: the AI optimizes the part of the problem it can see, and a polluted context narrows that view. The escalation prompt is “stop, analyze, do not change anything yet.” Sometimes the real answer is a fresh session with a clean context.

Source: Day 4 slides — When AI Gets Stuck


CLI-First Quick Card (Keep This Nearby)

| Rule | Why |
| --- | --- |
| JSON on stdout | AI parses it |
| Plain-text errors on stderr | Separation of concerns |
| 0 for success, nonzero for failure | The loop gates on exit codes |
| --help flag | Self-documenting, AI can orient itself |
| .testEnvVars for test creds | Separate from app .env |
| Source creds, do not paste them | Prompts can be logged |
| Structured JSON logs | AI can reason about them |
| .gitignore verified before first commit | Secrets leak on commit one |

These are the rules the autonomous loop depends on. Break one and the loop either stalls or silently does the wrong thing.


Go Deeper

| Where | Why it matters |
| --- | --- |
| agenticDevelopmentCoursePlan/day4-ai-friendly-code.md | The full lecture, including the Stanford paper reference and the security exercise. |
| agenticDevelopmentCoursePlan/slides/day4-slides.md | Speaker notes explain the “why” behind every CLI and logging rule. |
| Perry et al., 2023 (arXiv) | The Stanford study on AI-assisted developers and security confidence. The source for the confidence-trap principle. The course plan links directly to it. |
| agenticDevelopmentCoursePlan/day3.5-implementation-lab.md | The first pass at CLI scripts and the fix loop. This guide deepens what Day 3.5 started. |
| agenticDevelopmentCoursePlan/day56-building-your-ai-workflow-system.md | Where the loop stops being a feature and starts being a workflow. Read after this guide. |