When You Are Building an Agent
A reference for the moments between “this should be an agent” and a working multi-tool chatbot
How to Use This Guide
This is the Day 6 guide. It covers both sessions: the ReAct loop and LangChain fundamentals in 6a, RAG and multi-tool chaining in 6b. The first principle is the hardest one and the one students skip most often: make sure you actually need an agent before you build one. The rest of the guide assumes you have, and covers the places the loop breaks.
When You Might Open This Guide
- You are about to build an agent and you have not yet asked whether you need one.
- Your agent is looping and you cannot tell if the problem is the prompt, the tool description, or the model.
- You are writing the description for a new tool and it reads like boilerplate.
- RAG keeps pulling back the wrong chunks.
- Your multi-tool chatbot forgets what the user said three turns ago.
- Your token bill for a homework project is running higher than you expected.
When You Are About to Build an Agent and Have Not Asked If You Need One
Not every feature needs an agent. Write the hypothesis first. The design principle from Day 6a, word for word: “What specific task will this agent handle better than a single prompt? How will we measure the difference?” If you cannot answer concretely, a single LLM call is almost certainly the right choice. Agents add tool definitions, iteration loops, and token costs, and that cost has to buy a real capability gap.
Single-shot is fine for extraction, classification, and generation. Agents are for multi-step decisions. A scheduling app that produces a daily schedule from a prompt is single-shot. A scheduling app that calls check_calendar, then find_open_slot, then create_block, reasoning at each step, is an agent. The question is whether the AI or the developer does the thinking about what to call and when.
Agents equal LLM plus tools plus loop. That is the whole definition. Everything else (LangChain, LangGraph, CrewAI) is robustness on top of a while loop that calls the model, executes tool calls, and stops when the model returns text instead of another tool call.
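To make the loop concrete, here is a sketch with nothing on top. Every name in it is a hypothetical stand-in (callModel for your LLM client, runTool for your tool dispatcher), not a real library API:

```typescript
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type Message = {
  role: "user" | "assistant" | "tool";
  content: string;
  toolCalls?: ToolCall[];
};
type ModelReply = { text: string; toolCalls: ToolCall[] };

// Hypothetical stand-ins for your LLM client and your tool dispatcher.
declare function callModel(messages: Message[]): Promise<ModelReply>;
declare function runTool(name: string, args: Record<string, unknown>): Promise<string>;

async function runAgent(userMessage: string, maxSteps = 10): Promise<string> {
  const messages: Message[] = [{ role: "user", content: userMessage }];

  for (let step = 0; step < maxSteps; step++) {
    const reply = await callModel(messages); // Think
    if (reply.toolCalls.length === 0) return reply.text; // plain text: stop

    messages.push({ role: "assistant", content: reply.text, toolCalls: reply.toolCalls });
    for (const call of reply.toolCalls) {
      const observation = await runTool(call.name, call.args); // Act
      messages.push({ role: "tool", content: observation });   // Observe
    }
  }
  return "Stopped: hit the step limit without a final answer.";
}
```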
When You Are Writing a Tool and the Description Reads Like Boilerplate
The description is the most important part of the tool. It tells the LLM when to use it. Name and schema are identifiers. The description is the behavior. “Searches the web” tells the model nothing. “Search the web for current information that is not available in your training data, such as recent events, current prices, or real-time data” tells the model when to reach for the tool versus when to answer directly.
Tool descriptions drive behavior. If the agent keeps picking the wrong tool, do not rephrase the user prompt. Rewrite the descriptions. Make each one specific about the situation that triggers it. Think of the description as onboarding a coworker: when should they reach for this tool versus figure it out themselves?
Tools are just functions with metadata: name, description, Zod schema. The tool() helper takes an execution function and a config object. The Zod schema doubles as input validation and as documentation the model reads. Describe every schema field; vague fields produce vague tool calls.
Source: Day 6a slides — Your First Tool
Always catch errors and return them as strings. Never throw. A thrown exception crashes the agent loop. A returned error message is an observation the model can read and adapt to. “Error: expression contains invalid characters. Try a simpler expression.” gives the model a path forward; an exception gives it nothing.
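Putting those three points together (specific description, described schema fields, errors returned as strings), a calculator tool might look like the sketch below. The tool() config shape follows the LangChain.js docs; the description wording is illustrative, and mathjs is the evaluator recommended in the security section later in this guide:

```typescript
import { tool } from "@langchain/core/tools";
import { evaluate } from "mathjs";
import { z } from "zod";

const calculator = tool(
  async ({ expression }) => {
    try {
      return String(evaluate(expression)); // mathjs evaluates math only, not code
    } catch {
      // Return the error as a string: an observation the model can adapt to.
      return "Error: expression contains invalid characters. Try a simpler expression.";
    }
  },
  {
    name: "calculator",
    description:
      "Evaluate a single arithmetic expression, such as percentages, totals, " +
      "or unit conversions. Use this instead of doing math yourself.",
    schema: z.object({
      // Every schema field gets a description the model will read.
      expression: z
        .string()
        .describe("A plain math expression, e.g. '29 * 12' or 'sqrt(16) + 2'"),
    }),
  }
);
```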
When Your Agent Is Looping and You Cannot Find the Problem
Stream the agent. The stream output is the primary debugging tool. agent.stream() shows each Think, Act, Observe step as it happens: which tool was chosen, what arguments were passed, what the tool returned, how the model interpreted the result. If the agent misbehaves, the stream is the first place to look, before you rewrite anything.
Set a recursionLimit on the invoke config. Not on the constructor. Five to ten is usually enough for homework-scale agents. A missing limit is how a stuck loop becomes a rent check.
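A sketch of both habits together, reusing the calculator tool above. The model name and the chunk shape are illustrative; check the LangGraph docs for the stream options your version supports:

```typescript
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { ChatAnthropic } from "@langchain/anthropic";

const agent = createReactAgent({
  llm: new ChatAnthropic({ model: "claude-3-5-haiku-latest" }),
  tools: [calculator],
});

const stream = await agent.stream(
  { messages: [{ role: "user", content: "What is 17% of 240?" }] },
  { recursionLimit: 10 } // the hard stop goes on the invoke/stream config
);

// Each chunk shows one step of the loop: the model's tool call,
// the tool's observation, then the final text answer.
for await (const chunk of stream) {
  console.dir(chunk, { depth: 4 });
}
```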
Too many tools is a root cause, not a symptom. Start with three to five. An agent with twelve tools spends most of its reasoning budget deciding which one to use, gets it wrong more often, and runs slower. Consolidate tools that overlap. Cut tools that have not been called in real traffic.
Source: Day 6a slides — Common Pitfalls
Web tools and RAG tools must be async. Forgetting await hangs the agent. This is the most common silent failure in student homework. If the tool seems to “do nothing,” check that it is declared async and that every external call is awaited.
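A shape to check your own tools against; the endpoint and response fields here are made up. The two awaits are the point:

```typescript
import { tool } from "@langchain/core/tools";
import { z } from "zod";

const webSearch = tool(
  async ({ query }) => {
    try {
      // Both awaits matter: drop either and the agent silently stalls.
      const res = await fetch(`https://api.example.com/search?q=${encodeURIComponent(query)}`);
      if (!res.ok) return `Error: search failed with status ${res.status}. Try rephrasing.`;
      const data = await res.json();
      return JSON.stringify(data.results?.slice(0, 3) ?? []);
    } catch (err) {
      return `Error: ${err instanceof Error ? err.message : "search request failed"}`;
    }
  },
  {
    name: "web_search",
    description:
      "Search the web for current information that is not in your training data, " +
      "such as recent events, current prices, or real-time data.",
    schema: z.object({
      query: z.string().describe("A short search query in plain language"),
    }),
  }
);
```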
When RAG Keeps Pulling the Wrong Chunks
Embeddings search by meaning, not keywords. Similar meanings produce similar vectors. That is the whole point and the whole trap. If the chunks you expect and the question you asked do not share meaning, similarity search will return something plausible and wrong. Read the actual chunks your store returned before you assume the tool is broken.
Anthropic does not sell embeddings. Pick OpenAI, Voyage, or a local model. The chat model and the embeddings model are independent choices. A Claude agent with OpenAI embeddings is a supported and common setup for the homework.
In-memory vector stores are perfect for homework. Persistent stores are for production. MemoryVectorStore loads documents at startup and loses them on restart. For a homework-scale knowledge base that fits in memory, this is the right default. Reach for Pinecone, ChromaDB, or pgvector only when you need persistence or scale.
Return source attribution from the RAG tool, not just the text. Every retrieved chunk should come back labeled with its source file and topic metadata. The agent uses the source to format its answer; you use the source to debug when the chunk is wrong.
Source: Day 6b slides — The RAG Tool
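A homework-scale sketch that puts the last three points together: OpenAI embeddings alongside a Claude chat model, MemoryVectorStore, and a source label on every chunk. The document contents and metadata fields are invented for illustration; import paths follow the LangChain.js docs but can shift between versions:

```typescript
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Document } from "@langchain/core/documents";
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// Every document carries source metadata so answers can be attributed.
const docs = [
  new Document({
    pageContent: "The starter plan costs $29 per month.",
    metadata: { source: "pricing.md", topic: "pricing" },
  }),
  // ...the rest of the knowledge base
];

const store = await MemoryVectorStore.fromDocuments(docs, new OpenAIEmbeddings());

const knowledgeBase = tool(
  async ({ question }) => {
    const results = await store.similaritySearch(question, 3);
    if (results.length === 0) return "No relevant documents found.";
    // Label each chunk: the agent cites the source, and you read these
    // labels when debugging why the wrong chunk came back.
    return results
      .map((d) => `[source: ${d.metadata.source} | topic: ${d.metadata.topic}]\n${d.pageContent}`)
      .join("\n---\n");
  },
  {
    name: "knowledge_base",
    description:
      "Look up product facts, pricing, and policies in the internal knowledge base. " +
      "Use this before answering any question about the product.",
    schema: z.object({
      question: z.string().describe("The user's question, rephrased as a standalone query"),
    }),
  }
);
```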
When Your Multi-Tool Chatbot Forgets the Last Turn
Multi-turn memory is a message-history array you pass on every invoke. Push the user turn, call agent.invoke, push the assistant turn, repeat. The agent does not remember on its own; you hand it the history every time.
The simple “push every turn” version is fine for homework. Production needs truncation or summarization. The naive version grows without bound and will eventually blow past the context window. For homework, leave it simple and note the limitation. In production, truncate to the last N turns or summarize older turns with a cheaper model.
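The naive version as a sketch, assuming an agent compiled with createReactAgent as above; the defensive handling at the end is there because message content can be a string or structured blocks:

```typescript
import { HumanMessage, type BaseMessage } from "@langchain/core/messages";

const history: BaseMessage[] = [];

async function chat(userInput: string): Promise<string> {
  history.push(new HumanMessage(userInput));                // push the user turn
  const result = await agent.invoke({ messages: history }); // hand over the whole history
  const reply = result.messages[result.messages.length - 1];
  history.push(reply);                                      // push the assistant turn
  return typeof reply.content === "string"
    ? reply.content
    : JSON.stringify(reply.content);
}
```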
Multi-tool chaining happens automatically from tool descriptions. You do not write if-else logic. “How much does the starter plan cost per year?” calls knowledge_base for the price, then calculator for the yearly math, then produces the final answer. The agent figures out the chain. Your job is to make each tool’s description specific enough that the right chain happens.
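As a sketch, wiring the two tools from earlier into one agent; the trace in the comment is the kind of chain the stream typically shows, not a guaranteed transcript:

```typescript
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { ChatAnthropic } from "@langchain/anthropic";

// knowledgeBase and calculator are the tools sketched earlier.
const agent = createReactAgent({
  llm: new ChatAnthropic({ model: "claude-3-5-haiku-latest" }),
  tools: [knowledgeBase, calculator],
});

const result = await agent.invoke({
  messages: [{ role: "user", content: "How much does the starter plan cost per year?" }],
});

// A typical chain, driven entirely by the tool descriptions:
//   knowledge_base("starter plan price") -> "... $29 per month [source: pricing.md]"
//   calculator("29 * 12")                -> "348"
//   final text answer: "The starter plan costs $348 per year."
```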
When Your Token Bill Is Running Higher Than Expected
Cheaper models during development. claude-3-5-haiku or gpt-4o-mini until the loop is stable. You are iterating on prompts and tool descriptions, not on the final user experience. Switch to the expensive model after the workflow is right, not before.
Typical iteration counts: one to two for a simple calculation, two to three for search, three to five for a multi-tool chain. Ten-plus is a stuck loop. If you are seeing iteration counts higher than five for routine questions, either the descriptions are fighting each other or the model cannot find a tool that fits. Stream it and look.
recursionLimit is the hard stop. Five to ten is the usual range. Prevents the scenario where a buggy tool returns the same unhelpful observation forever. Every agent in production has this limit. Every agent in homework should too.
When You Are Thinking About Security
Day 4 security still applies to agents. Environment variables for keys, input validation on tool inputs, sanitization against prompt injection, audit of tool outputs before they are shown to users. The agent loop does not change the threat model; it multiplies the surface.
Use mathjs, not a regex filter, for a calculator in production. Regex whitelisting of “safe” characters is brittle and easy to bypass. mathjs.evaluate() parses and evaluates math only; it cannot execute arbitrary code. Function() and eval can. The Day 6a homework uses Function() as a teaching toy; do not ship it.
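A quick contrast; the exact error text depends on the mathjs version:

```typescript
import { evaluate } from "mathjs";

console.log(evaluate("2 + 3 * 4"));      // 14: arithmetic works
console.log(evaluate("sqrt(16) + 2^3")); // 12: math functions and operators work

try {
  evaluate("process.exit(1)"); // not math: mathjs throws instead of executing
} catch (err) {
  console.log(String(err)); // e.g. "Error: Undefined symbol process"
}

// The teaching-toy version would actually run it. Never ship this:
// new Function("return process.exit(1)")();
```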
The Agent at a Glance (Keep This Nearby)
| Part | What it is | Common failure |
|---|---|---|
| Model | The LLM making the decisions | Too expensive for iteration; switch to Haiku/Mini |
| Tool | Function + name + description + Zod schema | Vague description; agent picks the wrong tool |
| Loop | Think, Act, Observe, repeat, stop on text | No recursionLimit; runaway cost |
| RAG | Embeddings + vector store + retrieval tool | Wrong chunks returned; read the actual results |
| Memory | Message history you pass every invoke | Grows without bound; truncate in production |
| Streaming | Live view of each loop step | The primary debug tool; use it first |
Agents are a while loop and a few functions. Every “framework” is a wrapper around that. Learn the loop and the frameworks get easy.
Go Deeper
| Where | Why It Matters |
|---|---|
| agenticDevelopmentCoursePlan/day6a-building-ai-agents.md | Full Session 1: ReAct in code, tools, createReactAgent, the homework brief. |
| agenticDevelopmentCoursePlan/day6b-rag-multi-tool-agents.md | Full Session 2: RAG, multi-tool chaining, memory, framework comparison, production considerations. |
| agenticDevelopmentCoursePlan/day1-genai-fundamentals.md | Original ReAct framing; “world’s best autocomplete” mental model that every agent runs on top of. |
| ReAct Paper (arXiv 2210.03629) | Source paper for Think-Act-Observe. |
| LangChain.js docs | API reference. Prefer createReactAgent from @langchain/langgraph/prebuilt; older langchain agent APIs are deprecated. |