When You Are Building an Agent
A reference for the moments between “this should be an agent” and a working multi-tool chatbot
How to Use This Guide
This is the Day 6 guide. It covers both sessions: the ReAct loop and LangChain fundamentals in 6a, RAG and multi-tool chaining in 6b. The first principle is the hardest one and the one students skip most often: make sure you actually need an agent before you build one. The rest of the guide assumes you have, and covers the places the loop breaks.
When You Might Open This Guide
- You are about to build an agent and you have not yet asked whether you need one.
- Your agent is looping and you cannot tell if the problem is the prompt, the tool description, or the model.
- You are writing the description for a new tool and it reads like boilerplate.
- RAG keeps pulling back the wrong chunks.
- Your multi-tool chatbot forgets what the user said three turns ago.
- Your token bill for a homework project is running higher than you expected.
When You Are About to Build an Agent and Have Not Asked If You Need One
Not every feature needs an agent. Write the hypothesis first. The design principle from Day 6a, word for word: “What specific task will this agent handle better than a single prompt? How will we measure the difference?” If you cannot answer concretely, a single LLM call is almost certainly the right choice. Agents add tool definitions, iteration loops, and token costs, and that cost has to buy a real capability gap.
Single-shot is fine for extraction, classification, and generation. Agents are for multi-step decisions. A scheduling app that produces a daily schedule from a prompt is single-shot. A scheduling app that calls check_calendar, then find_open_slot, then create_block, reasoning at each step, is an agent. The question is whether the AI or the developer does the thinking about what to call and when.
Agents equal LLM plus tools plus loop. That is the whole definition. Everything else (LangChain, LangGraph, CrewAI) is robustness on top of a while loop that calls the model, executes tool calls, and stops when the model returns text instead of another tool call.
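To make the loop concrete, here is a sketch with nothing on top. Every name in it is a hypothetical stand-in (callModel for your LLM client, runTool for your tool dispatcher), not a real library API:

```typescript
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type Message = {
  role: "user" | "assistant" | "tool";
  content: string;
  toolCalls?: ToolCall[];
};
type ModelReply = { text: string; toolCalls: ToolCall[] };

// Hypothetical stand-ins for your LLM client and your tool dispatcher.
declare function callModel(messages: Message[]): Promise<ModelReply>;
declare function runTool(name: string, args: Record<string, unknown>): Promise<string>;

async function runAgent(userMessage: string, maxSteps = 10): Promise<string> {
  const messages: Message[] = [{ role: "user", content: userMessage }];

  for (let step = 0; step < maxSteps; step++) {
    const reply = await callModel(messages); // Think
    if (reply.toolCalls.length === 0) return reply.text; // plain text: stop

    messages.push({ role: "assistant", content: reply.text, toolCalls: reply.toolCalls });
    for (const call of reply.toolCalls) {
      const observation = await runTool(call.name, call.args); // Act
      messages.push({ role: "tool", content: observation });   // Observe
    }
  }
  return "Stopped: hit the step limit without a final answer.";
}
```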
When You Are Writing a Tool and the Description Reads Like Boilerplate
The description is the most important part of the tool. It tells the LLM when to use it. Name and schema are identifiers. The description is the behavior. “Searches the web” tells the model nothing. “Search the web for current information that is not available in your training data, such as recent events, current prices, or real-time data” tells the model when to reach for the tool versus when to answer directly.
Tool descriptions drive behavior. If the agent keeps picking the wrong tool, do not rephrase the user prompt. Rewrite the descriptions. Make each one specific about the situation that triggers it. Think of the description as onboarding a coworker: when should they reach for this tool versus figure it out themselves?
Tools are just functions with metadata: name, description, Zod schema. The tool() helper takes an execution function and a config object. The Zod schema doubles as input validation and as documentation the model reads. Describe every schema field; vague fields produce vague tool calls.
Source: Day 6a slides — Your First Tool
Always catch errors and return them as strings. Never throw. A thrown exception crashes the agent loop. A returned error message is an observation the model can read and adapt to. “Error: expression contains invalid characters. Try a simpler expression.” gives the model a path forward; an exception gives it nothing.
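Putting those three points together (specific description, described schema fields, errors returned as strings), a calculator tool might look like the sketch below. The tool() config shape follows the LangChain.js docs; the description wording is illustrative, and mathjs is the evaluator recommended in the security section later in this guide:

```typescript
import { tool } from "@langchain/core/tools";
import { evaluate } from "mathjs";
import { z } from "zod";

const calculator = tool(
  async ({ expression }) => {
    try {
      return String(evaluate(expression)); // mathjs evaluates math only, not code
    } catch {
      // Return the error as a string: an observation the model can adapt to.
      return "Error: expression contains invalid characters. Try a simpler expression.";
    }
  },
  {
    name: "calculator",
    description:
      "Evaluate a single arithmetic expression, such as percentages, totals, " +
      "or unit conversions. Use this instead of doing math yourself.",
    schema: z.object({
      // Every schema field gets a description the model will read.
      expression: z
        .string()
        .describe("A plain math expression, e.g. '29 * 12' or 'sqrt(16) + 2'"),
    }),
  }
);
```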
When Your Agent Is Looping and You Cannot Find the Problem
Stream the agent. The stream output is the primary debugging tool. agent.stream() shows each Think, Act, Observe step as it happens: which tool was chosen, what arguments were passed, what the tool returned, how the model interpreted the result. If the agent misbehaves, the stream is the first place to look, before you rewrite anything.
Set a recursionLimit on the invoke config. Not on the constructor. Five to ten is usually enough for homework-scale agents. A missing limit is how a stuck loop becomes a rent check.
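A sketch of both habits together, reusing the calculator tool above. The model name and the chunk shape are illustrative; check the LangGraph docs for the stream options your version supports:

```typescript
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { ChatAnthropic } from "@langchain/anthropic";

const agent = createReactAgent({
  llm: new ChatAnthropic({ model: "claude-3-5-haiku-latest" }),
  tools: [calculator],
});

const stream = await agent.stream(
  { messages: [{ role: "user", content: "What is 17% of 240?" }] },
  { recursionLimit: 10 } // the hard stop goes on the invoke/stream config
);

// Each chunk shows one step of the loop: the model's tool call,
// the tool's observation, then the final text answer.
for await (const chunk of stream) {
  console.dir(chunk, { depth: 4 });
}
```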
Too many tools is a root cause, not a symptom. Start with three to five. An agent with twelve tools spends most of its reasoning budget deciding which one to use, gets it wrong more often, and runs slower. Consolidate tools that overlap. Cut tools that have not been called in real traffic.
Source: Day 6a slides — Common Pitfalls
Web tools and RAG tools must be async. Forgetting await hangs the agent. This is the most common silent failure in student homework. If the tool seems to “do nothing,” check that it is declared async and that every external call is awaited.
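A shape to check your own tools against; the endpoint and response fields here are made up. The two awaits are the point:

```typescript
import { tool } from "@langchain/core/tools";
import { z } from "zod";

const webSearch = tool(
  async ({ query }) => {
    try {
      // Both awaits matter: drop either and the agent silently stalls.
      const res = await fetch(`https://api.example.com/search?q=${encodeURIComponent(query)}`);
      if (!res.ok) return `Error: search failed with status ${res.status}. Try rephrasing.`;
      const data = await res.json();
      return JSON.stringify(data.results?.slice(0, 3) ?? []);
    } catch (err) {
      return `Error: ${err instanceof Error ? err.message : "search request failed"}`;
    }
  },
  {
    name: "web_search",
    description:
      "Search the web for current information that is not in your training data, " +
      "such as recent events, current prices, or real-time data.",
    schema: z.object({
      query: z.string().describe("A short search query in plain language"),
    }),
  }
);
```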
When RAG Keeps Pulling the Wrong Chunks
Embeddings search by meaning, not keywords. Similar meanings produce similar vectors. That is the whole point and the whole trap. If the chunks you expect and the question you asked do not share meaning, similarity search will return something plausible and wrong. Read the actual chunks your store returned before you assume the tool is broken.
Anthropic does not sell embeddings. Pick OpenAI, Voyage, or a local model. The chat model and the embeddings model are independent choices. A Claude agent with OpenAI embeddings is a supported and common setup for the homework.
In-memory vector stores are perfect for homework. Persistent stores are for production. MemoryVectorStore loads documents at startup and loses them on restart. For a homework-scale knowledge base that fits in memory, this is the right default. Reach for Pinecone, ChromaDB, or pgvector only when you need persistence or scale.
Return source attribution from the RAG tool, not just the text. Every retrieved chunk should come back labeled with its source file and topic metadata. The agent uses the source to format its answer; you use the source to debug when the chunk is wrong.
Source: Day 6b slides — The RAG Tool
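A homework-scale sketch that puts the last three points together: OpenAI embeddings alongside a Claude chat model, MemoryVectorStore, and a source label on every chunk. The document contents and metadata fields are invented for illustration; import paths follow the LangChain.js docs but can shift between versions:

```typescript
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Document } from "@langchain/core/documents";
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// Every document carries source metadata so answers can be attributed.
const docs = [
  new Document({
    pageContent: "The starter plan costs $29 per month.",
    metadata: { source: "pricing.md", topic: "pricing" },
  }),
  // ...the rest of the knowledge base
];

const store = await MemoryVectorStore.fromDocuments(docs, new OpenAIEmbeddings());

const knowledgeBase = tool(
  async ({ question }) => {
    const results = await store.similaritySearch(question, 3);
    if (results.length === 0) return "No relevant documents found.";
    // Label each chunk: the agent cites the source, and you read these
    // labels when debugging why the wrong chunk came back.
    return results
      .map((d) => `[source: ${d.metadata.source} | topic: ${d.metadata.topic}]\n${d.pageContent}`)
      .join("\n---\n");
  },
  {
    name: "knowledge_base",
    description:
      "Look up product facts, pricing, and policies in the internal knowledge base. " +
      "Use this before answering any question about the product.",
    schema: z.object({
      question: z.string().describe("The user's question, rephrased as a standalone query"),
    }),
  }
);
```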
When Your Multi-Tool Chatbot Forgets the Last Turn
Multi-turn memory is a message-history array you pass on every invoke. Push the user turn, call agent.invoke, push the assistant turn, repeat. The agent does not remember on its own; you hand it the history every time.
The simple “push every turn” version is fine for homework. Production needs truncation or summarization. The naive version grows without bound and will eventually blow past the context window. For homework, leave it simple and note the limitation. In production, truncate to the last N turns or summarize older turns with a cheaper model.
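The naive version as a sketch, assuming an agent compiled with createReactAgent as above; the defensive handling at the end is there because message content can be a string or structured blocks:

```typescript
import { HumanMessage, type BaseMessage } from "@langchain/core/messages";

const history: BaseMessage[] = [];

async function chat(userInput: string): Promise<string> {
  history.push(new HumanMessage(userInput));                // push the user turn
  const result = await agent.invoke({ messages: history }); // hand over the whole history
  const reply = result.messages[result.messages.length - 1];
  history.push(reply);                                      // push the assistant turn
  return typeof reply.content === "string"
    ? reply.content
    : JSON.stringify(reply.content);
}
```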
Multi-tool chaining happens automatically from tool descriptions. You do not write if-else logic. “How much does the starter plan cost per year?” calls knowledge_base for the price, then calculator for the yearly math, then produces the final answer. The agent figures out the chain. Your job is to make each tool’s description specific enough that the right chain happens.
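As a sketch, wiring the two tools from earlier into one agent; the trace in the comment is the kind of chain the stream typically shows, not a guaranteed transcript:

```typescript
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { ChatAnthropic } from "@langchain/anthropic";

// knowledgeBase and calculator are the tools sketched earlier.
const agent = createReactAgent({
  llm: new ChatAnthropic({ model: "claude-3-5-haiku-latest" }),
  tools: [knowledgeBase, calculator],
});

const result = await agent.invoke({
  messages: [{ role: "user", content: "How much does the starter plan cost per year?" }],
});

// A typical chain, driven entirely by the tool descriptions:
//   knowledge_base("starter plan price") -> "... $29 per month [source: pricing.md]"
//   calculator("29 * 12")                -> "348"
//   final text answer: "The starter plan costs $348 per year."
```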
When Your Token Bill Is Running Higher Than Expected
Cheaper models during development. claude-3-5-haiku or gpt-4o-mini until the loop is stable. You are iterating on prompts and tool descriptions, not on the final user experience. Switch to the expensive model after the workflow is right, not before.
Typical iteration counts: one to two for a simple calculation, two to three for search, three to five for a multi-tool chain. Ten-plus is a stuck loop. If you are seeing iteration counts higher than five for routine questions, either the descriptions are fighting each other or the model cannot find a tool that fits. Stream it and look.
recursionLimit is the hard stop. Five to ten is the usual range. Prevents the scenario where a buggy tool returns the same unhelpful observation forever. Every agent in production has this limit. Every agent in homework should too.
When You Are Thinking About Security
Day 4 security still applies to agents. Environment variables for keys, input validation on tool inputs, sanitization against prompt injection, audit of tool outputs before they are shown to users. The agent loop does not change the threat model; it multiplies the surface.
Use mathjs, not a regex filter, for a calculator in production. Regex whitelisting of “safe” characters is brittle and easy to bypass. mathjs.evaluate() parses and evaluates math only; it cannot execute arbitrary code. Function() and eval can. The Day 6a homework uses Function() as a teaching toy; do not ship it.
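A quick contrast; the exact error text depends on the mathjs version:

```typescript
import { evaluate } from "mathjs";

console.log(evaluate("2 + 3 * 4"));      // 14: arithmetic works
console.log(evaluate("sqrt(16) + 2^3")); // 12: math functions and operators work

try {
  evaluate("process.exit(1)"); // not math: mathjs throws instead of executing
} catch (err) {
  console.log(String(err)); // e.g. "Error: Undefined symbol process"
}

// The teaching-toy version would actually run it. Never ship this:
// new Function("return process.exit(1)")();
```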
The Agent at a Glance (Keep This Nearby)
| Part | What it is | Common failure |
|---|---|---|
| Model | The LLM making the decisions | Too expensive for iteration; switch to Haiku/Mini |
| Tool | Function + name + description + Zod schema | Vague description; agent picks the wrong tool |
| Loop | Think, Act, Observe, repeat, stop on text | No recursionLimit; runaway cost |
| RAG | Embeddings + vector store + retrieval tool | Wrong chunks returned; read the actual results |
| Memory | Message history you pass every invoke | Grows without bound; truncate in production |
| Streaming | Live view of each loop step | The primary debug tool; use it first |
Agents are a while loop and a few functions. Every “framework” is a wrapper around that. Learn the loop and the frameworks get easy.
Go Deeper
| Where | Why It Matters |
|---|---|
| agenticDevelopmentCoursePlan/day6a-building-ai-agents.md | Full Session 1: ReAct in code, tools, createReactAgent, the homework brief. |
| agenticDevelopmentCoursePlan/day6b-rag-multi-tool-agents.md | Full Session 2: RAG, multi-tool chaining, memory, framework comparison, production considerations. |
| agenticDevelopmentCoursePlan/day1-genai-fundamentals.md | Original ReAct framing; “world’s best autocomplete” mental model that every agent runs on top of. |
| ReAct Paper (arXiv 2210.03629) | Source paper for Think-Act-Observe. |
| LangChain.js docs | API reference. Prefer createReactAgent from @langchain/langgraph/prebuilt; older langchain agent APIs are deprecated. |