Single Step vs Multi Step Agent Patterns
Two Dominant Interaction Patterns:
Production agent systems implement tool use in two fundamentally different ways, each optimized for different use cases and latency requirements.
Single Step Tool Use:
In this pattern, one LLM call can either answer directly or return one or more tool calls. The orchestrator executes all requested tools (potentially in parallel), feeds results back into a final LLM call, and returns the answer. This fits use cases like question answering over internal documents with retrieval.
The latency math is predictable: first LLM call (200 milliseconds) plus tool execution time (100 milliseconds for parallel calls) plus second LLM call (180 milliseconds) equals approximately 480 milliseconds total. This easily stays within 1 to 2 LLM round trips and 1 to 3 tool calls, keeping p95 under 2 seconds.
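To make the flow concrete, here is a minimal sketch of the single step pattern. The llm client interface, the TOOLS registry, and the query_sales_db stub are hypothetical stand-ins for illustration, not a specific SDK.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tool registry; query_sales_db stands in for a real database tool.
def query_sales_db(sql: str) -> list[dict]:
    ...  # execute the SQL against the sales warehouse and return rows

TOOLS = {"query_sales_db": query_sales_db}

def single_step(user_message: str, llm) -> str:
    """One LLM call -> optional parallel tool calls -> one final LLM call."""
    # First call (~200 ms): the model either answers directly or emits tool calls.
    first = llm.complete(user_message, tools=list(TOOLS))
    if not first.tool_calls:
        return first.text  # direct answer, a single round trip

    # Run every requested tool concurrently (~100 ms wall clock for fast tools).
    with ThreadPoolExecutor() as pool:
        observations = list(pool.map(
            lambda call: TOOLS[call.name](**json.loads(call.arguments)),
            first.tool_calls))

    # Second call (~180 ms): synthesize a natural language answer from the results.
    final = llm.complete(user_message, tool_results=observations)
    return final.text  # roughly 480 ms end to end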
Example flow: User asks "What were last quarter's top selling products?" The LLM outputs a query_sales_db tool call with appropriate SQL parameters. The orchestrator executes it, gets structured results, and feeds them back to the LLM, which formats a natural language response around the data.
Multi Step Agent Loop:
In this pattern, the orchestrator maintains an explicit scratchpad of thoughts, actions, and observations. It repeatedly calls the LLM with the evolving scratchpad and the available tools until the model emits a finish action. This enables complex workflows, planning, and reflection.
Consider a code generation agent: it might first call search_docs to find API documentation (step 1, 150 milliseconds), then read_file to examine existing code (step 2, 80 milliseconds), then write_file to propose changes (step 3, 100 milliseconds), then run_tests to validate (step 4, 2,000 milliseconds), and finally revise if the tests fail.
Each LLM call adds 200 to 400 milliseconds. With 5 iterations, that is 1 to 2 seconds of LLM time alone, plus tool execution. This is why production systems almost always enforce hard caps on iterations, typically a maximum of 5 to 8 tool steps.
The Scratchpad Mechanism:
The scratchpad is a structured log that the LLM sees on each iteration. It might look like: "Thought: I need to find recent deployments. Action: call get_deployments with user_id=12345. Observation: Found 3 deployments, one failing. Thought: I should get logs for the failing deployment. Action: call get_logs with deployment_id=abc. Observation: Error shows timeout connecting to database."
This gives the model context to plan its next steps, but it also consumes tokens. A 5 step loop might accumulate 2,000 to 3,000 tokens in the scratchpad, increasing both cost and latency for each subsequent LLM call.
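One way to picture the scratchpad is as a list of entries that gets re-serialized into the prompt on every iteration. The sketch below mirrors the thought/action/observation log quoted above; the entry format and the 4-characters-per-token heuristic are rough assumptions, not a specific framework's API.

```python
from dataclasses import dataclass

@dataclass
class Step:
    thought: str      # e.g. "I need to find recent deployments"
    action: str       # e.g. "get_deployments(user_id=12345)"
    observation: str  # e.g. "Found 3 deployments, one failing"

def render_scratchpad(steps: list[Step]) -> str:
    """Serialize the full history; the LLM re-reads all of it on each iteration."""
    lines = []
    for s in steps:
        lines += [f"Thought: {s.thought}",
                  f"Action: {s.action}",
                  f"Observation: {s.observation}"]
    return "\n".join(lines)

def rough_token_count(text: str) -> int:
    # Crude heuristic (~4 characters per token); real systems use the model's tokenizer.
    return len(text) // 4
```

Because the full history is re-serialized each time, prompt size, and with it cost and latency, grows roughly linearly with the number of steps. The loop itself repeats three beats: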
1. LLM reasons: outputs a tool call with parameters based on the current state
2. Orchestrator executes: validates and runs the tool, adds the results to the scratchpad
3. Loop continues: the LLM sees the new state and decides the next action or finishes (a minimal sketch of this loop follows below)
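The sketch below ties those three beats together under the same assumptions as the earlier snippets (a hypothetical llm client, plus the Step and render_scratchpad helpers defined above). The hard cap on iterations is the load-bearing detail.

```python
MAX_STEPS = 8  # hard cap: typical production limit is 5 to 8 tool steps

def agent_loop(task: str, llm, tools: dict) -> str:
    steps: list[Step] = []
    for _ in range(MAX_STEPS):
        # 1. LLM reasons: propose the next action given the task and scratchpad so far.
        decision = llm.complete(task, scratchpad=render_scratchpad(steps),
                                tools=list(tools))
        if decision.finish:  # model emitted a finish action
            return decision.text
        # 2. Orchestrator executes: validate the tool name, run it, record the observation.
        if decision.action not in tools:
            observation = f"Unknown tool: {decision.action}"
        else:
            observation = str(tools[decision.action](**decision.arguments))
        steps.append(Step(decision.thought, decision.action, observation))
        # 3. Loop continues with the updated scratchpad.
    return "Stopped after reaching the step limit."  # keeps latency and cost bounded
```

Returning explicitly when the cap is hit (rather than looping forever) is what keeps worst-case latency and token spend predictable.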
"The choice isn't about which pattern is better. It's about whether your task requires planning across multiple steps or can be solved with a single retrieve and respond cycle."
💡 Key Takeaways
✓ Single step pattern uses 1 to 2 LLM round trips with parallel tool execution, staying under 500 milliseconds p50 for retrieval augmented generation
✓ Multi step pattern maintains an explicit scratchpad of thoughts and observations, allowing complex planning but adding 200 to 400 milliseconds per iteration
✓ Production systems enforce hard caps of 5 to 8 tool steps maximum in multi step loops to prevent unbounded latency and cost
✓ Scratchpad accumulates 2,000 to 3,000 tokens over 5 steps, increasing both cost and latency for each subsequent LLM call
✓ ReAct (Reason + Act) is the most common multi step prompting strategy, alternating between reasoning about what to do and taking actions
📌 Examples
1. Single step: Question answering over docs uses one LLM call to identify the query, one vector search (120 ms), and one LLM call to synthesize, for a total of about 480 ms
2. Multi step: Code review agent searches the style guide (step 1), reads the code file (step 2), runs the linter (step 3), proposes fixes (step 4), and validates the changes (step 5)
3. Hybrid: SQL agent uses single step for simple queries but switches to a multi step loop when the initial query fails validation, allowing iterative refinement (see the fallback sketch below)
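As a rough sketch of that hybrid example, one simple escalation policy is to try the cheap single step path first and only enter the bounded loop when the result fails validation. It reuses the single_step, agent_loop, and TOOLS sketches above; validate_sql_answer is a hypothetical validation hook.

```python
def answer_sql_question(question: str, llm) -> str:
    # Fast path: one retrieve-and-respond cycle.
    draft = single_step(question, llm)
    if validate_sql_answer(draft):  # assumed check (schema match, row counts, ...)
        return draft
    # Slow path: fall back to the capped multi step loop for iterative refinement.
    return agent_loop(question, llm, TOOLS)
```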