Single Step vs Multi Step Agent Patterns
Single Step Tool Use
In this pattern, one LLM call can either answer directly or return one or more tool calls. The orchestrator executes all requested tools (potentially in parallel), feeds results back into a final LLM call, and returns the answer. This fits use cases like question answering over internal documents with retrieval.
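The pattern above can be sketched in a few lines. This is a minimal illustration, not a specific vendor API: call_llm is assumed to be a function returning either {"answer": ...} or {"tool_calls": [...]}, and the message shapes are invented for the sketch.

```python
# Minimal sketch of the single-step pattern. The call_llm contract and the
# message/response shapes are hypothetical, not a specific vendor API:
# call_llm(messages) -> {"answer": str} OR {"tool_calls": [{"name", "args"}]}
from concurrent.futures import ThreadPoolExecutor

def single_step(user_message, call_llm, tools):
    """One LLM call, optional parallel tool execution, one final LLM call."""
    messages = [{"role": "user", "content": user_message}]
    response = call_llm(messages)                      # first LLM call
    if "answer" in response:
        return response["answer"]                      # model answered directly
    calls = response["tool_calls"]
    with ThreadPoolExecutor() as pool:                 # run tools in parallel
        results = list(pool.map(lambda c: tools[c["name"]](c["args"]), calls))
    for call, result in zip(calls, results):           # feed observations back
        messages.append({"role": "tool", "name": call["name"],
                         "content": str(result)})
    return call_llm(messages)["answer"]                # second, final LLM call
```

Note there are at most two LLM calls by construction, which is what makes the latency of this pattern easy to bound.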
The latency math is predictable: a first LLM call (200 milliseconds) plus tool execution (100 milliseconds when the calls run in parallel) plus a second LLM call (180 milliseconds) comes to approximately 480 milliseconds total. That keeps the pattern within 1 to 2 LLM round trips and 1 to 3 tool calls, with p95 latency comfortably under 2 seconds.
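The arithmetic can be checked directly; the figures below are the illustrative numbers from the text, and the key assumption is that parallel tool calls cost the slowest call rather than the sum:

```python
# Latency budget for the single-step pattern (illustrative figures from the text).
first_llm_call_ms = 200
parallel_tools_ms = 100    # parallel calls cost the slowest one, not the sum
second_llm_call_ms = 180

total_ms = first_llm_call_ms + parallel_tools_ms + second_llm_call_ms
print(total_ms)  # 480
```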
Example flow: User asks "What were last quarter's top selling products?" The LLM outputs a query_sales_db tool call with appropriate SQL parameters. The orchestrator executes it, gets structured results, feeds them back to the LLM, which formats a natural language response with the data.
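Concretely, the model's intermediate output for this flow might look like the following. The schema, the SQL, and the query_sales_db parameters are hypothetical, matching only the tool name from the example above:

```python
# Hypothetical tool call the LLM might emit for the sales question.
tool_call = {
    "name": "query_sales_db",
    "args": {
        "sql": ("SELECT product, SUM(revenue) AS total "
                "FROM sales WHERE quarter = 'last' "
                "GROUP BY product ORDER BY total DESC LIMIT 5"),
    },
}

# After executing it, the orchestrator feeds the rows back as a tool message
# for the final formatting call.
tool_message = {
    "role": "tool",
    "name": tool_call["name"],
    "content": "<query results serialized as JSON>",
}
```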
Multi Step Agent Loop
In this pattern, the orchestrator maintains an explicit scratchpad of thoughts, actions, and observations. It repeatedly calls the LLM with the evolving scratchpad and available tools until the model emits a finish action. This enables complex workflows, planning, and reflection.
Consider a code generation agent: it might first call search_docs to find API documentation (step 1, 150 milliseconds), then read_file to examine existing code (step 2, 80 milliseconds), then write_file to propose changes (step 3, 100 milliseconds), then run_tests to validate (step 4, 2000 milliseconds), and finally revise if tests fail.
Each LLM call adds 200 to 400 milliseconds. With 5 iterations, that is 1 to 2 seconds of LLM time alone, before tool execution. This is why production systems almost always enforce a hard cap on iterations, typically a maximum of 5 to 8 tool steps.
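The loop with its hard cap can be sketched as follows. Again the shapes are illustrative assumptions: call_llm is assumed to return either a finish action ({"finish": answer}) or a thought/action/args step, and MAX_STEPS = 8 reflects the 5 to 8 step guidance above.

```python
# Minimal sketch of the multi-step agent loop with a hard iteration cap.
# The call_llm contract is hypothetical:
#   call_llm(state) -> {"finish": answer}
#                   OR {"thought": str, "action": str, "args": dict}
MAX_STEPS = 8  # hard cap, per the 5-8 tool step guidance

def run_agent(task, call_llm, tools):
    scratchpad = []  # evolving log of thoughts, actions, observations
    for _ in range(MAX_STEPS):
        step = call_llm({"task": task, "scratchpad": scratchpad})
        if "finish" in step:
            return step["finish"]                      # model chose to stop
        observation = tools[step["action"]](step["args"])  # execute the tool
        scratchpad.append({"thought": step["thought"],
                           "action": step["action"],
                           "observation": observation})
    raise RuntimeError("iteration cap reached without a finish action")
```

Raising on cap exhaustion is one design choice; another common one is to make a final LLM call asking the model to answer with whatever the scratchpad already contains.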
The Scratchpad Mechanism
The scratchpad is a structured log that the LLM sees on each iteration. It might look like:

  Thought: I need to find recent deployments.
  Action: call get_deployments with user_id=12345.
  Observation: Found 3 deployments, one failing.
  Thought: I should get logs for the failing deployment.
  Action: call get_logs with deployment_id=abc.
  Observation: Error shows timeout connecting to database.

This gives the model context to plan next steps, but it also consumes tokens. A 5 step loop might accumulate 2,000 to 3,000 tokens in the scratchpad, increasing both cost and latency for each subsequent LLM call.
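The token-growth effect can be sketched as follows. The render format mirrors the Thought/Action/Observation log above, and the 4-characters-per-token figure is a rough heuristic, not a real tokenizer:

```python
# Rough sketch of scratchpad growth: each iteration re-sends the whole log,
# so the prompt grows with every step. ~4 chars/token is a crude heuristic.
def render(scratchpad):
    lines = []
    for entry in scratchpad:
        lines.append(f"Thought: {entry['thought']}")
        lines.append(f"Action: {entry['action']}")
        lines.append(f"Observation: {entry['observation']}")
    return "\n".join(lines)

def approx_tokens(text):
    return len(text) // 4  # heuristic: roughly 4 characters per token

scratchpad = []
for step in range(5):
    scratchpad.append({
        "thought": f"step {step}: decide what to do next",
        "action": f"call some tool with arguments for step {step}",
        "observation": "a few hundred characters of tool output " * 10,
    })
    # Prompt cost for the NEXT LLM call includes the entire scratchpad so far.
    print(step + 1, approx_tokens(render(scratchpad)))
```

Running this shows the per-call prompt size growing linearly with step count, which is exactly why later iterations in a long loop cost more than early ones.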