New Schematic

Baby's first agentic loop

This past week, my youngest decided to start crawling. With her out of potted-plant mode and charging at anything that could be a choking hazard, I figure it’s time to implement an agentic loop into this project.

What’s an agentic loop?

If, like me, you stayed up way too late at the end of November 2022, asking ChatGPT lots and lots of questions, you’re long (for some definition of the word) familiar with the first part of the agentic loop: text output from the LLM. An agent takes that text output, evaluates it against a goal, and tries again. How do we evaluate? Run programs, send the output of each run back to the LLM, and repeat. That’s the whole loop. But the LLM can’t run programs by itself, whether or not it’s running on your local machine, so it needs to be given a way to run them. Those programs are called “tools.”

Asterisk: there are some tools that frontier models will run from a colocated server, like web search.

How does the LLM know how to use tools?

At a very, very simple level, a model is trained to predict the next token (I’m unconvinced that frontier models are just “stochastic parrots,” but this simplification helps explain the germane parts). A model that can call tools is trained with specific markers that stand in for places where a tool call (or “program invocation,” if that helps) could happen. Alongside those markers, it’s trained on the expected output (successful or unsuccessful), which gives the model a sense of what success looks like.

When we’re ready to use our LLM in an agentic loop, we send the LLM a list of tools to use, along with our prompt. To start, the prompt contains the goal that our agent is trying to achieve (this is the high-level difference from those midnight chats in the heady winter days of 2022-23). The LLM can then discern when to use the tools it has, and the agent loop invokes the appropriate tool as needed. The results are sent back to the LLM as text, so it can decide how to proceed.

How does this look in practice?

It’ll depend on your SDK or the expected shape for an API call, but the general pattern seems to be that we send an array of tool definitions (names, descriptions, and arguments for the functions to call) alongside our prompt (goal plus conversation so far). This is another reminder of how much the pain of coming up with good names matters, truly one of the two most difficult problems in computer science (the others being cache invalidation and off-by-one errors). With access to these tools and a reasoning mode (simplistically: self-talk), the LLM can select the tool it thinks it needs for the job, ask the agent loop to run it, inspect the output, and decide whether it’s accomplished its goal. OpenAI has a decent example here, but for some reason, the code sample doesn’t render in Safari (maybe it’s an extension I have that “messes” with the HTML).

The leap to verifying code shouldn’t be too big: given some code generated by the LLM in a chat, call tools for checking syntax and semantics (static analysis and test suites). Send the results to the LLM if there are non-zero exit codes, so it can try again.
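
Here’s a minimal sketch of that verification step, assuming Node’s child_process module is available. The CheckResult shape and the runCheck and feedbackFor helpers are names I made up for illustration, not part of any SDK. The idea: run each check, keep only the failures, and turn them into text for the next LLM turn.

```typescript
import { spawnSync } from 'node:child_process'

// Hypothetical shape for a check result; not taken from any SDK.
type CheckResult = { command: string; exitCode: number; output: string }

// Run one command and capture its exit code plus stdout followed by stderr.
const runCheck = (command: string, args: string[]): CheckResult => {
  const proc = spawnSync(command, args, { encoding: 'utf8' })
  return {
    command: [command, ...args].join(' '),
    // status is null when the process was killed by a signal or never started;
    // treat both cases as a failure
    exitCode: proc.status ?? 1,
    output: `${proc.stdout ?? ''}${proc.stderr ?? ''}`.trim()
  }
}

// Keep only the failing checks and format them as text for the LLM.
// Returns null when everything passed, meaning there is nothing to retry.
const feedbackFor = (results: CheckResult[]): string | null => {
  const failures = results.filter((r) => r.exitCode !== 0)
  if (failures.length === 0) return null
  return failures
    .map((r) => `$ ${r.command}\nexit code: ${r.exitCode}\n${r.output}`)
    .join('\n\n')
}
```

Something like feedbackFor([runCheck('npx', ['tsc', '--noEmit']), runCheck('npm', ['test'])]) would return null when both checks pass. The specific commands are placeholders for whatever your project uses.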

Think of these as calling conventions, if you will: things like correct usage of exit codes are so much more important in the day and age of agentic coding. When commentators crow about “product thinking” for engineers, this is what it looks like in practice.

Let’s get more concrete:

const prompt = "what's the weather in 15213 today?"
const getWeather = (location: string) => { /* uses location string to fetch weather */ }
const tool = {
  name: 'getWeather',
  description: 'Takes a location and returns the weather',
  args: {
    location: {
      required: true,
      type: 'string'
    }
  }
}
const tools = [tool]

const res = await callLlm({ prompt, tools })
// Check if the model wants to call a tool
if (res.stopReason === 'tool_use') {
  const toolCall = res.toolCalls[0]

  // Dispatch to the right function
  const toolResult = await getWeather(toolCall.args.location)

  // Send the result back so the model can form a final answer
  const finalRes = await callLlm({
    prompt,
    tools,
    messages: [
      { role: 'assistant', content: res.content },
      { role: 'user', content: [{ type: 'tool_result', toolCallId: toolCall.id, content: toolResult }] }
    ]
  })

  console.log(finalRes.text)
} else {
  // Model answered directly without needing a tool
  console.log(res.text)
}

Even more concretely, we’d put that first callLlm invocation in a loop, so the agent can keep working (read: invoking programs) until it’s satisfied with the answer it receives. Which is to say: loop until there are no more tool calls, like Mario and Anthropic say, or set a cap on the number of calls to execute.
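
A rough sketch of that loop with a cap, under some loud assumptions: callLlm here is a stand-in that takes the conversation so far and resolves to text plus any requested tool calls, and every type below is invented for illustration rather than taken from a real SDK. Real APIs shape their messages and tool results differently.

```typescript
// Hypothetical shapes, loosely following the example above; real SDKs differ.
type ToolCall = { id: string; name: string; args: Record<string, string> }
type LlmResponse = { text: string; toolCalls: ToolCall[] }
type Message = { role: 'assistant' | 'user'; content: string }
type CallLlm = (messages: Message[]) => Promise<LlmResponse>
type ToolFn = (args: Record<string, string>) => Promise<string>

const runAgent = async (
  goal: string,
  callLlm: CallLlm,
  tools: Record<string, ToolFn>,
  maxSteps = 10 // cap on LLM round trips, so a confused model can't loop forever
): Promise<string> => {
  const messages: Message[] = [{ role: 'user', content: goal }]

  for (let step = 0; step < maxSteps; step++) {
    const res = await callLlm(messages)

    // No tool calls means the model considers itself done.
    if (res.toolCalls.length === 0) return res.text

    messages.push({ role: 'assistant', content: res.text })
    for (const call of res.toolCalls) {
      const tool: ToolFn | undefined = tools[call.name]
      // Report unknown tools back as text so the model can correct itself.
      const result = tool ? await tool(call.args) : `unknown tool: ${call.name}`
      messages.push({ role: 'user', content: `tool_result ${call.id}: ${result}` })
    }
  }

  throw new Error(`no final answer after ${maxSteps} steps`)
}
```

Passing callLlm in as an argument is a choice I made for the sketch: it keeps the loop independent of any one provider and makes it easy to swap in a fake model while testing.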

Seems powerful. Should everything be an agent?

Definitely not.