Multi-agent debate with state pattern from scratch

Apr 16, 2025

Notebook: https://github.com/yixin0829/train-of-thoughts/blob/main/notebooks/250416-multi-agent-w-state-pattern.ipynb

In this notebook, we'll implement a multi-agent debate system from scratch using only LLM API calls and the state pattern from software engineering. Our system orchestrates a debate among proponent, opponent, and neutral agents, with self-directed transitions between agents once they've responded.

No orchestration frameworks like LangGraph or AutoGen, just API calls and software engineering.

Here's a quick demo:

Figure 1. Demo of the multi-agent debate system output.

Motivation

AI agents have reshaped our feeds in 2025. In the latest Forbes 2025 AI 50 List, Sequoia Capital observed that AI assistants are increasingly moving from chatbots to workflow completion. The trend is clear: in the coming years, these systems will gain more agency, enabling us to delegate more "long-horizon tasks" to them. Research conducted by METR shows that AI's ability to complete extended tasks doubles every seven months, with current capabilities at around one hour.

A suite of agent orchestration frameworks has emerged, including LangGraph by LangChain, OpenAI Agents SDK (formerly Swarm), CrewAI, and others. These tools provide abstractions and pre-built workflows that streamline rapid prototyping and production deployment. Yet despite their convenience, I believe it's possible to build most production-grade agentic systems using vanilla LLM API calls and sound software engineering principles—a similar opinion can be found in Anthropic's blog, Building Effective Agents.

This isn't to say conventional frameworks aren't useful. They excel at rapid development and give a sense of what's possible. My goal in writing this notebook is merely to show people, including myself, how to implement these systems from scratch.

Figure 2. Illustration of definitions of general agents, LLM agents, and multi-agent debate.

What Is an Agent?

An agent is characterized by its ability to act and reason. It acts on its environment and reasons based on its observations and prior knowledge. An LLM agent leverages a large language model to both reason and act. ChatGPT is a basic agent: depending on the user query, it chooses to generate images, search the internet, or retrieve relevant facts from its memory. Beyond this, techniques like function calling (aka tool calling) and Anthropic's Model Context Protocol (MCP) have made it possible for LLMs to interact with external environments—whether the internet, databases, or third-party APIs.

Chip Huyen summarized this well in her blog about agents. Current agents generally perform three types of actions:

  1. Knowledge augmentation (e.g., web search, vector retrieval, structured queries using text-to-SQL or Cypher)
  2. Capability extension (e.g., SQL execution, code interpretation)
  3. Write actions (e.g., generating artifacts like tables, charts, or code)

What Are Multi-Agent Systems?

It's important to differentiate between workflows and autonomous multi-agent systems. Many Y Combinator–backed companies like Harvey favour workflows for control, where steps are predefined by developers. Autonomous multi-agent systems, by contrast, resemble conversations in which agents play roles, perform actions, and hand off information to one another. There is also a hybrid approach in which high-level steps are predefined by humans with autonomous agents embedded within each step.

In our multi-agent debate example, each LLM agent both debates with other agents (reasoning) and transitions to the next speaker (action). In particular, we explore three different ways to implement the state transition (i.e. how the agent decides which agent to transition to next).
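If the state pattern is unfamiliar, here's a minimal sketch (a toy traffic light, made up purely for illustration): a context object delegates to its current state, and each state performs its action and then swaps in the next state on the context. Our debate system follows the same shape, with a debate context as the context and the agents as states.

class TrafficLight:
    """Context: holds a reference to the current state and delegates to it."""

    def __init__(self, state: "LightState") -> None:
        self.state = state

    def run(self) -> None:
        self.state.handle(self)


class LightState:
    def handle(self, light: TrafficLight) -> None:
        raise NotImplementedError


class Green(LightState):
    def handle(self, light: TrafficLight) -> None:
        print("green -> yellow")
        light.state = Yellow()  # the state itself decides the next state


class Yellow(LightState):
    def handle(self, light: TrafficLight) -> None:
        print("yellow -> red")
        light.state = Red()


class Red(LightState):
    def handle(self, light: TrafficLight) -> None:
        print("red -> green")
        light.state = Green()


light = TrafficLight(Green())
for _ in range(3):
    light.run()  # green -> yellow, yellow -> red, red -> green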

You could extend these agents to search the internet for facts supporting their viewpoints, learn advanced debating techniques, or use coding tools to generate on-the-fly charts and tables. The possibilities are endless. Ultimately, perhaps you will realize that you can build all these capabilities using vanilla LLM API calls and software engineering skills.

Let's dive into the implementation.

Implementation

State Transition with String Matching (Regex)

We first define the agent names and the system prompt template for each agent. The system prompts dictate the behaviour of the different agents.

import re
from abc import ABC, abstractmethod
from enum import Enum

import litellm
from dotenv import load_dotenv

load_dotenv()

MODEL = "openai/gpt-4o-mini"
MAX_TOKENS = 500


class AgentName(Enum):
    PROPONENT = "proponent"
    OPPONENT = "opponent"
    NEUTRAL = "neutral"


# System prompts for the agents
PRO_AGENT_INSTRUCTIONS = (
    "You are an agent debating with other agents about a proposition that you agree with: {proposition}. "
    "Start your response with 'Proponent:'. Limit your response to 1-2 sentences mimicking a real person. "
    "After you respond, you can transition to the next agent by saying either 'Transition to opponent' or 'Transition to neutral'."
)

CON_AGENT_INSTRUCTIONS = (
    "You are an agent debating with other agents about a proposition that you disagree with: {proposition}. "
    "Start your response with 'Opponent:'. Limit your response to 1-2 sentences mimicking a real person. "
    "After you respond, you can transition to the next agent by saying either 'Transition to proponent' or 'Transition to neutral'."
)

NEUTRAL_AGENT_INSTRUCTIONS = (
    "You are an agent debating with other agents about a proposition that you feel neutral about: {proposition}. "
    "Start your response with 'Neutral:'. Limit your response to 1-2 sentences mimicking a real person. "
    "After you respond, you can transition to the next agent by saying either 'Transition to proponent' or 'Transition to opponent'."
)

The DebateContext class takes in the proposition, the starting agent, and the agents registry. It also keeps track of the message history and the current agent. The message history provides the "short-term memory" for the agents.

class DebateContext:
    def __init__(
        self,
        proposition: str,
        curr_agent: AgentName,
        agents_registry: dict[str, "AgentInterface"],  # keyed by AgentName.value
    ) -> None:
        self.proposition = proposition

        self.agents_registry = agents_registry
        for agent in self.agents_registry.values():
            agent.context = self

        self.curr_agent = self.agents_registry[curr_agent.value]
        self.messages = []

    def run(self):
        self.curr_agent.debate()

We define the AgentInterface class that all agents will inherit from. It will have a debate method that will be implemented by each agent. The messages property will return the agent-specific system prompt plus the messages from the previous debates. The context property will return the current debate context. Notice that each agent and the DebateContext will have a reference to each other. This bidirectional reference is important for the state transition (see line self.context.curr_agent = self.context.agents_registry[next_agent_name] in the Agent class).

class AgentInterface(ABC):
    def __init__(self, name: str, instructions: str) -> None:
        super().__init__()
        self.name = name
        self.instructions = instructions
        self._context = None

    @property
    def messages(self) -> list[dict]:
        """
        The messages history is the system prompt plus the messages from the previous debates.
        The system prompt defines the agent's role and its proposition.
        """
        return [
            {"role": "system", "content": self.instructions}
        ] + self.context.messages

    @property
    def context(self) -> DebateContext:
        return self._context

    @context.setter
    def context(self, context: DebateContext) -> None:
        self._context = context

    @abstractmethod
    def debate(self) -> str:
        pass


class Agent(AgentInterface):
    def __init__(self, name: str, instructions: str) -> None:
        super().__init__(name, instructions)

    def debate(self) -> str:
        response = litellm.completion(
            model=MODEL,
            max_tokens=MAX_TOKENS,
            messages=self.messages,
        )
        content = response.choices[0].message.content
        print(f"{content}")
        print("-" * 100)

        # State transition using string matching (There is a better way to do this using tool calling)
        match = re.search(
            r"transition to (proponent|opponent|neutral)", content, re.IGNORECASE
        )
        if match:
            next_agent_name = match.group(1).lower()
        else:
            raise ValueError(f"Invalid transition: {content}")

        # Update the messages history to give agents a "short-term memory"
        self.context.messages.append({"role": "assistant", "content": f"{content}"})
        self.context.curr_agent = self.context.agents_registry[next_agent_name]

        return content


def run_debate(
    agents_registry: dict[str, Agent],
    proposition: str,
    max_turns: int = 10,
) -> None:
    context = DebateContext(
        proposition, curr_agent=AgentName.PROPONENT, agents_registry=agents_registry
    )

    print(f"\nStarting debate on proposition: {proposition}\n")
    print("=" * 100)
    while len(context.messages) < max_turns:
        context.run()


if __name__ == "__main__":
    proposition = (
        "Artificial intelligence should be allowed to make moral decisions in"
        "situations where humans fail to agree."
    )
    agents_registry = {
        AgentName.PROPONENT.value: Agent(
            name="Proponent",
            instructions=PRO_AGENT_INSTRUCTIONS.format(proposition=proposition),
        ),
        AgentName.OPPONENT.value: Agent(
            name="Opponent",
            instructions=CON_AGENT_INSTRUCTIONS.format(proposition=proposition),
        ),
        AgentName.NEUTRAL.value: Agent(
            name="Neutral",
            instructions=NEUTRAL_AGENT_INSTRUCTIONS.format(proposition=proposition),
        ),
    }

    run_debate(agents_registry, proposition, max_turns=10)
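
Before moving on, here's a quick sanity check of the transition regex on a made-up response:

sample = "Proponent: AI can apply consistent principles where humans deadlock. Transition to opponent."
match = re.search(r"transition to (proponent|opponent|neutral)", sample, re.IGNORECASE)
print(match.group(1).lower())  # opponent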

State Transition with Tool Calling

String matching is just one way to implement the state transition. We can also use function calling (aka tool calling) to transition to the next agent. Hugging Face has great documentation on function calling; I recommend checking it out if you need more background. Essentially, function calling is a way to enable LLMs to interact with your application code and external environments by calling from a list of predefined functions. These functions are currently defined using JSON Schema, which is tedious to write by hand.

Figure 3. Illustration of function calling as a sequence diagram.
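
To make the tedium concrete, here's what the round trip looks like with a hand-written schema (get_weather is a hypothetical tool used only for illustration; MODEL and litellm are defined earlier in the notebook):

import json

# Hand-written JSON schema for a hypothetical get_weather tool
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The city name."}
            },
            "required": ["city"],
        },
    },
}

response = litellm.completion(
    model=MODEL,
    messages=[{"role": "user", "content": "What's the weather in Toronto?"}],
    tools=[weather_tool],
)

# Assuming the model chooses to call the tool
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)                   # get_weather
print(json.loads(tool_call.function.arguments))  # {'city': 'Toronto'}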

Therefore, we define a helper function function_to_schema that converts a Python function into an OpenAI function schema, based on the function's signature and Google-style docstring.

The implementation is partially inspired by OpenAI's Cookbook.

import inspect
import json
from typing import Literal


def parse_google_docstring(docstring: str) -> dict[str, str]:
    if not docstring:
        return {}

    lines = [line.strip() for line in docstring.split("\n")]

    args_section = False
    param_descriptions = {}
    current_param = None
    current_desc = []

    for line in lines:
        if line.lower().startswith("args:"):
            args_section = True
            continue

        # Stop once another docstring section (e.g. Returns:) begins, so section
        # headers aren't mistaken for parameter names
        if args_section and re.match(r"^(returns|raises|yields|examples?):", line.lower()):
            break

        if args_section:
            param_match = re.match(r"^\s*(\w+):\s*(.*)", line)
            if param_match:
                if current_param:
                    param_descriptions[current_param] = " ".join(current_desc).strip()

                current_param = param_match.group(1)
                current_desc = [param_match.group(2).strip()]
            elif current_param and line.strip():
                current_desc.append(line.strip())

    if current_param:
        param_descriptions[current_param] = " ".join(current_desc).strip()

    return param_descriptions
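

# Quick check of the docstring parser on a toy function (illustrative example)
def _toy_add(a: int, b: int) -> int:
    """
    Add two numbers.

    Args:
        a: The first addend.
        b: The second addend.
    """
    return a + b


print(parse_google_docstring(_toy_add.__doc__))
# {'a': 'The first addend.', 'b': 'The second addend.'}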


def function_to_schema(func) -> dict:
    type_map = {
        str: "string",
        int: "integer",
        float: "number",
        bool: "boolean",
        list: "array",
        dict: "object",
        type(None): "null",
        Literal: "string",
    }

    try:
        signature = inspect.signature(func)
    except ValueError as e:
        raise ValueError(
            f"Failed to get signature for function {func.__name__}: {str(e)}"
        )

    param_descriptions = parse_google_docstring(func.__doc__)

    parameters = {}
    for param in signature.parameters.values():
        # dict.get never raises; unmapped annotations fall back to "string"
        param_type = type_map.get(param.annotation, "string")

        param_dict = {
            "type": param_type,
            "description": param_descriptions.get(param.name, ""),
        }

        # Add enum field for Literal types
        if (
            hasattr(param.annotation, "__origin__")
            and param.annotation.__origin__ == Literal
        ):
            param_dict["enum"] = list(param.annotation.__args__)
        # Add enum field for Enum types
        elif isinstance(param.annotation, type) and issubclass(param.annotation, Enum):
            param_dict["type"] = "string"
            param_dict["enum"] = [
                member.value for member in param.annotation.__members__.values()
            ]

        parameters[param.name] = param_dict

    required = [
        param.name
        for param in signature.parameters.values()
        if param.default is inspect.Parameter.empty
    ]

    func_description = func.__doc__.split("\n\n")[0].strip() if func.__doc__ else ""

    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": func_description,
            "parameters": {
                "type": "object",
                "properties": parameters,
                "required": required,
            },
        },
    }


def handoff(response: str, next_agent_name: AgentName) -> None:
    """
    Debate response and transition to the next agent.

    Args:
        response: The debate response based on the previous debate history (1-2 concise sentences).
            Start response with the agent's name (e.g. "Proponent: <response>").
        next_agent_name: The next agent name to transition to. Always transition to a different agent.

    Returns:
        Return nothing as this function is used for guiding the LLM to transition to the
        next agent only. We will not use the return value.
    """
    pass


schema = function_to_schema(handoff)
print(json.dumps(schema, indent=2))

Here's the output of the function schema:

{
  "type": "function",
  "function": {
    "name": "handoff",
    "description": "Debate response and transition to the next agent.",
    "parameters": {
      "type": "object",
      "properties": {
        "response": {
          "type": "string",
          "description": "The debate response based on the previous debate history (1-2 concise sentences). Start response with the agent's name (e.g. \"Proponent: <response>\")."
        },
        "next_agent_name": {
          "type": "string",
          "description": "The next agent name to transition to. Always transition to a different agent.",
          "enum": [
            "proponent",
            "opponent",
            "neutral"
          ]
        }
      },
      "required": [
        "response",
        "next_agent_name"
      ]
    }
  }
}

The majority of the code is the same as the previous example. The only difference is that we use tool calling to transition to the next agent.

import random

# System prompts for the agents
PRO_AGENT_INSTRUCTIONS = """You are a "Proponent" agent debating with other agents about a proposition that you agree with: {proposition}.
Always call `handoff(response, next_agent_name)` function to debate and then transition to the next agent."""

CON_AGENT_INSTRUCTIONS = """You are an "Opponent" agent debating with other agents about a proposition that you disagree with: {proposition}.
Always call `handoff(response, next_agent_name)` function to debate and then transition to the next agent."""

NEUTRAL_AGENT_INSTRUCTIONS = """You are a "Neutral" agent debating with other agents about a proposition that you feel neutral about: {proposition}.
Always call `handoff(response, next_agent_name)` function to debate and then transition to the next agent."""


class Agent(AgentInterface):
    def __init__(self, name: str, instructions: str) -> None:
        super().__init__(name, instructions)

    def debate(self) -> str:
        response = litellm.completion(
            model=MODEL,
            max_tokens=MAX_TOKENS,
            messages=self.messages,
            tools=[function_to_schema(handoff)],
        )

        # State transition using tool calling
        tool_calls = response.choices[0].message.tool_calls
        if tool_calls:
            args = json.loads(tool_calls[0].function.arguments)
            print(
                f"\n[Tool call] response: {args['response'][:100]}..., next_agent_name: {args['next_agent_name']}\n"
            )
            content = args["response"]
            next_agent_name = args["next_agent_name"]
        else:
            print("\n[No tool calling... Randomly transition to a different agent]\n")
            content = response.choices[0].message.content
            # Registry keys are agent-name strings and values are Agent objects,
            # so iterate over items() to exclude the current agent
            next_agent_name = random.choice(
                [
                    name
                    for name, agent in self.context.agents_registry.items()
                    if agent is not self
                ]
            )

        print(f"{content}")
        print("-" * 100)

        # Update the messages history and transition to the next agent
        self.context.messages.append({"role": "assistant", "content": f"{content}"})
        self.context.curr_agent = self.context.agents_registry[next_agent_name]

        return content


if __name__ == "__main__":
    proposition = (
        "Artificial intelligence should be allowed to make moral decisions in"
        "situations where humans fail to agree."
    )
    agents_registry = {
        AgentName.PROPONENT.value: Agent(
            name="Proponent",
            instructions=PRO_AGENT_INSTRUCTIONS.format(proposition=proposition),
        ),
        AgentName.OPPONENT.value: Agent(
            name="Opponent",
            instructions=CON_AGENT_INSTRUCTIONS.format(proposition=proposition),
        ),
        AgentName.NEUTRAL.value: Agent(
            name="Neutral",
            instructions=NEUTRAL_AGENT_INSTRUCTIONS.format(proposition=proposition),
        ),
    }

    run_debate(agents_registry, proposition, max_turns=10)

State Transition with Structured Output

Since we're not actually calling any tools, we can instead use structured output to transition to the next agent. We define a Pydantic model, DebateResponse, that will be used to parse the response from the LLM. This feature is only available for certain models, such as GPT-4o and GPT-4o-mini, so please check the model's capabilities before using it.

At a high level, structured output is implemented using a technique called constrained decoding (or constrained sampling). I won't go into the details here, but if you're interested in learning more about how text is generated from an LLM, check out Chip Huyen's blog post on LLM decoding.
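
As a minimal illustration of structured output (the CityInfo model is a made-up example; MODEL and litellm come from earlier cells), you pass a Pydantic model as response_format and validate the JSON content back into the model:

from pydantic import BaseModel


class CityInfo(BaseModel):
    name: str
    country: str


resp = litellm.completion(
    model=MODEL,
    messages=[{"role": "user", "content": "Name one city in Canada."}],
    response_format=CityInfo,
)
print(CityInfo.model_validate_json(resp.choices[0].message.content))
# e.g. name='Toronto' country='Canada'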

from pydantic import BaseModel, Field

# System prompts for the agents (Same as the 1st example)
PRO_AGENT_INSTRUCTIONS = (
    "You are an agent debating with other agents about a proposition that you agree with: {proposition}. "
    "Start your response with 'Proponent:'. Limit your response to 1-2 sentences mimicking a real person. "
    "After you respond, you can transition to the next agent by saying either 'Transition to opponent' or 'Transition to neutral'."
)

CON_AGENT_INSTRUCTIONS = (
    "You are an agent debating with other agents about a proposition that you disagree with: {proposition}. "
    "Start your response with 'Opponent:'. Limit your response to 1-2 sentences mimicking a real person. "
    "After you respond, you can transition to the next agent by saying either 'Transition to proponent' or 'Transition to neutral'."
)

NEUTRAL_AGENT_INSTRUCTIONS = (
    "You are an agent debating with other agents about a proposition that you feel neutral about: {proposition}. "
    "Start your response with 'Neutral:'. Limit your response to 1-2 sentences mimicking a real person. "
    "After you respond, you can transition to the next agent by saying either 'Transition to proponent' or 'Transition to opponent'."
)


class DebateResponse(BaseModel):
    response: str = Field(
        description="The debate response based on the previous debate history."
    )
    next_agent_name: AgentName = Field(
        description="The next agent name to transition to. Always transition to a different agent."
    )


class Agent(AgentInterface):
    def __init__(self, name: str, instructions: str) -> None:
        super().__init__(name, instructions)

    def debate(self) -> str:
        response = litellm.completion(
            model=MODEL,
            max_tokens=MAX_TOKENS,
            messages=self.messages,
            response_format=DebateResponse,
        )

        # State transition using structured output
        parsed_response = DebateResponse.model_validate_json(
            response.choices[0].message.content
        )
        content = parsed_response.response
        next_agent_name = parsed_response.next_agent_name.value

        print(f"{content}")
        print("-" * 100)

        # Update the messages history and transition to the next agent
        self.context.messages.append({"role": "assistant", "content": f"{content}"})
        self.context.curr_agent = self.context.agents_registry[next_agent_name]

        return content


if __name__ == "__main__":
    proposition = (
        "Artificial intelligence should be allowed to make moral decisions in"
        "situations where humans fail to agree."
    )
    agents_registry = {
        AgentName.PROPONENT.value: Agent(
            name="Proponent",
            instructions=PRO_AGENT_INSTRUCTIONS.format(proposition=proposition),
        ),
        AgentName.OPPONENT.value: Agent(
            name="Opponent",
            instructions=CON_AGENT_INSTRUCTIONS.format(proposition=proposition),
        ),
        AgentName.NEUTRAL.value: Agent(
            name="Neutral",
            instructions=NEUTRAL_AGENT_INSTRUCTIONS.format(proposition=proposition),
        ),
    }

    run_debate(agents_registry, proposition, max_turns=10)

Conclusion

In this notebook, we implemented a multi-agent debate system from scratch using only LLM API calls and the state pattern from software engineering. We explored three different ways to implement the state transition: string matching, tool calling, and structured output.

In general, I find structured output to be the most elegant and robust way to implement the state transition. String matching is the most flexible approach, and it avoids potentially sacrificing the model's reasoning ability since generation stays free-form; the trade-off is that it depends on the model reliably emitting the transition phrase. Function/tool calling, meanwhile, is sensitive to how the prompt is written: sometimes the model does not call the handoff() function even when it is instructed to do so.

Acknowledgements

Special thanks to Anthony Susevski and Shrey Grover for their feedback :)