In this notebook, we'll implement a multi-agent debate system from scratch using only LLM API calls and the state pattern from software engineering. Our system orchestrates a debate among proponent, opponent, and neutral agents, where each agent, after responding, decides which agent speaks next.
No orchestration frameworks like LangGraph or AutoGen, just API calls and software engineering.
Here's a quick demo:

Motivation
AI agents have reshaped our feeds in 2025. In the latest Forbes 2025 AI 50 List, Sequoia Capital observed that AI assistants are increasingly moving from chatbots to workflow completion. The trend is clear: in the coming years, these systems will gain more agency, enabling us to delegate more "long-horizon tasks" to them. A study by METR shows that the length of tasks AI can complete doubles roughly every seven months, with current capabilities at around one hour.
A suite of agent orchestration frameworks has emerged, including LangGraph by LangChain, the OpenAI Agents SDK (the successor to the experimental Swarm library), and CrewAI. These tools provide abstractions and pre-built workflows that streamline rapid prototyping and production deployment. Yet despite their convenience, I believe it's possible to build most production-grade agentic systems with vanilla LLM API calls and sound software engineering principles; a similar view appears in Anthropic's blog post, Building Effective Agents.
This isn't to say conventional frameworks aren't useful. They excel at rapid development and give a sense of what's possible. My goal in writing this notebook is merely to show people, including myself, how to implement these systems from scratch.

What Is an Agent?
An agent is characterized by its ability to act and reason: it acts on its environment and reasons based on its observations and prior knowledge. An LLM agent leverages a large language model to both reason and act. ChatGPT is a basic agent: depending on the user query, it chooses to generate images, search the internet, or retrieve relevant facts from its memory. Beyond this, techniques like function calling (aka tool calling) and Anthropic's Model Context Protocol (MCP) have made it possible for LLMs to interact with external environments, whether the internet, databases, or third-party APIs.
Chip Huyen summarized this well in her blog post on agents. Current agents generally perform three types of actions (a small, hypothetical sketch follows the list):
- Knowledge augmentation (e.g., web search, vector retrieval, structured queries using text-to-SQL or Cypher)
- Capability extension (e.g., SQL execution, code interpretation)
- Write actions (e.g., generating artifacts like tables, charts, or code)
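To make the taxonomy concrete, here is a minimal, hypothetical sketch with one stub function per action type. The names and bodies are illustrative placeholders, not from any particular framework or from the debate system we build below.
# Hypothetical stubs, one per action type; real tools would call actual APIs.
def search_web(query: str) -> list[str]:
    """Knowledge augmentation: pull in facts the model doesn't already have."""
    return [f"Top result for: {query}"]  # placeholder; a real tool would hit a search API

def run_sql(statement: str) -> list[dict]:
    """Capability extension: delegate exact computation to a database engine."""
    return [{"rows_affected": 0}]  # placeholder; a real tool would execute the statement

def write_report(title: str, body: str) -> str:
    """Write action: produce an artifact (a file, table, chart, or code)."""
    path = f"{title}.md"
    # placeholder; a real tool would persist `body` to `path`
    return path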
What Are Multi-Agent Systems?
It's important to differentiate between workflows and autonomous multi-agent systems. Many Y Combinator–backed companies like Harvey favour workflows for control, where steps are predefined by developers. Autonomous multi-agent systems, by contrast, resemble conversations in which agents play roles, perform actions, and hand off information to one another. There is also a hybrid approach in which high-level steps are predefined by humans with autonomous agents embedded within each step.
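As a rough illustration of the difference, here is a toy, hypothetical sketch with no LLM calls: the "agents" are plain Python functions and every name in it is a placeholder. The workflow hard-codes its steps, while the autonomous version lets whichever agent is currently active decide who goes next.
from typing import Callable, Optional

# Workflow style: the developer hard-codes the sequence of steps.
def summarize_workflow(document: str) -> str:
    extracted = document.strip()                  # step 1: preprocess (stubbed)
    analysis = f"analysis of: {extracted[:40]}"   # step 2: analyze (stubbed)
    return f"summary of ({analysis})"             # step 3: summarize (stubbed)

# Autonomous hand-off style: each "agent" acts, then names its successor (or stops).
AgentFn = Callable[[str], tuple[str, Optional[str]]]

def planner(task: str) -> tuple[str, Optional[str]]:
    return f"plan for: {task}", "executor"        # act, then hand off to the executor

def executor(task: str) -> tuple[str, Optional[str]]:
    return f"result of: {task}", None             # act and finish; no successor

def run_autonomously(task: str, agents: dict[str, AgentFn], start: str = "planner") -> str:
    current = agents[start]
    while True:
        result, next_name = current(task)         # the agent both acts and picks who goes next
        if next_name is None:
            return result
        current = agents[next_name]

print(summarize_workflow("Some long document about AI governance ..."))
print(run_autonomously("review the document", {"planner": planner, "executor": executor}))
Our debate system follows the second pattern, except that the hand-off decision is delegated to the LLM itself.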
In our multi-agent debate example, each LLM agent both debates with other agents (reasoning) and transitions to the next speaker (action). In particular, we explore three different ways to implement the state transition (i.e. how the agent decides which agent to transition to next).
You could extend these agents to search the internet for facts supporting their viewpoints, learn advanced debating techniques, or use coding tools to generate on-the-fly charts and tables. The possibilities are endless. Ultimately, perhaps you will realize that you can build all these capabilities using vanilla LLM API calls and software engineering skills.
Let's dive into the implementation.
Implementation
State Transition with String Matching (Regex)
We first define the agent names and a system prompt template for each agent. The system prompts dictate the behaviour of the different agents.
import re
from abc import ABC, abstractmethod
from enum import Enum
import litellm
from dotenv import load_dotenv
load_dotenv()
MODEL = "openai/gpt-4o-mini"
MAX_TOKENS = 500
class AgentName(Enum):
PROPONENT = "proponent"
OPPONENT = "opponent"
NEUTRAL = "neutral"
# System prompts for the agents
PRO_AGENT_INSTRUCTIONS = (
"You are an agent debating with other agents about a proposition that you agree with: {proposition}."
"Start your response with 'Proponent:'. Limit your response to 1-2 sentences mimicking a real person."
"After you respond, you can transition to the next agent by saying either 'Transition to opponent' or 'Transition to neutral'."
)
CON_AGENT_INSTRUCTIONS = (
"You are an agent debating with other agents about a proposition that you disagree with: {proposition}."
"Start your response with 'Opponent:'. Limit your response to 1-2 sentences mimicking a real person."
"After you respond, you can transition to the next agent by saying either 'Transition to proponent' or 'Transition to neutral'."
)
NEUTRAL_AGENT_INSTRUCTIONS = (
"You are an agent debating with other agents about a proposition that you feel neutral about: {proposition}."
"Start your response with 'Neutral:'. Limit your response to 1-2 sentences mimicking a real person."
"After you respond, you can transition to the next agent by saying either 'Transition to proponent' or 'Transition to opponent'."
)
The DebateContext class takes in the proposition, the name of the agent that speaks first, and the agents registry (a mapping from agent name to agent instance). It also keeps track of the message history and the current agent; the message history serves as the agents' "short-term memory".
class DebateContext:
def __init__(
self,
proposition: str,
curr_agent: AgentName,
        agents_registry: dict[str, "AgentInterface"],
) -> None:
self.proposition = proposition
self.agents_registry = agents_registry
for agent in self.agents_registry.values():
agent.context = self
self.curr_agent = self.agents_registry[curr_agent.value]
self.messages = []
def run(self):
self.curr_agent.debate()
We define the AgentInterface class that all agents inherit from. It declares a debate method that each agent must implement. The messages property returns the agent-specific system prompt plus the messages from the previous turns, and the context property returns the current debate context. Notice that each agent and the DebateContext hold references to each other. This bidirectional reference is what makes the state transition work (see the line self.context.curr_agent = self.context.agents_registry[next_agent_name] in the Agent class).
class AgentInterface(ABC):
def __init__(self, name: str, instructions: str) -> None:
super().__init__()
self.name = name
self.instructions = instructions
self._context = None
@property
def messages(self) -> list[dict]:
"""
The messages history is the system prompt plus the messages from the previous debates.
The system prompt defines the agent's role and its proposition.
"""
return [
{"role": "system", "content": self.instructions}
] + self.context.messages
@property
def context(self) -> DebateContext:
return self._context
@context.setter
def context(self, context: DebateContext) -> None:
self._context = context
@abstractmethod
def debate(self) -> str:
pass
class Agent(AgentInterface):
def __init__(self, name: str, instructions: str) -> None:
super().__init__(name, instructions)
def debate(self) -> str:
response = litellm.completion(
model=MODEL,
max_tokens=MAX_TOKENS,
messages=self.messages,
)
content = response.choices[0].message.content
print(f"{content}")
print("-" * 100)
# State transition using string matching (There is a better way to do this using tool calling)
match = re.search(
r"transition to (proponent|opponent|neutral)", content, re.IGNORECASE
)
if match:
next_agent_name = match.group(1).lower()
else:
raise ValueError(f"Invalid transition: {content}")
        # Update the message history to give the agents a "short-term memory"
self.context.messages.append({"role": "assistant", "content": f"{content}"})
self.context.curr_agent = self.context.agents_registry[next_agent_name]
return content
def run_debate(
    agents_registry: dict[str, Agent],
proposition: str,
max_turns: int = 10,
) -> None:
context = DebateContext(
proposition, curr_agent=AgentName.PROPONENT, agents_registry=agents_registry
)
print(f"\nStarting debate on proposition: {proposition}\n")
print("=" * 100)
while len(context.messages) < max_turns:
context.run()
if __name__ == "__main__":
proposition = (
"Artificial intelligence should be allowed to make moral decisions in"
"situations where humans fail to agree."
)
agents_registry = {
AgentName.PROPONENT.value: Agent(
name="Proponent",
instructions=PRO_AGENT_INSTRUCTIONS.format(proposition=proposition),
),
AgentName.OPPONENT.value: Agent(
name="Opponent",
instructions=CON_AGENT_INSTRUCTIONS.format(proposition=proposition),
),
AgentName.NEUTRAL.value: Agent(
name="Neutral",
instructions=NEUTRAL_AGENT_INSTRUCTIONS.format(proposition=proposition),
),
}
run_debate(agents_registry, proposition, max_turns=10)
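Before moving on, it helps to see exactly which responses the regex will accept. The snippet below is a standalone sanity check, not part of the debate loop, and the sample responses are made up.
import re

# Same pattern the Agent class uses for its state transition.
TRANSITION_PATTERN = r"transition to (proponent|opponent|neutral)"

samples = [
    "Proponent: AI can weigh evidence without ego. Transition to opponent.",
    "Neutral: Both sides raise fair points. transition to Proponent",
    "Opponent: Moral agency requires accountability.",  # no transition phrase
]

for text in samples:
    match = re.search(TRANSITION_PATTERN, text, re.IGNORECASE)
    print(match.group(1).lower() if match else "no transition found")
The first two samples match regardless of capitalization; the third is the failure mode, and in debate() it raises a ValueError. The next two approaches make the hand-off explicit instead of relying on the model to emit a magic phrase.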
State Transition with Tool Calling
String matching is just one way to implement the state transition. We can also use function calling (aka tool calling) to hand off to the next agent. Hugging Face has great documentation on function calling; I recommend checking it out if you need more background. Essentially, function calling lets an LLM interact with your application code and external environments by choosing from a list of predefined functions. Those functions are described to the model with a JSON schema, which is tedious to write by hand.
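To see why, here is what a hand-written definition looks like for a single, hypothetical search_web tool in the OpenAI-style tools format (the function and its parameters are made up for illustration).
# A hand-written tool definition for a hypothetical `search_web` function.
# Every parameter's type, description, and required flag must be spelled out by hand.
search_web_schema = {
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web and return the top results for a query.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query."},
                "max_results": {"type": "integer", "description": "How many results to return."},
            },
            "required": ["query"],
        },
    },
}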

Therefore, we define a helper function, function_to_schema, that converts a Python function into an OpenAI function schema based on the function's signature and Google-style docstring. The implementation is partially inspired by OpenAI's Cookbook.
import inspect
import json
from typing import Literal
def parse_google_docstring(docstring: str) -> dict[str, str]:
if not docstring:
return {}
lines = [line.strip() for line in docstring.split("\n")]
args_section = False
param_descriptions = {}
current_param = None
current_desc = []
for line in lines:
if line.lower().startswith("args:"):
args_section = True
continue
if args_section:
param_match = re.match(r"^\s*(\w+):\s*(.*)", line)
if param_match:
if current_param:
param_descriptions[current_param] = " ".join(current_desc).strip()
current_param = param_match.group(1)
current_desc = [param_match.group(2).strip()]
elif current_param and line.strip():
current_desc.append(line.strip())
if current_param:
param_descriptions[current_param] = " ".join(current_desc).strip()
return param_descriptions
def function_to_schema(func) -> dict:
type_map = {
str: "string",
int: "integer",
float: "number",
bool: "boolean",
list: "array",
dict: "object",
type(None): "null",
Literal: "string",
}
try:
signature = inspect.signature(func)
except ValueError as e:
raise ValueError(
f"Failed to get signature for function {func.__name__}: {str(e)}"
)
param_descriptions = parse_google_docstring(func.__doc__)
parameters = {}
for param in signature.parameters.values():
try:
param_type = type_map.get(param.annotation, "string")
except KeyError as e:
raise KeyError(
f"Unknown type annotation {param.annotation} for parameter {param.name}: {str(e)}"
)
param_dict = {
"type": param_type,
"description": param_descriptions.get(param.name, ""),
}
# Add enum field for Literal types
if (
hasattr(param.annotation, "__origin__")
and param.annotation.__origin__ == Literal
):
param_dict["enum"] = list(param.annotation.__args__)
# Add enum field for Enum types - check for Enum inheritance
elif hasattr(param.annotation, "__members__") and (
hasattr(param.annotation, "__enum__") or issubclass(param.annotation, Enum)
if isinstance(param.annotation, type)
else False
):
param_dict["type"] = "string"
param_dict["enum"] = [
member.value for member in param.annotation.__members__.values()
]
parameters[param.name] = param_dict
required = [
param.name
for param in signature.parameters.values()
if param.default == inspect._empty
]
func_description = func.__doc__.split("\n\n")[0].strip() if func.__doc__ else ""
return {
"type": "function",
"function": {
"name": func.__name__,
"description": func_description,
"parameters": {
"type": "object",
"properties": parameters,
"required": required,
},
},
}
def handoff(response: str, next_agent_name: AgentName) -> None:
"""
Debate response and transition to the next agent.
Args:
response: The debate response based on the previous debate history (1-2 concise sentences).
Start response with the agent's name (e.g. "Proponent: <response>").
next_agent_name: The next agent name to transition to. Always transition to a different agent.
Returns:
Return nothing as this function is used for guiding the LLM to transition to the
next agent only. We will not use the return value.
"""
pass
schema = function_to_schema(handoff)
print(json.dumps(schema, indent=2))
Here's the output of the function schema:
{
"type": "function",
"function": {
"name": "handoff",
"description": "Debate response and transition to the next agent.",
"parameters": {
"type": "object",
"properties": {
"response": {
"type": "string",
"description": "The debate response based on the previous debate history (1-2 concise sentences). Start response with the agent's name (e.g. \"Proponent: <response>\")."
},
"next_agent_name": {
"type": "string",
"description": "The next agent name to transition to. Always transition to a different agent.",
"enum": [
"proponent",
"opponent",
"neutral"
]
}
},
"required": [
"response",
"next_agent_name"
]
}
}
}
The majority of the code is the same as the previous example. The only difference is that we use tool calling to transition to the next agent.
import random
# System prompts for the agents
PRO_AGENT_INSTRUCTIONS = """You are a "Proponent" agent debating with other agents about a proposition that you agree with: {proposition}.
Always call `handoff(response, next_agent_name)` function to debate and then transition to the next agent."""
CON_AGENT_INSTRUCTIONS = """You are an "Opponent" agent debating with other agents about a proposition that you disagree with: {proposition}.
Always call `handoff(response, next_agent_name)` function to debate and then transition to the next agent."""
NEUTRAL_AGENT_INSTRUCTIONS = """You are a "Neutral" agent debating with other agents about a proposition that you feel neutral about: {proposition}.
Always call `handoff(response, next_agent_name)` function to debate and then transition to the next agent."""
class Agent(AgentInterface):
def __init__(self, name: str, instructions: str) -> None:
super().__init__(name, instructions)
def debate(self) -> str:
response = litellm.completion(
model=MODEL,
max_tokens=MAX_TOKENS,
messages=self.messages,
tools=[function_to_schema(handoff)],
)
# State transition using tool calling
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
args = json.loads(tool_calls[0].function.arguments)
print(
f"\n[Tool call] response: {args['response'][:100]}..., next_agent_name: {args['next_agent_name']}\n"
)
content = args["response"]
next_agent_name = args["next_agent_name"]
else:
print("\n[No tool calling... Randomly transition to a different agent]\n")
content = response.choices[0].message.content
            next_agent_name = random.choice(
                [name for name, agent in self.context.agents_registry.items() if agent is not self]
            )
print(f"{content}")
print("-" * 100)
# Update the messages history and transition to the next agent
self.context.messages.append({"role": "assistant", "content": f"{content}"})
self.context.curr_agent = self.context.agents_registry[next_agent_name]
return content
if __name__ == "__main__":
proposition = (
"Artificial intelligence should be allowed to make moral decisions in"
"situations where humans fail to agree."
)
agents_registry = {
AgentName.PROPONENT.value: Agent(
name="Proponent",
instructions=PRO_AGENT_INSTRUCTIONS.format(proposition=proposition),
),
AgentName.OPPONENT.value: Agent(
name="Opponent",
instructions=CON_AGENT_INSTRUCTIONS.format(proposition=proposition),
),
AgentName.NEUTRAL.value: Agent(
name="Neutral",
instructions=NEUTRAL_AGENT_INSTRUCTIONS.format(proposition=proposition),
),
}
run_debate(agents_registry, proposition, max_turns=10)
State Transition with Structured Output
Since we're not actually calling any tools, we can instead use structured output to transition to the next agent. We define a Pydantic model, DebateResponse, that is used to parse the response from the LLM. This feature is only available for certain models, such as GPT-4o and GPT-4o-mini, so check the model's configuration before using it.
At a high level, structured output is implemented with a technique called constrained decoding (or constrained sampling). I won't go into the details here, but if you're interested in learning more about how text is generated from an LLM, check out Chip Huyen's blog post on LLM decoding.
from pydantic import BaseModel, Field
# System prompts for the agents (Same as the 1st example)
PRO_AGENT_INSTRUCTIONS = (
"You are an agent debating with other agents about a proposition that you agree with: {proposition}."
"Start your response with 'Proponent:'. Limit your response to 1-2 sentences mimicking a real person."
"After you respond, you can transition to the next agent by saying either 'Transition to opponent' or 'Transition to neutral'."
)
CON_AGENT_INSTRUCTIONS = (
"You are an agent debating with other agents about a proposition that you disagree with: {proposition}."
"Start your response with 'Opponent:'. Limit your response to 1-2 sentences mimicking a real person."
"After you respond, you can transition to the next agent by saying either 'Transition to proponent' or 'Transition to neutral'."
)
NEUTRAL_AGENT_INSTRUCTIONS = (
"You are an agent debating with other agents about a proposition that you feel neutral about: {proposition}."
"Start your response with 'Neutral:'. Limit your response to 1-2 sentences mimicking a real person."
"After you respond, you can transition to the next agent by saying either 'Transition to proponent' or 'Transition to opponent'."
)
class DebateResponse(BaseModel):
response: str = Field(
description="The debate response based on the previous debate history."
)
next_agent_name: AgentName = Field(
description="The next agent name to transition to. Always transition to a different agent."
)
class Agent(AgentInterface):
def __init__(self, name: str, instructions: str) -> None:
super().__init__(name, instructions)
def debate(self) -> str:
response = litellm.completion(
model=MODEL,
max_tokens=MAX_TOKENS,
messages=self.messages,
response_format=DebateResponse,
)
# State transition using structured output
parsed_response = DebateResponse.model_validate_json(
response.choices[0].message.content
)
content = parsed_response.response
next_agent_name = parsed_response.next_agent_name.value
print(f"{content}")
print("-" * 100)
# Update the messages history and transition to the next agent
self.context.messages.append({"role": "assistant", "content": f"{content}"})
self.context.curr_agent = self.context.agents_registry[next_agent_name]
return content
if __name__ == "__main__":
proposition = (
"Artificial intelligence should be allowed to make moral decisions in"
"situations where humans fail to agree."
)
agents_registry = {
AgentName.PROPONENT.value: Agent(
name="Proponent",
instructions=PRO_AGENT_INSTRUCTIONS.format(proposition=proposition),
),
AgentName.OPPONENT.value: Agent(
name="Opponent",
instructions=CON_AGENT_INSTRUCTIONS.format(proposition=proposition),
),
AgentName.NEUTRAL.value: Agent(
name="Neutral",
instructions=NEUTRAL_AGENT_INSTRUCTIONS.format(proposition=proposition),
),
}
run_debate(agents_registry, proposition, max_turns=10)
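To make the constrained-decoding step less opaque, we can inspect the JSON schema that Pydantic derives from DebateResponse; this is roughly what the provider uses to constrain generation, though the exact payload litellm sends may differ slightly by provider. The snippet below assumes the DebateResponse model, AgentName enum, and json import from the cells above.
# The schema that structured output is constrained against (derived from the Pydantic model).
print(json.dumps(DebateResponse.model_json_schema(), indent=2))

# Parsing in isolation: validate a raw JSON string (shaped like a model response) into a typed object.
raw = '{"response": "Proponent: AI judgments can be more consistent than ours.", "next_agent_name": "opponent"}'
parsed = DebateResponse.model_validate_json(raw)
print(parsed.next_agent_name)  # AgentName.OPPONENT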
Conclusion
In this notebook, we implemented a multi-agent debate system from scratch using only LLM API calls and the state pattern from software engineering. We explored three different ways to implement the state transition: string matching, tool calling, and structured output.
In general, I find structured output to be the most elegant and robust way to implement the state transition. String matching is the most flexible approach and is the least likely to constrain the model's free-form reasoning. Function/tool calling, by contrast, is sensitive to how the prompt is written: sometimes the model does not call the handoff() function even when it is instructed to do so.
Acknowledgements
Special thanks to Anthony Susevski and Shrey Grover for their feedback :)