Designing Multi-Agent Architecture - From Solo to Ensemble

A single agent can accomplish a lot, but complex real-world tasks often exceed what any one specialist can handle. Just as organizations divide work among departments, multi-agent systems distribute responsibilities across specialized agents that collaborate toward shared goals. In this post, I’ll explore how to design architectures where multiple AI agents work together effectively.

Why Multiple Agents?

Consider a restaurant. One person doesn’t do everything - hosts greet customers, waiters take orders, chefs cook, and managers oversee operations. Each role has clear responsibilities and communicates with others to deliver a complete experience.

Multi-agent systems work the same way:

  • Specialization: Each agent focuses on what it does best
  • Scalability: Add more agents as complexity grows
  • Modularity: Change one agent without rewriting the whole system
  • Parallel processing: Multiple agents can work simultaneously

flowchart TD
    U[Customer Request] --> O[Orchestrator]
    O --> A1[Policy Agent]
    O --> A2[Inventory Agent]
    O --> A3[Refund Agent]
    A1 --> O
    A2 --> O
    A3 --> O
    O --> R[Response]

    style O fill:#fff3e0
    style A1 fill:#e3f2fd
    style A2 fill:#e8f5e9
    style A3 fill:#fce4ec

Two Primary Architecture Patterns

The Orchestrator Pattern (Hub-and-Spoke)

A central coordinator directs specialized workers, similar to a project manager delegating tasks:

flowchart TD
    T[Task] --> O[Orchestrator]
    O --> W1[Worker 1]
    O --> W2[Worker 2]
    O --> W3[Worker 3]
    W1 --> O
    W2 --> O
    W3 --> O
    O --> R[Result]

    style O fill:#fff3e0

Characteristics:

  • All communication flows through the central orchestrator
  • Orchestrator decides task breakdown and delegation
  • Workers report back to orchestrator
  • Clear, predictable workflow

Best for:

  • Order fulfillment systems
  • Support ticket routing
  • Complex report generation
  • Structured multi-step workflows

Advantages:

  • Easy to debug (clear flow)
  • Enforces correct execution order
  • Manages state of entire process
  • Clear mapping of business functions

The Peer-to-Peer Pattern

Agents communicate directly with each other without central coordination:

flowchart LR
    A1[Agent 1] <--> A2[Agent 2]
    A2 <--> A3[Agent 3]
    A1 <--> A3
    A3 <--> A4[Agent 4]

    style A1 fill:#e3f2fd
    style A2 fill:#e8f5e9
    style A3 fill:#fce4ec
    style A4 fill:#fff3e0

Characteristics:

  • Agents broadcast requests or directly query peers
  • No single point of control
  • More flexible but less predictable

Best for:

  • Investigative tasks where sequence is unknown
  • Complex diagnosis scenarios
  • Systems where a central coordinator would become a bottleneck

Challenges:

  • Managing overall communication flow
  • Ensuring tasks don’t get dropped
  • Tracking overall state as the number of agents grows
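To make the peer-to-peer idea concrete, here is a minimal sketch with no LLM calls: each agent holds direct references to its peers and passes a query along until someone with the right skill answers. The `PeerAgent` class, its `skills` set, and the `visited` bookkeeping are hypothetical names chosen for illustration; tracking visited agents is just one simple way to keep a query from looping or silently disappearing.

```python
from typing import List, Optional, Set


class PeerAgent:
    """Hypothetical peer that answers a topic itself or asks its neighbours."""

    def __init__(self, name: str, skills: Set[str]):
        self.name = name
        self.skills = skills
        self.peers: List["PeerAgent"] = []

    def connect(self, other: "PeerAgent") -> None:
        # Links are bidirectional, matching the <--> arrows in the diagram.
        if other not in self.peers:
            self.peers.append(other)
            other.peers.append(self)

    def ask(self, topic: str, visited: Optional[Set[str]] = None) -> Optional[str]:
        # `visited` stops a query from circulating forever between peers.
        visited = visited if visited is not None else set()
        visited.add(self.name)
        if topic in self.skills:
            return f"{self.name} handled '{topic}'"
        for peer in self.peers:
            if peer.name not in visited:
                answer = peer.ask(topic, visited)
                if answer is not None:
                    return answer
        # Nobody reachable could handle it: the caller must deal with None.
        return None


a1 = PeerAgent("agent1", {"logs"})
a2 = PeerAgent("agent2", {"metrics"})
a3 = PeerAgent("agent3", {"network"})
a4 = PeerAgent("agent4", {"billing"})
a1.connect(a2)
a2.connect(a3)
a1.connect(a3)
a3.connect(a4)

print(a1.ask("billing"))  # answered by agent4, reached through other peers
print(a1.ask("weather"))  # None, because no peer has this skill
```

Note that nothing in this sketch records which path a query took. That is the state-tracking challenge from the list above, and it gets harder as more peers join.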

Designing Your Architecture

Before writing code, sketch your design. Even simple boxes and arrows force you to answer critical questions:

  • Who talks to whom?
  • What information does each agent need?
  • How does the overall job get done?
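One lightweight way to keep a sketch honest is to write the "who talks to whom" answer down as data and check it. The example below is an assumption-laden illustration, not part of any framework: `DESIGN`, `undefined_agents`, and `unreachable_agents` are hypothetical names, and the agents mirror the customer-request diagram from earlier in the post.

```python
from typing import Dict, List

# Hypothetical design: each agent maps to the agents it is allowed to call.
DESIGN: Dict[str, List[str]] = {
    "orchestrator": ["policy", "inventory", "refund"],
    "policy": ["orchestrator"],
    "inventory": ["orchestrator"],
    "refund": ["orchestrator"],
}


def undefined_agents(design: Dict[str, List[str]]) -> List[str]:
    """Agents that someone calls but that were never defined."""
    targets = {target for targets in design.values() for target in targets}
    return sorted(targets - set(design))


def unreachable_agents(design: Dict[str, List[str]], start: str) -> List[str]:
    """Defined agents that no chain of calls from `start` ever reaches."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(design.get(node, []))
    return sorted(set(design) - seen)


print(undefined_agents(DESIGN))
print(unreachable_agents(DESIGN, "orchestrator"))
```

Both checks return an empty list for this design. A non-empty result usually means the diagram and the plan have drifted apart.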

Defining Agent Roles

Each agent needs a clear, bounded responsibility:

# Bad: Vague role
research_agent = "You help with research tasks"

# Good: Specific role with boundaries
research_agent = """
You are a Research Specialist. Your responsibilities:
- Search for factual information on requested topics
- Summarize findings with source citations
- Flag uncertainty when information is incomplete

You do NOT:
- Make recommendations or decisions
- Access external APIs or databases
- Perform calculations or analysis
"""
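If you define many roles, it can help to keep the responsibilities and the exclusions as structured data and render the prompt from them. This is only a sketch: `AgentRole` and `to_prompt` are hypothetical names, and the output format simply copies the "Good" prompt above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentRole:
    """Hypothetical container for a bounded agent role."""
    title: str
    responsibilities: List[str]
    out_of_scope: List[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        # Render the same layout as the hand-written prompt above.
        lines = [f"You are a {self.title}. Your responsibilities:"]
        lines += [f"- {item}" for item in self.responsibilities]
        if self.out_of_scope:
            lines += ["", "You do NOT:"]
            lines += [f"- {item}" for item in self.out_of_scope]
        return "\n".join(lines)


research_role = AgentRole(
    title="Research Specialist",
    responsibilities=[
        "Search for factual information on requested topics",
        "Summarize findings with source citations",
        "Flag uncertainty when information is incomplete",
    ],
    out_of_scope=[
        "Make recommendations or decisions",
        "Access external APIs or databases",
        "Perform calculations or analysis",
    ],
)

print(research_role.to_prompt())
```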

Communication Protocols

Define how agents exchange information:

from dataclasses import dataclass
from typing import Optional, Dict, Any

@dataclass
class AgentMessage:
    """Standard message format between agents"""
    sender: str
    receiver: str
    task: str
    payload: Dict[str, Any]
    priority: int = 1
    requires_response: bool = True
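Here is a small sketch of how such a message might be queued and sent over the wire. The `Inbox` class is hypothetical, and so is the convention that a lower `priority` number is delivered first; the post does not specify which direction priority runs. `AgentMessage` is repeated so the example runs on its own.

```python
import heapq
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class AgentMessage:
    """Same fields as the message format above."""
    sender: str
    receiver: str
    task: str
    payload: Dict[str, Any]
    priority: int = 1
    requires_response: bool = True


class Inbox:
    """Hypothetical per-agent inbox; lower priority number comes out first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, AgentMessage]] = []
        self._counter = 0  # tie-breaker keeps arrival order within a priority

    def put(self, msg: AgentMessage) -> None:
        heapq.heappush(self._heap, (msg.priority, self._counter, msg))
        self._counter += 1

    def get(self) -> AgentMessage:
        return heapq.heappop(self._heap)[2]


inbox = Inbox()
inbox.put(AgentMessage("orchestrator", "inventory", "check_stock", {"sku": "A-1"}, priority=2))
inbox.put(AgentMessage("orchestrator", "policy", "check_policy", {"order_type": "return"}))

first = inbox.get()
wire = json.dumps(asdict(first))             # serialize for a queue or HTTP call
restored = AgentMessage(**json.loads(wire))  # rebuild on the receiving side
print(first.task, restored == first)
```

Because every field is a plain JSON-friendly type, the round trip through `asdict` and `json` needs no custom encoder. That stops being true once `payload` carries objects such as datetimes.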

Implementing an Orchestrator Pattern

Here’s a practical implementation using Python:

from openai import OpenAI
from typing import List, Optional
import json

client = OpenAI()  # reads OPENAI_API_KEY from the environment

class BaseAgent:
    """Base class for specialized worker agents"""

    def __init__(self, name: str, system_prompt: str, tools: Optional[List] = None):
        self.name = name
        self.system_prompt = system_prompt
        self.tools = tools or []  # stored for later use; not sent to the model in this example

    def run(self, task: str, context: Optional[dict] = None) -> str:
        messages = [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": self._build_prompt(task, context)}
        ]

        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            temperature=0.7
        )

        # content can be None, so fall back to an empty string
        return response.choices[0].message.content or ""

    def _build_prompt(self, task: str, context: Optional[dict]) -> str:
        if context:
            ctx = "\n".join(f"{k}: {v}" for k, v in context.items())
            return f"Context:\n{ctx}\n\nTask: {task}"
        return task


class Orchestrator:
    """Central coordinator that manages worker agents"""

    def __init__(self):
        self.policy_agent = BaseAgent(
            name="PolicyChecker",
            system_prompt="""You check if requests comply with company policies.
            Return JSON: {"compliant": true/false, "reason": "explanation"}"""
        )

        self.inventory_agent = BaseAgent(
            name="InventoryManager",
            system_prompt="""You check product inventory levels.
            Return JSON: {"in_stock": true/false, "quantity": number}"""
        )

        self.fulfillment_agent = BaseAgent(
            name="FulfillmentProcessor",
            system_prompt="""You process order fulfillment.
            Return confirmation with delivery estimate."""
        )

    def handle_order(self, order: dict) -> str:
        # Step 1: Check policy compliance
        policy_result = self.policy_agent.run(
            f"Check if this order is allowed: {order}",
            context={"order_type": order.get("type")}
        )

        # The model is asked for JSON but may not comply, so guard the parse
        try:
            policy_data = json.loads(policy_result)
        except json.JSONDecodeError:
            return "Order rejected: policy check returned an unreadable response"
        if not policy_data.get("compliant"):
            return f"Order rejected: {policy_data.get('reason')}"

        # Step 2: Check inventory
        inventory_result = self.inventory_agent.run(
            f"Check stock for: {order.get('items')}"
        )

        try:
            inventory_data = json.loads(inventory_result)
        except json.JSONDecodeError:
            return "Order cannot be fulfilled: inventory check returned an unreadable response"
        if not inventory_data.get("in_stock"):
            return "Order cannot be fulfilled: Items out of stock"

        # Step 3: Process fulfillment
        fulfillment_result = self.fulfillment_agent.run(
            f"Process order: {order}",
            context={
                "policy_approved": True,
                "inventory_confirmed": True
            }
        )

        return fulfillment_result
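The orchestrator above hands each worker's reply to `json.loads`. In practice a model may wrap its JSON in a Markdown code fence or add a sentence around it, so a more tolerant parser can be worth having. The helper below is a sketch under that assumption; `parse_json_reply` is a hypothetical name, and it only looks for a single JSON object.

```python
import json
import re
from typing import Any, Dict, Optional

_TICKS = "`" * 3  # built programmatically so this example contains no literal fence
_FENCED = re.compile(_TICKS + r"(?:json)?\s*(.*?)\s*" + _TICKS, re.DOTALL)


def parse_json_reply(reply: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object found in `reply`, or None if there is none."""
    candidates = [reply]

    # Candidate 2: the contents of a fenced block, if the reply has one.
    fenced = _FENCED.search(reply)
    if fenced:
        candidates.append(fenced.group(1))

    # Candidate 3: everything between the first "{" and the last "}".
    start, end = reply.find("{"), reply.rfind("}")
    if 0 <= start < end:
        candidates.append(reply[start:end + 1])

    for text in candidates:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict):
            return data
    return None


print(parse_json_reply('{"compliant": true, "reason": "within return window"}'))
print(parse_json_reply('Sure! {"in_stock": false, "quantity": 0} Hope that helps.'))
print(parse_json_reply("I could not check the policy."))
```

A `None` result lets the orchestrator choose between retrying, rejecting, and escalating, rather than crashing in the middle of an order.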

Orchestration Patterns

Sequential Execution

Tasks execute one after another, each building on previous results:

def sequential_workflow(self, request: str) -> str:
    # Step 1 must complete before Step 2
    research = self.research_agent.run(request)

    # Step 2 uses output from Step 1
    analysis = self.analysis_agent.run(
        "Analyze these findings",
        context={"research": research}
    )

    # Step 3 uses output from Step 2
    report = self.report_agent.run(
        "Generate report",
        context={"analysis": analysis}
    )

    return report
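The same chaining can be expressed as a loop over a list of steps, which makes it easier to add or reorder stages. This is a sketch with stand-in agents rather than real LLM calls; `run_pipeline`, `Step`, and `fake_agent` are hypothetical names, and the follow-up task text is a placeholder.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A step is (name used as the context key, callable taking task and context).
Step = Tuple[str, Callable[[str, Optional[dict]], str]]


def run_pipeline(request: str, steps: List[Step]) -> Dict[str, str]:
    """Run steps in order; each step receives the previous step's output as context."""
    results: Dict[str, str] = {}
    task: str = request
    context: Optional[dict] = None
    for name, run in steps:
        output = run(task, context)
        results[name] = output
        # The next step gets a generic task plus the output just produced.
        task, context = f"Continue from {name}", {name: output}
    return results


def fake_agent(label: str) -> Callable[[str, Optional[dict]], str]:
    """Stand-in for agent.run that reports which context keys it received."""
    def run(task: str, context: Optional[dict] = None) -> str:
        seen = ",".join(context) if context else "nothing"
        return f"{label}(saw {seen})"
    return run


results = run_pipeline(
    "EV market overview",
    [
        ("research", fake_agent("research")),
        ("analysis", fake_agent("analysis")),
        ("report", fake_agent("report")),
    ],
)
print(results["report"])
```

Keeping every intermediate result in `results` also gives you something to log or inspect when the final report looks wrong.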

Parallel Execution

Independent tasks run simultaneously:

import concurrent.futures

def parallel_workflow(self, company: str) -> dict:
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Launch all tasks simultaneously
        news_future = executor.submit(
            self.news_agent.run, f"Get news for {company}"
        )
        financials_future = executor.submit(
            self.financials_agent.run, f"Get financials for {company}"
        )
        competitors_future = executor.submit(
            self.competitor_agent.run, f"Analyze competitors of {company}"
        )

        # Collect all results
        return {
            "news": news_future.result(),
            "financials": financials_future.result(),
            "competitors": competitors_future.result()
        }
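In the workflow above, one failing agent raises from `.result()` and the other answers are lost. The sketch below shows one way to isolate failures so each branch reports its own outcome. `run_parallel` and the two stand-in functions are hypothetical, and the result shape (`ok`, `value`, `error`) is just one possible convention.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict


def run_parallel(tasks: Dict[str, Callable[[], str]]) -> Dict[str, dict]:
    """Run independent callables in threads; one failure does not sink the rest."""
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as executor:
        futures = {executor.submit(fn): name for name, fn in tasks.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = {"ok": True, "value": future.result()}
            except Exception as exc:
                # Record the failure instead of letting it abort the whole batch.
                results[name] = {"ok": False, "error": str(exc)}
    return results


def get_news() -> str:
    return "3 headlines"


def get_financials() -> str:
    raise RuntimeError("feed unavailable")


results = run_parallel({"news": get_news, "financials": get_financials})
print(results)
```

Threads suit this workload because agent calls spend most of their time waiting on network I/O. Note that a thread which has already started cannot be cancelled, so any timeout has to be enforced inside the call itself.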

Conditional Branching

Workflow branches based on intermediate results:

def conditional_workflow(self, request: dict) -> str:
    # First, classify the request
    classification = self.classifier_agent.run(
        f"Classify this request: {request}"
    )

    # Branch based on classification
    if "refund" in classification.lower():
        return self.refund_agent.run(str(request))
    elif "inquiry" in classification.lower():
        return self.inquiry_agent.run(str(request))
    elif "complaint" in classification.lower():
        return self.complaint_agent.run(str(request))
    else:
        return self.general_agent.run(str(request))
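Substring matching is easy to fool: a reply such as "not a refund, this is an inquiry" would take the refund branch. One alternative, sketched here, is to instruct the classifier to answer with a single label, normalize that label, and look it up in a dispatch table. `ALLOWED_LABELS`, `normalize_label`, `route`, and the lambda handlers are all hypothetical stand-ins for the agents above.

```python
from typing import Callable, Dict

ALLOWED_LABELS = ("refund", "inquiry", "complaint")


def normalize_label(raw: str) -> str:
    """Map a classifier reply to exactly one known label, else 'general'."""
    cleaned = raw.strip().strip(".\"'").lower()
    return cleaned if cleaned in ALLOWED_LABELS else "general"


def route(raw_label: str, request: str, handlers: Dict[str, Callable[[str], str]]) -> str:
    """Dispatch to the handler registered for the normalized label."""
    label = normalize_label(raw_label)
    return handlers.get(label, handlers["general"])(request)


# Stand-ins for refund_agent.run, inquiry_agent.run, and so on.
handlers: Dict[str, Callable[[str], str]] = {
    name: (lambda req, n=name: f"{n} agent got: {req}")
    for name in (*ALLOWED_LABELS, "general")
}

print(route(" Refund.\n", "order 42 arrived broken", handlers))
print(route("not a refund, this is an inquiry", "where is my order?", handlers))
```

With this table, adding a new branch is a new dictionary entry, not another `elif`. Anything the classifier says outside the allowed labels falls through to the general handler.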

Building Effective Tools for Agents

Agents need well-defined tools to interact with external systems:

from langchain_core.tools import tool

# inventory_db and payment_system stand in for your own backend clients;
# they are not defined in this snippet.

@tool
def check_inventory(product_id: str) -> dict:
    """
    Check current inventory stock level for a product.

    Args:
        product_id: The unique product identifier or SKU

    Returns:
        Stock level and availability status
    """
    try:
        # Query inventory system
        stock = inventory_db.get_stock(product_id)
        return {
            "product_id": product_id,
            "quantity": stock,
            "available": stock > 0
        }
    except Exception as e:
        return {"error": f"Failed to check inventory: {str(e)}"}


@tool
def process_refund(order_id: str, amount: float, reason: str) -> dict:
    """
    Process a refund for an order.

    Args:
        order_id: The order to refund
        amount: Refund amount in dollars
        reason: Reason for the refund

    Returns:
        Refund confirmation with transaction ID
    """
    try:
        result = payment_system.refund(order_id, amount, reason)
        return {
            "success": True,
            "transaction_id": result.id,
            "amount_refunded": amount
        }
    except Exception as e:
        return {"success": False, "error": str(e)}

Tool Design Principles:

  1. Clear names: check_inventory not ci
  2. Detailed docstrings: LLMs rely on descriptions to choose tools
  3. Type annotations: Required for proper schema generation
  4. Error handling: Return clear error messages, don’t crash
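To see why names, docstrings, and type annotations matter, it helps to look at how a tool description can be derived from a plain function. The sketch below uses only the standard library and is a simplified illustration, not how LangChain builds its schemas; `describe_tool` and the type mapping are assumptions made for this example.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints

# Simplified mapping; anything not listed here is described as a string.
_JSON_TYPES = {str: "string", int: "integer", float: "number",
               bool: "boolean", dict: "object", list: "array"}


def describe_tool(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a JSON-schema-like description from a function's signature."""
    hints = get_type_hints(fn)
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        if name not in hints:
            # Without an annotation there is nothing to tell the model.
            raise TypeError(f"{fn.__name__}: parameter '{name}' needs a type annotation")
        properties[name] = {"type": _JSON_TYPES.get(hints[name], "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def check_inventory(product_id: str, warehouse: str = "main") -> dict:
    """Check current inventory stock level for a product."""
    return {"product_id": product_id, "warehouse": warehouse, "quantity": 0}


schema = describe_tool(check_inventory)
print(schema)
```

The function name and docstring are the only things the model sees when deciding whether to call the tool. A name like `ci` with no docstring leaves it guessing.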

Design Best Practices

1. Separation of Concerns

Each agent has one clear job:

# Good: Focused responsibilities
class DataValidationAgent:
    """Validates incoming data against schemas"""
    pass

class DataTransformationAgent:
    """Transforms data between formats"""
    pass

class DataStorageAgent:
    """Persists data to appropriate storage"""
    pass

2. Clear Interfaces

Define what each agent expects and produces:

class AgentInterface:
    """
    Input: Raw customer message (str)
    Output: {
        "intent": "refund|inquiry|complaint",
        "confidence": float,
        "entities": {"order_id": str, "product": str}
    }
    """
    pass
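A docstring documents the contract but does not enforce it. As a sketch, the same output shape can be checked at the boundary before any downstream agent relies on it. `IntentResult` and `parse_intent_result` are hypothetical names, and the 0 to 1 range for `confidence` is an assumption the interface above does not state.

```python
from dataclasses import dataclass
from typing import Any, Dict

VALID_INTENTS = {"refund", "inquiry", "complaint"}


@dataclass(frozen=True)
class IntentResult:
    """Typed version of the output described in the interface above."""
    intent: str
    confidence: float
    entities: Dict[str, str]


def parse_intent_result(raw: Dict[str, Any]) -> IntentResult:
    """Validate a raw dict against the interface, raising ValueError on mismatch."""
    intent = raw.get("intent")
    if intent not in VALID_INTENTS:
        raise ValueError(f"unknown intent: {intent!r}")

    confidence = raw.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:  # range is an assumption
        raise ValueError(f"confidence must be a number in [0, 1], got {confidence!r}")

    entities = raw.get("entities") or {}
    if not isinstance(entities, dict):
        raise ValueError("entities must be an object")

    return IntentResult(intent, float(confidence), {str(k): str(v) for k, v in entities.items()})


result = parse_intent_result(
    {"intent": "refund", "confidence": 0.92, "entities": {"order_id": "A123"}}
)
print(result)
```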

3. Graceful Degradation

Plan for failures at every step:

def robust_workflow(self, request: str) -> str:
    try:
        result = self.primary_agent.run(request)
    except Exception as e:
        # Try backup agent
        try:
            result = self.backup_agent.run(request)
        except Exception:
            # Escalate to human
            return self.escalate_to_human(request, error=str(e))

    return result
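The nested `try` blocks above get harder to read as soon as there are more than two fallbacks or you want retries. A loop over an ordered list of agents is one way to generalize it. This is a sketch: `run_with_fallbacks`, the retry count, and the backoff delay are illustrative choices, and unlike the version above it reports the most recent error and not the primary agent's.

```python
import time
from typing import Callable, List, Optional, Sequence


def run_with_fallbacks(
    request: str,
    agents: Sequence[Callable[[str], str]],
    escalate: Callable[[str, str], str],
    retries: int = 2,
    base_delay: float = 0.0,
) -> str:
    """Try each agent in order, retrying with exponential backoff, then escalate."""
    last_error: Optional[Exception] = None
    for agent in agents:
        for attempt in range(retries):
            try:
                return agent(request)
            except Exception as exc:
                last_error = exc
                time.sleep(base_delay * (2 ** attempt))
    return escalate(request, str(last_error))


calls: List[str] = []


def primary(request: str) -> str:
    calls.append("primary")
    raise TimeoutError("primary down")


def backup(request: str) -> str:
    calls.append("backup")
    return f"backup handled: {request}"


def escalate_to_human(request: str, error: str) -> str:
    return f"escalated: {request} ({error})"


answer = run_with_fallbacks("refund order 42", [primary, backup], escalate_to_human)
print(answer, calls)
```

In a real system you would likely catch narrower exception types and set `base_delay` above zero, so a struggling service is not hit again immediately.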

4. Observable Execution

Log the start, outcome, and any error of every agent call so a failed run can be traced afterwards:

import logging

logger = logging.getLogger("multi_agent")

def logged_agent_call(self, agent: BaseAgent, task: str) -> str:
    logger.info(f"[{agent.name}] Starting task: {task[:100]}...")

    try:
        result = agent.run(task)
        logger.info(f"[{agent.name}] Completed successfully")
        logger.debug(f"[{agent.name}] Result: {result[:200]}...")
        return result
    except Exception as e:
        logger.error(f"[{agent.name}] Failed: {str(e)}")
        raise
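The same idea can be packaged as a decorator so that every agent method gets identical logging, plus two details that tend to matter once several agents run at once: a per-call ID to correlate log lines, and the elapsed time. This is a sketch; `traced` and the log message format are hypothetical choices.

```python
import functools
import logging
import time
import uuid
from typing import Any, Callable

logger = logging.getLogger("multi_agent")


def traced(agent_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that logs start, finish, duration, and failures of a call."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            call_id = uuid.uuid4().hex[:8]  # correlates lines from the same call
            start = time.perf_counter()
            logger.info("[%s] call=%s started", agent_name, call_id)
            try:
                result = fn(*args, **kwargs)
            except Exception:
                elapsed = time.perf_counter() - start
                # logger.exception also records the traceback
                logger.exception("[%s] call=%s failed after %.3fs", agent_name, call_id, elapsed)
                raise
            elapsed = time.perf_counter() - start
            logger.info("[%s] call=%s finished in %.3fs", agent_name, call_id, elapsed)
            return result
        return wrapper
    return decorator


@traced("Summarizer")
def summarize(text: str) -> str:
    # Stand-in for a real agent call.
    return text.upper()


logging.basicConfig(level=logging.INFO)
print(summarize("hello"))
```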

Key Takeaways

  1. Choose the right pattern: Orchestrator for structured workflows, peer-to-peer for flexible exploration
  2. Define clear roles: Each agent should have bounded, non-overlapping responsibilities
  3. Design communication protocols: Standardize how agents exchange information
  4. Build robust tools: Well-documented, typed, error-handling tools enable agent capabilities
  5. Plan for failure: Every agent call can fail - design graceful degradation
  6. Start with diagrams: Sketch architecture before coding to catch design issues early

Multi-agent architecture transforms complex problems into manageable, specialized tasks. In the next post, I’ll explore how to route requests, manage data flow, and coordinate state across multiple agents.


This is Part 12 of my series on building intelligent AI systems. Next: multi-agent routing, data flow, and state coordination.
