Routing and Parallelization Patterns for AI Agents

Sequential chains work well when tasks have a clear order, but real-world problems often require more flexibility. Sometimes you need to route tasks to different specialists based on their content. Other times, multiple agents should work simultaneously on different aspects of a problem. In this post, I’ll cover two powerful patterns: routing for intelligent task dispatch and parallelization for concurrent processing.

The Routing Pattern

Think of routing like a sophisticated mail sorting facility. Instead of one person handling every type of mail, specialized systems ensure each piece reaches the right destination quickly.

flowchart TD
    I[Input] --> R{Router}
    R -->|Type A| A1[Specialist A]
    R -->|Type B| A2[Specialist B]
    R -->|Type C| A3[Specialist C]
    A1 --> O[Output]
    A2 --> O
    A3 --> O

    style R fill:#fff3e0
    style A1 fill:#e3f2fd
    style A2 fill:#e8f5e9
    style A3 fill:#fce4ec

Why Route?

Routing offers four key benefits:

  1. Task Specialization: Agents optimized for specific tasks perform better than generalists
  2. Resource Optimization: Route simple tasks to faster, cheaper models and complex ones to more powerful models (see the sketch after this list)
  3. Flexibility: Handle diverse request types through a single entry point
  4. Scalability: Add new specialists without restructuring the entire system
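
To make benefit 2 concrete, here is a minimal sketch of cost-based routing. The length threshold is a deliberately crude stand-in for a real complexity classifier, and the model names are only examples:

from openai import OpenAI

client = OpenAI()

def route_by_cost(query: str) -> str:
    """Send short, routine queries to a cheap model and longer,
    open-ended ones to a more capable one."""
    # Crude proxy for complexity; swap in a real classifier as needed
    model = "gpt-4o-mini" if len(query) < 200 else "gpt-4o"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content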

Two Stages of Routing

Every routing workflow has two core stages:

Stage 1: Classification

Analyze the incoming task to determine its type, category, or complexity.

Rule-based classification uses programmatic logic:

def classify_by_rules(message: str) -> str:
    message_lower = message.lower()

    if any(word in message_lower for word in ["refund", "return", "money back"]):
        return "billing"
    elif any(word in message_lower for word in ["broken", "not working", "bug"]):
        return "technical"
    elif any(word in message_lower for word in ["how to", "tutorial", "guide"]):
        return "documentation"
    else:
        return "general"

LLM-based classification handles nuance better:

def classify_with_llm(message: str) -> str:
    system_prompt = """
    Classify the customer message into one category:
    - billing: Payment, refunds, subscription issues
    - technical: Bugs, errors, technical problems
    - documentation: How-to questions, feature explanations
    - general: Everything else

    Respond with only the category name.
    """

    return call_llm(system_prompt, message, temperature=0)

LLM classification excels when:

  • Categories have fuzzy boundaries
  • Context matters for classification
  • New categories emerge over time
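
One practical caveat: an LLM classifier occasionally returns text outside the expected label set (extra words, different casing). A small normalization guard keeps the dispatch stage safe; this wrapper is my addition around the classify_with_llm function above:

VALID_CATEGORIES = {"billing", "technical", "documentation", "general"}

def classify_safely(message: str) -> str:
    """Normalize the LLM's answer; fall back to 'general' if it
    isn't a known category."""
    category = classify_with_llm(message).strip().lower()
    return category if category in VALID_CATEGORIES else "general"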

Stage 2: Task Dispatch

Direct the classified input to the appropriate specialist:

def dispatch_to_agent(category: str, message: str) -> str:
    agents = {
        "billing": billing_agent,
        "technical": technical_agent,
        "documentation": docs_agent,
        "general": general_agent
    }

    agent = agents.get(category, general_agent)
    return agent(message)

Complete Routing Implementation

from openai import OpenAI

client = OpenAI()

def call_llm(system_prompt: str, user_prompt: str, temperature: float = 0.7) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=temperature
    )
    return response.choices[0].message.content


def router_agent(query: str) -> str:
    """Classify and route to appropriate specialist"""

    # Stage 1: Classification
    classification_prompt = """
    You are a task router. Analyze the query and select the best agent:

    - ResearchAgent: Factual questions, information gathering
    - AnalysisAgent: Data interpretation, comparisons, evaluations
    - CreativeAgent: Writing, brainstorming, content creation

    Respond with only the agent name.
    """

    agent_choice = call_llm(classification_prompt, query, temperature=0)

    # Stage 2: Dispatch
    if "Research" in agent_choice:
        return research_agent(query)
    elif "Analysis" in agent_choice:
        return analysis_agent(query)
    elif "Creative" in agent_choice:
        return creative_agent(query)
    else:
        return general_agent(query)


def research_agent(query: str) -> str:
    system_prompt = """
    You are a research specialist. Provide factual, well-sourced
    information. Be thorough but concise.
    """
    return call_llm(system_prompt, query)


def analysis_agent(query: str) -> str:
    system_prompt = """
    You are a data analyst. Evaluate information critically,
    identify patterns, and provide structured insights.
    """
    return call_llm(system_prompt, query)


def creative_agent(query: str) -> str:
    system_prompt = """
    You are a creative specialist. Generate engaging, original
    content with a fresh perspective.
    """
    return call_llm(system_prompt, query, temperature=0.9)


def general_agent(query: str) -> str:
    # Fallback specialist so the router's else branch is covered
    system_prompt = "You are a helpful general-purpose assistant."
    return call_llm(system_prompt, query)

Advanced: Routing with Orchestration

Sometimes the router needs to gather information from multiple sources before final dispatch:

flowchart TD
    Q[Query] --> R{Router}
    R -->|Pricing Query| P[Product Research]
    R -->|Pricing Query| C[Customer Analysis]
    P --> S[Pricing Strategist]
    C --> S
    S --> O[Response]

    style R fill:#fff3e0
    style P fill:#e3f2fd
    style C fill:#e8f5e9
    style S fill:#fce4ec

The router identifies that pricing queries need context from both product and customer data before the pricing specialist can respond.
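
A minimal sketch of that orchestration step, assuming hypothetical product_research_agent, customer_analysis_agent, and pricing_strategist_agent helpers built on call_llm like the specialists above:

def handle_pricing_query(query: str) -> str:
    """Gather context from two sources, then hand the enriched
    query to the pricing specialist."""
    # These lookups are independent, so they could also run in
    # parallel (see the next section)
    product_context = product_research_agent(query)
    customer_context = customer_analysis_agent(query)

    enriched_query = f"""
    Query: {query}

    Product research:
    {product_context}

    Customer analysis:
    {customer_context}
    """
    return pricing_strategist_agent(enriched_query)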


The Parallelization Pattern

When subtasks don’t depend on each other, run them simultaneously. Parallelization follows a scatter-gather pattern: distribute work to multiple agents, then consolidate results.

flowchart TD
    I[Input] --> S[Scatter]
    S --> A1[Agent 1]
    S --> A2[Agent 2]
    S --> A3[Agent 3]
    A1 --> G[Gather]
    A2 --> G
    A3 --> G
    G --> O[Output]

    style S fill:#e3f2fd
    style G fill:#e8f5e9
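
The sketches in this section rely on a parallel_execute helper. It isn't a library function, so here is one minimal way to build it on Python's standard concurrent.futures module: it takes a list of zero-argument callables and returns their results in the order the tasks were given.

from concurrent.futures import ThreadPoolExecutor

def parallel_execute(tasks: list) -> list:
    """Run zero-argument callables concurrently and return their
    results in input order."""
    with ThreadPoolExecutor(max_workers=max(len(tasks), 1)) as executor:
        futures = [executor.submit(task) for task in tasks]
        return [future.result() for future in futures]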

The Golden Rule: Independence

Parallelization only works when subtasks are independent:

  • Agent A shouldn’t need to wait for Agent B’s output
  • If there’s a dependency, use sequential chaining instead

Task Decomposition Strategies

Three ways to split work for parallel processing:

1. Sectioning (Sharding)

Split large inputs into chunks processed simultaneously:

def parallel_summarize(long_document: str) -> str:
    # Split into sections
    sections = split_into_sections(long_document)

    # Summarize each in parallel
    summaries = parallel_execute([
        lambda s=section: summarize_agent(s)
        for section in sections
    ])

    # Combine
    return combine_summaries(summaries)

Use case: Summarizing a 100-page report by processing each chapter simultaneously.
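
The helpers above (split_into_sections, summarize_agent, combine_summaries) are left undefined; here is one plausible minimal version of each, reusing the call_llm wrapper from earlier:

def split_into_sections(document: str, max_chars: int = 4000) -> list:
    """Naive chunking on paragraph boundaries; production systems
    often split on chapters or token counts instead."""
    sections, current = [], ""
    for paragraph in document.split("\n\n"):
        if current and len(current) + len(paragraph) > max_chars:
            sections.append(current)
            current = ""
        current += paragraph + "\n\n"
    if current:
        sections.append(current)
    return sections


def summarize_agent(section: str) -> str:
    return call_llm("Summarize the following text concisely.", section)


def combine_summaries(summaries: list) -> str:
    joined = "\n\n".join(summaries)
    # One extra LLM pass smooths the stitched-together summaries
    return call_llm("Merge these section summaries into one coherent summary.", joined)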

2. Aspect-Based Decomposition

Different agents analyze different facets of the same input:

flowchart LR
    P[Product] --> T[Technical Specs]
    P --> S[Sentiment Analysis]
    P --> C[Competitive Pricing]
    T --> R[Combined Report]
    S --> R
    C --> R

    style T fill:#e3f2fd
    style S fill:#fff3e0
    style C fill:#e8f5e9

Use case: Product analysis where technical specs, customer sentiment, and pricing each require different expertise.
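
With the parallel_execute helper from above, aspect-based decomposition is just different prompts over the same input. The three specialist prompts here are illustrative:

def analyze_product(product_description: str) -> dict:
    specs, sentiment, pricing = parallel_execute([
        lambda: call_llm("You are a hardware engineer. Summarize the technical specs.", product_description),
        lambda: call_llm("You analyze customer feedback. Assess likely reception.", product_description),
        lambda: call_llm("You are a pricing analyst. Compare against typical competitors.", product_description),
    ])
    return {"specs": specs, "sentiment": sentiment, "pricing": pricing}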

3. Diversity/Voting

Run the same task multiple times for reliability or creative variety:

def diverse_generation(prompt: str, num_variations: int = 3) -> list:
    """Generate multiple creative variations"""
    # creative_agent already samples at temperature 0.9 internally
    return parallel_execute([
        lambda: creative_agent(prompt)
        for _ in range(num_variations)
    ])


def majority_vote(question: str, num_voters: int = 5) -> str:
    """Get consensus answer through voting"""
    # classifier_agent stands for any agent that returns a short label
    answers = parallel_execute([
        lambda: classifier_agent(question)
        for _ in range(num_voters)
    ])

    # Return most common answer
    return max(set(answers), key=answers.count)

Aggregation Strategies

After parallel execution, combine results:

| Strategy      | Description                 | Use Case                      |
|---------------|-----------------------------|-------------------------------|
| Concatenation | Join outputs together       | Chapter summaries → document  |
| Selection     | Pick best output            | Choose top creative option    |
| Voting        | Majority wins               | Classification consensus      |
| Synthesis     | LLM combines intelligently  | Multi-perspective report      |
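
Voting was shown in majority_vote above, and synthesis is exactly what the threading example below does. Selection deserves a sketch too: a common approach is an LLM-as-judge pass over the candidates (the judging prompt here is illustrative):

def select_best(task: str, candidates: list) -> str:
    """Ask an LLM judge to pick the strongest candidate."""
    numbered = "\n\n".join(
        f"Option {i + 1}:\n{c}" for i, c in enumerate(candidates)
    )
    judge_prompt = "You are a judge. Reply with only the number of the best option."
    choice = call_llm(judge_prompt, f"Task: {task}\n\n{numbered}", temperature=0)
    try:
        return candidates[int(choice.strip()) - 1]
    except (ValueError, IndexError):
        return candidates[0]  # fall back if the judge's reply isn't parseable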

Implementation with Threading

import threading
from typing import Dict, Any

# Shared storage for results
agent_outputs: Dict[str, Any] = {}


class PolicyAgent:
    def run(self, query: str):
        system_prompt = "You are a policy expert. Analyze regulatory implications."
        agent_outputs["policy"] = call_llm(system_prompt, query)


class TechnologyAgent:
    def run(self, query: str):
        system_prompt = "You are a technology expert. Analyze technical feasibility."
        agent_outputs["technology"] = call_llm(system_prompt, query)


class MarketAgent:
    def run(self, query: str):
        system_prompt = "You are a market analyst. Analyze market dynamics."
        agent_outputs["market"] = call_llm(system_prompt, query)


class SynthesizerAgent:
    def run(self, query: str, inputs: dict) -> str:
        system_prompt = """
        You are a senior analyst. Synthesize multiple perspectives
        into a comprehensive, coherent report.
        """

        combined_input = f"""
        Original Query: {query}

        Policy Analysis:
        {inputs['policy']}

        Technology Analysis:
        {inputs['technology']}

        Market Analysis:
        {inputs['market']}

        Provide an integrated analysis addressing all perspectives.
        """

        return call_llm(system_prompt, combined_input)


def analyze_parallel(query: str) -> str:
    """Run parallel analysis and synthesize"""

    # Create agents
    policy = PolicyAgent()
    tech = TechnologyAgent()
    market = MarketAgent()
    synthesizer = SynthesizerAgent()

    # Create threads
    threads = [
        threading.Thread(target=policy.run, args=(query,)),
        threading.Thread(target=tech.run, args=(query,)),
        threading.Thread(target=market.run, args=(query,)),
    ]

    # Start all threads
    for t in threads:
        t.start()

    # Wait for completion
    for t in threads:
        t.join()

    # Synthesize results
    return synthesizer.run(query, agent_outputs)


# Usage
result = analyze_parallel("What are the implications of AI regulation in healthcare?")

Real-World Example: Contract Analysis

A 50-page enterprise contract needs review. Sequential review by one expert takes too long.

Parallel approach:

flowchart TD
    C[Contract] --> L[Legal Terms Checker]
    C --> CO[Compliance Validator]
    C --> F[Financial Risk Assessor]
    L --> S[Summary Agent]
    CO --> S
    F --> S
    S --> R[Executive Report]

    style L fill:#e3f2fd
    style CO fill:#fff3e0
    style F fill:#fce4ec
    style S fill:#e8f5e9

Three specialists work simultaneously:

  • Legal Terms Checker: Identifies problematic clauses
  • Compliance Validator: Checks regulatory requirements
  • Financial Risk Assessor: Evaluates financial exposure

A synthesizer combines their findings into a comprehensive executive summary.


Combining Routing and Parallelization

These patterns work well together. A router can trigger parallel workflows for complex queries:

def smart_router(query: str) -> str:
    # classify_query stands for either classifier from the routing
    # section (rule-based or LLM-based)
    category = classify_query(query)

    if category == "simple":
        return general_agent(query)

    elif category == "research":
        return research_agent(query)

    elif category == "complex_analysis":
        # Route to parallel analysis workflow
        return analyze_parallel(query)

    # Fallback for anything unclassified
    return general_agent(query)

Key Takeaways

Routing

  1. Classify then dispatch: Two-stage process for intelligent task direction
  2. LLM classification for nuance: Handles fuzzy boundaries better than rules
  3. Specialist agents perform better: Focused prompts outperform generalists
  4. Can orchestrate sub-workflows: Router may gather context before final dispatch

Parallelization

  1. Independence is required: Subtasks must not depend on each other
  2. Three decomposition strategies: Sectioning, aspect-based, diversity/voting
  3. Four aggregation strategies: Concatenation, selection, voting, synthesis
  4. Threading suits I/O-bound concurrency: the GIL prevents CPU parallelism, but threads waiting on API responses run simultaneously

These patterns handle task variety (routing) and task complexity (parallelization). But what about tasks that require iterative improvement or dynamic planning? In the next post, I’ll explore evaluator-optimizer loops and orchestrator-worker patterns.


This is Part 6 of my series on building intelligent AI systems. Next: evaluator-optimizer and orchestrator-worker patterns.
