Evaluator-Optimizer and Orchestrator-Worker Patterns

Some tasks require iteration - generating, evaluating, and refining until quality standards are met. Others need dynamic orchestration - a central coordinator breaking down novel problems and delegating to specialists. In this post, I’ll cover two sophisticated patterns that enable these capabilities: the evaluator-optimizer loop and the orchestrator-worker architecture.

The Evaluator-Optimizer Pattern

This pattern creates an iterative refinement loop where two agents collaborate: one generates content, the other critiques it, and the feedback drives improvement.

flowchart TD
    I[Input] --> O[Optimizer<br>Generate]
    O --> E{Evaluator<br>Assess}
    E -->|Passes| Done[Output]
    E -->|Fails| F[Feedback]
    F --> O

    style O fill:#e3f2fd
    style E fill:#fff3e0
    style Done fill:#c8e6c9

Think of it like a writer-editor relationship: the writer drafts, the editor critiques, the writer revises based on feedback, and this continues until publication standards are met.

Key Components

Optimizer (Generator) Agent: Creates initial output and refines based on feedback. Uses moderate-to-high temperature for creativity.

Evaluator (Critic) Agent: Assesses output against criteria and provides actionable feedback. Uses low temperature for consistent evaluation.

Three Critical Elements

1. Clear Evaluation Criteria

Vague criteria produce vague feedback. Be specific:

# Bad: Vague criteria
criteria = "Make sure it's good and professional"

# Good: Specific, measurable criteria
criteria = """
Evaluate the report against these criteria:
1. Contains executive summary (max 200 words)
2. All claims supported by data
3. No speculative language ("might", "could potentially")
4. Includes risk assessment section
5. Professional tone throughout
"""

2. Actionable Feedback

Feedback should tell the optimizer what to fix and how:

# Bad: Non-actionable
feedback = "The writing needs improvement"

# Good: Actionable
feedback = """
Issues found:
1. Executive summary is 347 words (exceeds 200 limit) - condense key points
2. Paragraph 3 claims "significant growth" without supporting data - add specific metrics
3. Line 45 uses "might increase" - rephrase with certainty or remove
"""

3. Stopping Conditions

Without clear stopping conditions, loops run forever. Define when to stop:

  • Success: All criteria pass
  • Max iterations: Hard limit on attempts
  • Diminishing returns: Improvements become minimal
  • Timeout: Wall-clock time limit
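The implementation below enforces the first two (success and max iterations). A diminishing-returns check is a small extension; here's a minimal sketch, assuming the evaluator were extended to return a numeric quality score (in the code below it returns only pass/fail and feedback):

def should_stop(scores: list, min_improvement: float = 0.02) -> bool:
    """Stop when the latest iteration improved the score by less than the threshold."""
    if len(scores) < 2:
        return False  # need at least two scores to measure improvement
    return scores[-1] - scores[-2] < min_improvement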

Implementation

from openai import OpenAI
from typing import Tuple

client = OpenAI()

def call_llm(system_prompt: str, user_prompt: str, temperature: float = 0.7) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=temperature
    )
    return response.choices[0].message.content


def optimizer_agent(task: str, feedback: str = None) -> str:
    """Generate or refine content based on feedback"""

    system_prompt = """
You are a skilled content creator. Generate high-quality content
that meets professional standards. If feedback is provided,
carefully address each point in your revision.
"""

    if feedback:
        user_prompt = f"""
Task: {task}

Previous feedback to address:
{feedback}

Create an improved version addressing all feedback points.
"""
    else:
        user_prompt = f"Task: {task}"

    return call_llm(system_prompt, user_prompt, temperature=0.7)


def evaluator_agent(content: str, criteria: str) -> Tuple[bool, str]:
    """Evaluate content and provide feedback"""

    system_prompt = """
You are a strict quality evaluator. Assess content against
the provided criteria. Be thorough and specific in your feedback.

Respond in this format:
PASSED: true/false
FEEDBACK: [specific issues and how to fix them, or "All criteria met"]
"""

    user_prompt = f"""
Criteria:
{criteria}

Content to evaluate:
{content}
"""

    response = call_llm(system_prompt, user_prompt, temperature=0)

    # Parse response (lowercase needle, since the haystack is lowercased)
    passed = "passed: true" in response.lower()
    feedback_start = response.find("FEEDBACK:")
    feedback = response[feedback_start + len("FEEDBACK:"):].strip() if feedback_start != -1 else response

    return passed, feedback


def evaluator_optimizer_loop(
    task: str,
    criteria: str,
    max_iterations: int = 5
) -> str:
    """Run the evaluator-optimizer loop until success or max iterations"""

    content = None
    feedback = None

    for iteration in range(max_iterations):
        print(f"Iteration {iteration + 1}/{max_iterations}")

        # Generate or refine
        content = optimizer_agent(task, feedback)

        # Evaluate
        passed, feedback = evaluator_agent(content, criteria)

        if passed:
            print(f"Success on iteration {iteration + 1}")
            return content

        print(f"Feedback: {feedback[:100]}...")

    print("Max iterations reached, returning best effort")
    return content


# Usage
task = "Write a professional email announcing a product delay to customers"
criteria = """
1. Acknowledges the delay with specific new timeline
2. Apologizes sincerely without making excuses
3. Explains what steps are being taken
4. Offers compensation or goodwill gesture
5. Under 200 words
6. Professional but empathetic tone
"""

result = evaluator_optimizer_loop(task, criteria)
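The string matching in evaluator_agent is brittle. One sturdier option is OpenAI's JSON mode (the response_format={"type": "json_object"} option), which constrains the model to emit valid JSON. A minimal sketch of the evaluator rewritten this way:

import json

def evaluator_agent_json(content: str, criteria: str) -> Tuple[bool, str]:
    """Variant of evaluator_agent that requests structured JSON output."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},  # model must return valid JSON
        messages=[
            {"role": "system", "content": (
                "You are a strict quality evaluator. Respond with JSON: "
                '{"passed": true or false, "feedback": "specific issues, or All criteria met"}'
            )},
            {"role": "user", "content": f"Criteria:\n{criteria}\n\nContent to evaluate:\n{content}"}
        ],
        temperature=0
    )
    result = json.loads(response.choices[0].message.content)
    return result["passed"], result["feedback"]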

Example Flow

Iteration | Action                   | Result
----------|--------------------------|------------------------------------------------
1         | Generate initial draft   | 280 words, missing compensation offer
2         | Refine based on feedback | 195 words, added compensation, tone too formal
3         | Adjust tone              | Passes all criteria

Use Cases

  • Compliance checking: Financial reports meeting regulatory standards
  • Code generation: Produce code that passes tests
  • Content creation: Articles meeting editorial guidelines
  • Recipe generation: Meet nutritional constraints

The Orchestrator-Worker Pattern

While evaluator-optimizer handles iterative refinement, orchestrator-worker handles dynamic task decomposition. A central orchestrator analyzes complex problems, creates plans, delegates to specialists, and synthesizes results.

flowchart TD
    I[Complex Task] --> O[Orchestrator]
    O -->|Plan| P[Create Subtasks]
    P --> W1[Worker 1]
    P --> W2[Worker 2]
    P --> W3[Worker 3]
    W1 --> S[Synthesize]
    W2 --> S
    W3 --> S
    S --> O
    O --> R[Final Result]

    style O fill:#fff3e0
    style W1 fill:#e3f2fd
    style W2 fill:#e8f5e9
    style W3 fill:#fce4ec

Think of it like a project manager who receives a complex brief, breaks it into tasks, assigns specialists, collects their work, and assembles the final deliverable.

Orchestrator vs. Simple Parallelization

Aspect         | Parallelization      | Orchestrator
---------------|----------------------|--------------------------
Task breakdown | Predefined/static    | Dynamic at runtime
Adaptability   | Fixed workflow       | Adapts to novel problems
Intelligence   | Follows script       | Makes decisions
Use case       | Repeatable processes | Varied/complex requests

Orchestrator Responsibilities

  1. Analyze: Understand the complex task
  2. Plan: Break into subtasks dynamically
  3. Delegate: Route subtasks to appropriate workers
  4. Synthesize: Combine worker outputs coherently

Worker Characteristics

  • Specialized expertise (research, analysis, writing, etc.)
  • Clear input/output contracts
  • No awareness of the larger task
  • Execute independently
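A lightweight way to pin down that input/output contract is a typed interface every worker satisfies; a minimal sketch (the implementation below keeps things simpler with plain functions):

from typing import Protocol

class Worker(Protocol):
    """Contract for all workers: a subtask string in, a result string out.
    A worker sees only its own subtask, never the larger plan."""
    def __call__(self, subtask: str) -> str: ...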

Implementation

from openai import OpenAI
import json
from typing import List, Dict

client = OpenAI()


def call_llm(system_prompt: str, user_prompt: str, temperature: float = 0.7) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=temperature
    )
    return response.choices[0].message.content


# Worker Agents
def research_worker(subtask: str) -> str:
    system_prompt = "You are a research specialist. Gather and summarize relevant information."
    return call_llm(system_prompt, subtask)


def analysis_worker(subtask: str) -> str:
    system_prompt = "You are a data analyst. Analyze information and identify patterns."
    return call_llm(system_prompt, subtask)


def writing_worker(subtask: str) -> str:
    system_prompt = "You are a professional writer. Create clear, engaging content."
    return call_llm(system_prompt, subtask)


# Worker registry
WORKERS = {
    "research": research_worker,
    "analysis": analysis_worker,
    "writing": writing_worker
}


def orchestrator_agent(task: str) -> str:
    """Central orchestrator that plans, delegates, and synthesizes"""

    # Step 1: Create plan
    planning_prompt = f"""
Analyze this task and create a plan with subtasks.

Task: {task}

Available workers: research, analysis, writing

Respond with JSON:
{{
    "subtasks": [
        {{"worker": "research", "task": "specific subtask description"}},
        {{"worker": "analysis", "task": "specific subtask description"}}
    ]
}}
"""

    plan_response = call_llm(
        "You are a project planner. Create efficient task breakdowns.",
        planning_prompt,
        temperature=0
    )

    # Parse plan
    try:
        # Extract JSON from response
        json_start = plan_response.find('{')
        json_end = plan_response.rfind('}') + 1
        plan = json.loads(plan_response[json_start:json_end])
    except json.JSONDecodeError:
        # Fallback to simple research task
        plan = {"subtasks": [{"worker": "research", "task": task}]}

    # Step 2: Execute subtasks
    results = []
    for subtask in plan["subtasks"]:
        worker_name = subtask["worker"]
        worker_task = subtask["task"]

        print(f"Delegating to {worker_name}: {worker_task[:50]}...")

        worker = WORKERS.get(worker_name, research_worker)
        result = worker(worker_task)
        results.append({
            "worker": worker_name,
            "task": worker_task,
            "result": result
        })

    # Step 3: Synthesize results
    synthesis_prompt = f"""
Original task: {task}

Worker results:
{json.dumps(results, indent=2)}

Synthesize these results into a comprehensive, coherent response
that fully addresses the original task.
"""

    final_result = call_llm(
        "You are an expert synthesizer. Combine diverse inputs into coherent outputs.",
        synthesis_prompt,
        temperature=0.5
    )

    return final_result


# Usage
complex_task = """
Analyze the impact of remote work on software development teams.
Include research on productivity studies, analysis of collaboration
patterns, and recommendations for team leads.
"""

result = orchestrator_agent(complex_task)
print(result)

Dynamic Planning in Action

For the task “Analyze remote work impact on dev teams”:

Orchestrator creates plan:
1. Research worker → "Find recent studies on remote work productivity"
2. Research worker → "Gather data on collaboration tool usage"
3. Analysis worker → "Compare productivity metrics pre/post remote"
4. Writing worker → "Draft recommendations for team leads"

Workers execute independently...

Orchestrator synthesizes into final report
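When the subtasks in a plan don't depend on each other's output, the sequential loop in orchestrator_agent can run the workers concurrently instead; a minimal sketch using Python's concurrent.futures:

from concurrent.futures import ThreadPoolExecutor

def execute_subtasks_parallel(subtasks: list) -> list:
    """Run independent worker subtasks concurrently rather than one by one."""
    def run(subtask: dict) -> dict:
        worker = WORKERS.get(subtask["worker"], research_worker)
        return {**subtask, "result": worker(subtask["task"])}

    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        return list(pool.map(run, subtasks))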

Real-World Use Cases

Domain            | Orchestrator Role              | Workers
------------------|--------------------------------|-----------------------------------
Market Analysis   | Break down research request    | News, competitors, trends
Medical Diagnosis | Coordinate test interpretation | Hematology, cardiology, radiology
Legal Review      | Manage contract analysis       | Compliance, liability, terms
Content Creation  | Plan article structure         | Research, writing, editing

Combining Patterns

These patterns can be composed for sophisticated workflows:

flowchart TD
    T[Complex Task] --> O[Orchestrator]
    O --> W1[Worker 1]
    O --> W2[Worker 2]
    W1 --> E1{Evaluate}
    E1 -->|Fail| W1
    E1 -->|Pass| S[Synthesize]
    W2 --> E2{Evaluate}
    E2 -->|Fail| W2
    E2 -->|Pass| S
    S --> O
    O --> R[Result]

    style O fill:#fff3e0
    style E1 fill:#fce4ec
    style E2 fill:#fce4ec

The orchestrator delegates to workers, each worker's output goes through an evaluator-optimizer loop, and the orchestrator synthesizes the validated results. The sketch below assumes create_plan, get_criteria_for_worker, and synthesize helpers that wrap the planning and synthesis steps shown earlier:

def orchestrator_with_validation(task: str) -> str:
    # Get plan from orchestrator
    # (create_plan, get_criteria_for_worker, and synthesize are assumed helpers)
    plan = create_plan(task)

    validated_results = []
    for subtask in plan["subtasks"]:
        # Each worker output goes through an evaluation loop
        result = evaluator_optimizer_loop(
            task=subtask["task"],
            criteria=get_criteria_for_worker(subtask["worker"]),
            max_iterations=3
        )
        validated_results.append(result)

    # Synthesize validated results
    return synthesize(task, validated_results)

Key Takeaways

Evaluator-Optimizer

  1. Two-agent collaboration: Generator creates, evaluator critiques
  2. Clear criteria essential: Vague criteria produce vague feedback
  3. Actionable feedback: Tell optimizer what and how to fix
  4. Define stopping conditions: Prevent infinite loops

Orchestrator-Worker

  1. Dynamic decomposition: Break tasks at runtime, not design time
  2. Specialized workers: Each expert in their domain
  3. Central synthesis: Orchestrator combines results coherently
  4. Handles novelty: Adapts to unfamiliar requests

Composition

  • Patterns can be combined for complex workflows
  • Orchestrator can use evaluator-optimizer for quality control
  • Workers can be simple agents or full sub-workflows

These workflow patterns - chaining, routing, parallelization, evaluation loops, and orchestration - form the foundation for building sophisticated AI systems. In the next part of this series, I’ll move from workflow patterns to building full-fledged agents with tools, state, and memory.


This wraps up the agentic workflow patterns. Next: extending agents with tools and function calling.
