Evaluator-Optimizer and Orchestrator-Worker Patterns

Some tasks require iteration - generating, evaluating, and refining until quality standards are met. Others need dynamic orchestration - a central coordinator breaking down novel problems and delegating to specialists. In this post, I’ll cover two sophisticated patterns that enable these capabilities: the evaluator-optimizer loop and the orchestrator-worker architecture.

The Evaluator-Optimizer Pattern

This pattern creates an iterative refinement loop where two agents collaborate: one generates content, the other critiques it, and the feedback drives improvement.

flowchart TD
    I[Input] --> O[Optimizer<br>Generate]
    O --> E{Evaluator<br>Assess}
    E -->|Passes| Done[Output]
    E -->|Fails| F[Feedback]
    F --> O

    style O fill:#e3f2fd
    style E fill:#fff3e0
    style Done fill:#c8e6c9

Think of it like a writer-editor relationship: the writer drafts, the editor critiques, the writer revises based on feedback, and this continues until publication standards are met.

Key Components

Optimizer (Generator) Agent: Creates initial output and refines based on feedback. Uses moderate-to-high temperature for creativity.

Evaluator (Critic) Agent: Assesses output against criteria and provides actionable feedback. Uses low temperature for consistent evaluation.

Three Critical Elements

1. Clear Evaluation Criteria

Vague criteria produce vague feedback. Be specific:

# Bad: Vague criteria
criteria = "Make sure it's good and professional"

# Good: Specific, measurable criteria
criteria = """
Evaluate the report against these criteria:
1. Contains executive summary (max 200 words)
2. All claims supported by data
3. No speculative language ("might", "could potentially")
4. Includes risk assessment section
5. Professional tone throughout
"""

2. Actionable Feedback

Feedback should tell the optimizer what to fix and how:

# Bad: Non-actionable
feedback = "The writing needs improvement"

# Good: Actionable
feedback = """
Issues found:
1. Executive summary is 347 words (exceeds 200 limit) - condense key points
2. Paragraph 3 claims "significant growth" without supporting data - add specific metrics
3. Line 45 uses "might increase" - rephrase with certainty or remove
"""

3. Stopping Conditions

Without clear stopping conditions, loops run forever. Define when to stop:

  • Success: All criteria pass
  • Max iterations: Hard limit on attempts
  • Diminishing returns: Improvements become minimal
  • Timeout: Wall-clock time limit
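The implementation below enforces the first two (success and max iterations). A diminishing-returns check is a small extension; here's a minimal sketch, assuming the evaluator were extended to return a numeric quality score (in the code below it returns only pass/fail and feedback):

def should_stop(scores: list, min_improvement: float = 0.02) -> bool:
    """Stop when the latest iteration improved the score by less than the threshold."""
    if len(scores) < 2:
        return False  # need at least two scores to measure improvement
    return scores[-1] - scores[-2] < min_improvement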

Implementation

from openai import OpenAI
from typing import Tuple

client = OpenAI()

def call_llm(system_prompt: str, user_prompt: str, temperature: float = 0.7) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=temperature
    )
    return response.choices[0].message.content


def optimizer_agent(task: str, feedback: str = None) -> str:
    """Generate or refine content based on feedback"""

    system_prompt = """
You are a skilled content creator. Generate high-quality content
that meets professional standards. If feedback is provided,
carefully address each point in your revision.
"""

    if feedback:
        user_prompt = f"""
Task: {task}

Previous feedback to address:
{feedback}

Create an improved version addressing all feedback points.
"""
    else:
        user_prompt = f"Task: {task}"

    return call_llm(system_prompt, user_prompt, temperature=0.7)


def evaluator_agent(content: str, criteria: str) -> Tuple[bool, str]:
    """Evaluate content and provide feedback"""

    system_prompt = """
You are a strict quality evaluator. Assess content against
the provided criteria. Be thorough and specific in your feedback.

Respond in this format:
PASSED: true/false
FEEDBACK: [specific issues and how to fix them, or "All criteria met"]
"""

    user_prompt = f"""
Criteria:
{criteria}

Content to evaluate:
{content}
"""

    response = call_llm(system_prompt, user_prompt, temperature=0)

    # Parse response (lowercase needle, since the haystack is lowercased)
    passed = "passed: true" in response.lower()
    feedback_start = response.find("FEEDBACK:")
    feedback = response[feedback_start + len("FEEDBACK:"):].strip() if feedback_start != -1 else response

    return passed, feedback


def evaluator_optimizer_loop(
    task: str,
    criteria: str,
    max_iterations: int = 5
) -> str:
    """Run the evaluator-optimizer loop until success or max iterations"""

    content = None
    feedback = None

    for iteration in range(max_iterations):
        print(f"Iteration {iteration + 1}/{max_iterations}")

        # Generate or refine
        content = optimizer_agent(task, feedback)

        # Evaluate
        passed, feedback = evaluator_agent(content, criteria)

        if passed:
            print(f"Success on iteration {iteration + 1}")
            return content

        print(f"Feedback: {feedback[:100]}...")

    print("Max iterations reached, returning best effort")
    return content


# Usage
task = "Write a professional email announcing a product delay to customers"
criteria = """
1. Acknowledges the delay with specific new timeline
2. Apologizes sincerely without making excuses
3. Explains what steps are being taken
4. Offers compensation or goodwill gesture
5. Under 200 words
6. Professional but empathetic tone
"""

result = evaluator_optimizer_loop(task, criteria)
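The string matching in evaluator_agent is brittle. One sturdier option is OpenAI's JSON mode (the response_format={"type": "json_object"} option), which constrains the model to emit valid JSON. A minimal sketch of the evaluator rewritten this way:

import json

def evaluator_agent_json(content: str, criteria: str) -> Tuple[bool, str]:
    """Variant of evaluator_agent that requests structured JSON output."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},  # model must return valid JSON
        messages=[
            {"role": "system", "content": (
                "You are a strict quality evaluator. Respond with JSON: "
                '{"passed": true or false, "feedback": "specific issues, or All criteria met"}'
            )},
            {"role": "user", "content": f"Criteria:\n{criteria}\n\nContent to evaluate:\n{content}"}
        ],
        temperature=0
    )
    result = json.loads(response.choices[0].message.content)
    return result["passed"], result["feedback"]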

Example Flow

Iteration | Action                   | Result
----------|--------------------------|------------------------------------------------
1         | Generate initial draft   | 280 words, missing compensation offer
2         | Refine based on feedback | 195 words, added compensation, tone too formal
3         | Adjust tone              | Passes all criteria

Use Cases

  • Compliance checking: Financial reports meeting regulatory standards
  • Code generation: Produce code that passes tests
  • Content creation: Articles meeting editorial guidelines
  • Recipe generation: Meet nutritional constraints

The Orchestrator-Worker Pattern

While evaluator-optimizer handles iterative refinement, orchestrator-worker handles dynamic task decomposition. A central orchestrator analyzes complex problems, creates plans, delegates to specialists, and synthesizes results.

flowchart TD
    I[Complex Task] --> O[Orchestrator]
    O -->|Plan| P[Create Subtasks]
    P --> W1[Worker 1]
    P --> W2[Worker 2]
    P --> W3[Worker 3]
    W1 --> S[Synthesize]
    W2 --> S
    W3 --> S
    S --> O
    O --> R[Final Result]

    style O fill:#fff3e0
    style W1 fill:#e3f2fd
    style W2 fill:#e8f5e9
    style W3 fill:#fce4ec

Think of it like a project manager who receives a complex brief, breaks it into tasks, assigns specialists, collects their work, and assembles the final deliverable.

Orchestrator vs. Simple Parallelization

Aspect         | Parallelization      | Orchestrator
---------------|----------------------|--------------------------
Task breakdown | Predefined/static    | Dynamic at runtime
Adaptability   | Fixed workflow       | Adapts to novel problems
Intelligence   | Follows script       | Makes decisions
Use case       | Repeatable processes | Varied/complex requests

Orchestrator Responsibilities

  1. Analyze: Understand the complex task
  2. Plan: Break into subtasks dynamically
  3. Delegate: Route subtasks to appropriate workers
  4. Synthesize: Combine worker outputs coherently

Worker Characteristics

  • Specialized expertise (research, analysis, writing, etc.)
  • Clear input/output contracts
  • No awareness of the larger task
  • Execute independently
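A lightweight way to pin down that input/output contract is a typed interface every worker satisfies; a minimal sketch (the implementation below keeps things simpler with plain functions):

from typing import Protocol

class Worker(Protocol):
    """Contract for all workers: a subtask string in, a result string out.
    A worker sees only its own subtask, never the larger plan."""
    def __call__(self, subtask: str) -> str: ...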

Implementation

from openai import OpenAI
import json
from typing import List, Dict

client = OpenAI()


def call_llm(system_prompt: str, user_prompt: str, temperature: float = 0.7) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=temperature
    )
    return response.choices[0].message.content


# Worker Agents
def research_worker(subtask: str) -> str:
    system_prompt = "You are a research specialist. Gather and summarize relevant information."
    return call_llm(system_prompt, subtask)


def analysis_worker(subtask: str) -> str:
    system_prompt = "You are a data analyst. Analyze information and identify patterns."
    return call_llm(system_prompt, subtask)


def writing_worker(subtask: str) -> str:
    system_prompt = "You are a professional writer. Create clear, engaging content."
    return call_llm(system_prompt, subtask)


# Worker registry
WORKERS = {
    "research": research_worker,
    "analysis": analysis_worker,
    "writing": writing_worker
}


def orchestrator_agent(task: str) -> str:
    """Central orchestrator that plans, delegates, and synthesizes"""

    # Step 1: Create plan
    planning_prompt = f"""
Analyze this task and create a plan with subtasks.

Task: {task}

Available workers: research, analysis, writing

Respond with JSON:
{{
    "subtasks": [
        {{"worker": "research", "task": "specific subtask description"}},
        {{"worker": "analysis", "task": "specific subtask description"}}
    ]
}}
"""

    plan_response = call_llm(
        "You are a project planner. Create efficient task breakdowns.",
        planning_prompt,
        temperature=0
    )

    # Parse plan
    try:
        # Extract JSON from response
        json_start = plan_response.find('{')
        json_end = plan_response.rfind('}') + 1
        plan = json.loads(plan_response[json_start:json_end])
    except json.JSONDecodeError:
        # Fallback to simple research task
        plan = {"subtasks": [{"worker": "research", "task": task}]}

    # Step 2: Execute subtasks
    results = []
    for subtask in plan["subtasks"]:
        worker_name = subtask["worker"]
        worker_task = subtask["task"]

        print(f"Delegating to {worker_name}: {worker_task[:50]}...")

        worker = WORKERS.get(worker_name, research_worker)
        result = worker(worker_task)
        results.append({
            "worker": worker_name,
            "task": worker_task,
            "result": result
        })

    # Step 3: Synthesize results
    synthesis_prompt = f"""
Original task: {task}

Worker results:
{json.dumps(results, indent=2)}

Synthesize these results into a comprehensive, coherent response
that fully addresses the original task.
"""

    final_result = call_llm(
        "You are an expert synthesizer. Combine diverse inputs into coherent outputs.",
        synthesis_prompt,
        temperature=0.5
    )

    return final_result


# Usage
complex_task = """
Analyze the impact of remote work on software development teams.
Include research on productivity studies, analysis of collaboration
patterns, and recommendations for team leads.
"""

result = orchestrator_agent(complex_task)
print(result)

Dynamic Planning in Action

For the task “Analyze remote work impact on dev teams”:

Orchestrator creates plan:
1. Research worker → "Find recent studies on remote work productivity"
2. Research worker → "Gather data on collaboration tool usage"
3. Analysis worker → "Compare productivity metrics pre/post remote"
4. Writing worker → "Draft recommendations for team leads"

Workers execute independently...

Orchestrator synthesizes into final report
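When the subtasks in a plan don't depend on each other's output, the sequential loop in orchestrator_agent can run the workers concurrently instead; a minimal sketch using Python's concurrent.futures:

from concurrent.futures import ThreadPoolExecutor

def execute_subtasks_parallel(subtasks: list) -> list:
    """Run independent worker subtasks concurrently rather than one by one."""
    def run(subtask: dict) -> dict:
        worker = WORKERS.get(subtask["worker"], research_worker)
        return {**subtask, "result": worker(subtask["task"])}

    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        return list(pool.map(run, subtasks))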

Real-World Use Cases

Domain            | Orchestrator Role              | Workers
------------------|--------------------------------|-----------------------------------
Market Analysis   | Break down research request    | News, competitors, trends
Medical Diagnosis | Coordinate test interpretation | Hematology, cardiology, radiology
Legal Review      | Manage contract analysis       | Compliance, liability, terms
Content Creation  | Plan article structure         | Research, writing, editing

Combining Patterns

These patterns can be composed for sophisticated workflows:

flowchart TD
    T[Complex Task] --> O[Orchestrator]
    O --> W1[Worker 1]
    O --> W2[Worker 2]
    W1 --> E1{Evaluate}
    E1 -->|Fail| W1
    E1 -->|Pass| S[Synthesize]
    W2 --> E2{Evaluate}
    E2 -->|Fail| W2
    E2 -->|Pass| S
    S --> O
    O --> R[Result]

    style O fill:#fff3e0
    style E1 fill:#fce4ec
    style E2 fill:#fce4ec

The orchestrator delegates to workers, each worker's output goes through an evaluator-optimizer loop, and the orchestrator synthesizes the validated results. The sketch below assumes create_plan, get_criteria_for_worker, and synthesize helpers that wrap the planning and synthesis steps shown earlier:

def orchestrator_with_validation(task: str) -> str:
    # Get plan from orchestrator
    # (create_plan, get_criteria_for_worker, and synthesize are assumed helpers)
    plan = create_plan(task)

    validated_results = []
    for subtask in plan["subtasks"]:
        # Each worker output goes through an evaluation loop
        result = evaluator_optimizer_loop(
            task=subtask["task"],
            criteria=get_criteria_for_worker(subtask["worker"]),
            max_iterations=3
        )
        validated_results.append(result)

    # Synthesize validated results
    return synthesize(task, validated_results)

Key Takeaways

Evaluator-Optimizer

  1. Two-agent collaboration: Generator creates, evaluator critiques
  2. Clear criteria essential: Vague criteria produce vague feedback
  3. Actionable feedback: Tell optimizer what and how to fix
  4. Define stopping conditions: Prevent infinite loops

Orchestrator-Worker

  1. Dynamic decomposition: Break tasks at runtime, not design time
  2. Specialized workers: Each expert in their domain
  3. Central synthesis: Orchestrator combines results coherently
  4. Handles novelty: Adapts to unfamiliar requests

Composition

  • Patterns can be combined for complex workflows
  • Orchestrator can use evaluator-optimizer for quality control
  • Workers can be simple agents or full sub-workflows

These workflow patterns - chaining, routing, parallelization, evaluation loops, and orchestration - form the foundation for building sophisticated AI systems. In the next part of this series, I’ll move from workflow patterns to building full-fledged agents with tools, state, and memory.


This wraps up the agentic workflow patterns. Next: extending agents with tools and function calling.
