Extending Agents with Tools and Structured Outputs

Language models are impressive reasoners, but without tools they can only generate text. They can’t check real-time data, perform precise calculations, or interact with external systems. In this post, I’ll explore how to extend agents with tools through function calling, and how to ensure reliable outputs using Pydantic for structured data validation.

From Passive to Active AI

Ask a basic LLM “What’s the weather in Tokyo right now?” and you’ll get one of two responses: a hallucinated answer or an admission that it doesn’t have current data. Neither is useful.

Tools transform agents from passive responders into active problem-solvers:

flowchart LR
    U[User Query] --> A[Agent]
    A --> D{Decision}
    D -->|Need Data| T[Tool Call]
    T --> R[Result]
    R --> A
    D -->|Can Answer| O[Response]

    style T fill:#e3f2fd
    style A fill:#fff3e0

With tools, the agent can:

  • Fetch real-time weather data
  • Calculate precise results
  • Query databases
  • Send notifications
  • Execute code

Function Calling: The Bridge

Early approaches to tool use relied on fragile prompt engineering - asking the model to output specific strings that could be parsed. Modern function calling is far more robust.

How It Works

  1. Define tools with clear schemas (name, description, parameters)
  2. Model decides when a tool is needed based on the query
  3. API returns a structured tool call (not free text)
  4. Backend executes the tool and returns results
  5. Model generates final response using tool output

sequenceDiagram
    participant U as User
    participant A as Agent
    participant T as Tool

    U->>A: "What's the weather in Tokyo?"
    A->>A: Decides to use weather tool
    A->>T: get_weather(city="Tokyo")
    T->>A: {"temp": "22°C", "condition": "Sunny"}
    A->>U: "It's currently 22°C and sunny in Tokyo"

Defining Tools

Tools need clear documentation so the model knows when and how to use them:

from langchain_core.tools import tool

@tool
def get_weather(city: str) -> dict:
    """
    Get the current weather for a city.

    Args:
        city: Name of the city to get weather for

    Returns:
        Dictionary with temperature and conditions
    """
    # In production, call actual weather API
    weather_data = {
        "Tokyo": {"temp": "22°C", "condition": "Sunny"},
        "London": {"temp": "15°C", "condition": "Cloudy"},
        "New York": {"temp": "18°C", "condition": "Clear"}
    }
    return weather_data.get(city, {"temp": "Unknown", "condition": "Unknown"})


@tool
def calculate(expression: str) -> float:
    """
    Evaluate a mathematical expression.

    Args:
        expression: Math expression to evaluate (e.g., "23 * 45")

    Returns:
        Numerical result
    """
    return eval(expression)  # eval() is unsafe on untrusted input; use a dedicated math parser in production

Key requirements for effective tool definitions:

  • Clear docstring: Explains what the tool does
  • Typed parameters: Model knows what arguments to provide
  • Descriptive names: get_weather not gw
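
The @tool decorator captures all of this automatically. You can sanity-check what the model will actually see by inspecting the generated metadata; a quick sketch using LangChain’s standard tool attributes (the exact schema output varies by version):

# Inspect the metadata the model uses for tool selection
print(get_weather.name)         # "get_weather"
print(get_weather.description)  # "Get the current weather for a city. ..."
print(get_weather.args)         # {"city": {"title": "City", "type": "string"}}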

Binding Tools to the Model

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools([get_weather, calculate])
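
Binding doesn’t execute anything; it just advertises the tool schemas to the model. When the model decides a tool is needed, the response carries structured tool calls instead of text. A quick check (the id value is illustrative):

response = llm_with_tools.invoke("What's the weather in Tokyo?")
print(response.tool_calls)
# [{"name": "get_weather", "args": {"city": "Tokyo"}, "id": "call_...", "type": "tool_call"}]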

The Tool Execution Loop

Here’s the critical pattern - tools require a conversation loop:

from langchain_core.messages import HumanMessage, AIMessage, ToolMessage

def run_agent(query: str, tools: list) -> str:
    messages = [HumanMessage(content=query)]

    while True:
        # Get model response
        response = llm_with_tools.invoke(messages)
        messages.append(response)

        # Check if model wants to use tools
        if not response.tool_calls:
            # No tools needed, return final answer
            return response.content

        # Execute each tool call
        for tool_call in response.tool_calls:
            tool_name = tool_call["name"]
            tool_args = tool_call["args"]
            tool_id = tool_call["id"]

            # Find and execute the tool
            tool_fn = next(t for t in tools if t.name == tool_name)
            result = tool_fn.invoke(tool_args)

            # Add tool result to conversation
            messages.append(ToolMessage(
                content=str(result),
                tool_call_id=tool_id
            ))

        # Loop continues - model will process tool results
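
Running it on a math question (output illustrative):

answer = run_agent("What's 23 * 45?", tools=[get_weather, calculate])
print(answer)  # "23 multiplied by 45 equals 1,035"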

The message flow looks like:

Step  Message Type   Content
1     HumanMessage   “What’s 23 * 45?”
2     AIMessage      tool_call: calculate(“23 * 45”)
3     ToolMessage    “1035”
4     AIMessage      “23 multiplied by 45 equals 1,035”

Building a Complete Agent Class

Let’s wrap this into a reusable agent:

from openai import OpenAI
from typing import List, Any
import json

client = OpenAI()

class ToolAgent:
    def __init__(
        self,
        role: str,
        instructions: str,
        tools: List[Any],
        model: str = "gpt-4o-mini"
    ):
        self.role = role
        self.instructions = instructions
        self.tools = tools
        self.model = model

    def _get_tool_schemas(self) -> list:
        """Convert tools to OpenAI function schemas"""
        return [
            {
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.args_schema.schema()
                }
            }
            for tool in self.tools
        ]

    def invoke(self, query: str) -> str:
        messages = [
            {"role": "system", "content": f"You are {self.role}. {self.instructions}"},
            {"role": "user", "content": query}
        ]

        while True:
            response = client.chat.completions.create(
                model=self.model,
                messages=messages,
                tools=self._get_tool_schemas(),
                temperature=0
            )

            message = response.choices[0].message

            # No tool calls - return final answer
            if not message.tool_calls:
                return message.content

            # Process tool calls
            messages.append(message)

            for tool_call in message.tool_calls:
                func_name = tool_call.function.name
                func_args = json.loads(tool_call.function.arguments)

                # Execute tool
                tool = next(t for t in self.tools if t.name == func_name)
                result = tool.invoke(func_args)

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result)
                })


# Usage
weather_agent = ToolAgent(
    role="Weather Assistant",
    instructions="Help users with weather information. Always be concise.",
    tools=[get_weather]
)

response = weather_agent.invoke("What's the weather like in Tokyo?")

The Problem with Unstructured Output

Tools solve the input side - agents can now gather data. But what about outputs? Consider an agent that extracts meeting information:

Unstructured response:

The meeting is about Q4 planning. Alice will prepare slides by Friday.
Bob needs to review the budget. It's scheduled for next Tuesday.

What downstream systems need:

{
  "title": "Q4 Planning",
  "date": "2025-01-14",
  "attendees": ["Alice", "Bob"],
  "action_items": [
    {"task": "Prepare slides", "owner": "Alice", "due": "Friday"},
    {"task": "Review budget", "owner": "Bob", "due": null}
  ]
}

Free-form text can’t reliably trigger automated workflows, populate databases, or integrate with other systems.
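
To see why, imagine extracting those action items with regular expressions. A hypothetical extractor that works for exactly one phrasing and silently fails on the next:

import re

pattern = r"(\w+) will (.+) by (\w+)"

# Works for this phrasing
text = "Alice will prepare slides by Friday."
print(re.search(pattern, text).groups())  # ("Alice", "prepare slides", "Friday")

# Same information, reworded - no match at all
text2 = "Slides to be prepared by Alice before Friday."
print(re.search(pattern, text2))  # None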

Structured Outputs with Pydantic

Pydantic provides type-safe data validation - perfect for ensuring agent outputs match expected schemas.

Defining Output Schemas

from pydantic import BaseModel, Field
from typing import List, Optional
from datetime import date

class ActionItem(BaseModel):
    """A task extracted from meeting notes"""
    task: str = Field(description="Description of the task")
    owner: str = Field(description="Person responsible")
    due_date: Optional[str] = Field(description="Due date if mentioned")

class MeetingSummary(BaseModel):
    """Structured meeting summary"""
    title: str = Field(description="Meeting title or topic")
    date: Optional[date] = Field(description="Meeting date")
    attendees: List[str] = Field(description="List of attendees")
    key_points: List[str] = Field(description="Main discussion points")
    action_items: List[ActionItem] = Field(description="Tasks to be completed")
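
Under the hood, the API receives a JSON Schema generated from these classes, which is why the Field descriptions matter. With Pydantic v2 you can preview exactly what the model sees:

import json

# Preview the JSON Schema derived from the model (Pydantic v2)
print(json.dumps(MeetingSummary.model_json_schema(), indent=2))
# {"properties": {"title": {"description": "Meeting title or topic", ...}, ...}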

Extracting Structured Data

OpenAI’s API supports response format specification:

from openai import OpenAI

client = OpenAI()

def extract_meeting_info(notes: str) -> MeetingSummary:
    response = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Extract structured information from meeting notes."
            },
            {
                "role": "user",
                "content": notes
            }
        ],
        response_format=MeetingSummary
    )

    return response.choices[0].message.parsed


# Usage
notes = """
Q4 Planning Meeting - January 10th
Attendees: Alice, Bob, Carol

Discussion:
- Reviewed Q3 results, exceeded targets by 15%
- Discussed new product launch timeline
- Budget allocation for marketing campaigns

Action Items:
- Alice to prepare launch slides by Friday
- Bob to finalize budget proposal by EOW
- Carol to schedule customer interviews
"""

summary = extract_meeting_info(notes)
print(summary.title)                  # "Q4 Planning Meeting"
print(summary.action_items[0].owner)  # "Alice"
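
Because the result is a real Pydantic object, handing it to downstream systems is a one-liner (assuming Pydantic v2):

payload = summary.model_dump_json(indent=2)  # JSON string for an API, queue, or database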

Validation and Error Handling

Pydantic validates data automatically:

from pydantic import ValidationError

try:
    # If model returns invalid data, Pydantic catches it
    summary = MeetingSummary(**raw_data)
except ValidationError as e:
    print(f"Validation failed: {e}")
    # Retry with clarified prompt or use fallback
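
One effective pattern is to feed the validation errors back to the model and let it correct itself. A minimal sketch using OpenAI’s JSON mode and Pydantic v2, reusing the client and schema from above (max_retries is an arbitrary choice):

def extract_with_retry(notes: str, max_retries: int = 2):
    messages = [
        {"role": "system", "content": "Extract structured meeting info as JSON."},
        {"role": "user", "content": notes}
    ]
    for _ in range(max_retries + 1):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            response_format={"type": "json_object"}
        )
        raw = response.choices[0].message.content
        try:
            return MeetingSummary.model_validate_json(raw)
        except ValidationError as e:
            # Show the model its own output and the errors, then retry
            messages.append({"role": "assistant", "content": raw})
            messages.append({"role": "user", "content": f"Fix these validation errors: {e}"})
    return None  # Caller decides on a fallback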

Building a Structured Output Agent

Combining tools and structured outputs:

class StructuredAgent:
    def __init__(
        self,
        role: str,
        instructions: str,
        tools: List[Any] = None,
        output_schema: type = None
    ):
        self.role = role
        self.instructions = instructions
        self.tools = tools or []
        self.output_schema = output_schema

    def invoke(self, query: str):
        messages = [
            {"role": "system", "content": f"You are {self.role}. {self.instructions}"},
            {"role": "user", "content": query}
        ]

        # Tool execution loop (same as before)
        messages = self._execute_tools(messages)

        # Generate structured output
        if self.output_schema:
            response = client.beta.chat.completions.parse(
                model="gpt-4o-mini",
                messages=messages,
                response_format=self.output_schema
            )
            return response.choices[0].message.parsed
        else:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages
            )
            return response.choices[0].message.content
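
Usage mirrors ToolAgent, except you get a validated object back. A sketch reusing the MeetingSummary schema (once _execute_tools is filled in with the loop from earlier; output illustrative):

meeting_agent = StructuredAgent(
    role="Meeting Analyst",
    instructions="Extract structured summaries from meeting notes.",
    output_schema=MeetingSummary
)

summary = meeting_agent.invoke(notes)  # notes from the earlier example
print(summary.attendees)  # ["Alice", "Bob", "Carol"]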

Common Tool Patterns

Data Retrieval Tools

@tool
def search_products(query: str, category: Optional[str] = None) -> List[dict]:
    """Search product catalog"""
    # Query database or API
    pass

@tool
def get_customer_info(customer_id: str) -> dict:
    """Retrieve customer details"""
    # Fetch from CRM
    pass

Computation Tools

@tool
def analyze_sentiment(text: str) -> dict:
    """Analyze sentiment of text"""
    # Return positive/negative/neutral with confidence
    pass

@tool
def summarize_document(content: str, max_length: int = 200) -> str:
    """Summarize long document"""
    pass

Action Tools

@tool
def send_email(to: str, subject: str, body: str) -> bool:
    """Send an email"""
    # Integrate with email service
    pass

@tool
def create_ticket(title: str, description: str, priority: str) -> str:
    """Create support ticket, returns ticket ID"""
    pass

Best Practices

Tool Design

  1. Single responsibility: Each tool does one thing well
  2. Clear documentation: Models rely on descriptions to choose tools
  3. Typed parameters: Reduces argument errors
  4. Meaningful returns: Return structured data, not just strings
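
Point 4 is worth a concrete illustration. Structured returns give the model unambiguous fields to reason over; bare strings force it to re-parse prose (values below are hardcoded for illustration):

# Harder for the model: prose it has to re-parse
def get_weather_stringly(city: str) -> str:
    return f"The weather in {city} is 22 degrees and sunny"

# Easier for the model: unambiguous, machine-readable fields
def get_weather_structured(city: str) -> dict:
    return {"city": city, "temp_c": 22, "condition": "sunny"}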

Error Handling

import requests

@tool
def safe_api_call(endpoint: str) -> dict:
    """Call external API with error handling"""
    try:
        response = requests.get(endpoint, timeout=5)
        response.raise_for_status()
        return {"success": True, "data": response.json()}
    except requests.RequestException as e:
        return {"success": False, "error": str(e)}

Output Validation

Always validate structured outputs before using them:

def process_agent_output(raw_output: dict, schema: type):
    try:
        validated = schema(**raw_output)
        return validated
    except ValidationError:
        # Log error, retry, or use fallback
        return None

Key Takeaways

  1. Tools bridge reasoning and action: LLMs can think; tools let them do
  2. Function calling is robust: Structured tool calls beat string parsing
  3. The tool loop is essential: Tools require conversation continuation
  4. Structured outputs enable integration: Pydantic ensures machine-readable responses
  5. Design tools carefully: Good documentation helps models choose correctly

With tools and structured outputs, agents can interact with the real world and integrate with other systems reliably. In the next post, I’ll explore how to manage agent state and memory across conversations.


This is Part 8 of my series on building intelligent AI systems. Next: agent state and memory management.
