Connecting Agents to the World - External APIs and Data

An agent that can only process text is fundamentally limited. Real usefulness comes from connecting to external systems - fetching live data, querying databases, calling APIs, and triggering actions in the real world. In this post, I’ll explore how to build these connections, turning isolated language models into integrated systems that can actually get things done.

The Integration Challenge

LLMs excel at understanding and generating text, but they operate in a vacuum. They don’t know:

  • Today’s weather or stock prices
  • Your company’s current inventory
  • The user’s calendar events
  • What happened after their training cutoff

Tools bridge this gap, but designing reliable integrations requires careful thought about authentication, error handling, rate limiting, and data transformation.

flowchart LR
    A[Agent] --> T{Tool Layer}
    T --> W[Web Search]
    T --> D[Databases]
    T --> API[External APIs]
    T --> S[Services]

    W --> R[Results]
    D --> R
    API --> R
    S --> R

    R --> A

    style A fill:#fff3e0
    style T fill:#e3f2fd

Web Search Integration

Web search gives agents access to current information beyond their training data. Here’s how to integrate search capabilities:

from langchain_community.tools import TavilySearchResults
from langchain_core.tools import tool

# Using Tavily (optimized for LLM use); reads TAVILY_API_KEY from the environment
search_tool = TavilySearchResults(max_results=5)

@tool
def web_search(query: str) -> str:
    """
    Search the web for current information.

    Args:
        query: The search query

    Returns:
        Search results as formatted text
    """
    results = search_tool.invoke({"query": query})

    # The tool may hand back an error string instead of a list of results
    if isinstance(results, str):
        return results

    # Format results for the LLM
    formatted = []
    for result in results:
        # .get() because not every version of the tool includes a title
        formatted.append(f"Title: {result.get('title', '')}")
        formatted.append(f"URL: {result['url']}")
        formatted.append(f"Content: {result['content'][:500]}")
        formatted.append("---")

    return "\n".join(formatted)
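
Every search result spends context, so formatting is worth a little care. The following is a minimal sketch, not part of Tavily or LangChain: the helper name `format_results` and the `max_chars` budget are illustrative assumptions. It deduplicates results by URL and stops once a total character budget is used up.

```python
from typing import Dict, List


def format_results(results: List[Dict[str, str]], max_chars: int = 2000) -> str:
    """Deduplicate results by URL and stop once the character budget is spent."""
    seen_urls = set()
    blocks = []
    used = 0
    for result in results:
        url = result.get("url", "")
        if url in seen_urls:
            continue  # skip repeated hits for the same page
        seen_urls.add(url)
        block = (
            f"Title: {result.get('title', '(untitled)')}\n"
            f"URL: {url}\n"
            f"Content: {result.get('content', '')[:500]}"
        )
        if used + len(block) > max_chars:
            break  # budget exhausted; drop the remaining results
        blocks.append(block)
        used += len(block)
    return "\n---\n".join(blocks)


# Hypothetical sample data shaped like the results used in web_search
sample = [
    {"title": "A", "url": "https://example.com/a", "content": "first"},
    {"title": "A again", "url": "https://example.com/a", "content": "duplicate"},
    {"title": "B", "url": "https://example.com/b", "content": "second"},
]
formatted = format_results(sample)
```

Stopping at a whole-result boundary, rather than cutting mid-result, keeps each title, URL, and snippet together so the model can still cite its sources.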

Building a Research Agent

from openai import OpenAI
import json

client = OpenAI()

class ResearchAgent:
    def __init__(self):
        self.tools = [web_search]
        self.tool_map = {t.name: t for t in self.tools}

    def research(self, topic: str) -> str:
        messages = [
            {
                "role": "system",
                "content": """You are a research assistant. Use web search
                to find current, accurate information. Always cite sources."""
            },
            {"role": "user", "content": f"Research this topic: {topic}"}
        ]

        # Tool execution loop
        while True:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
                tools=self._get_tool_schemas(),
                temperature=0
            )

            message = response.choices[0].message

            if not message.tool_calls:
                return message.content

            messages.append(message)

            for tool_call in message.tool_calls:
                func_name = tool_call.function.name
                func_args = json.loads(tool_call.function.arguments)

                result = self.tool_map[func_name].invoke(func_args)

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": str(result)
                })

    def _get_tool_schemas(self) -> list:
        return [
            {
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.args_schema.schema()
                }
            }
            for tool in self.tools
        ]
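
To make the schema step less magical, here is a rough sketch of how a function-calling schema could be derived from a plain Python signature using only the standard library. This is not how LangChain builds `args_schema`; the helper `function_to_schema`, the type mapping, and the `lookup_order` example are all assumptions for illustration.

```python
import inspect
from typing import Any, Callable, Dict

# Assumed mapping from Python annotations to JSON Schema type names
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build a function-calling schema from a plain Python function."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unknown types fall back to "string"
        json_type = _JSON_TYPES.get(param.annotation, "string")
        properties[name] = {"type": json_type}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the model must supply it
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def lookup_order(order_id: int, include_items: bool = False) -> str:
    """Look up an order by its numeric ID."""
    return f"order {order_id}"


schema = function_to_schema(lookup_order)
```

The result has the same outer shape as the dictionaries returned by `_get_tool_schemas`, which is why a clear docstring and precise type hints on each tool matter: they are what the model actually sees.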

Database Connections

Agents often need to query structured data. Here’s a pattern for SQL database access:

import sqlite3
from typing import List, Dict, Any

from langchain_core.tools import tool

@tool
def query_database(sql: str) -> str:
    """
    Execute a read-only SQL query against the database.

    Args:
        sql: SELECT query to execute (no modifications allowed)

    Returns:
        Query results as formatted text
    """
    # Safety check - only allow SELECT
    if not sql.strip().upper().startswith("SELECT"):
        return "Error: Only SELECT queries are allowed"

    conn = None
    try:
        conn = sqlite3.connect("company.db")
        cursor = conn.cursor()
        cursor.execute(sql)
        columns = [desc[0] for desc in cursor.description]
        rows = cursor.fetchall()

        # Format as readable table
        result = " | ".join(columns) + "\n"
        result += "-" * 40 + "\n"
        for row in rows:
            result += " | ".join(str(v) for v in row) + "\n"

        return result

    except Exception as e:
        return f"Query error: {str(e)}"
    finally:
        # Close the connection even when the query fails
        if conn is not None:
            conn.close()


@tool
def get_schema() -> str:
    """
    Get the database schema to understand available tables and columns.

    Returns:
        Schema description
    """
    conn = sqlite3.connect("company.db")
    cursor = conn.cursor()

    cursor.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"
    )
    tables = cursor.fetchall()

    schema = []
    for (table_name,) in tables:
        # PRAGMA table_info rows are (cid, name, type, ...)
        cursor.execute(f"PRAGMA table_info({table_name})")
        columns = cursor.fetchall()
        col_info = [f"{col[1]} ({col[2]})" for col in columns]
        schema.append(f"{table_name}: {', '.join(col_info)}")

    conn.close()
    return "\n".join(schema)
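
Checking that a query starts with SELECT is a string-level guard, so it can help to have the database itself refuse writes as a second layer. The sketch below is one possible approach, assuming SQLite: it turns on `PRAGMA query_only` for the connection and caps the rows returned. The function name `run_read_only`, the `max_rows` parameter, and the `inventory` table are hypothetical.

```python
import sqlite3


def run_read_only(conn: sqlite3.Connection, sql: str, max_rows: int = 50) -> str:
    """Run a query with writes disabled and at most max_rows rows returned."""
    conn.execute("PRAGMA query_only = ON")  # SQLite rejects writes on this connection
    try:
        cursor = conn.execute(sql)
        if cursor.description is None:
            return "Error: statement returned no result set"
        columns = [desc[0] for desc in cursor.description]
        rows = cursor.fetchmany(max_rows)  # cap the rows sent back to the model
    except sqlite3.Error as exc:
        return f"Query error: {exc}"
    lines = [" | ".join(columns)]
    lines += [" | ".join(str(value) for value in row) for row in rows]
    return "\n".join(lines)


# Demo against an in-memory database with a hypothetical table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?)", [("A1", 5), ("B2", 0), ("C3", 9)]
)
conn.commit()

read_result = run_read_only(conn, "SELECT sku, qty FROM inventory ORDER BY sku", max_rows=2)
write_result = run_read_only(conn, "DELETE FROM inventory")
remaining = conn.execute("SELECT COUNT(*) FROM inventory").fetchone()[0]
```

The row cap matters as much as the write protection: an unbounded `SELECT *` on a large table can flood the model's context with data it cannot use.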

Natural Language to SQL

Enable agents to translate natural language to SQL:

class DatabaseAgent:
    def __init__(self):
        self.tools = [query_database, get_schema]

    def query(self, question: str) -> str:
        messages = [
            {
                "role": "system",
                "content": """You are a database analyst. Convert natural language
                questions into SQL queries. Always check the schema first.

                Steps:
                1. Use get_schema to understand available tables
                2. Write appropriate SQL query
                3. Execute with query_database
                4. Summarize results for the user"""
            },
            {"role": "user", "content": question}
        ]

        # _execute_with_tools is not shown here; it is assumed to run the same
        # tool-call loop as ResearchAgent.research
        return self._execute_with_tools(messages)

REST API Integration

Many services expose REST APIs. Here’s a robust pattern for API integration:

import os
import time
from functools import wraps
from typing import Optional

import requests
from langchain_core.tools import tool

def with_retry(max_attempts: int = 3, backoff_factor: float = 2.0):
    """Decorator for retry logic with exponential backoff"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_exception = None
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except requests.RequestException as e:
                    last_exception = e
                    if attempt < max_attempts - 1:
                        # Waits 1s, 2s, 4s, ... with the default factor
                        sleep_time = backoff_factor ** attempt
                        time.sleep(sleep_time)
            raise last_exception
        return wrapper
    return decorator


class APIClient:
    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url
        self.session = requests.Session()
        self.session.headers.update({
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        })

    @with_retry(max_attempts=3)
    def get(self, endpoint: str, params: Optional[dict] = None) -> dict:
        response = self.session.get(
            f"{self.base_url}/{endpoint}",
            params=params,
            timeout=30
        )
        response.raise_for_status()
        return response.json()

    @with_retry(max_attempts=3)
    def post(self, endpoint: str, data: dict) -> dict:
        response = self.session.post(
            f"{self.base_url}/{endpoint}",
            json=data,
            timeout=30
        )
        response.raise_for_status()
        return response.json()


# Create tools from API client
weather_client = APIClient(
    base_url="https://api.weather.example.com",
    api_key=os.environ["WEATHER_API_KEY"]
)

@tool
def get_weather(city: str) -> dict:
    """
    Get current weather for a city.

    Args:
        city: City name

    Returns:
        Weather data including temperature and conditions
    """
    try:
        return weather_client.get("current", params={"city": city})
    except requests.RequestException as e:
        return {"error": f"Failed to fetch weather: {str(e)}"}
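
One refinement worth considering: the retry decorator above retries any `requests.RequestException`, which includes client errors such as a 401 that will not succeed on a second attempt. Here is a standard-library sketch of retrying only failures marked as transient, with optional jitter and an injectable `sleep` so it can be tested without waiting. `TransientError` and `call_with_backoff` are hypothetical names introduced for this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type standing in for timeouts, 429s, and 5xx responses."""


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    backoff_factor: float = 2.0,
    jitter: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying only TransientError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            delay = base_delay * (backoff_factor ** attempt)
            delay += random.uniform(0, jitter)  # spread out simultaneous retries
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")


# Demo: fail twice, then succeed; record the delays instead of sleeping
attempts = {"count": 0}
delays = []


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("temporary failure")
    return "ok"


result = call_with_backoff(flaky, sleep=delays.append)
```

Any exception other than `TransientError` propagates immediately, so a bad request or an authentication failure reaches the agent after one call instead of three.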

File System Operations

Agents may need to read and write files:

from pathlib import Path

@tool
def read_file(file_path: str) -> str:
    """
    Read contents of a text file.

    Args:
        file_path: Path to the file to read

    Returns:
        File contents
    """
    path = Path(file_path)

    # Security: restrict to allowed directories
    allowed_dirs = [Path("./data"), Path("./documents")]
    if not any(path.resolve().is_relative_to(d.resolve()) for d in allowed_dirs):
        return "Error: Access denied - path outside allowed directories"

    if not path.exists():
        return f"Error: File not found: {file_path}"

    try:
        return path.read_text()
    except Exception as e:
        return f"Error reading file: {str(e)}"


@tool
def write_file(file_path: str, content: str) -> str:
    """
    Write content to a text file.

    Args:
        file_path: Path to the file to write
        content: Content to write

    Returns:
        Success or error message
    """
    path = Path(file_path)

    # Security check
    allowed_dirs = [Path("./output")]
    if not any(path.resolve().is_relative_to(d.resolve()) for d in allowed_dirs):
        return "Error: Access denied - can only write to output directory"

    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
        return f"Successfully wrote to {file_path}"
    except Exception as e:
        return f"Error writing file: {str(e)}"
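
The `resolve()` call in both tools is what makes the directory check meaningful. As a small illustration (the helper `is_within` and the file names are assumptions, and `is_relative_to` needs Python 3.9 or newer), a path that begins inside the allowed directory but climbs out with `..` is rejected once it has been resolved:

```python
import tempfile
from pathlib import Path
from typing import Iterable


def is_within(path: Path, allowed_dirs: Iterable[Path]) -> bool:
    """Return True if path, after resolving '..' and symlinks, is inside an allowed directory."""
    resolved = path.resolve()
    return any(resolved.is_relative_to(d.resolve()) for d in allowed_dirs)


# Demo in a throwaway directory
root = Path(tempfile.mkdtemp())
data_dir = root / "data"
data_dir.mkdir()

inside = is_within(data_dir / "report.txt", [data_dir])
escaped = is_within(data_dir / ".." / "secrets.txt", [data_dir])
```

Comparing unresolved path strings would accept the second path, because it textually starts with the allowed directory.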

Building a Multi-Tool Agent

Combining multiple integrations into a capable agent:

class IntegratedAgent:
    def __init__(self):
        self.tools = [
            web_search,
            query_database,
            get_schema,
            get_weather,
            read_file,
            write_file
        ]
        self.tool_map = {t.name: t for t in self.tools}

    def run(self, task: str) -> str:
        messages = [
            {
                "role": "system",
                "content": """You are a capable assistant with access to multiple tools:

                - web_search: Find current information online
                - query_database/get_schema: Query company database
                - get_weather: Get weather information
                - read_file/write_file: Work with files

                Use tools when needed. Chain multiple tools for complex tasks.
                Always explain what you're doing and why."""
            },
            {"role": "user", "content": task}
        ]

        # Cap the loop so a confused model cannot call tools forever
        max_iterations = 10
        for _ in range(max_iterations):
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
                tools=self._get_tool_schemas(),
                temperature=0
            )

            message = response.choices[0].message

            if not message.tool_calls:
                return message.content

            messages.append(message)

            for tool_call in message.tool_calls:
                result = self._execute_tool(tool_call)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })

        return "Reached maximum iterations without completing task"

    def _execute_tool(self, tool_call) -> str:
        func_name = tool_call.function.name

        if func_name not in self.tool_map:
            return f"Unknown tool: {func_name}"

        try:
            # Parse inside the try block: the model can emit malformed JSON arguments
            func_args = json.loads(tool_call.function.arguments)
            result = self.tool_map[func_name].invoke(func_args)
            return str(result)
        except Exception as e:
            return f"Tool error: {str(e)}"

    def _get_tool_schemas(self) -> list:
        # Same schema builder as ResearchAgent, repeated so the class is complete
        return [
            {
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.args_schema.schema()
                }
            }
            for tool in self.tools
        ]

Security Considerations

External integrations introduce security risks. Essential safeguards:

Input Validation

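import re

from pydantic import BaseModel, validator

class QueryInput(BaseModel):
    sql: str

    # Pydantic v1-style decorator; Pydantic v2 provides field_validator as its replacement
    @validator('sql')
    def validate_sql(cls, v):
        # Block dangerous keywords, matching whole words only so that a column
        # name such as "updated_at" does not trigger a false positive
        dangerous = ['DROP', 'DELETE', 'UPDATE', 'INSERT', 'ALTER', 'TRUNCATE']
        upper = v.upper()
        for keyword in dangerous:
            if re.search(rf"\b{keyword}\b", upper):
                raise ValueError(f"Dangerous SQL keyword detected: {keyword}")
        return v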

Rate Limiting

from functools import wraps
from collections import defaultdict
import time

class RateLimiter:
    def __init__(self, calls_per_minute: int = 60):
        self.calls_per_minute = calls_per_minute
        self.calls = defaultdict(list)

    def is_allowed(self, key: str) -> bool:
        now = time.time()
        minute_ago = now - 60

        # Clean old calls
        self.calls[key] = [t for t in self.calls[key] if t > minute_ago]

        if len(self.calls[key]) >= self.calls_per_minute:
            return False

        self.calls[key].append(now)
        return True

rate_limiter = RateLimiter(calls_per_minute=30)

def rate_limited(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        if not rate_limiter.is_allowed(func.__name__):
            raise Exception("Rate limit exceeded")
        return func(*args, **kwargs)
    return wrapper
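
A plain "rate limit exceeded" error gives the agent nothing to plan around. One possible variation, sketched here with hypothetical names (`SlidingWindowLimiter`, `seconds_until_allowed`), reports how long the caller should wait. The clock is injectable, which is an assumption made purely so the behaviour can be checked without sleeping.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Sliding-window limiter that also reports how long to wait."""

    def __init__(
        self,
        max_calls: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.calls: Deque[float] = deque()

    def seconds_until_allowed(self) -> float:
        """Return 0.0 and record the call if allowed, else the wait in seconds."""
        now = self.clock()
        while self.calls and self.calls[0] <= now - self.window_seconds:
            self.calls.popleft()  # drop calls that have left the window
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return 0.0
        return self.calls[0] + self.window_seconds - now


# Demo with a fake clock
fake_now = [100.0]
limiter = SlidingWindowLimiter(max_calls=2, window_seconds=60, clock=lambda: fake_now[0])

first = limiter.seconds_until_allowed()
second = limiter.seconds_until_allowed()
blocked = limiter.seconds_until_allowed()
fake_now[0] = 161.0
after_window = limiter.seconds_until_allowed()
```

A tool wrapper could turn a non-zero wait into a message such as "rate limited, retry in N seconds", which the model can act on more sensibly than a bare exception.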

Credential Management

import os
from dataclasses import dataclass

@dataclass
class APICredentials:
    """Securely manage API credentials"""

    @staticmethod
    def get(service: str) -> str:
        """Get credential from environment"""
        key = f"{service.upper()}_API_KEY"
        value = os.environ.get(key)
        if not value:
            raise ValueError(f"Missing credential: {key}")
        return value

# Usage in tools
weather_key = APICredentials.get("weather")  # Reads WEATHER_API_KEY

Error Handling Patterns

Robust error handling is critical for production agents:

from dataclasses import dataclass
from enum import Enum
from functools import wraps
from typing import Any, Optional

import requests

class ToolErrorType(Enum):
    NETWORK = "network"
    AUTHENTICATION = "authentication"
    RATE_LIMIT = "rate_limit"
    VALIDATION = "validation"
    NOT_FOUND = "not_found"
    UNKNOWN = "unknown"

@dataclass
class ToolResult:
    success: bool
    data: Any = None
    error_type: Optional[ToolErrorType] = None
    error_message: Optional[str] = None

    def to_string(self) -> str:
        if self.success:
            return str(self.data)
        return f"Error ({self.error_type.value}): {self.error_message}"


def safe_tool_execution(func):
    """Wrapper that catches exceptions and returns structured results"""
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            result = func(*args, **kwargs)
            return ToolResult(success=True, data=result)
        except requests.Timeout:
            return ToolResult(
                success=False,
                error_type=ToolErrorType.NETWORK,
                error_message="Request timed out"
            )
        except requests.HTTPError as e:
            status = e.response.status_code if e.response is not None else None
            if status == 429:
                return ToolResult(
                    success=False,
                    error_type=ToolErrorType.RATE_LIMIT,
                    error_message="Rate limit exceeded, try again later"
                )
            elif status == 401:
                return ToolResult(
                    success=False,
                    error_type=ToolErrorType.AUTHENTICATION,
                    error_message="Authentication failed"
                )
            elif status == 404:
                return ToolResult(
                    success=False,
                    error_type=ToolErrorType.NOT_FOUND,
                    error_message="Resource not found"
                )
            # Any other HTTP status: report it instead of silently returning None
            return ToolResult(
                success=False,
                error_type=ToolErrorType.UNKNOWN,
                error_message=f"HTTP error: {e}"
            )
        except Exception as e:
            return ToolResult(
                success=False,
                error_type=ToolErrorType.UNKNOWN,
                error_message=str(e)
            )
    return wrapper
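
As the number of handled status codes grows, a lookup table can be easier to maintain than a chain of `if`/`elif` branches. The sketch below is one way to do that. The `ErrorKind` enum is a stand-in for the error types above so the example runs on its own, and the specific mapping (for instance, treating 5xx responses as transient network-style errors) is my assumption rather than a rule.

```python
from enum import Enum


class ErrorKind(Enum):
    """Hypothetical categories mirroring the error types used in this post."""
    AUTHENTICATION = "authentication"
    NOT_FOUND = "not_found"
    RATE_LIMIT = "rate_limit"
    VALIDATION = "validation"
    NETWORK = "network"
    UNKNOWN = "unknown"


# Assumed mapping from HTTP status codes to error categories
_STATUS_TO_KIND = {
    400: ErrorKind.VALIDATION,
    401: ErrorKind.AUTHENTICATION,
    403: ErrorKind.AUTHENTICATION,
    404: ErrorKind.NOT_FOUND,
    422: ErrorKind.VALIDATION,
    429: ErrorKind.RATE_LIMIT,
}


def classify_status(status_code: int) -> ErrorKind:
    """Map an HTTP status code to an error category."""
    if status_code in _STATUS_TO_KIND:
        return _STATUS_TO_KIND[status_code]
    if 500 <= status_code < 600:
        return ErrorKind.NETWORK  # treating server-side failures as transient is an assumption
    return ErrorKind.UNKNOWN


def is_retryable(kind: ErrorKind) -> bool:
    """Only rate limits and transient failures are worth retrying."""
    return kind in {ErrorKind.RATE_LIMIT, ErrorKind.NETWORK}
```

Keeping the classification separate from the retry decision lets the agent loop ask a single question, "is this worth trying again?", without knowing anything about HTTP.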

Key Takeaways

  1. Layer your integrations: Separate tool definitions from business logic and error handling
  2. Always handle failures: Network calls fail, APIs rate limit, databases time out
  3. Security is non-negotiable: Validate inputs, restrict access, protect credentials
  4. Provide context to LLMs: Format external data clearly so the model can use it effectively
  5. Log everything: External calls should be logged for debugging and monitoring
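
For the last takeaway, a logging decorator is often enough to get started. This is a minimal sketch using the standard `logging` module; the logger name `agent.tools`, the decorator name `logged_tool`, and the `add` example are illustrative assumptions.

```python
import functools
import logging
import time

logger = logging.getLogger("agent.tools")  # hypothetical logger name


def logged_tool(func):
    """Log each tool call's name, duration, and outcome."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback along with the message
            logger.exception("tool=%s status=error", func.__name__)
            raise
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("tool=%s status=ok duration_ms=%.1f", func.__name__, elapsed_ms)
        return result
    return wrapper


@logged_tool
def add(a: int, b: int) -> int:
    """Toy tool used only to demonstrate the decorator."""
    return a + b
```

The sketch deliberately logs the tool name and timing but not the arguments, since tool inputs can contain credentials or user data.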

With external connections, agents transform from text processors into capable systems that can research, query, and act. In the next post, I’ll explore agentic RAG - dynamically retrieving information to augment agent knowledge - and strategies for evaluating agent performance.


This is Part 10 of my series on building intelligent AI systems. Next: agentic RAG and agent evaluation strategies.
