Agents become truly useful when they can interact with the real world—fetching live data from APIs, querying databases, and writing results back. This post covers building production-grade integrations: API tools with proper error handling, SQL agents that translate natural language to queries, and security patterns that prevent disasters.
The Grounding Problem
Large language models suffer from a fundamental limitation: they’re trained on static snapshots of text. GPT-4’s training data has a cutoff date. Claude doesn’t know your company’s order database exists. No model can tell you tomorrow’s weather—they can only generate plausible-sounding text about weather.
This creates what we might call the grounding problem: LLMs produce confident answers based on pattern matching, not factual lookup. When a user asks “What’s the status of my order?”, a pure LLM can only hallucinate an answer.
```mermaid
graph LR
    A[User Query] --> B[Agent]
    B --> C{Needs Data?}
    C -->|Yes| D[External System]
    D --> E[API / Database]
    E --> F[Real-time Data]
    F --> B
    C -->|No| G[Response]
    B --> G
    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    class A,G greenClass
    class B,C orangeClass
    class D,E,F blueClass
```
External integrations solve this by giving agents access to live data. The agent recognizes it needs current information, calls the appropriate tool, and grounds its response in actual facts.
| Integration Type | Purpose | Examples |
|---|---|---|
| REST APIs | Live external data | Weather, stocks, news, third-party services |
| SQL Databases | Structured business data | Orders, customers, inventory, analytics |
| Vector Databases | Semantic search | Document retrieval, knowledge bases, RAG |
| Internal Services | Organization-specific systems | CRM, ticketing, authentication |
API Integration: Beyond Simple HTTP Calls
Building a production API tool requires more than requests.get(). Consider what happens when:
- The API is temporarily down
- Rate limits kick in mid-conversation
- Network latency causes timeouts
- The response format changes unexpectedly
Each failure mode needs handling, and the agent needs to understand what went wrong.
The Anatomy of a Robust API Tool
A production-grade API tool has three layers:
- The HTTP client handles low-level concerns: retries, timeouts, connection pooling
- The tool wrapper exposes a clean interface with typed parameters and documentation
- Error translation converts HTTP failures into messages the LLM can understand
```python
from langchain_core.tools import tool
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return _fetch_weather(city)

# Retry transient failures with exponential backoff: 2s, 4s, 8s between attempts.
@retry(stop=stop_after_attempt(4), wait=wait_exponential(multiplier=1, min=2, max=8))
def _fetch_weather(city: str) -> str:
    # Endpoint and response shape are illustrative; substitute your provider's API.
    response = requests.get(
        "https://api.example.com/weather", params={"q": city}, timeout=10
    )
    response.raise_for_status()
    return response.json()["summary"]
```
The tenacity library’s @retry decorator implements exponential backoff—waiting 2, 4, then 8 seconds between retries. This pattern respects rate limits without failing immediately on transient errors.
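The same pattern can be sketched without a dependency. This stdlib version mirrors the stop-after-N-attempts, 2/4/8-second schedule described above; the function and parameter names are my own, and the sleep function is injectable so the delays can be observed without actually waiting:

```python
import time

def retry_with_backoff(func, attempts=4, base_delay=2.0, max_delay=8.0, sleep=time.sleep):
    """Call func, retrying with exponential backoff (2s, 4s, 8s between attempts)."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the final error
            sleep(min(base_delay * 2 ** attempt, max_delay))
```

The injectable `sleep` is a small design choice that pays off in tests: you can record the computed delays instead of sleeping through them.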
Error Messages for LLMs
When an API call fails, the error message matters. The LLM will see this message and decide how to proceed. Compare:
Unhelpful: "HTTPError: 429"
Helpful: "Rate limit exceeded. The weather service allows 60 requests per minute. Please wait a moment before trying again."
The second version gives the LLM context to either wait or inform the user appropriately. Design error messages as if explaining to a colleague who needs to handle the problem.
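One way to implement this translation is a small mapping from status codes to actionable messages. The codes and wording below are illustrative assumptions, not a fixed spec:

```python
# Map HTTP status codes to messages an LLM can act on.
# The wording and retry hints are illustrative; tailor them to your APIs.
ERROR_MESSAGES = {
    401: "Authentication failed. The API key may be missing or expired; inform the user.",
    404: "The requested resource was not found. Check the identifier and try again.",
    429: "Rate limit exceeded. Wait a moment before retrying this tool.",
    503: "The service is temporarily unavailable. Retry shortly or tell the user data is delayed.",
}

def translate_http_error(status_code: int) -> str:
    """Turn a bare HTTP status into a message the LLM can reason about."""
    default = f"The API returned HTTP {status_code}. Treat the data as unavailable for now."
    return ERROR_MESSAGES.get(status_code, default)
```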
Text-to-SQL: Power and Peril
SQL agents represent both the highest potential and highest risk of LLM integrations. A well-designed SQL agent can answer complex analytical questions in natural language. A poorly designed one can DROP TABLE users based on a misunderstood prompt.
Why Text-to-SQL is Valuable
Consider the alternative: every business question requires a developer to write a query. “What were our top-selling products last quarter?” becomes a ticket, waits in a backlog, gets implemented, and returns results days later.
A SQL agent answers immediately. The user asks in natural language; the agent generates SQL, executes it, and explains the results. This democratizes data access without requiring everyone to learn SQL.
The Security Imperative
But LLMs are eager to please. If prompted cleverly—or accidentally—they might generate destructive queries. The agent doesn’t inherently know that DELETE FROM orders is catastrophic while SELECT * FROM orders is safe.
Defense in depth applies here:
- Query validation: Parse and reject non-SELECT statements
- Parameterized templates: Use allowed query patterns, not arbitrary generation
- Row-level security: Filter results by user permissions
- Read-only connections: Database credentials should only allow SELECT
- Audit logging: Record every query for review
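As a concrete instance of the read-only rule, SQLite can open a file in read-only mode through a URI, so writes fail at the driver level no matter what SQL the model generates. The path and schema here are illustrative:

```python
import os
import sqlite3
import tempfile

# Create a throwaway database with one table (illustrative schema).
path = os.path.join(tempfile.mkdtemp(), "orders.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.commit()
conn.close()

# Reopen read-only: mode=ro blocks every write at the driver level.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
try:
    ro.execute("DELETE FROM orders")
    blocked = False
except sqlite3.OperationalError:
    blocked = True  # "attempt to write a readonly database"
```

Other databases offer the same guarantee through dedicated read-only roles; the point is that the credential, not the prompt, enforces the policy.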
The Text-to-SQL Pipeline
A safe SQL agent follows a structured pipeline:
```mermaid
graph TD
    A[START] --> B[Generate SQL]
    B --> C[Validate Query]
    C --> D{Safe?}
    D -->|Yes| E[Execute Query]
    D -->|No| F[Explain Rejection]
    E --> G[Format Results]
    F --> G
    G --> H[END]
    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    classDef redClass fill:#E74C3C,stroke:#333,stroke-width:2px,color:#fff
    class A,H greenClass
    class B,E,G blueClass
    class C,D orangeClass
    class F redClass
```
The validation step is critical. Using a SQL parser like sqlparse, you can inspect the query structure:
```python
import sqlparse

def validate_query(sql: str) -> tuple[bool, str]:
    """Allow only a single SELECT statement; reject everything else."""
    statements = sqlparse.parse(sql)
    if len(statements) != 1:
        return False, "Multiple statements are not allowed."
    statement_type = statements[0].get_type()
    if statement_type != "SELECT":
        return False, f"Only SELECT queries are permitted, got {statement_type}."
    return True, "Query is valid."
```
This validation isn’t foolproof—determined attackers find creative bypasses. But it catches the common cases: accidental destructive queries and basic injection attempts.
Schema as Context
LLMs can’t generate valid SQL without knowing the schema. The query “How many orders last month?” could target orders, purchase_orders, sales_transactions, or something else entirely.
LangChain’s SQLDatabase utility provides schema extraction:
```python
from langchain_community.utilities import SQLDatabase

# Connection string is illustrative; use your database's URI.
db = SQLDatabase.from_uri("sqlite:///orders.db")
print(db.get_table_info())  # CREATE TABLE statements plus sample rows
```
Include this schema in the LLM prompt. The model needs to know what tables exist, their columns, and relationships. For large schemas, consider selective inclusion—only show tables relevant to the current query.
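The same idea works without LangChain by querying the database catalog directly. This sqlite3 sketch (table and column names are illustrative) collects CREATE TABLE statements to paste into the prompt:

```python
import sqlite3

def get_schema_context(conn: sqlite3.Connection) -> str:
    """Collect CREATE TABLE statements to include in the LLM prompt."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n\n".join(row[0] for row in rows)

# Illustrative schema in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
schema = get_schema_context(conn)
```

For Postgres or MySQL, the equivalent is a query against `information_schema.columns`; the selective-inclusion idea then becomes a filter on table names before joining the statements.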
Combining Data Sources
Real-world agents often need multiple sources. A customer service agent might check:
- The database for order history
- An API for shipping carrier status
- Another API for weather affecting delivery
The Routing Pattern
Rather than always querying everything, intelligent routing examines the question and selects appropriate sources:
```python
from typing import TypedDict

class HybridAgentState(TypedDict):
    # Minimal state shape for this example; real state carries more fields.
    question: str
    sources: list[str]

def route_query(state: HybridAgentState) -> dict:
    """Select data sources by keyword matching on the user's question."""
    question = state["question"].lower()
    sources = []
    if any(word in question for word in ("order", "purchase", "refund")):
        sources.append("database")
    if any(word in question for word in ("shipping", "delivery", "tracking")):
        sources.append("carrier_api")
    if "weather" in question:
        sources.append("weather_api")
    return {"sources": sources or ["database"]}
```
This heuristic approach works for clear cases. For ambiguous queries, you might use the LLM itself to classify the query type before routing.
Result Synthesis
When data comes from multiple sources, the agent must synthesize coherently. A shipping status query might combine:
- Database: Order details, items, customer info
- Carrier API: Current location, estimated delivery
- Weather API: Conditions at delivery address
The synthesis step presents these as a unified narrative, not disconnected data dumps.
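A minimal synthesis step might look like the following sketch; the field names and phrasing are assumptions, not a fixed schema:

```python
def synthesize(order: dict, shipment: dict, weather: dict) -> str:
    """Merge per-source results into one narrative (field names are illustrative)."""
    parts = [
        f"Order {order['id']} ({order['status']}) contains {order['item_count']} item(s).",
        f"It is currently in {shipment['location']}, estimated delivery {shipment['eta']}.",
    ]
    if weather.get("severe"):
        parts.append(
            f"Note: {weather['summary']} near the delivery address may cause delays."
        )
    return " ".join(parts)
```

In practice the final wording usually comes from the LLM itself; a deterministic assembly step like this is still useful as a fallback and as a structured input to that final generation.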
Audit and Observability
Every external call should be logged. This serves multiple purposes:
- Debugging: When something fails, you can trace exactly what happened
- Compliance: Regulated industries require audit trails
- Cost tracking: API calls cost money; know where it goes
- Security: Detect unusual patterns that might indicate misuse
```python
import logging

logger = logging.getLogger("agent.audit")
logging.basicConfig(level=logging.INFO)

def log_tool_call(tool_name: str, args: dict, duration_ms: float, success: bool) -> None:
    """Record one external call for debugging, compliance, and cost analysis."""
    logger.info(
        "tool_call tool=%s args=%s duration_ms=%.1f success=%s",
        tool_name, args, duration_ms, success,
    )
```
In production, these logs flow to centralized systems (Datadog, Splunk, CloudWatch) where you can search, alert, and analyze patterns.
Key Takeaways
External integration solves the grounding problem: LLMs can’t know live data—tools bridge this gap with real-time information.
API tools need resilience: Implement retries with exponential backoff, meaningful error messages, and timeout handling. APIs fail; your agent shouldn’t.
Text-to-SQL requires defense in depth: Query validation, read-only connections, row-level security, and audit logging. Never trust LLM-generated SQL without verification.
Schema context enables valid queries: LLMs need to see database structure to generate correct SQL. Include relevant schema in prompts.
Route queries intelligently: Not every question needs every source. Classify queries and fetch only relevant data.
Audit everything: Log all external operations for debugging, compliance, and security analysis.
Next: Agentic RAG and Human-in-the-Loop with LangGraph - We’ll explore dynamic retrieval strategies, self-correcting RAG, and adding human approval to sensitive agent workflows.