Summary: Google's Introduction to Agents

As part of our journey exploring agentic AI systems, Google’s November 2025 whitepaper “Introduction to Agents” offers valuable industry perspective on production-grade agent architecture. This post summarizes the key concepts: core architecture, Agent Ops, security patterns, and self-evolving systems. It complements what we’ve covered in our Agentic AI series.

Source: Introduction to Agents (PDF) by Google

The Paradigm Shift: From Predictive AI to Autonomous Agents

For years, AI focused on passive, discrete tasks: answering questions, translating text, generating images. Each required constant human direction. The paradigm is shifting toward a new class of software capable of autonomous problem-solving and task execution. We explored this shift in From Chatbots to Agents.

An AI agent is not simply an AI model in a static workflow. It’s a complete application that makes plans and takes actions to achieve goals. The critical capability: agents can work on their own, figuring out next steps without a person guiding them at every turn.

Agents are the natural evolution of Language Models, made useful in software.

What is an AI Agent?

An AI agent combines models, tools, an orchestration layer, and runtime services, using the LM in a loop to accomplish a goal. We covered similar building blocks in Anatomy of an AI Agent. Four essential elements form the architecture:

```mermaid
flowchart TB
    subgraph Agent["AI Agent Architecture"]
        M["Model<br/>(Brain)"]
        T["Tools<br/>(Hands)"]
        O["Orchestration<br/>(Nervous System)"]
        D["Deployment<br/>(Body & Legs)"]
    end
    M --> O
    T --> O
    O --> D
    U[User/Trigger] --> Agent
    Agent --> E[Environment/APIs]

    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    classDef purpleClass fill:#9B59B6,stroke:#333,stroke-width:2px,color:#fff

    class M blueClass
    class T orangeClass
    class O greenClass
    class D purpleClass
```
| Component | Role | Description |
|---|---|---|
| Model | Brain | Core LM/foundation model serving as the reasoning engine that processes information and makes decisions |
| Tools | Hands | Mechanisms connecting reasoning to the outside world: APIs, code functions, data stores |
| Orchestration | Nervous System | Governing process managing planning, memory, and reasoning-strategy execution |
| Deployment | Body & Legs | Production hosting with monitoring, logging, and management services |

The Developer Paradigm Shift

Traditional developer = “bricklayer” precisely defining every logical step.

Agent developer = “director” who sets the scene (instructions/prompts), selects the cast (tools/APIs), and provides context (data), guiding an autonomous “actor” to deliver intended performance.

An agent is a system dedicated to the art of context window curation. It’s a relentless loop of assembling context, prompting the model, observing results, and re-assembling context for the next step.

The Agentic Problem-Solving Process

Agents operate on a continuous, cyclical process broken into five fundamental steps:

```mermaid
flowchart LR
    subgraph Loop["Agent Loop"]
        direction LR
        G["1. Get the<br/>Mission"] --> S["2. Scan<br/>the Scene"]
        S --> T["3. Think It<br/>Through"]
        T --> A["4. Take<br/>Action"]
        A --> O["5. Observe<br/>& Iterate"]
        O --> T
    end
    User[User/Trigger] --> G
    O -->|Goal Achieved| Result[Final Result]

    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    classDef pinkClass fill:#E74C3C,stroke:#333,stroke-width:2px,color:#fff
    classDef purpleClass fill:#9B59B6,stroke:#333,stroke-width:2px,color:#fff

    class G blueClass
    class S orangeClass
    class T greenClass
    class A pinkClass
    class O purpleClass
```

Step 1 - Get the Mission: Process initiates with a high-level goal from user or automated trigger.

Step 2 - Scan the Scene: Agent perceives environment, accessing resources: user request, short-term memory, tools.

Step 3 - Think It Through: Core reasoning loop where agent analyzes mission against scene and devises a plan.

Step 4 - Take Action: Orchestration layer executes the plan by calling APIs, running functions, querying databases.

Step 5 - Observe and Iterate: Agent observes outcome, adds to context, and loops back to Step 3 until mission achieved.
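The five steps above can be sketched as a minimal loop. This is an illustrative skeleton, not the whitepaper's implementation: `call_model` is a hypothetical stand-in for a real LM call, and `tools` is a plain dict of callables standing in for a tool layer.

```python
def run_agent(mission, tools, call_model, max_steps=10):
    """Minimal Think-Act-Observe loop (illustrative).

    `call_model` is a hypothetical LM wrapper: given the accumulated
    context and the available tools, it returns either a final answer
    or a tool action to execute.
    """
    context = [f"Mission: {mission}"]               # Step 1: get the mission
    for _ in range(max_steps):
        decision = call_model(context, tools)       # Steps 2-3: scan + think
        if decision["type"] == "final":
            return decision["answer"]
        tool = tools[decision["tool"]]              # Step 4: take action
        observation = tool(**decision["args"])
        context.append(f"Observed: {observation}")  # Step 5: observe & iterate
    raise RuntimeError("Mission not achieved within step budget")
```

The `max_steps` budget matters in practice: without it, a confused model can loop indefinitely, burning tokens without converging on the goal.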

Real-World Example: Customer Support Agent

User asks: “Where is my order #12345?”

Instead of immediately acting, the agent enters “Think It Through” phase:

  1. Identify: Find order in internal database to confirm existence
  2. Track: Extract tracking number, query carrier API for live status
  3. Report: Synthesize information into clear response

Execution:

  • Act: Call find_order("12345") → Observe: Order record with tracking “ZYX987”
  • Act: Call get_shipping_status("ZYX987") → Observe: “Out for Delivery”
  • Report: “Your order #12345 is ‘Out for Delivery’!”

A Taxonomy of Agentic Systems

Understanding the operational loop is one part; recognizing how it scales in complexity is the other. The whitepaper classifies agentic systems into five levels:

```mermaid
flowchart LR
    L0["L0: Core Reasoning"] --> L1["L1: Connected"] --> L2["L2: Strategic"] --> L3["L3: Multi-Agent"] --> L4["L4: Self-Evolving"]

    classDef level0 fill:#9B59B6,stroke:#333,stroke-width:2px,color:#fff
    classDef level1 fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef level2 fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    classDef level3 fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef level4 fill:#E74C3C,stroke:#333,stroke-width:2px,color:#fff

    class L0 level0
    class L1 level1
    class L2 level2
    class L3 level3
    class L4 level4
```

Level 0: Core Reasoning System

LM operates in isolation based on pre-trained knowledge. Strength: deep knowledge of established concepts. Weakness: no real-time awareness; blind to events outside its training data.

Level 1: Connected Problem-Solver

Reasoning engine connects to external tools. Can answer real-time questions by invoking search APIs, financial APIs, or RAG databases.

Level 2: Strategic Problem-Solver

Significant capability expansion, from executing simple tasks to strategically planning complex, multi-part goals. Key skill: context engineering, actively selecting, packaging, and managing relevant information for each step. This aligns with our coverage of prompt chaining workflows.

Level 3: Collaborative Multi-Agent System

Paradigm shifts from single “super-agent” to “team of specialists” mirroring human organizations. Agents treat other agents as tools. See our deep dive on multi-agent architecture.

Example: “Project Manager” agent receiving mission “Launch ‘Solaris’ headphones”:

  • Delegates to MarketResearchAgent: Analyze competitor pricing
  • Delegates to MarketingAgent: Draft press release versions
  • Delegates to WebDevAgent: Generate product page HTML

Level 4: Self-Evolving System

Profound leap from delegation to autonomous creation. Systems identify gaps in capabilities and dynamically create new tools or agents.

Core Agent Architecture Deep Dive

Model: The “Brain”

LM selection dictates cognitive capabilities, cost, and speed. Key insight: generic benchmarks don’t predict real-world success.

Real-world success demands models excelling at:

  • Superior reasoning to navigate multi-step problems
  • Reliable tool use to interact with the world

Model Routing Strategy: Use frontier model (Gemini 2.5 Pro) for complex reasoning, route simpler tasks to cost-effective model (Gemini 2.5 Flash). We covered routing patterns here.
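A routing strategy can be as simple as a classifier in front of the model call. The sketch below uses a crude keyword heuristic, which is purely illustrative; a production router would typically use a small classifier model. The model names follow the whitepaper's example.

```python
def route_model(task: str) -> str:
    """Pick a model per task (illustrative heuristic).

    Planning- or analysis-heavy prompts go to the frontier model;
    everything else goes to the cheaper, faster one. The keyword
    heuristic here is an assumption, not a production technique.
    """
    complex_markers = ("plan", "analyze", "compare", "multi-step")
    if any(marker in task.lower() for marker in complex_markers):
        return "gemini-2.5-pro"    # frontier model for complex reasoning
    return "gemini-2.5-flash"      # cost-effective model for simple tasks
```

The payoff is economic: if most traffic is simple lookups, routing them to the cheaper model cuts cost and latency without degrading the hard cases.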

Tools: The “Hands”

Tools connect reasoning to reality via a three-part loop: defining what tool can do, invoking it, and observing results. See Extending Agents with Tools for implementation details.

Retrieving Information (Grounding):

  • RAG: Query vector databases, knowledge graphs
  • NL2SQL: Query databases for analytics
  • Google Search: Web knowledge access

Executing Actions (Changing the World):

  • Wrap APIs as tools to send emails, schedule meetings
  • Execute code on-the-fly in secure sandboxes
  • Human-in-the-Loop (HITL): Pause for confirmation via ask_for_confirmation()

Function Calling Standards:

  • OpenAPI specification: Structured contracts for tools
  • Model Context Protocol (MCP): Convenient discovery/connection
  • Native tools: Gemini with native Google Search
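The essence of a function-calling contract is a machine-readable schema the model can reason over. The sketch below wraps a plain Python function in an OpenAPI-style JSON-schema declaration; the helper name `tool_declaration` and the hardcoded carrier lookup are illustrative assumptions, not any particular framework's API.

```python
def tool_declaration(fn, description, parameters):
    """Pair a callable with an OpenAPI-style JSON-schema contract so the
    model can discover what the tool does and how to call it."""
    return {
        "name": fn.__name__,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": parameters,
            "required": list(parameters),
        },
        "fn": fn,  # the executable side of the contract
    }

def get_shipping_status(tracking_number: str) -> str:
    # Hypothetical stub; a real tool would call the carrier's API here.
    return "Out for Delivery"

SHIPPING_TOOL = tool_declaration(
    get_shipping_status,
    "Look up live shipping status for a tracking number.",
    {"tracking_number": {"type": "string"}},
)
```

The model only ever sees the schema half of this structure; the orchestration layer holds the `fn` half and executes it when the model emits a matching call.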

The Orchestration Layer

The central nervous system connecting model and tools. Runs the “Think, Act, Observe” loop and governs agent behavior.

Core Design Choices:

  1. Degree of Autonomy: Spectrum from deterministic workflows (LM as tool) to LM-in-driver-seat (dynamic planning)
  2. Implementation Method: No-code builders vs. code-first frameworks (Google ADK)

Production-grade framework requirements:

  • Open: Plug in any model/tool, prevent vendor lock-in
  • Precise control: Hybrid approach where non-deterministic reasoning is governed by hard-coded rules
  • Observability: Detailed traces/logs exposing entire reasoning trajectory

Instruct with Domain Knowledge and Persona: System prompt serves as agent’s constitution: persona, constraints, output schema, tone of voice, tool guidance.

Augment with Context (Memory) (see Agent State and Memory):

  • Short-term memory: Active scratchpad for the current conversation, including (Action, Observation) pairs
  • Long-term memory: Persistence across sessions via RAG tools connected to vector databases
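The two memory tiers can be sketched as one small class. Everything here is a simplification: the keyword-overlap retrieval is a naive stand-in for vector-database similarity search, and the class name is our own.

```python
class AgentMemory:
    """Two-tier memory sketch (illustrative).

    Short-term: a scratchpad of (action, observation) pairs for the
    current session. Long-term: a persistent store queried by naive
    keyword overlap, standing in for RAG over a vector database.
    """
    def __init__(self):
        self.scratchpad = []   # short-term: cleared per session
        self.long_term = []    # long-term: persists across sessions

    def record(self, action, observation):
        self.scratchpad.append((action, observation))

    def persist(self, fact):
        self.long_term.append(fact)

    def retrieve(self, query, k=3):
        # Rank stored facts by word overlap with the query (toy scoring).
        words = set(query.lower().split())
        ranked = sorted(self.long_term,
                        key=lambda fact: -len(words & set(fact.lower().split())))
        return ranked[:k]
```

The important design point survives the simplification: the scratchpad is assembled into every prompt, while long-term memory is reached through a retrieval tool and only the top-k results enter the context window.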

Multi-Agent Design Patterns

As tasks grow complex, “team of specialists” approach beats single super-agent. We explored these in Evaluator-Optimizer and Orchestrator-Worker Patterns:

| Pattern | Use Case | Description |
|---|---|---|
| Coordinator | Dynamic/non-linear tasks | Manager agent routes sub-tasks to specialists |
| Sequential | Linear workflows | Output from one agent becomes input for the next |
| Iterative Refinement | Quality control | Generator + Critic agents in a feedback loop |
| Human-in-the-Loop | High-stakes tasks | Deliberate pause for human approval |
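The sequential pattern is the simplest to sketch: each agent's output feeds the next. The lambdas below are trivial stand-ins for full LM-backed agents; the pipeline function itself is our own illustrative helper.

```python
def sequential_pipeline(task, agents):
    """Sequential multi-agent pattern: run agents as a chain, each
    consuming the previous agent's output. Agents here are plain
    callables standing in for LM-backed agents."""
    result = task
    for agent in agents:
        result = agent(result)
    return result

# Toy specialists: a real pipeline would call distinct agents
# (e.g. researcher -> writer -> editor), each with its own prompt.
outline_agent = lambda t: f"outline({t})"
draft_agent = lambda t: f"draft({t})"
```

The other patterns differ only in control flow: a coordinator chooses which agent to call next at runtime, and iterative refinement wraps a generator/critic pair in a loop until the critic approves.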

Agent Ops: A Structured Approach to the Unpredictable

Traditional software testing relies on `assert output == expected`. This fails for stochastic agentic systems, whose responses are probabilistic by design.

Agent Ops is DevOps + MLOps tailored for AI agents, turning unpredictability into a managed, measurable feature.

```mermaid
flowchart TB
    subgraph AgentOps["Agent Ops Framework"]
        M["Measure<br/>(KPIs, Business Metrics)"]
        Q["Quality<br/>(LM Judge Evaluation)"]
        D["Debug<br/>(OpenTelemetry Traces)"]
        F["Feedback<br/>(Human Input Loop)"]
    end
    M --> Q
    Q --> D
    D --> F
    F -->|Improve| M

    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    classDef pinkClass fill:#E74C3C,stroke:#333,stroke-width:2px,color:#fff

    class M blueClass
    class Q orangeClass
    class D greenClass
    class F pinkClass
```

Key Agent Ops Practices

1. Measure What Matters
Frame observability like an A/B test. Define KPIs that prove business value:

  • Goal completion rates
  • User satisfaction scores
  • Task latency
  • Cost per interaction
  • Revenue/conversion impact

2. Quality via LM Judge
Use powerful model to assess agent output against predefined rubric:

  • Did it give right answer?
  • Was response factually grounded?
  • Did it follow instructions?

Build golden datasets covering full breadth of use cases, reviewed by domain experts.
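An LM-judge evaluation reduces to scoring a response against rubric questions. In the sketch below, `ask_judge` is a hypothetical callable wrapping a strong judge model that answers each rubric question with True or False; real setups often use graded scores and per-criterion weighting instead.

```python
def judge_response(response, rubric, ask_judge):
    """Score an agent response against a rubric (illustrative).

    `ask_judge` is a hypothetical wrapper around a strong LM: it takes
    a rubric question and the response, and returns True/False.
    Returns the fraction of criteria met.
    """
    verdicts = [ask_judge(question, response) for question in rubric]
    return sum(verdicts) / len(verdicts)

# Rubric questions paraphrasing the three criteria above.
RUBRIC = [
    "Did the response give the right answer to the user's question?",
    "Is every claim grounded in the retrieved context?",
    "Did the agent follow its system instructions?",
]
```

Run against a golden dataset, these per-response scores aggregate into the version-level metrics that metrics-driven development compares before and after each change.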

3. Metrics-Driven Development
Run new version against entire evaluation dataset, compare scores to production version. Use A/B deployments for real-world validation.

4. Debug with OpenTelemetry Traces
High-fidelity, step-by-step recording of entire execution path:

  • Exact prompt sent to model
  • Model’s internal reasoning
  • Tool chosen and parameters generated
  • Raw observation data

5. Cherish Human Feedback
Bug reports and “thumbs down” = gifts. Close the loop: capture feedback → replicate issue → convert to permanent test case.

Agent Interoperability

Agents and Humans

User Interfaces: From chatbots to rich dynamic front-ends powered by structured JSON responses.

Computer Use: LM takes control of UI to navigate pages, highlight buttons, pre-fill forms.

UI Control Protocols:

  • MCP UI: Control UI via MCP tools
  • AG UI: Event passing with shared state
  • A2UI: Generate bespoke interfaces via structured output

Multimodal Communication: Gemini Live API enables bidirectional streaming where you speak to agent and interrupt naturally. Opens use cases impossible with text.

Agents and Agents

Two challenges: discovery (how do agents find each other and learn each other's capabilities?) and communication (what standard protocol should they speak?).

Agent2Agent (A2A) Protocol: Universal handshake for agentic economy.

  • Agent Card: JSON file advertising capabilities, endpoint, security credentials
  • Task-oriented architecture: Asynchronous tasks with streaming updates over long-running connections
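An Agent Card is a JSON document; the field names below follow the shape the A2A protocol publishes (name, endpoint URL, capabilities, security schemes, skills), while every value is invented for illustration.

```python
import json

# Illustrative Agent Card. Field names follow A2A's published card
# shape; the agent, URL, and skill are hypothetical examples.
AGENT_CARD = {
    "name": "shipping-status-agent",
    "description": "Answers live shipping-status questions for orders.",
    "url": "https://agents.example.com/shipping",   # service endpoint
    "capabilities": {"streaming": True},
    "securitySchemes": {"bearer": {"type": "http", "scheme": "bearer"}},
    "skills": [
        {
            "id": "track-order",
            "description": "Return carrier status for a tracking number.",
        }
    ],
}

card_json = json.dumps(AGENT_CARD, indent=2)
```

A client agent fetches this card, checks the advertised skills and security scheme, and only then opens a task against the endpoint, which is what makes the "universal handshake" framing apt.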

Agents and Money

New standards needed for agents to transact securely:

  • Agent Payments Protocol (AP2): Cryptographically-signed “mandates” as verifiable proof of user intent
  • x402: HTTP 402 “Payment Required” for frictionless machine-to-machine micropayments

Securing AI Agents

The Trust Trade-Off

Every ounce of power introduces corresponding risk. Primary concerns:

  • Rogue actions: Unintended/harmful behaviors
  • Sensitive data disclosure: Leaking private information

Defense-in-Depth Approach

Layer 1 - Deterministic Guardrails: Hardcoded rules as security chokepoint outside model’s reasoning (e.g., block purchases over $100).
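A deterministic guardrail is ordinary code that inspects every proposed action before execution, regardless of what the model's reasoning concluded. The sketch below implements the $100 purchase limit example; the action format is our own assumption.

```python
def purchase_guardrail(action):
    """Hardcoded policy check outside the model's reasoning.

    Whatever plan the LM proposes, purchases above the hard limit are
    blocked before execution (a real system might escalate to a human
    instead). The action dict format is an illustrative assumption.
    """
    LIMIT = 100.00
    if action.get("type") == "purchase" and action.get("amount", 0) > LIMIT:
        return {"allowed": False,
                "reason": f"amount exceeds ${LIMIT:.2f} limit"}
    return {"allowed": True, "reason": ""}
```

Because the check is plain code, it cannot be talked out of its policy by a prompt injection, which is exactly why it belongs outside the model's reasoning loop.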

Layer 2 - Reasoning-Based Defenses:

  • Adversarial training for resilience
  • Guard models screening proposed plans
  • Model Armor for prompt injection, jailbreak, PII leakage detection

Agent Identity: A New Class of Principal

| Principal | Authentication | Notes |
|---|---|---|
| Users | OAuth/SSO | Full autonomy and responsibility |
| Agents | SPIFFE | Delegated authority, actions on behalf of users |
| Service accounts | IAM | Fully deterministic, no responsibility |

Each agent needs cryptographically verifiable identity with specific, least-privilege permissions.

Enterprise Governance

Central Gateway as Control Plane: Mandatory entry point for all agentic traffic:

  • User-to-agent prompts
  • Agent-to-tool calls (MCP)
  • Agent-to-agent collaboration (A2A)
  • Direct inference requests

Functions:

  1. Runtime Policy Enforcement: Single pane of glass for authentication, authorization, observability
  2. Centralized Governance: Central registry serving as enterprise app store for agents/tools with lifecycle management

How Agents Evolve and Learn

Without adaptation, agent performance degrades over time (“aging”). Scalable solution: agents that learn and evolve autonomously.

Learning Sources

  • Runtime Experience: Session logs, traces, memory, HITL feedback
  • External Signals: Updated policies, regulatory guidelines, agent critiques

Adaptation Techniques

Enhanced Context Engineering: Continuously refine prompts, few-shot examples, memory retrieval.

Tool Optimization and Creation: Identify capability gaps, gain access to new tools, create tools on-the-fly, modify existing tools.

Agent Gym: The Next Frontier

Dedicated platform optimizing multi-agent systems in offline processes:

  1. Not in execution path, standalone off-production platform
  2. Simulation environment for trial-and-error
  3. Synthetic data generators for pressure testing
  4. Arsenal of optimization tools via MCP/A2A
  5. Connect to human domain experts for tribal knowledge

Advanced Agent Examples

Google Co-Scientist

Virtual research collaborator accelerating scientific discovery. Multi-agent ecosystem:

  • Supervisor agent: Project manager delegating to specialists
  • Generation Agent: Literature exploration, simulated debate
  • Reflection Agent: Full review, simulation review
  • Ranking Agent: Tournament-style hypothesis comparison
  • Evolution Agent: Inspiration, simplification, research extension
  • Meta-review Agent: Research overview formulation

Agents work for hours/days, improving hypotheses through loops and meta-loops.

AlphaEvolve Agent

Discovers and optimizes algorithms for complex problems in mathematics and computer science.

Core approach:

  1. Gemini generates potential solutions
  2. Automated evaluator scores them
  3. Most promising ideas inspire next generation

Breakthroughs:

  • Improved Google data center efficiency
  • Faster matrix multiplication algorithms
  • Solutions to open mathematical problems

Conclusion

Key takeaways from Google’s framework:

  1. Three Core Components: Model (Brain) + Tools (Hands) + Orchestration (Nervous System) operating in continuous “Think, Act, Observe” loop

  2. Taxonomy for Scoping: From Level 1 Connected Problem-Solver to Level 4 Self-Evolving System

  3. New Developer Paradigm: Shift from “bricklayer” defining explicit logic to “architect/director” guiding autonomous entities

  4. Success = Engineering Rigor: Not found in initial prompt alone, but in robust tool contracts, resilient error handling, sophisticated context management, comprehensive evaluation

  5. Agent Ops is Essential: Manage stochastic systems through measurement, LM-judge evaluation, tracing, and human feedback loops

  6. Security is Foundational: Defense-in-depth combining deterministic guardrails with reasoning-based defenses, proper identity management

  7. Evolution is Key: Build agents that learn from runtime experience and external signals to prevent aging

This whitepaper aligns closely with the concepts we’ve explored throughout our Agentic AI series.


This post summarizes Google’s “Introduction to Agents” whitepaper (November 2025) by Alan Blount, Antonio Gulli, Shubham Saboo, Michael Zimmermann, and Vladimir Vuskovic. The original whitepaper is part of a five-part series covering agent architecture, tools, memory, quality, and deployment.
