Summary: Google's Context Engineering - Sessions & Memory

This is the third installment in our Agentic AI series, following Google’s Introduction to Agents and Agent Tools & MCP. While those papers covered agent architecture and tool integration, this one focuses on how agents manage context across conversations through sessions and memory.

Source: Context Engineering: Sessions & Memory (PDF) by Kimberly Milam and Antonio Gulli, Google (November 2025)

Why Context Engineering Matters

Without proper context, LLMs suffer from the “goldfish problem”: each request is processed in isolation without memory of past interactions. Context engineering solves this by dynamically assembling the information an LLM needs at request time.

The context window receives six types of information:

| Component | Purpose |
|---|---|
| System Instructions | Define agent behavior, constraints, persona |
| Conversation History | Prior messages in the current session |
| Tool Definitions | Available functions the agent can call |
| Memories | Long-term facts about the user across sessions |
| RAG Results | Retrieved documents relevant to the query |
| Output Structure | Response format constraints (JSON schema, etc.) |

Why does this matter? A well-engineered context reduces hallucinations, improves factual grounding, and enables personalized responses. The challenge is fitting all relevant information within token limits while maintaining coherence.
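To make the assembly step concrete, here is a minimal sketch. The function name, section markers, and the character-based budget are illustrative assumptions, not an API from the whitepaper:

```python
# Hypothetical context assembler: concatenates the sources above in priority
# order and drops the oldest history lines when over budget (assumed policy).

def assemble_context(instructions: str, history: list[str], memories: list[str],
                     rag_results: list[str], max_chars: int = 4000) -> str:
    """Build the context window text from its component sources."""
    parts = [f"[SYSTEM]\n{instructions}"]
    if memories:
        parts.append("[MEMORIES]\n" + "\n".join(memories))
    if rag_results:
        parts.append("[RETRIEVED]\n" + "\n".join(rag_results))
    parts.append("[HISTORY]\n" + "\n".join(history))
    context = "\n\n".join(parts)
    # Naive budget enforcement: trim the oldest history lines until it fits.
    while len(context) > max_chars and len(history) > 1:
        history = history[1:]
        parts[-1] = "[HISTORY]\n" + "\n".join(history)
        context = "\n\n".join(parts)
    return context

ctx = assemble_context(
    "You are a support agent.",
    ["user: hi", "agent: hello"],
    ["Prefers dark mode"],
    ["Return policy: 30 days"],
)
```

The priority ordering here (instructions first, history last) is one common convention; real systems also count tokens rather than characters.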

```mermaid
flowchart LR
    subgraph Inputs["Context Sources"]
        I[Instructions]
        H[History]
        M[Memory]
        R[RAG]
    end

    subgraph Core["Processing"]
        A[Context Assembly] --> L[LLM]
        L --> O[Response]
    end

    I --> A
    H --> A
    M --> A
    R --> A
    O -.->|Update| H

    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff

    class I,H,M,R blueClass
    class A,L orangeClass
    class O greenClass
```

Sessions: The Conversation Container

A session is the container for a single conversation between a user and an agent. It has two primary components:

  1. Events: The chronological log of messages, tool calls, and results
  2. State: Key-value pairs representing working memory

Sessions are scoped to a single user and a single conversation thread. When the user starts a new conversation, a new session begins. Sessions are typically volatile - they exist only for the duration of the interaction.

```mermaid
flowchart TB
    subgraph Session["Session (Within Conversation)"]
        direction LR
        E1["User Message"] --> E2["Agent Response"] --> E3["Tool Call"] --> E4["Tool Result"]
        S["State: {preference: dark, order_id: 123}"]
    end

    subgraph Memory["Memory (Across Conversations)"]
        direction LR
        M1["Prefers dark mode"] --> M2["Vegetarian"] --> M3["Premium member"]
    end

    Session -.->|"Extract"| Memory

    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff

    class E1,E2,E3,E4,S orangeClass
    class M1,M2,M3 greenClass
```

Events: The Conversation Log

Events are the atomic units of conversation history. Each event captures:

  • Type: User message, agent response, tool call, tool result
  • Content: The actual payload (text, function arguments, return values)
  • Timestamp: When the event occurred
  • Metadata: Additional context (model used, latency, etc.)

Events are append-only. Once recorded, they form an immutable log of the conversation flow. This log becomes the “conversation history” component of the context window.
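A toy append-only log matching the bullets above might look like this (the field names are assumptions, not the ADK event schema):

```python
# Minimal event log sketch: each event records type, content, and timestamp,
# and the log only ever grows by appending.
import time

def make_event(kind: str, content) -> dict:
    """Build one event record; field names are illustrative."""
    return {"type": kind, "content": content, "timestamp": time.time()}

log: list[dict] = []
log.append(make_event("user_message", "Where is my order?"))
log.append(make_event("tool_call", {"name": "check_order", "args": {"id": "ORD-456"}}))
```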

State: Working Memory

State provides a key-value store for working memory within a session. Unlike events, state is mutable - values can be updated as the conversation progresses.

Common uses for session state:

  • User preferences set during the conversation
  • Extracted entities from tool calls (order IDs, product names)
  • Workflow progress for tracking multi-step processes
  • Temporary calculations that inform later responses

State is accessible to tools, allowing them to read context and write results that persist across turns.

Session State Management Patterns

The ADK provides a clean API for session management. Sessions are created via a session store, which handles persistence:

```python
from google.adk.sessions import InMemorySessionStore

session_store = InMemorySessionStore()
session = session_store.create_session(
    app_name="support-agent",
    user_id="user-123",
)
```

Once created, state operates like a dictionary with prefix-based access for tools:

```python
from google.adk.tools import ToolContext

# Write to session state
session.state["user_preference"] = "dark_mode"
session.state["last_order_id"] = "ORD-456"

# Read from session state (available to tools via ToolContext)
preference = session.state.get("user_preference")

# Tools access state via context
def check_order(order_id: str, context: ToolContext) -> dict:
    # Read state set earlier in the conversation
    user_pref = context.state.get("user_preference")
    return {"status": "shipped", "theme": user_pref}
```

Key patterns for state management:

| Pattern | Description |
|---|---|
| Read-through | Check state before external calls to avoid redundant lookups |
| Write-behind | Update state after tool execution for future reference |
| Prefix isolation | Use `tool_name:key` prefixes to avoid collisions |
| TTL handling | Clear stale state after session timeout |
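Read-through, write-behind, and prefix isolation compose naturally. Here is a minimal sketch using a plain dict in place of ADK session state; `fetch_order` stands in for an external call and the key prefix is an assumption:

```python
# Sketch: cache an external lookup in session state under a prefixed key.
CALLS = {"count": 0}  # instrumentation to show the external call is skipped

def fetch_order(order_id: str) -> dict:
    """Stand-in for a slow external API call."""
    CALLS["count"] += 1
    return {"id": order_id, "status": "shipped"}

def get_order_cached(state: dict, order_id: str) -> dict:
    key = f"order_tool:{order_id}"          # prefix isolation
    if key not in state:                    # read-through: check state first
        state[key] = fetch_order(order_id)  # write-behind: persist the result
    return state[key]

state: dict = {}
get_order_cached(state, "ORD-456")
get_order_cached(state, "ORD-456")  # second call is served from state
```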

Compaction Strategies for Efficiency

As conversations grow, the event log can exceed context window limits. Compaction reduces the history size while preserving essential information.

Three primary strategies exist:

```python
def truncate_context(events: list, max_tokens: int) -> list:
    """Keep the newest events that fit within the token limit."""
    total = 0
    result = []
    for event in reversed(events):
        tokens = count_tokens(event)  # token counter assumed available
        if total + tokens > max_tokens:
            break
        result.insert(0, event)
        total += tokens
    return result


def keep_last_n(events: list, n: int) -> list:
    """Keep the N most recent events."""
    return events[-n:] if len(events) > n else events


def recursive_summarize(events: list, llm) -> str:
    """LLM-generated summary preserving key facts."""
    return llm.generate(
        f"Summarize conversation preserving: user preferences, "
        f"decisions made, pending actions:\n{events}"
    )
```

| Strategy | Information Loss | Cost | Latency | Best For |
|---|---|---|---|---|
| Truncation | High (old context lost) | Low | Low | Simple queries |
| Keep-Last-N | Medium | Low | Low | Multi-turn dialogs |
| Recursive Summarization | Low | High (LLM calls) | High | Long sessions |

The whitepaper recommends hybrid approaches: use truncation for initial reduction, then summarization to preserve semantic content from discarded events.
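The hybrid approach can be sketched as follows. `summarize` stands in for an LLM call, and the function names are illustrative:

```python
# Hybrid compaction sketch: keep the last N events verbatim, collapse the
# dropped prefix into a single summary event.

def summarize(events: list[str]) -> str:
    """Stand-in for recursive LLM summarization of dropped events."""
    return "SUMMARY(" + "; ".join(events) + ")"

def hybrid_compact(events: list[str], keep_last: int) -> list[str]:
    if len(events) <= keep_last:
        return events
    dropped, kept = events[:-keep_last], events[-keep_last:]
    return [summarize(dropped)] + kept

compacted = hybrid_compact(["e1", "e2", "e3", "e4", "e5"], keep_last=2)
```

This preserves recency verbatim while retaining semantic content from older turns at the cost of one summarization call per compaction.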

Memory: Persistence Across Conversations

While sessions handle within-conversation context, memory handles cross-conversation persistence. Memory answers: “What should the agent remember about this user after the conversation ends?”

Memory differs from sessions in scope and lifecycle:

| Aspect | Session | Memory |
|---|---|---|
| Scope | Single conversation | All conversations |
| Lifecycle | Ephemeral | Persistent |
| Content | Raw events + state | Extracted facts |
| Update | Append-only events | Consolidated facts |

Declarative vs Procedural Memory

The whitepaper distinguishes two memory types:

Declarative Memory (facts and events):

  • “User prefers dark mode”
  • “User is vegetarian”
  • “User ordered product X on date Y”

Procedural Memory (skills and behaviors):

  • “When user asks about orders, check the last 5 first”
  • “Always greet returning users by name”
  • “Escalate billing issues to human support”

Declarative memory stores what the agent knows. Procedural memory stores how the agent should behave. Both are extracted from conversation history and refined over time.
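One possible record shape that captures the distinction (the fields are assumptions, not a schema from the whitepaper):

```python
# Sketch of a memory record tagged as declarative (what the agent knows)
# or procedural (how the agent should behave).
from dataclasses import dataclass

@dataclass
class Memory:
    kind: str         # "declarative" or "procedural"
    content: str
    confidence: float

memories = [
    Memory("declarative", "User prefers dark mode", 0.9),
    Memory("procedural", "Greet returning users by name", 0.8),
]

# Declarative facts might be injected into context; procedural memories
# might instead be folded into system instructions.
facts = [m for m in memories if m.kind == "declarative"]
```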

The Memory Generation Pipeline

Memory generation follows an ETL pattern: Extract facts from sessions, Transform via consolidation, Load into storage for retrieval.

```mermaid
flowchart LR
    subgraph Extract["1. Extraction"]
        S[Session Events] --> E[LLM Extraction]
    end

    subgraph Transform["2. Consolidation"]
        E --> C["Merge & Dedupe"]
        C --> V["Validate & Score"]
    end

    subgraph Load["3. Storage"]
        V --> DB[("Memory Store")]
        DB --> I[Vector Index]
    end

    subgraph Retrieve["4. Retrieval"]
        Q[Query] --> I
        I --> R[Relevant Memories]
    end

    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    classDef tealClass fill:#16A085,stroke:#333,stroke-width:2px,color:#fff

    class S,Q blueClass
    class E,C,V orangeClass
    class DB,I greenClass
    class R tealClass
```

The ADK provides memory services that abstract this pipeline:

```python
from google.adk.memory import InMemoryMemoryService

memory_service = InMemoryMemoryService()

# Extract memories from a completed session
await memory_service.add_session_to_memory(session)

# Query memories for context assembly
memories = await memory_service.search_memory(
    app_name="support-agent",
    user_id="user-123",
    query="user preferences and past orders",
)
```

Key implementation considerations:

  • Async extraction: Don’t block the conversation. Extract memories in background jobs.
  • Batch consolidation: Merge related facts periodically (nightly or on-demand).
  • Confidence scoring: Track memory reliability. User-stated facts outrank inferred ones.
  • Expiration policies: Old or contradicted memories should decay or be removed.
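Batch consolidation with confidence scoring might look like this minimal sketch (the field names are assumed): deduplicate by normalized content, keeping the higher-confidence copy.

```python
# Consolidation sketch: merge duplicate facts, preferring the copy with the
# higher confidence score.

def consolidate(memories: list[dict]) -> list[dict]:
    best: dict[str, dict] = {}
    for m in memories:
        key = m["content"].lower()  # naive normalization for deduping
        if key not in best or m["confidence"] > best[key]["confidence"]:
            best[key] = m
    return list(best.values())

merged = consolidate([
    {"content": "User is vegetarian", "confidence": 0.4},
    {"content": "user is vegetarian", "confidence": 0.9},
    {"content": "Premium member", "confidence": 0.7},
])
```

A production pipeline would use semantic similarity rather than lowercase string matching to detect near-duplicates.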

Memory-as-a-Tool Pattern

Rather than automatically injecting memories into context, the Memory-as-a-Tool pattern gives the agent explicit control. The agent decides when to remember and when to recall.

```python
from google.adk.tools.memory import LoadMemoryTool, SaveMemoryTool

load_memory_tool = LoadMemoryTool(
    memory_service=memory_service,
    description="Retrieve user preferences and past interactions",
)

save_memory_tool = SaveMemoryTool(
    memory_service=memory_service,
    description="Store important user information for future use",
)

agent = Agent(tools=[load_memory_tool, save_memory_tool])
```

Benefits of this pattern:

| Benefit | Description |
|---|---|
| Explicit reasoning | Agent must decide what’s worth remembering |
| Token efficiency | Only load memories when needed |
| Auditability | Memory operations appear in the event log |
| User control | Easy to add confirmation for sensitive memories |

The trade-off is increased complexity: the agent must learn when to use memory tools, which requires good prompting or fine-tuning.

Multi-Agent Session Architectures

When multiple agents collaborate on a task, session management becomes more complex. The whitepaper identifies two primary patterns:

```mermaid
flowchart TB
    subgraph Shared["Shared Unified History"]
        direction LR
        A1[Research Agent] --> H[Shared Session]
        A2[Writing Agent] --> H
        A3[Review Agent] --> H
    end

    subgraph Separate["Separate Individual Histories"]
        direction LR
        B1[Agent A] --> HA[Session A]
        B2[Agent B] --> HB[Session B]
        B3[Agent C] --> HC[Session C]
    end

    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff

    class A1,A2,A3,H greenClass
    class B1,B2,B3,HA,HB,HC orangeClass
```

Shared Unified History: All agents read from and write to a single session. Best for tightly coordinated workflows where each agent needs full context of what others have done. Example: a research-write-review pipeline where the reviewer needs to see both research findings and draft content.

Separate Individual Histories: Each agent maintains its own session. Agents communicate through explicit message passing. Best for privacy-sensitive scenarios or loosely coupled workflows. Example: a customer service escalation where the specialist agent shouldn’t see the full history of failed resolution attempts.

The A2A (Agent-to-Agent) protocol provides a standard for inter-agent communication. At a high level, it defines message formats and handshake sequences that allow agents to exchange context without sharing full session histories. This enables hybrid approaches where agents share summaries rather than raw events.
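A hybrid handoff can be sketched generically (this is not the A2A wire format; the function and the `internal:` redaction convention are assumptions): share a summary rather than raw events, and redact internal notes.

```python
# Sketch: build a handoff message from session events without exposing the
# full history to the receiving agent.

def summarize_for_handoff(events: list[str], redact_prefix: str = "internal:") -> str:
    """Share a summary, not the raw event log; drop internal notes."""
    visible = [e for e in events if not e.startswith(redact_prefix)]
    return "Handoff summary: " + " | ".join(visible)

msg = summarize_for_handoff(
    ["user: refund please", "internal: 3 failed attempts", "agent: escalating"]
)
```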

RAG vs Memory: Complementary Roles

A common question: “Should I use RAG or Memory for personalization?” The answer: both, for different purposes.

```mermaid
flowchart TB
    subgraph RAG["RAG: Expert on Facts"]
        direction LR
        D[Document Corpus] --> S1[Semantic Search] --> F[Domain Facts]
    end

    subgraph Mem["Memory: Expert on User"]
        direction LR
        U[User History] --> L[LLM Extraction] --> P["Preferences & Behaviors"]
    end

    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef purpleClass fill:#9B59B6,stroke:#333,stroke-width:2px,color:#fff

    class D,S1,F blueClass
    class U,L,P purpleClass
```

| Aspect | RAG | Memory |
|---|---|---|
| Source | External documents, knowledge bases | User interaction history |
| Focus | Factual domain knowledge | Behavioral patterns, preferences |
| Retrieval | Semantic similarity to query | User-specific index |
| Update Frequency | When documents are ingested | After each conversation |
| Scope | Shared across all users | Per-user |
| Use Case | “What is the return policy?” | “Does this user prefer email or chat?” |

RAG answers domain questions. Memory personalizes the answers. A support agent uses RAG to find the return policy, then uses Memory to know this user prefers concise bullet points over paragraphs.
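Illustrative glue code for that division of labor (`build_prompt` and its inputs are hypothetical): a RAG fact shared across users and per-user memories are combined into one prompt.

```python
# Sketch: combine a retrieved domain fact with user-specific memories.

def build_prompt(question: str, rag_fact: str, user_memories: list[str]) -> str:
    return (
        f"Domain fact: {rag_fact}\n"
        f"User preferences: {'; '.join(user_memories)}\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "What is the return policy?",
    "Returns are accepted within 30 days.",
    ["Prefers concise bullet points"],
)
```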

Memory Provenance and Trust

Not all memories are equally reliable. The whitepaper introduces provenance tracking: recording where each memory came from and how confident we should be in it.

Trust hierarchy (highest to lowest):

  1. User-stated facts: “I’m vegetarian” - explicitly declared by user
  2. Observed behaviors: User consistently orders vegetarian options - inferred from patterns
  3. Single observations: User ordered a salad once - weak signal
  4. Inferred preferences: Assumed vegetarian because user mentioned health concerns - guesswork

When memories conflict, higher-trust sources override lower-trust ones. If a user explicitly says “I eat meat now,” that overrides observed vegetarian ordering patterns.

Confidence scores quantify reliability. A memory with 0.9 confidence might be included in context, while 0.3 confidence memories might be held back unless specifically relevant.
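Conflict resolution over the trust hierarchy can be sketched as follows (the rank table mirrors the list above; the code itself is an assumption, not the whitepaper's implementation):

```python
# Sketch: resolve conflicting memories by provenance; lower rank = higher trust.

TRUST_RANK = {
    "user_stated": 0,
    "observed": 1,
    "single_observation": 2,
    "inferred": 3,
}

def resolve(memories: list[dict]) -> dict:
    """Pick the memory whose source ranks highest in the trust hierarchy."""
    return min(memories, key=lambda m: TRUST_RANK[m["source"]])

winner = resolve([
    {"fact": "vegetarian", "source": "observed"},
    {"fact": "eats meat now", "source": "user_stated"},
])
```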

Framework Comparison: ADK vs LangGraph

Both Google’s ADK and LangGraph provide session and memory abstractions, but with different philosophies:

| Feature | Google ADK | LangGraph |
|---|---|---|
| Session Store | `InMemorySessionStore`, `DatabaseSessionStore` | Checkpointer (memory, SQLite, PostgreSQL) |
| Memory Service | `InMemoryMemoryService`, `VertexAIRagMemoryService` | `MemorySaver`, custom stores |
| State Access | `session.state["key"]` | `config["configurable"]["thread_id"]` |
| Memory-as-Tool | Built-in `LoadMemoryTool`, `SaveMemoryTool` | Custom tool functions |
| Multi-Agent | A2A protocol support | LangGraph Studio orchestration |
| Persistence | In-memory, Cloud SQL, Firestore | SQLite, PostgreSQL, Redis |

Choose ADK when building on Google Cloud with Vertex AI integration, or when A2A interoperability is important.

Choose LangGraph when you need flexible graph-based workflows, tight integration with LangChain ecosystem, or prefer Python-native state management.

Both frameworks are evolving rapidly. The concepts (sessions, state, memory extraction) transfer between them even as APIs differ.

Production Considerations

Deploying session and memory systems requires attention to security, privacy, and operations:

| Area | Requirement | Implementation |
|---|---|---|
| Access Control | Who can read/write sessions? | Role-based access, API keys per agent |
| Encryption | Data at rest and in transit | TLS 1.3, AES-256 for storage |
| Audit Trails | Log sensitive operations | Cloud Logging, structured events |
| User Consent | Explicit opt-in for memory | Consent UI before memory extraction |
| Data Minimization | Extract only necessary info | Scoped extraction prompts |
| Retention Policies | Auto-delete old data | TTL on sessions, user-triggered deletion |

Performance considerations:

  • Run memory extraction asynchronously to avoid blocking conversations
  • Cache frequently-accessed memories at the edge
  • Batch consolidation jobs during off-peak hours
  • Monitor session size growth and memory retrieval hit rates
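Asynchronous extraction can be sketched with `asyncio` (names are illustrative; a production system would track the background task rather than awaiting it inline as this demo does):

```python
# Sketch: reply immediately while memory extraction runs as a background task.
import asyncio

EXTRACTED: list[str] = []

async def extract_memories(session_events: list[str]) -> None:
    await asyncio.sleep(0)  # stand-in for a slow LLM extraction call
    EXTRACTED.extend(e for e in session_events if e.startswith("user:"))

async def handle_turn(events: list[str]) -> str:
    task = asyncio.create_task(extract_memories(events))  # fire and forget
    reply = "agent: here is your answer"                  # respond immediately
    await task  # awaited here only so the demo finishes deterministically
    return reply

reply = asyncio.run(handle_turn(["user: I prefer dark mode", "agent: noted"]))
```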

Connecting to the Agentic AI Series

This whitepaper extends concepts from our broader exploration of agentic AI:

| Related Post | Connection |
|---|---|
| Anatomy of an AI Agent | Context engineering feeds the cognitive architecture |
| Introduction to Agents | Sessions implement the agent’s working memory module |
| Agent Tools & MCP | Memory-as-a-Tool extends the tool taxonomy |

Sessions are the agent’s working memory during a task. Memory is its long-term storage across tasks. Together, they enable the continuity that transforms stateless LLMs into persistent, personalized agents.

Key Takeaways

  1. Context Engineering is the Cognitive Assembly Line - Dynamically assembling instructions, history, tools, memories, and RAG reduces hallucinations and enables grounded, personalized responses.

  2. Sessions Enable Conversation Continuity - Events track the immutable conversation log; State provides mutable working memory for tools and follow-up queries.

  3. Memory Extends Beyond Sessions - An LLM-driven ETL pipeline extracts, consolidates, and retrieves facts across conversations for long-term personalization.

  4. Compaction Trades Information for Efficiency - Choose truncation (fast, lossy), keep-last-N (balanced), or recursive summarization (expensive, semantic preservation).

  5. Memory-as-a-Tool Gives Agents Autonomy - Explicit tools let agents reason about what to remember and when to recall, improving auditability and user control.

  6. RAG and Memory are Complementary - RAG provides domain expertise from documents; Memory provides user expertise from interactions. Use both.

  7. Production Requires Security and Privacy First - Access control, encryption, consent management, and retention policies are essential before deployment.
