Summary: Google's Context Engineering - Sessions & Memory

This is the third installment in our Agentic AI series, following Google’s Introduction to Agents and Agent Tools & MCP. While those papers covered agent architecture and tool integration, this one focuses on how agents manage context across conversations through sessions and memory.

Source: Context Engineering: Sessions & Memory (PDF) by Kimberly Milam and Antonio Gulli, Google (November 2025)

Why Context Engineering Matters

Without proper context, LLMs suffer from the “goldfish problem”: each request is processed in isolation without memory of past interactions. Context engineering solves this by dynamically assembling the information an LLM needs at request time.

The context window receives six types of information:

| Component | Purpose |
|---|---|
| System Instructions | Define agent behavior, constraints, persona |
| Conversation History | Prior messages in the current session |
| Tool Definitions | Available functions the agent can call |
| Memories | Long-term facts about the user across sessions |
| RAG Results | Retrieved documents relevant to the query |
| Output Structure | Response format constraints (JSON schema, etc.) |

Why does this matter? A well-engineered context reduces hallucinations, improves factual grounding, and enables personalized responses. The challenge is fitting all relevant information within token limits while maintaining coherence.
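To make the assembly step concrete, here is a minimal sketch. The function name, section markers, and the character-based budget are illustrative assumptions, not an API from the whitepaper:

```python
# Hypothetical context assembler: concatenates the sources above in priority
# order and drops the oldest history lines when over budget (assumed policy).

def assemble_context(instructions: str, history: list[str], memories: list[str],
                     rag_results: list[str], max_chars: int = 4000) -> str:
    """Build the context window text from its component sources."""
    parts = [f"[SYSTEM]\n{instructions}"]
    if memories:
        parts.append("[MEMORIES]\n" + "\n".join(memories))
    if rag_results:
        parts.append("[RETRIEVED]\n" + "\n".join(rag_results))
    parts.append("[HISTORY]\n" + "\n".join(history))
    context = "\n\n".join(parts)
    # Naive budget enforcement: trim the oldest history lines until it fits.
    while len(context) > max_chars and len(history) > 1:
        history = history[1:]
        parts[-1] = "[HISTORY]\n" + "\n".join(history)
        context = "\n\n".join(parts)
    return context

ctx = assemble_context(
    "You are a support agent.",
    ["user: hi", "agent: hello"],
    ["Prefers dark mode"],
    ["Return policy: 30 days"],
)
```

The priority ordering here (instructions first, history last) is one common convention; real systems also count tokens rather than characters.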

```mermaid
flowchart LR
    subgraph Inputs["Context Sources"]
        I[Instructions]
        H[History]
        M[Memory]
        R[RAG]
    end

    subgraph Core["Processing"]
        A[Context Assembly] --> L[LLM]
        L --> O[Response]
    end

    I --> A
    H --> A
    M --> A
    R --> A
    O -.->|Update| H

    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff

    class I,H,M,R blueClass
    class A,L orangeClass
    class O greenClass
```

Sessions: The Conversation Container

A session is the container for a single conversation between a user and an agent. It has two primary components:

  1. Events: The chronological log of messages, tool calls, and results
  2. State: Key-value pairs representing working memory

Sessions are scoped to a single user and a single conversation thread. When the user starts a new conversation, a new session begins. Sessions are typically volatile - they exist only for the duration of the interaction.

```mermaid
flowchart TB
    subgraph Session["Session (Within Conversation)"]
        direction LR
        E1["User Message"] --> E2["Agent Response"] --> E3["Tool Call"] --> E4["Tool Result"]
        S["State: {preference: dark, order_id: 123}"]
    end

    subgraph Memory["Memory (Across Conversations)"]
        direction LR
        M1["Prefers dark mode"] --> M2["Vegetarian"] --> M3["Premium member"]
    end

    Session -.->|"Extract"| Memory

    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff

    class E1,E2,E3,E4,S orangeClass
    class M1,M2,M3 greenClass
```

Events: The Conversation Log

Events are the atomic units of conversation history. Each event captures:

  • Type: User message, agent response, tool call, tool result
  • Content: The actual payload (text, function arguments, return values)
  • Timestamp: When the event occurred
  • Metadata: Additional context (model used, latency, etc.)

Events are append-only. Once recorded, they form an immutable log of the conversation flow. This log becomes the “conversation history” component of the context window.
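A toy append-only log matching the bullets above might look like this (the field names are assumptions, not the ADK event schema):

```python
# Minimal event log sketch: each event records type, content, and timestamp,
# and the log only ever grows by appending.
import time

def make_event(kind: str, content) -> dict:
    """Build one event record; field names are illustrative."""
    return {"type": kind, "content": content, "timestamp": time.time()}

log: list[dict] = []
log.append(make_event("user_message", "Where is my order?"))
log.append(make_event("tool_call", {"name": "check_order", "args": {"id": "ORD-456"}}))
```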

State: Working Memory

State provides a key-value store for working memory within a session. Unlike events, state is mutable - values can be updated as the conversation progresses.

Common uses for session state:

  • User preferences set during the conversation
  • Extracted entities from tool calls (order IDs, product names)
  • Workflow progress for tracking multi-step processes
  • Temporary calculations that inform later responses

State is accessible to tools, allowing them to read context and write results that persist across turns.

Session State Management Patterns

The ADK provides a clean API for session management. Sessions are created via a session store, which handles persistence:

```python
from google.adk.sessions import InMemorySessionStore

session_store = InMemorySessionStore()
session = session_store.create_session(
    app_name="support-agent",
    user_id="user-123",
)
```

Once created, state operates like a dictionary with prefix-based access for tools:

```python
from google.adk.tools import ToolContext

# Write to session state
session.state["user_preference"] = "dark_mode"
session.state["last_order_id"] = "ORD-456"

# Read from session state (available to tools via ToolContext)
preference = session.state.get("user_preference")

# Tools access state via context
def check_order(order_id: str, context: ToolContext) -> dict:
    # Read state set earlier in the conversation
    user_pref = context.state.get("user_preference")
    return {"status": "shipped", "theme": user_pref}
```

Key patterns for state management:

| Pattern | Description |
|---|---|
| Read-through | Check state before external calls to avoid redundant lookups |
| Write-behind | Update state after tool execution for future reference |
| Prefix isolation | Use `tool_name:key` prefixes to avoid collisions |
| TTL handling | Clear stale state after session timeout |
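Read-through, write-behind, and prefix isolation compose naturally. Here is a minimal sketch using a plain dict in place of ADK session state; `fetch_order` stands in for an external call and the key prefix is an assumption:

```python
# Sketch: cache an external lookup in session state under a prefixed key.
CALLS = {"count": 0}  # instrumentation to show the external call is skipped

def fetch_order(order_id: str) -> dict:
    """Stand-in for a slow external API call."""
    CALLS["count"] += 1
    return {"id": order_id, "status": "shipped"}

def get_order_cached(state: dict, order_id: str) -> dict:
    key = f"order_tool:{order_id}"          # prefix isolation
    if key not in state:                    # read-through: check state first
        state[key] = fetch_order(order_id)  # write-behind: persist the result
    return state[key]

state: dict = {}
get_order_cached(state, "ORD-456")
get_order_cached(state, "ORD-456")  # second call is served from state
```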

Compaction Strategies for Efficiency

As conversations grow, the event log can exceed context window limits. Compaction reduces the history size while preserving essential information.

Three primary strategies exist:

```python
def truncate_context(events: list, max_tokens: int) -> list:
    """Keep the newest events that fit within the token limit."""
    total = 0
    result = []
    for event in reversed(events):
        tokens = count_tokens(event)  # token counter assumed available
        if total + tokens > max_tokens:
            break
        result.insert(0, event)
        total += tokens
    return result


def keep_last_n(events: list, n: int) -> list:
    """Keep the N most recent events."""
    return events[-n:] if len(events) > n else events


def recursive_summarize(events: list, llm) -> str:
    """LLM-generated summary preserving key facts."""
    return llm.generate(
        f"Summarize conversation preserving: user preferences, "
        f"decisions made, pending actions:\n{events}"
    )
```

| Strategy | Information Loss | Cost | Latency | Best For |
|---|---|---|---|---|
| Truncation | High (old context lost) | Low | Low | Simple queries |
| Keep-Last-N | Medium | Low | Low | Multi-turn dialogs |
| Recursive Summarization | Low | High (LLM calls) | High | Long sessions |

The whitepaper recommends hybrid approaches: use truncation for initial reduction, then summarization to preserve semantic content from discarded events.
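The hybrid approach can be sketched as follows. `summarize` stands in for an LLM call, and the function names are illustrative:

```python
# Hybrid compaction sketch: keep the last N events verbatim, collapse the
# dropped prefix into a single summary event.

def summarize(events: list[str]) -> str:
    """Stand-in for recursive LLM summarization of dropped events."""
    return "SUMMARY(" + "; ".join(events) + ")"

def hybrid_compact(events: list[str], keep_last: int) -> list[str]:
    if len(events) <= keep_last:
        return events
    dropped, kept = events[:-keep_last], events[-keep_last:]
    return [summarize(dropped)] + kept

compacted = hybrid_compact(["e1", "e2", "e3", "e4", "e5"], keep_last=2)
```

This preserves recency verbatim while retaining semantic content from older turns at the cost of one summarization call per compaction.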

Memory: Persistence Across Conversations

While sessions handle within-conversation context, memory handles cross-conversation persistence. Memory answers: “What should the agent remember about this user after the conversation ends?”

Memory differs from sessions in scope and lifecycle:

| Aspect | Session | Memory |
|---|---|---|
| Scope | Single conversation | All conversations |
| Lifecycle | Ephemeral | Persistent |
| Content | Raw events + state | Extracted facts |
| Update | Append-only events | Consolidated facts |

Declarative vs Procedural Memory

The whitepaper distinguishes two memory types:

Declarative Memory (facts and events):

  • “User prefers dark mode”
  • “User is vegetarian”
  • “User ordered product X on date Y”

Procedural Memory (skills and behaviors):

  • “When user asks about orders, check the last 5 first”
  • “Always greet returning users by name”
  • “Escalate billing issues to human support”

Declarative memory stores what the agent knows. Procedural memory stores how the agent should behave. Both are extracted from conversation history and refined over time.
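One possible record shape that captures the distinction (the fields are assumptions, not a schema from the whitepaper):

```python
# Sketch of a memory record tagged as declarative (what the agent knows)
# or procedural (how the agent should behave).
from dataclasses import dataclass

@dataclass
class Memory:
    kind: str         # "declarative" or "procedural"
    content: str
    confidence: float

memories = [
    Memory("declarative", "User prefers dark mode", 0.9),
    Memory("procedural", "Greet returning users by name", 0.8),
]

# Declarative facts might be injected into context; procedural memories
# might instead be folded into system instructions.
facts = [m for m in memories if m.kind == "declarative"]
```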

The Memory Generation Pipeline

Memory generation follows an ETL pattern: Extract facts from sessions, Transform via consolidation, Load into storage for retrieval.

```mermaid
flowchart LR
    subgraph Extract["1. Extraction"]
        S[Session Events] --> E[LLM Extraction]
    end

    subgraph Transform["2. Consolidation"]
        E --> C["Merge & Dedupe"]
        C --> V["Validate & Score"]
    end

    subgraph Load["3. Storage"]
        V --> DB[("Memory Store")]
        DB --> I[Vector Index]
    end

    subgraph Retrieve["4. Retrieval"]
        Q[Query] --> I
        I --> R[Relevant Memories]
    end

    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    classDef tealClass fill:#16A085,stroke:#333,stroke-width:2px,color:#fff

    class S,Q blueClass
    class E,C,V orangeClass
    class DB,I greenClass
    class R tealClass
```

The ADK provides memory services that abstract this pipeline:

```python
from google.adk.memory import InMemoryMemoryService

memory_service = InMemoryMemoryService()

# Extract memories from a completed session
await memory_service.add_session_to_memory(session)

# Query memories for context assembly
memories = await memory_service.search_memory(
    app_name="support-agent",
    user_id="user-123",
    query="user preferences and past orders",
)
```

Key implementation considerations:

  • Async extraction: Don’t block the conversation. Extract memories in background jobs.
  • Batch consolidation: Merge related facts periodically (nightly or on-demand).
  • Confidence scoring: Track memory reliability. User-stated facts outrank inferred ones.
  • Expiration policies: Old or contradicted memories should decay or be removed.
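Batch consolidation with confidence scoring might look like this minimal sketch (the field names are assumed): deduplicate by normalized content, keeping the higher-confidence copy.

```python
# Consolidation sketch: merge duplicate facts, preferring the copy with the
# higher confidence score.

def consolidate(memories: list[dict]) -> list[dict]:
    best: dict[str, dict] = {}
    for m in memories:
        key = m["content"].lower()  # naive normalization for deduping
        if key not in best or m["confidence"] > best[key]["confidence"]:
            best[key] = m
    return list(best.values())

merged = consolidate([
    {"content": "User is vegetarian", "confidence": 0.4},
    {"content": "user is vegetarian", "confidence": 0.9},
    {"content": "Premium member", "confidence": 0.7},
])
```

A production pipeline would use semantic similarity rather than lowercase string matching to detect near-duplicates.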

Memory-as-a-Tool Pattern

Rather than automatically injecting memories into context, the Memory-as-a-Tool pattern gives the agent explicit control. The agent decides when to remember and when to recall.

```python
from google.adk.tools.memory import LoadMemoryTool, SaveMemoryTool

load_memory_tool = LoadMemoryTool(
    memory_service=memory_service,
    description="Retrieve user preferences and past interactions",
)

save_memory_tool = SaveMemoryTool(
    memory_service=memory_service,
    description="Store important user information for future use",
)

agent = Agent(tools=[load_memory_tool, save_memory_tool])
```

Benefits of this pattern:

| Benefit | Description |
|---|---|
| Explicit reasoning | Agent must decide what’s worth remembering |
| Token efficiency | Only load memories when needed |
| Auditability | Memory operations appear in the event log |
| User control | Easy to add confirmation for sensitive memories |

The trade-off is increased complexity: the agent must learn when to use memory tools, which requires good prompting or fine-tuning.

Multi-Agent Session Architectures

When multiple agents collaborate on a task, session management becomes more complex. The whitepaper identifies two primary patterns:

```mermaid
flowchart TB
    subgraph Shared["Shared Unified History"]
        direction LR
        A1[Research Agent] --> H[Shared Session]
        A2[Writing Agent] --> H
        A3[Review Agent] --> H
    end

    subgraph Separate["Separate Individual Histories"]
        direction LR
        B1[Agent A] --> HA[Session A]
        B2[Agent B] --> HB[Session B]
        B3[Agent C] --> HC[Session C]
    end

    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff

    class A1,A2,A3,H greenClass
    class B1,B2,B3,HA,HB,HC orangeClass
```

Shared Unified History: All agents read from and write to a single session. Best for tightly coordinated workflows where each agent needs full context of what others have done. Example: a research-write-review pipeline where the reviewer needs to see both research findings and draft content.

Separate Individual Histories: Each agent maintains its own session. Agents communicate through explicit message passing. Best for privacy-sensitive scenarios or loosely coupled workflows. Example: a customer service escalation where the specialist agent shouldn’t see the full history of failed resolution attempts.

The A2A (Agent-to-Agent) protocol provides a standard for inter-agent communication. At a high level, it defines message formats and handshake sequences that allow agents to exchange context without sharing full session histories. This enables hybrid approaches where agents share summaries rather than raw events.
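A hybrid handoff can be sketched generically (this is not the A2A wire format; the function and the `internal:` redaction convention are assumptions): share a summary rather than raw events, and redact internal notes.

```python
# Sketch: build a handoff message from session events without exposing the
# full history to the receiving agent.

def summarize_for_handoff(events: list[str], redact_prefix: str = "internal:") -> str:
    """Share a summary, not the raw event log; drop internal notes."""
    visible = [e for e in events if not e.startswith(redact_prefix)]
    return "Handoff summary: " + " | ".join(visible)

msg = summarize_for_handoff(
    ["user: refund please", "internal: 3 failed attempts", "agent: escalating"]
)
```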

RAG vs Memory: Complementary Roles

A common question: “Should I use RAG or Memory for personalization?” The answer: both, for different purposes.

```mermaid
flowchart TB
    subgraph RAG["RAG: Expert on Facts"]
        direction LR
        D[Document Corpus] --> S1[Semantic Search] --> F[Domain Facts]
    end

    subgraph Mem["Memory: Expert on User"]
        direction LR
        U[User History] --> L[LLM Extraction] --> P["Preferences & Behaviors"]
    end

    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef purpleClass fill:#9B59B6,stroke:#333,stroke-width:2px,color:#fff

    class D,S1,F blueClass
    class U,L,P purpleClass
```

| Aspect | RAG | Memory |
|---|---|---|
| Source | External documents, knowledge bases | User interaction history |
| Focus | Factual domain knowledge | Behavioral patterns, preferences |
| Retrieval | Semantic similarity to query | User-specific index |
| Update Frequency | When documents are ingested | After each conversation |
| Scope | Shared across all users | Per-user |
| Use Case | “What is the return policy?” | “Does this user prefer email or chat?” |

RAG answers domain questions. Memory personalizes the answers. A support agent uses RAG to find the return policy, then uses Memory to know this user prefers concise bullet points over paragraphs.
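Illustrative glue code for that division of labor (`build_prompt` and its inputs are hypothetical): a RAG fact shared across users and per-user memories are combined into one prompt.

```python
# Sketch: combine a retrieved domain fact with user-specific memories.

def build_prompt(question: str, rag_fact: str, user_memories: list[str]) -> str:
    return (
        f"Domain fact: {rag_fact}\n"
        f"User preferences: {'; '.join(user_memories)}\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "What is the return policy?",
    "Returns are accepted within 30 days.",
    ["Prefers concise bullet points"],
)
```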

Memory Provenance and Trust

Not all memories are equally reliable. The whitepaper introduces provenance tracking: recording where each memory came from and how confident we should be in it.

Trust hierarchy (highest to lowest):

  1. User-stated facts: “I’m vegetarian” - explicitly declared by user
  2. Observed behaviors: User consistently orders vegetarian options - inferred from patterns
  3. Single observations: User ordered a salad once - weak signal
  4. Inferred preferences: Assumed vegetarian because user mentioned health concerns - guesswork

When memories conflict, higher-trust sources override lower-trust ones. If a user explicitly says “I eat meat now,” that overrides observed vegetarian ordering patterns.

Confidence scores quantify reliability. A memory with 0.9 confidence might be included in context, while 0.3 confidence memories might be held back unless specifically relevant.
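Conflict resolution over the trust hierarchy can be sketched as follows (the rank table mirrors the list above; the code itself is an assumption, not the whitepaper's implementation):

```python
# Sketch: resolve conflicting memories by provenance; lower rank = higher trust.

TRUST_RANK = {
    "user_stated": 0,
    "observed": 1,
    "single_observation": 2,
    "inferred": 3,
}

def resolve(memories: list[dict]) -> dict:
    """Pick the memory whose source ranks highest in the trust hierarchy."""
    return min(memories, key=lambda m: TRUST_RANK[m["source"]])

winner = resolve([
    {"fact": "vegetarian", "source": "observed"},
    {"fact": "eats meat now", "source": "user_stated"},
])
```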

Framework Comparison: ADK vs LangGraph

Both Google’s ADK and LangGraph provide session and memory abstractions, but with different philosophies:

| Feature | Google ADK | LangGraph |
|---|---|---|
| Session Store | `InMemorySessionStore`, `DatabaseSessionStore` | Checkpointer (memory, SQLite, PostgreSQL) |
| Memory Service | `InMemoryMemoryService`, `VertexAIRagMemoryService` | `MemorySaver`, custom stores |
| State Access | `session.state["key"]` | `config["configurable"]["thread_id"]` |
| Memory-as-Tool | Built-in `LoadMemoryTool`, `SaveMemoryTool` | Custom tool functions |
| Multi-Agent | A2A protocol support | LangGraph Studio orchestration |
| Persistence | In-memory, Cloud SQL, Firestore | SQLite, PostgreSQL, Redis |

Choose ADK when building on Google Cloud with Vertex AI integration, or when A2A interoperability is important.

Choose LangGraph when you need flexible graph-based workflows, tight integration with LangChain ecosystem, or prefer Python-native state management.

Both frameworks are evolving rapidly. The concepts (sessions, state, memory extraction) transfer between them even as APIs differ.

Production Considerations

Deploying session and memory systems requires attention to security, privacy, and operations:

| Area | Requirement | Implementation |
|---|---|---|
| Access Control | Who can read/write sessions? | Role-based access, API keys per agent |
| Encryption | Data at rest and in transit | TLS 1.3, AES-256 for storage |
| Audit Trails | Log sensitive operations | Cloud Logging, structured events |
| User Consent | Explicit opt-in for memory | Consent UI before memory extraction |
| Data Minimization | Extract only necessary info | Scoped extraction prompts |
| Retention Policies | Auto-delete old data | TTL on sessions, user-triggered deletion |

Performance considerations:

  • Run memory extraction asynchronously to avoid blocking conversations
  • Cache frequently-accessed memories at the edge
  • Batch consolidation jobs during off-peak hours
  • Monitor session size growth and memory retrieval hit rates
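Asynchronous extraction can be sketched with `asyncio` (names are illustrative; a production system would track the background task rather than awaiting it inline as this demo does):

```python
# Sketch: reply immediately while memory extraction runs as a background task.
import asyncio

EXTRACTED: list[str] = []

async def extract_memories(session_events: list[str]) -> None:
    await asyncio.sleep(0)  # stand-in for a slow LLM extraction call
    EXTRACTED.extend(e for e in session_events if e.startswith("user:"))

async def handle_turn(events: list[str]) -> str:
    task = asyncio.create_task(extract_memories(events))  # fire and forget
    reply = "agent: here is your answer"                  # respond immediately
    await task  # awaited here only so the demo finishes deterministically
    return reply

reply = asyncio.run(handle_turn(["user: I prefer dark mode", "agent: noted"]))
```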

Connecting to the Agentic AI Series

This whitepaper extends concepts from our broader exploration of agentic AI:

| Related Post | Connection |
|---|---|
| Anatomy of an AI Agent | Context engineering feeds the cognitive architecture |
| Introduction to Agents | Sessions implement the agent’s working memory module |
| Agent Tools & MCP | Memory-as-a-Tool extends the tool taxonomy |

Sessions are the agent’s working memory during a task. Memory is its long-term storage across tasks. Together, they enable the continuity that transforms stateless LLMs into persistent, personalized agents.

Key Takeaways

  1. Context Engineering is the Cognitive Assembly Line - Dynamically assembling instructions, history, tools, memories, and RAG reduces hallucinations and enables grounded, personalized responses.

  2. Sessions Enable Conversation Continuity - Events track the immutable conversation log; State provides mutable working memory for tools and follow-up queries.

  3. Memory Extends Beyond Sessions - An LLM-driven ETL pipeline extracts, consolidates, and retrieves facts across conversations for long-term personalization.

  4. Compaction Trades Information for Efficiency - Choose truncation (fast, lossy), keep-last-N (balanced), or recursive summarization (expensive, semantic preservation).

  5. Memory-as-a-Tool Gives Agents Autonomy - Explicit tools let agents reason about what to remember and when to recall, improving auditability and user control.

  6. RAG and Memory are Complementary - RAG provides domain expertise from documents; Memory provides user expertise from interactions. Use both.

  7. Production Requires Security and Privacy First - Access control, encryption, consent management, and retention policies are essential before deployment.
