This is the third installment in our Agentic AI series, following Google’s Introduction to Agents and Agent Tools & MCP. While those papers covered agent architecture and tool integration, this one focuses on how agents manage context across conversations through sessions and memory.
Source: Context Engineering: Sessions & Memory (PDF) by Kimberly Milam and Antonio Gulli, Google (November 2025)
Why Context Engineering Matters
Without proper context, LLMs suffer from the “goldfish problem”: each request is processed in isolation without memory of past interactions. Context engineering solves this by dynamically assembling the information an LLM needs at request time.
The context window receives six types of information:
| Component | Purpose |
|---|---|
| System Instructions | Define agent behavior, constraints, persona |
| Conversation History | Prior messages in the current session |
| Tool Definitions | Available functions the agent can call |
| Memories | Long-term facts about the user across sessions |
| RAG Results | Retrieved documents relevant to the query |
| Output Structure | Response format constraints (JSON schema, etc.) |
Why does this matter? A well-engineered context reduces hallucinations, improves factual grounding, and enables personalized responses. The challenge is fitting all relevant information within token limits while maintaining coherence.
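The assembly step can be sketched as a priority-ordered packer: components are added in order of importance, and anything that no longer fits the budget is dropped. A minimal illustration (character counts stand in for token counts; the function and component names are ours, not any framework's):

```python
def assemble_context(components: dict, priority: list, max_chars: int) -> str:
    """Concatenate context components in priority order until the budget is spent."""
    parts = []
    used = 0
    for name in priority:
        text = components.get(name, "")
        if used + len(text) > max_chars:
            continue  # skip components that no longer fit the budget
        parts.append(f"## {name}\n{text}")
        used += len(text)
    return "\n\n".join(parts)
```

Putting system instructions and memories ahead of raw history in the priority list means a long conversation degrades gracefully: old turns fall out before the agent's persona or the user's preferences do.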
```mermaid
flowchart LR
    subgraph Inputs["Context Sources"]
        I[Instructions]
        H[History]
        M[Memory]
        R[RAG]
    end
    subgraph Core["Processing"]
        A[Context Assembly] --> L[LLM]
        L --> O[Response]
    end
    I --> A
    H --> A
    M --> A
    R --> A
    O -.->|Update| H
    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    class I,H,M,R blueClass
    class A,L orangeClass
    class O greenClass
```
Sessions: The Conversation Container
A session is the container for a single conversation between a user and an agent. It has two primary components:
- Events: The chronological log of messages, tool calls, and results
- State: Key-value pairs representing working memory
Sessions are scoped to a single user and a single conversation thread. When the user starts a new conversation, a new session begins. Sessions are typically volatile - they exist only for the duration of the interaction.
```mermaid
flowchart TB
    subgraph Session["Session (Within Conversation)"]
        direction LR
        E1["User Message"] --> E2["Agent Response"] --> E3["Tool Call"] --> E4["Tool Result"]
        S["State: {preference: dark, order_id: 123}"]
    end
    subgraph Memory["Memory (Across Conversations)"]
        direction LR
        M1["Prefers dark mode"] --> M2["Vegetarian"] --> M3["Premium member"]
    end
    Session -.->|"Extract"| Memory
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    class E1,E2,E3,E4,S orangeClass
    class M1,M2,M3 greenClass
```
Events: The Conversation Log
Events are the atomic units of conversation history. Each event captures:
- Type: User message, agent response, tool call, tool result
- Content: The actual payload (text, function arguments, return values)
- Timestamp: When the event occurred
- Metadata: Additional context (model used, latency, etc.)
Events are append-only. Once recorded, they form an immutable log of the conversation flow. This log becomes the “conversation history” component of the context window.
State: Working Memory
State provides a key-value store for working memory within a session. Unlike events, state is mutable - values can be updated as the conversation progresses.
Common uses for session state:
- User preferences set during the conversation
- Extracted entities from tool calls (order IDs, product names)
- Workflow progress for multi-step processes
- Temporary calculations that inform later responses
State is accessible to tools, allowing them to read context and write results that persist across turns.
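A minimal sketch of this session model (field names are illustrative, not the ADK's): events form an append-only log, while state is a plain mutable dictionary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Session:
    events: list = field(default_factory=list)   # append-only conversation log
    state: dict = field(default_factory=dict)    # mutable working memory

    def append_event(self, event_type: str, content) -> None:
        """Record an event; existing events are never modified."""
        self.events.append({
            "type": event_type,
            "content": content,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
```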
Session State Management Patterns
The ADK provides a clean API for session management. Sessions are created via a session store, which handles persistence:
```python
from google.adk.sessions import InMemorySessionStore
```
Once created, state operates like a dictionary with prefix-based access for tools:
```python
# Write to session state
session.state["theme"] = "dark"
# Tools read and write via prefixed keys to avoid collisions
session.state["order_tool:order_id"] = "123"
```
Key patterns for state management:
| Pattern | Description |
|---|---|
| Read-through | Check state before external calls to avoid redundant lookups |
| Write-behind | Update state after tool execution for future reference |
| Prefix isolation | Use tool_name:key prefixes to avoid collisions |
| TTL handling | Clear stale state after session timeout |
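The read-through and write-behind patterns compose naturally in a single helper. A sketch, where `fetch_order_status` is a hypothetical external lookup passed in by the caller:

```python
def get_order_status(state: dict, order_id: str, fetch_order_status) -> str:
    """Read-through with write-behind: check state before calling out."""
    key = f"order_tool:status:{order_id}"   # prefix isolation
    if key in state:
        return state[key]                   # read-through hit, no external call
    status = fetch_order_status(order_id)   # external call on cache miss
    state[key] = status                     # write-behind for future turns
    return status
```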
Compaction Strategies for Efficiency
As conversations grow, the event log can exceed context window limits. Compaction reduces the history size while preserving essential information.
Three primary strategies exist:
```python
def truncate_context(events: list, max_tokens: int) -> list:
    """Keep the most recent events that fit within the token budget."""
    # Walk backwards from the newest event (assumes each event carries
    # a precomputed token count)
    kept, used = [], 0
    for event in reversed(events):
        if used + event["tokens"] > max_tokens:
            break
        used += event["tokens"]
        kept.append(event)
    return list(reversed(kept))
```
| Strategy | Information Loss | Cost | Latency | Best For |
|---|---|---|---|---|
| Truncation | High (old context lost) | Low | Low | Simple queries |
| Keep-Last-N | Medium | Low | Low | Multi-turn dialogs |
| Recursive Summarization | Low | High (LLM calls) | High | Long sessions |
The whitepaper recommends hybrid approaches: use truncation for initial reduction, then summarization to preserve semantic content from discarded events.
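A minimal sketch of that hybrid: keep the last N events verbatim and collapse everything older into a single summary event. Here `summarize` stands in for an LLM summarization call.

```python
def compact(events: list, keep_last: int, summarize) -> list:
    """Replace all but the last `keep_last` events with one summary event."""
    if len(events) <= keep_last:
        return events  # nothing to compact
    older, recent = events[:-keep_last], events[-keep_last:]
    summary_event = {"type": "summary", "content": summarize(older)}
    return [summary_event] + recent
```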
Memory: Persistence Across Conversations
While sessions handle within-conversation context, memory handles cross-conversation persistence. Memory answers: “What should the agent remember about this user after the conversation ends?”
Memory differs from sessions in scope and lifecycle:
| Aspect | Session | Memory |
|---|---|---|
| Scope | Single conversation | All conversations |
| Lifecycle | Ephemeral | Persistent |
| Content | Raw events + state | Extracted facts |
| Update | Append-only events | Consolidated facts |
Declarative vs Procedural Memory
The whitepaper distinguishes two memory types:
Declarative Memory (facts and events):
- “User prefers dark mode”
- “User is vegetarian”
- “User ordered product X on date Y”
Procedural Memory (skills and behaviors):
- “When user asks about orders, check the last 5 first”
- “Always greet returning users by name”
- “Escalate billing issues to human support”
Declarative memory stores what the agent knows. Procedural memory stores how the agent should behave. Both are extracted from conversation history and refined over time.
The Memory Generation Pipeline
Memory generation follows an ETL pattern: Extract facts from sessions, Transform via consolidation, Load into storage for retrieval.
```mermaid
flowchart LR
    subgraph Extract["1. Extraction"]
        S[Session Events] --> E[LLM Extraction]
    end
    subgraph Transform["2. Consolidation"]
        E --> C["Merge & Dedupe"]
        C --> V["Validate & Score"]
    end
    subgraph Load["3. Storage"]
        V --> DB[(Memory Store)]
        DB --> I[Vector Index]
    end
    subgraph Retrieve["4. Retrieval"]
        Q[Query] --> I
        I --> R[Relevant Memories]
    end
    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    classDef tealClass fill:#16A085,stroke:#333,stroke-width:2px,color:#fff
    class S,Q blueClass
    class E,C,V orangeClass
    class DB,I greenClass
    class R tealClass
```
The ADK provides memory services that abstract this pipeline:
```python
from google.adk.memory import InMemoryMemoryService
```
Key implementation considerations:
- Async extraction: Don’t block the conversation. Extract memories in background jobs.
- Batch consolidation: Merge related facts periodically (nightly or on-demand).
- Confidence scoring: Track memory reliability. User-stated facts outrank inferred ones.
- Expiration policies: Old or contradicted memories should decay or be removed.
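The pipeline can be sketched end to end, with `extract_facts` standing in for the LLM extraction call and a plain dict keyed by fact name as the store (both are illustrative stand-ins, not the ADK's memory service):

```python
def run_memory_pipeline(session_events: list, extract_facts, store: dict) -> dict:
    """Minimal extract-consolidate-load pass over one session."""
    # 1. Extract candidate facts from the raw event log (LLM call in practice)
    candidates = extract_facts(session_events)
    # 2. Consolidate: a newer fact for the same key replaces the older one
    for fact in candidates:
        store[fact["key"]] = fact["value"]
    # 3. "Load" is just the mutated store here; production systems would
    #    also build a vector index for retrieval
    return store
```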
Memory-as-a-Tool Pattern
Rather than automatically injecting memories into context, the Memory-as-a-Tool pattern gives the agent explicit control. The agent decides when to remember and when to recall.
```python
from google.adk.tools.memory import LoadMemoryTool, SaveMemoryTool
```
Benefits of this pattern:
| Benefit | Description |
|---|---|
| Explicit reasoning | Agent must decide what’s worth remembering |
| Token efficiency | Only load memories when needed |
| Auditability | Memory operations appear in event log |
| User control | Easy to add confirmation for sensitive memories |
The trade-off is increased complexity: the agent must learn when to use memory tools, which requires good prompting or fine-tuning.
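The pattern can be sketched with plain functions exposed as tools; the module-level store and substring matching below are deliberately naive stand-ins for what the ADK's built-in tools provide:

```python
MEMORY_STORE = {}  # user_id -> list of remembered facts

def save_memory(user_id: str, fact: str) -> str:
    """Tool: persist a fact the agent decided is worth remembering."""
    MEMORY_STORE.setdefault(user_id, []).append(fact)
    return f"Saved: {fact}"

def load_memory(user_id: str, query: str) -> list:
    """Tool: recall facts matching a query (naive substring match here;
    production systems retrieve via a vector index)."""
    return [f for f in MEMORY_STORE.get(user_id, []) if query.lower() in f.lower()]
```

Because both operations are ordinary tool calls, they land in the event log like any other, which is exactly what makes the pattern auditable.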
Multi-Agent Session Architectures
When multiple agents collaborate on a task, session management becomes more complex. The whitepaper identifies two primary patterns:
```mermaid
flowchart TB
    subgraph Shared["Shared Unified History"]
        direction LR
        A1[Research Agent] --> H[Shared Session]
        A2[Writing Agent] --> H
        A3[Review Agent] --> H
    end
    subgraph Separate["Separate Individual Histories"]
        direction LR
        B1[Agent A] --> HA[Session A]
        B2[Agent B] --> HB[Session B]
        B3[Agent C] --> HC[Session C]
    end
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    class A1,A2,A3,H greenClass
    class B1,B2,B3,HA,HB,HC orangeClass
```
Shared Unified History: All agents read from and write to a single session. Best for tightly coordinated workflows where each agent needs full context of what others have done. Example: a research-write-review pipeline where the reviewer needs to see both research findings and draft content.
Separate Individual Histories: Each agent maintains its own session. Agents communicate through explicit message passing. Best for privacy-sensitive scenarios or loosely coupled workflows. Example: a customer service escalation where the specialist agent shouldn’t see the full history of failed resolution attempts.
The A2A (Agent-to-Agent) protocol provides a standard for inter-agent communication. At a high level, it defines message formats and handshake sequences that allow agents to exchange context without sharing full session histories. This enables hybrid approaches where agents share summaries rather than raw events.
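A sketch of such a hybrid hand-off, where `summarize` stands in for an LLM call and the message shape is illustrative rather than the A2A wire format:

```python
def build_handoff_message(sender: str, session_events: list, summarize) -> dict:
    """Package context for a peer agent as a summary, not the raw event log."""
    return {
        "from": sender,
        "kind": "context_summary",
        "summary": summarize(session_events),  # peers never see raw events
    }
```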
RAG vs Memory: Complementary Roles
A common question: “Should I use RAG or Memory for personalization?” The answer: both, for different purposes.
```mermaid
flowchart TB
    subgraph RAG["RAG: Expert on Facts"]
        direction LR
        D[Document Corpus] --> S1[Semantic Search] --> F[Domain Facts]
    end
    subgraph Mem["Memory: Expert on User"]
        direction LR
        U[User History] --> L[LLM Extraction] --> P["Preferences & Behaviors"]
    end
    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef purpleClass fill:#9B59B6,stroke:#333,stroke-width:2px,color:#fff
    class D,S1,F blueClass
    class U,L,P purpleClass
```
| Aspect | RAG | Memory |
|---|---|---|
| Source | External documents, knowledge bases | User interaction history |
| Focus | Factual domain knowledge | Behavioral patterns, preferences |
| Retrieval | Semantic similarity to query | User-specific index |
| Update Frequency | When documents are ingested | After each conversation |
| Scope | Shared across all users | Per-user |
| Use Case | “What is the return policy?” | “Does this user prefer email or chat?” |
RAG answers domain questions. Memory personalizes the answers. A support agent uses RAG to find the return policy, then uses Memory to know this user prefers concise bullet points over paragraphs.
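The division of labor can be made concrete in a few lines; both retrieval functions here are stand-ins injected by the caller:

```python
def answer_context(query: str, user_id: str, rag_search, memory_lookup) -> dict:
    """Combine shared domain facts (RAG) with per-user presentation (memory)."""
    return {
        "facts": rag_search(query),                 # shared across all users
        "style": memory_lookup(user_id, "format"),  # per-user preference
    }
```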
Memory Provenance and Trust
Not all memories are equally reliable. The whitepaper introduces provenance tracking: recording where each memory came from and how confident we should be in it.
Trust hierarchy (highest to lowest):
- User-stated facts: “I’m vegetarian” - explicitly declared by user
- Observed behaviors: User consistently orders vegetarian options - inferred from patterns
- Single observations: User ordered a salad once - weak signal
- Inferred preferences: Assumed vegetarian because user mentioned health concerns - guesswork
When memories conflict, higher-trust sources override lower-trust ones. If a user explicitly says “I eat meat now,” that overrides observed vegetarian ordering patterns.
Confidence scores quantify reliability. A memory with 0.9 confidence might be included in context, while 0.3 confidence memories might be held back unless specifically relevant.
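A sketch of trust-based conflict resolution combined with a confidence threshold; the trust levels and threshold value are illustrative, not from the whitepaper's implementation:

```python
# Higher number = more trustworthy provenance
TRUST = {"user_stated": 3, "observed_behavior": 2,
         "single_observation": 1, "inferred": 0}

def resolve(memories: list, min_confidence: float = 0.5) -> list:
    """Keep one memory per key (highest trust wins), then filter by confidence."""
    best = {}
    for m in memories:
        current = best.get(m["key"])
        if current is None or TRUST[m["provenance"]] > TRUST[current["provenance"]]:
            best[m["key"]] = m
    return [m for m in best.values() if m["confidence"] >= min_confidence]
```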
Framework Comparison: ADK vs LangGraph
Both Google’s ADK and LangGraph provide session and memory abstractions, but with different philosophies:
| Feature | Google ADK | LangGraph |
|---|---|---|
| Session Store | InMemorySessionStore, DatabaseSessionStore | Checkpointer (memory, SQLite, PostgreSQL) |
| Memory Service | InMemoryMemoryService, VertexAIRagMemoryService | MemorySaver, custom stores |
| State Access | session.state["key"] | config["configurable"]["thread_id"] |
| Memory-as-Tool | Built-in LoadMemoryTool, SaveMemoryTool | Custom tool functions |
| Multi-Agent | A2A protocol support | LangGraph Studio orchestration |
| Persistence | In-memory, Cloud SQL, Firestore | SQLite, PostgreSQL, Redis |
Choose ADK when building on Google Cloud with Vertex AI integration, or when A2A interoperability is important.
Choose LangGraph when you need flexible graph-based workflows, tight integration with LangChain ecosystem, or prefer Python-native state management.
Both frameworks are evolving rapidly. The concepts (sessions, state, memory extraction) transfer between them even as APIs differ.
Production Considerations
Deploying session and memory systems requires attention to security, privacy, and operations:
| Area | Requirement | Implementation |
|---|---|---|
| Access Control | Who can read/write sessions? | Role-based access, API keys per agent |
| Encryption | Data at rest and in transit | TLS 1.3, AES-256 for storage |
| Audit Trails | Log sensitive operations | Cloud Logging, structured events |
| User Consent | Explicit opt-in for memory | Consent UI before memory extraction |
| Data Minimization | Extract only necessary info | Scoped extraction prompts |
| Retention Policies | Auto-delete old data | TTL on sessions, user-triggered deletion |
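Retention can be enforced with a simple TTL sweep before persistence; the record shape and TTL value below are illustrative:

```python
import time

def expire_sessions(sessions: dict, ttl_seconds: float, now=None) -> dict:
    """Drop sessions whose last activity is older than the TTL."""
    now = time.time() if now is None else now
    return {sid: s for sid, s in sessions.items()
            if now - s["last_active"] <= ttl_seconds}
```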
Performance considerations:
- Run memory extraction asynchronously to avoid blocking conversations
- Cache frequently-accessed memories at the edge
- Batch consolidation jobs during off-peak hours
- Monitor session size growth and memory retrieval hit rates
Connecting to the Agentic AI Series
This whitepaper extends concepts from our broader exploration of agentic AI:
| Related Post | Connection |
|---|---|
| Anatomy of an AI Agent | Context engineering feeds the cognitive architecture |
| Introduction to Agents | Sessions implement the agent’s working memory module |
| Agent Tools & MCP | Memory-as-a-Tool extends the tool taxonomy |
Sessions are the agent’s working memory during a task. Memory is its long-term storage across tasks. Together, they enable the continuity that transforms stateless LLMs into persistent, personalized agents.
Key Takeaways
Context Engineering is the Cognitive Assembly Line - Dynamically assembling instructions, history, tools, memories, and RAG reduces hallucinations and enables grounded, personalized responses.
Sessions Enable Conversation Continuity - Events track the immutable conversation log; State provides mutable working memory for tools and follow-up queries.
Memory Extends Beyond Sessions - An LLM-driven ETL pipeline extracts, consolidates, and retrieves facts across conversations for long-term personalization.
Compaction Trades Information for Efficiency - Choose truncation (fast, lossy), keep-last-N (balanced), or recursive summarization (expensive, semantic preservation).
Memory-as-a-Tool Gives Agents Autonomy - Explicit tools let agents reason about what to remember and when to recall, improving auditability and user control.
RAG and Memory are Complementary - RAG provides domain expertise from documents; Memory provides user expertise from interactions. Use both.
Production Requires Security and Privacy First - Access control, encryption, consent management, and retention policies are essential before deployment.
References
- Context Engineering: Sessions & Memory (PDF) - Original Google whitepaper
- Google ADK Documentation - Agent Development Kit reference
- LangGraph Documentation - LangChain’s agent framework
- MCP Specification - Model Context Protocol for tool interoperability