A stateless agent treats every interaction as its first - no memory of previous conversations, no awareness of ongoing tasks, no accumulated context. While this works for simple Q&A, real-world applications demand more. In this post, I’ll explore how to give agents memory through state management, enabling them to maintain context across interactions and handle complex multi-step workflows.
The Stateless Problem
Consider a simple travel agent that helps book trips. In a stateless world:
User: I want to fly to Tokyo next month
Agent: I can help! When exactly would you like to travel?

User: The 15th
Agent: I'd be happy to help with flights on the 15th. Where are you traveling to?

User: I just told you - Tokyo!
The agent has forgotten everything between messages. Each turn is isolated, leading to frustrating user experiences and broken workflows.
What is Agent State?
State is the information an agent retains between steps in a workflow. It encompasses:
Conversation history: What was said before
Task progress: Current step in a multi-step process
Gathered data: Information collected along the way
User preferences: Learned details about the user
Pending actions: What still needs to be done
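Concretely, that state might be a simple container. Here's a minimal sketch (field names are illustrative, not a fixed schema):

```python
from dataclasses import dataclass, field

@dataclass
class AgentStateSketch:
    """Illustrative container for the state categories above."""
    history: list[dict] = field(default_factory=list)       # conversation history
    task_step: str = "start"                                 # task progress
    collected: dict = field(default_factory=dict)            # gathered data
    preferences: dict = field(default_factory=dict)          # user preferences
    pending_actions: list[str] = field(default_factory=list) # what's left to do
```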
flowchart TD
subgraph State["Agent State"]
H[History]
T[Task Progress]
D[Collected Data]
P[Preferences]
end
I[User Input] --> A[Agent]
State --> A
A --> O[Response]
A --> State
classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
class State blueClass
class A orangeClass
The agent reads from state to understand context, then writes back to state to remember what happened.
State as a Graph
One powerful mental model is treating agent state as a graph structure. Nodes represent states, edges represent transitions triggered by actions or events.
stateDiagram-v2
[*] --> Greeting
Greeting --> CollectingInfo: User provides details
CollectingInfo --> Searching: All info gathered
Searching --> Presenting: Results found
Presenting --> Booking: User selects option
Booking --> Confirmed: Payment processed
Confirmed --> [*]
CollectingInfo --> CollectingInfo: More info needed
Presenting --> Searching: User wants different options
This state machine approach provides:
Predictability: Clear transitions between states
Debuggability: Easy to see where the agent is and how it got there
Recoverability: Can resume from any state after interruption
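In code, the same machine can be written as an explicit transition table. A sketch (state and event names mirror the diagram; a dict-based table is one of several ways to encode this):

```python
from enum import Enum, auto

class BookingState(Enum):
    GREETING = auto()
    COLLECTING_INFO = auto()
    SEARCHING = auto()
    PRESENTING = auto()
    BOOKING = auto()
    CONFIRMED = auto()

# (current_state, event) -> next_state, mirroring the diagram above
TRANSITIONS = {
    (BookingState.GREETING, "details_provided"): BookingState.COLLECTING_INFO,
    (BookingState.COLLECTING_INFO, "info_complete"): BookingState.SEARCHING,
    (BookingState.COLLECTING_INFO, "info_missing"): BookingState.COLLECTING_INFO,
    (BookingState.SEARCHING, "results_found"): BookingState.PRESENTING,
    (BookingState.PRESENTING, "option_selected"): BookingState.BOOKING,
    (BookingState.PRESENTING, "wants_other_options"): BookingState.SEARCHING,
    (BookingState.BOOKING, "payment_processed"): BookingState.CONFIRMED,
}

def transition(state: BookingState, event: str) -> BookingState:
    """Advance the machine; unknown events keep the current state."""
    return TRANSITIONS.get((state, event), state)
```

Because every legal move is enumerated, an unexpected event can't silently corrupt the workflow - it simply doesn't transition.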
Implementing Basic State Management
Here’s a simple state container for a conversational agent:
from datetime import datetime

class AgentState:
    def __init__(self):
        self.messages: list[dict] = []
        self.created_at = datetime.now()  # used later for session expiry

    def add_message(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

class StatefulAgent:
    def __init__(self, persona: str):
        self.state = AgentState()
        self.state.add_message("system", persona)

    def process(self, user_input: str) -> str:
        # Record the user's turn so later turns can reference it
        self.state.add_message("user", user_input)
        # The model sees the full history, not just the latest message
        # (call_llm: any chat-completion wrapper over a message list)
        assistant_response = call_llm(self.state.messages)
        # Add response to state
        self.state.add_message("assistant", assistant_response)
        return assistant_response

# Usage
agent = StatefulAgent(
    persona="You are a helpful travel booking assistant."
)

print(agent.process("I want to fly to Tokyo"))
# "Great! When would you like to travel to Tokyo?"

print(agent.process("Next month, around the 15th"))
# "Perfect, I'll look for flights to Tokyo around the 15th of next month..."
# The agent remembers the destination from the previous turn
Short-Term vs Long-Term Memory
Agent memory operates at different timescales:
| Type | Scope | Examples | Persistence |
|------|-------|----------|-------------|
| Short-term | Current session | Conversation history, task progress | Session duration |
| Long-term | Across sessions | User preferences, past interactions | Database/file |
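Long-term memory needs a store that outlives the process. A minimal sketch, with a JSON file standing in for a real database (the path and schema are illustrative):

```python
import json
from pathlib import Path

class LongTermStore:
    """File-backed store for facts that should survive restarts."""
    def __init__(self, path: str = "user_memory.json"):
        self.path = Path(path)

    def load(self, user_id: str) -> dict:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text()).get(user_id, {})

    def save(self, user_id: str, facts: dict):
        data = json.loads(self.path.read_text()) if self.path.exists() else {}
        # Merge new facts over what we already know about this user
        data[user_id] = {**data.get(user_id, {}), **facts}
        self.path.write_text(json.dumps(data, indent=2))
```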
Memory Types: A Cognitive Perspective
Harrison Chase (LangChain) offers an alternative framing inspired by cognitive science - three memory types that map to how agents learn and recall:
| Memory Type | What It Captures | Typical Agent Implementation |
|-------------|------------------|------------------------------|
| Procedural | How the agent does things | Model weights, prompts, application code |
| Semantic | Facts about the user and world | Extracted facts, user preferences, retrieved context |
| Episodic | Recall of specific past events | Few-shot examples from past successful interactions |
Procedural Memory
How the agent does things. In current systems, this is largely static - baked into model weights and application code. Some advanced systems modify prompts based on learned patterns, but true procedural learning (updating weights) remains rare.
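One lightweight form of prompt-level procedural learning is having the model rewrite its own instructions from feedback. A hedged sketch (the feedback source is an assumption; call_llm is the same wrapper as before):

```python
def refine_instructions(system_prompt: str, feedback: list[str]) -> str:
    """Ask the model to fold recurring feedback into its own instructions."""
    prompt = f"""Here is an agent's current system prompt:

{system_prompt}

Here is recent user feedback on its behavior:
{chr(10).join(feedback)}

Rewrite the system prompt to address the feedback. Keep it concise."""
    return call_llm([{"role": "user", "content": prompt}])
```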
Semantic Memory
What the agent knows about the user and world. This powers personalization:
from typing import Dict, List

# Extract semantic memory from conversation
def extract_semantic_memory(messages: List[Dict]) -> Dict:
    prompt = """Extract facts about the user worth remembering:
    - Preferences (communication style, interests)
    - Personal details (name, location, role)
    - Stated constraints or requirements
    Return as JSON."""
    return call_llm_json(prompt, messages)
Episodic Memory
Sequences of past actions that worked well. Implemented via dynamic few-shot prompting:
# Retrieve relevant episodes for few-shot learning
def get_relevant_episodes(current_task: str, episode_store: List[Dict]) -> List[Dict]:
    # Find past successful interactions similar to current task
    # (semantic_search: any embedding-based similarity helper, assumed defined)
    relevant = semantic_search(current_task, episode_store, top_k=3)
    return [ep for ep in relevant if ep["outcome"] == "success"]
Memory Update Mechanisms
When should memory be updated? Two primary approaches:
flowchart LR
subgraph Hot["Hot Path (Synchronous)"]
direction LR
H1["User Input"] --> H2["Extract Memory"] --> H3["Update Store"] --> H4["Generate Response"]
end
subgraph Background["Background (Async)"]
direction LR
B1["User Input"] --> B2["Generate Response"]
B1 --> B3["Queue Update"]
B3 --> B4["Async Process"] --> B5["Update Store"]
end
classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
class H1,H2,H3,H4 blueClass
class B1,B2,B3,B4,B5 orangeClass
| Approach | Latency | Consistency | Best For |
|----------|---------|-------------|----------|
| Hot Path | Higher (waits for extraction) | Immediate | Critical personalization |
| Background | Lower (async update) | Delayed | High-throughput systems |
The trade-off: the hot path guarantees the next response uses updated memory, but adds latency to every turn. Background processing keeps responses fast, but updates may not be reflected in the very next turn.
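A background update can be as simple as handing extraction off to a worker thread. A minimal sketch (queue-based; extract_semantic_memory is the function from above, and the store is assumed to tolerate concurrent writes):

```python
import queue
import threading

update_queue: "queue.Queue[list]" = queue.Queue()

def memory_worker(store: dict):
    """Drain the queue and fold extracted facts into the store."""
    while True:
        messages = update_queue.get()
        facts = extract_semantic_memory(messages)  # defined earlier
        store.update(facts)
        update_queue.task_done()

store: dict = {}
threading.Thread(target=memory_worker, args=(store,), daemon=True).start()

def handle_turn(messages: list) -> str:
    # Respond immediately; memory extraction happens off the hot path
    response = call_llm(messages)  # call_llm assumed as before
    update_queue.put(messages)
    return response
```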
Short-Term Memory
Short-term memory lives within the current conversation. It’s typically implemented as an in-memory data structure:
class ConversationBuffer:  # wrapper class assumed; the original showed only the two helpers
    def __init__(self, max_tokens: int = 4000):
        self.messages: list[dict] = []
        self.max_tokens = max_tokens

    def _trim_if_needed(self):
        """Remove oldest messages if context is too long"""
        while self._estimate_tokens() > self.max_tokens:
            if len(self.messages) > 2:  # Keep at least system + last exchange
                self.messages.pop(1)  # Remove oldest non-system message
            else:
                break

    def _estimate_tokens(self) -> int:
        # Rough estimate: 4 chars per token
        total_chars = sum(len(m["content"]) for m in self.messages)
        return total_chars // 4
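The four-characters-per-token heuristic is crude. If the tiktoken package is available, counting real tokens is only slightly more work (the model name here is an assumption):

```python
import tiktoken

def count_tokens(messages: list[dict], model: str = "gpt-4o") -> int:
    enc = tiktoken.encoding_for_model(model)
    # Count content tokens only; role/formatting overhead is ignored here
    return sum(len(enc.encode(m["content"])) for m in messages)
```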
Checkpointing with LangGraph

Frameworks like LangGraph make this pattern explicit: you declare a typed state schema, write nodes that read and update it, and get checkpointing built in.

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict, Annotated
from operator import add

# Define state schema
class TravelState(TypedDict):
    messages: Annotated[list, add]  # Accumulates messages
    destination: str
    dates: str
    passengers: int
    current_step: str

# Define node functions
def greeting_node(state: TravelState) -> dict:
    return {
        "messages": ["Welcome! Where would you like to travel?"],
        "current_step": "collecting_destination"
    }

def collect_destination(state: TravelState) -> dict:
    # Extract destination from last message
    last_message = state["messages"][-1]
    # In real implementation, use LLM to extract
    return {
        "destination": "Tokyo",  # Extracted value
        "messages": ["Great choice! When would you like to travel?"],
        "current_step": "collecting_dates"
    }

# Wire nodes into a graph
workflow = StateGraph(TravelState)
workflow.add_node("greeting", greeting_node)
workflow.add_node("collect_destination", collect_destination)
workflow.set_entry_point("greeting")
workflow.add_edge("greeting", "collect_destination")
workflow.add_edge("collect_destination", END)

# Compile graph with checkpointing (in-memory saver for development)
checkpointer = MemorySaver()
app = workflow.compile(checkpointer=checkpointer)

# Run with thread ID for state persistence
config = {"configurable": {"thread_id": "user-123-session-456"}}

# First interaction
result = app.invoke(
    {"messages": ["I want to go to Paris"]},
    config
)

# Later... resume from checkpoint
result = app.invoke(
    {"messages": ["March 20th"]},
    config  # Same thread_id - continues from saved state
)
Persistent Checkpointing
For production, use database-backed checkpointing:
from langgraph.checkpoint.sqlite import SqliteSaver

# SQLite for development
checkpointer = SqliteSaver.from_conn_string("checkpoints.db")

# For production, use PostgreSQL or Redis
# from langgraph.checkpoint.postgres import PostgresSaver
# checkpointer = PostgresSaver.from_conn_string(os.environ["DATABASE_URL"])
Managing Context Windows
LLMs have limited context windows. As conversations grow, you need strategies to manage what fits. One approach is summarization: distill older turns into compact facts before dropping them.

def extract_key_facts(messages: List[Dict]) -> Dict:
    """Use LLM to extract key facts worth remembering"""
    prompt = """
    Extract key facts from this conversation that should be remembered:
    - User preferences
    - Important decisions made
    - Pending actions
    - Critical information
    Return as JSON.
    """
    return call_llm_json(prompt, messages)
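Those extracted facts can then replace the turns they came from. One way to wire it together (a sketch; the message-count thresholds are arbitrary, and messages[0] is assumed to be the system prompt):

```python
import json

def compact_history(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Summarize older turns into a synthetic system message."""
    if len(messages) <= max_messages:
        return messages
    # Summarize everything except the system prompt and the last 10 turns
    facts = extract_key_facts(messages[1:-10])
    summary = {"role": "system", "content": f"Known facts so far: {json.dumps(facts)}"}
    return [messages[0], summary] + messages[-10:]
```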
Session Management

In multi-user systems, each user's state is scoped to a session with an expiry:

from datetime import datetime, timedelta
import uuid

class SessionManager:
    def __init__(self, timeout: timedelta = timedelta(hours=1)):  # default timeout is illustrative
        self.sessions: dict[str, AgentState] = {}  # AgentState from earlier
        self.timeout = timeout

    def get_or_create_session(self, session_id: str = None) -> tuple[str, AgentState]:
        """Get existing session or create new one"""
        if session_id and session_id in self.sessions:
            state = self.sessions[session_id]
            # Check if session expired
            if datetime.now() - state.created_at < self.timeout:
                return session_id, state
        # Unknown or expired: start a fresh session
        new_id = session_id or str(uuid.uuid4())
        state = AgentState()
        self.sessions[new_id] = state
        return new_id, state

    def save_session(self, session_id: str, state: AgentState):
        self.sessions[session_id] = state

    def cleanup_expired(self):
        """Remove expired sessions"""
        now = datetime.now()
        expired = [
            sid for sid, state in self.sessions.items()
            if now - state.created_at > self.timeout
        ]
        for sid in expired:
            del self.sessions[sid]
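Usage is straightforward (a sketch continuing the block above; in practice cleanup_expired would run on a timer or per-request hook):

```python
manager = SessionManager(timeout=timedelta(minutes=30))

# First request: no session yet, so one is created
session_id, state = manager.get_or_create_session()

# Later requests pass the ID back to resume the same state
session_id, state = manager.get_or_create_session(session_id)

manager.cleanup_expired()  # reclaim abandoned sessions
```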
Key Takeaways
State enables continuity: Without state, every interaction is isolated and context is lost
Model state as a graph: State machines provide clear structure for complex workflows
Separate memory timescales: Short-term for current session, long-term for persistent knowledge
Checkpoint for resilience: Save state to recover from interruptions
Manage context actively: Use sliding windows or summarization to stay within token limits
State management transforms agents from forgetful responders into coherent collaborators that remember, learn, and adapt. In the next post, I’ll explore how agents connect to external systems - databases, APIs, and the wider world.
References
Memory for Agents - Harrison Chase’s deep dive into cognitive memory types and update mechanisms