Memory is the most consequential architectural decision in coding agent design. It determines whether an agent forgets everything between sessions or progressively learns your codebase, preferences, and patterns. This section analyzes the full spectrum of memory architectures across 13 coding agents, from ephemeral session-based approaches (Claude Code, Codex CLI, Aider) to fully persistent memory block systems (Letta Code), with hybrid strategies in between (Replit trajectory compression, Warp model-aligned summarization, Qwen Code cross-session save).
Context window management is equally critical: a 200k token window sounds generous until MCP server definitions consume 130k tokens, leaving only ~70k for actual work. Every agent faces the same fundamental constraint—finite context, infinite codebase—and their solutions reveal deep architectural trade-offs between continuity, efficiency, and fidelity.
Think of an agent’s memory like a desk. Session-based agents clear the desk completely every time you leave the room—next time you come back, you have to re-explain everything. Persistent-memory agents have a filing cabinet next to the desk: they write down important things (your preferences, project structure, skills learned) on index cards and file them away. When the desk gets too cluttered (context window fills up), some agents have a janitor who compacts the desk by summarizing old papers into sticky notes. The smartest agents use a repo map—a table of contents for your codebase—so they only pull the specific files they need onto the desk.
A fundamental architectural divide separates coding agents into two camps: those that treat each session as a blank slate, and those that maintain memory across sessions. This distinction has profound implications for user experience, architecture complexity, and the types of tasks an agent can handle over time. The majority of agents today remain session-based, relying on file-system workarounds for any cross-session continuity. Only Letta Code implements a true server-persistent memory model where the agent actively manages its own long-term state.
Memory approaches fall on a spectrum from fully ephemeral to fully persistent. Most agents cluster at the ephemeral end, relying on static configuration files (CLAUDE.md, AGENTS.md, replit.md) for cross-session continuity. Letta Code is the only agent that implements true persistent memory with agent-managed updates. Hybrid approaches—Replit’s trajectory compression, Qwen Code’s save_memory tool, Warp’s model-aligned summarization—represent a middle ground where the industry is likely converging.
Letta Code implements the most sophisticated memory architecture of any coding agent. Based on the MemGPT research paper, it treats memory as a first-class primitive: the agent has explicit tools to read and write its own memory blocks, which are embedded directly into the system prompt. This creates a self-modifying agent that learns and adapts across sessions without any user intervention.
SYSTEM PROMPT LAYOUT (agent-modifiable via memory() tool):
+============================================================+
| CORE MEMORY (BLOCKS) |
| |
| <persona> |
| I am a coding assistant that prefers functional |
| programming patterns. I always run tests before |
| committing. I use TypeScript strict mode. |
| I prefer small, focused commits with clear messages. |
| </persona> |
| |
| <human> |
| User prefers TypeScript over JavaScript. |
| Works on an e-commerce platform (Next.js 14). |
| Senior engineer, prefers concise explanations. |
| Uses pnpm as package manager. |
| Testing: Jest + React Testing Library. |
| </human> |
| |
| <project> |
| Framework: Next.js 14 (App Router) |
| Database: PostgreSQL via Prisma ORM |
| Auth: NextAuth.js v5 |
| Deployment: Vercel |
| Monorepo: Turborepo with apps/ and packages/ |
| </project> |
| |
| <skills> |
| Available skills: |
| - api-migration: Migrate REST to GraphQL |
| - testing-patterns: TDD workflow |
| - prisma-relations: Complex Prisma relation patterns |
| </skills> |
+============================================================+
| CONVERSATION MESSAGES (scrolling window) |
| * User msg -> Assistant response -> Tool calls |
+============================================================+
| ARCHIVAL MEMORY (overflow, vector-indexed) |
| * Historical summaries, large code snippets, docs |
+============================================================+
MEMORY() TOOL API:
memory(action: "edit"|"read"|"append",
block: "persona"|"human"|"project"|"skills",
updates: "string content to write/append")
Example: Agent autonomously learns a preference:
memory(action: "edit", block: "human",
updates: "Add: User prefers kebab-case file names")
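The block layout and memory() tool above can be sketched in a few lines. This is an illustrative model, not Letta's actual implementation: the block names mirror the diagram, while `compileSystemPrompt` and the in-memory store are assumptions made for the sketch.

```typescript
// Minimal sketch of Letta-style memory blocks embedded in a system prompt.
// Block names follow the diagram above; function names are illustrative.

type BlockName = "persona" | "human" | "project" | "skills";

const blocks: Record<BlockName, string> = {
  persona: "I am a coding assistant that prefers functional patterns.",
  human: "User prefers TypeScript over JavaScript.",
  project: "Framework: Next.js 14 (App Router)",
  skills: "Available skills: api-migration, testing-patterns",
};

// memory(action, block, updates) — read, append to, or replace a block.
function memory(
  action: "edit" | "read" | "append",
  block: BlockName,
  updates = ""
): string {
  if (action === "read") return blocks[block];
  if (action === "append") blocks[block] += "\n" + updates;
  else blocks[block] = updates; // "edit" replaces the block wholesale
  return blocks[block];
}

// The prompt is recompiled every turn, so edits take effect immediately.
function compileSystemPrompt(): string {
  return (Object.keys(blocks) as BlockName[])
    .map((name) => `<${name}>\n${blocks[name]}\n</${name}>`)
    .join("\n");
}

// The agent autonomously learns a preference:
memory("append", "human", "User prefers kebab-case file names.");
```

The key property this captures: because the blocks live inside the system prompt, an edit made in one turn changes the agent's behavior in the very next turn, with no user intervention.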
| Agent | Memory Type | Mechanism | Cross-Session? | Capacity |
|---|---|---|---|---|
| Claude Code | File-based | CLAUDE.md, Memory blocks via /memory | Partial (files persist) | Unlimited file storage |
| Codex CLI | Session-only | JSONL recorder (RolloutRecorder) | No (audit only) | Model-dependent |
| Letta Code | Server-persistent | Memory blocks + Archival DB + Recall | Yes (full) | Unlimited (server) |
| Aider | Session + repo map | tree-sitter AST map, on-demand /add | No | ~1024 tokens overview |
| Cline | Session + settings | VS Code workspace state, webview history | No | Model-dependent |
| Goose | Session-only | In-memory, MCP extension context | No | Model-dependent |
| OpenCode | File-based | AGENTS.md + session + LSP queries | Partial (files persist) | Model-dependent |
| Vibe CLI | File-based | history/ directory on disk | Partial (history) | 256k (Devstral) |
| Qwen Code | File-based | save_memory tool → ~/.qwen/settings.json | Partial (files persist) | Model-dependent |
| OpenManus | Agent-managed | Memory class within 4-level hierarchy | No | Model-dependent |
| Droid | Session-only | In-memory + HyperCode retrieval | No | Model-dependent |
| Warp | Session-only | In-memory + codebase embeddings | No | Model-dependent |
| Replit | Trajectory-based | Trajectory compression + checkpoints | Partial (compressed) | 200-min sessions |
Every coding agent eventually fills its context window. The strategies they use to manage this constraint—compaction, summarization, offloading, structural mapping—reveal fundamental architectural trade-offs. Some agents summarize aggressively and risk losing detail. Others offload to external storage and risk retrieval latency. A few avoid the problem entirely by providing the agent with tools to query code on demand rather than storing it in context.
Claude Code uses a 200k token context window and implements automatic compaction when usage approaches capacity. The compaction system is the most transparent among all agents analyzed, with a dedicated hook event (PreCompact) that allows external systems to intercept the process, and a /compact command for manual triggering.
// .claude/settings.json - PreCompact Hook configuration
{
"hooks": {
"PreCompact": [
{
"command": "node scripts/save-conversation-state.js",
"description": "Export conversation before compaction"
}
]
}
}
// scripts/save-conversation-state.js
// Receives conversation data via stdin
// Can: save to file, send to analytics, update external state
// Runs BEFORE compaction occurs, has access to full context
// Exit 0 = proceed with compaction
// Exit 2 = abort compaction (hook can block the operation)
Auto-compaction is lossy by design. When a complex debugging session is compacted, subtle details about failed approaches may be lost, causing the agent to retry strategies that already failed. The PreCompact hook mitigates this by allowing external systems to capture state before compaction. Teams building on Claude Code should implement PreCompact hooks that log key decisions and failed approaches to persistent storage, then inject this context via CLAUDE.md or session initialization. The /compact command also accepts a custom prompt, enabling targeted summaries: /compact focus on the authentication refactoring decisions.
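The "log key decisions and failed approaches" recommendation can be sketched as a small transcript filter of the kind a PreCompact hook script might run. The message shape and the textual markers are assumptions for illustration, not Claude Code's actual transcript schema.

```typescript
// Sketch of state a PreCompact hook might preserve before compaction
// discards it. Message shape and markers are assumed, not Claude Code's
// actual transcript format.

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Pull out lines worth persisting: decisions made and approaches that
// failed, so a future session does not retry them after compaction.
function extractDurableNotes(transcript: Message[]): string[] {
  const markers = [/decision:/i, /failed:/i, /does not work/i];
  return transcript
    .filter((m) => m.role !== "user")
    .map((m) => m.content)
    .filter((c) => markers.some((re) => re.test(c)));
}

// In a real hook the transcript arrives on stdin and the notes would be
// written to persistent storage (e.g. appended to CLAUDE.md);
// exit code 0 lets compaction proceed, exit code 2 aborts it.
const notes = extractDurableNotes([
  { role: "assistant", content: "Decision: use RS256 for token signing." },
  { role: "assistant", content: "Exponential backoff failed: rate limit persists." },
  { role: "user", content: "ok, try something else" },
]);
```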
Aider takes a fundamentally different approach to context management: instead of summarizing conversations, it builds an AST-aware map of the entire repository using tree-sitter parsers. This gives the agent a structural understanding of the codebase without loading full file contents, typically consuming only ~1024 tokens for a complete repository overview.
Example repo map output injected into the agent’s context:
src/auth/login.ts
export async function login(credentials: LoginCredentials): Promise<AuthToken>
export function validateToken(token: string): boolean
export function refreshToken(token: AuthToken): Promise<AuthToken>
src/auth/middleware.ts
import { validateToken } from './login'
export function authMiddleware(req: Request, res: Response, next: NextFunction)
src/api/users.ts
import { login } from '../auth/login'
export class UserService
async getUser(id: string): Promise<User>
async updateUser(id: string, data: Partial<User>): Promise<User>
async deleteUser(id: string): Promise<void>
src/models/types.ts
export interface User { id: string; email: string; name: string; }
export interface AuthToken { token: string; expiresAt: Date; }
export interface Order { id: string; userId: string; items: OrderItem[]; }
Aider’s repo map demonstrates that structural understanding—knowing what functions exist and how they relate—is often more valuable per token than content understanding—knowing the implementation of each function. By spending only ~1024 tokens on a repo overview, Aider preserves the vast majority of the context window for actual conversation, file edits, and tool output. This is especially effective for large codebases where loading even a fraction of files would exhaust the context window. The map is regenerated each turn to reflect file changes and conversation focus, with full file content loaded only when the user explicitly adds files via /add.
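The budget trade-off described above can be sketched directly: render only declarations, estimate token cost, and stop at the budget. Real Aider uses tree-sitter extraction and graph-based ranking to decide which symbols survive the cut; this simplified sketch keeps files in input order and uses a crude characters-per-token heuristic.

```typescript
// Sketch of an Aider-style repo map trimmed to a token budget.
// Real Aider ranks symbols by relevance; this keeps input order.

interface FileMap {
  path: string;
  signatures: string[]; // extracted declarations, not bodies
}

// Crude token estimate: ~4 characters per token.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

function renderRepoMap(files: FileMap[], budgetTokens = 1024): string {
  const lines: string[] = [];
  let used = 0;
  for (const f of files) {
    const chunk = [f.path, ...f.signatures.map((s) => "  " + s)].join("\n");
    const cost = estimateTokens(chunk);
    if (used + cost > budgetTokens) break; // stay under the budget
    lines.push(chunk);
    used += cost;
  }
  return lines.join("\n");
}

const map = renderRepoMap([
  {
    path: "src/auth/login.ts",
    signatures: [
      "export async function login(c: LoginCredentials): Promise<AuthToken>",
    ],
  },
  { path: "src/api/users.ts", signatures: ["export class UserService"] },
]);
```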
Replit Agent handles long-running autonomous sessions (up to 200 minutes) that would quickly exhaust any context window. Its solution is trajectory compression: using an LLM to condense long action histories into compact summaries, preserving key decision points while discarding redundant intermediate states.
Complementing trajectory compression, Replit’s checkpoints capture a full snapshot of the workspace state at key moments, enabling the user to roll back to any previous state. Each checkpoint includes the workspace files, compressed conversation, database state, environment variables, and running process state.
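The compression step can be sketched as follows: once the action history grows past a threshold, older steps collapse into a single summary entry while recent steps stay verbatim. Replit uses an LLM as the summarizer; a placeholder function stands in here, and the threshold and step shape are assumptions.

```typescript
// Sketch of trajectory compression: old steps collapse into a summary,
// recent steps are kept verbatim. The summarizer is a placeholder for
// the LLM call Replit uses.

interface Step {
  action: string;
  result: string;
}

function compressTrajectory(
  steps: Step[],
  keepRecent = 5,
  summarize = (old: Step[]) =>
    `Summary of ${old.length} earlier steps: ` +
    old.map((s) => s.action).join(", ")
): Step[] {
  if (steps.length <= keepRecent) return steps; // nothing to compress yet
  const old = steps.slice(0, steps.length - keepRecent);
  const recent = steps.slice(-keepRecent);
  return [{ action: "summary", result: summarize(old) }, ...recent];
}

const history: Step[] = Array.from({ length: 10 }, (_, i) => ({
  action: `step-${i}`,
  result: "ok",
}));
const compressed = compressTrajectory(history, 3);
```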
Letta Code’s archival memory acts as an unbounded overflow layer backed by a vector database. When information exceeds what the core memory blocks can hold, the agent can explicitly store it in archival memory using the archival_memory_insert tool, and later retrieve it using archival_memory_search with semantic similarity queries.
LETTA ARCHIVAL MEMORY SYSTEM:
┌───────────────────────────────────────────────────────┐
│ archival_memory_insert(content: string) │
│ ─► Embeds content into vector DB │
│ ─► Content survives beyond context window limits │
│ ─► Unlimited storage capacity │
│ │
│ archival_memory_search(query: string, n: int) │
│ ─► Semantic similarity search over all stored data │
│ ─► Returns top-n most relevant entries │
│ ─► Agent decides when to search and what to query │
│ │
│ USE CASES: │
│ - Store large code review findings │
│ - Save historical conversation summaries │
│ - Cache project documentation excerpts │
│ - Record detailed debugging session outcomes │
│ │
│ EXAMPLE: │
│ archival_memory_insert( │
│ "Auth migration: Changed from session-based to │
│ JWT. Files modified: auth.ts, middleware.ts, │
│ config.ts. Key decision: chose RS256 over HS256 │
│ for token signing due to microservice arch." │
│ ) │
│ │
│ archival_memory_search("JWT token signing", n=3) │
│ ─► Returns the auth migration entry above │
└───────────────────────────────────────────────────────┘
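The insert/search loop in the diagram can be sketched with a toy scoring function. Letta's archival memory embeds content into a vector database and ranks by semantic similarity; word-overlap scoring stands in for embeddings here, and the store is a plain in-memory array.

```typescript
// Toy sketch of archival memory. Real Letta uses learned embeddings in a
// vector DB; word-overlap scoring stands in for semantic similarity.

const store: string[] = [];

function archivalMemoryInsert(content: string): void {
  store.push(content);
}

// Score = number of query words that appear in the entry; return top n.
function archivalMemorySearch(query: string, n: number): string[] {
  const words = query.toLowerCase().split(/\s+/);
  return store
    .map((entry) => ({
      entry,
      score: words.filter((w) => entry.toLowerCase().includes(w)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n)
    .map((r) => r.entry);
}

archivalMemoryInsert(
  "Auth migration: chose RS256 over HS256 for token signing."
);
archivalMemoryInsert("Debugging notes: flaky test in users.spec.ts.");

const results = archivalMemorySearch("token signing", 1);
```

The important design point survives even in the toy version: the agent decides when to insert and when to search, so storage capacity is decoupled from the context window entirely.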
The following table provides a comprehensive comparison of how each agent manages its finite context window. Strategies range from no management at all (relying on the model’s native window) to sophisticated multi-tier memory systems with automatic offloading.
| Agent | Context Size | Compaction Strategy | Warning Mechanism |
|---|---|---|---|
| Claude Code | 200k tokens | Auto-compact at ~80%, preserve recent + todos; PreCompact hook for external state save | /context command, status line |
| Codex CLI | Model-dependent | Manual context management; RolloutRecorder for audit replay | Token count display |
| Letta Code | Model-dependent | Archival memory offload to vector DB; memory blocks always loaded in system prompt | Server-side monitoring |
| Vibe CLI | 256k (Devstral) | Architecture-level context prioritization; large native window reduces compaction need | N/A |
| Aider | Model-dependent | Repo map (~1024 tokens via tree-sitter); add files on demand via /add | Token usage display |
| Qwen Code | Model-dependent | Auto-compaction (forked from Gemini CLI); save_memory for cross-session persistence | Token usage display |
| OpenCode | Model-dependent | AGENTS.md context + LSP real-time queries reduce need for full file loads | Token usage display |
| Cline | Model-dependent | Per-action shadow git checkpoints; VS Code extension state persistence | VS Code status bar |
| Goose | Model-dependent | MCP extension-based context; disabledMcpServers to reclaim space | N/A |
| OpenManus | Model-dependent | Planning-based context management; max_messages bounded buffer | N/A |
| Droid | Proprietary | HyperCode multi-resolution retrieval reduces context needs; ByteRank relevance scoring | Managed |
| Warp | Model-dependent | Model-aligned summarization; codebase embeddings for navigation; TODO state protection | Managed |
| Replit | Model-dependent | Trajectory compression via LLM; checkpoints for state snapshots | Session timer (200 min) |
Each MCP server adds tool definitions to the context window. Goose, with its 3,000+ extension ecosystem, is particularly vulnerable: enabling too many MCP servers can shrink usable context from 200k to ~70k tokens. The rule of thumb: every MCP server costs 500–2,000 tokens of context just for its tool definitions. Use disabledMcpServers to selectively disable unused servers per project. Claude Code’s /context command helps monitor this overhead in real time.
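The arithmetic above is worth making explicit. A small budget function shows how quickly per-server overhead compounds; the 500–2,000-token-per-server range comes from the text, and the server count below is illustrative.

```typescript
// Back-of-the-envelope check: usable context after MCP tool definitions
// are loaded. Per-server costs follow the 500–2,000 token rule of thumb.

function usableContext(
  windowTokens: number,
  serverToolDefTokens: number[]
): number {
  const overhead = serverToolDefTokens.reduce((a, b) => a + b, 0);
  return Math.max(0, windowTokens - overhead);
}

// 65 enabled servers at ~2,000 tokens each ≈ 130k of a 200k window gone,
// matching the worst case described above.
const remaining = usableContext(200_000, Array(65).fill(2_000));
```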
Beyond session-level memory, a handful of agents implement mechanisms for the agent to learn reusable skills and persist knowledge across its entire lifetime. This transforms an agent from a stateless tool into a progressively improving collaborator. The approaches vary dramatically—from Letta Code’s structured skill lifecycle to Claude Code’s simple but effective Markdown files.
SKILL LIFECYCLE:
1. EXPERIENCE ──► Work through a complex task with user coaching
2. REFLECT    ──► /skill command triggers agent self-reflection
3. EXTRACT    ──► Agent identifies reusable patterns and steps
4. STORE      ──► Skill saved as .md file in .skills/ directory
5. LOAD       ──► Future sessions load skill via skill tool

SKILL FILE STRUCTURE (.skills/api-migration/SKILL.md):
───────────────────────────────────────────────────────
---
name: API Migration Pattern
description: Migrate REST APIs to GraphQL
triggers: ["migrate", "graphql", "api upgrade"]
---
# API Migration Skill
## Prerequisites
- Identify all REST endpoints
- Map to GraphQL schema types
## Steps
1. Create GraphQL schema from REST response types
2. Implement resolvers that call existing services
3. Add deprecation notices to REST endpoints
4. Write integration tests for GraphQL layer
5. Update client code to use GraphQL queries
## Gotchas
- N+1 query problem: use DataLoader
- Nested resolvers need explicit type definitions
- Pagination: prefer cursor-based over offset

SKILL MEMORY BLOCK (in <skills> section of system prompt):
───────────────────────────────────────────────────────
<skills>
Available skills:
- api-migration: Migrate REST to GraphQL (3 uses, last: Jan 28)
- testing-patterns: TDD workflow for React components (5 uses)
- prisma-relations: Complex Prisma relation patterns (2 uses)
- nextjs-middleware: Auth middleware for App Router (1 use)
</skills>
The skill system creates a compounding advantage: the agent becomes measurably more efficient on repeated task types. After learning the api-migration skill, Letta Code can execute the pattern in future sessions without step-by-step coaching, referencing its stored skill file for the exact procedure and known pitfalls.
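The loading side of the lifecycle can be sketched as trigger matching: compare a user request against the `triggers` frontmatter field shown above, then load the matching SKILL.md. The field names follow the example; the matching logic itself is an assumption, not Letta's documented algorithm.

```typescript
// Sketch of skill loading via trigger matching. Field names follow the
// SKILL.md frontmatter above; the matcher is illustrative.

interface Skill {
  name: string;
  triggers: string[];
}

// A skill matches if any of its trigger phrases appears in the request.
function matchSkills(request: string, skills: Skill[]): Skill[] {
  const lower = request.toLowerCase();
  return skills.filter((s) =>
    s.triggers.some((t) => lower.includes(t.toLowerCase()))
  );
}

const skills: Skill[] = [
  { name: "api-migration", triggers: ["migrate", "graphql", "api upgrade"] },
  { name: "testing-patterns", triggers: ["tdd", "test first"] },
];

const hits = matchSkills("Please migrate the orders API to GraphQL", skills);
```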
Claude Code uses a hierarchical Markdown file system for persistent project context. While simpler than Letta’s memory blocks, CLAUDE.md is the most widely adopted cross-session persistence mechanism due to its simplicity and git-friendliness.
CLAUDE.MD HIERARCHY:
───────────────────────────────────────────────────────
~/.claude/CLAUDE.md              ← Global (all projects)
~/projects/CLAUDE.md             ← Parent directory
~/projects/myapp/CLAUDE.md       ← Project root (shared, git-committed)
~/projects/myapp/CLAUDE.local.md ← Personal (gitignored)
~/projects/myapp/src/CLAUDE.md   ← Subdirectory-specific

INHERITANCE: Child inherits parent. All levels merged at session start.

/memory COMMAND:
User: /memory "Always use pnpm, never npm"
──► Appends to CLAUDE.md memory section
──► Persists across all future sessions
──► Can also be edited manually like any file

MEMORY BLOCKS (added via /memory):
───────────────────────────────────────────────────────
## Memory
- User prefers pnpm over npm
- Run tests before committing: pnpm test
- Use conventional commits: feat/fix/chore prefix
- Database migrations require review before applying
- Always check TypeScript strict mode after edits
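The inheritance rule reduces to an ordered merge: broader files first, more specific files after, so subdirectory guidance appends to (and can refine) global guidance. A minimal sketch, with illustrative paths:

```typescript
// Sketch of the CLAUDE.md inheritance merge: global -> parent dirs ->
// project root -> subdirectory, concatenated in that order so the most
// specific guidance appears last.

interface ContextFile {
  path: string;
  content: string;
}

function mergeClaudeMd(levels: ContextFile[]): string {
  return levels
    .map((l) => `# From ${l.path}\n${l.content}`)
    .join("\n\n");
}

const merged = mergeClaudeMd([
  { path: "~/.claude/CLAUDE.md", content: "- Use conventional commits" },
  { path: "~/projects/myapp/CLAUDE.md", content: "- Always use pnpm" },
]);
```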
Qwen Code (Alibaba, forked from Gemini CLI) adds a save_memory tool that allows the agent to explicitly persist information across sessions. Memories are stored per-project in the user’s home directory at ~/.qwen/settings.json.
QWEN CODE MEMORY SYSTEM:
───────────────────────────────────────────────────────
Tool: save_memory
Trigger: Agent detects information worth persisting
Storage: ~/.qwen/settings.json (per-project keys)
EXAMPLE FLOW:
Session 1:
User: "This project uses pnpm, not npm"
Agent: *calls save_memory*
save_memory({
key: "package_manager",
value: "pnpm",
project: "/home/user/my-project"
})
Session 2 (next day):
Agent: *reads ~/.qwen/settings.json on startup*
Agent: "I'll use pnpm since that's your project's package manager."
SETTINGS FILE STRUCTURE:
{
"memories": {
"/home/user/my-project": {
"package_manager": "pnpm",
"test_command": "pnpm test",
"preferred_style": "functional"
}
},
"global_preferences": {
"commit_style": "conventional",
"explanation_level": "concise"
}
}
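Reading the settings file back at session start amounts to a keyed lookup with project memories layered over global preferences. The structure below follows the settings file above; the lookup function itself is an assumption, not Qwen Code's actual implementation.

```typescript
// Sketch of loading Qwen-style memories at session start. The settings
// shape follows the file above; loadMemories is illustrative.

interface QwenSettings {
  memories: Record<string, Record<string, string>>;
  global_preferences: Record<string, string>;
}

// Project-scoped memories override global preferences on key collisions.
function loadMemories(
  settings: QwenSettings,
  projectPath: string
): Record<string, string> {
  return {
    ...settings.global_preferences,
    ...(settings.memories[projectPath] ?? {}),
  };
}

const settings: QwenSettings = {
  memories: {
    "/home/user/my-project": { package_manager: "pnpm" },
  },
  global_preferences: { commit_style: "conventional" },
};

const mem = loadMemories(settings, "/home/user/my-project");
```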
OpenCode uses an AGENTS.md file analogous to Claude Code’s CLAUDE.md—a project-specific configuration file that persists across sessions via the file system. It is loaded at session start and provides the agent with project conventions, build commands, and architectural context. Combined with OpenCode’s LSP integration, AGENTS.md provides static knowledge while LSP provides dynamic, real-time code intelligence.
# AGENTS.md (OpenCode project configuration)
Project: API Gateway Service
Language: Go 1.22
Build: make build
Test: go test ./...
Key packages: cmd/, internal/, pkg/
Conventions:
- Use context.Context for all handler functions
- Errors wrap with fmt.Errorf("operation: %w", err)
- Table-driven tests preferred
- Run golangci-lint before committing
Architecture:
- cmd/gateway/main.go is the entrypoint
- internal/handlers/ contains HTTP handlers
- internal/middleware/ contains auth and logging
- pkg/client/ contains external API clients
Memory is the least mature capability across all coding agents. Only Letta Code provides true persistent memory with self-modifying memory blocks and vector-based archival storage. Most agents start every session from scratch, relying on file-based workarounds (CLAUDE.md, AGENTS.md) or session replay. This means that today, even the most sophisticated agents cannot genuinely learn from weeks of collaboration—they can only reference static files that the user or agent has manually maintained. The gap between what is architecturally possible (Letta’s full memory stack) and what is commonly deployed (session-based with config files) represents the single largest opportunity for differentiation in the coding agent market.
Future coding agents will likely adopt a tiered memory approach, analogous to CPU cache hierarchies: a fast working tier (the live context window), a compacted short-term tier (session summaries), a persistent long-term tier (memory blocks carried across sessions), and an unbounded archival tier (vector-indexed storage searched on demand). Each tier trades capacity for access speed and relevance. No single agent implements all four tiers today, but the trajectory across the 13 agents analyzed points clearly toward convergence on this model.
The implication is clear: an agent that implements all four memory tiers—fast working memory, compacted short-term memory, persistent long-term blocks, and unlimited archival search—would have a substantial advantage over today’s session-based agents. It would never re-learn what it already knows, never retry approaches that already failed in previous sessions, and progressively build a deep, personalized understanding of the user and their codebase. The engineering challenge lies in managing the complexity of four memory tiers while maintaining reliability. Letta Code proves the concept is viable; the question is whether mainstream agents like Claude Code and Codex CLI will adopt similar architectures.
Memory architecture is the most consequential and least converged design decision in coding agent engineering. Across the 13 agents analyzed, the fundamental tension is between simplicity (session-based, no state to corrupt) and capability (persistent memory, compounding intelligence). Key takeaways:
- Qwen Code's save_memory tool and OpenCode's AGENTS.md represent pragmatic middle-ground approaches to cross-session persistence.

Memory Systems: Letta Documentation, Claude Code Documentation
Context Management: SWE-agent (observation collapsing), Aider Repository Map
Agent Repositories: Aider, Codex CLI, Qwen Code, OpenCode, OpenManus
Platforms: Warp, Replit, Factory.ai (Droid), Letta, Mistral (Vibe CLI)
Part 3 of 6 · Coding Agent Engineering Analysis · January 2026