Memory is the most consequential architectural decision in coding agent design. It determines whether an agent forgets everything between sessions or progressively learns your codebase, preferences, and patterns. This section analyzes the full spectrum of memory architectures across 13 coding agents, from ephemeral session-based approaches (Claude Code, Codex CLI, Aider) to fully persistent memory block systems (Letta Code), with hybrid strategies in between (Replit trajectory compression, Warp model-aligned summarization, Qwen Code cross-session save).
Context window management is equally critical: a 200k token window sounds generous until MCP server definitions consume 130k tokens, leaving only ~70k for actual work. Every agent faces the same fundamental constraint—finite context, infinite codebase—and their solutions reveal deep architectural trade-offs between continuity, efficiency, and fidelity.
Think of an agent’s memory like a desk. Session-based agents clear the desk completely every time you leave the room—next time you come back, you have to re-explain everything. Persistent-memory agents have a filing cabinet next to the desk: they write down important things (your preferences, project structure, skills learned) on index cards and file them away. When the desk gets too cluttered (context window fills up), some agents have a janitor who compacts the desk by summarizing old papers into sticky notes. The smartest agents use a repo map—a table of contents for your codebase—so they only pull the specific files they need onto the desk.
A fundamental architectural divide separates coding agents into two camps: those that treat each session as a blank slate, and those that maintain memory across sessions. This distinction has profound implications for user experience, architecture complexity, and the types of tasks an agent can handle over time. The majority of agents today remain session-based, relying on file-system workarounds for any cross-session continuity. Only Letta Code implements a true server-persistent memory model where the agent actively manages its own long-term state.
Memory approaches fall on a spectrum from fully ephemeral to fully persistent. Most agents cluster at the ephemeral end, relying on static configuration files (CLAUDE.md, AGENTS.md, replit.md) for cross-session continuity. Letta Code is the only agent that implements true persistent memory with agent-managed updates. Hybrid approaches—Replit’s trajectory compression, Qwen Code’s save_memory tool, Warp’s model-aligned summarization—represent a middle ground where the industry is likely converging.
Letta Code implements the most sophisticated memory architecture of any coding agent. Based on the MemGPT research paper, it treats memory as a first-class primitive: the agent has explicit tools to read and write its own memory blocks, which are embedded directly into the system prompt. This creates a self-modifying agent that learns and adapts across sessions without any user intervention.
SYSTEM PROMPT LAYOUT (agent-modifiable via memory() tool):
+============================================================+
| CORE MEMORY (BLOCKS) |
| |
| <persona> |
| I am a coding assistant that prefers functional |
| programming patterns. I always run tests before |
| committing. I use TypeScript strict mode. |
| I prefer small, focused commits with clear messages. |
| </persona> |
| |
| <human> |
| User prefers TypeScript over JavaScript. |
| Works on an e-commerce platform (Next.js 14). |
| Senior engineer, prefers concise explanations. |
| Uses pnpm as package manager. |
| Testing: Jest + React Testing Library. |
| </human> |
| |
| <project> |
| Framework: Next.js 14 (App Router) |
| Database: PostgreSQL via Prisma ORM |
| Auth: NextAuth.js v5 |
| Deployment: Vercel |
| Monorepo: Turborepo with apps/ and packages/ |
| </project> |
| |
| <skills> |
| Available skills: |
| - api-migration: Migrate REST to GraphQL |
| - testing-patterns: TDD workflow |
| - prisma-relations: Complex Prisma relation patterns |
| </skills> |
+============================================================+
| CONVERSATION MESSAGES (scrolling window) |
| * User msg -> Assistant response -> Tool calls |
+============================================================+
| ARCHIVAL MEMORY (overflow, vector-indexed) |
| * Historical summaries, large code snippets, docs |
+============================================================+
MEMORY() TOOL API:
memory(action: "edit"|"read"|"append",
block: "persona"|"human"|"project"|"skills",
updates: "string content to write/append")
Example: Agent autonomously learns a preference:
memory(action: "edit", block: "human",
updates: "Add: User prefers kebab-case file names")
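The block layout and memory() tool above can be sketched in a few lines. This is an illustrative model, not Letta's actual implementation: the block names mirror the diagram, while `compileSystemPrompt` and the in-memory store are assumptions made for the sketch.

```typescript
// Minimal sketch of Letta-style memory blocks embedded in a system prompt.
// Block names follow the diagram above; function names are illustrative.

type BlockName = "persona" | "human" | "project" | "skills";

const blocks: Record<BlockName, string> = {
  persona: "I am a coding assistant that prefers functional patterns.",
  human: "User prefers TypeScript over JavaScript.",
  project: "Framework: Next.js 14 (App Router)",
  skills: "Available skills: api-migration, testing-patterns",
};

// memory(action, block, updates) — read, append to, or replace a block.
function memory(
  action: "edit" | "read" | "append",
  block: BlockName,
  updates = ""
): string {
  if (action === "read") return blocks[block];
  if (action === "append") blocks[block] += "\n" + updates;
  else blocks[block] = updates; // "edit" replaces the block wholesale
  return blocks[block];
}

// The prompt is recompiled every turn, so edits take effect immediately.
function compileSystemPrompt(): string {
  return (Object.keys(blocks) as BlockName[])
    .map((name) => `<${name}>\n${blocks[name]}\n</${name}>`)
    .join("\n");
}

// The agent autonomously learns a preference:
memory("append", "human", "User prefers kebab-case file names.");
```

The key property this captures: because the blocks live inside the system prompt, an edit made in one turn changes the agent's behavior in the very next turn, with no user intervention.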
| Agent | Memory Type | Mechanism | Cross-Session? | Capacity |
|---|---|---|---|---|
| Claude Code | File-based | CLAUDE.md, Memory blocks via /memory | Partial (files persist) | Unlimited file storage |
| Codex CLI | Session-only | JSONL recorder (RolloutRecorder) | No (audit only) | Model-dependent |
| Letta Code | Server-persistent | Memory blocks + Archival DB + Recall | Yes (full) | Unlimited (server) |
| Aider | Session + repo map | tree-sitter AST map, on-demand /add | No | ~1024 tokens overview |
| Cline | Session + settings | VS Code workspace state, webview history | No | Model-dependent |
| Goose | Session-only | In-memory, MCP extension context | No | Model-dependent |
| OpenCode | File-based | AGENTS.md + session + LSP queries | Partial (files persist) | Model-dependent |
| Vibe CLI | File-based | history/ directory on disk | Partial (history) | 256k (Devstral) |
| Qwen Code | File-based | save_memory tool → ~/.qwen/settings.json | Partial (files persist) | Model-dependent |
| OpenManus | Agent-managed | Memory class within 4-level hierarchy | No | Model-dependent |
| Droid | Session-only | In-memory + HyperCode retrieval | No | Model-dependent |
| Warp | Session-only | In-memory + codebase embeddings | No | Model-dependent |
| Replit | Trajectory-based | Trajectory compression + checkpoints | Partial (compressed) | 200-min sessions |
Every coding agent eventually fills its context window. The strategies they use to manage this constraint—compaction, summarization, offloading, structural mapping—reveal fundamental architectural trade-offs. Some agents summarize aggressively and risk losing detail. Others offload to external storage and risk retrieval latency. A few avoid the problem entirely by providing the agent with tools to query code on demand rather than storing it in context.
Claude Code uses a 200k token context window and implements automatic compaction when usage approaches capacity. The compaction system is the most transparent among all agents analyzed, with a dedicated hook event (PreCompact) that allows external systems to intercept the process, and a /compact command for manual triggering.
// .claude/settings.json - PreCompact Hook configuration
{
"hooks": {
"PreCompact": [
{
"command": "node scripts/save-conversation-state.js",
"description": "Export conversation before compaction"
}
]
}
}
// scripts/save-conversation-state.js
// Receives conversation data via stdin
// Can: save to file, send to analytics, update external state
// Runs BEFORE compaction occurs, has access to full context
// Exit 0 = proceed with compaction
// Exit 2 = abort compaction (hook can block the operation)
Auto-compaction is lossy by design. When a complex debugging session is compacted, subtle details about failed approaches may be lost, causing the agent to retry strategies that already failed. The PreCompact hook mitigates this by allowing external systems to capture state before compaction. Teams building on Claude Code should implement PreCompact hooks that log key decisions and failed approaches to persistent storage, then inject this context via CLAUDE.md or session initialization. The /compact command also accepts a custom prompt, enabling targeted summaries: /compact focus on the authentication refactoring decisions.
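The "log key decisions and failed approaches" recommendation can be sketched as a small transcript filter of the kind a PreCompact hook script might run. The message shape and the textual markers are assumptions for illustration, not Claude Code's actual transcript schema.

```typescript
// Sketch of state a PreCompact hook might preserve before compaction
// discards it. Message shape and markers are assumed, not Claude Code's
// actual transcript format.

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Pull out lines worth persisting: decisions made and approaches that
// failed, so a future session does not retry them after compaction.
function extractDurableNotes(transcript: Message[]): string[] {
  const markers = [/decision:/i, /failed:/i, /does not work/i];
  return transcript
    .filter((m) => m.role !== "user")
    .map((m) => m.content)
    .filter((c) => markers.some((re) => re.test(c)));
}

// In a real hook the transcript arrives on stdin and the notes would be
// written to persistent storage (e.g. appended to CLAUDE.md);
// exit code 0 lets compaction proceed, exit code 2 aborts it.
const notes = extractDurableNotes([
  { role: "assistant", content: "Decision: use RS256 for token signing." },
  { role: "assistant", content: "Exponential backoff failed: rate limit persists." },
  { role: "user", content: "ok, try something else" },
]);
```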
Aider takes a fundamentally different approach to context management: instead of summarizing conversations, it builds an AST-aware map of the entire repository using tree-sitter parsers. This gives the agent a structural understanding of the codebase without loading full file contents, typically consuming only ~1024 tokens for a complete repository overview.
Example repo map output injected into the agent’s context:
src/auth/login.ts
export async function login(credentials: LoginCredentials): Promise<AuthToken>
export function validateToken(token: string): boolean
export function refreshToken(token: AuthToken): Promise<AuthToken>
src/auth/middleware.ts
import { validateToken } from './login'
export function authMiddleware(req: Request, res: Response, next: NextFunction)
src/api/users.ts
import { login } from '../auth/login'
export class UserService
async getUser(id: string): Promise<User>
async updateUser(id: string, data: Partial<User>): Promise<User>
async deleteUser(id: string): Promise<void>
src/models/types.ts
export interface User { id: string; email: string; name: string; }
export interface AuthToken { token: string; expiresAt: Date; }
export interface Order { id: string; userId: string; items: OrderItem[]; }
Aider’s repo map demonstrates that structural understanding—knowing what functions exist and how they relate—is often more valuable per token than content understanding—knowing the implementation of each function. By spending only ~1024 tokens on a repo overview, Aider preserves the vast majority of the context window for actual conversation, file edits, and tool output. This is especially effective for large codebases where loading even a fraction of files would exhaust the context window. The map is regenerated each turn to reflect file changes and conversation focus, with full file content loaded only when the user explicitly adds files via /add.
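The budget trade-off described above can be sketched directly: render only declarations, estimate token cost, and stop at the budget. Real Aider uses tree-sitter extraction and graph-based ranking to decide which symbols survive the cut; this simplified sketch keeps files in input order and uses a crude characters-per-token heuristic.

```typescript
// Sketch of an Aider-style repo map trimmed to a token budget.
// Real Aider ranks symbols by relevance; this keeps input order.

interface FileMap {
  path: string;
  signatures: string[]; // extracted declarations, not bodies
}

// Crude token estimate: ~4 characters per token.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

function renderRepoMap(files: FileMap[], budgetTokens = 1024): string {
  const lines: string[] = [];
  let used = 0;
  for (const f of files) {
    const chunk = [f.path, ...f.signatures.map((s) => "  " + s)].join("\n");
    const cost = estimateTokens(chunk);
    if (used + cost > budgetTokens) break; // stay under the budget
    lines.push(chunk);
    used += cost;
  }
  return lines.join("\n");
}

const map = renderRepoMap([
  {
    path: "src/auth/login.ts",
    signatures: [
      "export async function login(c: LoginCredentials): Promise<AuthToken>",
    ],
  },
  { path: "src/api/users.ts", signatures: ["export class UserService"] },
]);
```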
Replit Agent handles long-running autonomous sessions (up to 200 minutes) that would quickly exhaust any context window. Its solution is trajectory compression: using an LLM to condense long action histories into compact summaries, preserving key decision points while discarding redundant intermediate states.
Complementing trajectory compression, Replit’s checkpoints capture a full snapshot of the workspace state at key moments, enabling the user to roll back to any previous state. Each checkpoint includes the workspace files, compressed conversation, database state, environment variables, and running process state.
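The compression step can be sketched as follows: once the action history grows past a threshold, older steps collapse into a single summary entry while recent steps stay verbatim. Replit uses an LLM as the summarizer; a placeholder function stands in here, and the threshold and step shape are assumptions.

```typescript
// Sketch of trajectory compression: old steps collapse into a summary,
// recent steps are kept verbatim. The summarizer is a placeholder for
// the LLM call Replit uses.

interface Step {
  action: string;
  result: string;
}

function compressTrajectory(
  steps: Step[],
  keepRecent = 5,
  summarize = (old: Step[]) =>
    `Summary of ${old.length} earlier steps: ` +
    old.map((s) => s.action).join(", ")
): Step[] {
  if (steps.length <= keepRecent) return steps; // nothing to compress yet
  const old = steps.slice(0, steps.length - keepRecent);
  const recent = steps.slice(-keepRecent);
  return [{ action: "summary", result: summarize(old) }, ...recent];
}

const history: Step[] = Array.from({ length: 10 }, (_, i) => ({
  action: `step-${i}`,
  result: "ok",
}));
const compressed = compressTrajectory(history, 3);
```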
Letta Code’s archival memory acts as an unbounded overflow layer backed by a vector database. When information exceeds what the core memory blocks can hold, the agent can explicitly store it in archival memory using the archival_memory_insert tool, and later retrieve it using archival_memory_search with semantic similarity queries.
LETTA ARCHIVAL MEMORY SYSTEM:
┌───────────────────────────────────────────────────────┐
│ archival_memory_insert(content: string) │
│ ─► Embeds content into vector DB │
│ ─► Content survives beyond context window limits │
│ ─► Unlimited storage capacity │
│ │
│ archival_memory_search(query: string, n: int) │
│ ─► Semantic similarity search over all stored data │
│ ─► Returns top-n most relevant entries │
│ ─► Agent decides when to search and what to query │
│ │
│ USE CASES: │
│ - Store large code review findings │
│ - Save historical conversation summaries │
│ - Cache project documentation excerpts │
│ - Record detailed debugging session outcomes │
│ │
│ EXAMPLE: │
│ archival_memory_insert( │
│ "Auth migration: Changed from session-based to │
│ JWT. Files modified: auth.ts, middleware.ts, │
│ config.ts. Key decision: chose RS256 over HS256 │
│ for token signing due to microservice arch." │
│ ) │
│ │
│ archival_memory_search("JWT token signing", n=3) │
│ ─► Returns the auth migration entry above │
└───────────────────────────────────────────────────────┘
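The insert/search loop in the diagram can be sketched with a toy scoring function. Letta's archival memory embeds content into a vector database and ranks by semantic similarity; word-overlap scoring stands in for embeddings here, and the store is a plain in-memory array.

```typescript
// Toy sketch of archival memory. Real Letta uses learned embeddings in a
// vector DB; word-overlap scoring stands in for semantic similarity.

const store: string[] = [];

function archivalMemoryInsert(content: string): void {
  store.push(content);
}

// Score = number of query words that appear in the entry; return top n.
function archivalMemorySearch(query: string, n: number): string[] {
  const words = query.toLowerCase().split(/\s+/);
  return store
    .map((entry) => ({
      entry,
      score: words.filter((w) => entry.toLowerCase().includes(w)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n)
    .map((r) => r.entry);
}

archivalMemoryInsert(
  "Auth migration: chose RS256 over HS256 for token signing."
);
archivalMemoryInsert("Debugging notes: flaky test in users.spec.ts.");

const results = archivalMemorySearch("token signing", 1);
```

The important design point survives even in the toy version: the agent decides when to insert and when to search, so storage capacity is decoupled from the context window entirely.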
The following table provides a comprehensive comparison of how each agent manages its finite context window. Strategies range from no management at all (relying on the model’s native window) to sophisticated multi-tier memory systems with automatic offloading.
| Agent | Context Size | Compaction Strategy | Warning Mechanism |
|---|---|---|---|
| Claude Code | 200k tokens | Auto-compact at ~80%, preserve recent + todos; PreCompact hook for external state save | /context command, status line |
| Codex CLI | Model-dependent | Manual context management; RolloutRecorder for audit replay | Token count display |
| Letta Code | Model-dependent | Archival memory offload to vector DB; memory blocks always loaded in system prompt | Server-side monitoring |
| Vibe CLI | 256k (Devstral) | Architecture-level context prioritization; large native window reduces compaction need | N/A |
| Aider | Model-dependent | Repo map (~1024 tokens via tree-sitter); add files on demand via /add | Token usage display |
| Qwen Code | Model-dependent | Auto-compaction (forked from Gemini CLI); save_memory for cross-session persistence | Token usage display |
| OpenCode | Model-dependent | AGENTS.md context + LSP real-time queries reduce need for full file loads | Token usage display |
| Cline | Model-dependent | Per-action shadow git checkpoints; VS Code extension state persistence | VS Code status bar |
| Goose | Model-dependent | MCP extension-based context; disabledMcpServers to reclaim space | N/A |
| OpenManus | Model-dependent | Planning-based context management; max_messages bounded buffer | N/A |
| Droid | Proprietary | HyperCode multi-resolution retrieval reduces context needs; ByteRank relevance scoring | Managed |
| Warp | Model-dependent | Model-aligned summarization; codebase embeddings for navigation; TODO state protection | Managed |
| Replit | Model-dependent | Trajectory compression via LLM; checkpoints for state snapshots | Session timer (200 min) |
Each MCP server adds tool definitions to the context window. Goose, with its 3,000+ extension ecosystem, is particularly vulnerable: enabling too many MCP servers can shrink usable context from 200k to ~70k tokens. The rule of thumb: every MCP server costs 500–2,000 tokens of context just for its tool definitions. Use disabledMcpServers to selectively disable unused servers per project. Claude Code’s /context command helps monitor this overhead in real time.
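The arithmetic above is worth making explicit. A small budget function shows how quickly per-server overhead compounds; the 500–2,000-token-per-server range comes from the text, and the server count below is illustrative.

```typescript
// Back-of-the-envelope check: usable context after MCP tool definitions
// are loaded. Per-server costs follow the 500–2,000 token rule of thumb.

function usableContext(
  windowTokens: number,
  serverToolDefTokens: number[]
): number {
  const overhead = serverToolDefTokens.reduce((a, b) => a + b, 0);
  return Math.max(0, windowTokens - overhead);
}

// 65 enabled servers at ~2,000 tokens each ≈ 130k of a 200k window gone,
// matching the worst case described above.
const remaining = usableContext(200_000, Array(65).fill(2_000));
```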
Beyond session-level memory, a handful of agents implement mechanisms for the agent to learn reusable skills and persist knowledge across its entire lifetime. This transforms an agent from a stateless tool into a progressively improving collaborator. The approaches vary dramatically—from Letta Code’s structured skill lifecycle to Claude Code’s simple but effective Markdown files.
SKILL LIFECYCLE:
1. EXPERIENCE ──► Work through a complex task with user coaching
2. REFLECT    ──► /skill command triggers agent self-reflection
3. EXTRACT    ──► Agent identifies reusable patterns and steps
4. STORE      ──► Skill saved as .md file in .skills/ directory
5. LOAD       ──► Future sessions load skill via skill tool

SKILL FILE STRUCTURE (.skills/api-migration/SKILL.md):
───────────────────────────────────────────────────────
---
name: API Migration Pattern
description: Migrate REST APIs to GraphQL
triggers: ["migrate", "graphql", "api upgrade"]
---
# API Migration Skill
## Prerequisites
- Identify all REST endpoints
- Map to GraphQL schema types
## Steps
1. Create GraphQL schema from REST response types
2. Implement resolvers that call existing services
3. Add deprecation notices to REST endpoints
4. Write integration tests for GraphQL layer
5. Update client code to use GraphQL queries
## Gotchas
- N+1 query problem: use DataLoader
- Nested resolvers need explicit type definitions
- Pagination: prefer cursor-based over offset

SKILL MEMORY BLOCK (in <skills> section of system prompt):
───────────────────────────────────────────────────────
<skills>
Available skills:
- api-migration: Migrate REST to GraphQL (3 uses, last: Jan 28)
- testing-patterns: TDD workflow for React components (5 uses)
- prisma-relations: Complex Prisma relation patterns (2 uses)
- nextjs-middleware: Auth middleware for App Router (1 use)
</skills>
The skill system creates a compounding advantage: the agent becomes measurably more efficient on repeated task types. After learning the api-migration skill, Letta Code can execute the pattern in future sessions without step-by-step coaching, referencing its stored skill file for the exact procedure and known pitfalls.
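The loading side of the lifecycle can be sketched as trigger matching: compare a user request against the `triggers` frontmatter field shown above, then load the matching SKILL.md. The field names follow the example; the matching logic itself is an assumption, not Letta's documented algorithm.

```typescript
// Sketch of skill loading via trigger matching. Field names follow the
// SKILL.md frontmatter above; the matcher is illustrative.

interface Skill {
  name: string;
  triggers: string[];
}

// A skill matches if any of its trigger phrases appears in the request.
function matchSkills(request: string, skills: Skill[]): Skill[] {
  const lower = request.toLowerCase();
  return skills.filter((s) =>
    s.triggers.some((t) => lower.includes(t.toLowerCase()))
  );
}

const skills: Skill[] = [
  { name: "api-migration", triggers: ["migrate", "graphql", "api upgrade"] },
  { name: "testing-patterns", triggers: ["tdd", "test first"] },
];

const hits = matchSkills("Please migrate the orders API to GraphQL", skills);
```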
Claude Code uses a hierarchical Markdown file system for persistent project context. While simpler than Letta’s memory blocks, CLAUDE.md is the most widely adopted cross-session persistence mechanism due to its simplicity and git-friendliness.
CLAUDE.MD HIERARCHY:
───────────────────────────────────────────────────────
~/.claude/CLAUDE.md              ← Global (all projects)
~/projects/CLAUDE.md             ← Parent directory
~/projects/myapp/CLAUDE.md       ← Project root (shared, git-committed)
~/projects/myapp/CLAUDE.local.md ← Personal (gitignored)
~/projects/myapp/src/CLAUDE.md   ← Subdirectory-specific

INHERITANCE: Child inherits parent. All levels merged at session start.

/memory COMMAND:
User: /memory "Always use pnpm, never npm"
──► Appends to CLAUDE.md memory section
──► Persists across all future sessions
──► Can also be edited manually like any file

MEMORY BLOCKS (added via /memory):
───────────────────────────────────────────────────────
## Memory
- User prefers pnpm over npm
- Run tests before committing: pnpm test
- Use conventional commits: feat/fix/chore prefix
- Database migrations require review before applying
- Always check TypeScript strict mode after edits
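The inheritance rule reduces to an ordered merge: broader files first, more specific files after, so subdirectory guidance appends to (and can refine) global guidance. A minimal sketch, with illustrative paths:

```typescript
// Sketch of the CLAUDE.md inheritance merge: global -> parent dirs ->
// project root -> subdirectory, concatenated in that order so the most
// specific guidance appears last.

interface ContextFile {
  path: string;
  content: string;
}

function mergeClaudeMd(levels: ContextFile[]): string {
  return levels
    .map((l) => `# From ${l.path}\n${l.content}`)
    .join("\n\n");
}

const merged = mergeClaudeMd([
  { path: "~/.claude/CLAUDE.md", content: "- Use conventional commits" },
  { path: "~/projects/myapp/CLAUDE.md", content: "- Always use pnpm" },
]);
```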
Qwen Code (Alibaba, forked from Gemini CLI) adds a save_memory tool that allows the agent to explicitly persist information across sessions. Memories are stored per-project in the user’s home directory at ~/.qwen/settings.json.
QWEN CODE MEMORY SYSTEM:
───────────────────────────────────────────────────────
Tool: save_memory
Trigger: Agent detects information worth persisting
Storage: ~/.qwen/settings.json (per-project keys)
EXAMPLE FLOW:
Session 1:
User: "This project uses pnpm, not npm"
Agent: *calls save_memory*
save_memory({
key: "package_manager",
value: "pnpm",
project: "/home/user/my-project"
})
Session 2 (next day):
Agent: *reads ~/.qwen/settings.json on startup*
Agent: "I'll use pnpm since that's your project's package manager."
SETTINGS FILE STRUCTURE:
{
"memories": {
"/home/user/my-project": {
"package_manager": "pnpm",
"test_command": "pnpm test",
"preferred_style": "functional"
}
},
"global_preferences": {
"commit_style": "conventional",
"explanation_level": "concise"
}
}
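Reading the settings file back at session start amounts to a keyed lookup with project memories layered over global preferences. The structure below follows the settings file above; the lookup function itself is an assumption, not Qwen Code's actual implementation.

```typescript
// Sketch of loading Qwen-style memories at session start. The settings
// shape follows the file above; loadMemories is illustrative.

interface QwenSettings {
  memories: Record<string, Record<string, string>>;
  global_preferences: Record<string, string>;
}

// Project-scoped memories override global preferences on key collisions.
function loadMemories(
  settings: QwenSettings,
  projectPath: string
): Record<string, string> {
  return {
    ...settings.global_preferences,
    ...(settings.memories[projectPath] ?? {}),
  };
}

const settings: QwenSettings = {
  memories: {
    "/home/user/my-project": { package_manager: "pnpm" },
  },
  global_preferences: { commit_style: "conventional" },
};

const mem = loadMemories(settings, "/home/user/my-project");
```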
OpenCode uses an AGENTS.md file analogous to Claude Code’s CLAUDE.md—a project-specific configuration file that persists across sessions via the file system. It is loaded at session start and provides the agent with project conventions, build commands, and architectural context. Combined with OpenCode’s LSP integration, AGENTS.md provides static knowledge while LSP provides dynamic, real-time code intelligence.
# AGENTS.md (OpenCode project configuration)
Project: API Gateway Service
Language: Go 1.22
Build: make build
Test: go test ./...
Key packages: cmd/, internal/, pkg/
Conventions:
- Use context.Context for all handler functions
- Errors wrap with fmt.Errorf("operation: %w", err)
- Table-driven tests preferred
- Run golangci-lint before committing
Architecture:
- cmd/gateway/main.go is the entrypoint
- internal/handlers/ contains HTTP handlers
- internal/middleware/ contains auth and logging
- pkg/client/ contains external API clients
Memory is the least mature capability across all coding agents. Only Letta Code provides true persistent memory with self-modifying memory blocks and vector-based archival storage. Most agents start every session from scratch, relying on file-based workarounds (CLAUDE.md, AGENTS.md) or session replay. This means that today, even the most sophisticated agents cannot genuinely learn from weeks of collaboration—they can only reference static files that the user or agent has manually maintained. The gap between what is architecturally possible (Letta’s full memory stack) and what is commonly deployed (session-based with config files) represents the single largest opportunity for differentiation in the coding agent market.
Future coding agents will likely adopt a tiered memory approach, analogous to CPU cache hierarchies: a fast working tier (the live context window), a compacted short-term tier (session summaries), a persistent long-term tier (memory blocks carried across sessions), and an unbounded archival tier (vector-indexed storage searched on demand). Each tier trades capacity for access speed and relevance. No single agent implements all four tiers today, but the trajectory across the 13 agents analyzed points clearly toward convergence on this model.
The implication is clear: an agent that implements all four memory tiers—fast working memory, compacted short-term memory, persistent long-term blocks, and unlimited archival search—would have a substantial advantage over today’s session-based agents. It would never re-learn what it already knows, never retry approaches that already failed in previous sessions, and progressively build a deep, personalized understanding of the user and their codebase. The engineering challenge lies in managing the complexity of four memory tiers while maintaining reliability. Letta Code proves the concept is viable; the question is whether mainstream agents like Claude Code and Codex CLI will adopt similar architectures.
Memory architecture is the most consequential and least converged design decision in coding agent engineering. Across the 13 agents analyzed, the fundamental tension is between simplicity (session-based, no state to corrupt) and capability (persistent memory, compounding intelligence). Key takeaways:
- Qwen Code's save_memory tool and OpenCode's AGENTS.md represent pragmatic middle-ground approaches to cross-session persistence.

Memory Systems: Letta Documentation, Claude Code Documentation
Context Management: SWE-agent (observation collapsing), Aider Repository Map
Agent Repositories: Aider, Codex CLI, Qwen Code, OpenCode, OpenManus
Platforms: Warp, Replit, Factory.ai (Droid), Letta, Mistral (Vibe CLI)
Part 3 of 6 · Coding Agent Engineering Analysis · January 2026