Part 3: Memory & Context Management

Coding Agent Engineering Analysis · Part 3 of 6
Enhanced Edition · January 2026

Overview

Memory is the most consequential architectural decision in coding agent design. It determines whether an agent forgets everything between sessions or progressively learns your codebase, preferences, and patterns. This section analyzes the full spectrum of memory architectures across 13 coding agents, from ephemeral session-based approaches (Claude Code, Codex CLI, Aider) to fully persistent memory block systems (Letta Code), with hybrid strategies in between (Replit trajectory compression, Warp model-aligned summarization, Qwen Code cross-session save).

Context window management is equally critical: a 200k token window sounds generous until MCP server definitions consume 130k tokens, leaving only ~70k for actual work. Every agent faces the same fundamental constraint—finite context, infinite codebase—and their solutions reveal deep architectural trade-offs between continuity, efficiency, and fidelity.

ELI5: Agent Memory

Think of an agent’s memory like a desk. Session-based agents clear the desk completely every time you leave the room—next time you come back, you have to re-explain everything. Persistent-memory agents have a filing cabinet next to the desk: they write down important things (your preferences, project structure, skills learned) on index cards and file them away. When the desk gets too cluttered (context window fills up), some agents have a janitor who compacts the desk by summarizing old papers into sticky notes. The smartest agents use a repo map—a table of contents for your codebase—so they only pull the specific files they need onto the desk.

1. Session-Based vs Persistent Memory

A fundamental architectural divide separates coding agents into two camps: those that treat each session as a blank slate, and those that maintain memory across sessions. This distinction has profound implications for user experience, architecture complexity, and the types of tasks an agent can handle over time. The majority of agents today remain session-based, relying on file-system workarounds for any cross-session continuity. Only Letta Code implements a true server-persistent memory model where the agent actively manages its own long-term state.

SESSION-BASED ARCHITECTURE
(Claude Code, Codex CLI, Aider, Cline, OpenCode, Vibe CLI)
═══════════════════════════════════════════════════════════════════

   Session 1              Session 2              Session 3
┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐
│  Context Window  │   │  Context Window  │   │  Context Window  │
│  ┌────────────┐  │   │  ┌────────────┐  │   │  ┌────────────┐  │
│  │ System     │  │   │  │ System     │  │   │  │ System     │  │
│  │ Prompt     │  │   │  │ Prompt     │  │   │  │ Prompt     │  │
│  ├────────────┤  │   │  ├────────────┤  │   │  ├────────────┤  │
│  │ CLAUDE.md  │  │   │  │ CLAUDE.md  │  │   │  │ CLAUDE.md  │  │
│  │ (static)   │  │   │  │ (static)   │  │   │  │ (static)   │  │
│  ├────────────┤  │   │  ├────────────┤  │   │  ├────────────┤  │
│  │ Messages   │  │   │  │ Messages   │  │   │  │ Messages   │  │
│  │ + Files    │  │   │  │ + Files    │  │   │  │ + Files    │  │
│  │ + Tools    │  │   │  │ + Tools    │  │   │  │ + Tools    │  │
│  └────────────┘  │   │  └────────────┘  │   │  └────────────┘  │
└────────┬─────────┘   └────────┬─────────┘   └────────┬─────────┘
         │                      │                      │
         ▼                      ▼                      ▼
    [DISCARDED]            [DISCARDED]            [DISCARDED]

Context is LOST between sessions. Only static files (CLAUDE.md,
AGENTS.md, settings) persist on disk. Conversation history, learned
preferences, and task context are gone.
PERSISTENT-MEMORY ARCHITECTURE (Letta Code)
═══════════════════════════════════════════════════════════════════

┌─────────────────────────────────────────────┐
│              LETTA API SERVER               │
│      (cloud.letta.com or self-hosted)       │
│                                             │
│  Memory Blocks (persisted, mutable):        │
│  ┌──────────────────┐ ┌──────────────────┐  │
│  │ persona          │ │ human            │  │
│  │ (agent identity) │ │ (user prefs)     │  │
│  └──────────────────┘ └──────────────────┘  │
│  ┌──────────────────┐ ┌──────────────────┐  │
│  │ project          │ │ skills           │  │
│  │ (codebase        │ │ (learned         │  │
│  │  context)        │ │  patterns)       │  │
│  └──────────────────┘ └──────────────────┘  │
│                                             │
│  Archival Memory (vector DB):               │
│    unlimited long-term knowledge store      │
│                                             │
│  Recall Memory (conversation log):          │
│    searchable past messages                 │
└──────────────────┬──────────────────────────┘
                   │
   Session 1 ──────┤
   Session 2 ──────┤
   Session 3 ──────┘

All sessions share the SAME persistent state. The agent actively
updates its own memory blocks, so skills compound over time across
every session.

Key Finding: The Memory Spectrum

Memory approaches fall on a spectrum from fully ephemeral to fully persistent. Most agents cluster at the ephemeral end, relying on static configuration files (CLAUDE.md, AGENTS.md, replit.md) for cross-session continuity. Letta Code is the only agent that implements true persistent memory with agent-managed updates. Hybrid approaches—Replit’s trajectory compression, Qwen Code’s save_memory tool, Warp’s model-aligned summarization—represent a middle ground where the industry is likely converging.

1.1 Letta Memory Block Structure

Letta Code implements the most sophisticated memory architecture of any coding agent. Based on the MemGPT research paper, it treats memory as a first-class primitive: the agent has explicit tools to read and write its own memory blocks, which are embedded directly into the system prompt. This creates a self-modifying agent that learns and adapts across sessions without any user intervention.

SYSTEM PROMPT LAYOUT (agent-modifiable via memory() tool):
+============================================================+
| CORE MEMORY (BLOCKS)                                       |
|                                                            |
| <persona>                                                  |
|   I am a coding assistant that prefers functional          |
|   programming patterns. I always run tests before          |
|   committing. I use TypeScript strict mode.                |
|   I prefer small, focused commits with clear messages.     |
| </persona>                                                 |
|                                                            |
| <human>                                                    |
|   User prefers TypeScript over JavaScript.                 |
|   Works on an e-commerce platform (Next.js 14).            |
|   Senior engineer, prefers concise explanations.           |
|   Uses pnpm as package manager.                            |
|   Testing: Jest + React Testing Library.                   |
| </human>                                                   |
|                                                            |
| <project>                                                  |
|   Framework: Next.js 14 (App Router)                       |
|   Database: PostgreSQL via Prisma ORM                      |
|   Auth: NextAuth.js v5                                     |
|   Deployment: Vercel                                       |
|   Monorepo: Turborepo with apps/ and packages/             |
| </project>                                                 |
|                                                            |
| <skills>                                                   |
|   Available skills:                                        |
|   - api-migration: Migrate REST to GraphQL                 |
|   - testing-patterns: TDD workflow                         |
|   - prisma-relations: Complex Prisma relation patterns     |
| </skills>                                                  |
+============================================================+
| CONVERSATION MESSAGES (scrolling window)                   |
| * User msg -> Assistant response -> Tool calls             |
+============================================================+
| ARCHIVAL MEMORY (overflow, vector-indexed)                 |
| * Historical summaries, large code snippets, docs          |
+============================================================+

MEMORY() TOOL API:
  memory(action: "edit"|"read"|"append",
         block:  "persona"|"human"|"project"|"skills",
         updates: "string content to write/append")

Example: Agent autonomously learns a preference:
  memory(action: "edit", block: "human",
         updates: "Add: User prefers kebab-case file names")
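The block-plus-dispatch model above can be sketched in a few lines of Python. Everything below is an illustrative model, not Letta's actual implementation: the MemoryStore class, its storage, and the system-prompt rendering are assumptions; only the block names and the read/edit/append actions come from the API described above.

```python
# Hypothetical sketch of a memory() tool over named blocks.
# Block names mirror Letta's persona/human/project/skills layout;
# the storage and dispatch logic here are illustrative only.

class MemoryStore:
    BLOCKS = ("persona", "human", "project", "skills")

    def __init__(self):
        self.blocks = {name: "" for name in self.BLOCKS}

    def memory(self, action: str, block: str, updates: str = "") -> str:
        if block not in self.blocks:
            raise ValueError(f"unknown block: {block}")
        if action == "read":
            return self.blocks[block]
        if action == "edit":          # replace the block wholesale
            self.blocks[block] = updates
        elif action == "append":      # add a line to the block
            self.blocks[block] += ("\n" if self.blocks[block] else "") + updates
        else:
            raise ValueError(f"unknown action: {action}")
        return self.blocks[block]

    def render_system_prompt(self) -> str:
        # Blocks are embedded directly into the system prompt each turn.
        return "\n".join(
            f"<{name}>\n{content}\n</{name}>"
            for name, content in self.blocks.items()
        )

store = MemoryStore()
store.memory("append", "human", "User prefers TypeScript over JavaScript.")
store.memory("append", "human", "Uses pnpm as package manager.")
prompt = store.render_system_prompt()
```

Because the blocks are re-rendered into the prompt every turn, an edit made in one session is visible to all later sessions with no user intervention.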

1.2 Memory Approaches by Agent

Agent       | Memory Type        | Mechanism                                 | Cross-Session?          | Capacity
------------|--------------------|-------------------------------------------|-------------------------|-----------------------
Claude Code | File-based         | CLAUDE.md, memory blocks via /memory      | Partial (files persist) | Unlimited file storage
Codex CLI   | Session-only       | JSONL recorder (RolloutRecorder)          | No (audit only)         | Model-dependent
Letta Code  | Server-persistent  | Memory blocks + archival DB + recall      | Yes (full)              | Unlimited (server)
Aider       | Session + repo map | tree-sitter AST map, on-demand /add       | No                      | ~1024-token overview
Cline       | Session + settings | VS Code workspace state, webview history  | No                      | Model-dependent
Goose       | Session-only       | In-memory, MCP extension context          | No                      | Model-dependent
OpenCode    | File-based         | AGENTS.md + session + LSP queries         | Partial (files persist) | Model-dependent
Vibe CLI    | File-based         | history/ directory on disk                | Partial (history)       | 256k (Devstral)
Qwen Code   | File-based         | save_memory tool → ~/.qwen/settings.json  | Partial (files persist) | Model-dependent
OpenManus   | Agent-managed      | Memory class within 4-level hierarchy     | No                      | Model-dependent
Droid       | Session-only       | In-memory + HyperCode retrieval           | No                      | Model-dependent
Warp        | Session-only       | In-memory + codebase embeddings           | No                      | Model-dependent
Replit      | Trajectory-based   | Trajectory compression + checkpoints      | Partial (compressed)    | 200-min sessions

2. Context Compaction Strategies

Every coding agent eventually fills its context window. The strategies they use to manage this constraint—compaction, summarization, offloading, structural mapping—reveal fundamental architectural trade-offs. Some agents summarize aggressively and risk losing detail. Others offload to external storage and risk retrieval latency. A few avoid the problem entirely by providing the agent with tools to query code on demand rather than storing it in context.

2.1 Claude Code Auto-Compaction

Claude Code uses a 200k token context window and implements automatic compaction when usage approaches capacity. The compaction system is the most transparent among all agents analyzed, with a dedicated hook event (PreCompact) that allows external systems to intercept the process, and a /compact command for manual triggering.

CLAUDE CODE AUTO-COMPACTION ALGORITHM
═══════════════════════════════════════════════════════════════════

  Context Window (200k): ████████████████░░░░░░░░  ~80% full
              │
              ▼
  1. Monitor context usage
     Threshold: ~80% of max
              │  threshold exceeded
              ▼
  2. Trigger PreCompact hook ◄── External systems can:
     (allows saving state)        - save state to disk
                                  - export the conversation
                                  - log metrics
              │
              ▼
  3. Generate summary
     COMPACTION PROMPT: "Summarize this conversation, preserving:
       - key decisions made
       - current task state
       - unresolved issues
       - user preferences learned"
              │
     ┌────────┴──────────────────────┐
     ▼                               ▼
  PRESERVE                        DISCARD
  - recent messages (5-10)        - old tool outputs
  - active file contents          - resolved threads
  - todo list state               - redundant reads
  - error context                 - stale search results
                                  - superseded edits
                                  - exploration that led nowhere
     └────────┬──────────────────────┘
              ▼
  Compacted context: ██████░░░░░░░░░░░░░░░░░░  ~30-40% full
  (summary + recent messages + active file state)
// .claude/settings.json - PreCompact Hook configuration
{
  "hooks": {
    "PreCompact": [
      {
        "command": "node scripts/save-conversation-state.js",
        "description": "Export conversation before compaction"
      }
    ]
  }
}

// scripts/save-conversation-state.js
// Receives conversation data via stdin; runs BEFORE compaction
// occurs, with access to the full context. Exit 0 = proceed with
// compaction; exit 2 = abort compaction (the hook can block it).
const fs = require('fs');
let input = '';
process.stdin.on('data', (chunk) => (input += chunk));
process.stdin.on('end', () => {
  // Save to file; could also send to analytics or external state.
  fs.writeFileSync(`compaction-${Date.now()}.json`, input);
  process.exit(0); // proceed with compaction
});

Compaction Pitfall: Information Loss

Auto-compaction is lossy by design. When a complex debugging session is compacted, subtle details about failed approaches may be lost, causing the agent to retry strategies that already failed. The PreCompact hook mitigates this by allowing external systems to capture state before compaction. Teams building on Claude Code should implement PreCompact hooks that log key decisions and failed approaches to persistent storage, then inject this context via CLAUDE.md or session initialization. The /compact command also accepts a custom prompt, enabling targeted summaries: /compact focus on the authentication refactoring decisions.
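The threshold-and-preserve logic described above can be sketched as follows. Only the ~80% trigger and the keep-recent-messages behavior come from the documented algorithm; the message schema, token accounting, and the summarize() stand-in are illustrative assumptions.

```python
# Illustrative sketch of threshold-triggered compaction.
# The 80% threshold matches the documented behavior; the message
# model and the summarize() stand-in are assumptions.

CONTEXT_LIMIT = 200_000
COMPACT_THRESHOLD = 0.80
KEEP_RECENT = 10  # recent messages survive compaction verbatim

def summarize(messages):
    # Stand-in for the LLM summarization call, which would receive
    # the compaction prompt shown in the diagram above.
    return {"role": "system",
            "content": f"[summary of {len(messages)} messages]",
            "tokens": 500}

def maybe_compact(messages):
    used = sum(m["tokens"] for m in messages)
    if used < CONTEXT_LIMIT * COMPACT_THRESHOLD:
        return messages  # under threshold: nothing to do
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    return [summarize(old)] + recent

# 100 messages of 2,000 tokens each = 200k tokens -> over threshold
history = [{"role": "user", "content": f"msg {i}", "tokens": 2_000}
           for i in range(100)]
compacted = maybe_compact(history)
```

The lossiness is visible in the sketch: everything in `old` survives only through whatever the summary happens to preserve, which is exactly why a PreCompact hook that saves `old` verbatim is valuable.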

2.2 Aider Repository Map (tree-sitter)

Aider takes a fundamentally different approach to context management: instead of summarizing conversations, it builds an AST-aware map of the entire repository using tree-sitter parsers. This gives the agent a structural understanding of the codebase without loading full file contents, typically consuming only ~1024 tokens for a complete repository overview.

AIDER REPOSITORY MAP PIPELINE
═══════════════════════════════════════════════════════════════════

┌──────────────┐   ┌──────────────┐   ┌──────────────────────┐
│ Source Files │──▶│ tree-sitter  │──▶│ AST Extraction       │
│ (all in repo)│   │ parsers      │   │ - function sigs      │
│ *.ts *.py    │   │ (language-   │   │ - class definitions  │
│ *.js *.go    │   │  specific    │   │ - import statements  │
│ *.rs *.java  │   │  grammars)   │   │ - type definitions   │
└──────────────┘   └──────────────┘   └──────────┬───────────┘
                                                 ▼
                                      ┌──────────────────────┐
                                      │ Relationship Graph   │
                                      │ login() ──calls──▶   │
                                      │   validateToken()    │
                                      │ UserService          │
                                      │   ──imports──▶       │
                                      │   login module       │
                                      │ module dependencies, │
                                      │ type hierarchies     │
                                      └──────────┬───────────┘
                                                 ▼
                                      ┌──────────────────────┐
                                      │ Context Prioritizer  │
                                      │ 1. files mentioned   │
                                      │    in conversation   │
                                      │ 2. files related to  │
                                      │    active edits      │
                                      │ 3. import chains     │
                                      │ 4. remaining sigs    │
                                      └──────────┬───────────┘
                                                 ▼
                                      ┌──────────────────────┐
                                      │ REPO MAP OUTPUT      │
                                      │ (~1024 tokens)       │
                                      │ signatures only, not │
                                      │ implementations      │
                                      └──────────────────────┘

Example repo map output injected into the agent’s context:

src/auth/login.ts
  export async function login(credentials: LoginCredentials): Promise<AuthToken>
  export function validateToken(token: string): boolean
  export function refreshToken(token: AuthToken): Promise<AuthToken>

src/auth/middleware.ts
  import { validateToken } from './login'
  export function authMiddleware(req: Request, res: Response, next: NextFunction)

src/api/users.ts
  import { login } from '../auth/login'
  export class UserService
    async getUser(id: string): Promise<User>
    async updateUser(id: string, data: Partial<User>): Promise<User>
    async deleteUser(id: string): Promise<void>

src/models/types.ts
  export interface User { id: string; email: string; name: string; }
  export interface AuthToken { token: string; expiresAt: Date; }
  export interface Order { id: string; userId: string; items: OrderItem[]; }

Key Finding: Structure vs Content Trade-off

Aider’s repo map demonstrates that structural understanding—knowing what functions exist and how they relate—is often more valuable per token than content understanding—knowing the implementation of each function. By spending only ~1024 tokens on a repo overview, Aider preserves the vast majority of the context window for actual conversation, file edits, and tool output. This is especially effective for large codebases where loading even a fraction of files would exhaust the context window. The map is regenerated each turn to reflect file changes and conversation focus, with full file content loaded only when the user explicitly adds files via /add.
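The core idea — signatures, not implementations — can be demonstrated without tree-sitter. The toy extractor below uses Python's stdlib ast module on Python source only, as a single-language stand-in for Aider's multi-language tree-sitter grammars; it is a sketch of the technique, not Aider's code.

```python
# Sketch of signature-only extraction, the core idea behind Aider's
# repo map. Aider uses tree-sitter across many languages; this
# stand-in uses Python's stdlib ast module on Python source only.
import ast

def extract_signatures(source: str) -> list[str]:
    sigs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            sigs.append(f"def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            sigs.append(f"class {node.name}")
    return sigs

source = '''
class UserService:
    def get_user(self, user_id):
        return self.db.find(user_id)

def login(credentials):
    return validate(credentials)
'''
sigs = extract_signatures(source)
```

Note that function bodies never appear in the output: the map costs a handful of tokens per symbol regardless of how large each implementation is, which is what keeps a full-repository overview near ~1024 tokens.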

2.3 Replit Trajectory Compression

Replit Agent handles long-running autonomous sessions (up to 200 minutes) that would quickly exhaust any context window. Its solution is trajectory compression: using an LLM to condense long action histories into compact summaries, preserving key decision points while discarding redundant intermediate states.

REPLIT TRAJECTORY COMPRESSION
═══════════════════════════════════════════════════════════════════

Raw trajectory (grows unbounded during a 200-minute session):
  Step 1:  Read package.json
  Step 2:  Analyze dependencies
  Step 3:  Create src/index.ts with Express server
  Step 4:  Install express, typescript, ts-node
  Step 5:  Configure tsconfig.json
  Step 6:  Write first route handler
  Step 7:  Test with curl - got 404 error
  Step 8:  Fix route path typo
  Step 9:  Test again - success (200 OK)
  Step 10: Add middleware for JSON parsing
  Step 11: Create database connection module
  ... (continues for 100+ steps)
                    │ LLM compression
                    ▼
Compressed trajectory:
  PRESERVES:                        DISCARDS:
  - file changes (what was          - redundant file reads
    created/modified)               - intermediate states
  - test results (pass/fail)        - exploratory dead-ends
  - error encounters + fixes        - verbose tool outputs
  - key decisions                   - unchanged re-reads

  COMPRESSED SUMMARY:
  "Set up Express/TypeScript project. Fixed route bug (step 8).
   5 API endpoints working. Auth pending. Decisions: Prisma ORM,
   JWT auth planned."

Complementing trajectory compression, Replit’s checkpoints capture a full snapshot of the workspace state at key moments, enabling the user to roll back to any previous state. Each checkpoint includes the workspace files, compressed conversation, database state, environment variables, and running process state.
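The preserve/discard pass at the heart of trajectory compression can be sketched as a filter, assuming a simple step schema. The kind labels and keep-rules below are illustrative assumptions, not Replit's actual implementation, and the real system uses an LLM rather than a fixed rule set.

```python
# Sketch of trajectory compression: keep decision-bearing steps,
# drop routine ones. Step schema and keep-rules are illustrative.

KEEP_KINDS = {"file_change", "test_result", "error", "decision"}

def compress_trajectory(steps):
    kept = [s for s in steps if s["kind"] in KEEP_KINDS]
    dropped = len(steps) - len(kept)
    summary = f"{len(kept)} key steps preserved, {dropped} routine steps dropped"
    return {"summary": summary, "steps": kept}

trajectory = [
    {"kind": "file_read",      "detail": "Read package.json"},
    {"kind": "file_change",    "detail": "Create src/index.ts with Express server"},
    {"kind": "error",          "detail": "curl test returned 404"},
    {"kind": "file_change",    "detail": "Fix route path typo"},
    {"kind": "test_result",    "detail": "curl test: 200 OK"},
    {"kind": "verbose_output", "detail": "npm install log (400 lines)"},
]
compressed = compress_trajectory(trajectory)
```

An LLM-based compressor makes the same keep/drop decision semantically rather than by label, which is what lets it also merge the kept steps into a prose summary like the one in the diagram above.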

2.4 Letta Archival Memory

Letta Code’s archival memory acts as an unbounded overflow layer backed by a vector database. When information exceeds what the core memory blocks can hold, the agent can explicitly store it in archival memory using the archival_memory_insert tool, and later retrieve it using archival_memory_search with semantic similarity queries.

LETTA ARCHIVAL MEMORY SYSTEM:
┌───────────────────────────────────────────────────────┐
│  archival_memory_insert(content: string)              │
│  ─► Embeds content into vector DB                     │
│  ─► Content survives beyond context window limits     │
│  ─► Unlimited storage capacity                        │
│                                                       │
│  archival_memory_search(query: string, n: int)        │
│  ─► Semantic similarity search over all stored data   │
│  ─► Returns top-n most relevant entries               │
│  ─► Agent decides when to search and what to query    │
│                                                       │
│  USE CASES:                                           │
│  - Store large code review findings                   │
│  - Save historical conversation summaries             │
│  - Cache project documentation excerpts               │
│  - Record detailed debugging session outcomes         │
│                                                       │
│  EXAMPLE:                                             │
│  archival_memory_insert(                              │
│    "Auth migration: Changed from session-based to     │
│     JWT. Files modified: auth.ts, middleware.ts,      │
│     config.ts. Key decision: chose RS256 over HS256   │
│     for token signing due to microservice arch."      │
│  )                                                    │
│                                                       │
│  archival_memory_search("JWT token signing", n=3)     │
│  ─► Returns the auth migration entry above            │
└───────────────────────────────────────────────────────┘
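The insert/search loop can be sketched with a toy similarity function. Letta embeds content into a real vector database; the word-overlap cosine below is an illustrative stand-in so the retrieval loop runs without an embedding model, and the class is not Letta's API surface.

```python
# Toy sketch of archival insert/search. Real archival memory uses
# embeddings in a vector DB; this stand-in scores word overlap.
import math
import re
from collections import Counter

class ArchivalMemory:
    def __init__(self):
        self.entries: list[str] = []

    def insert(self, content: str) -> None:
        self.entries.append(content)

    @staticmethod
    def _vector(text: str) -> Counter:
        return Counter(re.findall(r"[a-z0-9]+", text.lower()))

    def _similarity(self, a: str, b: str) -> float:
        va, vb = self._vector(a), self._vector(b)
        dot = sum(va[w] * vb[w] for w in va)
        norm = (math.sqrt(sum(c * c for c in va.values()))
                * math.sqrt(sum(c * c for c in vb.values())))
        return dot / norm if norm else 0.0

    def search(self, query: str, n: int = 3) -> list[str]:
        ranked = sorted(self.entries,
                        key=lambda e: self._similarity(query, e),
                        reverse=True)
        return ranked[:n]

archive = ArchivalMemory()
archive.insert("Auth migration: changed from session-based to JWT. "
               "Chose RS256 over HS256 for token signing.")
archive.insert("Checkout flow: fixed race condition in cart updates.")
results = archive.search("JWT token signing", n=1)
```

The key property survives the simplification: stored entries live outside the context window entirely, and the agent pays context cost only for the top-n results it chooses to retrieve.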

3. Context Window Comparison

The following table provides a comprehensive comparison of how each agent manages its finite context window. Strategies range from no management at all (relying on the model’s native window) to sophisticated multi-tier memory systems with automatic offloading.

Agent       | Context Size    | Compaction Strategy                                                                    | Warning Mechanism
------------|-----------------|----------------------------------------------------------------------------------------|------------------------------
Claude Code | 200k tokens     | Auto-compact at ~80%, preserve recent + todos; PreCompact hook for external state save | /context command, status line
Codex CLI   | Model-dependent | Manual context management; RolloutRecorder for audit replay                            | Token count display
Letta Code  | Model-dependent | Archival memory offload to vector DB; memory blocks always loaded in system prompt     | Server-side monitoring
Vibe CLI    | 256k (Devstral) | Architecture-level context prioritization; large native window reduces compaction need | N/A
Aider       | Model-dependent | Repo map (~1024 tokens via tree-sitter); add files on demand via /add                  | Token usage display
Qwen Code   | Model-dependent | Auto-compaction (forked from Gemini CLI); save_memory for cross-session persistence    | Token usage display
OpenCode    | Model-dependent | AGENTS.md context + LSP real-time queries reduce need for full file loads              | Token usage display
Cline       | Model-dependent | Per-action shadow git checkpoints; VS Code extension state persistence                 | VS Code status bar
Goose       | Model-dependent | MCP extension-based context; disabledMcpServers to reclaim space                       | N/A
OpenManus   | Model-dependent | Planning-based context management; max_messages bounded buffer                         | N/A
Droid       | Proprietary     | HyperCode multi-resolution retrieval reduces context needs; ByteRank relevance scoring | Managed
Warp        | Model-dependent | Model-aligned summarization; codebase embeddings for navigation; TODO state protection | Managed
Replit      | Model-dependent | Trajectory compression via LLM; checkpoints for state snapshots                        | Session timer (200 min)

Context Budget Trap: MCP Server Overhead

Each MCP server adds tool definitions to the context window. Goose, with its 3,000+ extension ecosystem, is particularly vulnerable: enabling too many MCP servers can shrink usable context from 200k to ~70k tokens. The rule of thumb: every MCP server costs 500–2,000 tokens of context just for its tool definitions. Use disabledMcpServers to selectively disable unused servers per project. Claude Code’s /context command helps monitor this overhead in real time.
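The budget arithmetic is worth making explicit. In the sketch below, the per-server token costs and the 10k system-prompt reserve are illustrative assumptions chosen within the 500-2,000 token range cited above; the server names are hypothetical.

```python
# Back-of-envelope context budget: each enabled MCP server's tool
# definitions consume context before any work begins. Per-server
# costs below are illustrative, within the cited 500-2,000 range.

CONTEXT_WINDOW = 200_000

mcp_servers = {          # hypothetical servers and token costs
    "github": 1_800,
    "postgres": 1_200,
    "filesystem": 900,
    "browser": 2_000,
    "slack": 1_500,
}

def usable_context(window: int, servers: dict[str, int],
                   system_prompt: int = 10_000) -> int:
    # Tokens left for conversation, files, and tool output.
    return window - system_prompt - sum(servers.values())

remaining = usable_context(CONTEXT_WINDOW, mcp_servers)

# Disabling unused servers reclaims their token cost:
trimmed = {k: v for k, v in mcp_servers.items()
           if k in ("github", "filesystem")}
remaining_trimmed = usable_context(CONTEXT_WINDOW, trimmed)
```

Five modest servers already cost 7,400 tokens here; scale that to a few dozen aggressive servers and the 200k-to-~70k collapse described above follows directly.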

4. Skill Learning & Knowledge Persistence

Beyond session-level memory, a handful of agents implement mechanisms for the agent to learn reusable skills and persist knowledge across its entire lifetime. This transforms an agent from a stateless tool into a progressively improving collaborator. The approaches vary dramatically—from Letta Code’s structured skill lifecycle to Claude Code’s simple but effective Markdown files.

4.1 Letta Code Skill System

SKILL LIFECYCLE:
1. EXPERIENCE ──► Work through a complex task with user coaching
2. REFLECT   ──► /skill command triggers agent self-reflection
3. EXTRACT   ──► Agent identifies reusable patterns and steps
4. STORE     ──► Skill saved as .md file in .skills/ directory
5. LOAD      ──► Future sessions load skill via skill tool

SKILL FILE STRUCTURE (.skills/api-migration/SKILL.md):
───────────────────────────────────────────────────────
---
name: API Migration Pattern
description: Migrate REST APIs to GraphQL
triggers: ["migrate", "graphql", "api upgrade"]
---
# API Migration Skill

## Prerequisites
- Identify all REST endpoints
- Map to GraphQL schema types

## Steps
1. Create GraphQL schema from REST response types
2. Implement resolvers that call existing services
3. Add deprecation notices to REST endpoints
4. Write integration tests for GraphQL layer
5. Update client code to use GraphQL queries

## Gotchas
- N+1 query problem: use DataLoader
- Nested resolvers need explicit type definitions
- Pagination: prefer cursor-based over offset

SKILL MEMORY BLOCK (in <skills> section of system prompt):
───────────────────────────────────────────────────────
<skills>
  Available skills:
  - api-migration: Migrate REST to GraphQL (3 uses, last: Jan 28)
  - testing-patterns: TDD workflow for React components (5 uses)
  - prisma-relations: Complex Prisma relation patterns (2 uses)
  - nextjs-middleware: Auth middleware for App Router (1 use)
</skills>

The skill system creates a compounding advantage: the agent becomes measurably more efficient on repeated task types. After learning the api-migration skill, Letta Code can execute the pattern in future sessions without step-by-step coaching, referencing its stored skill file for the exact procedure and known pitfalls.
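A loader for the SKILL.md format above can be sketched in a few lines. The frontmatter parsing and trigger matching below are illustrative assumptions about how stored skills might be surfaced for a new user message, not Letta's actual loader.

```python
# Sketch of skill loading + trigger matching for the SKILL.md
# format shown above. Parsing and matching logic are illustrative.

def parse_skill(text: str) -> dict:
    # Minimal parser for the ----delimited frontmatter header.
    _, header, body = text.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    meta["body"] = body.strip()
    return meta

def matching_skills(skills: list[dict], user_msg: str) -> list[str]:
    # Surface any skill whose trigger word appears in the message.
    msg = user_msg.lower()
    return [s["name"] for s in skills
            if any(t.strip(' "[]') in msg
                   for t in s["triggers"].split(","))]

skill_md = """---
name: API Migration Pattern
description: Migrate REST APIs to GraphQL
triggers: ["migrate", "graphql", "api upgrade"]
---
# API Migration Skill
1. Create GraphQL schema from REST response types
"""
skills = [parse_skill(skill_md)]
hits = matching_skills(skills, "Please migrate the orders API")
```

Once a trigger fires, the skill body — steps and gotchas included — can be loaded into context, which is how a one-time coached workflow becomes a repeatable procedure.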

4.2 Claude Code CLAUDE.md System

Claude Code uses a hierarchical Markdown file system for persistent project context. While simpler than Letta’s memory blocks, CLAUDE.md is the most widely adopted cross-session persistence mechanism due to its simplicity and git-friendliness.

CLAUDE.MD HIERARCHY:
───────────────────────────────────────────────────────
~/.claude/CLAUDE.md            ← Global (all projects)
~/projects/CLAUDE.md           ← Parent directory
~/projects/myapp/CLAUDE.md     ← Project root (shared, git-committed)
~/projects/myapp/CLAUDE.local.md  ← Personal (gitignored)
~/projects/myapp/src/CLAUDE.md    ← Subdirectory-specific

INHERITANCE: Child inherits parent. All levels merged at session start.

/memory COMMAND:
  User: /memory "Always use pnpm, never npm"
  ──► Appends to CLAUDE.md memory section
  ──► Persists across all future sessions
  ──► Can also be edited manually like any file

MEMORY BLOCKS (added via /memory):
───────────────────────────────────────────────────────
## Memory

- User prefers pnpm over npm
- Run tests before committing: pnpm test
- Use conventional commits: feat/fix/chore prefix
- Database migrations require review before applying
- Always check TypeScript strict mode after edits
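The root-to-leaf merge can be sketched as a directory walk: concatenate every CLAUDE.md from the filesystem root down to the working directory, so deeper (more specific) files come last. The sketch below builds a miniature hierarchy in a temp directory; CLAUDE.local.md handling is omitted for brevity, and the merge order is an assumption consistent with the inheritance rule above.

```python
# Sketch of CLAUDE.md hierarchy merging: parent files first,
# project files last, so more specific instructions come later.
import tempfile
from pathlib import Path

def merge_claude_md(cwd: Path) -> str:
    parts = []
    for directory in [*reversed(cwd.parents), cwd]:  # root -> cwd
        candidate = directory / "CLAUDE.md"
        if candidate.exists():
            parts.append(candidate.read_text())
    return "\n\n".join(parts)

# Miniature hierarchy in a temp directory:
root = Path(tempfile.mkdtemp())
(root / "CLAUDE.md").write_text("Global: use conventional commits")
project = root / "myapp"
project.mkdir()
(project / "CLAUDE.md").write_text("Project: use pnpm, run tests before commit")

merged = merge_claude_md(project)
```

Ordering matters: because project-level text lands after global text, a project convention can refine or override a global one simply by appearing later in the merged prompt.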

4.3 Qwen Code save_memory Tool

Qwen Code (Alibaba, forked from Gemini CLI) adds a save_memory tool that allows the agent to explicitly persist information across sessions. Memories are stored per-project in the user’s home directory at ~/.qwen/settings.json.

QWEN CODE MEMORY SYSTEM:
───────────────────────────────────────────────────────
Tool: save_memory
Trigger: Agent detects information worth persisting
Storage: ~/.qwen/settings.json (per-project keys)

EXAMPLE FLOW:
  Session 1:
  User: "This project uses pnpm, not npm"
  Agent: *calls save_memory*
    save_memory({
      key: "package_manager",
      value: "pnpm",
      project: "/home/user/my-project"
    })

  Session 2 (next day):
  Agent: *reads ~/.qwen/settings.json on startup*
  Agent: "I'll use pnpm since that's your project's package manager."

SETTINGS FILE STRUCTURE:
{
  "memories": {
    "/home/user/my-project": {
      "package_manager": "pnpm",
      "test_command": "pnpm test",
      "preferred_style": "functional"
    }
  },
  "global_preferences": {
    "commit_style": "conventional",
    "explanation_level": "concise"
  }
}
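A save_memory-style tool reduces to reads and writes against a per-project JSON file. The functions below mirror the settings layout shown above but are an illustrative sketch, not Qwen Code's implementation; a temp file stands in for ~/.qwen/settings.json.

```python
# Sketch of save_memory-style persistence to a per-project JSON
# settings file. Layout mirrors the structure shown above; the
# functions and temp-file path are illustrative assumptions.
import json
import tempfile
from pathlib import Path

SETTINGS = Path(tempfile.mkdtemp()) / "settings.json"  # stand-in for ~/.qwen/settings.json

def save_memory(key: str, value: str, project: str) -> None:
    data = (json.loads(SETTINGS.read_text())
            if SETTINGS.exists() else {"memories": {}})
    data["memories"].setdefault(project, {})[key] = value
    SETTINGS.write_text(json.dumps(data, indent=2))

def load_memories(project: str) -> dict:
    # Read back on startup, as in the Session 2 flow above.
    if not SETTINGS.exists():
        return {}
    return json.loads(SETTINGS.read_text())["memories"].get(project, {})

save_memory("package_manager", "pnpm", "/home/user/my-project")
save_memory("test_command", "pnpm test", "/home/user/my-project")
memories = load_memories("/home/user/my-project")
```

Keying by project path keeps memories from leaking between repositories while still living in a single user-level file.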

4.4 OpenCode AGENTS.md

OpenCode uses an AGENTS.md file analogous to Claude Code’s CLAUDE.md—a project-specific configuration file that persists across sessions via the file system. It is loaded at session start and provides the agent with project conventions, build commands, and architectural context. Combined with OpenCode’s LSP integration, AGENTS.md provides static knowledge while LSP provides dynamic, real-time code intelligence.

# AGENTS.md (OpenCode project configuration)
Project: API Gateway Service
Language: Go 1.22
Build: make build
Test: go test ./...
Key packages: cmd/, internal/, pkg/

Conventions:
- Use context.Context for all handler functions
- Errors wrap with fmt.Errorf("operation: %w", err)
- Table-driven tests preferred
- Run golangci-lint before committing

Architecture:
- cmd/gateway/main.go is the entrypoint
- internal/handlers/ contains HTTP handlers
- internal/middleware/ contains auth and logging
- pkg/client/ contains external API clients

5. The Memory Gap — Industry Challenge

Key Finding: Memory Is the Least Mature Capability

Memory is the least mature capability across all coding agents. Only Letta Code provides true persistent memory with self-modifying memory blocks and vector-based archival storage. Most agents start every session from scratch, relying on file-based workarounds (CLAUDE.md, AGENTS.md) or session replay. This means that today, even the most sophisticated agents cannot genuinely learn from weeks of collaboration—they can only reference static files that the user or agent has manually maintained. The gap between what is architecturally possible (Letta’s full memory stack) and what is commonly deployed (session-based with config files) represents the single largest opportunity for differentiation in the coding agent market.

The Memory Hierarchy Opportunity

Future coding agents will likely adopt a tiered memory approach, analogous to CPU cache hierarchies. Each tier trades capacity for access speed and relevance. No single agent implements all four tiers today, but the trajectory across the 13 agents analyzed points clearly toward convergence on this model:

  • L1 — Working Memory (context window): Current session messages, tool outputs, active file contents. Fastest access, most limited capacity. Every agent has this.
  • L2 — Short-term Memory (compacted summaries): Recent session summaries, compressed trajectories. Survives compaction but not session boundaries. Claude Code, Replit, and Warp implement this.
  • L3 — Long-term Memory (memory blocks): Persistent facts about the user, project, and agent identity. Survives across all sessions. Only Letta Code fully implements this; Claude Code and Qwen Code approximate it with files.
  • L4 — Archival Memory (vector DB): Unlimited searchable storage for historical context, large documents, and past experiences. Only Letta Code implements this via its archival memory system.
THE MEMORY HIERARCHY (Future Coding Agent Architecture)
═══════════════════════════════════════════════════════════════════

┌─────────────────────────────────────┐
│ L1: WORKING MEMORY                  │  Fastest
│     (Context Window)                │  ~200k tokens
│  Current messages, tool output,     │
│  active file contents               │
│  ALL 13 agents have this            │
├─────────────────────────────────────┤
│ L2: SHORT-TERM MEMORY               │  Fast
│     (Compacted Summaries)           │  ~10-50k tokens
│  Session summaries, compressed      │
│  trajectories, recent decisions     │
│  Claude Code, Replit, Warp          │
├─────────────────────────────────────┤
│ L3: LONG-TERM MEMORY                │  Medium
│     (Memory Blocks / Files)         │  ~5-20k tokens
│  User preferences, project facts,   │
│  agent identity, learned skills     │
│  Letta Code (full), Claude Code     │
│  (CLAUDE.md), Qwen Code (save_mem)  │
├─────────────────────────────────────┤
│ L4: ARCHIVAL MEMORY                 │  Slowest
│     (Vector DB)                     │  Unlimited
│  Historical conversations, large    │
│  code reviews, documentation,       │
│  past debugging sessions            │
│  Only Letta Code (archival_memory)  │
└─────────────────────────────────────┘
  Access speed ▲   ·   Storage capacity ▼

CURRENT STATE OF ADOPTION:
──────────────────────────────────────────────────────────────
L1 only:       Codex CLI, Goose, Droid, OpenManus
L1 + L2:       Replit (trajectory), Warp (summarization)
L1 + L3:       Claude Code (CLAUDE.md), OpenCode (AGENTS.md),
               Qwen Code (save_memory), Vibe CLI (history/)
L1 + L2 + L3:  (no agent fully implements this yet)
L1-L4 (full):  Letta Code (only agent with all 4 tiers)

The implication is clear: an agent that implements all four memory tiers—fast working memory, compacted short-term memory, persistent long-term blocks, and unlimited archival search—would have a substantial advantage over today’s session-based agents. It would never re-learn what it already knows, never retry approaches that already failed in previous sessions, and progressively build a deep, personalized understanding of the user and their codebase. The engineering challenge lies in managing the complexity of four memory tiers while maintaining reliability. Letta Code proves the concept is viable; the question is whether mainstream agents like Claude Code and Codex CLI will adopt similar architectures.
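The four-tier design can be sketched as a fall-through lookup from the fastest tier to the slowest. Everything below — the class, the dict-backed tiers, the lookup API — is a hypothetical illustration of the converged architecture, not any shipping agent's implementation.

```python
# Hypothetical sketch of an L1-L4 tiered memory lookup: check the
# fastest tier first, fall through to slower, larger tiers.

class TieredMemory:
    def __init__(self):
        self.l1_working = {}    # context window (fastest, smallest)
        self.l2_shortterm = {}  # compacted summaries
        self.l3_longterm = {}   # persistent memory blocks
        self.l4_archival = {}   # vector DB (slowest, unlimited)

    def lookup(self, key: str):
        # Return the first tier that knows the answer, fastest first.
        for tier_name, tier in [("L1", self.l1_working),
                                ("L2", self.l2_shortterm),
                                ("L3", self.l3_longterm),
                                ("L4", self.l4_archival)]:
            if key in tier:
                return tier_name, tier[key]
        return None, None

mem = TieredMemory()
mem.l3_longterm["package_manager"] = "pnpm"
mem.l4_archival["auth_decision"] = "RS256 over HS256 for token signing"
tier, value = mem.lookup("package_manager")
```

The hard engineering problems hide in what this sketch omits: promotion and demotion between tiers, semantic rather than exact-key retrieval at L4, and keeping the tiers consistent when the agent edits its own memory.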

Summary: Memory & Context Management

Memory architecture is the most consequential and least converged design decision in coding agent engineering. Across the 13 agents analyzed, the fundamental tension is between simplicity (session-based, no state to corrupt) and capability (persistent memory, compounding intelligence). Key takeaways:

  • Letta Code is the only agent with true persistent, agent-managed memory blocks and vector-based archival storage—the most ambitious architecture but also the most complex to operate.
  • Claude Code’s auto-compaction with PreCompact hooks provides the best within-session memory management, complemented by the CLAUDE.md hierarchy for cross-session persistence.
  • Aider’s tree-sitter repo map achieves remarkable context efficiency—~1024 tokens for a full repository structural overview—by trading content for structure.
  • Replit’s trajectory compression solves the unique challenge of very long autonomous sessions (200 minutes), condensing histories to key decision points.
  • Qwen Code’s save_memory tool and OpenCode’s AGENTS.md represent pragmatic middle-ground approaches to cross-session persistence.
  • The Memory Hierarchy (L1–L4) framework maps the industry trajectory: from today’s mostly-L1 agents toward full four-tier memory systems that learn, persist, and retrieve across unlimited sessions.

Sources & References

Memory Systems: Letta Documentation, Claude Code Documentation

Context Management: SWE-agent (observation collapsing), Aider Repository Map

Agent Repositories: Aider, Codex CLI, Qwen Code, OpenCode, OpenManus

Platforms: Warp, Replit, Factory.ai (Droid), Letta, Mistral (Vibe CLI)
