Part 4: Agent Deep-Dives — Aider to Letta

Coding Agent Engineering Analysis
January 2026 · 7 Agents · Architecture Diagrams & Implementation Details


Overview

This section provides detailed architecture diagrams, implementation analysis, and special considerations for seven coding agents in the A–L alphabetical range: Aider, Claude Code, Cline, Codex CLI, Droid, Goose, and Letta Code. Each profile covers the agent's core architecture, unique technical contributions, tool implementations, and edge-case behaviors that matter for production deployment.

These agents span the full spectrum of design philosophies: open-source CLI tools (Aider, Codex CLI), proprietary platforms (Claude Code, Droid), IDE extensions (Cline), MCP-first frameworks (Goose), and memory-first architectures (Letta Code). Understanding their differences is essential for choosing the right tool—or synthesizing the best patterns into a new one.

Quick Reference: Agents at a Glance

Agent	Type	License	Key Differentiator
Aider	CLI (Python)	Apache-2.0	Architect/Editor dual-model, repo map via tree-sitter
Claude Code	CLI (TS)	Proprietary	8-event hook system, 18 tools, 4 subagent types
Cline	VS Code Ext	Apache-2.0	Shadow git, Puppeteer browser, 33+ providers
Codex CLI	CLI (Rust)	Apache-2.0	OS-native sandbox (Seatbelt/Landlock/seccomp)
Droid	CLI	Proprietary	HyperCode/ByteRank retrieval, Terminal-Bench #1
Goose	CLI + Desktop	Apache-2.0	MCP-first architecture, 3000+ extensions
Letta Code	CLI	Apache-2.0	Persistent memory blocks, archival vector DB, skill learning

1. Aider

Architecture Overview

Aider is an open-source, terminal-based AI pair-programming tool that pioneered the Architect/Editor dual-model pattern and repository-aware context via tree-sitter AST parsing. It operates as a conversational CLI that deeply integrates with git, auto-committing every AI change with descriptive messages for easy review and rollback.

AIDER ARCHITECTURE
================================================================

USER INPUT
  (text, voice, images, URLs, /commands)
      │
      ▼
AIDER CLI

  CHAT MODES:
    - /code        (edits only)
    - /architect   (plan → edit)
    - /ask         (Q&A only)

  EDIT FORMATS:
    - diff (default)   search/replace blocks
    - whole            full file
    - editor-diff      editor merge
      │
      ▼
  REPO MAP (tree-sitter):
    - Parse all tracked files for AST structure
    - Extract: function sigs, class defs, imports
    - Build cross-file relationship graph
    - Prioritize files relevant to current chat
    - Budget: ~1024 tokens for repo overview
      │
      ▼
  GIT INTEGRATION:
    - Auto-commit every AI change
    - Descriptive commit messages (LLM-generated)
    - /diff, /undo for easy review/rollback
    - Respects .gitignore

Architect/Editor Dual-Model Pattern

Aider's most significant contribution is the Architect/Editor separation, which decouples reasoning from code generation. A reasoning-optimized model (the "Architect") analyzes the problem and produces a natural-language solution plan. A code-generation model (the "Editor") then translates that plan into precise file edits. In the benchmark results reported for this pattern, the paired configurations score higher than either single-model configuration.

ARCHITECT/EDITOR PATTERN
================================================================

User Request: "Add JWT authentication to the API"
      │
      ▼
ARCHITECT MODEL (reasoning-optimized)
  Models: o1-preview, DeepSeek Reasoner, Claude Opus, Gemini Pro
  Output: Natural-language solution plan
    "1. Install jsonwebtoken package.
     2. Create middleware/auth.ts with verifyToken() function.
     3. Add auth middleware to protected routes in routes/api.ts.
     4. Create /login endpoint that issues JWT tokens in controllers/auth.ts."
      │
      ▼
EDITOR MODEL (code-generation)
  Models: Claude Sonnet, DeepSeek Coder, GPT-4o, Codestral
  Output: Formatted file edits (diff/whole)
    --- a/middleware/auth.ts
    +++ b/middleware/auth.ts
    @@ -0,0 +1,18 @@
    +import jwt from 'jsonwebtoken';
    +export function verifyToken(req, ...)
    + ...

BENCHMARK RESULTS (Aider Polyglot Leaderboard):
┌────────────────────────────────────┬─────────┐
│ Configuration                      │ Score   │
├────────────────────────────────────┼─────────┤
│ o1-preview (Architect) + DeepSeek  │ 85%     │
│ o1-preview (Architect) + o1-mini   │ 85%     │
│ Claude Sonnet solo                 │ 72%     │
│ GPT-4o solo                        │ 66%     │
└────────────────────────────────────┴─────────┘
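The two-stage flow in the diagram above can be sketched in a few lines. This is a minimal illustration, not Aider's implementation: the `ArchitectEditor` class, the prompt wording, and the stub models are all hypothetical, and a "model" is assumed to be any callable from prompt text to completion text.

```python
from dataclasses import dataclass
from typing import Callable

# A "model" here is any callable mapping a prompt string to a completion string.
Model = Callable[[str], str]


@dataclass
class ArchitectEditor:
    architect: Model  # reasoning-oriented model: produces a prose plan
    editor: Model     # code-oriented model: turns the plan into edits

    def run(self, request: str, file_context: str) -> dict:
        # Stage 1: the architect sees the request and context, returns a plan.
        plan = self.architect(
            f"Request: {request}\n\nFiles:\n{file_context}\n\n"
            "Describe the changes needed. Do not write final edits."
        )
        # Stage 2: the editor sees the plan plus the files and emits edits.
        edits = self.editor(
            f"Plan:\n{plan}\n\nFiles:\n{file_context}\n\n"
            "Produce the file edits that implement this plan."
        )
        return {"plan": plan, "edits": edits}


# Stub models so the sketch runs without any API access.
def fake_architect(prompt: str) -> str:
    return "1. Create middleware/auth.ts with verifyToken()."


def fake_editor(prompt: str) -> str:
    assert "verifyToken" in prompt  # the plan was forwarded to the editor
    return "+++ b/middleware/auth.ts\n+export function verifyToken() {}"


result = ArchitectEditor(fake_architect, fake_editor).run(
    "Add JWT authentication to the API", "routes/api.ts"
)
```

Because the editor receives only the plan and the files, the two models can be swapped independently, which is what makes mixed pairings like those in the benchmark table possible.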

Repository Map via tree-sitter

Aider builds a compact, AST-aware map of the repository using tree-sitter parsers. This map provides the LLM with a structural overview of the codebase—function signatures, class definitions, import relationships—without consuming the full file contents. The map is dynamically weighted toward files most relevant to the current conversation.

REPO MAP STRATEGY:
1. Parse all tracked files via tree-sitter AST:
   - Function signatures (name, params, return type)
   - Class definitions (name, methods, inheritance)
   - Import/export statements (module graph)

2. Build relationship graph:
   - Which functions call which
   - Module dependency chains
   - Type hierarchies

3. Include in context (~1024 tokens budget):
   - Only signatures, not implementations
   - Prioritize files referenced in chat
   - Dynamically expand on-demand with /add

Example repo map output:
┌─────────────────────────────────────────┐
│ src/auth/login.ts                       │
│   export async function login(creds)    │
│   export function validateToken(token)  │
│                                         │
│ src/api/users.ts                        │
│   import { login } from '../auth/login' │
│   export class UserService              │
│     async getUser(id: string)           │
│     async updateUser(id, data)          │
│                                         │
│ src/middleware/cors.ts                   │
│   export function corsMiddleware(opts)   │
└─────────────────────────────────────────┘
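The strategy above (signatures only, chat-referenced files first, fixed budget) can be sketched as follows. To stay self-contained, this example swaps in Python's standard `ast` module for tree-sitter and handles Python sources only. The function names, the word-count token estimate, and the sample files are assumptions for illustration and do not reflect how Aider ranks or counts.

```python
import ast


def _signature(node) -> str:
    prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
    return f"{prefix} {node.name}({ast.unparse(node.args)})"


def file_signatures(path: str, source: str) -> list[str]:
    """Return the path plus one line per top-level import, class, method, and function."""
    lines = [path]
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            lines.append("  " + ast.unparse(node))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines.append("  " + _signature(node))
        elif isinstance(node, ast.ClassDef):
            lines.append(f"  class {node.name}")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    lines.append("    " + _signature(item))
    return lines


def build_repo_map(files: dict[str, str], mentioned: set[str], budget: int = 1024) -> str:
    """Concatenate signatures, chat-mentioned files first, within a rough token budget."""
    ordered = sorted(files, key=lambda p: (p not in mentioned, p))
    out, used = [], 0
    for path in ordered:
        block = file_signatures(path, files[path])
        cost = sum(len(line.split()) for line in block)  # crude token estimate
        if used + cost > budget:
            break
        out.extend(block)
        used += cost
    return "\n".join(out)


files = {
    "auth/login.py": "def login(creds):\n    return True\n",
    "api/users.py": (
        "from auth.login import login\n\n"
        "class UserService:\n"
        "    def get_user(self, id):\n"
        "        return id\n"
    ),
}
repo_map = build_repo_map(files, mentioned={"auth/login.py"})
```

Function bodies never reach the map, so the model sees how the code is organized without paying for its implementation.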

Special Considerations

--watch-files Mode: Aider monitors files for AI-comment markers (e.g., // AI: fix this bug), enabling IDE integration without a dedicated extension. Changes trigger Aider to read the comment, implement the request, and auto-commit.
Voice Input: Built-in speech-to-text allows speaking requests directly. Aider transcribes, interprets, and implements code changes from voice commands.
Image & Web Support: Screenshots and URLs can be added to the chat context. Aider fetches web content and processes images as part of the conversation, useful for implementing designs from mockups.
Prompt Caching: Integrates with providers that support prompt caching (Anthropic, DeepSeek) to reduce costs and latency for repeated context windows. Particularly effective given the repo map is largely static across turns.
Copy/Paste Mode: Streamlines copying code context to an LLM's web chat interface and bringing the resulting edits back. Useful when a model is reachable only through a web UI and not through an API.
Edit Format Selection: Automatically selects the optimal edit format (diff, whole, editor-diff) based on the model's capabilities. Weaker models get whole-file output; stronger models use diff format for token efficiency.
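As a rough illustration of the --watch-files idea, the sketch below scans source text for comment markers of the form shown in the example (`// AI: ...`). The regular expression and the `find_ai_requests` helper are hypothetical. The exact marker syntax Aider accepts is not assumed here.

```python
import re

# Matches "// AI: ..." and "# AI: ..." comment markers, following the example
# in the text. This pattern is an assumption, not Aider's own matcher.
AI_MARKER = re.compile(r"(?://|#)\s*AI:\s*(?P<request>.+)$")


def find_ai_requests(source: str) -> list[tuple[int, str]]:
    """Return (line_number, request) for each AI comment marker in the source."""
    found = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = AI_MARKER.search(line)
        if match:
            found.append((number, match.group("request").strip()))
    return found


sample = "function total(xs) {\n  // AI: fix this bug\n  return xs.length;\n}\n"
requests = find_ai_requests(sample)
```

A file watcher would call a function like this on each changed file and hand any requests it finds to the agent along with the file contents.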

Architectural Insight: Why Architect/Editor Wins

The Architect/Editor pattern succeeds because it aligns model strengths with task requirements. Reasoning models (o1-preview, Opus) excel at planning and understanding complex requirements but produce verbose, sometimes imprecise code. Code models (Sonnet, DeepSeek) excel at precise syntax and formatting but can struggle with high-level architectural decisions. By separating concerns, Aider achieves 85% on its polyglot benchmark—a 13-point improvement over the best single-model approach (72%). This pattern has since been adopted by Cline (Architect mode) and influenced Claude Code's subagent design.

2. Claude Code

Architecture Overview

Claude Code is Anthropic's official CLI agent, distinguished by its multi-client architecture, 8-event hook system, 18 built-in tools, and 4 subagent types. It supports a 200k-token context window with automatic compaction, parallel tool execution, and a layered configuration system designed for both individual developers and enterprise deployment via MDM.

CLAUDE CODE ARCHITECTURE
================================================================

MULTI-CLIENT INTERFACE
  - CLI Terminal (TUI)
  - VS Code Extension
  - JetBrains Plugin
  - GitHub Action (CI/CD)
      │
      ▼
CORE AGENT ENGINE

  SYSTEM PROMPT (110+ parts, dynamically assembled)
    - Base instructions         - Tool definitions
    - CLAUDE.md content         - MCP server tools
    - Hook configurations       - Permission rules
    - Subagent definitions      - Active context

  TOOL ORCHESTRATOR (18 built-in tools)
    Read             Write            Edit             NotebookEdit
    Glob             Grep (ripgrep)   Bash (shell)     Computer (Chrome)
    Task (subagent)  WebFetch         WebSearch        TodoWrite
    EnterPlanMode    ExitPlanMode     Skill (.md)      Memory (blocks)
    SlashCommand     LSP
    + MCP server tools (dynamically loaded)

  SUBAGENT DISPATCHER
    - Task     (parallel, isolated context)
    - Explore  (read-only research)
    - Plan     (structured design)
    - Custom   (.claude/agents/*.md)
               User-defined subagents with custom tools,
               model selection, and system prompts

  HOOK SYSTEM (8 lifecycle events)
    SessionStart → UserPromptSubmit → PreToolUse →
    [Execute] → PostToolUse → Stop / SubagentStop
    PreCompact, Notification
      │
CONFIGURATION LAYERS
  ~/.claude/settings.json          (user global)
  .claude/settings.json            (project, committed)
  .claude/settings.local.json      (personal, ignored)
  CLAUDE.md / CLAUDE.local.md      (context files)
  .claude/agents/*.md              (custom subagents)
  Enterprise MDM config            (managed policies)

Subagent System

Claude Code's subagent system enables delegation of work to specialized child agents, each running in its own context window with configurable tool access. This is critical for complex tasks that exceed what a single context window can handle, or where parallel execution improves throughput.

SUBAGENT TYPES:

1. TASK SUBAGENT (parallel, isolated context)
   ─────────────────────────────────────────────────
   - Spawns a new Claude instance with limited context
   - Can run multiple Task subagents in parallel
   - Configurable tool access per subagent
   - Reports results back to parent agent
   - Use: Complex subtasks, parallel file processing

   Example: "Refactor auth module" spawns Task subagents for:
     - Task 1: Update authentication middleware
     - Task 2: Update user service
     - Task 3: Update test files

2. EXPLORE SUBAGENT (read-only research)
   ─────────────────────────────────────────────────
   - Read-only access: Read, Grep, Glob, Bash(readonly)
   - Cannot modify files or run destructive commands
   - Use: Codebase research, documentation lookup

3. PLAN SUBAGENT (structured design)
   ─────────────────────────────────────────────────
   - EnterPlanMode / ExitPlanMode flow
   - Produces structured implementation plan
   - User approval required before execution
   - Use: Architecture decisions, large refactors

4. CUSTOM SUBAGENT (.claude/agents/*.md)
   ─────────────────────────────────────────────────
   File: .claude/agents/reviewer.md
   ---
   name: code-reviewer
   description: Reviews code for quality and security
   tools: ["Read", "Grep", "Glob", "Bash"]
   model: opus
   ---
   You are a senior code reviewer. Analyze changes for:
   1. Security vulnerabilities (injection, auth bypass)
   2. Performance issues (N+1 queries, memory leaks)
   3. Code style violations (project conventions)
   ...

Configuration Hierarchy

CONFIGURATION PRECEDENCE (highest → lowest):
┌──────────────────────────────────────────────────────────┐
│  Enterprise MDM Config                                    │  ← Cannot be overridden
│  (managed by organization, deployed via MDM)              │
├──────────────────────────────────────────────────────────┤
│  .claude/settings.local.json (personal)                   │  ← .gitignored
│  User-specific overrides: API keys, model preferences     │
├──────────────────────────────────────────────────────────┤
│  .claude/settings.json (project)                          │  ← Committed to repo
│  Shared team settings: hooks, permissions, MCP servers    │
├──────────────────────────────────────────────────────────┤
│  ~/.claude/settings.json (global)                         │  ← User defaults
│  Cross-project defaults: theme, behavior preferences      │
└──────────────────────────────────────────────────────────┘

CLAUDE.md CONTEXT FILES (loaded into system prompt):
- CLAUDE.md           → Project root instructions
- CLAUDE.local.md     → Personal instructions (.gitignored)
- Parent dir CLAUDE.md files also loaded (monorepo support)
- Imported via @path references for modular configs
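Layered configuration of this kind usually reduces to a merge in precedence order. The sketch below is a generic illustration under stated assumptions: `merge_settings` is a hypothetical helper, the setting keys are invented for the example, and the key-by-key merge of nested objects is one plausible policy, not a documented description of how Claude Code combines its files.

```python
from typing import Any


def merge_settings(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge layers given from lowest to highest precedence.

    Nested dicts are merged key by key; any other value is replaced
    by the higher-precedence layer.
    """
    merged: dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_settings(merged[key], value)
            else:
                merged[key] = value
    return merged


# Hypothetical layer contents; the keys are illustrative, not a documented schema.
user_global = {"theme": "dark", "permissions": {"allow": ["Read"]}}
project = {"permissions": {"allow": ["Read", "Grep"], "deny": ["WebFetch"]}}
managed = {"permissions": {"deny": ["Bash"]}}

effective = merge_settings(user_global, project, managed)
```

Passing the managed layer last means nothing below it can override its values.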

Special Considerations

200k Context Window: A 200k-token context window with automatic compaction. Compaction triggers at ~80% usage, preserving recent messages, active files, todo state, and error context while discarding old tool outputs.
Parallel Tool Calls: With Sonnet 4.5+, Claude Code executes multiple independent tool calls simultaneously (e.g., reading several files at once), significantly reducing round-trip latency.
Git Safety Protocol: Hard-coded safety rules: never force push, never skip hooks (--no-verify), never amend without explicit permission, never run destructive git commands (reset --hard, clean -f) unless directly instructed.
Session Resume: claude --resume restores the previous session's full conversation state, including tool outputs and context. Critical for long-running tasks interrupted by network issues or terminal closure.
Hook Extensibility: The 8-event hook lifecycle (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, Stop, SubagentStop, PreCompact, Notification) enables deterministic automation at every lifecycle point. Hooks can approve/deny tool calls, inject context, audit actions, and force continuation.
MCP Integration: Full MCP client support with server configuration in .claude/settings.json. Tool names follow the mcp__<server>__<tool> pattern. Supports stdio and HTTP transport.
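The approve/deny role of a PreToolUse hook can be modeled in-process as below. This is a conceptual sketch only: `HookRegistry`, `Decision`, and the payload fields are hypothetical names, and it does not describe how Claude Code actually configures or invokes hooks. It only shows a lifecycle event gating a tool call.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    allow: bool
    reason: str = ""


# A hook inspects the event payload and may return a Decision, or None to abstain.
Hook = Callable[[dict], Optional[Decision]]


class HookRegistry:
    def __init__(self) -> None:
        self._hooks: dict[str, list[Hook]] = defaultdict(list)

    def on(self, event: str, hook: Hook) -> None:
        self._hooks[event].append(hook)

    def fire(self, event: str, payload: dict) -> Decision:
        # Run hooks in registration order; the first denial wins.
        for hook in self._hooks[event]:
            decision = hook(payload)
            if decision is not None and not decision.allow:
                return decision
        return Decision(allow=True)


def block_force_push(payload: dict) -> Optional[Decision]:
    command = payload.get("command", "")
    if payload.get("tool") == "Bash" and "push --force" in command:
        return Decision(False, "force push is not permitted")
    return None


hooks = HookRegistry()
hooks.on("PreToolUse", block_force_push)
denied = hooks.fire("PreToolUse", {"tool": "Bash", "command": "git push --force"})
allowed = hooks.fire("PreToolUse", {"tool": "Read", "path": "README.md"})
```

The check runs as ordinary code outside the model, so its outcome does not depend on what the model was prompted to do, which is the "deterministic automation" property described above.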

Architectural Insight: System Prompt as Configuration

Claude Code's system prompt is dynamically assembled from 110+ parts, which makes it highly configurable. The prompt includes base instructions, tool definitions, CLAUDE.md content, hook configurations, permission rules, MCP server tool definitions, and active session context. This "prompt as code" approach means that the agent's behavior can be extensively customized without modifying source code, a key advantage for enterprise deployment where different teams need different agent behaviors within the same organization.

3. Cline

Architecture Overview

Cline is the most popular open-source VS Code AI extension, distinguished by its shadow git checkpoint system, native Puppeteer browser automation, and support for 33+ LLM providers. It operates as a VS Code extension with a React-based webview UI, communicating with a backend Controller/Orchestrator that manages tool execution, approval workflows, and browser sessions.

CLINE ARCHITECTURE
================================================================

VS CODE EXTENSION

  CONTROLLER (Orchestrator)  ◀─▶  WEBVIEW (React)
    Controller:
      - Message routing
      - Tool dispatch
      - Approval gating
      - State management
    Webview:
      - ChatView
      - HistoryView
      - SettingsView
      - DiffView
      │
      ▼
  TOOL EXECUTOR (registration-based tool dispatch system)
    - File Handler:                read, write, search, diff
    - Terminal Manager:            execute, stream, bg proc
    - Browser Session (Puppeteer): launch, nav, click, type, screen
    - MCP Client:                  stdio, HTTP, SSE
    - Auto Approve System:         per tool, per mode, yolo

  MODES:
    - Plan       (R/O)
    - Act        (R/W)
    - Ask        (Q&A)
    - Architect  (Design)

  SHADOW GIT (Checkpoint System):
    - Separate from user's .git
    - Checkpoint created per tool call
    - Granular rollback to any point
    - git reset --hard <checkpoint_sha>

Browser Automation (Puppeteer)

Cline includes native browser automation via a BrowserSession class built on Puppeteer. Unlike other agents that rely on MCP servers for browser access, Cline's browser support is built in, enabling visual verification of web application changes without leaving the IDE.

BROWSER SESSION CLASS (Puppeteer-based):
┌─────────────────────────────────────────────────────────┐
│  BrowserSession                                          │
├─────────────────────────────────────────────────────────┤
│                                                           │
│  doAction(action) → {screenshot, consoleLog, result}     │
│                                                           │
│  SUPPORTED ACTIONS:                                       │
│  ┌──────────────┬───────────────────────────────────────┐ │
│  │ launch       │ Start browser (local headless or      │ │
│  │              │ connect to remote instance)            │ │
│  ├──────────────┼───────────────────────────────────────┤ │
│  │ navigate     │ Go to URL, wait for load              │ │
│  ├──────────────┼───────────────────────────────────────┤ │
│  │ click        │ Click at (x, y) coordinates           │ │
│  ├──────────────┼───────────────────────────────────────┤ │
│  │ type         │ Enter text into focused element       │ │
│  ├──────────────┼───────────────────────────────────────┤ │
│  │ screenshot   │ Capture viewport as base64 PNG        │ │
│  ├──────────────┼───────────────────────────────────────┤ │
│  │ close        │ End session, cleanup resources        │ │
│  └──────────────┴───────────────────────────────────────┘ │
│                                                           │
│  SCREENSHOT FEEDBACK LOOP:                                │
│  1. Agent performs action (e.g., click button)            │
│  2. Screenshot captured automatically                     │
│  3. Screenshot sent to LLM as base64 image               │
│  4. LLM verifies result visually                          │
│  5. Decides next action based on visual state             │
│                                                           │
│  MODES:                                                    │
│  - Local:  Headless Chromium, isolated environment        │
│  - Remote: Connect to existing browser (DevTools)         │
└─────────────────────────────────────────────────────────┘
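The screenshot feedback loop can be sketched independently of Puppeteer. In the example below, the browser and the model are hypothetical stand-in callables, and `ActionResult` mirrors only two of the fields named in `doAction`'s result. The point is the control flow, not Cline's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ActionResult:
    screenshot: str   # base64 PNG in the real system; any string here
    console_log: str


# Hypothetical interfaces: a browser executes an action dict, and a model
# looks at the latest result and returns the next action, or None when done.
Browser = Callable[[dict], ActionResult]
Model = Callable[[ActionResult], Optional[dict]]


def run_browser_loop(browser: Browser, model: Model, first: dict, max_steps: int = 10) -> list[dict]:
    """Act, capture a screenshot, let the model decide, and repeat."""
    history, action = [], first
    for _ in range(max_steps):
        if action is None:
            break
        history.append(action)
        result = browser(action)   # steps 1-2: perform action, capture screenshot
        action = model(result)     # steps 3-5: model inspects it, picks next action
    return history


def fake_browser(action: dict) -> ActionResult:
    return ActionResult(screenshot=f"shot-after-{action['type']}", console_log="")


def fake_model(result: ActionResult) -> Optional[dict]:
    script = {
        "shot-after-launch": {"type": "click", "x": 10, "y": 20},
        "shot-after-click": {"type": "close"},
    }
    return script.get(result.screenshot)


steps = run_browser_loop(fake_browser, fake_model, {"type": "launch", "url": "http://localhost:3000"})
```

The `max_steps` cap is an added safeguard in this sketch so that a model which never signals completion cannot loop forever.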

Shadow Git Checkpoint System

CHECKPOINT ARCHITECTURE:
┌───────────────────────────────────────────────────────────┐
│              Shadow Git Repository                         │
│  (Separate .git, runs alongside user's actual git repo)   │
├───────────────────────────────────────────────────────────┤
│                                                             │
│  Tool Call #1 (file write)                                  │
│       │                                                     │
│       ▼                                                     │
│   Checkpoint ──▶ Checkpoint ──▶ Checkpoint ──▶ ...         │
│      #1             #2             #3                       │
│   (write to       (terminal      (browser                   │
│    auth.ts)       command)       navigate)                   │
│                                                             │
│   USER CONTROL:                                             │
│   - "Undo last change" → revert to Checkpoint #2           │
│   - "Start over" → revert to Checkpoint #1                 │
│   - Granular per-tool-call rollback                        │
│                                                             │
│   IMPLEMENTATION:                                           │
│   - git add -A && git commit (shadow repo)                │
│   - Rollback: git reset --hard <checkpoint_sha>           │
│   - Does NOT affect user's actual git history              │
│   - Restores both workspace files and Cline task state     │
└───────────────────────────────────────────────────────────┘
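One way to keep checkpoints out of the user's history is to point git at a separate git directory while sharing the same working tree. The sketch below only builds the command lines, using git's `--git-dir` and `--work-tree` options. The helper names and paths are hypothetical, and whether Cline wires its shadow repository this way is an assumption.

```python
from pathlib import Path


def shadow_git(shadow_dir: Path, workspace: Path, *args: str) -> list[str]:
    """Build a git command that uses a separate git dir over the same work tree."""
    return ["git", f"--git-dir={shadow_dir}", f"--work-tree={workspace}", *args]


def checkpoint_commands(shadow_dir: Path, workspace: Path, label: str) -> list[list[str]]:
    """Commands to snapshot the workspace into the shadow repository."""
    return [
        shadow_git(shadow_dir, workspace, "add", "-A"),
        # --allow-empty records a checkpoint even when no files changed,
        # e.g. after a terminal command or browser action.
        shadow_git(shadow_dir, workspace, "commit", "--allow-empty", "-m", label),
    ]


def rollback_command(shadow_dir: Path, workspace: Path, sha: str) -> list[str]:
    """Command to restore workspace files to a checkpoint."""
    return shadow_git(shadow_dir, workspace, "reset", "--hard", sha)


cmds = checkpoint_commands(Path("shadow.git"), Path("ws"), "tool call 1")
undo = rollback_command(Path("shadow.git"), Path("ws"), "abc123")
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the shadow repository has been initialized. Since the user's own `.git` directory is never named in these commands, their history is left alone.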

Special Considerations

XML Tool Calling Fallback: For models without native JSON function calling (e.g., older local models via Ollama), Cline falls back to XML-formatted tool calls parsed from the model's text output. This enables compatibility with virtually any LLM.
Generative Streaming UI: Real-time visualization of diffs, terminal output, and browser screenshots as the agent works. The React webview updates live, showing the agent's "thought process" in real time.
33+ LLM Providers: Supports OpenRouter, Anthropic, OpenAI, Google, AWS Bedrock, Azure, Ollama, LM Studio, vLLM, and many more. Provider-specific optimizations for token counting and feature support.
Shadow Git: Every tool execution creates a checkpoint in a separate git repository. Users can roll back to any specific tool call, not just the last prompt. This granularity is unmatched by other agents.
Multi-Workspace: @workspace:path syntax enables working across multiple directories in a monorepo. Cline tracks context across workspace boundaries.
Auto-Approve System: Configurable per-tool approval: read operations can be auto-approved while write operations require confirmation. "YOLO mode" bypasses all approvals for trusted environments.

Implementation Detail: XML Tool Calling

Cline's XML fallback mechanism works by wrapping tool invocations in XML tags within the model's text stream: <tool_use><name>write_to_file</name><path>src/app.ts</path><content>...</content></tool_use>. The parser extracts these tags in real-time from the streaming response, enabling tool execution to begin before the full response completes. This approach makes Cline compatible with models that only support text completion, not function calling—a critical advantage for local/self-hosted model deployments.
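A minimal sketch of this kind of incremental extraction follows, using the tag layout from the example above. The `StreamingToolParser` class is hypothetical and simplified: it assumes flat, non-nested fields and that a field's content never contains its own closing tag.

```python
import re

TOOL_BLOCK = re.compile(r"<tool_use>(.*?)</tool_use>", re.DOTALL)
FIELD = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


class StreamingToolParser:
    """Accumulates streamed text and returns tool calls as soon as each one closes."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, chunk: str) -> list[dict]:
        self._buffer += chunk
        calls = []
        while True:
            match = TOOL_BLOCK.search(self._buffer)
            if match is None:
                break
            # Each <field>value</field> pair inside the block becomes a dict entry.
            calls.append(dict(FIELD.findall(match.group(1))))
            # Drop everything up to the end of the parsed block.
            self._buffer = self._buffer[match.end():]
        return calls


parser = StreamingToolParser()
first = parser.feed("I'll write the file. <tool_use><name>write_to_file</name><path>src/")
second = parser.feed("app.ts</path><content>export {}</content></tool_use> Done.")
```

The first chunk yields nothing because the block is still open. The second yields the complete call even though trailing text is still arriving.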

4. Codex CLI

Architecture Overview

Codex CLI is OpenAI's open-source Rust-based coding agent, distinguished by its OS-native sandboxing (Seatbelt on macOS, Landlock+seccomp on Linux, restricted tokens on Windows), Op/Event protocol separating frontend from core logic, and network-disabled-by-default security posture. The Rust crate architecture enables maximum code reuse across CLI, TUI, exec, and MCP server modes.

CODEX CLI ARCHITECTURE
================================================================

codex-rs/ CRATE STRUCTURE

FRONTEND LAYER
  - cli/    (REPL)
  - tui/    (Ratatui terminal UI)
  - exec/   (non-interactive)
      │
      │  Op / Event Protocol
      ▼
CORE LAYER
  core/
    - ThreadManager      (conversation state)
    - ModelClient        (OpenAI API comms)
    - ToolOrchestrator   (sandboxed execution)
      │
      ▼
SECURITY LAYER
  - linux-sandbox        (Landlock + seccomp)
  - windows-sandbox-rs   (Restricted Tokens + Jobs)
  - execpolicy           (approval engine)
  - Seatbelt             (macOS sandbox profiles)
      │
      ▼
SERVICES
  - mcp-server/      (MCP proto)
  - file-search/     (repo search indexing)
  - otel/            (OpenTelemetry)
  - keyring-store/   (credential storage)

Op/Event Protocol

Codex CLI separates the frontend (CLI, TUI, exec) from the core agent via a typed message protocol. Frontends send Ops (requests) and receive Events (responses). This enables the same agent core to be driven by different UIs, automated scripts, or even MCP servers.

OP/EVENT PROTOCOL:

┌──────────────────┐              ┌──────────────────┐
│    FRONTEND      │   Op (Req)   │      CORE        │
│                  │ ────────────▶│                  │
│  CLI / TUI /     │              │  ThreadManager   │
│  exec / MCP      │              │  ModelClient     │
│                  │ ◀────────────│  ToolOrchestrator│
│                  │  Event (Res)  │                  │
└──────────────────┘              └──────────────────┘

Op Types:
  - UserMessage(text)          → Submit user prompt
  - ApproveToolCall(id)        → Approve pending tool
  - DenyToolCall(id, reason)   → Deny pending tool
  - Cancel()                   → Abort current operation

Event Types:
  - AgentMessage(text)         → Agent response text
  - ToolCallRequested(id, fn)  → Needs approval
  - ToolCallExecuting(id)      → Approved, running
  - ToolCallResult(id, output) → Execution complete
  - SessionComplete()          → Agent done
  - Error(message)             → Error occurred
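The message shapes listed above can be modeled as plain data types with a core that consumes Ops and yields Events. The sketch below is written in Python for brevity (Codex itself is Rust) and covers only a subset of the listed types. `ToyCore` and its behavior are hypothetical stand-ins, not the real core crate.

```python
from dataclasses import dataclass
from typing import Iterator, Union


@dataclass
class UserMessage:
    text: str


@dataclass
class ApproveToolCall:
    id: str


@dataclass
class DenyToolCall:
    id: str
    reason: str


Op = Union[UserMessage, ApproveToolCall, DenyToolCall]


@dataclass
class AgentMessage:
    text: str


@dataclass
class ToolCallRequested:
    id: str
    fn: str


@dataclass
class ToolCallExecuting:
    id: str


@dataclass
class ToolCallResult:
    id: str
    output: str


Event = Union[AgentMessage, ToolCallRequested, ToolCallExecuting, ToolCallResult]


class ToyCore:
    """Stand-in core: every user message triggers one tool call needing approval."""

    def __init__(self) -> None:
        self._pending: dict[str, str] = {}
        self._count = 0

    def submit(self, op: Op) -> Iterator[Event]:
        if isinstance(op, UserMessage):
            self._count += 1
            call_id = f"call-{self._count}"
            self._pending[call_id] = "shell"
            yield ToolCallRequested(call_id, "shell")
        elif isinstance(op, ApproveToolCall):
            fn = self._pending.pop(op.id)
            yield ToolCallExecuting(op.id)
            yield ToolCallResult(op.id, f"{fn} finished")
        elif isinstance(op, DenyToolCall):
            self._pending.pop(op.id)
            yield AgentMessage(f"Skipped {op.id}: {op.reason}")


core = ToyCore()
requested = list(core.submit(UserMessage("run the tests")))
events = list(core.submit(ApproveToolCall(requested[0].id)))
```

Any frontend that can construct these Ops and render these Events can drive the core, which is the property the protocol is designed to provide.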

Sandbox Implementation Details

SANDBOX MODES:
┌──────────────────────────────────────────────────────────────┐
│ read-only          │ Read anywhere, write NOWHERE             │
│                    │ Network: blocked                          │
│                    │ Use: Code review, analysis                │
├────────────────────┼──────────────────────────────────────────┤
│ workspace-write    │ Read anywhere, write to workspace + /tmp │
│ (DEFAULT)          │ Network: blocked by default              │
│                    │ Use: Normal development                   │
├────────────────────┼──────────────────────────────────────────┤
│ danger-full-access │ No restrictions whatsoever                │
│                    │ Network: enabled                          │
│                    │ Use: Isolated VMs / CI containers only    │
└────────────────────┴──────────────────────────────────────────┘

PLATFORM IMPLEMENTATIONS:

macOS (Seatbelt):
  - sandbox-exec with Scheme-like profile
  - (version 1) (deny default)
  - (allow file-read* (subpath "/usr"))
  - (allow file-read* file-write* (subpath "/workspace"))
  - (deny network*)

Linux (Landlock + seccomp):
  - Kernel 5.13+ required for Landlock
  - Filesystem rules: per-path read/write/execute
  - seccomp: syscall-level filtering (blocks network syscalls)
  - codex-linux-sandbox: separate setuid binary
  - Falls back to Docker if kernel too old

Windows:
  - CreateRestrictedToken() API
  - Job objects for process resource limits
  - WSL preferred for full Linux sandboxing semantics
  - windows-sandbox-rs crate handles token creation
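
To make the macOS profile lines above concrete, the sketch below assembles the same Seatbelt profile text from lists of readable and writable roots and builds (but does not run) a sandbox-exec command line. It is an illustration under stated assumptions, not the profile Codex ships: the helper names are hypothetical, and a usable profile would need further allowances (process execution, system libraries, and so on).

```python
from typing import List


def _quote(path: str) -> str:
    """Quote a path as a string literal for the profile language."""
    return '"' + path.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_seatbelt_profile(read_roots: List[str], write_roots: List[str]) -> str:
    """Deny everything by default, then allow reads/writes under the given roots."""
    lines = ["(version 1)", "(deny default)"]
    for root in read_roots:
        lines.append(f"(allow file-read* (subpath {_quote(root)}))")
    for root in write_roots:
        lines.append(f"(allow file-read* file-write* (subpath {_quote(root)}))")
    lines.append("(deny network*)")
    return "\n".join(lines)


def sandboxed_argv(profile: str, command: List[str]) -> List[str]:
    """argv for running `command` under the profile on macOS (not executed here)."""
    return ["sandbox-exec", "-p", profile, *command]


if __name__ == "__main__":
    print(build_seatbelt_profile(["/usr"], ["/workspace"]))
```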

Special Considerations

Network Disabled by Default: The most restrictive default posture among the agents profiled here. With no outbound access, the agent cannot fetch attacker-controlled pages (a common prompt-injection vector), exfiltrate code, or pull a compromised dependency. Network access can be explicitly enabled with the --network flag when a task needs it (e.g., npm install).
JSONL Session Format: RolloutRecorder writes every conversation turn, tool call, and result to a JSONL file. Enables complete session replay, audit trails, and debugging. Sessions can be replayed for testing or compliance review.
OpenTelemetry Integration: Enterprise audit logging via OTLP exporter. Traces every tool call, model request, and approval decision without weakening the security sandbox. Configurable to exclude user prompts for privacy.
Code Review Mode: codex review analyzes PRs in read-only sandbox mode. Reviews code changes, identifies issues, and provides feedback without any write access—ideal for automated PR review in CI pipelines.
Crate Reuse: The Rust crate architecture (core, cli, tui, exec, mcp-server) means new frontends can be built by implementing the Op/Event protocol against the core crate. This is why Codex supports CLI, TUI, exec, and MCP server modes from the same codebase.
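
The JSONL session format described above is easy to picture: one JSON object per line, appended as the session proceeds and read back in order for replay. The sketch below shows the general technique in Python. The record shape (`kind`, `payload`) and both function names are assumptions made for illustration; they are not RolloutRecorder's actual schema or API.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def append_record(path: Path, kind: str, payload: Dict[str, Any]) -> None:
    """Append one session event as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"kind": kind, "payload": payload}) + "\n")


def replay(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield recorded events in the order they were written, skipping blank lines."""
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

Append-only lines mean a crash loses at most the record being written, and audit tooling can process a session with ordinary line-oriented tools.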

Architectural Insight: Security Through Architecture

Codex CLI approaches security differently from the other agents in this section. Where Claude Code, Cline, and Goose rely on permission workflows (asking the user before dangerous actions), Codex relies on OS-level enforcement: the kernel blocks unauthorized actions regardless of what the agent attempts. Even a prompt-injected agent is therefore confined to what the sandbox policy permits. The trade-off is reduced flexibility: operations like npm install require explicitly enabling network access, which adds friction to the development workflow. For high-security environments (financial services, healthcare, government), this trade-off is strongly favorable.

5. Droid (Factory.ai)

Architecture Overview

Droid by Factory.ai is a proprietary, LLM-agnostic multi-model agent that holds the #1 position on Terminal-Bench (58.8% with Opus 4.1). Its key differentiators are the proprietary HyperCode & ByteRank codebase retrieval system, specialized Droid variants for different tasks, and the DroidShield compliance layer with ISO 42001/SOC 2 certifications. Droid uses hierarchical prompting with model-specific optimizations.

DROID ARCHITECTURE (Factory.ai)
================================================================

USER INTERFACES
  - CLI (Terminal)
  - IDE Extensions
  - Droid Exec (headless)
  - CI/CD (GitHub Action)
        │
        ▼
MULTI-MODEL COMPOSITION

  HIERARCHICAL PROMPTING
    Layer 1: Tool Descriptions
      └─ What tools are available and how to use them
    Layer 2: System Prompts
      └─ Agent persona, project context, AGENTS.md
    Layer 3: System Notifications
      └─ Runtime events, errors, status updates

  MODEL-SPECIFIC OPTIMIZATIONS
    Claude Opus/Sonnet:  XML edits, extended thinking
    GPT-5:               JSON edits, structured outputs
    Gemini:              Long ctx optimized
    GLM 4.6:             Chinese codebase
        │
        ▼
HYPERCODE & BYTERANK RETRIEVAL

  MULTI-RESOLUTION CODEBASE REPRESENTATION
    EXPLICIT GRAPH RELATIONSHIPS
      - AST structure
      - Call graphs
      - Import chains
      - Type hierarchy
    IMPLICIT LATENT SPACE SIMILARITY
      - Code embeddings
      - Semantic search
      - Pattern matching
      - Fuzzy retrieval
    Combined: Multi-hop retrieval across both representations
              for comprehensive context
        │
        ▼
SPECIALIZED DROIDS
  Code Droid:         Features, Bug fixes, Refactors
  Knowledge Droid:    Research, Docs, Wikis
  Reliability Droid:  Incident response, Root cause, Monitoring fixes
  Product Droid:      Backlog management, Spec generation,
                      Story breakdown, Acceptance criteria
        │
        ▼
DROIDSHIELD (Compliance & Security)
  Real-time Static Analysis:
    - Code scanning
    - Vuln detection
  Sandboxed Execution:
    - Isolated containers
    - Resource limits
    - Network policies
  Certifications: ISO 42001 · SOC 2 · ISO 27001
  Compliance: GDPR · CCPA

HyperCode & ByteRank

Factory.ai's proprietary codebase retrieval system combines two complementary approaches to achieve high-precision context retrieval. Unlike simpler approaches (tree-sitter maps, grep-based search), HyperCode operates on multi-resolution representations that capture both structural relationships and semantic similarity.

HYPERCODE & BYTERANK:

EXPLICIT GRAPH RELATIONSHIPS:
  - Full AST parse of all repository files
  - Call graph: which functions invoke which
  - Import chains: module dependency resolution
  - Type hierarchy: inheritance, interfaces, generics
  - File-level dependency graph

IMPLICIT LATENT SPACE SIMILARITY:
  - Code embeddings (proprietary model)
  - Semantic similarity search
  - Pattern matching across codebases
  - "This code is similar to..." retrieval

COMBINED RETRIEVAL:
  Query: "Fix the authentication bug in login flow"
  1. Graph lookup: auth module → login function → dependencies
  2. Semantic search: code related to "authentication" + "login"
  3. Merge & rank: ByteRank scores relevance across both
  4. Return: Precise context with multi-hop dependencies

ADVANTAGE OVER TREE-SITTER (Aider) / GREP (most agents):
  - Understands semantic relationships, not just text
  - Multi-hop: finds code 2-3 dependency levels away
  - Learns codebase patterns over time
  - Trade-off: proprietary, requires Factory infrastructure
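
HyperCode and ByteRank are proprietary, so their internals are not public. The sketch below only illustrates the general idea described above, combining a multi-hop graph lookup with embedding similarity and merging the two scores. Everything in it is an assumption for illustration: the function names, the 1/(1 + hops) graph score, the linear blend weight `alpha`, and the toy two-dimensional embeddings.

```python
import math
from collections import deque
from typing import Dict, List, Set, Tuple


def graph_hops(graph: Dict[str, Set[str]], seeds: List[str], max_hops: int) -> Dict[str, int]:
    """Breadth-first search over a dependency graph; hop distance from the nearest seed."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, set()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_rank(
    graph: Dict[str, Set[str]],
    seeds: List[str],
    embeddings: Dict[str, List[float]],
    query_vec: List[float],
    max_hops: int = 2,
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Blend structural proximity and semantic similarity into one ranking."""
    dist = graph_hops(graph, seeds, max_hops)
    scored = []
    for name, vec in embeddings.items():
        graph_score = 1.0 / (1 + dist[name]) if name in dist else 0.0
        scored.append((name, alpha * graph_score + (1 - alpha) * cosine(vec, query_vec)))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With seeds such as the login function, the graph term surfaces code a few dependency levels away that a text search would miss, while the embedding term surfaces related code that has no direct edge to the seed.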

Droid Exec & Integrations

DROID EXEC (Headless Mode):
  - Run Droid without interactive terminal
  - Use cases: CI/CD pipelines, cron jobs, pre-commit hooks
  - AGENTS.md: project-level configuration (similar to CLAUDE.md)
  - Install: curl -fsSL https://app.factory.ai/cli | sh

INTEGRATIONS:
  ┌──────────────┬────────────────────────────────────────┐
  │ Category     │ Services                                │
  ├──────────────┼────────────────────────────────────────┤
  │ VCS          │ GitHub, GitLab                          │
  ├──────────────┼────────────────────────────────────────┤
  │ Project Mgmt │ Jira, Notion                            │
  ├──────────────┼────────────────────────────────────────┤
  │ Communication│ Slack                                   │
  ├──────────────┼────────────────────────────────────────┤
  │ Observability│ Datadog, Sentry                         │
  └──────────────┴────────────────────────────────────────┘

PRICING:
  Free:  BYOK (bring your own key)
  Pro:   $20/month (includes credits)
  Teams: $40 base + $10/user/month

Special Considerations

Terminal-Bench #1: Droid achieves 58.8% on Terminal-Bench (with Opus 4.1), the highest score on this comprehensive benchmark spanning coding, build/test, data/ML, systems, networking, security, and CLI workflows across 80 Dockerized tasks.
SWE-bench Lite: 31.67% pass@1 on SWE-bench Lite. Factory stopped running full SWE-bench Verified, citing its Python-only and debugging-only limitations as unrepresentative of real-world coding tasks.
LLM-Agnostic: Supports Claude Opus/Sonnet, GPT-5, Gemini, and GLM 4.6 with model-specific edit format optimizations. Each model is prompted to emit edits in the format it handles most reliably (for example, XML edits for Claude and JSON edits for GPT-5).
Specialized Droids: Code Droid (features/bugs), Knowledge Droid (research/docs), Reliability Droid (incident response), and Product Droid (backlog/specs) each have tailored prompts and tool access for their domain.
AGENTS.md: Project-level context file (analogous to CLAUDE.md) that configures Droid's behavior, coding conventions, and project-specific instructions.
DroidShield: Enterprise compliance layer with real-time static analysis, sandboxed execution, and certifications including ISO 42001 (AI management), SOC 2, ISO 27001, GDPR, and CCPA.

Benchmark Context

Factory.ai actively argues against SWE-bench as a primary benchmark, noting it only tests Python debugging in open-source repositories. Terminal-Bench, where Droid excels, tests a much broader range of tasks across multiple languages and domains. When evaluating Droid's capabilities, Terminal-Bench scores are more representative of its real-world performance than SWE-bench Lite scores.

6. Goose

Architecture Overview

Goose is an MCP-first open-source agent from Block (the parent company of Square, Cash App, and TIDAL). Its defining characteristic is that everything is an MCP extension. Rather than building a large set of built-in tools, Goose delegates all functionality to MCP servers, making it the most extensible agent in the ecosystem with access to 3,000+ MCP servers. It runs locally with a privacy-first design and supports both a desktop app (Electron) and CLI.

GOOSE ARCHITECTURE
================================================================

USER INTERFACES
  Desktop App (Electron)
    - System tray
    - GUI configuration
    - Visual feedback
  CLI (Terminal)
    - goose session
    - goose configure
    - goose run
        │
        ▼
CORE ENGINE

  MCP CLIENT (first-class citizen)
    - All tools are MCP servers
    - stdio and HTTP transport
    - Dynamic server discovery
    - Hot-reload on config change
        │
        ▼
    GitHub MCP:          issues, PRs, repos
    Jira MCP:            ticket, search, create
    Slack MCP:           msg, chan, post
    Playwright MCP:      browse, click, type
    File System MCP:     read, write
    Custom MCP Servers:  any tool

  LLM ROUTING
    - Multi-model: route by task complexity
    - Cost optimization: cheap models for simple tasks
    - Provider agnostic: any model with tool calling

  LOCAL-FIRST EXECUTION
    - All processing on your machine
    - No cloud dependency for core functionality
    - API calls only for LLM inference
    - Privacy: code never leaves your machine

Extension System

Goose's extension system is built entirely on MCP. Extensions are configured via YAML and can be added interactively with goose configure or by editing the config file directly.

EXTENSION CONFIGURATION (~/.config/goose/config.yaml):

extensions:
  # Browser automation
  playwright:
    command: npx @playwright/mcp@latest
    timeout: 300
    description: Browser automation for web testing

  # GitHub integration
  github:
    command: npx -y @modelcontextprotocol/server-github
    env:
      GITHUB_TOKEN: ${GITHUB_TOKEN}

  # Project management
  jira:
    command: npx -y @atlassian/mcp-jira
    env:
      JIRA_TOKEN: ${JIRA_TOKEN}
      JIRA_URL: https://myteam.atlassian.net

  # Database access
  postgres:
    command: npx -y @modelcontextprotocol/server-postgres
    env:
      DATABASE_URL: ${DATABASE_URL}

ADDING EXTENSIONS INTERACTIVELY:
$ goose configure
> Add Extension
> Command-line Extension
> Name: sentry
> Command: npx -y @sentry/mcp-server
> Timeout: 60
> Environment variables? Yes
> SENTRY_AUTH_TOKEN: ***
> Added successfully. Restart session to activate.
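
An extension entry of the shape shown above carries three things a launcher needs: a command line, environment variables with ${VAR} placeholders, and a timeout. The sketch below resolves one entry into an argv list and a concrete environment. It mirrors the simplified config shape used in this section and is not Goose's actual loader or schema; `resolve_extension` and the 300-second default timeout are assumptions.

```python
import shlex
from string import Template
from typing import Any, Dict, List, Mapping, Tuple


def resolve_extension(
    entry: Mapping[str, Any], environ: Mapping[str, str]
) -> Tuple[List[str], Dict[str, str], int]:
    """Turn one extension entry into (argv, env, timeout_seconds).

    Raises KeyError if a ${VAR} placeholder has no value in `environ`,
    so a missing secret fails loudly instead of being passed as an empty string.
    """
    argv = shlex.split(str(entry["command"]))
    env = {
        key: Template(str(value)).substitute(environ)
        for key, value in dict(entry.get("env", {})).items()
    }
    timeout = int(entry.get("timeout", 300))  # assumed default, for illustration
    return argv, env, timeout


if __name__ == "__main__":
    github = {
        "command": "npx -y @modelcontextprotocol/server-github",
        "env": {"GITHUB_TOKEN": "${GITHUB_TOKEN}"},
    }
    print(resolve_extension(github, {"GITHUB_TOKEN": "example-token"}))
```

The resulting argv and env could then be handed to a subprocess that speaks MCP over stdio.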

Special Considerations

Block Origin: Built by Block, Inc. (formerly Square, Inc.), the parent company of Square, Cash App, and TIDAL. Corporate backing of this kind makes sustained maintenance more likely. Goose was originally an internal tool before being open-sourced.
3,000+ MCP Servers: The largest extension ecosystem of any agent. Because everything is an MCP extension, Goose can integrate with any service that has an MCP server—from databases to monitoring tools to communication platforms.
Multi-Model Routing: Goose can route different tasks to different models based on complexity. Simple file reads go to a cheap, fast model; complex architectural decisions go to a more capable (and expensive) model. This optimizes both cost and quality.
Privacy-First: All execution happens locally on the user's machine. Code is never sent to Goose's servers. The only external calls are LLM inference requests to the configured model provider, and those requests carry whatever code is included in the prompt. This makes Goose suitable for proprietary codebases where data residency is a concern, provided the chosen model provider meets the same requirements.
Interactive Configuration: goose configure provides a guided setup experience for adding models, extensions, and environment variables. This lowers the barrier to entry compared to manual config file editing.
Linux Foundation Governance: As an MCP-native agent, Goose benefits from MCP's move to the Linux Foundation's Agentic AI Foundation (AAIF) in December 2025, ensuring neutral governance and long-term protocol stability.
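
The Multi-Model Routing item above describes sending simple tasks to a cheap model and complex ones to a more capable model. The sketch below shows one way such a router could be structured. It is purely illustrative: the complexity heuristic, the keyword list, and the tier names are hypothetical and do not describe how Goose actually chooses a model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelTier:
    name: str            # hypothetical label, not a real model id
    max_complexity: int  # highest complexity score this tier should handle


def estimate_complexity(task: str, files_touched: int) -> int:
    """Crude score: more files and design-heavy verbs push the score up."""
    score = files_touched
    lowered = task.lower()
    for word in ("refactor", "design", "architecture", "migrate"):
        if word in lowered:
            score += 5
    return score


def route(task: str, files_touched: int, tiers: List[ModelTier]) -> ModelTier:
    """Pick the cheapest tier whose ceiling covers the task; fall back to the largest."""
    score = estimate_complexity(task, files_touched)
    for tier in sorted(tiers, key=lambda t: t.max_complexity):
        if score <= tier.max_complexity:
            return tier
    return max(tiers, key=lambda t: t.max_complexity)
```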

Architectural Insight: Everything is MCP

Goose's "everything is MCP" philosophy is a bold architectural bet. The advantage is maximum extensibility: any new capability is just another MCP server away. The disadvantage is that core operations (file I/O, search, execution) have the overhead of MCP's protocol layer, and the agent's capabilities are bounded by the MCP ecosystem's quality. In practice, Goose excels in environments where integration breadth matters more than raw coding speed—for example, a developer who needs to read Jira tickets, check Datadog alerts, browse documentation, and write code in a single session. For pure code-generation tasks, purpose-built tools like Claude Code or Codex CLI may be faster.

7. Letta Code

Architecture Overview

Letta Code is the only agent with persistent, server-side memory that survives across sessions. Built on the Letta framework (formerly MemGPT), it treats memory as a first-class architectural component: agents maintain memory blocks (persona, human, project, skills) that are updated via a dedicated memory() tool, and archival memory backed by a vector database for long-term storage and retrieval. Letta Code is the #1 model-agnostic harness on Terminal-Bench.

LETTA CODE ARCHITECTURE
================================================================

LETTA CODE CLI
  Commands:
    letta-code            → Start interactive session
    letta-code /init      → Deep research, populate memory
    letta-code /remember  → Extract learnings from session
    letta-code /clear     → Clear messages, KEEP memory
    letta-code /skill     → Trigger skill extraction
        │
        ▼
LETTA API SERVER (cloud.letta.com or self-hosted)

  AGENT INSTANCES (stateful)
    Agent "proj-A" MEMORY BLOCKS:
      persona: "I prefer TDD..."
      human:   "Uses TS, Next.js"
      project: "E-comm platform"
      skills:  "API migr TDD wkfl"
    Agent "proj-B" MEMORY BLOCKS:
      persona: "I focus on perf"
      human:   "Uses Go, Postgres"
      project: "Data pipeline"
      skills:  "ETL Bench"
    ...

  ARCHIVAL MEMORY (Vector DB for long-term storage)
    - Conversations too old for context window
    - Code patterns and solutions encountered
    - Project documentation and decisions
    - Searchable via semantic similarity
    - Agent can insert/search/delete entries

  TOOL SYSTEM
    - memory(action, block, updates)  → self-edit
    - archival_insert(content)        → long-term save
    - archival_search(query)          → recall
    - Standard coding tools (file, exec, search)

Memory Tool & Self-Editing System Prompt

Letta Code's most innovative feature is the memory() tool, which allows the agent to edit its own system prompt in real time. Memory blocks are injected into the system prompt and persist across sessions on the Letta server. The agent decides when and what to remember—the user does not directly edit memory blocks.

MEMORY BLOCK STRUCTURE (injected into system prompt):
+========================================+
| SYSTEM PROMPT                          |
+========================================+
| CORE MEMORY (BLOCKS)                   |  ← Agent updates via memory() tool
| - <persona>                            |    Persisted to Letta server
|     I am a coding assistant that       |    Survives across sessions
|     prefers functional programming.    |
|     Always run tests before commits.   |
|   </persona>                           |
| - <human>                              |
|     User prefers TypeScript over JS.   |
|     Works on e-commerce platform.      |
|     Dislikes ORMs, prefers raw SQL.    |
|   </human>                             |
| - <project>                            |
|     Framework: Next.js 14              |
|     Testing: Jest + React Testing Lib  |
|     DB: PostgreSQL with Drizzle ORM    |
|   </project>                           |
+========================================+
| MESSAGES                               |
| * User → Assistant (recent only)       |
+========================================+

MEMORY TOOL INVOCATION:
memory(
  action: "edit",
  block: "human",
  updates: "Add: User prefers Drizzle ORM over Prisma.
            User runs on macOS with Homebrew."
)

memory(
  action: "edit",
  block: "project",
  updates: "Update: Migrated from Jest to Vitest.
            Add: Using Turborepo for monorepo."
)
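
The relationship between memory blocks, the memory() tool, and the system prompt can be sketched in a few lines. The Python below is a toy in-memory model, not the Letta API: the `AgentState` class and its methods are hypothetical, and the "edit" action simply appends text, whereas in the real system the agent decides how a block is rewritten and the state lives on the Letta server.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentState:
    blocks: Dict[str, str] = field(default_factory=dict)  # survives across sessions
    messages: List[str] = field(default_factory=list)     # current conversation only

    def memory(self, action: str, block: str, updates: str) -> None:
        """Toy version of the memory() tool: 'edit' appends to the named block."""
        if action != "edit":
            raise ValueError(f"unsupported action: {action}")
        existing = self.blocks.get(block, "")
        self.blocks[block] = (existing + "\n" + updates).strip()

    def clear(self) -> None:
        """Mimic /clear: drop conversation messages, keep memory blocks."""
        self.messages.clear()

    def render_system_prompt(self, base: str) -> str:
        """Inject every block into the system prompt as a tagged section."""
        parts = [base]
        for name, content in self.blocks.items():
            parts.append(f"<{name}>\n{content}\n</{name}>")
        return "\n".join(parts)
```

Since the prompt is re-rendered from the blocks each turn, an edit made in one session shapes the agent's behavior in every later session.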

Skill Learning System

SKILL LIFECYCLE:
┌─────────────────────────────────────────────────────────────┐
│  1. EXPERIENCE                                               │
│     Work through a complex task with user coaching           │
│     Example: Migrate REST API to GraphQL                     │
│                                                               │
│  2. REFLECT (/skill command)                                 │
│     Agent reviews the conversation for reusable patterns     │
│     "What did I learn that could apply to future tasks?"     │
│                                                               │
│  3. EXTRACT                                                   │
│     Agent identifies and formalizes reusable steps            │
│     Creates structured skill definition                       │
│                                                               │
│  4. STORE                                                     │
│     Skill saved as .md file in .skills/ directory             │
│     Also recorded in "skills" memory block                    │
│                                                               │
│  5. LOAD                                                      │
│     Future sessions load skill via skill tool                 │
│     Agent applies learned patterns automatically              │
└─────────────────────────────────────────────────────────────┘

SKILL FILE (.skills/api-migration/SKILL.md):
---
name: API Migration Pattern
description: Migrate REST APIs to GraphQL
triggers: ["migrate", "graphql", "api upgrade"]
---
# API Migration Skill

## Prerequisites
- Identify all REST endpoints
- Map to GraphQL schema types

## Steps
1. Create GraphQL schema from REST response types
2. Implement resolvers that call existing services
3. Add deprecation notices to REST endpoints
4. Create integration tests for GraphQL endpoints
5. Update client code to use GraphQL queries

## Common Pitfalls
- N+1 query problem: use DataLoader
- Auth middleware: ensure GraphQL context includes auth
- Error handling: map REST errors to GraphQL errors

SKILLS MEMORY BLOCK (in system prompt):
<skills>
Available skills:
- api-migration: Migrate REST to GraphQL (3 uses)
- testing-patterns: TDD workflow for React (7 uses)
- db-schema: Database migration best practices (2 uses)
</skills>
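
A skill file like the one above has two parts: a front-matter header and a Markdown body. The sketch below parses the header and checks a prompt against the `triggers` list. It is a minimal illustration, not Letta Code's loader: the parser handles only flat `key: value` lines and JSON-style lists, and using triggers as case-insensitive substring matches is an assumption about how they might be applied.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_skill(text: str) -> Tuple[Dict[str, Any], str]:
    """Split a SKILL.md into (front matter, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = lines.index("---", 1)  # closing delimiter; raises ValueError if missing
    meta: Dict[str, Any] = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        value = value.strip()
        # Lists such as ["a", "b"] are decoded as JSON; everything else stays a string.
        meta[key.strip()] = json.loads(value) if value.startswith("[") else value
    return meta, "\n".join(lines[end + 1:])


def matching_skills(prompt: str, skills: List[Dict[str, Any]]) -> List[str]:
    """Names of skills with at least one trigger appearing in the prompt."""
    lowered = prompt.lower()
    return [
        str(skill["name"])
        for skill in skills
        if any(trigger.lower() in lowered for trigger in skill.get("triggers", []))
    ]
```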

Special Considerations

#1 Model-Agnostic Harness: Letta Code is the top-performing model-agnostic harness on Terminal-Bench, that is, the highest-ranked harness that is not tied to a single model provider. This is evidence that a memory-first architecture can improve results across different underlying LLMs, although the ranking alone does not isolate memory as the cause.
Server-Side State: All agent state (memory blocks, archival memory, conversation history) lives on the Letta API server, not the local filesystem. This enables accessing the same agent from different machines, sharing agents across teams, and centralized management.
/init Command: Triggers a deep research phase where the agent explores the entire codebase, reads README files, analyzes project structure, and populates memory blocks with comprehensive project knowledge. This is the "onboarding" step for a new project.
/remember Command: Explicitly triggers the agent to extract learnings from the current conversation and persist them to memory blocks. Use after completing a complex task to ensure the knowledge is retained for future sessions.
/clear Preserves Memory: /clear only clears the conversation messages, not the memory blocks. This is a critical distinction: the agent retains all learned preferences, project knowledge, and skills even after clearing the chat. This enables fresh conversations that still benefit from accumulated knowledge.
Self-Hosted Option: The Letta server can be self-hosted for organizations that cannot use cloud services. This provides full control over data residency while maintaining the persistent memory architecture.

Architectural Insight: Memory as a First-Class Citizen

Letta Code represents a fundamentally different paradigm from session-based agents. In Claude Code or Codex CLI, each session starts from scratch (with only CLAUDE.md/config files providing continuity). In Letta Code, the agent remembers: your coding preferences, project conventions, past decisions, learned skills, and even mistakes to avoid. Over time, a Letta agent becomes increasingly tailored to its user and project. The trade-off is complexity (requires a running Letta server) and the risk of memory staleness (outdated preferences persisting). The /clear command's memory-preserving behavior is the key UX innovation: it gives users a fresh conversation context while maintaining the agent's accumulated knowledge—analogous to a human developer starting a new day but retaining their project experience.

Cross-Agent Architecture Comparison

Capability	Aider	Claude Code	Cline	Codex CLI	Droid	Goose	Letta
Multi-Model	Yes (Architect/Editor)	Single (Sonnet/Opus)	Any (33+ providers)	Single (GPT)	Yes (4+ models)	Yes (routing)	Any (agnostic)
Sandboxing	None	Permission workflow	Shadow git + approval	OS-native	DroidShield	Local trust	Server-side
Memory	Repo map (session)	CLAUDE.md (file)	Session only	JSONL replay	AGENTS.md	Session only	Persistent blocks
Browser	URL fetch	Chrome (Computer)	Puppeteer native	None (network off)	Via integrations	Playwright MCP	Via tools
MCP Support	No	Yes (client)	Yes (client)	Yes (server)	No	Core arch	Yes
Hooks/Events	No	8 events	Auto-approve config	Op/Event protocol	Hierarchical prompts	MCP-based	Memory tool
IDE Integration	--watch-files	VS Code + JetBrains	VS Code native	Standalone	IDE extensions	Desktop + CLI	CLI only
License	Apache-2.0	Proprietary	Apache-2.0	Apache-2.0	Proprietary	Apache-2.0	Apache-2.0

Key Takeaways

Best for security-sensitive environments: Codex CLI (OS-native sandboxing, network disabled by default)
Best for long-running projects: Letta Code (persistent memory blocks, skill learning across sessions)
Best for extensibility: Goose (MCP-first, 3,000+ extensions) or Claude Code (8-event hook system)
Best for multi-model composition: Aider (Architect/Editor) or Droid (model-specific optimizations)
Best for IDE integration: Cline (native VS Code, shadow git, browser automation)
Best for enterprise compliance: Droid (ISO 42001, SOC 2, ISO 27001, GDPR, CCPA)
Best for benchmark performance: Droid (Terminal-Bench #1) or Claude Code (200k context, parallel tools)


