This report is split into six focused parts for manageable reading. Each part covers a specific aspect of coding agent engineering:
Tool inventories for all 13 agents, ACI design principles, edit patterns (search-replace vs diff vs whole-file), and tool success rate analysis.
Claude Code's 8-event hook lifecycle, MCP ecosystem, OS-level sandboxing (Seatbelt/Landlock), approval workflows, and checkpoint systems.
Session-based vs persistent memory, Letta's memory blocks, Replit's trajectory compression, Aider's repo map, and compaction strategies.
Architecture diagrams and analysis for Aider, Claude Code, Cline, Codex CLI, Droid, Goose, and Letta Code.
Architecture diagrams and analysis for OpenCode, OpenManus, Qwen Code, Replit Agent, Vibe CLI, and Warp.
Context window management, error recovery patterns, prompt injection risks, enterprise deployment, and decision framework for choosing an agent.
Based on analysis of all 13 agents, here is the recommended architecture for a competitive coding agent. This synthesizes the best patterns: Claude Code's hook system, Codex CLI's sandboxing, Letta's memory blocks, OpenCode's LSP integration, Warp's multi-model composition, and OpenManus's ReAct hierarchy.
| Component | Recommended | Rationale |
|---|---|---|
| Language | Rust + TypeScript | Rust for core/sandbox (Codex CLI pattern), TS for extensions/UI (Claude Code pattern) |
| TUI Framework | Ratatui (Rust) or Bubble Tea (Go) | Native performance, rich terminal UI; Warp uses GPU-rendered Rust |
| IDE Extension | VS Code Extension API | Widest reach; Cline and Qwen Code prove the model |
| IPC Protocol | gRPC with Protocol Buffers | Type-safe, efficient; OpenCode and Warp validate this approach |
| State Storage | SQLite + JSONL | Local-first, portable, auditable; Codex CLI uses RolloutRecorder |
| Tool Extensions | MCP (Model Context Protocol) | Industry standard, 3000+ servers, Linux Foundation governance |
| Agent Framework | ReAct loop with stuck detection | OpenManus validates 4-level hierarchy: Base → ReAct → ToolCall → Domain |
| Sandboxing | Seatbelt (macOS) + Landlock (Linux) | Codex CLI proves OS-native sandboxing is production-ready |
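The "ReAct loop with stuck detection" row can be illustrated with a minimal sketch. This is not OpenManus's implementation; `ReActLoop`, `policy`, `stuck_window`, and the hint text are hypothetical names chosen for illustration. The `policy` callable stands in for the model call, and "stuck" is assumed to mean the same action producing the same observation several times in a row.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple

Action = Tuple[str, str]  # hypothetical shape: (tool_name, argument)


@dataclass
class ReActLoop:
    policy: Callable[[List[str]], Action]    # stand-in for the model call
    tools: Dict[str, Callable[[str], str]]   # tool name -> implementation
    max_steps: int = 20                      # hard budget so the loop always ends
    stuck_window: int = 3                    # identical steps that count as "stuck"
    history: List[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.history = [f"task: {task}"]
        recent: Deque[Tuple[Action, str]] = deque(maxlen=self.stuck_window)
        for _ in range(self.max_steps):
            # Reason: the policy picks the next action from the transcript so far.
            name, arg = self.policy(self.history)
            if name == "finish":
                return arg
            # Act: run the tool, reporting unknown tools as an observation.
            tool = self.tools.get(name)
            observation = tool(arg) if tool else f"error: unknown tool {name!r}"
            # Observe: record the result for the next reasoning step.
            self.history.append(f"{name}({arg}) -> {observation}")
            recent.append(((name, arg), observation))
            # Stuck detection: the last N steps were all the same action and result.
            if len(recent) == self.stuck_window and len(set(recent)) == 1:
                self.history.append("hint: repeated step with no progress; change strategy")
                recent.clear()
        return "stopped: step budget exhausted"
```

In a real agent the hint would be injected into the model prompt. Comparing exact (action, observation) pairs is the simplest possible check; a production harness would likely also need to catch near-duplicates.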
| Agent | Vendor | Type | License | GitHub Stars (approx.) | Key Differentiator |
|---|---|---|---|---|---|
| Aider | Open Source | CLI | Apache-2.0 | ~25k | Architect/Editor dual-model, repo map via tree-sitter |
| Claude Code | Anthropic | CLI | Proprietary | — | 8-event hook system, 18 built-in tools, subagent dispatch |
| Cline | Community | VS Code Ext | Apache-2.0 | ~30k | Shadow git checkpoints, 33+ LLM providers, browser automation |
| Codex CLI | OpenAI | CLI (Rust) | Apache-2.0 | ~18k | OS-native sandboxing (Seatbelt/Landlock/seccomp) |
| Droid | Factory.ai | CLI | Proprietary | ~504 | HyperCode/ByteRank retrieval, Terminal-Bench #1 |
| Goose | Block (Square) | CLI + Desktop | Apache-2.0 | ~10k | MCP-first architecture, 3000+ extension ecosystem |
| Letta Code | Letta | CLI | Apache-2.0 | — | Persistent memory blocks, archival vector DB, skill learning |
| OpenCode | Community | CLI (Go) | MIT | ~5k | LSP integration, 75+ LLM providers, client-server design |
| OpenManus | MetaGPT | Framework | MIT | ~53.9k | 4-level ReAct hierarchy, PlanningFlow orchestration |
| Qwen Code | Alibaba | CLI (TS) | Apache-2.0 | ~17.9k | Free 2000 req/day, forked from Gemini CLI, Docker sandbox |
| Replit Agent | Replit | Web IDE | Proprietary | — | Python DSL tool invocation, 200-min autonomy, self-testing |
| Vibe CLI | Mistral | CLI | Apache-2.0 | — | Cheapest inference ($0.40/1M input), single-GPU deployable |
| Warp | Warp | ADE | Proprietary | — | Full Terminal Control (interactive PTY), GPU-rendered Rust UI |
| Agent | SWE-bench Verified | Terminal-Bench | Other Benchmarks / Notes |
|---|---|---|---|
| Warp | 75.8% (GPT-5) | 52% (#1 when reported) | 3.2B total lines edited |
| Codex CLI | 74.9% (GPT-5) | 42.8% | Code review mode |
| Vibe CLI | 72.2% (Devstral 2) | — | 68.0% (Small 2, 24B) |
| Claude Code | ~72% (Sonnet 4) | 43.2% | 200k context window |
| Qwen Code | 67–69.6% | 37.5% (480B) | SOTA open-source agentic coding |
| Droid | 31.67% (Lite) | 58.8% (#1) | Top 3 across 3 models |
| Replit Agent | 2nd place (Lite) | — | 135 apps in 24h (Rokt) |
| OpenManus | — | — | GAIA: 74.3% |
| Aider | — | — | Architect+Editor: 85% (polyglot) |
| Cline | — | — | 33+ provider support |
| OpenCode | — | — | LSP-assisted editing |
| Goose | — | — | 3000+ MCP extensions |
| Letta Code | — | — | #1 model-agnostic on Terminal-Bench |
SWE-bench scores depend heavily on the underlying model, not just the agent harness. Factory (Droid) stopped running SWE-bench, citing its Python-only, debugging-only scope as limitations. Terminal-Bench provides a more holistic evaluation spanning coding, build/test, data/ML, systems, networking, security, and CLI workflows across 80 Dockerized tasks.
| Agent | Pricing Model | Free Tier | Open Source |
|---|---|---|---|
| Aider | Free (BYOK) | ✓ Unlimited | ✓ Apache-2.0 |
| Claude Code | $20–200/mo (API usage) | ✗ | ✗ Proprietary |
| Cline | Free (BYOK) | ✓ Unlimited | ✓ Apache-2.0 |
| Codex CLI | ChatGPT plan included | With plan | ✓ Apache-2.0 |
| Droid | Free–$40/mo + overage | ✓ (BYOK) | ✗ Proprietary |
| Goose | Free (BYOK) | ✓ Unlimited | ✓ Apache-2.0 |
| Letta Code | Free (BYOK + Letta server) | ✓ Unlimited | ✓ Apache-2.0 |
| OpenCode | Free (BYOK) | ✓ Unlimited | ✓ MIT |
| OpenManus | Free (BYOK) | ✓ Unlimited | ✓ MIT |
| Qwen Code | Free (OAuth: 2000 req/day) | ✓ 2000/day | ✓ Apache-2.0 |
| Replit Agent | $0–35/mo + credits | Limited daily | ✗ Proprietary |
| Vibe CLI | Free API (promotional) | ✓ | ✓ Apache-2.0 |
| Warp | Free–Pro (100–10k req/mo) | ✓ 100 req/mo | ✗ Proprietary |
| Model | Input (USD per 1M tokens) | Output (USD per 1M tokens) | Used By |
|---|---|---|---|
| Devstral Small 2 (24B) | $0.10 | $0.30 | Vibe CLI |
| Devstral 2 (123B) | $0.40 | $2.00 | Vibe CLI |
| Qwen3-Coder-480B | Free (OAuth) | Free (OAuth) | Qwen Code |
| Claude Sonnet 4 | $3.00 | $15.00 | Claude Code, Warp, Droid |
| Claude Opus 4 | $15.00 | $75.00 | Claude Code, Droid |
| GPT-5 | ~$2.50 | ~$10.00 | Codex CLI, Warp, Droid |
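To make the price differences concrete, here is a small worked calculation. It assumes the table's figures are USD per 1M tokens (consistent with the "$0.40/1M input" figure quoted for Vibe CLI) and treats the approximate GPT-5 prices as exact. The workload of 2M input tokens and 100k output tokens is a hypothetical session size, not a measured one, and `session_cost` is an illustrative helper.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
# The GPT-5 entries are marked approximate in the table.
PRICES = {
    "Devstral Small 2 (24B)": (0.10, 0.30),
    "Devstral 2 (123B)": (0.40, 2.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Opus 4": (15.00, 75.00),
    "GPT-5": (2.50, 10.00),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one session for the given model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent session: heavy context reads, comparatively little output.
    for model in PRICES:
        cost = session_cost(model, input_tokens=2_000_000, output_tokens=100_000)
        print(f"{model:<24} ${cost:,.2f}")
```

For this assumed workload the estimate is about $1.00 on Devstral 2, $7.50 on Claude Sonnet 4, and $37.50 on Claude Opus 4. Input tokens dominate the bill, because agent sessions re-read context far more than they generate.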
Document Version: 3.0 Enhanced Edition · January 2026 · Classification: Internal Engineering Document