Coding Agent Engineering Analysis

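This report is split into six focused parts, each covering one aspect of coding agent engineering: Tools & Agent Computer Interface; Hooks, MCP & Security; Memory & Context Management; Agent Deep-Dives: A–L; Agent Deep-Dives: O–W; and Production & Enterprise.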

Key Technical Findings

Hook Systems Are Critical: Claude Code's 8-event hook lifecycle (UserPromptSubmit, PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, PreCompact, Notification) provides the most comprehensive extensibility model of the 13 agents analyzed.
Sandboxing Divergence: Codex CLI uses OS-native sandboxing (Seatbelt/Landlock/seccomp). Warp uses cloud sandboxes via Namespace. Most others rely on permission-based approval workflows.
Memory Architecture Split: Letta Code uses persistent memory blocks, Replit uses trajectory compression, and most other agents keep session-based memory. Each approach trades continuity against context efficiency differently.
Multi-Model Composition Wins: Warp, Droid, and Replit compose multiple LLMs (reasoning + coding models). Warp found that a single agent with focused tools outperforms multi-agent approaches.
LSP Integration Emerging: OpenCode's Language Server Protocol integration provides deterministic code intelligence, reducing edit hallucination.
MCP as Universal Standard: MCP is now governed by the Agentic AI Foundation (AAIF) under the Linux Foundation, and more than 3,000 MCP servers exist. Every major agent supports it.
Open-Source Parity: Qwen Code (67–69.6% on SWE-bench Verified) shows open-source agents approaching proprietary results, and OpenManus (53.9k GitHub stars) shows comparable community adoption.
Tool Invocation Innovation: Replit Agent generates Python DSL code instead of using JSON function calling, achieving a ~90% tool invocation success rate.
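To make the last finding concrete, here is a minimal sketch of what code-style tool invocation could look like: the model emits a single Python call expression, and the agent parses and dispatches it instead of decoding a JSON payload. This is not Replit's implementation. The tool names `read_file` and `run_tests`, the registry, and the `invoke_from_dsl` helper are all hypothetical, and the sketch only accepts literal arguments.

```python
import ast

# Hypothetical tools, stubbed for illustration only.
def read_file(path: str) -> str:
    return f"<contents of {path}>"

def run_tests(target: str, verbose: bool = False) -> str:
    return f"ran tests in {target} (verbose={verbose})"

TOOLS = {"read_file": read_file, "run_tests": run_tests}

def invoke_from_dsl(source: str):
    """Parse one model-emitted call, e.g. read_file(path='a.py'), and run it."""
    tree = ast.parse(source.strip(), mode="eval")
    call = tree.body
    # Only a direct call to a registered tool name is accepted.
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError("expected a single call to a named tool")
    if call.func.id not in TOOLS:
        raise ValueError(f"unknown tool: {call.func.id}")
    # literal_eval only accepts literals, so arguments cannot carry code.
    args = [ast.literal_eval(a) for a in call.args]
    kwargs = {}
    for kw in call.keywords:
        if kw.arg is None:  # reject **kwargs unpacking
            raise ValueError("keyword unpacking is not allowed")
        kwargs[kw.arg] = ast.literal_eval(kw.value)
    return TOOLS[call.func.id](*args, **kwargs)

print(invoke_from_dsl("run_tests('pkg', verbose=True)"))
```

Parsing with `ast` rather than calling `eval` keeps the model's output from executing arbitrary code; a production system would need a much fuller sandbox than this.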

Recommended Architecture

Based on analysis of all 13 agents, here is the recommended architecture for a competitive coding agent. This synthesizes the best patterns: Claude Code's hook system, Codex CLI's sandboxing, Letta's memory blocks, OpenCode's LSP integration, Warp's multi-model composition, and OpenManus's ReAct hierarchy.

RECOMMENDED CODING AGENT ARCHITECTURE
═══════════════════════════════════════════════════════════════════

┌─────────────────────────────────────────────────────────────────┐
│                      MULTI-INTERFACE LAYER                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │   CLI    │  │ VS Code  │  │ JetBrains│  │  GitHub  │         │
│  │  (TUI)   │  │ Extension│  │  Plugin  │  │  Action  │         │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘         │
└───────┼─────────────┼─────────────┼─────────────┼───────────────┘
        └─────────────┴──────┬──────┴─────────────┘
                             │ gRPC / Protocol
┌────────────────────────────┼────────────────────────────────────┐
│                        CORE AGENT ENGINE                        │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                  MULTI-MODEL COMPOSITION                  │  │
│  │  Planning:  Reasoning Model (o3/Opus)                     │  │
│  │  Execution: Code Model (Sonnet/GPT-5/Devstral)            │  │
│  │  Fallback:  Alternate provider (rate limit recovery)      │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                  HOOK SYSTEM (8 events)                   │  │
│  │  SessionStart → UserPromptSubmit → PreToolUse →           │  │
│  │  [Execute] → PostToolUse → Stop/SubagentStop              │  │
│  │  PreCompact, Notification                                 │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                     TOOL ORCHESTRATOR                     │  │
│  │  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐   │  │
│  │  │  File  │ │ Search │ │  Exec  │ │  Web   │ │ Memory │   │  │
│  │  │  Ops   │ │ (+LSP) │ │ (sand- │ │ (MCP)  │ │ Blocks │   │  │
│  │  │        │ │        │ │ boxed) │ │        │ │        │   │  │
│  │  └────────┘ └────────┘ └────────┘ └────────┘ └────────┘   │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    SUBAGENT DISPATCHER                    │  │
│  │  Task (parallel) │ Explore (read-only) │ Plan (design)    │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                             │
┌────────────────────────────┼────────────────────────────────────┐
│                        PERSISTENCE LAYER                        │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐        │
│  │ Session State │  │ Memory Blocks │  │ Skill Library │        │
│  │(SQLite/JSONL) │  │ (Server-side) │  │  (.md files)  │        │
│  └───────────────┘  └───────────────┘  └───────────────┘        │
└─────────────────────────────────────────────────────────────────┘
                             │
┌────────────────────────────┼────────────────────────────────────┐
│                         SECURITY LAYER                          │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐        │
│  │  OS Sandbox   │  │   Approval    │  │  Checkpoint   │        │
│  │  (Seatbelt/   │  │   Workflow    │  │    System     │        │
│  │   Landlock)   │  │(configurable) │  │ (shadow git)  │        │
│  └───────────────┘  └───────────────┘  └───────────────┘        │
└─────────────────────────────────────────────────────────────────┘
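The hook ordering in the diagram (PreToolUse, then execution, then PostToolUse) can be sketched as a small in-process dispatcher. The event names come from this report; everything else is an assumption made for illustration. In particular, the `HookRegistry` class, the `run_tool` helper, and the "return False to veto" convention are hypothetical and are not Claude Code's actual hook configuration format.

```python
from collections import defaultdict

# Event names as listed in this report.
HOOK_EVENTS = (
    "SessionStart", "UserPromptSubmit", "PreToolUse", "PostToolUse",
    "Stop", "SubagentStop", "PreCompact", "Notification",
)

class HookRegistry:
    """Hypothetical registry: callbacks keyed by lifecycle event."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def on(self, event, fn):
        if event not in HOOK_EVENTS:
            raise ValueError(f"unknown hook event: {event}")
        self._hooks[event].append(fn)

    def fire(self, event, payload):
        # Assumed convention: a hook returning False vetoes the action.
        for fn in self._hooks[event]:
            if fn(payload) is False:
                return False
        return True

def run_tool(registry, name, args, tools):
    """Wrap one tool call in PreToolUse / PostToolUse hooks."""
    payload = {"tool": name, "args": args}
    if not registry.fire("PreToolUse", payload):
        return {"tool": name, "blocked": True}
    result = tools[name](**args)
    registry.fire("PostToolUse", {**payload, "result": result})
    return {"tool": name, "blocked": False, "result": result}

# Usage: block destructive shell commands before they execute.
registry = HookRegistry()
registry.on("PreToolUse", lambda p: "rm -rf" not in p["args"].get("cmd", ""))
stub_tools = {"shell": lambda cmd: f"ran: {cmd}"}  # stub, does not run anything
print(run_tool(registry, "shell", {"cmd": "rm -rf /"}, stub_tools))
```

Putting the veto point before execution is what lets a hook act as a policy gate instead of a logger.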

Technology Stack Recommendations

Component	Recommended	Rationale
Language	Rust + TypeScript	Rust for core/sandbox (Codex CLI pattern), TS for extensions/UI (Claude Code pattern)
TUI Framework	Ratatui (Rust) or Bubble Tea (Go)	Native performance, rich terminal UI; Warp uses GPU-rendered Rust
IDE Extension	VS Code Extension API	Widest reach; Cline and Qwen Code prove the model
IPC Protocol	gRPC with Protocol Buffers	Type-safe, efficient; OpenCode and Warp validate this approach
State Storage	SQLite + JSONL	Local-first, portable, auditable; Codex CLI uses RolloutRecorder
Tool Extensions	MCP (Model Context Protocol)	Industry standard, 3,000+ servers, Linux Foundation governance
Agent Framework	ReAct loop with stuck detection	OpenManus validates 4-level hierarchy: Base → ReAct → ToolCall → Domain
Sandboxing	Seatbelt (macOS) + Landlock (Linux)	Codex CLI proves OS-native sandboxing is production-ready
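The "ReAct loop with stuck detection" row can be illustrated with a short sketch. It assumes a deliberately simple definition of "stuck" (the same action chosen several times in a row), which is an assumption for this example and not OpenManus's actual check. The `think` and `act` callables and the `react_loop` name are hypothetical stand-ins for the model call and the tool executor.

```python
from collections import deque

def react_loop(think, act, max_steps=10, stuck_window=3):
    """Alternate reasoning and acting until done, stuck, or out of steps.

    think(history) returns a hashable action such as ("tool", "arg"),
    or None when the task is finished.
    act(*action) returns an observation.
    """
    history = []
    recent = deque(maxlen=stuck_window)  # last few actions only
    for _ in range(max_steps):
        action = think(history)
        if action is None:
            return "finished", history
        recent.append(action)
        # Assumed heuristic: N identical actions in a row means no progress.
        if len(recent) == stuck_window and len(set(recent)) == 1:
            return "stuck", history
        observation = act(*action)
        history.append((action, observation))
    return "max_steps", history

# Usage: a policy that keeps repeating itself is cut off early.
status, steps = react_loop(
    think=lambda history: ("list_dir", "."),
    act=lambda tool, arg: f"{tool}({arg}) ok",
)
print(status, len(steps))
```

On a "stuck" result, a real agent would typically inject a prompt asking the model to change strategy; this sketch simply stops.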

Agent Landscape: 13 Tools Analyzed

Agent	Vendor	Type	License	Stars	Key Differentiator
Aider	Open Source	CLI	Apache-2.0	~25k	Architect/Editor dual-model, repo map via tree-sitter
Claude Code	Anthropic	CLI	Proprietary	—	8-event hook system, 18 built-in tools, subagent dispatch
Cline	Community	VS Code Ext	Apache-2.0	~30k	Shadow git checkpoints, 33+ LLM providers, browser automation
Codex CLI	OpenAI	CLI (Rust)	Apache-2.0	~18k	OS-native sandboxing (Seatbelt/Landlock/seccomp)
Droid	Factory.ai	CLI	Proprietary	~504	HyperCode/ByteRank retrieval, Terminal-Bench #1
Goose	Block (Square)	CLI + Desktop	Apache-2.0	~10k	MCP-first architecture, 3,000+ extension ecosystem
Letta Code	Letta	CLI	Apache-2.0	—	Persistent memory blocks, archival vector DB, skill learning
OpenCode	Community	CLI (Go)	MIT	~5k	LSP integration, 75+ LLM providers, client-server design
OpenManus	MetaGPT	Framework	MIT	~53.9k	4-level ReAct hierarchy, PlanningFlow orchestration
Qwen Code	Alibaba	CLI (TS)	Apache-2.0	~17.9k	Free 2000 req/day, forked from Gemini CLI, Docker sandbox
Replit Agent	Replit	Web IDE	Proprietary	—	Python DSL tool invocation, 200-min autonomy, self-testing
Vibe CLI	Mistral	CLI	Apache-2.0	—	Cheapest inference ($0.40/1M input), single-GPU deployable
Warp	Warp	ADE	Proprietary	—	Full Terminal Control (interactive PTY), GPU-rendered Rust UI

Benchmark Comparison

Agent	SWE-bench Verified	Terminal-Bench	Other Benchmarks
Warp	75.8% (GPT-5)	52% (#1)	3.2B total lines edited
Codex CLI	74.9% (GPT-5)	42.8%	Code review mode
Vibe CLI	72.2% (Devstral 2)	—	68.0% (Small 2, 24B)
Claude Code	~72% (Sonnet 4)	43.2%	200k context window
Qwen Code	67–69.6%	37.5% (480B)	SOTA open-source agentic coding
Droid	31.67% (Lite)	58.8% (#1)	Top 3 across 3 models
Replit Agent	2nd place (Lite)	—	135 apps in 24h (Rokt)
OpenManus	—	—	GAIA: 74.3%
Aider	—	—	Architect+Editor: 85% (polyglot)
Cline	—	—	33+ provider support
OpenCode	—	—	LSP-assisted editing
Goose	—	—	3,000+ MCP extensions
Letta Code	—	—	#1 model-agnostic on Terminal-Bench

Cost & Licensing

Agent	Pricing Model	Free Tier	Open Source
Aider	Free (BYOK)	✓ Unlimited	✓ Apache-2.0
Claude Code	$20–200/mo (API usage)	✗	✗ Proprietary
Cline	Free (BYOK)	✓ Unlimited	✓ Apache-2.0
Codex CLI	ChatGPT plan included	With plan	✓ Apache-2.0
Droid	Free–$40/mo + overage	✓ (BYOK)	✗ Proprietary
Goose	Free (BYOK)	✓ Unlimited	✓ Apache-2.0
Letta Code	Free (BYOK + Letta server)	✓ Unlimited	✓ Apache-2.0
OpenCode	Free (BYOK)	✓ Unlimited	✓ MIT
OpenManus	Free (BYOK)	✓ Unlimited	✓ MIT
Qwen Code	Free (OAuth: 2000 req/day)	✓ 2000/day	✓ Apache-2.0
Replit Agent	$0–35/mo + credits	Limited daily	✗ Proprietary
Vibe CLI	Free API (promotional)	✓	✓ Apache-2.0
Warp	Free–Pro (100–10k req/mo)	✓ 100 req/mo	✗ Proprietary

Model Cost Comparison (per 1M tokens)

Model	Input (USD)	Output (USD)	Used By
Devstral Small 2 (24B)	$0.10	$0.30	Vibe CLI
Devstral 2 (123B)	$0.40	$2.00	Vibe CLI
Qwen3-Coder-480B	Free (OAuth)	Free (OAuth)	Qwen Code
Claude Sonnet 4	$3.00	$15.00	Claude Code, Warp, Droid
Claude Opus 4	$15.00	$75.00	Claude Code, Droid
GPT-5	~$2.50	~$10.00	Codex CLI, Warp, Droid
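As a worked example of how these per-token prices turn into session costs, the sketch below applies the table's fixed prices to a hypothetical workload of 2M input tokens and 100k output tokens. The workload size and the `session_cost` helper are assumptions for illustration; the GPT-5 and Qwen rows are left out because the table gives them as approximate or free.

```python
# USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "Devstral Small 2 (24B)": (0.10, 0.30),
    "Devstral 2 (123B)": (0.40, 2.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Opus 4": (15.00, 75.00),
}

def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one session at the listed prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Hypothetical workload: 2M input tokens, 100k output tokens.
for model in PRICES:
    print(f"{model}: ${session_cost(model, 2_000_000, 100_000):.2f}")
```

For this workload the arithmetic gives about $0.23 for Devstral Small 2, $1.00 for Devstral 2, $7.50 for Claude Sonnet 4, and $37.50 for Claude Opus 4, so input tokens dominate the bill when an agent reads far more than it writes.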
