This section provides a comprehensive analysis of the tool systems that power modern AI coding agents. We examine tool inventories across all 13 agents, covering file operations, search and navigation, command execution, web access, memory systems, and planning capabilities. The analysis reveals that tool design is not merely an implementation detail but a primary determinant of agent performance: SWE-agent's research demonstrated that well-designed Agent Computer Interfaces (ACI) improved SWE-bench pass rates by +12.29% over standard bash-only approaches.
Key themes include the convergence toward search-replace edit patterns (adopted by Claude Code, Qwen Code, and Droid), the emergence of MCP as a universal tool extension protocol, and Replit's innovative Python DSL approach to tool invocation that achieves ~90% success rates while reducing costs by 15%. We also cover ACI design principles, edit tool pattern trade-offs, and a comprehensive tool comparison matrix spanning all 13 agents.
Think of a coding agent as a skilled developer who can only interact with the world through a specific set of buttons on a control panel. Each "tool" is one of those buttons: one reads files, one writes code, one runs terminal commands, one searches for patterns. The quality and design of those buttons determine how effective the developer is. A poorly labeled button causes mistakes. A button that does too many things at once is confusing. The best agents have carefully designed control panels where every button does exactly one thing, gives clear feedback, and prevents accidents. That's what Agent Computer Interface (ACI) design is all about: building the perfect control panel for AI.
All 13 analyzed agents implement variations of core tool categories. While naming conventions and implementation details vary, the fundamental capabilities have converged around six primary categories. The table below maps each category to its common implementations and the critical design decisions that differentiate agents.
| Category | Common Tools | Implementation Notes |
|---|---|---|
| File Operations | Read, Write, Edit, Patch, Create, NotebookEdit | Edit tools vary significantly: search-replace (Claude Code, Qwen Code), line-range (SWE-agent), whole-file replacement (simple agents), diff-based (Aider), and FIND_AND_REPLACE + V4A diff (Droid). Search-replace is now the dominant pattern due to token efficiency and precision. |
| Search & Navigation | Grep, Glob, Find, LSP, ReadManyFiles | ripgrep integration is near-universal. LSP (OpenCode) provides semantic search with go-to-definition and find-references. Droid's HyperCode/ByteRank provides codebase retrieval optimized for LLM context. Warp uses codebase embeddings for semantic search. |
| Execution | Bash, Shell, Command, PythonExecute | Sandboxing varies widely: OS-native (Codex CLI via Seatbelt/Landlock), Docker containers (Qwen Code), cloud sandboxes (Warp via Namespace), permission-based approval (Claude Code, Cline). Background process support differs by agent. |
| Version Control | Git status, diff, commit, checkout | Auto-commit policies differ: Aider commits after every change with descriptive messages, Cline uses shadow git for checkpoints. Claude Code follows a strict git safety protocol (never force push, never skip hooks). |
| Web & Browser | WebFetch, WebSearch, BrowserUseTool, Puppeteer, Crawl4AI | Browser automation via MCP (Playwright) or native integration (Cline's Puppeteer). OpenManus includes Crawl4AI for structured web scraping. Codex CLI disables network by default to prevent prompt injection via malicious URLs. |
| Context & Memory | TodoWrite, Memory, Skill, save_memory | Task tracking (TodoWrite) is common across Claude Code, Qwen Code, and others. Letta Code has persistent memory blocks (persona, human, project, skills). Qwen Code includes save_memory for cross-session knowledge. Most agents rely on session-only context. |
| Planning & Orchestration | PlanningTool, EnterPlanMode, ExitPlanMode, Task, AskHuman | Claude Code and OpenManus include explicit planning tools. Claude Code's Task tool spawns isolated subagents with limited context. OpenManus uses PlanningFlow for multi-step orchestration. Replit's trajectory compression preserves planning decisions. |
| Extension & Integration | MCP bridge, SlashCommand, Skill | MCP is the universal standard with 3,000+ servers (Linux Foundation governance). Every major agent supports it. Goose is MCP-first architecture. OpenManus bridges MCP tools into its agent hierarchy. |
Droid, which tops Terminal-Bench at 58.8%, uses a deliberately minimalist tool design. Factory.ai found that complex tool schemas "exponentially increase error rates" for LLMs. Conversely, Claude Code's 18 tools and OpenManus's 15+ tools achieve their performance through careful ACI design rather than sheer tool count. The critical factor is how well tool interfaces are optimized for LLM comprehension, not how many tools are available.
Each agent makes distinctive architectural choices in its tool system. Below we examine the tool implementations of eight agents in detail, revealing how design philosophy shapes capability.
Claude Code provides the most extensively documented tool system, with 18 built-in tools organized into seven categories. The system prompt (comprising 110+ parts) includes detailed usage instructions for each tool, ensuring the LLM understands constraints like "old_string must be unique in file" for the Edit tool.
CLAUDE CODE TOOL INVENTORY (from system prompt analysis):
├── File Tools
│ ├── Read - Read file contents with line range support
│ ├── Write - Create new files (atomic operation)
│ ├── Edit - String replacement (must be unique in file)
│ ├── NotebookEdit - Jupyter notebook cell editing
│ └── Glob - File pattern matching (sorted by modification time)
├── Search Tools
│ ├── Grep - Content search via ripgrep (regex support)
│ └── LSP - Language Server Protocol queries
├── Execution Tools
│ ├── Bash - Shell command execution (2-min default timeout)
│ └── Computer - Chrome browser automation
├── Planning Tools
│ ├── EnterPlanMode - Switch to planning mode (no edits)
│ ├── ExitPlanMode - Present plan for user approval
│ └── TodoWrite - Task list management (persistent across compaction)
├── Agent Tools
│ ├── Task - Spawn subagent with isolated context window
│ └── SlashCommand - Execute registered skill commands
├── Web Tools
│ ├── WebFetch - Retrieve URL contents (markdown conversion)
│ └── WebSearch - Search engine queries
└── Utility Tools
├── Skill - Load skill .md files from .claude/skills/
└── Memory - Edit persistent CLAUDE.md memory blocks
old_string must be unique within the file. If it appears multiple times, the tool fails and the agent must provide more surrounding context. This prevents ambiguous edits at the cost of occasional retries. The system prompt also discourages `git add -A` (may include sensitive files) and raw `grep` in Bash (the built-in Grep tool is preferred for proper permissions handling).

Codex CLI (OpenAI) is built entirely in Rust, with a crate-based architecture that separates core logic, sandbox implementation, and execution policy. The tool system is tightly coupled with OS-level sandboxing, making security a first-class concern at the tool layer.
CODEX CLI CRATE STRUCTURE:
codex-rs/
├── core/ # Business logic (reusable library)
│ ├── ThreadManager - Conversation state management
│ ├── ModelClient - API communication (OpenAI)
│ └── ToolOrchestrator - Sandboxed tool execution
├── execpolicy/ # Execution policy engine
│ ├── on-request - Ask before every action
│ ├── workspace-write - Auto-approve within workspace
│ └── danger-full-access - No restrictions (CI/VM only)
├── linux-sandbox/ # Landlock + seccomp (separate binary)
│ ├── Landlock rules - Filesystem access control
│ └── seccomp filters - Syscall-level network blocking
├── windows-sandbox-rs/ # Windows restricted tokens
├── mcp-server/ # MCP protocol support
├── file-search/ # Repository search capabilities
├── keyring-store/ # Secure credential storage
└── otel/ # OpenTelemetry observability
TOOL EXECUTION FLOW:
User Request → Policy Check → Sandbox Setup → Execute → Record → Return
│ │
│ └── Seatbelt (macOS) / Landlock (Linux)
└── on-request / workspace-write / danger-full-access
SESSION RECORDING (RolloutRecorder):
- Every tool call recorded as JSONL
- Enables audit replay and debugging
- OpenTelemetry integration for enterprise logging
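The append-only JSONL pattern behind RolloutRecorder is simple enough to sketch. Codex CLI's implementation is in Rust; the Python below is illustrative only, and the field names (`ts`, `tool`, `args`, `result`) are assumptions, not the actual schema.

```python
import json
import time
from pathlib import Path


class RolloutRecorder:
    """Append-only JSONL log of every tool call in a session.

    A sketch of the Codex CLI RolloutRecorder pattern: one JSON object
    per line, cheap to append, trivial to replay for audit/debugging.
    """

    def __init__(self, path):
        self.path = Path(path)

    def record(self, tool: str, args: dict, result: str) -> None:
        entry = {
            "ts": time.time(),  # wall-clock timestamp for replay ordering
            "tool": tool,
            "args": args,
            "result": result,
        }
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def replay(self):
        """Yield recorded calls in order -- the basis for audit replay."""
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)
```

Because each line is independent, a crashed session still leaves a valid, replayable prefix, which is what makes JSONL a good fit for audit logs.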
SWE-agent introduced the foundational concept of Agent Computer Interface (ACI) -- designing tools specifically for LLM ergonomics rather than human ergonomics. Their custom commands replace standard Unix utilities with LLM-friendly alternatives.
SWE-AGENT ACI TOOL DESIGN:
┌─────────────────────────────────────────────────────────────────┐
│ DESIGN PRINCIPLES:                                              │
│   1. Simple, parseable output formats                           │
│   2. Guardrails to prevent/correct errors                       │
│   3. Compact, efficient actions                                 │
│   4. Context management for conciseness                         │
└─────────────────────────────────────────────────────────────────┘
CUSTOM COMMANDS (replacing standard Unix tools):
├── find_file      - Locate files by name (vs. find -name)
├── search_file    - Search within a single file
├── search_dir     - Recursive directory search (vs. grep -r)
├── open           - View file in scrollable window
├── goto           - Jump to specific line number
├── scroll_up/down - Navigate file view (windowed display)
├── create         - Create new file with validation
├── edit           - Line-range replacement with linter check
└── submit         - Generate patch output for evaluation
SPECIAL FEATURES:
- Linter integration: Blocks syntactically invalid edits BEFORE applying
- Observation collapsing: Past tool outputs → single-line summaries
- Error recovery: Automatic retry on malformed LLM output
- Window-based viewing: Shows 100-line window (not entire file)
IMPACT ON PERFORMANCE:
- ACI-optimized tools: 12.29% improvement over bash-only baselines
- Linter guardrail alone prevents ~30% of syntax errors from persisting
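The linter guardrail can be sketched in a few lines. SWE-agent shells out to a real linter; here `ast.parse` stands in for it, so this sketch only covers Python sources, and the function name `guarded_edit` is illustrative.

```python
import ast


def guarded_edit(source: str, start: int, end: int, replacement: str) -> str:
    """Line-range edit that refuses to apply a syntactically invalid result.

    Sketch of SWE-agent's linter guardrail: validate the candidate file
    BEFORE the edit lands, so broken syntax never persists on disk.
    Lines are 1-indexed and inclusive, matching the edit-command style.
    """
    lines = source.splitlines()
    candidate = "\n".join(
        lines[: start - 1] + replacement.splitlines() + lines[end:]
    )
    try:
        ast.parse(candidate)  # stand-in for a real linter pass
    except SyntaxError as e:
        raise ValueError(f"edit rejected, would break syntax: {e}") from None
    return candidate
```

Rejecting the edit (rather than applying it and reporting the error later) is the key design choice: the agent gets immediate, actionable feedback while the file is still in a known-good state.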
OpenManus (MetaGPT, 53.9k GitHub stars) implements the richest tool ecosystem through a 4-level agent hierarchy. Each level inherits and extends capabilities from its parent, enabling both simple tool calls and complex multi-step orchestration.
OPENMANUS 4-LEVEL AGENT HIERARCHY:
═══════════════════════════════════════════════════════════════
Level 1: BaseAgent
│ Foundation: Memory management, state tracking, LLM interface
│
└── Level 2: ReActAgent
│ Adds: Reasoning-Action loop (think → act → observe → repeat)
│
└── Level 3: ToolCallAgent
│ Adds: Tool registration, execution, result parsing
│
└── Level 4: Manus (Domain Agent)
Adds: Domain-specific tools, PlanningFlow, browser control
TOOL INVENTORY (15+ tools):
├── Code Execution
│ ├── PythonExecute - Run Python code in sandboxed environment
│ └── Bash - Shell command execution
├── Web & Browser
│ ├── BrowserUseTool - Full browser automation (Playwright-based)
│ ├── WebSearch - Search engine queries
│ └── Crawl4AI - Structured web scraping and extraction
├── File Operations
│ ├── FileOperators - Read, write, list, copy, move, delete
│ └── StrReplaceEditor - Search-and-replace editing (Claude Code pattern)
├── Planning
│ └── PlanningTool - Multi-step plan creation and tracking
├── System
│ ├── ComputerUseTool - Desktop GUI automation
│ ├── AskHuman - Request human input/clarification
│ └── Terminate - End agent execution
└── Integration
└── MCP Bridge - Connect to any MCP server as tool source
PLANNINGFLOW ORCHESTRATION:
┌──────────────────────────────────────────┐
│ 1. Decompose task into sub-goals │
│ 2. Assign tools to each sub-goal │
│ 3. Execute with ReAct loop per step │
│ 4. Verify results against plan │
│ 5. Adjust plan if needed (re-plan) │
└──────────────────────────────────────────┘
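The five PlanningFlow steps above can be sketched as a plan/execute/verify loop. The callables `decompose`, `execute`, `verify`, and `replan` stand in for OpenManus's LLM-backed components; the names and the `Step` structure are illustrative, not the actual API.

```python
from dataclasses import dataclass


@dataclass
class Step:
    goal: str
    tool: str
    done: bool = False
    result: str = ""


def planning_flow(task, decompose, execute, verify, replan, max_rounds=3):
    """Minimal sketch of plan/execute/verify orchestration.

    1-2: decompose the task into sub-goals with assigned tools,
    3:   execute each pending step (a ReAct loop in the real system),
    4:   verify results against the plan,
    5:   re-plan and retry if verification fails.
    """
    plan = decompose(task)
    for _ in range(max_rounds):
        for step in plan:
            if not step.done:
                step.result = execute(step)
                step.done = True
        if verify(plan):
            return plan
        plan = replan(task, plan)
    raise RuntimeError("plan did not converge within max_rounds")
```

The bounded `max_rounds` loop matters: without it, a verification step that never passes would keep the agent re-planning forever.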
Qwen Code (Alibaba, 17.9k stars) is forked from Gemini CLI and built in TypeScript. It provides free access to Qwen3-Coder-480B via OAuth (2,000 requests/day) and implements a tool set deliberately modeled after Claude Code's design, including identical tool names.
QWEN CODE TOOL INVENTORY (13 tools):
├── File Operations
│ ├── ReadFile - Read file with line range support
│ ├── ReadManyFiles - Batch file reading (reduces turn count)
│ ├── WriteFile - Create or overwrite files
│ └── Edit - Search-replace editing (Claude Code pattern)
├── Search
│ ├── Glob - File pattern matching
│ └── Grep - Content search via ripgrep
├── Execution
│ └── run_shell_command - Shell execution with Docker sandbox option
├── Web
│ ├── web_fetch - URL content retrieval
│ └── web_search - Search engine integration
├── Memory & Planning
│ ├── save_memory - Persist knowledge across sessions
│ ├── todo_write - Task list management
│ └── exit_plan_mode - Transition from planning to execution
└── Agent
└── task - Subagent delegation
ARCHITECTURE NOTES:
- Language: TypeScript (forked from Gemini CLI codebase)
- Sandbox: Docker container option for command execution
- Free Tier: 2,000 requests/day via OAuth authentication
- Model: Qwen3-Coder-480B (67-69.6% SWE-bench Verified)
- Context: Supports both Qwen and third-party models via BYOK
DESIGN PHILOSOPHY:
- Mirror Claude Code's tool naming for developer familiarity
- ReadManyFiles tool reduces round-trips for multi-file tasks
- save_memory enables cross-session knowledge persistence
- Docker sandbox provides isolation without OS-level complexity
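The round-trip saving of ReadManyFiles is easy to see in a sketch: one tool call returns every requested file instead of N sequential reads. The delimiter format and size cap below are illustrative, not Qwen Code's actual output format.

```python
from pathlib import Path


def read_many_files(paths, max_bytes=50_000):
    """Batch file reader in the spirit of Qwen Code's ReadManyFiles.

    Returns all requested files in one response, with per-file errors
    reported inline so a single missing file does not abort the batch.
    """
    parts = []
    for p in paths:
        try:
            text = Path(p).read_text(encoding="utf-8", errors="replace")
            parts.append(f"--- {p} ---\n{text[:max_bytes]}")
        except OSError as e:
            parts.append(f"--- {p} ---\n<error: {e}>")
    return "\n".join(parts)
```

For a five-file task this turns five request/response turns into one, which is where the "reduces turn count" benefit comes from.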
Droid (Factory.ai) takes the opposite approach to tool proliferation. Factory's research found that complex tool schemas "exponentially increase error rates" for LLMs. Droid achieves Terminal-Bench #1 (58.8%) with a deliberately minimal tool surface.
DROID TOOL DESIGN PHILOSOPHY:
═══════════════════════════════════════════════════════════════
PRINCIPLE: "Fewer tools, better schemas, higher success rates"
┌─────────────────────────────────────────────────────────────┐
│ Complex schemas with many parameters → MORE LLM errors │
│ Simple schemas with clear semantics → FEWER LLM errors │
│ │
│ Factory.ai finding: Error rates increase EXPONENTIALLY │
│ with tool schema complexity │
└─────────────────────────────────────────────────────────────┘
CORE EDITING TOOLS:
├── FIND_AND_REPLACE - Search-replace with clear semantics
└── V4A Diff - Structured diff format for larger changes
CODEBASE RETRIEVAL (proprietary):
├── HyperCode - Semantic code search engine
│ └── Optimized for finding relevant code across large repos
└── ByteRank - Code ranking algorithm
└── Prioritizes most relevant files for LLM context
MULTI-MODEL COMPOSITION:
┌───────────────┐ ┌───────────────┐
│ Planning │────▶│ Execution │
│ (Reasoning) │ │ (Coding) │
│ o3 / Opus │ │ Sonnet / GPT │
└───────────────┘ └───────────────┘
BENCHMARK RESULTS:
- Terminal-Bench: 58.8% (#1 overall)
- SWE-bench Lite: 31.67%
- Top 3 performance across 3 different underlying models
- Stopped running SWE-bench Verified (Python-only limitation)
Droid's success with minimal tooling challenges the assumption that more tools equals better performance. The lesson: invest in tool schema clarity over tool quantity. Every additional parameter in a tool schema is a potential point of failure. When designing agent tools, start with the minimum viable set and add tools only when measurable performance gains justify the added complexity.
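The contrast can be made concrete with two hypothetical tool schemas for the same edit capability. These are illustrative sketches, not Factory.ai's actual definitions.

```python
# A complex schema: many parameters with conditional interactions.
# Each conditional rule ("required unless...", "ignored when...") is a
# chance for the LLM to produce an invalid call.
COMPLEX_EDIT = {
    "name": "edit",
    "parameters": {
        "path": "string",
        "mode": "enum[insert|replace|delete|append]",
        "start_line": "integer (required unless mode=append)",
        "end_line": "integer (ignored when mode=insert)",
        "content": "string",
        "create_if_missing": "boolean",
        "preserve_indent": "boolean",
    },
}

# A minimal schema: one obvious behavior, three parameters, no modes.
# There is essentially one way to call it correctly.
MINIMAL_EDIT = {
    "name": "find_and_replace",
    "parameters": {
        "path": "string",
        "find": "string (must be unique in file)",
        "replace": "string",
    },
}
```

The minimal schema trades expressiveness for predictability: more calls may be needed for complex changes, but each call is far more likely to be well-formed.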
Warp (75.8% SWE-bench Verified with GPT-5) is an Agent Development Environment (ADE) that reimagines the terminal as a first-class AI development surface. Its distinguishing feature is Full Terminal Control -- the ability to drive interactive terminal sessions through a PTY (pseudo-terminal).
WARP TOOL CATEGORIES:
═══════════════════════════════════════════════════════════════
File Editing:
├── File read/write operations
└── Code modification tools
Code Understanding:
├── grep - Content search
├── find - File discovery
├── cat - File viewing
└── Codebase embeddings (semantic search across entire repo)
Command Execution:
└── Full Terminal Control (FTC)
├── Interactive PTY driver
│ └── Can handle: prompts, confirmations, pagination
├── Reads terminal output in real-time
├── Sends keystrokes (including Ctrl-C, arrow keys)
└── Manages interactive tools (vim, less, ssh, etc.)
Planning & Tracking:
└── TODO generation and task management
Integration:
├── MCP client support
└── Codebase embeddings for semantic retrieval
FULL TERMINAL CONTROL (FTC) - UNIQUE DIFFERENTIATOR:
┌─────────────────────────────────────────────────────────────┐
│ Traditional Agent: │
│ agent → exec("npm install") → wait → read stdout │
│ FAILS on: interactive prompts, sudo, ssh, vim │
│ │
│ Warp FTC: │
│ agent → PTY.spawn("ssh server") → │
│ ← "Password: " │
│ agent → PTY.write(password + "\n") → │
│ ← "server$ " │
│ agent → PTY.write("tail -f /var/log/app.log\n") → │
│ ← (streaming log output) │
│ agent → PTY.write("\x03") → (Ctrl-C to stop) │
│ │
│ This enables Warp to handle ANY terminal workflow, │
│ including interactive installers, debuggers, and REPLs │
└─────────────────────────────────────────────────────────────┘
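The PTY mechanism behind this can be sketched with Python's standard-library `pty` module. Warp's implementation is proprietary and Rust-based; this is only an illustration of the spawn/write/read pattern (POSIX-only), using `cat` as a stand-in for an interactive program.

```python
import os
import pty
import select
import subprocess
import termios


def spawn_interactive(cmd):
    """Spawn a command attached to a pseudo-terminal.

    Returns (process, master_fd). Writing to master_fd "types" into the
    program; reading from it sees what the program prints to its terminal.
    """
    master, slave = pty.openpty()
    # Disable echo on the slave so we read only program output,
    # not our own keystrokes reflected back.
    attrs = termios.tcgetattr(slave)
    attrs[3] &= ~termios.ECHO  # index 3 is lflag
    termios.tcsetattr(slave, termios.TCSANOW, attrs)
    proc = subprocess.Popen(cmd, stdin=slave, stdout=slave, stderr=slave)
    os.close(slave)
    return proc, master


def read_output(master, timeout=2.0):
    """Drain whatever the program has written to the terminal so far."""
    chunks = []
    while select.select([master], [], [], timeout)[0]:
        try:
            data = os.read(master, 4096)
        except OSError:  # EIO once the child exits
            break
        if not data:
            break
        chunks.append(data)
        timeout = 0.2  # drain quickly after the first chunk
    return b"".join(chunks).decode(errors="replace")
```

With this in hand, the ssh/Ctrl-C exchange in the diagram is just a sequence of `os.write(master, ...)` calls interleaved with `read_output(master)`, including control bytes like `b"\x03"` for Ctrl-C.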
CLOUD SANDBOX:
- Powered by Namespace
- Isolated execution environment
- Prevents local machine damage
- GPU-rendered Rust UI for performance
ARCHITECTURAL FINDING:
"Single-agent with focused tools outperforms multi-agent approaches"
- Warp tested multi-agent setups and found degraded performance
- Better to give one agent the right tools than split across agents
Replit Agent introduces the most innovative tool invocation mechanism: instead of JSON function calling, the agent generates Python code that calls tool functions. This approach achieves ~90% tool invocation success rate and enables multi-tool calling within a single generation.
REPLIT TOOL INVOCATION: PYTHON DSL
═══════════════════════════════════════════════════════════════
TRADITIONAL (JSON Function Calling):
{
"name": "edit_file",
"arguments": {
"path": "src/app.py",
"old_text": "def hello():",
"new_text": "def hello(name):"
}
}
→ One tool per generation
→ Rigid schema validation
→ No inter-tool dependencies
REPLIT (Python DSL Code Generation):
```python
# Agent generates executable Python
file_content = read_file("src/app.py")
if "def hello():" in file_content:
edit_file("src/app.py",
old_text="def hello():",
new_text="def hello(name):")
run_command("python -m pytest tests/")
```
→ Multiple tools in single generation
→ Conditional logic between tools
→ Natural programming patterns
PERFORMANCE IMPACT:
┌─────────────────────────────────────────┐
│ Success Rate: ~90% tool invocation │
│ Cost Savings: 15% (fewer turns) │
│ Speed: 30% faster execution │
│ Autonomy: 200-minute sessions │
│ Scale Proof: 135 apps in 24 hours │
│ (Rokt partnership) │
└─────────────────────────────────────────┘
MULTI-TOOL CALLING BENEFITS:
- Read → Check → Edit → Test in ONE generation
- Eliminates round-trip latency between tool calls
- Agent can express conditional logic naturally
- Reduces total tokens consumed (fewer system prompts)
TRAJECTORY COMPRESSION:
- Long sessions compressed to key decision points
- Preserves: file changes, test results, errors
- Discards: redundant reads, intermediate states
- Enables 200-minute autonomous sessions
Replit's Python DSL approach suggests a fundamental rethinking of tool invocation. Rather than constraining LLMs to rigid JSON schemas, letting them express tool usage in a programming language they already understand (Python) leverages their strongest capability. The 15% cost savings and 30% speed improvement come from reducing round-trips: instead of call-wait-call-wait, the agent plans multiple tool calls in a single code block. This pattern may become industry-standard as agents handle increasingly complex workflows.
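The executor side of this pattern can be sketched as `exec` over a namespace containing only tool functions. Replit's production sandbox, validation, and error recovery are not public; the function name and the restricted-builtins choice below are assumptions made for illustration.

```python
def run_tool_code(code: str, tools: dict):
    """Execute agent-generated Python with only tool functions in scope.

    Minimal sketch of Python-DSL tool invocation: the agent's code block
    calls tools directly, with conditionals and multiple calls per
    generation. Everything outside `tools` and a few builtins is absent.
    """
    namespace = {"__builtins__": {"len": len, "print": print, "range": range}}
    namespace.update(tools)
    exec(compile(code, "<agent_code>", "exec"), namespace)
    return namespace
```

A single generated block can then read, branch, edit, and test without any intermediate round-trips, which is exactly where the turn-count savings come from. (A real deployment would run this inside a process-level sandbox; restricted builtins alone are not a security boundary.)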
The concept of Agent Computer Interface was introduced by the SWE-agent paper, arguing that tools should be designed for LLM ergonomics rather than human ergonomics. Just as Human-Computer Interaction (HCI) shaped GUI design, ACI shapes how AI agents interact with development environments. The core insight: LLMs process text differently than humans, so optimal tool interfaces diverge from traditional CLI design.
Format tool output so LLMs can easily extract key information. Avoid ambiguous formatting, excessive decoration, or interleaved content. Numbered results, count-first summaries, and clear delimiters outperform raw command output.
Build validation directly into tools. SWE-agent's linter integration blocks syntactically invalid edits before they are applied, preventing ~30% of syntax errors from persisting. Claude Code's uniqueness constraint on the Edit tool forces the agent to provide sufficient context, preventing ambiguous replacements.
Consolidate multi-step operations into single tools. Qwen Code's ReadManyFiles tool reads multiple files in one call instead of requiring N sequential Read calls. Claude Code's parallel tool execution achieves the same effect at the model level.
Collapse old observations to single-line summaries. Keep recent context detailed, older context summarized. SWE-agent's observation collapsing converts previous tool outputs to summaries, preserving context window budget for the current task.
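The collapsing step can be sketched directly: keep the newest observations verbatim, reduce older ones to one-line summaries. The thresholds and summary format are illustrative, not SWE-agent's exact values.

```python
def collapse_observations(history, keep_recent=2, max_line=80):
    """Collapse older tool outputs to one-line summaries, ACI-style.

    `history` is a list of (tool, output) pairs in chronological order;
    the most recent `keep_recent` entries are kept in full detail.
    """
    collapsed = []
    cutoff = len(history) - keep_recent
    for i, (tool, output) in enumerate(history):
        if i >= cutoff:
            collapsed.append((tool, output))  # recent: full detail
        else:
            first = output.splitlines()[0] if output else ""
            summary = f"[collapsed] {first[:max_line]} ({len(output)} chars)"
            collapsed.append((tool, summary))  # old: one-line summary
    return collapsed
```

Applied after every turn, this keeps the context window's budget focused on the observations most likely to matter for the next action.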
TRADITIONAL BASH (human-optimized):
─────────────────────────────────────────────────────────────────
$ grep -r "TODO" --include="*.js" src/
src/utils.js:5:// TODO: Refactor this function
src/api.js:23:// TODO: Add error handling
src/api.js:45:// TODO: Implement caching
src/auth.js:12:// TODO: Add rate limiting
src/db.js:67:// TODO: Connection pooling
... (output continues for 50+ lines with no structure)

PROBLEMS FOR LLMs:
- No count or summary upfront
- No way to know if output is truncated
- Inconsistent formatting across tools
- No suggested next action
- Raw paths require agent to parse mentally

SWE-AGENT ACI (LLM-optimized):
─────────────────────────────────────────────────────────────────
> search_dir "TODO" src/ --extensions js
Found 5 results in 4 files:
[1] src/utils.js:5  - "TODO: Refactor this function"
[2] src/api.js:23   - "TODO: Add error handling"
[3] src/api.js:45   - "TODO: Implement caching"
[4] src/auth.js:12  - "TODO: Add rate limiting"
[5] src/db.js:67    - "TODO: Connection pooling"
To view context around a result, use: open <file> <line>

KEY ACI IMPROVEMENTS:
─────────────────────────────────────────────────────────────────
Feature            │ Traditional CLI    │ ACI-Optimized
───────────────────┼────────────────────┼──────────────────
Result count       │ Not shown          │ "Found 5 results"
Numbering          │ None               │ [1], [2], [3]...
Next step guidance │ None               │ "use: open <file>"
Truncation         │ Silent             │ Explicit notice
Output format      │ Varies by tool     │ Consistent schema
Error messages     │ Cryptic stderr     │ Actionable guidance
When Claude Code's system prompt says "NEVER use grep in Bash -- use the built-in Grep tool instead," this is an ACI principle in action. The built-in Grep tool provides structured output, proper permissions handling, and consistent formatting. Raw grep output can include binary file warnings, permission errors, and inconsistent line formatting that confuse LLMs. The performance gap between ACI-optimized and raw CLI tools is +12.29% on SWE-bench (SWE-agent paper).
The file editing tool is the single most critical tool in any coding agent. It is invoked more than any other tool, and edit failures cascade into debugging cycles that consume context window budget. Four distinct patterns have emerged, each with different trade-offs.
SEARCH-REPLACE PATTERN:
Tool: Edit
Input:
file_path: "src/api.js"
old_string: "function getData() {"
new_string: "async function getData() {"
CONSTRAINT: old_string must be UNIQUE within the file.
If old_string appears 0 times → error: "String not found"
If old_string appears 2+ times → error: "String not unique, provide more context"
STRENGTHS:
+ Minimal token usage (only changed region in output)
+ Precise: no ambiguity about what changes
+ Human-readable: easy to review in logs
+ Naturally encourages small, focused edits
WEAKNESSES:
- Fails if target string is duplicated
- Agent must know exact current file content
- Cannot handle simultaneous edits to repeated patterns
- Requires Read before Edit pattern (extra round-trip)
ADOPTION: Claude Code, Qwen Code, Droid (FIND_AND_REPLACE),
OpenManus (StrReplaceEditor), Vibe CLI (search_replace)
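The uniqueness guardrail at the heart of this pattern fits in a few lines. This is a sketch of the constraint's core logic; production tools add details such as whitespace normalization and better error context.

```python
def search_replace_edit(text: str, old_string: str, new_string: str) -> str:
    """Search-replace edit with the uniqueness constraint described above.

    0 matches  -> fail: the agent's view of the file is stale.
    2+ matches -> fail: the edit would be ambiguous; the agent must
                  resubmit with more surrounding context.
    """
    count = text.count(old_string)
    if count == 0:
        raise ValueError("String not found")
    if count > 1:
        raise ValueError(
            f"String not unique ({count} matches), provide more context"
        )
    return text.replace(old_string, new_string, 1)
```

Both failure modes are deliberate: each error message tells the agent exactly how to repair its next attempt, which is the guardrail principle in miniature.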
LINE-RANGE PATTERN:
Tool: edit
Input:
file: "src/api.js"
start_line: 23
end_line: 25
replacement: "async function getData() {\n const response = await fetch(url);"
STRENGTHS:
+ Works even with duplicate strings
+ Clear scope of change
+ Compatible with linter validation
WEAKNESSES:
- Line numbers shift after edits (stale references)
- Agent must track line numbers accurately
- Multiple edits in same file require recalculation
- Less readable than search-replace in logs
ADOPTION: SWE-agent (primary pattern)
WHOLE-FILE PATTERN:
Tool: write_file
Input:
path: "src/api.js"
content: "... entire file content ..."
STRENGTHS:
+ Always works (no uniqueness/line number issues)
+ Simple implementation
+ Guaranteed consistent file state
WEAKNESSES:
- Extremely expensive (full file in LLM output tokens)
- High risk of merge conflicts
- Can accidentally drop content
- Difficult to review what changed
- Token cost scales with file size
ADOPTION: Simple agents, fallback strategy for others
NOTE: Claude Code's Write tool is for NEW files only;
Edit is required for modifying existing files
DIFF-BASED PATTERN:
Tool: diff_edit
Input:
file: "src/api.js"
diff: |
@@ -23,3 +23,4 @@
-function getData() {
+async function getData() {
+ const response = await fetch(url);
return data;
STRENGTHS:
+ Standard format (unified diff), widely understood
+ Reviewable by humans and CI tools
+ Can express multiple changes in one operation
+ Works well with git workflows
WEAKNESSES:
- LLMs frequently generate invalid diffs
- Off-by-one errors in line numbers
- Context lines must match exactly
- Requires more tokens than search-replace
ADOPTION: Aider (primary), Cline (secondary)
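The "context lines must match exactly" weakness can be made concrete with a strict hunk applier. This sketch handles a single unified-diff hunk and rejects the stale-context and off-by-one errors the pattern is prone to; real patch tools add fuzz matching and multi-hunk support.

```python
import re


def apply_hunk(text: str, hunk: str) -> str:
    """Apply one unified-diff hunk, failing loudly on any context mismatch.

    Rejecting a non-matching hunk outright (no fuzzy application) surfaces
    LLM diff errors immediately instead of silently corrupting the file.
    """
    lines = text.splitlines()
    header = re.match(r"@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@", hunk.splitlines()[0])
    if not header:
        raise ValueError("malformed hunk header")
    start = int(header.group(1)) - 1  # unified diffs are 1-indexed
    old, new = [], []
    for line in hunk.splitlines()[1:]:
        tag, body = line[:1], line[1:]
        if tag in (" ", "-"):
            old.append(body)  # what the file must currently contain
        if tag in (" ", "+"):
            new.append(body)  # what it will contain afterwards
    if lines[start:start + len(old)] != old:
        raise ValueError("context mismatch: diff does not apply")
    return "\n".join(lines[:start] + new + lines[start + len(old):])
```

For example, applying `@@ -2,1 +2,1 @@` with `-b` / `+B` to `"a\nb\nc"` succeeds, while applying the same hunk to a file whose line 2 is not `b` raises rather than guessing.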
| Pattern | Token Cost | Reliability | Primary Agent | Best For |
|---|---|---|---|---|
| Search-Replace | Low | High (with uniqueness) | Claude Code | Small, focused edits |
| Line-Range | Low | Medium (line drift) | SWE-agent | Repeated patterns |
| Whole-File | High | High (always works) | Simple agents | New files, small files |
| Diff-Based | Medium | Medium (invalid diffs) | Aider | Multi-region changes |
Industry trend: Search-replace has become the dominant pattern in 2025-2026, adopted by 6 of 13 agents. Its combination of low token cost, high precision, and human readability makes it the preferred choice for production coding agents.
The following matrix maps all 13 agents against eight tool capability categories. A checkmark indicates native support; "MCP" indicates the capability is available via MCP extensions; a dash indicates the capability is absent or not documented.
| Agent | File Ops | Search | Exec | Git | Web | Memory | Planning | MCP |
|---|---|---|---|---|---|---|---|---|
| Aider | Diff/Whole | Repo map (tree-sitter) | Shell | Auto-commit | URL/Image | Session | Architect mode | — |
| Claude Code | Read/Write/Edit/Notebook | Grep/Glob/LSP | Bash (2-min timeout) | Safety protocol | WebFetch/WebSearch | CLAUDE.md + Memory | PlanMode/TodoWrite/Task | Full client |
| Cline | Read/Write/Diff | File search | Terminal (approval) | Shadow git checkpoint | Puppeteer (native) | Session | Plan/Act/Ask modes | Full client |
| Codex CLI | Read/Write/Patch | File search | Sandboxed (Seatbelt/Landlock) | Via shell | Disabled by default | JSONL recorder | Review mode | Server + Client |
| Droid | FIND_AND_REPLACE/V4A diff | HyperCode/ByteRank | Shell | Via shell | Via shell | Session | Multi-model planning | — |
| Goose | Via MCP | Via MCP | Local trust | Via MCP | Via MCP (Playwright) | Session | Via MCP | MCP-first (3000+) |
| Letta Code | File tools | Search tools | Shell | Via shell | Via shell | Persistent blocks + Archival DB | Skill learning | Client |
| OpenCode | Read/Write/Edit | Grep/Glob/LSP (semantic) | Shell | Via shell | Via MCP | Session + AGENTS.md | Agent configs | Full client |
| OpenManus | FileOperators/StrReplace | File search | Bash/PythonExecute | Via shell | Browser/WebSearch/Crawl4AI | Agent memory | PlanningTool/AskHuman | MCP bridge |
| Qwen Code | Read/ReadMany/Write/Edit | Grep/Glob | Shell (Docker option) | Via shell | web_fetch/web_search | save_memory | exit_plan_mode/todo_write/task | Client |
| Replit Agent | Python DSL file ops | Python DSL search | Python DSL exec | Auto-deploy | Python DSL web | Trajectory compression | Multi-tool code blocks | — |
| Vibe CLI | read/write/search_replace | grep | Bash (stateful) | Via shell | — | Session + history | todo/task subagent | — |
| Warp | File editing | grep/find/embeddings | Full Terminal Control (PTY) | Via terminal | Via terminal/MCP | Session | TODO generation | Client |
The tool layer is where coding agents win or lose. SWE-agent proved that ACI-optimized tools yield +12.29% over raw CLI. Droid proved that minimalist tool design with clear schemas can top benchmarks. Replit proved that Python DSL invocation can beat JSON function calling on success rate, cost, and speed. Claude Code proved that comprehensive tooling with careful constraints (uniqueness requirement, parallel execution, subagent isolation) scales to production.
Recommendations for tool system design, drawn from the findings above:
- Optimize tool output for LLM comprehension: count-first summaries, numbered results, explicit truncation notices, and next-step guidance.
- Prefer few tools with simple, unambiguous schemas; add tools or parameters only when measurable performance gains justify the complexity.
- Build guardrails into the tools themselves: uniqueness constraints on edits, pre-apply linting, and actionable error messages that tell the agent how to retry.
- Consolidate multi-step operations (batch reads, parallel tool execution, multi-tool code blocks) to cut round-trip latency and token cost.
- Manage context actively: collapse old observations to summaries and compress long trajectories to key decision points.
Primary Sources: SWE-agent paper (ACI concept), Claude Code system prompt analysis, Codex CLI source, OpenManus source, Qwen Code source, Factory.ai technical blog (Droid), Warp engineering blog, Replit Agent technical documentation
Benchmarks: SWE-bench Verified, Terminal-Bench, GAIA
Part 1 of 6 · Coding Agent Engineering Analysis · January 2026