This section provides a comprehensive analysis of the tool systems that power modern AI coding agents. We examine tool inventories across all 13 agents, covering file operations, search and navigation, command execution, web access, memory systems, and planning capabilities. The analysis reveals that tool design is not merely an implementation detail but a primary determinant of agent performance: SWE-agent's research demonstrated that well-designed Agent Computer Interfaces (ACI) improved SWE-bench pass rates by +12.29% over standard bash-only approaches.
Key themes include the convergence toward search-replace edit patterns (adopted by Claude Code, Qwen Code, and Droid), the emergence of MCP as a universal tool extension protocol, and Replit's innovative Python DSL approach to tool invocation that achieves ~90% success rates while reducing costs by 15%. We also cover ACI design principles, edit tool pattern trade-offs, and a comprehensive tool comparison matrix spanning all 13 agents.
Think of a coding agent as a skilled developer who can only interact with the world through a specific set of buttons on a control panel. Each "tool" is one of those buttons: one reads files, one writes code, one runs terminal commands, one searches for patterns. The quality and design of those buttons determine how effective the developer is. A poorly labeled button causes mistakes. A button that does too many things at once is confusing. The best agents have carefully designed control panels where every button does exactly one thing, gives clear feedback, and prevents accidents. That's what Agent Computer Interface (ACI) design is all about: building the perfect control panel for AI.
All 13 analyzed agents implement variations of core tool categories. While naming conventions and implementation details vary, the fundamental capabilities have converged around six primary categories. The table below maps each category to its common implementations and the critical design decisions that differentiate agents.
| Category | Common Tools | Implementation Notes |
|---|---|---|
| File Operations | Read, Write, Edit, Patch, Create, NotebookEdit | Edit tools vary significantly: search-replace (Claude Code, Qwen Code), line-range (SWE-agent), whole-file replacement (simple agents), diff-based (Aider), and FIND_AND_REPLACE + V4A diff (Droid). Search-replace is now the dominant pattern due to token efficiency and precision. |
| Search & Navigation | Grep, Glob, Find, LSP, ReadManyFiles | ripgrep integration is near-universal. LSP (OpenCode) provides semantic search with go-to-definition and find-references. Droid's HyperCode/ByteRank provides codebase retrieval optimized for LLM context. Warp uses codebase embeddings for semantic search. |
| Execution | Bash, Shell, Command, PythonExecute | Sandboxing varies widely: OS-native (Codex CLI via Seatbelt/Landlock), Docker containers (Qwen Code), cloud sandboxes (Warp via Namespace), permission-based approval (Claude Code, Cline). Background process support differs by agent. |
| Version Control | Git status, diff, commit, checkout | Auto-commit policies differ: Aider commits after every change with descriptive messages, Cline uses shadow git for checkpoints. Claude Code follows a strict git safety protocol (never force push, never skip hooks). |
| Web & Browser | WebFetch, WebSearch, BrowserUseTool, Puppeteer, Crawl4AI | Browser automation via MCP (Playwright) or native integration (Cline's Puppeteer). OpenManus includes Crawl4AI for structured web scraping. Codex CLI disables network by default to prevent prompt injection via malicious URLs. |
| Context & Memory | TodoWrite, Memory, Skill, save_memory | Task tracking (TodoWrite) is common across Claude Code, Qwen Code, and others. Letta Code has persistent memory blocks (persona, human, project, skills). Qwen Code includes save_memory for cross-session knowledge. Most agents rely on session-only context. |
| Planning & Orchestration | PlanningTool, EnterPlanMode, ExitPlanMode, Task, AskHuman | Claude Code and OpenManus include explicit planning tools. Claude Code's Task tool spawns isolated subagents with limited context. OpenManus uses PlanningFlow for multi-step orchestration. Replit's trajectory compression preserves planning decisions. |
| Extension & Integration | MCP bridge, SlashCommand, Skill | MCP is the universal standard with 3,000+ servers (Linux Foundation governance). Every major agent supports it. Goose is MCP-first architecture. OpenManus bridges MCP tools into its agent hierarchy. |
Droid, which tops Terminal-Bench at 58.8%, uses a deliberately minimalist tool design. Factory.ai found that complex tool schemas "exponentially increase error rates" for LLMs. Conversely, Claude Code's 18 tools and OpenManus's 15+ tools achieve their performance through careful ACI design rather than sheer tool count. The critical factor is how well tool interfaces are optimized for LLM comprehension, not how many tools are available.
Each agent makes distinctive architectural choices in its tool system. Below we examine the tool implementations of eight agents in detail, revealing how design philosophy shapes capability.
Claude Code provides the most extensively documented tool system, with 18 built-in tools organized into seven categories. The system prompt (comprising 110+ parts) includes detailed usage instructions for each tool, ensuring the LLM understands constraints like "old_string must be unique in file" for the Edit tool.
CLAUDE CODE TOOL INVENTORY (from system prompt analysis):
├── File Tools
│ ├── Read - Read file contents with line range support
│ ├── Write - Create new files (atomic operation)
│ ├── Edit - String replacement (must be unique in file)
│ ├── NotebookEdit - Jupyter notebook cell editing
│ └── Glob - File pattern matching (sorted by modification time)
├── Search Tools
│ ├── Grep - Content search via ripgrep (regex support)
│ └── LSP - Language Server Protocol queries
├── Execution Tools
│ ├── Bash - Shell command execution (2-min default timeout)
│ └── Computer - Chrome browser automation
├── Planning Tools
│ ├── EnterPlanMode - Switch to planning mode (no edits)
│ ├── ExitPlanMode - Present plan for user approval
│ └── TodoWrite - Task list management (persistent across compaction)
├── Agent Tools
│ ├── Task - Spawn subagent with isolated context window
│ └── SlashCommand - Execute registered skill commands
├── Web Tools
│ ├── WebFetch - Retrieve URL contents (markdown conversion)
│ └── WebSearch - Search engine queries
└── Utility Tools
├── Skill - Load skill .md files from .claude/skills/
└── Memory - Edit persistent CLAUDE.md memory blocks
old_string must be unique within the file. If it appears multiple times, the tool fails and the agent must provide more surrounding context. This prevents ambiguous edits at the cost of occasional retries. The system prompt also discourages `git add -A` (may include sensitive files) and raw `grep` in Bash (the built-in Grep tool is preferred for proper permissions handling).

Codex CLI (OpenAI) is built entirely in Rust, with a crate-based architecture that separates core logic, sandbox implementation, and execution policy. The tool system is tightly coupled with OS-level sandboxing, making security a first-class concern at the tool layer.
CODEX CLI CRATE STRUCTURE:
codex-rs/
├── core/ # Business logic (reusable library)
│ ├── ThreadManager - Conversation state management
│ ├── ModelClient - API communication (OpenAI)
│ └── ToolOrchestrator - Sandboxed tool execution
├── execpolicy/ # Execution policy engine
│ ├── on-request - Ask before every action
│ ├── workspace-write - Auto-approve within workspace
│ └── danger-full-access - No restrictions (CI/VM only)
├── linux-sandbox/ # Landlock + seccomp (separate binary)
│ ├── Landlock rules - Filesystem access control
│ └── seccomp filters - Syscall-level network blocking
├── windows-sandbox-rs/ # Windows restricted tokens
├── mcp-server/ # MCP protocol support
├── file-search/ # Repository search capabilities
├── keyring-store/ # Secure credential storage
└── otel/ # OpenTelemetry observability
TOOL EXECUTION FLOW:
User Request → Policy Check → Sandbox Setup → Execute → Record → Return
│ │
│ └── Seatbelt (macOS) / Landlock (Linux)
└── on-request / workspace-write / danger-full-access
SESSION RECORDING (RolloutRecorder):
- Every tool call recorded as JSONL
- Enables audit replay and debugging
- OpenTelemetry integration for enterprise logging
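The append-only JSONL pattern behind RolloutRecorder is simple enough to sketch. Codex CLI's implementation is in Rust; the Python below is illustrative only, and the field names (`ts`, `tool`, `args`, `result`) are assumptions, not the actual schema.

```python
import json
import time
from pathlib import Path


class RolloutRecorder:
    """Append-only JSONL log of every tool call in a session.

    A sketch of the Codex CLI RolloutRecorder pattern: one JSON object
    per line, cheap to append, trivial to replay for audit/debugging.
    """

    def __init__(self, path):
        self.path = Path(path)

    def record(self, tool: str, args: dict, result: str) -> None:
        entry = {
            "ts": time.time(),  # wall-clock timestamp for replay ordering
            "tool": tool,
            "args": args,
            "result": result,
        }
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def replay(self):
        """Yield recorded calls in order -- the basis for audit replay."""
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)
```

Because each line is independent, a crashed session still leaves a valid, replayable prefix, which is what makes JSONL a good fit for audit logs.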
SWE-agent introduced the foundational concept of Agent Computer Interface (ACI) -- designing tools specifically for LLM ergonomics rather than human ergonomics. Their custom commands replace standard Unix utilities with LLM-friendly alternatives.
SWE-AGENT ACI TOOL DESIGN:
┌─────────────────────────────────────────────────────────────────┐
│ DESIGN PRINCIPLES:                                              │
│   1. Simple, parseable output formats                           │
│   2. Guardrails to prevent/correct errors                       │
│   3. Compact, efficient actions                                 │
│   4. Context management for conciseness                         │
└─────────────────────────────────────────────────────────────────┘
CUSTOM COMMANDS (replacing standard Unix tools):
├── find_file      - Locate files by name (vs. find -name)
├── search_file    - Search within a single file
├── search_dir     - Recursive directory search (vs. grep -r)
├── open           - View file in scrollable window
├── goto           - Jump to specific line number
├── scroll_up/down - Navigate file view (windowed display)
├── create         - Create new file with validation
├── edit           - Line-range replacement with linter check
└── submit         - Generate patch output for evaluation
SPECIAL FEATURES:
- Linter integration: Blocks syntactically invalid edits BEFORE applying
- Observation collapsing: Past tool outputs → single-line summaries
- Error recovery: Automatic retry on malformed LLM output
- Window-based viewing: Shows 100-line window (not entire file)
IMPACT ON PERFORMANCE:
- ACI-optimized tools: 12.29% improvement over bash-only baselines
- Linter guardrail alone prevents ~30% of syntax errors from persisting
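The linter guardrail can be sketched in a few lines. SWE-agent shells out to a real linter; here `ast.parse` stands in for it, so this sketch only covers Python sources, and the function name `guarded_edit` is illustrative.

```python
import ast


def guarded_edit(source: str, start: int, end: int, replacement: str) -> str:
    """Line-range edit that refuses to apply a syntactically invalid result.

    Sketch of SWE-agent's linter guardrail: validate the candidate file
    BEFORE the edit lands, so broken syntax never persists on disk.
    Lines are 1-indexed and inclusive, matching the edit-command style.
    """
    lines = source.splitlines()
    candidate = "\n".join(
        lines[: start - 1] + replacement.splitlines() + lines[end:]
    )
    try:
        ast.parse(candidate)  # stand-in for a real linter pass
    except SyntaxError as e:
        raise ValueError(f"edit rejected, would break syntax: {e}") from None
    return candidate
```

Rejecting the edit (rather than applying it and reporting the error later) is the key design choice: the agent gets immediate, actionable feedback while the file is still in a known-good state.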
OpenManus (MetaGPT, 53.9k GitHub stars) implements the richest tool ecosystem through a 4-level agent hierarchy. Each level inherits and extends capabilities from its parent, enabling both simple tool calls and complex multi-step orchestration.
OPENMANUS 4-LEVEL AGENT HIERARCHY:
═══════════════════════════════════════════════════════════════
Level 1: BaseAgent
│ Foundation: Memory management, state tracking, LLM interface
│
└── Level 2: ReActAgent
│ Adds: Reasoning-Action loop (think → act → observe → repeat)
│
└── Level 3: ToolCallAgent
│ Adds: Tool registration, execution, result parsing
│
└── Level 4: Manus (Domain Agent)
Adds: Domain-specific tools, PlanningFlow, browser control
TOOL INVENTORY (15+ tools):
├── Code Execution
│ ├── PythonExecute - Run Python code in sandboxed environment
│ └── Bash - Shell command execution
├── Web & Browser
│ ├── BrowserUseTool - Full browser automation (Playwright-based)
│ ├── WebSearch - Search engine queries
│ └── Crawl4AI - Structured web scraping and extraction
├── File Operations
│ ├── FileOperators - Read, write, list, copy, move, delete
│ └── StrReplaceEditor - Search-and-replace editing (Claude Code pattern)
├── Planning
│ └── PlanningTool - Multi-step plan creation and tracking
├── System
│ ├── ComputerUseTool - Desktop GUI automation
│ ├── AskHuman - Request human input/clarification
│ └── Terminate - End agent execution
└── Integration
└── MCP Bridge - Connect to any MCP server as tool source
PLANNINGFLOW ORCHESTRATION:
┌──────────────────────────────────────────┐
│ 1. Decompose task into sub-goals │
│ 2. Assign tools to each sub-goal │
│ 3. Execute with ReAct loop per step │
│ 4. Verify results against plan │
│ 5. Adjust plan if needed (re-plan) │
└──────────────────────────────────────────┘
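The five PlanningFlow steps above can be sketched as a plan/execute/verify loop. The callables `decompose`, `execute`, `verify`, and `replan` stand in for OpenManus's LLM-backed components; the names and the `Step` structure are illustrative, not the actual API.

```python
from dataclasses import dataclass


@dataclass
class Step:
    goal: str
    tool: str
    done: bool = False
    result: str = ""


def planning_flow(task, decompose, execute, verify, replan, max_rounds=3):
    """Minimal sketch of plan/execute/verify orchestration.

    1-2: decompose the task into sub-goals with assigned tools,
    3:   execute each pending step (a ReAct loop in the real system),
    4:   verify results against the plan,
    5:   re-plan and retry if verification fails.
    """
    plan = decompose(task)
    for _ in range(max_rounds):
        for step in plan:
            if not step.done:
                step.result = execute(step)
                step.done = True
        if verify(plan):
            return plan
        plan = replan(task, plan)
    raise RuntimeError("plan did not converge within max_rounds")
```

The bounded `max_rounds` loop matters: without it, a verification step that never passes would keep the agent re-planning forever.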
Qwen Code (Alibaba, 17.9k stars) is forked from Gemini CLI and built in TypeScript. It provides free access to Qwen3-Coder-480B via OAuth (2,000 requests/day) and implements a tool set deliberately modeled after Claude Code's design, including identical tool names.
QWEN CODE TOOL INVENTORY (13 tools):
├── File Operations
│ ├── ReadFile - Read file with line range support
│ ├── ReadManyFiles - Batch file reading (reduces turn count)
│ ├── WriteFile - Create or overwrite files
│ └── Edit - Search-replace editing (Claude Code pattern)
├── Search
│ ├── Glob - File pattern matching
│ └── Grep - Content search via ripgrep
├── Execution
│ └── run_shell_command - Shell execution with Docker sandbox option
├── Web
│ ├── web_fetch - URL content retrieval
│ └── web_search - Search engine integration
├── Memory & Planning
│ ├── save_memory - Persist knowledge across sessions
│ ├── todo_write - Task list management
│ └── exit_plan_mode - Transition from planning to execution
└── Agent
└── task - Subagent delegation
ARCHITECTURE NOTES:
- Language: TypeScript (forked from Gemini CLI codebase)
- Sandbox: Docker container option for command execution
- Free Tier: 2,000 requests/day via OAuth authentication
- Model: Qwen3-Coder-480B (67-69.6% SWE-bench Verified)
- Context: Supports both Qwen and third-party models via BYOK
DESIGN PHILOSOPHY:
- Mirror Claude Code's tool naming for developer familiarity
- ReadManyFiles tool reduces round-trips for multi-file tasks
- save_memory enables cross-session knowledge persistence
- Docker sandbox provides isolation without OS-level complexity
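The round-trip saving of ReadManyFiles is easy to see in a sketch: one tool call returns every requested file instead of N sequential reads. The delimiter format and size cap below are illustrative, not Qwen Code's actual output format.

```python
from pathlib import Path


def read_many_files(paths, max_bytes=50_000):
    """Batch file reader in the spirit of Qwen Code's ReadManyFiles.

    Returns all requested files in one response, with per-file errors
    reported inline so a single missing file does not abort the batch.
    """
    parts = []
    for p in paths:
        try:
            text = Path(p).read_text(encoding="utf-8", errors="replace")
            parts.append(f"--- {p} ---\n{text[:max_bytes]}")
        except OSError as e:
            parts.append(f"--- {p} ---\n<error: {e}>")
    return "\n".join(parts)
```

For a five-file task this turns five request/response turns into one, which is where the "reduces turn count" benefit comes from.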
Droid (Factory.ai) takes the opposite approach to tool proliferation. Factory's research found that complex tool schemas "exponentially increase error rates" for LLMs. Droid achieves Terminal-Bench #1 (58.8%) with a deliberately minimal tool surface.
DROID TOOL DESIGN PHILOSOPHY:
═══════════════════════════════════════════════════════════════
PRINCIPLE: "Fewer tools, better schemas, higher success rates"
┌─────────────────────────────────────────────────────────────┐
│ Complex schemas with many parameters → MORE LLM errors │
│ Simple schemas with clear semantics → FEWER LLM errors │
│ │
│ Factory.ai finding: Error rates increase EXPONENTIALLY │
│ with tool schema complexity │
└─────────────────────────────────────────────────────────────┘
CORE EDITING TOOLS:
├── FIND_AND_REPLACE - Search-replace with clear semantics
└── V4A Diff - Structured diff format for larger changes
CODEBASE RETRIEVAL (proprietary):
├── HyperCode - Semantic code search engine
│ └── Optimized for finding relevant code across large repos
└── ByteRank - Code ranking algorithm
└── Prioritizes most relevant files for LLM context
MULTI-MODEL COMPOSITION:
┌───────────────┐ ┌───────────────┐
│ Planning │────▶│ Execution │
│ (Reasoning) │ │ (Coding) │
│ o3 / Opus │ │ Sonnet / GPT │
└───────────────┘ └───────────────┘
BENCHMARK RESULTS:
- Terminal-Bench: 58.8% (#1 overall)
- SWE-bench Lite: 31.67%
- Top 3 performance across 3 different underlying models
- Stopped running SWE-bench Verified (Python-only limitation)
Droid's success with minimal tooling challenges the assumption that more tools equals better performance. The lesson: invest in tool schema clarity over tool quantity. Every additional parameter in a tool schema is a potential point of failure. When designing agent tools, start with the minimum viable set and add tools only when measurable performance gains justify the added complexity.
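The contrast can be made concrete with two hypothetical tool schemas for the same edit capability. These are illustrative sketches, not Factory.ai's actual definitions.

```python
# A complex schema: many parameters with conditional interactions.
# Each conditional rule ("required unless...", "ignored when...") is a
# chance for the LLM to produce an invalid call.
COMPLEX_EDIT = {
    "name": "edit",
    "parameters": {
        "path": "string",
        "mode": "enum[insert|replace|delete|append]",
        "start_line": "integer (required unless mode=append)",
        "end_line": "integer (ignored when mode=insert)",
        "content": "string",
        "create_if_missing": "boolean",
        "preserve_indent": "boolean",
    },
}

# A minimal schema: one obvious behavior, three parameters, no modes.
# There is essentially one way to call it correctly.
MINIMAL_EDIT = {
    "name": "find_and_replace",
    "parameters": {
        "path": "string",
        "find": "string (must be unique in file)",
        "replace": "string",
    },
}
```

The minimal schema trades expressiveness for predictability: more calls may be needed for complex changes, but each call is far more likely to be well-formed.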
Warp (75.8% SWE-bench Verified with GPT-5) is an Agent Development Environment (ADE) that reimagines the terminal as a first-class AI development surface. Its distinguishing feature is Full Terminal Control -- the ability to drive interactive terminal sessions through a PTY (pseudo-terminal).
WARP TOOL CATEGORIES:
═══════════════════════════════════════════════════════════════
File Editing:
├── File read/write operations
└── Code modification tools
Code Understanding:
├── grep - Content search
├── find - File discovery
├── cat - File viewing
└── Codebase embeddings (semantic search across entire repo)
Command Execution:
└── Full Terminal Control (FTC)
├── Interactive PTY driver
│ └── Can handle: prompts, confirmations, pagination
├── Reads terminal output in real-time
├── Sends keystrokes (including Ctrl-C, arrow keys)
└── Manages interactive tools (vim, less, ssh, etc.)
Planning & Tracking:
└── TODO generation and task management
Integration:
├── MCP client support
└── Codebase embeddings for semantic retrieval
FULL TERMINAL CONTROL (FTC) - UNIQUE DIFFERENTIATOR:
┌─────────────────────────────────────────────────────────────┐
│ Traditional Agent: │
│ agent → exec("npm install") → wait → read stdout │
│ FAILS on: interactive prompts, sudo, ssh, vim │
│ │
│ Warp FTC: │
│ agent → PTY.spawn("ssh server") → │
│ ← "Password: " │
│ agent → PTY.write(password + "\n") → │
│ ← "server$ " │
│ agent → PTY.write("tail -f /var/log/app.log\n") → │
│ ← (streaming log output) │
│ agent → PTY.write("\x03") → (Ctrl-C to stop) │
│ │
│ This enables Warp to handle ANY terminal workflow, │
│ including interactive installers, debuggers, and REPLs │
└─────────────────────────────────────────────────────────────┘
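The PTY mechanism behind this can be sketched with Python's standard-library `pty` module. Warp's implementation is proprietary and Rust-based; this is only an illustration of the spawn/write/read pattern (POSIX-only), using `cat` as a stand-in for an interactive program.

```python
import os
import pty
import select
import subprocess
import termios


def spawn_interactive(cmd):
    """Spawn a command attached to a pseudo-terminal.

    Returns (process, master_fd). Writing to master_fd "types" into the
    program; reading from it sees what the program prints to its terminal.
    """
    master, slave = pty.openpty()
    # Disable echo on the slave so we read only program output,
    # not our own keystrokes reflected back.
    attrs = termios.tcgetattr(slave)
    attrs[3] &= ~termios.ECHO  # index 3 is lflag
    termios.tcsetattr(slave, termios.TCSANOW, attrs)
    proc = subprocess.Popen(cmd, stdin=slave, stdout=slave, stderr=slave)
    os.close(slave)
    return proc, master


def read_output(master, timeout=2.0):
    """Drain whatever the program has written to the terminal so far."""
    chunks = []
    while select.select([master], [], [], timeout)[0]:
        try:
            data = os.read(master, 4096)
        except OSError:  # EIO once the child exits
            break
        if not data:
            break
        chunks.append(data)
        timeout = 0.2  # drain quickly after the first chunk
    return b"".join(chunks).decode(errors="replace")
```

With this in hand, the ssh/Ctrl-C exchange in the diagram is just a sequence of `os.write(master, ...)` calls interleaved with `read_output(master)`, including control bytes like `b"\x03"` for Ctrl-C.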
CLOUD SANDBOX:
- Powered by Namespace
- Isolated execution environment
- Prevents local machine damage
- GPU-rendered Rust UI for performance
ARCHITECTURAL FINDING:
"Single-agent with focused tools outperforms multi-agent approaches"
- Warp tested multi-agent setups and found degraded performance
- Better to give one agent the right tools than split across agents
Replit Agent introduces the most innovative tool invocation mechanism: instead of JSON function calling, the agent generates Python code that calls tool functions. This approach achieves ~90% tool invocation success rate and enables multi-tool calling within a single generation.
REPLIT TOOL INVOCATION: PYTHON DSL
═══════════════════════════════════════════════════════════════
TRADITIONAL (JSON Function Calling):
{
"name": "edit_file",
"arguments": {
"path": "src/app.py",
"old_text": "def hello():",
"new_text": "def hello(name):"
}
}
→ One tool per generation
→ Rigid schema validation
→ No inter-tool dependencies
REPLIT (Python DSL Code Generation):
```python
# Agent generates executable Python
file_content = read_file("src/app.py")
if "def hello():" in file_content:
edit_file("src/app.py",
old_text="def hello():",
new_text="def hello(name):")
run_command("python -m pytest tests/")
```
→ Multiple tools in single generation
→ Conditional logic between tools
→ Natural programming patterns
PERFORMANCE IMPACT:
┌─────────────────────────────────────────┐
│ Success Rate: ~90% tool invocation │
│ Cost Savings: 15% (fewer turns) │
│ Speed: 30% faster execution │
│ Autonomy: 200-minute sessions │
│ Scale Proof: 135 apps in 24 hours │
│ (Rokt partnership) │
└─────────────────────────────────────────┘
MULTI-TOOL CALLING BENEFITS:
- Read → Check → Edit → Test in ONE generation
- Eliminates round-trip latency between tool calls
- Agent can express conditional logic naturally
- Reduces total tokens consumed (fewer system prompts)
TRAJECTORY COMPRESSION:
- Long sessions compressed to key decision points
- Preserves: file changes, test results, errors
- Discards: redundant reads, intermediate states
- Enables 200-minute autonomous sessions
Replit's Python DSL approach suggests a fundamental rethinking of tool invocation. Rather than constraining LLMs to rigid JSON schemas, letting them express tool usage in a programming language they already understand (Python) leverages their strongest capability. The 15% cost savings and 30% speed improvement come from reducing round-trips: instead of call-wait-call-wait, the agent plans multiple tool calls in a single code block. This pattern may become industry-standard as agents handle increasingly complex workflows.
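The executor side of this pattern can be sketched as `exec` over a namespace containing only tool functions. Replit's production sandbox, validation, and error recovery are not public; the function name and the restricted-builtins choice below are assumptions made for illustration.

```python
def run_tool_code(code: str, tools: dict):
    """Execute agent-generated Python with only tool functions in scope.

    Minimal sketch of Python-DSL tool invocation: the agent's code block
    calls tools directly, with conditionals and multiple calls per
    generation. Everything outside `tools` and a few builtins is absent.
    """
    namespace = {"__builtins__": {"len": len, "print": print, "range": range}}
    namespace.update(tools)
    exec(compile(code, "<agent_code>", "exec"), namespace)
    return namespace
```

A single generated block can then read, branch, edit, and test without any intermediate round-trips, which is exactly where the turn-count savings come from. (A real deployment would run this inside a process-level sandbox; restricted builtins alone are not a security boundary.)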
The concept of Agent Computer Interface was introduced by the SWE-agent paper, arguing that tools should be designed for LLM ergonomics rather than human ergonomics. Just as Human-Computer Interaction (HCI) shaped GUI design, ACI shapes how AI agents interact with development environments. The core insight: LLMs process text differently than humans, so optimal tool interfaces diverge from traditional CLI design.
Format tool output so LLMs can easily extract key information. Avoid ambiguous formatting, excessive decoration, or interleaved content. Numbered results, count-first summaries, and clear delimiters outperform raw command output.
Build validation directly into tools. SWE-agent's linter integration blocks syntactically invalid edits before they are applied, preventing ~30% of syntax errors from persisting. Claude Code's uniqueness constraint on the Edit tool forces the agent to provide sufficient context, preventing ambiguous replacements.
Consolidate multi-step operations into single tools. Qwen Code's ReadManyFiles tool reads multiple files in one call instead of requiring N sequential Read calls. Claude Code's parallel tool execution achieves the same effect at the model level.
Collapse old observations to single-line summaries. Keep recent context detailed, older context summarized. SWE-agent's observation collapsing converts previous tool outputs to summaries, preserving context window budget for the current task.
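The collapsing step can be sketched directly: keep the newest observations verbatim, reduce older ones to one-line summaries. The thresholds and summary format are illustrative, not SWE-agent's exact values.

```python
def collapse_observations(history, keep_recent=2, max_line=80):
    """Collapse older tool outputs to one-line summaries, ACI-style.

    `history` is a list of (tool, output) pairs in chronological order;
    the most recent `keep_recent` entries are kept in full detail.
    """
    collapsed = []
    cutoff = len(history) - keep_recent
    for i, (tool, output) in enumerate(history):
        if i >= cutoff:
            collapsed.append((tool, output))  # recent: full detail
        else:
            first = output.splitlines()[0] if output else ""
            summary = f"[collapsed] {first[:max_line]} ({len(output)} chars)"
            collapsed.append((tool, summary))  # old: one-line summary
    return collapsed
```

Applied after every turn, this keeps the context window's budget focused on the observations most likely to matter for the next action.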
TRADITIONAL BASH (human-optimized):
─────────────────────────────────────────────────────────────────
$ grep -r "TODO" --include="*.js" src/
src/utils.js:5:// TODO: Refactor this function
src/api.js:23:// TODO: Add error handling
src/api.js:45:// TODO: Implement caching
src/auth.js:12:// TODO: Add rate limiting
src/db.js:67:// TODO: Connection pooling
... (output continues for 50+ lines with no structure)

PROBLEMS FOR LLMs:
- No count or summary upfront
- No way to know if output is truncated
- Inconsistent formatting across tools
- No suggested next action
- Raw paths require agent to parse mentally

SWE-AGENT ACI (LLM-optimized):
─────────────────────────────────────────────────────────────────
> search_dir "TODO" src/ --extensions js
Found 5 results in 4 files:
[1] src/utils.js:5  - "TODO: Refactor this function"
[2] src/api.js:23   - "TODO: Add error handling"
[3] src/api.js:45   - "TODO: Implement caching"
[4] src/auth.js:12  - "TODO: Add rate limiting"
[5] src/db.js:67    - "TODO: Connection pooling"
To view context around a result, use: open <file> <line>

KEY ACI IMPROVEMENTS:
─────────────────────────────────────────────────────────────────
Feature            │ Traditional CLI    │ ACI-Optimized
───────────────────┼────────────────────┼──────────────────
Result count       │ Not shown          │ "Found 5 results"
Numbering          │ None               │ [1], [2], [3]...
Next step guidance │ None               │ "use: open <file>"
Truncation         │ Silent             │ Explicit notice
Output format      │ Varies by tool     │ Consistent schema
Error messages     │ Cryptic stderr     │ Actionable guidance
When Claude Code's system prompt says "NEVER use grep in Bash -- use the built-in Grep tool instead," this is an ACI principle in action. The built-in Grep tool provides structured output, proper permissions handling, and consistent formatting. Raw grep output can include binary file warnings, permission errors, and inconsistent line formatting that confuse LLMs. The performance gap between ACI-optimized and raw CLI tools is +12.29% on SWE-bench (SWE-agent paper).
The file editing tool is the single most critical tool in any coding agent. It is invoked more than any other tool, and edit failures cascade into debugging cycles that consume context window budget. Four distinct patterns have emerged, each with different trade-offs.
SEARCH-REPLACE PATTERN:
Tool: Edit
Input:
file_path: "src/api.js"
old_string: "function getData() {"
new_string: "async function getData() {"
CONSTRAINT: old_string must be UNIQUE within the file.
If old_string appears 0 times → error: "String not found"
If old_string appears 2+ times → error: "String not unique, provide more context"
STRENGTHS:
+ Minimal token usage (only changed region in output)
+ Precise: no ambiguity about what changes
+ Human-readable: easy to review in logs
+ Naturally encourages small, focused edits
WEAKNESSES:
- Fails if target string is duplicated
- Agent must know exact current file content
- Cannot handle simultaneous edits to repeated patterns
- Requires Read before Edit pattern (extra round-trip)
ADOPTION: Claude Code, Qwen Code, Droid (FIND_AND_REPLACE),
OpenManus (StrReplaceEditor), Vibe CLI (search_replace)
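The uniqueness guardrail at the heart of this pattern fits in a few lines. This is a sketch of the constraint's core logic; production tools add details such as whitespace normalization and better error context.

```python
def search_replace_edit(text: str, old_string: str, new_string: str) -> str:
    """Search-replace edit with the uniqueness constraint described above.

    0 matches  -> fail: the agent's view of the file is stale.
    2+ matches -> fail: the edit would be ambiguous; the agent must
                  resubmit with more surrounding context.
    """
    count = text.count(old_string)
    if count == 0:
        raise ValueError("String not found")
    if count > 1:
        raise ValueError(
            f"String not unique ({count} matches), provide more context"
        )
    return text.replace(old_string, new_string, 1)
```

Both failure modes are deliberate: each error message tells the agent exactly how to repair its next attempt, which is the guardrail principle in miniature.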
LINE-RANGE PATTERN:
Tool: edit
Input:
file: "src/api.js"
start_line: 23
end_line: 25
replacement: "async function getData() {\n const response = await fetch(url);"
STRENGTHS:
+ Works even with duplicate strings
+ Clear scope of change
+ Compatible with linter validation
WEAKNESSES:
- Line numbers shift after edits (stale references)
- Agent must track line numbers accurately
- Multiple edits in same file require recalculation
- Less readable than search-replace in logs
ADOPTION: SWE-agent (primary pattern)
WHOLE-FILE PATTERN:
Tool: write_file
Input:
path: "src/api.js"
content: "... entire file content ..."
STRENGTHS:
+ Always works (no uniqueness/line number issues)
+ Simple implementation
+ Guaranteed consistent file state
WEAKNESSES:
- Extremely expensive (full file in LLM output tokens)
- High risk of merge conflicts
- Can accidentally drop content
- Difficult to review what changed
- Token cost scales with file size
ADOPTION: Simple agents, fallback strategy for others
NOTE: Claude Code's Write tool is for NEW files only;
Edit is required for modifying existing files
DIFF-BASED PATTERN:
Tool: diff_edit
Input:
file: "src/api.js"
diff: |
@@ -23,3 +23,4 @@
-function getData() {
+async function getData() {
+ const response = await fetch(url);
return data;
STRENGTHS:
+ Standard format (unified diff), widely understood
+ Reviewable by humans and CI tools
+ Can express multiple changes in one operation
+ Works well with git workflows
WEAKNESSES:
- LLMs frequently generate invalid diffs
- Off-by-one errors in line numbers
- Context lines must match exactly
- Requires more tokens than search-replace
ADOPTION: Aider (primary), Cline (secondary)
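The "context lines must match exactly" weakness can be made concrete with a strict hunk applier. This sketch handles a single unified-diff hunk and rejects the stale-context and off-by-one errors the pattern is prone to; real patch tools add fuzz matching and multi-hunk support.

```python
import re


def apply_hunk(text: str, hunk: str) -> str:
    """Apply one unified-diff hunk, failing loudly on any context mismatch.

    Rejecting a non-matching hunk outright (no fuzzy application) surfaces
    LLM diff errors immediately instead of silently corrupting the file.
    """
    lines = text.splitlines()
    header = re.match(r"@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@", hunk.splitlines()[0])
    if not header:
        raise ValueError("malformed hunk header")
    start = int(header.group(1)) - 1  # unified diffs are 1-indexed
    old, new = [], []
    for line in hunk.splitlines()[1:]:
        tag, body = line[:1], line[1:]
        if tag in (" ", "-"):
            old.append(body)  # what the file must currently contain
        if tag in (" ", "+"):
            new.append(body)  # what it will contain afterwards
    if lines[start:start + len(old)] != old:
        raise ValueError("context mismatch: diff does not apply")
    return "\n".join(lines[:start] + new + lines[start + len(old):])
```

For example, applying `@@ -2,1 +2,1 @@` with `-b` / `+B` to `"a\nb\nc"` succeeds, while applying the same hunk to a file whose line 2 is not `b` raises rather than guessing.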
| Pattern | Token Cost | Reliability | Primary Agent | Best For |
|---|---|---|---|---|
| Search-Replace | Low | High (with uniqueness) | Claude Code | Small, focused edits |
| Line-Range | Low | Medium (line drift) | SWE-agent | Repeated patterns |
| Whole-File | High | High (always works) | Simple agents | New files, small files |
| Diff-Based | Medium | Medium (invalid diffs) | Aider | Multi-region changes |
Industry trend: Search-replace has become the dominant pattern in 2025-2026, adopted by 6 of 13 agents. Its combination of low token cost, high precision, and human readability makes it the preferred choice for production coding agents.
The following matrix maps all 13 agents against eight tool capability categories. A checkmark indicates native support; "MCP" indicates the capability is available via MCP extensions; a dash indicates the capability is absent or not documented.
| Agent | File Ops | Search | Exec | Git | Web | Memory | Planning | MCP |
|---|---|---|---|---|---|---|---|---|
| Aider | Diff/Whole | Repo map (tree-sitter) | Shell | Auto-commit | URL/Image | Session | Architect mode | — |
| Claude Code | Read/Write/Edit/Notebook | Grep/Glob/LSP | Bash (2-min timeout) | Safety protocol | WebFetch/WebSearch | CLAUDE.md + Memory | PlanMode/TodoWrite/Task | Full client |
| Cline | Read/Write/Diff | File search | Terminal (approval) | Shadow git checkpoint | Puppeteer (native) | Session | Plan/Act/Ask modes | Full client |
| Codex CLI | Read/Write/Patch | File search | Sandboxed (Seatbelt/Landlock) | Via shell | Disabled by default | JSONL recorder | Review mode | Server + Client |
| Droid | FIND_AND_REPLACE/V4A diff | HyperCode/ByteRank | Shell | Via shell | Via shell | Session | Multi-model planning | — |
| Goose | Via MCP | Via MCP | Local trust | Via MCP | Via MCP (Playwright) | Session | Via MCP | MCP-first (3000+) |
| Letta Code | File tools | Search tools | Shell | Via shell | Via shell | Persistent blocks + Archival DB | Skill learning | Client |
| OpenCode | Read/Write/Edit | Grep/Glob/LSP (semantic) | Shell | Via shell | Via MCP | Session + AGENTS.md | Agent configs | Full client |
| OpenManus | FileOperators/StrReplace | File search | Bash/PythonExecute | Via shell | Browser/WebSearch/Crawl4AI | Agent memory | PlanningTool/AskHuman | MCP bridge |
| Qwen Code | Read/ReadMany/Write/Edit | Grep/Glob | Shell (Docker option) | Via shell | web_fetch/web_search | save_memory | exit_plan_mode/todo_write/task | Client |
| Replit Agent | Python DSL file ops | Python DSL search | Python DSL exec | Auto-deploy | Python DSL web | Trajectory compression | Multi-tool code blocks | — |
| Vibe CLI | read/write/search_replace | grep | Bash (stateful) | Via shell | — | Session + history | todo/task subagent | — |
| Warp | File editing | grep/find/embeddings | Full Terminal Control (PTY) | Via terminal | Via terminal/MCP | Session | TODO generation | Client |
The tool layer is where coding agents win or lose. SWE-agent proved that ACI-optimized tools yield +12.29% over raw CLI. Droid proved that minimalist tool design with clear schemas can top benchmarks. Replit proved that Python DSL invocation can beat JSON function calling on success rate, cost, and speed. Claude Code proved that comprehensive tooling with careful constraints (uniqueness requirement, parallel execution, subagent isolation) scales to production.
Recommendations for tool system design, drawn from the findings above:
- Optimize tool output for LLM comprehension: count-first summaries, numbered results, explicit truncation notices, and next-step guidance.
- Prefer few tools with simple, unambiguous schemas; add tools or parameters only when measurable performance gains justify the complexity.
- Build guardrails into the tools themselves: uniqueness constraints on edits, pre-apply linting, and actionable error messages that tell the agent how to retry.
- Consolidate multi-step operations (batch reads, parallel tool execution, multi-tool code blocks) to cut round-trip latency and token cost.
- Manage context actively: collapse old observations to summaries and compress long trajectories to key decision points.
Primary Sources: SWE-agent paper (ACI concept), Claude Code system prompt analysis, Codex CLI source, OpenManus source, Qwen Code source, Factory.ai technical blog (Droid), Warp engineering blog, Replit Agent technical documentation
Benchmarks: SWE-bench Verified, Terminal-Bench, GAIA
Part 1 of 6 · Coding Agent Engineering Analysis · January 2026