CLI vs MCP for AI Agents

Whether AI agents should use CLIs or MCP to interact with external services is a question with published benchmarks and no universal answer. The tradeoffs depend on specific conditions — tool familiarity, host environment, token budget, and security requirements.

What the Benchmarks Show

Mario Zechner ran 120 evaluation runs across standardized tasks comparing CLI and MCP approaches. Both achieved 100% success rates. MCP was 23% faster and 2.5% cheaper.

The speed advantage did not come from any inherent superiority of the protocol. It came primarily from bypassing Claude Code’s malicious command detection on bash invocations, a security check that adds latency to CLI commands. That latency is intentional: it is the price of a safety check, not overhead to be optimized away.

A separate browser automation benchmark found CLI scored 33% better on token efficiency and completed tasks that MCP structurally couldn’t. Field-masked CLI queries consumed 43x fewer tokens than equivalent MCP snapshots.

The divergent results indicate that the right choice depends on the specific tool and task.

Where CLI Wins

Token efficiency. CLIs have no ambient schema cost. With MCP, every connected server loads its tool definitions into the model’s context window before the conversation starts. A moderate MCP configuration — say, 5 servers with 10-80 tools each — can consume tens of thousands of tokens before a single user message. This overhead is real and cumulative. CLI tools, by contrast, only consume tokens when actively used.
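The arithmetic behind that estimate can be sketched as follows. The per-tool figure of 150 tokens is an assumption chosen for illustration, not a measured value; real tool definitions vary widely with description length and schema complexity.

```python
def ambient_schema_tokens(servers: int, tools_per_server: int, tokens_per_tool: int) -> int:
    """Tokens consumed by tool definitions before the first user message."""
    return servers * tools_per_server * tokens_per_tool


# ASSUMPTION: average size of one tool definition (name, description, JSON Schema).
ASSUMED_TOKENS_PER_TOOL = 150

# 5 servers, at the low (10) and high (80) ends of the tools-per-server range.
low = ambient_schema_tokens(5, 10, ASSUMED_TOKENS_PER_TOOL)
high = ambient_schema_tokens(5, 80, ASSUMED_TOKENS_PER_TOOL)

print(f"{low:,} to {high:,} tokens")  # 7,500 to 60,000 tokens
```

Under this assumption the cost is paid on every conversation, whether or not any of those tools is ever called.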

Training data familiarity. LLMs have trained extensively on shell commands, documentation, and Stack Overflow discussions for tools like bq, gcloud, git, and curl. They know these interfaces fluently. Generating a bq query command is pattern-matching against a massive training corpus. Generating structured JSON for an MCP tool call relies on more limited fine-tuning data. This asymmetry in training data explains why the Code Generation over Tool Calling Pattern tends to outperform tool calling for well-established tools.

Unix composability. CLI commands can be piped, chained, and composed in single expressions. An agent can write a shell pipeline that queries gws drive files list, filters with jq, extracts IDs, and passes them to a second gws command. MCP tool calls can’t compose this way — each call is a separate round trip through the model’s context.
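A toy model may make the context cost of composition concrete. Everything here is a stand-in: the file listing is stubbed data, the three functions are hypothetical equivalents of a list command, a jq filter, and an ID extraction step, and character counts are used as a rough proxy for tokens.

```python
import json

# Stubbed listing standing in for the output of a file-listing command.
FILES = [{"id": f"f{i}", "name": f"report-{i}.txt", "size": i * 100} for i in range(50)]


def list_files() -> str:
    return json.dumps(FILES)


def filter_large(listing: str, min_size: int) -> str:
    return json.dumps([f for f in json.loads(listing) if f["size"] >= min_size])


def extract_ids(listing: str) -> str:
    return "\n".join(f["id"] for f in json.loads(listing))


def piped() -> int:
    """One composed expression: only the final output reaches the model."""
    final = extract_ids(filter_large(list_files(), 4500))
    return len(final)


def step_by_step() -> int:
    """Separate tool calls: every intermediate result passes through the context."""
    seen = 0
    listing = list_files()
    seen += len(listing)
    large = filter_large(listing, 4500)
    seen += len(large)
    ids = extract_ids(large)
    seen += len(ids)
    return seen


print(piped(), step_by_step())
```

In the piped version the model only ever sees the five matching IDs; in the step-by-step version the full listing and the filtered listing both land in context first.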

Zero implementation overhead. No server to build, deploy, or maintain. The tool ships; you use it.

Where MCP Wins

Typed schemas and validation. MCP tool definitions use JSON Schema, which provides validation before a request hits the API. Malformed inputs get caught early. CLI tools generally don’t validate input structure — the error arrives from the API after the round trip, which is slower and less informative.
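The idea of catching malformed input before the request leaves the client can be sketched with a hand-rolled check. This is a minimal illustration covering only required fields and primitive types, not a full JSON Schema implementation, and the schema itself is hypothetical rather than taken from any real MCP server.

```python
from typing import Any

# HYPOTHETICAL schema for a file-listing tool; field names are illustrative.
LIST_FILES_SCHEMA = {
    "type": "object",
    "required": ["pageSize"],
    "properties": {
        "pageSize": {"type": "integer"},
        "fields": {"type": "string"},
    },
}

_PY_TYPES = {"integer": int, "string": str}


def validate(args: Any, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the input is acceptable."""
    if not isinstance(args, dict):
        return ["arguments must be an object"]
    errors = [
        f"missing required field: {name}"
        for name in schema.get("required", [])
        if name not in args
    ]
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue  # undeclared fields are ignored in this sketch
        expected = _PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        wrong_type = not isinstance(value, expected)
        if wrong_type or (expected is int and isinstance(value, bool)):
            errors.append(f"{name} must be {spec['type']}")
    return errors


print(validate({"pageSize": "10"}, LIST_FILES_SCHEMA))  # ['pageSize must be integer']
```

The error is produced locally and names the offending field, which is the property the paragraph above attributes to schema-first tool definitions.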

Client universality. Claude Desktop, VS Code Copilot, Cursor, and IDE extensions often lack shell access. MCP works in all of them. CLI commands require a bash-capable environment, which not all agent hosts provide.

Stateful operations. MCP connections are sessions that can hold auth tokens, connection pools, and other state across multiple tool calls. CLI invocations are separate processes: each one starts fresh, so any state has to be re-established or read back from disk on every call.

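Security auditing. MCP exposes each operation as a separately named tool, which lets a host apply fine-grained permissions and keep audit trails per tool. You can specify exactly which tools a client can call, log every invocation, and revoke access per-tool. Shell command permissions are typically coarser: a rule such as Bash(gws *) allows or denies the whole CLI, and telling Bash(gws drive files list) apart from Bash(gws drive files delete) depends on pattern matching against the command string, which is harder to enforce reliably.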
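Per-tool control of this kind can be sketched as a small gate that a host might place in front of tool calls. The class and the tool names below are hypothetical and are not part of the MCP specification or of any particular client.

```python
from dataclasses import dataclass, field


@dataclass
class ToolGate:
    """Per-tool allowlist with an audit trail of every attempted call."""

    allowed: set[str]
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def call(self, tool: str) -> bool:
        permitted = tool in self.allowed
        # Denied attempts are logged too, so the trail shows what was tried.
        self.audit_log.append((tool, "allowed" if permitted else "denied"))
        return permitted

    def revoke(self, tool: str) -> None:
        self.allowed.discard(tool)


# HYPOTHETICAL tool names, one per operation.
gate = ToolGate(allowed={"drive_files_list"})
gate.call("drive_files_list")    # permitted
gate.call("drive_files_delete")  # denied, but still logged
gate.revoke("drive_files_list")
gate.call("drive_files_list")    # denied after revocation
print(gate.audit_log)
```

Because each operation has its own name, allowing list while denying delete is a set membership check rather than a pattern over free-form command text.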

The Same Tool, Both Surfaces

The emerging answer to “CLI or MCP?” is “both, from the same binary.” The gws CLI exemplifies this:

# Standard CLI: agent writes shell commands
gws drive files list --params '{"pageSize": 10, "fields": "files(id,name)"}'

# MCP server: same underlying API, structured tools
gws mcp -s drive,gmail,calendar

When a tool offers both surfaces, you can let the agent choose based on context. For quick exploration and composable pipelines, CLI. For operations where typed validation matters or where the agent host lacks shell access, MCP. The choice happens at the task level, not the tool level.

{
  "mcpServers": {
    "gws": {
      "command": "gws",
      "args": ["mcp", "-s", "drive,gmail,calendar"]
    }
  },
  "permissions": {
    "allow": [
      "Bash(gws *)"
    ]
  }
}

With this setup, the agent has both paths available and can choose. For a simple “list my Drive files” request, it will likely write a shell command. For a complex multi-step workflow involving mutations where dry-run validation matters, it may prefer the MCP tools.

The Practical Decision Framework

Use CLI when:

The tool has deep training data representation (bq, gcloud, git, standard Unix tools, gws)
You need Unix composability — piping output from one command to another
You’re operating in an environment where shell access is available
Token efficiency matters more than typed validation

Use MCP when:

The agent host lacks shell access (Claude Desktop, IDE extensions)
You need fine-grained security auditing and per-tool permission control
The tool’s API doesn’t have training data representation (custom internal APIs, less-known SaaS)
Stateful operations require persistent connections across calls

Use both when:

The tool ships both surfaces (like gws)
Your workflow mixes exploratory queries (CLI-friendly) with mutating operations (MCP-friendly)
You want to let the agent choose the most efficient path per task
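The framework above can be condensed into a small function. The field names and the order in which the conditions are checked are assumptions made for this sketch; the lists do not rank the criteria, so a real policy might weigh them differently.

```python
from dataclasses import dataclass


@dataclass
class Context:
    has_shell: bool             # does the agent host provide shell access?
    well_known_tool: bool       # is the tool well represented in training data?
    needs_per_tool_audit: bool  # per-tool permissions or audit trail required?
    needs_session_state: bool   # persistent connection state across calls?
    ships_both: bool = False    # does the tool offer CLI and MCP surfaces?
    mutating: bool = False      # does the task change data?


def choose_surface(ctx: Context) -> str:
    """Return 'cli' or 'mcp' for a single task (ASSUMED priority order)."""
    if not ctx.has_shell:
        return "mcp"  # no shell, so CLI is not an option
    if ctx.needs_per_tool_audit or ctx.needs_session_state:
        return "mcp"
    if ctx.ships_both:
        # Exploratory queries go to the CLI, mutations to the MCP tools.
        return "mcp" if ctx.mutating else "cli"
    return "cli" if ctx.well_known_tool else "mcp"


print(choose_surface(Context(True, True, False, False)))  # cli
```

The decision is made per task, so the same agent can take the CLI path for one request and the MCP path for the next.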

As Zechner’s summary puts it: “The protocol is just plumbing. What matters is whether your tool helps or hinders the agent’s ability to complete tasks.” For tools like gws, the two surfaces serve different parts of the task space.