Every MCP server you connect to Claude Code or Claude Desktop loads its tool definitions into the model’s context window before the conversation begins. These definitions describe each tool’s name, description, parameters, and expected output. The model needs them to know what tools are available and how to call them. But they consume tokens that could otherwise hold conversation history, code context, or reasoning.
This is the MCP token tax: a fixed upfront cost paid before a single user message, plus per-operation costs that accumulate through a session.
## Measured Overhead

### Anthropic's Internal Testing
Anthropic’s own measurements found that a moderate MCP configuration — 58 tools across 5 MCP servers — consumed approximately 55,000 tokens before the conversation started. That’s just tool definitions, not conversation content.
In more aggressive configurations, tool definitions alone consumed 134,000 tokens. On Claude’s 200K context window, that’s 67% of available space gone before you ask your first question. The model has roughly 66,000 tokens left for system prompts, conversation history, file contents, and reasoning.
### Real-World Claude Code Measurements
A practitioner documented their Claude Code setup with 7 MCP servers and broke down the token allocation:
| Component | Tokens | % of 200K Context |
|---|---|---|
| System prompt | 2,700 | 1.4% |
| Built-in tools | 14,400 | 7.2% |
| MCP tools | 67,300 | 33.7% |
| Total before conversation | 84,400 | 42.2% |
Even after cutting to 3 essential servers, the overhead was still 42,600 tokens (21.3% of context). For data engineering workflows that require long sessions — exploring schemas, profiling data, iterating on transformations, debugging pipelines — losing a fifth or more of your context window to tool definitions is a meaningful constraint.
### BigQuery MCP Toolbox Specifically
The BigQuery MCP Toolbox’s full tool set runs approximately 2,000-5,000 tokens for tool definitions. This is modest compared to multi-server configurations but still significant when you consider that a bq query command occupies 15-30 tokens. The MCP overhead for a single tool’s definition often exceeds the cost of executing a CLI command that accomplishes the same thing.
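To make the CLI side of that comparison concrete, here is a rough sketch using the common ~4-characters-per-token heuristic. Exact counts depend on the model's tokenizer, and the project, dataset, and table names are hypothetical; the commands themselves are standard `bq` invocations.

```python
# Rough token footprints for typical bq CLI commands, using the
# ~4-characters-per-token heuristic. Names like "mydataset" are illustrative.
def approx_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

commands = [
    "bq ls mydataset",
    "bq show --schema --format=prettyjson myproject:mydataset.events",
    "bq query --use_legacy_sql=false 'SELECT COUNT(*) FROM mydataset.events'",
]

for cmd in commands:
    print(f"{approx_tokens(cmd):>3} tokens  {cmd}")
```

Even the longest of these ad-hoc commands stays within the 15-30 token range cited above, well under the definition cost of a single MCP tool.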
## The Per-Operation Cost
Beyond the upfront definitions, each MCP tool call has its own token cost:
- MCP per-operation cost: ~150-250 tokens per tool call. This includes the structured JSON request, the tool’s response (which passes through the context window), and the model’s processing of that response.
- CLI per-operation cost: ~40-60 tokens per command. The command itself is compact, and the output streams directly without structured JSON wrapping.
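The gap comes largely from the JSON wrapping. A minimal sketch, comparing a hypothetical MCP tool-call request (the tool name and argument schema here are illustrative, not the Toolbox's actual API) against the equivalent CLI command; note this shows only the request side, and the structured response adds more:

```python
import json

def approx_tokens(text: str) -> int:
    # Crude estimate: roughly 4 characters per token.
    return max(1, len(text) // 4)

# Hypothetical MCP tool call: field names, quoting, and nesting wrap
# the same underlying query.
mcp_request = json.dumps({
    "name": "bigquery_execute_sql",  # illustrative tool name
    "arguments": {
        "project": "myproject",
        "sql": "SELECT COUNT(*) FROM mydataset.events",
    },
})

# Equivalent CLI command: the query plus a compact flag prefix.
cli_command = (
    "bq query --use_legacy_sql=false "
    "'SELECT COUNT(*) FROM mydataset.events'"
)

print("MCP request:", approx_tokens(mcp_request), "tokens")
print("CLI command:", approx_tokens(cli_command), "tokens")
```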
For a typical schema exploration session — listing tables, checking schemas for 4 tables, running profiling queries — the comparison looks like this:
| Approach | Tool Definition Overhead | Per-Operation Cost | Total (4 tables + profiling) |
|---|---|---|---|
| CLI | 0 tokens | ~40-60 tokens/command | ~300-400 tokens |
| MCP | 2,000-5,000 tokens | ~150-250 tokens/tool call | ~3,000-6,500 tokens |
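The table's totals follow directly from the per-operation figures. A worked version, assuming the session comprises 7 operations (1 table listing, 4 schema checks, 2 profiling queries; the exact count behind the table is not stated):

```python
# Reproduce the table's totals from the per-operation ranges above.
ops = 1 + 4 + 2  # assumed: 1 listing + 4 schema checks + 2 profiling queries

cli_low, cli_high = ops * 40, ops * 60        # CLI: no definition overhead
mcp_low = 2_000 + ops * 150                   # MCP: definitions + per-call cost
mcp_high = 5_000 + ops * 250

print(f"CLI total: {cli_low}-{cli_high} tokens")   # 280-420
print(f"MCP total: {mcp_low}-{mcp_high} tokens")   # 3,050-6,750
```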
For straightforward schema exploration, CLI wins on token efficiency by an order of magnitude. The gap narrows for complex multi-step workflows where MCP’s structured responses simplify downstream processing, but it never disappears entirely.
## Why This Matters

### Context Window as Scarce Resource
The context window is the model’s working memory. Everything the model knows about the current conversation, the codebase, the problem, and available tools must fit within this window.
When MCP tool definitions consume 20-40% of available context, the model has proportionally less space for:
- Conversation history. Longer sessions mean earlier messages get dropped sooner. Instructions you gave at the start of the conversation may fall out of context.
- Code context. Less room to hold file contents, which means the model may need to re-read files more often or work with less surrounding code.
- Reasoning. The model’s chain of thought competes for the same token budget. More overhead means less room for complex reasoning about your query or task.
### Compound Effect in Long Sessions
Data engineering sessions tend to be long. You might explore a dataset, write a transformation, test it, iterate on the logic, then document the results — all in one conversation. Over a 30-minute session with dozens of operations, the per-operation overhead accumulates:
- 20 CLI operations: ~800-1,200 total tokens
- 20 MCP operations: ~3,000-5,000 total tokens (plus 2,000-5,000 upfront)
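The accumulation above can be sketched with midpoint values from the stated ranges (50 tokens/CLI command, 200 tokens/MCP call, 3,500 tokens upfront); these midpoints are assumptions, not measurements:

```python
# Cumulative session cost: upfront definition overhead plus per-operation cost.
def session_cost(ops: int, per_op: int, upfront: int = 0) -> int:
    return upfront + ops * per_op

for ops in (10, 20, 40):
    cli = session_cost(ops, per_op=50)                  # midpoint of 40-60
    mcp = session_cost(ops, per_op=200, upfront=3_500)  # midpoints of ranges
    print(f"{ops:>2} ops: CLI ~{cli:,} tokens, MCP ~{mcp:,} tokens")
```

At 20 operations this gives ~1,000 tokens for CLI versus ~7,500 for MCP, consistent with the ranges listed above.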
The 5x-10x difference means the MCP approach consumes a meaningful portion of the context window on tool interactions alone. In practice, this manifests as the model “forgetting” earlier parts of the conversation sooner, or needing to be reminded of decisions already made.
### Cost Implications
For Claude Code on usage-based pricing, tokens consumed by MCP tool definitions and tool interactions are billed the same as any other input tokens. The overhead isn’t free in dollar terms either. This compounds with the BigQuery cost of the queries themselves — MCP doesn’t just cost more tokens, it can also trigger more queries if the model needs additional context that fell out of the window.
## Mitigation Strategies
Minimize connected servers. Only connect MCP servers you’ll actively use in the current session. Each server’s tool definitions are loaded whether or not you call them. Three servers with 5 tools each means 15 tool definitions competing for context; if you only need one server, disconnect the others.
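To see how this scales, here is a rough sketch using the ~950 tokens per definition implied by Anthropic's 55,000-token / 58-tool measurement; the 5-tools-per-server figure is an illustrative assumption, and real servers vary widely:

```python
# Estimate upfront definition overhead as a function of connected servers.
# ~950 tokens/definition is derived from the 55,000 tokens / 58 tools
# measurement above; actual definition sizes vary per tool.
TOKENS_PER_DEF = 55_000 // 58  # = 948

def upfront_overhead(servers: int, tools_per_server: int = 5) -> int:
    return servers * tools_per_server * TOKENS_PER_DEF

for n in (1, 3, 7):
    print(f"{n} server(s): ~{upfront_overhead(n):,} tokens before the first message")
```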
Prefer CLI for simple operations. Use CLI for quick exploration (listing tables, checking schemas, running ad-hoc queries) and reserve MCP for operations that genuinely benefit from structured responses or security controls.
Use hybrid configurations. Claude Code supports both MCP and CLI access simultaneously. Configure MCP for tools where structured responses or audit trails matter, and allow CLI commands for everything else:
```json
{
  "mcpServers": {
    "bigquery": {
      "command": "npx",
      "args": ["-y", "@toolbox-sdk/server", "--prebuilt", "bigquery", "--stdio"]
    }
  },
  "permissions": {
    "allow": [
      "Bash(bq *)",
      "Bash(gcloud *)"
    ]
  }
}
```

Start new conversations for new tasks. Rather than running one long session that accumulates MCP overhead, start fresh conversations for distinct tasks. Each new conversation resets the context window (though you pay the upfront tool definition cost again).
Watch for context exhaustion symptoms. If the model starts forgetting earlier instructions, repeating questions it already asked, or losing track of the conversation’s goals, context window pressure from MCP overhead may be a contributing factor. Reducing connected servers or switching to CLI for the remainder of the session can help.