
Custom Parameterized MCP Queries

Using the MCP Toolbox's tools.yaml to define constrained, parameterized queries that give AI assistants structured access to data without arbitrary SQL.

Tags: mcp, bigquery, ai, data engineering

The most useful feature of the MCP Toolbox for Databases isn’t the prebuilt tools. It’s the ability to define exactly which queries an AI assistant can run, with which parameters, via a tools.yaml configuration file. This gives you the benefits of AI-assisted data exploration without the risks of arbitrary SQL execution.

The Problem with Arbitrary SQL

When you give an AI assistant unrestricted execute_sql access, you accept several risks:

  • Cost: a curious AI asking broad questions about large tables can rack up charges. BigQuery bills by data scanned, and a single SELECT * on a large table can cost more than your entire monthly budget for ad-hoc queries. See AI Query Cost Control for BigQuery MCP for the full cost picture.
  • Safety: execute_sql can technically run INSERT, UPDATE, DELETE, and DDL statements. The AI could modify or drop data.
  • Scope creep: the AI might explore tables it shouldn’t access, even if it has permission. Sensitive data that’s technically readable might not be appropriate for AI exploration.

Custom parameterized queries address all three by constraining what the AI can ask and how it asks.

The tools.yaml Configuration

The Toolbox reads tool definitions from a YAML file. Each tool specifies a query template with named parameters that the AI fills in at runtime.

```yaml
sources:
  my-bigquery-source:
    kind: bigquery
    project: your-project-id

tools:
  search_orders_by_date:
    kind: bigquery-sql
    source: my-bigquery-source
    description: Search orders within a date range
    statement: |
      SELECT
        order__id,
        order__customer_id,
        order__total_amount_usd,
        order__created_at
      FROM `your-project.ecommerce.mrt__sales__orders`
      WHERE order__created_at BETWEEN @start_date AND @end_date
      ORDER BY order__created_at DESC
      LIMIT 100
    parameters:
      - name: start_date
        type: string
        description: Start date (YYYY-MM-DD)
      - name: end_date
        type: string
        description: End date (YYYY-MM-DD)

  get_customer_metrics:
    kind: bigquery-sql
    source: my-bigquery-source
    description: Get aggregated metrics for a customer
    statement: |
      SELECT
        customer__id,
        customer__orders,
        customer__lifetime_value_usd,
        customer__last_ordered_at
      FROM `your-project.ecommerce.mrt__sales__customers`
      WHERE customer__id = @customer_id
    parameters:
      - name: customer_id
        type: string
        description: The customer ID to look up
```

Run the Toolbox with your custom config:

```sh
./toolbox --tools tools.yaml --stdio
```

Why This Pattern Matters

The tool descriptions are what the AI reads to decide when and how to use each tool. When a user asks “show me orders from last week,” the AI sees search_orders_by_date with its description and parameters, constructs the right date range, and calls the tool. It never writes SQL — it fills in parameters for SQL you wrote.
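That flow can be sketched in a few lines of Python. This is a hypothetical dispatcher, not the Toolbox's actual internals: the assistant only picks a tool name and supplies parameter values, while the SQL statement stays fixed.

```python
# Hypothetical sketch of the tool-call flow: the AI picks a tool and
# supplies parameter values; the SQL statement itself is fixed by you.
# (Mirrors the tools.yaml above; this is not the Toolbox's real code.)

TOOLS = {
    "search_orders_by_date": {
        "description": "Search orders within a date range",
        "parameters": ["start_date", "end_date"],
        "statement": (
            "SELECT order__id, order__created_at "
            "FROM `your-project.ecommerce.mrt__sales__orders` "
            "WHERE order__created_at BETWEEN @start_date AND @end_date "
            "LIMIT 100"
        ),
    },
}

def build_tool_call(tool_name: str, params: dict) -> dict:
    """Validate an AI tool call and return the fixed statement plus bindings."""
    tool = TOOLS[tool_name]
    missing = [p for p in tool["parameters"] if p not in params]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    # Parameter values become server-side query bindings (@start_date, ...),
    # never string-interpolated into the SQL -- no injection surface.
    return {"statement": tool["statement"], "bindings": params}

call = build_tool_call(
    "search_orders_by_date",
    {"start_date": "2024-01-01", "end_date": "2024-01-07"},
)
print(call["bindings"]["start_date"])  # → 2024-01-01
```

The key property: there is no code path where AI-generated text becomes part of the SQL string.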

This means:

You control the query shape. The LIMIT 100 clause is baked in. The AI can’t accidentally scan entire tables or return unbounded result sets. The columns returned are the ones you chose, not SELECT *.

You control table access. The AI can only query tables referenced in your tool definitions. If mrt__sales__orders and mrt__sales__customers are the only tables in tools.yaml, those are the only tables the AI touches — regardless of what BigQuery permissions the service account has.

You encode business logic. The query definitions capture your team’s understanding of the data. Column names, table references, join patterns, filter conventions — all encoded once, used every time the AI needs that data.

You get a safety net for cost. Every query has a known shape with known scan patterns. You can estimate maximum cost per tool call and set expectations accordingly.
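A back-of-envelope version of that estimate, assuming BigQuery's on-demand pricing model (the per-TiB price here is an assumption; check your region and edition). The bytes-scanned figure for each tool's statement can come from a dry run (`QueryJobConfig(dry_run=True)` in the `google-cloud-bigquery` client).

```python
# Worst-case on-demand cost per tool call. The price per TiB is an
# assumption (US on-demand list price); substitute your own rate.

TIB = 1024 ** 4  # bytes in one tebibyte

def max_cost_usd(bytes_scanned: int, price_per_tib: float = 6.25) -> float:
    """Worst-case on-demand cost for a query scanning `bytes_scanned` bytes."""
    return bytes_scanned / TIB * price_per_tib

# A tool whose worst case scans a 50 GiB orders table:
print(round(max_cost_usd(50 * 1024 ** 3), 4))  # → 0.3052
```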

Designing Good Tool Definitions

Patterns that work well:

Name tools by what users would ask. search_orders_by_date is better than query_orders_table. The AI uses the name and description to match user intent to the right tool.

Always include LIMIT clauses. Even if you trust the data size today, tables grow. A LIMIT protects against future cost surprises.

Expose mart-layer tables, not raw data. Point tools at your mart models where data is clean, aggregated, and documented — not at raw source tables where column names are cryptic and data quality is uncertain.

Write descriptions for an AI reader. The description field is the AI’s only guide for when to use the tool. Be specific: “Get aggregated metrics for a customer” is better than “Customer data query.” Include what the tool returns, not just what it does.

Keep parameter types simple. Strings for dates, IDs, and names. Avoid complex types. The AI generates parameter values from natural language, so the simpler the interface, the fewer errors.
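Because parameter values arrive as AI-generated strings, a thin validation layer catches malformed values before they reach the database. A sketch, using a hypothetical helper (not part of the Toolbox):

```python
from datetime import datetime

def validate_date_param(value: str) -> str:
    """Reject anything that isn't a plain YYYY-MM-DD date string."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return value

print(validate_date_param("2024-03-01"))  # → 2024-03-01
```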

For Teams

A shared tools.yaml committed to your repository means the entire team gets the same AI-accessible query interface. New team members clone the repo, set up the Toolbox with claude mcp add, and have immediate access to the same curated set of queries.
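One way that setup step might look with Claude Code (the server name and binary path are placeholders for your own layout):

```sh
# Register the Toolbox as an MCP server in Claude Code.
claude mcp add bigquery-toolbox -- ./toolbox --tools tools.yaml --stdio
```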

This is particularly valuable for encoding institutional knowledge. “How do we calculate lifetime value?” is answered by the query in get_customer_metrics. “What’s the right table for order data?” is answered by the table reference in search_orders_by_date. The tools.yaml becomes a form of executable documentation.

For multi-database setups (BigQuery plus Cloud SQL, or BigQuery plus Spanner), you define multiple sources in the same file. The AI gets a unified interface across all your databases without knowing or caring about the infrastructure differences.