CLAUDE.md BigQuery Specifics

BigQuery has dialect-specific behavior and failure modes that warrant a dedicated section in CLAUDE.md. These are not style preferences — they are rules that, when violated, produce silent bugs, expensive full-table scans, or query failures. Each instruction exists because Claude will default to the wrong behavior without explicit guidance.

The Core BigQuery Section

Add this block to your dbt CLAUDE.md when running on BigQuery:

## BigQuery specifics
- Use GoogleSQL syntax (not legacy SQL)
- Single quotes for strings, `!=` for inequality
- Always filter on partition column in WHERE clauses
- Avoid SELECT * in production models
- For large tables, use incremental with insert_overwrite

Each line prevents a specific class of mistake.

GoogleSQL vs. Legacy SQL

BigQuery supports two SQL dialects: GoogleSQL (formerly Standard SQL) and Legacy SQL. Legacy SQL is the older dialect from the early BigQuery era. Most documentation uses GoogleSQL, most new features exist only in GoogleSQL, and the two dialects have real syntax differences that produce errors when mixed.

Claude has seen enough BigQuery content that it generally uses GoogleSQL by default — but not always. If it has encountered Legacy SQL patterns in training data or gets confused during a long context window, it can slip. The explicit instruction prevents this.

The most common Legacy SQL patterns to watch for: table decorators (table@timestamp), comma-separated FROM clauses instead of JOIN, GROUP EACH BY, and FLATTEN. If you see any of these in generated SQL, add an explicit reminder to your CLAUDE.md.

The Partition Filter Rule

This is the one that costs money. Partition pruning only works when the WHERE clause contains a literal value filtering on the partition column. If Claude writes a query that scans an entire partitioned table — because it forgot to include a partition filter, or because it used a subquery or function in the filter — the query runs correctly but costs orders of magnitude more than necessary.

The instruction “always filter on partition column in WHERE clauses” is deliberately absolute. You want Claude treating this as a hard constraint rather than a suggestion.

A subtle version of this mistake: wrapping the partition column in a function defeats pruning even when the intent is to filter by partition. Claude sometimes does this when working with timestamp partitions:

-- This does NOT prune, even though it looks like it should
WHERE DATE(created_at) = '2025-01-01'

-- This DOES prune (if created_at is the partition column)
WHERE created_at >= '2025-01-01' AND created_at < '2025-01-02'

If your team has a standard pattern for partition filtering, include it explicitly. The generic reminder helps; a concrete example helps more.

The Incremental Model Config Block

For large tables on BigQuery, insert_overwrite is typically the right incremental strategy. Claude knows about incremental models, but without a template it might choose a different strategy or miss BigQuery-specific config properties like require_partition_filter.

Include the config block template directly in CLAUDE.md:

## Partitioning template
{{ config(
  materialized='incremental',
  incremental_strategy='insert_overwrite',
  partition_by={"field": "created_at", "data_type": "timestamp", "granularity": "day"},
  require_partition_filter=true,
  cluster_by=["customer__id", "region"]
)}}

This template encodes several decisions at once:

insert_overwrite instead of merge (better for event data, avoids expensive full-table scans on the destination side)
require_partition_filter=true (forces all queries on this table to include a partition filter — enforces the cost guardrail at the table level, not just the query level)
cluster_by with the primary key and a common filter column (prevents expensive cluster-miss scans within partitions)

The require_partition_filter property in particular is worth including in the template because it’s easy to forget and has meaningful downstream impact: any query on this table without a partition filter will fail, which forces good habits on everyone querying the model, not just on the model itself.

String Quoting and Comparison Operators

Two small syntax differences that produce errors in GoogleSQL when Claude uses the wrong form:

String literals: GoogleSQL uses single quotes for strings. Legacy SQL used double quotes. If Claude generates WHERE event_name = "page_view", this will work in some contexts but fail in others (double quotes are reserved for identifiers in standard SQL). Include single quotes for strings explicitly.

Inequality: GoogleSQL uses !=. Legacy SQL used <>. Both work in BigQuery, but != is the standard form for GoogleSQL. This is a style preference more than a correctness issue, but consistency matters when SQLFluff lints your models — specify which form to use.

SELECT * in Production Models

SELECT * in a production model means schema changes in upstream tables propagate silently downstream. A new column added to a source appears in all marts without testing. A column removed from a source breaks models without a clear error message.

The instruction belongs in CLAUDE.md rather than just in documentation because Claude will use SELECT * as a shortcut when exploring or when the prompt doesn’t explicitly ask about column selection. Making it explicit prevents this in model generation.

The exception: SELECT * EXCEPT (column_name) is useful in base models for removing staging artifacts. That’s a different pattern and doesn’t need the same guardrail.

When to Expand This Section

These instructions cover the most common BigQuery-specific mistakes. Add more instructions only when Claude makes a specific mistake that isn’t covered.

Common expansions teams add over time:

Array unnesting patterns (UNNEST() syntax for repeated fields from GA4 or Pub/Sub exports)
SAFE_DIVIDE() instead of bare division (prevents ZeroDivisionError at query time)
TIMESTAMP_TRUNC() vs DATE_TRUNC() distinctions
Project-specific dataset reference patterns (backtick quoting for cross-project references)

Each of these is worth adding only after Claude actually makes the mistake. Speculative instructions consume instruction budget without preventing real problems.

Relationship to Warehouse-Level Controls

CLAUDE.md instructions are guidance for Claude — they shape generated SQL, but they don’t enforce anything at the infrastructure level. For true enforcement, complement your CLAUDE.md with:

maximum_bytes_billed in profiles.yml — hard cap on bytes scanned per query
require_partition_filter=true on tables in your base layer — forces partition filters at the query level regardless of who writes the query
Project-level BigQuery quotas — daily caps on bytes processed

Instructions tell Claude what to do. Infrastructure controls enforce it regardless of what Claude does.