CLAUDE.md for dbt Projects

A CLAUDE.md for a dbt project is grown from real mistakes rather than written upfront from hypothetical scenarios. This note covers the concrete template, the categories that earn their place, and the anti-patterns that dilute the file.

The Reactive Approach

Start without a CLAUDE.md at all. Use Claude Code on your dbt project for a few sessions. When it makes a mistake — uses stg_ instead of base__, creates a model in the wrong folder, writes a column alias without AS, forgets to update schema.yml after changing a model — that’s when you add a line.

This matters for two reasons. First, every instruction earns its place by solving a real problem you encountered. Second, you avoid wasting instruction budget on things Claude would figure out by reading your codebase anyway. Claude can discover your folder structure from your files; it can’t discover your team’s specific naming convention without being told.

Claude Code's # shortcut adds a memory mid-conversation. When Claude makes a mistake, fix it, then immediately start a message with # followed by the rule, and Claude Code will ask which CLAUDE.md file to save it to. This is the actual workflow: encounter a problem, fix it, encode it so it doesn’t happen again.

What to Include

A mature dbt CLAUDE.md covers four categories. Not more.

Commands with context

Not just the commands, but when to use them. Claude uses these to verify its work:

```markdown
## Commands
- `dbt run --select model_name`: Build specific models only
- `dbt test --select model_name+`: Test model and everything downstream
- `dbt compile`: Validate SQL/Jinja before running (do this first)
- `dbt ls --select +model_name+`: Check upstream and downstream impact
- `sqlfluff lint models/`: Check style compliance
```

The annotation next to each command matters. “Do this first” before dbt compile tells Claude the sequencing, not just the command.
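The same sequencing can also be enforced outside the prompt, with a small wrapper that refuses to run later steps once an earlier one fails. What follows is a minimal sketch, assuming `dbt` and `sqlfluff` are on the PATH. `build_steps` and `run_steps` are hypothetical helper names and are not part of either tool.

```python
import subprocess
from typing import Callable, List, Optional

Command = List[str]

def build_steps(model: str, models_dir: str = "models/") -> List[Command]:
    """Verification commands for one model, in the order the annotations imply."""
    return [
        ["dbt", "compile", "--select", model],     # validate SQL/Jinja first
        ["dbt", "run", "--select", model],         # build only this model
        ["dbt", "test", "--select", f"{model}+"],  # test it and everything downstream
        ["sqlfluff", "lint", models_dir],          # style check last
    ]

def run_steps(steps: List[Command], runner: Optional[Callable[[Command], int]] = None) -> int:
    """Run commands in order; stop at the first non-zero exit code and return it."""
    if runner is None:
        # Default runner: execute for real and report the process exit code.
        runner = lambda cmd: subprocess.run(cmd).returncode
    for cmd in steps:
        code = runner(cmd)
        if code != 0:
            print(f"stopped at `{' '.join(cmd)}` (exit {code})")
            return code
    return 0
```

For a hypothetical model, `run_steps(build_steps("int_orders_pivoted"))` stops at `dbt compile` if the Jinja is broken, so a model that fails to compile is never built or tested. The injectable `runner` keeps the ordering logic testable without a warehouse.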

Naming conventions that matter

This is where Claude makes the most mistakes without guidance. The double-underscore naming convention and layer prefixes are the most common source of errors:

```markdown
## Naming
- Base: `base__<source>__<entity>.sql` (double underscore, not single)
- Intermediate: `int_<entity>_<verb>.sql`
- Marts: `mrt__<domain>__<entity>.sql`
- Primary keys: `<object>__id` (customer__id, not id or customer_id)
```
Without this, Claude defaults to community conventions: stg_, dim_, fct_, single underscores. Fine for generic dbt projects. Wrong for yours.
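Because these rules are mechanical, they can also be checked mechanically, which gives Claude something concrete to verify against. The sketch below assumes names are lowercase, that each `<source>`, `<entity>`, and `<domain>` is a snake_case segment joined by single underscores, and that intermediate names have at least two segments after `int_`. The regexes are one reading of the rules above and are not an official dbt check.

```python
import re
from typing import Optional

# One snake_case segment: lowercase words joined by SINGLE underscores.
SEG = r"[a-z0-9]+(?:_[a-z0-9]+)*"

MODEL_PATTERNS = {
    "base": re.compile(rf"^base__{SEG}__{SEG}\.sql$"),          # base__<source>__<entity>.sql
    "int": re.compile(r"^int_[a-z0-9]+(?:_[a-z0-9]+)+\.sql$"),  # int_<entity>_<verb>.sql
    "mrt": re.compile(rf"^mrt__{SEG}__{SEG}\.sql$"),            # mrt__<domain>__<entity>.sql
}
PRIMARY_KEY = re.compile(rf"^{SEG}__id$")                       # <object>__id

# Community prefixes Claude tends to fall back to without guidance.
COMMUNITY_PREFIXES = ("stg_", "dim_", "fct_")

def check_model_filename(filename: str) -> Optional[str]:
    """Return a problem description, or None if the filename follows the conventions."""
    if filename.startswith(COMMUNITY_PREFIXES):
        return f"{filename}: community prefix; use base__, int_ or mrt__"
    layer = filename.split("_", 1)[0]  # "base", "int", "mrt", or something unknown
    pattern = MODEL_PATTERNS.get(layer)
    if pattern is None:
        return f"{filename}: unknown layer prefix"
    if not pattern.match(filename):
        return f"{filename}: does not match the {layer} naming pattern"
    return None

def check_primary_key(column: str) -> Optional[str]:
    """Return a problem description, or None if the column looks like <object>__id."""
    if PRIMARY_KEY.match(column):
        return None
    return f"{column}: primary keys should look like <object>__id"
```

A single-underscore name like `base_stripe_payments.sql` fails the base pattern, and `customer_id` fails the primary-key check, which are exactly the mistakes described above.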

Warehouse-specific gotchas

Rules that prevent real bugs rather than style preferences. For BigQuery (see Claude MD BigQuery Specifics for the complete section):

```markdown
## BigQuery
- Use GoogleSQL syntax (not legacy SQL)
- Single quotes for strings, `!=` for inequality
- Always filter on partition column in WHERE clauses
- For large tables, use incremental with insert_overwrite
```

For Snowflake, the broken USING clause in joins is the canonical example — Claude will generate valid ANSI SQL that fails on Snowflake without this explicit instruction.

Workflow sequence

How Claude should approach changes — check downstream impact first, update YAML docs, run tests before committing:

```markdown
## Workflow
1. Check impact: `dbt ls --select +model_name+`
2. Make changes in correct layer (base/intermediate/marts)
3. Update schema.yml with descriptions
4. Compile → run → test → lint
```

This shapes how Claude structures its work, not just what code it writes. The sequence matters: checking impact before making changes prevents the common error of modifying a model without realizing five marts depend on it.
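`dbt ls --select model_name+` is the authoritative way to see what depends on a model. The same answer can also be read offline from `target/manifest.json`, which dbt writes on `compile` and `run` and which includes a top-level `child_map` listing each node's direct children. This is a minimal sketch, assuming a manifest from a recent dbt version. The project name `shop` and the model IDs in the test data are hypothetical.

```python
import json
from collections import deque
from pathlib import Path
from typing import Dict, List, Set

def load_child_map(manifest_path: str = "target/manifest.json") -> Dict[str, List[str]]:
    """Read dbt's child_map (node unique_id -> direct children) from the manifest."""
    manifest = json.loads(Path(manifest_path).read_text())
    return manifest["child_map"]

def downstream_models(child_map: Dict[str, List[str]], unique_id: str) -> List[str]:
    """Breadth-first walk of child_map; return every model downstream of unique_id, sorted."""
    seen: Set[str] = set()
    queue = deque(child_map.get(unique_id, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # diamonds in the DAG: visit each node once
        seen.add(node)
        queue.extend(child_map.get(node, []))
    # child_map also contains tests, exposures, etc.; keep models only.
    return sorted(n for n in seen if n.startswith("model."))
```

If a base model turns out to feed five marts, that list is what Claude should look at before touching the base model's columns.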

A Complete Template

Here’s what a mature CLAUDE.md looks like after weeks of iteration on a dbt/BigQuery project. Every line exists because Claude made a specific mistake:

```markdown
# Analytics Engineering Context

dbt project on BigQuery. Run `dbt compile` before suggesting any model changes.

## Commands
- `dbt run --select model_name`: Build specific models only
- `dbt test --select model_name+`: Test model and everything downstream
- `dbt compile`: Validate SQL/Jinja before running (do this first)
- `sqlfluff lint models/`: Check style compliance

## Naming
- Base: `base__<source>__<entity>.sql` (double underscore)
- Intermediate: `int_<entity>_<verb>.sql`
- Marts: `mrt__`
- Primary keys: `<object>__id` (customer__id, not id)

## SQL patterns
- Trailing commas, 4-space indent
- Always use `AS` for aliases
- CTE imports at top: `WITH model_name AS (SELECT * FROM {{ ref('...') }})`

## Testing
- Primary keys need unique + not_null
- Add comments explaining why each test exists

## Workflow
1. Check impact: `dbt ls --select +model_name+`
2. Make changes in correct layer (base/intermediate/marts)
3. Update schema.yml with descriptions
4. Compile → run → test → lint
```

The whole file is around 30 lines. Claude follows a short CLAUDE.md more consistently than a long one, because every extra line competes for the same instruction budget.

What Doesn’t Belong

The most common mistake is adding instructions that seem useful upfront but dilute the file:

Comprehensive SQL style guides. SQLFluff handles formatting deterministically. Instructions like “use trailing commas” or “4-space indent” compete for attention with instructions Claude actually needs to make non-obvious decisions. If your project runs SQLFluff in CI, you can drop most style instructions from CLAUDE.md entirely. Let hooks run the formatter automatically after every edit.
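A hook that does this could look roughly like the sketch below, registered under `PostToolUse` with an `Edit|Write` matcher in `.claude/settings.json`. Claude Code passes the hook event as JSON on stdin, and for Edit and Write the edited path is in `tool_input.file_path`. The script name, the `models/` path check, and the assumption of SQLFluff 3.x (where `fix` applies changes without prompting) and an existing `.sqlfluff` config for your dbt project are all illustrative choices, not a prescribed setup.

```python
import json
import subprocess
import sys
from typing import Callable, Optional

# Hypothetical hook script, e.g. .claude/hooks/sqlfluff_post_edit.py

def edited_model_path(event: dict) -> Optional[str]:
    """Return the edited path if it is a .sql file under a models/ directory."""
    path = event.get("tool_input", {}).get("file_path", "")
    if path.endswith(".sql") and "/models/" in "/" + path:
        return path
    return None

def handle_post_edit(raw_event: str, run: Callable = subprocess.run) -> int:
    """Fix, then lint, the edited model. Returns the hook's exit code."""
    path = edited_model_path(json.loads(raw_event or "{}"))
    if path is None:
        return 0  # not a dbt model: nothing to format
    # Assumes SQLFluff 3.x, where `fix` writes changes without a confirmation prompt.
    run(["sqlfluff", "fix", path], capture_output=True, text=True, stdin=subprocess.DEVNULL)
    lint = run(["sqlfluff", "lint", path], capture_output=True, text=True, stdin=subprocess.DEVNULL)
    if lint.returncode != 0:
        # Exit code 2 from a PostToolUse hook shows stderr to Claude,
        # so it sees the violations SQLFluff could not fix automatically.
        print(lint.stdout, file=sys.stderr)
        return 2
    return 0

# Entry point when run as the hook command:
#     sys.exit(handle_post_edit(sys.stdin.read()))
```

With formatting handled this way, the style lines in CLAUDE.md become redundant, and Claude only hears about style when a violation survives `sqlfluff fix`.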

Full database schemas. Too much context, changes too often. Let Claude discover schema through queries or the dbt manifest. If you’re using the dbt MCP server, manifest discovery is automatic.
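When Claude needs a model's columns, the manifest is usually a better source than a pasted schema because `dbt compile` regenerates it. Here is a minimal sketch, assuming `target/manifest.json` exists. Manifest columns are the ones declared in YAML, so undocumented warehouse columns only show up in `catalog.json` (written by `dbt docs generate`) or through a query. The model name in the test data is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict

def load_manifest(path: str = "target/manifest.json") -> dict:
    """Parse the manifest dbt writes on compile/run."""
    return json.loads(Path(path).read_text())

def model_columns(manifest: dict, model_name: str) -> Dict[str, str]:
    """Map column name -> description for one model, as declared in its YAML."""
    for node in manifest["nodes"].values():
        if node.get("resource_type") == "model" and node.get("name") == model_name:
            return {col: meta.get("description", "") for col, meta in node.get("columns", {}).items()}
    raise KeyError(f"model not found in manifest: {model_name}")
```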

Generic dbt documentation. Claude already knows how ref() and source() work, how to write a Jinja macro, and what schema.yml is for. Including this wastes instruction budget on knowledge Claude already has.

Auto-generated content from /init. The /init command produces a CLAUDE.md by scanning your project, but the output is mostly generic — things Claude would discover on its own by reading your code. A blank file that grows one line at a time serves you better than a generated file full of information Claude already knows.

Task-specific instructions. If an instruction only applies to one type of work you do occasionally, handle it in-conversation when that work comes up. Persistent memory should be for things that are universally relevant, session after session.

Commit your CLAUDE.md to version control. This is the simplest way to give everyone on the team the same baseline behavior from Claude Code — the same naming conventions, the same workflow sequence, the same guardrails.

For personal preferences that shouldn’t be shared (different formatting preferences, local file paths, personal workflows), use ~/.claude/CLAUDE.md in your home directory. That file loads for all your projects but never touches the repo.

For temporary personal overrides in a shared project, CLAUDE.local.md at the project root is gitignored by default. It loads alongside the committed CLAUDE.md but stays out of version control.

Where This Fits in the Broader Setup

CLAUDE.md handles guidance — Claude should follow it, but it’s probabilistic. For rules that must never be violated (blocking edits to production schemas, always running SQLFluff after every edit), use Claude Code Hooks instead. For repeatable multi-step workflows (generating tests, documenting models), use Claude Code Slash Commands for dbt.
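As an illustration of the hook layer, here is a sketch of a `PreToolUse` hook registered with a `Bash` matcher. It refuses dbt commands pointed at a production target. The target name `prod` is a hypothetical stand-in for whatever your `profiles.yml` uses. The check only sees explicit `--target`/`-t` flags, so a profile whose default target is production would slip past it.

```python
import json
import shlex
import sys
from typing import Optional

# Hypothetical production target name(s) from profiles.yml.
PROD_TARGETS = {"prod"}

def prod_target_violation(event: dict) -> Optional[str]:
    """Return a reason to block if a Bash call runs dbt against a production target."""
    if event.get("tool_name") != "Bash":
        return None
    command = event.get("tool_input", {}).get("command", "")
    try:
        tokens = shlex.split(command)
    except ValueError:
        tokens = command.split()  # unbalanced quotes: fall back to a plain split
    if "dbt" not in tokens:
        return None
    for i, tok in enumerate(tokens):
        target = None
        if tok in ("--target", "-t") and i + 1 < len(tokens):
            target = tokens[i + 1]
        elif tok.startswith("--target="):
            target = tok.split("=", 1)[1]
        if target in PROD_TARGETS:
            return f"Blocked: dbt commands against target '{target}' are not allowed from Claude Code."
    return None

def handle_pre_tool_use(raw_event: str) -> int:
    """Exit code for the hook: 2 blocks the tool call, 0 lets it proceed."""
    reason = prod_target_violation(json.loads(raw_event or "{}"))
    if reason:
        # Exit code 2 from a PreToolUse hook blocks the call; stderr is shown to Claude.
        print(reason, file=sys.stderr)
        return 2
    return 0

# Entry point when run as the hook command:
#     sys.exit(handle_pre_tool_use(sys.stdin.read()))
```

Unlike a CLAUDE.md line saying "never run against prod", this check runs on every Bash call whether or not Claude remembers the rule.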

The three layers complement each other: CLAUDE.md provides context and conventions, hooks enforce hard boundaries, slash commands automate workflows. A mature Claude Code setup for dbt uses all three.