How to set up CLAUDE.md for your dbt project (and actually make it useful)

If you’ve been using Claude Code for analytics engineering work, you’ve probably noticed it sometimes… guesses. Wrong naming conventions. Legacy SQL syntax in BigQuery. Forgetting that your team uses double underscores in model names.

There’s a file called CLAUDE.md that helps with this — it gives Claude persistent memory about your project. But the key insight isn’t how to create one. It’s knowing when you actually need one, and what belongs in it when you do.

Here’s everything I’ve learned about making it work for analytics engineering.

What CLAUDE.md actually is

Think of it as persistent memory for Claude Code. Every time you start a session, Claude automatically reads this file and treats it as context for everything it does afterward. No need to re-explain your conventions, your folder structure, or that one BigQuery quirk that trips everyone up.

The file lives at the root of your project (or in a .claude/ folder), and Claude discovers it automatically. You can also have a personal one at ~/.claude/CLAUDE.md for preferences that apply to all your projects.

One detail worth knowing: Claude Code loads these files hierarchically. It starts from your home folder, works down through the parent directories, and ends at your current working directory. In a monorepo, this means you can keep org-wide conventions at the root and project-specific ones in subdirectories.
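To make the loading order concrete, here is a minimal Python sketch of the lookup described above. It is an illustration under stated assumptions, not Claude Code’s actual implementation: the function names are hypothetical, and it only looks for a CLAUDE.md directly inside each directory (it ignores the project-level .claude/ folder variant).

```python
from pathlib import Path
from typing import List


def candidate_memory_files(cwd: Path, home: Path) -> List[Path]:
    """Return CLAUDE.md locations to check, broadest scope first.

    Assumes `cwd` is an absolute path. The order (personal file, then
    every directory from the filesystem root down to `cwd`) mirrors the
    description in the text; it is not taken from Claude Code's source.
    """
    # Personal preferences that apply to every project.
    candidates = [home / ".claude" / "CLAUDE.md"]
    # cwd.parents runs from the nearest parent up to the root, so reverse it
    # to walk from the root down to the working directory.
    for directory in [*reversed(cwd.parents), cwd]:
        candidates.append(directory / "CLAUDE.md")
    return candidates


def existing_memory_files(cwd: Path, home: Path) -> List[Path]:
    """Keep only the candidates that actually exist on disk."""
    return [path for path in candidate_memory_files(cwd, home) if path.is_file()]
```

Calling existing_memory_files(Path.cwd(), Path.home()) from inside a monorepo subproject would list the personal file first, then the org-wide file, then the project one, which is a quick way to see what context a session is likely to start with.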

The “less is more” principle

This is the most important thing I’ve learned: shorter CLAUDE.md files work better.

It seems that Claude can reliably follow around 150-200 instructions before quality starts degrading. Your CLAUDE.md content competes for attention with everything else in the context window. If you dump your entire SQL style guide in there, the important stuff gets diluted.

Community consensus hovers around 300 lines max, with many people recommending under 60 lines for the core file. My approach now is to include only the conventions that:

Frequently cause errors when Claude guesses wrong
Are specific to my project (not generic best practices)
Can’t be handled by linters like SQLFluff

Everything else? I point to external docs with a simple reference: “See docs/testing-standards.md for the full testing guide.” Claude can pull those in when needed.
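If you want a guardrail for the size limits mentioned above, a small check like the following could run in CI or a pre-commit hook. This is a sketch, not an established tool: the function names are hypothetical, and counting only non-blank lines is my assumption about how to measure "lines". The 60 and 300 thresholds are the community numbers quoted above.

```python
def count_instruction_lines(text: str) -> int:
    """Count non-blank lines (assumption: blank lines do not count)."""
    return sum(1 for line in text.splitlines() if line.strip())


def budget_status(text: str, core_budget: int = 60, hard_cap: int = 300) -> str:
    """Classify a CLAUDE.md body against the two size thresholds."""
    lines = count_instruction_lines(text)
    if lines > hard_cap:
        return "over hard cap"
    if lines > core_budget:
        return "over core budget"
    return "ok"
```

Reading the file with Path("CLAUDE.md").read_text() and passing the result to budget_status gives a one-word answer you can fail a build on.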

What actually belongs in there

After some trial and error, here’s what I’ve found most valuable for dbt projects:

Common commands with context. Not just dbt run, but dbt run --select $model_name with a note about when to use it. Claude uses these to verify its work.

Naming conventions that matter. The base__<source>__<entity> pattern with double underscores. The int__ and mrt__ prefixes. Primary key naming. These are exactly the things Claude will guess wrong without guidance. (For more on how these layers fit together, see the dbt project structure guide.)

Warehouse-specific gotchas. BigQuery’s GoogleSQL vs legacy SQL. Snowflake’s broken USING clause in joins. Partition filter requirements. These prevent real bugs.

Your workflow. How you want Claude to approach changes—check downstream impact first, update YAML docs, run tests before committing. This shapes how Claude structures its work.

What I’ve stopped including: comprehensive SQL formatting rules (SQLFluff handles that), database schemas (too much context), anything auto-generated without review.
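Naming rules are also easy to check mechanically, which is a useful backstop when Claude (or a teammate) guesses wrong. Here is a minimal sketch of a checker for the base__, int__ and mrt__ prefixes described above. The helper names are hypothetical, and the exact patterns are my assumption: lowercase segments, single underscores inside a segment, double underscores between segments.

```python
import re
from pathlib import Path
from typing import Iterable, List, Optional

# One segment: lowercase words joined by single underscores (assumed rule).
SEGMENT = r"[a-z0-9]+(?:_[a-z0-9]+)*"

LAYER_PATTERNS = {
    # base__<source>__<entity>
    "base": re.compile(rf"base__{SEGMENT}__{SEGMENT}"),
    # int__ and mrt__ prefixes, followed by one or more segments
    "intermediate": re.compile(rf"int__{SEGMENT}(?:__{SEGMENT})*"),
    "marts": re.compile(rf"mrt__{SEGMENT}(?:__{SEGMENT})*"),
}


def layer_for(model_name: str) -> Optional[str]:
    """Return the layer a model name belongs to, or None if it fits no pattern."""
    for layer, pattern in LAYER_PATTERNS.items():
        if pattern.fullmatch(model_name):
            return layer
    return None


def misnamed_models(paths: Iterable[str]) -> List[str]:
    """Return the model names (file stems) that match no layer pattern."""
    stems = [Path(path).stem for path in paths]
    return [stem for stem in stems if layer_for(stem) is None]
```

For example, layer_for("base__stripe__payments") returns "base", while layer_for("base_stripe_payments") returns None because the single underscore after the prefix breaks the pattern.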

A template that actually works

Here’s what a mature CLAUDE.md looks like after weeks of iteration. Every line in this file was added because Claude made a specific mistake — not because it seemed like a good idea upfront:

# Analytics Engineering Context

dbt project on BigQuery. Run `dbt compile` before suggesting any model changes.

## Commands
- `dbt run --select model_name`: Build specific models only
- `dbt test --select model_name+`: Test model and everything downstream
- `dbt compile`: Validate SQL/Jinja before running (do this first)
- `sqlfluff lint models/`: Check style compliance

## Naming
- Base: `base__<source>__<entity>.sql` (double underscore)
- Intermediate: `int__<entity>__<verb>.sql`
- Marts: `mrt__`
- Primary keys: `<object>__id` (customer__id, not id)

## SQL patterns
- Trailing commas, 4-space indent
- Always use `AS` for aliases
- CTE imports at top: `WITH model_name AS (SELECT * FROM {{ ref('...') }})`

## Testing
- Primary keys need unique + not_null
- Add comments explaining why each test exists

## Workflow
1. Check impact: `dbt ls --select +model_name+`
2. Make changes in correct layer (base/intermediate/marts)
3. Update schema.yml with descriptions
4. Compile → run → test → lint

Notice it’s only about 30 lines. Everything important, nothing extra.

BigQuery-specific additions

If you’re on BigQuery, these additions prevent the most common issues:

## BigQuery specifics
- Use GoogleSQL syntax (not legacy SQL)
- Single quotes for strings, `!=` for inequality
- Always filter on partition column in WHERE clauses
- Avoid SELECT * in production models
- For large tables, use incremental with insert_overwrite

## Partitioning template
{{ config(
  materialized='table',
  partition_by={"field": "created_at", "data_type": "timestamp", "granularity": "day"},
  require_partition_filter=true,
  cluster_by=["customer__id", "region"]
)}}

The partition filter reminder is crucial. Claude will happily write queries that scan entire tables if you don’t remind it that BigQuery can only prune partitions when the filter on the partition column is a constant expression, not a subquery or another value computed at run time.

Just use the CLI

You might see mentions of MCP servers for dbt integration—there’s a dbt MCP server that gives Claude direct access to your manifest and metadata. Honestly? It’s probably overkill for most workflows.

The dbt and bq CLIs work great with Claude Code out of the box. If your CLAUDE.md includes the right commands, Claude will just… use them. It’ll run dbt ls --select +model_name+ to check dependencies, dbt compile to validate SQL, dbt test to verify changes, and bq ls to list objects in BigQuery. No extra setup required.

The key is making sure Claude knows when to use each command. That’s why the workflow section matters:

## Workflow
1. Check impact: `dbt ls --select +model_name+`
2. Make changes in correct layer
3. Update schema.yml
4. Compile → run → test → lint

Claude follows this sequence, runs the commands, reads the output, and adjusts. It’s surprisingly effective without any fancy integrations.
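If you ever want to run the same sequence yourself (in a pre-commit hook, say), it maps onto a small script. The following is a sketch under assumptions: the function names and the default models/ path are illustrative, and only the commands already shown in the workflow section are used.

```python
import subprocess
from typing import List


def workflow_commands(model: str, models_dir: str = "models/") -> List[List[str]]:
    """Build the command sequence from the workflow section for one model.

    The editing steps (change the model, update schema.yml) happen between
    the impact check and the compile step, so they do not appear here.
    """
    return [
        ["dbt", "ls", "--select", f"+{model}+"],   # 1. check upstream and downstream impact
        ["dbt", "compile"],                        # 4a. validate SQL/Jinja
        ["dbt", "run", "--select", model],         # 4b. build only this model
        ["dbt", "test", "--select", f"{model}+"],  # 4c. test it and everything downstream
        ["sqlfluff", "lint", models_dir],          # 4d. check style compliance
    ]


def run_workflow(model: str) -> None:
    """Run each command in order and stop at the first failure."""
    for command in workflow_commands(model):
        subprocess.run(command, check=True)
```

Keeping the command list separate from the runner means you can print or review the sequence without executing anything.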

If you’re working with a massive project where manifest parsing would save significant time, MCP might be worth exploring. But for most dbt projects, the CLI does the job.

Commit your CLAUDE.md to version control. This way everyone on the team gets the same baseline behavior from Claude Code.

For personal preferences that shouldn’t be shared (maybe you like different formatting, or you have local paths), use the ~/.claude/CLAUDE.md file at your home directory level.

You can also create custom slash commands in .claude/commands/ for common workflows. Something like /debug-test-failure that walks through your team’s standard debugging process. For automation that runs without manual invocation, hooks fire automatically at key moments.

The iteration mindset

Here’s my honest take, and my position on this has evolved: I used to recommend creating a CLAUDE.md as a first step. Now I’d say: start without one.

Use Claude Code on your project with no CLAUDE.md at all. When it makes a mistake — uses legacy SQL syntax, forgets your naming convention, skips schema.yml updates — that’s when you add a line. Each instruction should trace back to a real problem you encountered, not a hypothetical one.

I’d also advise against using /init to auto-generate the file. The content it produces largely duplicates what Claude discovers on its own by reading your codebase, and it adds token cost without meaningfully improving results. A blank file that you grow one line at a time will serve you better than a generated one full of information Claude already knows.

You can add instructions on the fly during a session using the # key, and Claude will ask which memory file should store it. This is the best workflow: encounter a problem, fix it, add the instruction so it doesn’t happen again.

What I’ve stopped doing

A few anti-patterns I’ve moved away from:

Auto-generating the whole file with /init. It produces generic content that Claude would discover on its own by reading your project. You end up with a file full of obvious information competing for context window space with the instructions that actually matter.

Comprehensive style guides. SQLFluff handles formatting better than instructions ever will. Let deterministic tools do deterministic work.

Database schemas. Too much context, changes too often. Let Claude discover schema through queries or MCP connections.

Repeating dbt documentation. Claude already knows how ref() and source() work. Focus on your project’s specific patterns.

Task-specific instructions. If it only applies to one type of work, it probably doesn’t belong in the persistent memory file.

The bottom line

The best CLAUDE.md files aren’t written — they’re grown. Each line should exist because you hit a real problem, not because it seemed like good documentation.

CLAUDE.md is most powerful when it encodes the tribal knowledge that would otherwise require constant re-explanation — the conventions, the gotchas, the “we do it this way because” stuff that makes your project yours.

Keep it short, keep it specific, and let real mistakes guide what goes in. That’s the whole strategy.

What conventions have you found most valuable to include in your CLAUDE.md? I’m always looking to improve my setup.