Adrienne Vermorel
How to set up CLAUDE.md for your dbt project (and actually make it useful)
If you’ve been using Claude Code for analytics engineering work, you’ve probably noticed it sometimes… guesses. Wrong naming conventions. Legacy SQL syntax in BigQuery. Forgetting that your team uses double underscores in model names.
The fix is a file called CLAUDE.md, and once I understood how it actually works, it changed how I use Claude Code for dbt projects entirely.
Here’s everything I’ve learned about making it work for analytics engineering.
What CLAUDE.md actually is
Think of it as persistent memory for Claude Code. Every time you start a session, Claude automatically reads this file and treats it as context for everything it does afterward. No need to re-explain your conventions, your folder structure, or that one BigQuery quirk that trips everyone up.
The file lives at the root of your project (or in a .claude/ folder), and Claude discovers it automatically. You can also have a personal one at ~/.claude/CLAUDE.md for preferences that apply to all your projects.
One detail worth knowing: Claude Code loads these files hierarchically. It starts from your home folder, works through parent directories, and ends at your current working directory. In a monorepo, this means you can keep org-wide conventions at the root and project-specific ones in subdirectories.
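To see which memory files sit on the path to your working directory, a rough shell sketch like this one can help. It only approximates the lookup described above by walking from a starting directory up to the filesystem root (nearest file first); it is not how Claude Code itself is implemented, and the function name `list_claude_md` is made up for illustration.

```shell
# Print every CLAUDE.md found between a starting directory and the filesystem root,
# nearest first. Illustrative approximation, not Claude Code's actual lookup code.
list_claude_md() {
    local dir
    dir=$(cd "${1:-.}" && pwd) || return 1
    while :; do
        # Check both locations mentioned in the text: the directory itself and .claude/
        if [ -f "$dir/CLAUDE.md" ]; then echo "$dir/CLAUDE.md"; fi
        if [ -f "$dir/.claude/CLAUDE.md" ]; then echo "$dir/.claude/CLAUDE.md"; fi
        if [ "$dir" = "/" ]; then break; fi
        dir=$(dirname "$dir")
    done
}
```

Running `list_claude_md .` from inside a dbt project in a monorepo should list the project file first and the repo-level file after it, which is a quick way to spot conventions that are defined in more than one place.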
The “less is more” principle
This is the most important thing I’ve learned: shorter CLAUDE.md files work better.
It seems that Claude can reliably follow around 150-200 instructions before quality starts degrading. Your CLAUDE.md content competes for attention with everything else in the context window. If you dump your entire SQL style guide in there, the important stuff gets diluted.
Community consensus hovers around 300 lines max, with many people recommending under 60 lines for the core file. My approach now is to include only the conventions that:
- Frequently cause errors when Claude guesses wrong
- Are specific to my project (not generic best practices)
- Can’t be handled by linters like SQLFluff
Everything else? I point to external docs with a simple reference: “See docs/testing-standards.md for the full testing guide.” Claude can pull those in when needed.
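If you want a guardrail against the file growing back, a small check like the following could run locally or in CI. It is a minimal sketch: the function name `check_claude_md_length` is hypothetical, and the default budget of 60 lines is just the "core file" figure quoted above, not an official limit.

```shell
# Warn when a CLAUDE.md exceeds a line budget.
# Usage: check_claude_md_length [file] [budget]
check_claude_md_length() {
    local file=${1:-CLAUDE.md}
    local budget=${2:-60}
    local lines
    # wc -l counts newline characters; tr strips the padding some platforms add
    lines=$(wc -l < "$file" | tr -d ' ')
    if [ "$lines" -gt "$budget" ]; then
        echo "WARN: $file has $lines lines (budget: $budget)"
        return 1
    fi
    echo "OK: $file has $lines lines (budget: $budget)"
}
```

For example, `check_claude_md_length CLAUDE.md 300` applies the looser community ceiling instead of the stricter default.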
What actually belongs in there
After some trial and error, here’s what I’ve found most valuable for dbt projects:
Common commands with context. Not just dbt run, but dbt run --select $model_name with a note about when to use it. Claude uses these to verify its work.
Naming conventions that matter. The base__<source>__<entity> pattern with double underscores. The int__ and mrt__ prefixes. Primary key naming. These are exactly the things Claude will guess wrong without guidance.
Warehouse-specific gotchas. BigQuery’s GoogleSQL vs legacy SQL. Snowflake’s broken USING clause in joins. Partition filter requirements. These prevent real bugs.
Your workflow. How you want Claude to approach changes—check downstream impact first, update YAML docs, run tests before committing. This shapes how Claude structures its work.
What I’ve stopped including: comprehensive SQL formatting rules (SQLFluff handles that), database schemas (too much context), anything auto-generated without review.
A template that actually works
Here’s what I use as a starting point for dbt projects:
# Analytics Engineering Context
dbt project on BigQuery. Run `dbt compile` before suggesting any model changes.

## Commands
- `dbt run --select model_name`: Build specific models only
- `dbt test --select model_name+`: Test model and everything downstream
- `dbt compile`: Validate SQL/Jinja before running (do this first)
- `sqlfluff lint models/`: Check style compliance

## Naming
- Base: `base__<source>__<entity>.sql` (double underscore)
- Intermediate: `int__<entity>_<verb>.sql`
- Marts: `mrt__`
- Primary keys: `<object>__id` (customer__id, not id)

## SQL patterns
- Trailing commas, 4-space indent
- Always use `AS` for aliases
- CTE imports at top: `WITH model_name AS (SELECT * FROM {{ ref('...') }})`

## Testing
- Primary keys need unique + not_null
- Add comments explaining why each test exists

## Workflow
1. Check impact: `dbt ls --select +model_name+`
2. Make changes in correct layer (base/intermediate/marts)
3. Update schema.yml with descriptions
4. Compile → run → test → lint

Notice it’s under 40 lines. Everything important, nothing extra.
BigQuery-specific additions
If you’re on BigQuery, these additions prevent the most common issues:
## BigQuery specifics
- Use GoogleSQL syntax (not legacy SQL)
- Single quotes for strings, `!=` for inequality
- Always filter on partition column in WHERE clauses
- Avoid SELECT * in production models
- For large tables, use incremental with insert_overwrite

## Partitioning template
{{ config(
    materialized='table',
    partition_by={"field": "created_at", "data_type": "timestamp", "granularity": "day"},
    require_partition_filter=true,
    cluster_by=["customer__id", "region"]
) }}

The partition filter reminder is crucial. Claude will happily write queries that scan entire tables if you don’t remind it that partition pruning depends on filtering the partition column directly in the WHERE clause, ideally against constant expressions rather than values computed from other tables.
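One way to check whether a filter actually prunes partitions is a BigQuery dry run, which validates a query and reports the bytes it would process without running it. The sketch below assumes a hypothetical table `my_project.analytics.orders` partitioned on `created_at`; the function name `estimate_scan` and the `BQ_BIN` override are also made up for illustration.

```shell
# Dry-run a partition-filtered query so bq reports the bytes it would scan.
# my_project.analytics.orders is a hypothetical table; swap in your own.
# BQ_BIN is an illustrative override so the command can be inspected without bq installed.
estimate_scan() {
    local start_date=$1   # e.g. 2024-01-15
    "${BQ_BIN:-bq}" query --use_legacy_sql=false --dry_run \
        "SELECT COUNT(*) FROM \`my_project.analytics.orders\` WHERE created_at >= TIMESTAMP('${start_date}')"
}
```

Comparing the reported bytes with and without the `created_at` filter gives a quick signal of whether pruning kicked in, and it is the kind of command worth listing in CLAUDE.md so Claude can run the same check.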
Just use the CLI
You might see mentions of MCP servers for dbt integration—there’s a dbt-mcp server that gives Claude direct access to your manifest and metadata. Honestly? It’s probably overkill for most workflows.
The dbt and bq CLIs work well with Claude Code out of the box. If your CLAUDE.md includes the right commands, Claude will just… use them. It’ll run dbt ls --select +model_name+ to check dependencies, dbt compile to validate SQL, dbt test to verify changes, and bq ls to list objects in BigQuery. No extra setup required.
The key is making sure Claude knows when to use each command. That’s why the workflow section matters:
## Workflow
1. Check impact: `dbt ls --select +model_name+`
2. Make changes in correct layer
3. Update schema.yml
4. Compile → run → test → lint

Claude follows this sequence, runs the commands, reads the output, and adjusts. It’s surprisingly effective without any fancy integrations.
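The compile → run → test → lint step can also be wrapped in a small helper so that you and Claude run exactly the same sequence. This is a sketch under assumptions: the function name `dbt_workflow` and the `DRY_RUN` switch are hypothetical, and the commands are the ones from the template, scoped to a single model.

```shell
# Run the workflow steps for one model, stopping at the first failure.
# Set DRY_RUN=1 to print the commands instead of executing them.
dbt_workflow() {
    local model=$1
    local cmd
    for cmd in \
        "dbt compile --select ${model}" \
        "dbt run --select ${model}" \
        "dbt test --select ${model}+" \
        "sqlfluff lint models/"
    do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            # Unquoted on purpose: the string is split into command and arguments
            $cmd || return 1
        fi
    done
}
```

Running it once with `DRY_RUN=1` shows the exact commands before anything touches the warehouse.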
If you’re working with a massive project where manifest parsing would save significant time, MCP might be worth exploring. But for most dbt projects, the CLI does the job.
Team sharing
Commit your CLAUDE.md to version control. This way everyone on the team gets the same baseline behavior from Claude Code.
For personal preferences that shouldn’t be shared (maybe you like different formatting, or you have local paths), use ~/.claude/CLAUDE.md in your home directory.
You can also create custom slash commands in .claude/commands/ for common workflows. Something like /debug-test-failure that walks through your team’s standard debugging process.
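As a sketch of what that could look like: a slash command is a Markdown file under .claude/commands/, and the file name becomes the command name. The helper name `create_debug_command` and the steps in the file body are placeholders rather than a recommended debugging process; `$ARGUMENTS` is the placeholder Claude Code substitutes with whatever you type after the command.

```shell
# Create a project-level slash command, invoked as /debug-test-failure.
# The steps written to the file are illustrative placeholders.
create_debug_command() {
    local project_dir=${1:-.}
    mkdir -p "$project_dir/.claude/commands"
    # Quoted 'EOF' keeps $ARGUMENTS and the backticks literal in the file
    cat > "$project_dir/.claude/commands/debug-test-failure.md" <<'EOF'
Debug the failing dbt test for: $ARGUMENTS

1. Run `dbt test --select $ARGUMENTS` and read the failure message.
2. Inspect the compiled SQL under target/compiled/.
3. Propose a fix and re-run the test before changing anything else.
EOF
}
```

After running `create_debug_command .` at the project root, committing the resulting file shares the command with the rest of the team.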
The iteration mindset
Here’s my honest take: your first CLAUDE.md won’t be perfect.
The approach that works is to start minimal, then add instructions when you notice Claude making the same mistake twice. If Claude keeps using legacy SQL syntax, add a line about GoogleSQL. If it forgets to update schema.yml, add that to the workflow section.
The /init command can generate a starting point by analyzing your project, but treat it as a draft. It catches obvious patterns but misses the workflow nuances that make the real difference.
You can also add instructions on-the-fly during a session using the # key—Claude will ask which memory file should store it.
What I’ve stopped doing
A few anti-patterns I’ve moved away from:
Comprehensive style guides. SQLFluff handles formatting better than instructions ever will. Let deterministic tools do deterministic work.
Database schemas. Too much context, changes too often. Let Claude discover schema through queries or MCP connections.
Repeating dbt documentation. Claude already knows how ref() and source() work. Focus on your project’s specific patterns.
Task-specific instructions. If it only applies to one type of work, it probably doesn’t belong in the persistent memory file.
The bottom line
CLAUDE.md is most powerful when it encodes the tribal knowledge that would otherwise require constant re-explanation—the conventions, the gotchas, the “we do it this way because” stuff that makes your project yours.
Keep it short, keep it specific, and iterate based on where Claude actually makes mistakes. That’s the whole strategy.
What conventions have you found most valuable to include in your CLAUDE.md? I’m always looking to improve my setup.