dbt Documentation with Claude Code

This note describes a pattern for dbt documentation using Claude Code: combining dbt’s existing codegen tooling with AI to produce descriptions that reflect what the SQL does. The approach covers YAML scaffolding, docs blocks for reuse, lineage diagrams, and a slash command for repeatable documentation runs.

The Two-Step Pattern: Codegen + Claude Code

The approach has two phases: generate the YAML skeleton with dbt codegen, then fill in descriptions with Claude Code.

Phase 1: Generate the Scaffold

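The dbt-codegen package (installed through packages.yml) generates the schema.yml structure, with all column names pulled from the model. The macro prints the YAML to the console for you to paste into a file: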

dbt run-operation generate_model_yaml --args '{"model_names": ["base__ga4__events"], "upstream_descriptions": true}'

This gives you the YAML structure — model name, all column names, and any upstream descriptions that already exist. No descriptions for new columns, no tests, no model-level documentation. Just the skeleton.

The upstream_descriptions argument matters: when it is set to true, codegen copies descriptions from sources and upstream models for columns with identical names, so you don’t re-describe columns that are already documented.
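When this step is scripted for several models, the JSON inside --args is easy to misquote. The sketch below builds the same command string in Python. The helper name codegen_command is hypothetical, and it assumes the two arguments shown above (model_names and upstream_descriptions) are the only ones needed.

```python
import json
import shlex


def codegen_command(model_names, upstream_descriptions=True):
    """Build the generate_model_yaml command as a single shell-safe string.

    Hypothetical helper: it only assembles the string, it does not run dbt.
    """
    args = {
        "model_names": list(model_names),
        "upstream_descriptions": upstream_descriptions,
    }
    # json.dumps emits lowercase true/false, which dbt accepts in --args.
    # shlex.quote wraps the JSON in single quotes for the shell.
    return "dbt run-operation generate_model_yaml --args " + shlex.quote(json.dumps(args))


# Prints the same command used for base__ga4__events above.
print(codegen_command(["base__ga4__events"]))
```

Passing a longer list of model names produces one command that scaffolds all of them in a single run.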

Phase 2: Fill with Claude Code

Now point Claude at the scaffold and the SQL:

Read the base__ga4__events.sql model and the generated schema.yml scaffold.
For each column:
1. Analyze the SQL transformation logic
2. Infer the business meaning
3. Write a clear, concise description
4. Add appropriate tests

Follow these standards:
- Use present tense
- Include data type in description
- Note any known limitations

Claude reads the SQL and writes descriptions that reflect the actual transformation. The output requires review — Claude can misinterpret complex business logic — but reviewing AI-generated descriptions is faster than writing from scratch.

Why Two Steps Instead of One?

Why not ask Claude to generate the entire schema.yml directly? The codegen step ensures every column is accounted for. Claude reading SQL might miss columns or hallucinate column names that don’t exist. Codegen reads the column list from the relation the model has built in the warehouse (so the model must have been run at least once), giving Claude a reliable template to fill in.
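The scaffold also gives you something to verify against after Claude has filled in the file. A minimal sketch of that check, assuming you have already extracted the two lists of column names (from the codegen output and from the edited schema.yml); the function name compare_columns is hypothetical:

```python
def compare_columns(scaffold_columns, documented_columns):
    """Compare the codegen column list with the columns in the edited YAML.

    "missing" are columns the scaffold had but the edited file dropped.
    "unexpected" are columns that appear only in the edited file, which
    usually means a hallucinated or misspelled name.
    """
    scaffold = set(scaffold_columns)
    documented = set(documented_columns)
    return {
        "missing": sorted(scaffold - documented),
        "unexpected": sorted(documented - scaffold),
    }


result = compare_columns(
    ["event_id", "event_timestamp", "user_pseudo_id"],
    ["event_id", "event_timestamp", "user_id"],
)
print(result)
```

An empty list under both keys means the documented file covers exactly the columns that codegen found.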

Docs Blocks: Same Column, Same Description, Everywhere

The same columns appear across multiple models. customer_id shows up in ten places. event_timestamp in twenty. Without coordination, each schema.yml entry gets a slightly different description, written by a different person (or AI session), in a different style.

dbt docs blocks solve this by creating reusable documentation that’s referenced rather than duplicated:

{% docs customer_id %}
Unique identifier for a customer account.
Format: UUID generated at account creation.
Source: Stripe customer API.
{% enddocs %}

{% docs event_timestamp %}
Timestamp when the event was recorded in UTC.
Microsecond precision from GA4 BigQuery export.
{% enddocs %}

Save this in a models/docs.md file (or multiple docs files organized by domain). Then reference these in your schema.yml:

columns:
  - name: customer_id
    description: "{{ doc('customer_id') }}"
  - name: event_timestamp
    description: "{{ doc('event_timestamp') }}"
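A reference to a docs block that does not exist fails when dbt parses the project, so it can help to check references before running dbt. The following is a rough sketch that works on file contents as strings; the function name is hypothetical, and the regular expressions assume the single-argument doc('name') form shown above, not the two-argument package form.

```python
import re

# Matches the opening tag of a docs block, e.g. {% docs customer_id %}
DOCS_BLOCK = re.compile(r"\{%-?\s*docs\s+(\w+)\s*-?%\}")
# Matches a reference, e.g. {{ doc('customer_id') }} or {{ doc("customer_id") }}
DOC_REF = re.compile(r"""\{\{\s*doc\(\s*['"](\w+)['"]\s*\)\s*\}\}""")


def undefined_doc_refs(docs_md, schema_yml):
    """Return doc() names used in schema_yml that have no block in docs_md."""
    defined = set(DOCS_BLOCK.findall(docs_md))
    referenced = set(DOC_REF.findall(schema_yml))
    return sorted(referenced - defined)
```

In practice you would read every Markdown file and every schema.yml in the project, concatenate each group, and pass the two strings to the function.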

You can prompt Claude to analyze all your models, identify commonly repeated columns, create a docs.md file with reusable blocks, and update schema.yml files to reference them. A single prompt can cover the whole cleanup:

Scan all schema.yml files in the project. Find columns that appear in 3+ models.
For each repeated column:
1. Create a docs block in models/docs.md
2. Write a single, authoritative description
3. Update all schema.yml references to use {{ doc('column_name') }}

After the initial cleanup, new descriptions reference existing docs blocks rather than introducing new wording for the same columns.
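To review what Claude proposes, you can compute the list of repeated columns yourself. The sketch below assumes a dictionary loaded from dbt’s target/manifest.json, where each entry under "nodes" carries "resource_type", "name", and a "columns" mapping. Note that the manifest lists only columns that are already declared in YAML, and that the function name and the case-insensitive matching are choices made for this example.

```python
from collections import defaultdict


def repeated_columns(manifest, min_models=3):
    """Map each column name to the models declaring it, keeping only
    columns that appear in at least min_models models."""
    models_by_column = defaultdict(set)
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and other node types
        for column_name in node.get("columns", {}):
            # Lowercase so customer_id and CUSTOMER_ID count as one column.
            models_by_column[column_name.lower()].add(node["name"])
    return {
        column: sorted(models)
        for column, models in sorted(models_by_column.items())
        if len(models) >= min_models
    }
```

Load the manifest with json.load after a dbt parse or dbt run, then pass it in; each key in the result is a candidate for a docs block.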

Documentation Standards

Encoding standards in your slash commands and CLAUDE.md makes documentation consistent regardless of who (or what) writes it.

Effective standards for dbt model documentation:

- Model descriptions start with “This model…” and explain the grain. “This model contains one row per customer with their lifetime purchase metrics” is useful. “Customer data” is not.
- Column descriptions include data type, business meaning, and source. “String. Unique identifier for a customer account, generated at account creation in Stripe.” tells the reader everything they need.
- Terminology consistency: if your project calls it “session” instead of “visit,” every description should use “session.” Document the terminology in your data dictionary and reference it from CLAUDE.md.
- Present tense: “Contains the total order value,” not “This will contain” or “This contained.”
- Note limitations: if a column is only populated for certain record types, or has known data quality issues, say so. “Not populated for guest checkout orders” prevents hours of debugging.
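Some of these standards can be checked mechanically during review. The following is a small sketch, not a complete linter: it assumes each model is a dictionary shaped like a schema.yml entry (name, description, and a list of columns), and it covers only the “This model” prefix and empty column descriptions. The function name lint_model_docs is hypothetical.

```python
def lint_model_docs(model):
    """Return a list of human-readable problems for one schema.yml model entry."""
    problems = []
    name = model.get("name", "<unnamed>")

    description = (model.get("description") or "").strip()
    if not description.startswith("This model"):
        problems.append(f"{name}: model description should start with 'This model'")

    for column in model.get("columns") or []:
        if not (column.get("description") or "").strip():
            problems.append(f"{name}.{column['name']}: missing column description")

    return problems


example = {
    "name": "mrt__customers",
    "description": "Customer data",
    "columns": [
        {"name": "customer_id", "description": "String. Unique identifier for a customer account."},
        {"name": "lifetime_value", "description": ""},
    ],
}
for problem in lint_model_docs(example):
    print(problem)
```

Checks such as terminology and tense are harder to automate and remain a job for the reviewer.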

Lineage Documentation with Mermaid

Claude Code can generate Mermaid diagrams showing data lineage through a three-layer (base, intermediate, mart) architecture:

Create a Mermaid flowchart showing the lineage from raw GA4 events
to the final marketing attribution model. Include all intermediate
transformations and key business logic at each step.

Example output:

flowchart TD
    A[raw.ga4_events] --> B[base__ga4__events]
    B --> C[int__ga4_sessions_sessionized]
    C --> D[int__ga4_sessions_attributed]
    D --> E[mrt__marketing__attributed_conversions]

These diagrams are useful for:

- Onboarding new team members who need to understand transformation chains
- PR reviews where the reviewer needs to see the scope of changes in context
- Incident response when tracing a data quality issue through the DAG
- Documentation that lives alongside your models in Markdown files

Claude generates these by reading the ref() and source() calls in your SQL files, so the diagrams reflect the actual DAG rather than an aspirational architecture diagram that drifted from reality six months ago.
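Because dbt records the same dependencies in target/manifest.json, a generated diagram can be cross-checked against one built directly from the manifest. A minimal sketch, assuming a manifest dictionary whose "nodes" entries list their parents under depends_on.nodes; the function name and the project name analytics in the usage example are hypothetical, and versioned models are not handled.

```python
import re


def mermaid_lineage(manifest, target_id):
    """Render the upstream lineage of target_id as a Mermaid flowchart."""
    nodes = manifest.get("nodes", {})
    edges, seen, stack = set(), set(), [target_id]
    while stack:
        child = stack.pop()
        if child in seen:
            continue
        seen.add(child)
        # Sources live under manifest["sources"], so they have no entry
        # in "nodes" and the walk stops there.
        for parent in nodes.get(child, {}).get("depends_on", {}).get("nodes", []):
            edges.add((parent, child))
            stack.append(parent)

    def node(unique_id):
        safe_id = re.sub(r"\W", "_", unique_id)  # Mermaid-safe node id
        label = ".".join(unique_id.split(".")[2:])  # drop type and project
        return f'{safe_id}["{label}"]'

    lines = ["flowchart TD"]
    lines += [f"    {node(parent)} --> {node(child)}" for parent, child in sorted(edges)]
    return "\n".join(lines)


manifest = {
    "nodes": {
        "model.analytics.base__ga4__events": {
            "depends_on": {"nodes": ["source.analytics.raw.ga4_events"]}
        },
        "model.analytics.int__ga4_sessions_sessionized": {
            "depends_on": {"nodes": ["model.analytics.base__ga4__events"]}
        },
    }
}
print(mermaid_lineage(manifest, "model.analytics.int__ga4_sessions_sessionized"))
```

If the edges in Claude’s diagram differ from this output, the manifest is the one to trust.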

The /document-model Workflow

Wrapping this all into a slash command makes it a one-step process. Save as .claude/commands/document-model.md:

---
description: Generate comprehensive dbt documentation for a model
allowed-tools: Bash(dbt:*), Read, Write
argument-hint: [model_name]
---
# Document Model: $ARGUMENTS

1. Read the model SQL file at models/**/$ARGUMENTS.sql
2. Identify all columns and their transformations
3. Check if schema.yml exists for this model; if not, scaffold it with dbt run-operation generate_model_yaml
4. Generate or update the schema.yml with:
   - Model description explaining business purpose
   - Column descriptions explaining meaning and source
   - Appropriate tests for each column

## Documentation Standards
- Model descriptions: Start with "This model..." and explain the grain
- Column descriptions: Include data type, business meaning, and source
- Use consistent terminology from our data dictionary

Create/update the schema.yml file and show a summary of changes.

Running /document-model int__ga4_sessions_sessionized generates documentation in one command. Review the output, adjust anything Claude got wrong, and commit. For teams adopting Claude Code, this is a low-risk entry point: the output is bounded, verifiable, and easy to refine. Layer in testing workflows once the team is comfortable with the pattern.