Note

AI Tools for dbt Documentation

A comparison of dbt Copilot, Claude Code with MCP, and Altimate AI for generating dbt model and column documentation — capabilities, limitations, and selection guidance

Planted
dbt · claude code · mcp · ai · automation

After scaffolding and propagation handle the mechanical parts of documentation, the remaining columns require actual descriptions. Half of practitioners already use AI for this step (dbt Labs' 2025 State of Analytics Engineering report). All tools below share one fundamental limitation: they describe what the SQL does, not what the data means to the business. Bridging that gap with business context is what separates useful documentation from column-name rephrasing.

dbt Copilot

dbt Copilot went GA in March 2025, available across Starter, Enterprise, and Enterprise+ plans. It generates YAML documentation using project metadata — lineage, schema relationships, model SQL — without accessing row-level data. One click in the dbt Cloud IDE gives you descriptions for a model and its columns.

What it sees: Your SQL, metadata, lineage relationships, and schema information.

What it doesn’t see: Your PRDs, Slack conversations, business glossary, or any context outside the dbt project.

The result: Descriptions that are accurate about the technical transformation but may miss business meaning. The dbt docs are upfront about this: “Always review AI-generated content, as it may be incorrect.”
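A hypothetical schema.yml entry illustrates the gap (model and column names are invented for this sketch):

```yaml
# Illustrative example — dim_customers and customer_segment are hypothetical.
models:
  - name: dim_customers
    columns:
      - name: customer_segment
        # A metadata-only tool tends to restate the column name:
        #   description: "The segment of the customer."
        # After a reviewer adds business context:
        description: >
          Marketing segment assigned at signup (SMB, Mid-Market, Enterprise).
          Reassigned quarterly; historical values are overwritten, so this
          column always reflects the current segment.
```

Both versions are "accurate"; only the second one answers the questions a downstream analyst will actually have.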

Best for: Teams already on dbt Cloud who want documentation with zero additional tooling. You get Copilot with your existing subscription.

Claude Code with the dbt MCP Server

For teams on dbt Core, Claude Code with the dbt MCP Server (GA October 2025) connects to your project context: lineage, column-level relationships, contracts, tests, freshness metadata. Combined with a well-configured CLAUDE.md, it generates documentation that follows your team’s conventions.

The open-source Agent Skills library from dbt Labs provides curated instructions that improved benchmark accuracy from 56% to 58.5% on dbt tasks. Not dramatic, but meaningful when multiplied across hundreds of models.

What it sees: Everything the MCP Server exposes (lineage, column relationships, contracts, tests, freshness) plus whatever context you put in CLAUDE.md and your project files.

What it doesn’t see: Internal knowledge bases, unless you either copy relevant context into CLAUDE.md or set up a RAG pipeline.

The result: Descriptions that follow your team’s conventions (if those conventions are encoded in CLAUDE.md) and reflect the SQL transformations. The codegen-plus-Claude-Code pattern produces the most consistent results: scaffold the YAML skeleton first, then let Claude fill in descriptions.
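As a sketch, the skeleton that dbt-codegen's generate_model_yaml macro emits looks roughly like this (the model and column names are illustrative):

```yaml
# Roughly what dbt-codegen scaffolds via:
#   dbt run-operation generate_model_yaml --args '{"model_names": ["stg_orders"]}'
# stg_orders and its columns are hypothetical examples.
version: 2

models:
  - name: stg_orders
    description: ""
    columns:
      - name: order_id
        description: ""   # Claude fills these in, following CLAUDE.md conventions
      - name: order_status
        description: ""
      - name: ordered_at
        description: ""
```

Separating the scaffold from the prose means the AI only ever fills in empty strings; it never invents or reorders columns.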

Best for: dbt Core teams who are already using Claude Code for development. The MCP Server adds documentation context without changing your editor or workflow.

Altimate AI (dbt Power User)

The dbt Power User extension from Altimate AI offers bulk documentation generation with three personas (technical, business, general) and multi-language support. Through Paradime’s pricing at $25–55/user/month, it’s the most affordable AI documentation option for teams using VS Code or Cursor.

What it sees: Your dbt project files, schema, and whatever context the extension can extract.

Best for: Teams that want AI documentation without changing their editor or workflow, particularly those already in Cursor or VS Code.

Choosing Between Them

Start with what you already have.

If you use…                          Start with…
dbt Cloud                            dbt Copilot (included in subscription)
dbt Core + Claude Code               Claude Code + MCP Server
VS Code/Cursor without Claude Code   Altimate AI / dbt Power User

These tools are not mutually exclusive. Copilot can handle initial documentation in the Cloud IDE; Claude Code can refine locally. Altimate AI can handle bulk generation; Claude Code can be used for models requiring more nuanced descriptions.

What All AI Tools Get Wrong

The Recce blog documented an instructive failure: Claude Code silently filtered out rows with missing org_ids while building a model. A data quality decision made by an AI, buried in the code. The same pattern appears in documentation:

  • Column-name rephrasing. customer__segment described as “The segment of the customer.” Technically not wrong, but completely useless.
  • Assumed business logic. AI might describe a column as “the customer’s lifetime value” when your business calculates LTV differently from any standard definition.
  • Missing exclusions. A status field described without mentioning that certain statuses are excluded from reporting.
  • Convention mismatches. Different description styles across models because the AI wasn’t given consistent formatting rules.

The Altimate AI team found that the top source of errors was mismatched conventions, fixed by adding the instruction “read 2-3 existing models first.” Each correction makes the next round better.

Building a Feedback Loop

Treat AI-generated descriptions with the same PR review scrutiny as SQL changes, because a misleading description causes downstream errors just as easily as a wrong JOIN. Every time you catch an AI documentation error:

  1. Fix the description in the schema.yml
  2. Add a rule to your CLAUDE.md or Skills file that prevents the same mistake
  3. If the error was about business logic, consider adding that context to your CLAUDE.md or RAG pipeline
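Steps 1 and 2 might look like this in practice (the model, column, and rule wording are all illustrative):

```yaml
# Hypothetical correction pass — fct_orders and its status values are invented.
models:
  - name: fct_orders
    columns:
      - name: status
        # Before: "The status of the order." (column-name rephrasing)
        description: >
          Current order status. Note: 'test' and 'internal' statuses are
          excluded from all revenue reporting upstream in stg_orders.
# Matching rule added to CLAUDE.md:
#   "If a model filters out rows or statuses, say so in the description
#    of the affected columns. Read 2-3 existing models first to match style."
```

The fix and the rule travel together: the schema.yml change corrects one model, the CLAUDE.md rule prevents the same class of error everywhere else.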

This feedback loop is what turns AI documentation from a one-shot generation tool into a system that improves over time. The first pass might need heavy review. The fifth pass, with accumulated corrections encoded in your project configuration, needs much less.

Coverage Prioritization

AI generates most value in the 20–80% documentation coverage range, where the SQL transformation tells most of the story. The 80–100% range requires human review because the remaining columns tend to have the most complex business logic — edge cases, business-specific definitions, and field meanings that vary by context.

Start with the most-queried models rather than trying to document everything at once. Ten critical models thoroughly documented, with business context and reviewed descriptions, are more valuable than 200 models with AI-generated first drafts that no one has checked.