dbt Documentation Scaffolding Tools

Two open-source tools handle the mechanical parts of dbt documentation: generating empty YAML files and propagating existing descriptions through the DAG. Neither writes descriptions. Instead, they shrink the set of columns that still need a human or an AI to describe them.

dbt-codegen

The official dbt-codegen package from dbt Labs generates YAML scaffolding from your warehouse schema. Point it at a model and it produces a complete YAML block with every column name and an empty description field:

dbt run-operation generate_model_yaml --args '{"model_names": ["base__stripe__payments"]}'

The output gives you the structure: model name, all column names, empty descriptions. No more manually typing column names or accidentally skipping one. It is a one-time snapshot, though; columns added to the model later won’t appear unless you regenerate the YAML or use a tool that keeps it in sync.
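To make that structure concrete, here is a minimal Python sketch that renders a scaffold of the same general shape from a list of column names. It is an approximation, not codegen’s code: the helper name and the column names are hypothetical, the exact formatting may differ from what the macro prints, and the real macro gets its column list from the warehouse rather than from an argument.

```python
def scaffold_model_yaml(model_name: str, columns: list[str]) -> str:
    """Render a dbt properties-file scaffold with empty descriptions.

    Hypothetical helper: it approximates the shape of codegen's output,
    not its exact formatting.
    """
    lines = [
        "version: 2",
        "",
        "models:",
        f"  - name: {model_name}",
        '    description: ""',
        "    columns:",
    ]
    # One entry per column, each with an empty description to fill in later.
    for column in columns:
        lines.append(f"      - name: {column}")
        lines.append('        description: ""')
    return "\n".join(lines) + "\n"


# Hypothetical column list; codegen would read it from the built relation.
print(scaffold_model_yaml("base__stripe__payments", ["payment_id", "customer_id", "amount"]))
```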

If you’re using the codegen-plus-Claude-Code pattern, add the upstream_descriptions: true flag. It copies descriptions from source definitions or upstream models for columns with an identical name, so you don’t re-describe columns that already have documentation:

dbt run-operation generate_model_yaml --args '{"model_names": ["base__ga4__events"], "upstream_descriptions": true}'
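Matching is by column name. As a rough illustration only, the Python sketch below mimics that lookup under assumptions of its own: upstream descriptions arrive as plain dictionaries, the first documented match wins, and the helper and column names are hypothetical. It is not codegen’s implementation, which is a Jinja macro.

```python
def fill_from_upstream(
    columns: list[str], upstream: list[dict[str, str]]
) -> dict[str, str]:
    """Map each column to a description copied from the first upstream
    node (source or model) that documents a column of the identical name.
    Columns with no documented match get an empty description."""
    filled: dict[str, str] = {}
    for column in columns:
        filled[column] = next(
            (node[column] for node in upstream if node.get(column)), ""
        )
    return filled


# Hypothetical upstream docs: one GA4 source table's column descriptions.
ga4_source = {"event_name": "Name of the GA4 event, for example page_view."}
descriptions = fill_from_upstream(["event_name", "session_key"], [ga4_source])
print(descriptions)  # session_key stays empty because nothing upstream describes it
```

The point of the design is that only columns without any upstream match remain empty, which is exactly the set worth spending human or AI effort on.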

dbt-codegen asks the warehouse for the column list of the model’s built relation (so the model must already exist there, for example after dbt run) and prints the generated YAML to the terminal for you to copy into a properties file. It does not write descriptions, add tests, or make judgment calls.

dbt-osmosis

dbt-osmosis takes a fundamentally different approach. Instead of generating empty scaffolding, it propagates existing descriptions through your DAG by following lineage. If you’ve described customer__email in your base model, dbt-osmosis copies that description to every downstream model that has a column with the same name.
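Conceptually, propagation is a walk over the DAG in dependency order: each model inherits a description for any empty column whose name matches a documented column in a parent, and because parents are processed first, a description can travel several hops in one pass. The Python sketch below illustrates that idea with graphlib from the standard library (Python 3.9+). The model names and data structures are hypothetical, and dbt-osmosis’s actual inheritance rules have more options than this.

```python
from graphlib import TopologicalSorter


def propagate(
    parents: dict[str, list[str]],
    docs: dict[str, dict[str, str]],
) -> dict[str, dict[str, str]]:
    """Copy descriptions downstream into empty columns with the same name.

    parents: model -> upstream models it selects from
    docs:    model -> {column: description}; "" means undocumented
    Non-empty descriptions are never overwritten; the input is not mutated.
    """
    result = {model: dict(columns) for model, columns in docs.items()}
    graph = {model: parents.get(model, []) for model in result}
    # static_order() yields every upstream model before its children.
    for model in TopologicalSorter(graph).static_order():
        columns = result.get(model, {})
        for column in list(columns):
            if columns[column]:
                continue  # already documented; keep it
            for parent in graph.get(model, []):
                inherited = result.get(parent, {}).get(column, "")
                if inherited:
                    columns[column] = inherited
                    break
    return result


# Hypothetical three-layer lineage: base -> intermediate -> mart.
parents = {
    "int__customer_orders": ["base__shop__customers"],
    "customers": ["int__customer_orders"],
}
docs = {
    "base__shop__customers": {"customer_id": "Unique identifier of a customer.", "email": ""},
    "int__customer_orders": {"customer_id": "", "order_count": ""},
    "customers": {"customer_id": "", "order_count": "Number of orders placed."},
}
propagated = propagate(parents, docs)
print(propagated["customers"]["customer_id"])
```

Note that descriptions only flow downstream: the mart’s order_count description does not fill the intermediate model’s empty order_count.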

The core command:

dbt-osmosis yaml refactor

This single command does several things at once:

Scaffolds new YAML files for models that don’t have them, at the paths configured with the +dbt-osmosis key in dbt_project.yml
Injects columns from your warehouse into existing YAML (catching columns added since the last documentation pass)
Propagates descriptions from upstream models to downstream ones
Removes stale columns that no longer exist in the model’s relation in the warehouse
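The column-level part of that list (injecting new columns, removing stale ones, keeping existing descriptions) amounts to reconciling two column lists. Here is a minimal Python sketch, assuming the warehouse column order is authoritative; the function and column names are hypothetical, and dbt-osmosis itself handles more than is modeled here, such as deciding where YAML files are written.

```python
def sync_columns(
    documented: dict[str, str], warehouse_columns: list[str]
) -> dict[str, str]:
    """Reconcile a YAML column map with the columns that actually exist.

    - columns in the warehouse but missing from YAML are injected with ""
    - columns in YAML but gone from the warehouse are dropped as stale
    - existing descriptions are kept; order follows the warehouse
    """
    return {column: documented.get(column, "") for column in warehouse_columns}


# Hypothetical state: legacy_flag was dropped, amount and created_at were added.
yaml_columns = {"payment_id": "Primary key.", "legacy_flag": "Deprecated."}
live_columns = ["payment_id", "amount", "created_at"]
print(sync_columns(yaml_columns, live_columns))
```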

On a project with 200+ models, running dbt-osmosis yaml refactor typically propagates descriptions to 30-50% of previously undocumented columns. The reason is simple: column names repeat across layers. customer_id appears in your base model, your intermediate joins, and your marts. If it’s documented once at the base layer, osmosis copies that description everywhere it appears downstream.

Setting Up as a Pre-Commit Hook

The real value of dbt-osmosis comes when you automate it. Add it to .pre-commit-config.yaml as a local hook and it runs on every commit, keeping YAML files in sync with your actual schema:

repos:
  - repo: local
    hooks:
      - id: dbt-osmosis
        name: dbt-osmosis yaml refactor
        entry: dbt-osmosis yaml refactor
        language: system
        pass_filenames: false

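This catches the common drift problem: someone adds a column to a model, the YAML doesn’t get updated, and documentation slowly diverges from reality. With osmosis running on every commit, the YAML tracks the schema of your models as last built in the warehouse. Because osmosis reads column lists from the warehouse, the hook needs a working dbt connection, and when it rewrites YAML files, pre-commit reports the hook as failed so you can review and stage the changes before committing again.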
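For illustration, the drift itself can be expressed as a read-only check: compare the columns listed in YAML with the columns in the built model and report the difference. This Python sketch is hypothetical and is not how dbt-osmosis works (osmosis rewrites the YAML rather than only reporting); returning a non-zero status mirrors how a failing pre-commit hook blocks a commit.

```python
def column_drift(
    yaml_columns: set[str], model_columns: set[str]
) -> tuple[set[str], set[str]]:
    """Return (undocumented, stale) column names."""
    undocumented = model_columns - yaml_columns  # added to the model, missing from YAML
    stale = yaml_columns - model_columns  # still in YAML, gone from the model
    return undocumented, stale


def check(yaml_columns: set[str], model_columns: set[str]) -> int:
    """Pre-commit-style status: 0 when in sync, 1 when drifted."""
    undocumented, stale = column_drift(yaml_columns, model_columns)
    for name in sorted(undocumented):
        print(f"undocumented column: {name}")
    for name in sorted(stale):
        print(f"stale column: {name}")
    return 1 if undocumented or stale else 0


# Hypothetical drift: refund_reason was added to the model but never documented.
status = check({"payment_id", "amount"}, {"payment_id", "amount", "refund_reason"})
```

In a standalone hook script, the returned status would be passed to sys.exit().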

Combined workflow

dbt-codegen: initial scaffolding. Generates the YAML for new models from scratch. One-time operation per model.

dbt-osmosis: ongoing maintenance. Keeps YAML in sync with schema changes, propagates descriptions as downstream models are added, removes dropped columns.

A practical workflow:

Build the new model, then run dbt-codegen to generate its initial YAML structure
Write descriptions for the columns that are genuinely new (not inherited from upstream)
Run dbt-osmosis yaml refactor to propagate those descriptions to all downstream models
Set up osmosis as a pre-commit hook so it runs automatically going forward

After this workflow, the only undocumented columns left are those whose business meaning is genuinely new and not described anywhere upstream. Those still need human or AI attention.

What these tools don’t do

Neither tool writes descriptions. They solve structural problems: missing YAML files, missing columns, and descriptions present in one place but not propagated downstream. On a 200-model project, scaffolding and propagation can move coverage from 20% to 60% without writing any new descriptions, reducing the remaining gap to columns that genuinely need attention.
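The percentages quoted in this section can be related with simple arithmetic. Assuming coverage means the share of columns with a non-empty description, and using a hypothetical project with 1,000 columns, propagating descriptions to a fraction of the undocumented columns works out as below; the move from 20% to 60% corresponds to the top of the 30-50% propagation range.

```python
def coverage_after(total: int, documented: int, propagated_pct: int) -> float:
    """Coverage (%) after propagation fills propagated_pct% of undocumented columns."""
    undocumented = total - documented
    filled = undocumented * propagated_pct // 100  # whole columns only
    return 100 * (documented + filled) / total


# Hypothetical project: 1,000 columns, 20% documented before propagation.
for pct in (30, 50):
    print(f"{pct}% propagated -> {coverage_after(1000, 200, pct):.0f}% coverage")
```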

Comparison

Feature	dbt-codegen	dbt-osmosis
Generates YAML from warehouse schema	Yes	Yes
Propagates descriptions through DAG	No	Yes
Removes stale columns	No	Yes
Pre-commit hook support	Not designed for it	Yes
Maintained by	dbt Labs	Community (z3z1ma)
Use case	Initial scaffolding	Ongoing maintenance

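dbt-codegen is a dbt package you install through packages.yml (followed by dbt deps); dbt-osmosis is a Python command-line tool you install with pip into the same environment as dbt. Neither requires dbt Cloud; both work with dbt Core projects.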