
dbt Documentation Scaffolding Tools

How dbt-codegen and dbt-osmosis handle the mechanical parts of documentation — generating YAML skeletons and propagating descriptions through your DAG


Two open-source tools handle the mechanical parts of dbt documentation: generating YAML skeletons and propagating existing descriptions through the DAG. Neither writes descriptions; they shrink the set of columns that still need human or AI attention.

dbt-codegen

The official dbt-codegen package from dbt Labs generates YAML scaffolding from your warehouse schema. Point it at a model and it produces a YAML block listing every column, each with an empty description field:

dbt run-operation generate_model_yaml --args '{"model_names": ["base__stripe__payments"]}'

The output is printed to the terminal and gives you the structure: model name, every column name, and empty descriptions, ready to paste into the model's YAML file. No more typing column names by hand, or discovering months later that a column was left out of the YAML.

If you’re using the codegen-plus-Claude-Code pattern, set upstream_descriptions to true: codegen then reuses descriptions from source definitions or upstream models for identically named columns, so you don’t re-describe columns that already have documentation:

dbt run-operation generate_model_yaml --args '{"model_names": ["base__ga4__events"], "upstream_descriptions": true}'

dbt-codegen looks up the model's relation in the warehouse (so the model must already be built, for example with dbt run), reads its column list, and prints YAML. It does not write descriptions, add tests, or make judgment calls: what you get is a template, not documentation.
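To make the shape of that template concrete, here is a minimal Python sketch of what the macro emits, assuming the column list has already been fetched from the warehouse. The function name model_yaml_skeleton and the sample columns are hypothetical; the real macro is written in Jinja and its output may differ in detail (recent versions can, for example, also emit data types). The optional mapping mimics the idea behind upstream_descriptions: a description is reused only when the column name matches exactly.

```python
import json


def model_yaml_skeleton(model_name, columns, upstream_descriptions=None):
    """Render a codegen-style YAML skeleton for one model (hypothetical helper).

    model_name: the dbt model name.
    columns: column names in warehouse order.
    upstream_descriptions: optional {column_name: description} mapping; a
        description is reused only on an exact column-name match, otherwise
        it stays empty.
    """
    upstream_descriptions = upstream_descriptions or {}
    lines = [
        "version: 2",
        "",
        "models:",
        f"  - name: {model_name}",
        '    description: ""',
        "    columns:",
    ]
    for column in columns:
        description = upstream_descriptions.get(column, "")
        lines.append(f"      - name: {column}")
        # json.dumps yields a double-quoted, escaped string, which YAML accepts.
        lines.append(f"        description: {json.dumps(description)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical columns; in practice these come from the warehouse.
    print(model_yaml_skeleton("base__stripe__payments", ["payment_id", "amount"]))
```

Running the module prints a models/columns structure in which every description is an empty string: the blank template described above, with nothing filled in beyond column names.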

dbt-osmosis

dbt-osmosis goes further. Besides scaffolding, it propagates existing descriptions through your DAG by following lineage. If you’ve described customer__email in your base model, dbt-osmosis copies that description to every downstream model that carries the same column.

The core command:

dbt-osmosis yaml refactor

This single command does several things at once:

  • Scaffolds new YAML files for models that don’t have them
  • Injects columns from your warehouse into existing YAML (catching columns added since the last documentation pass)
  • Propagates descriptions from upstream models to downstream ones
  • Removes stale columns that no longer exist in the compiled model
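The inject and remove steps amount to reconciling two column lists for each model. Below is a minimal Python sketch of that reconciliation, under stated assumptions: sync_columns is a hypothetical name, columns are matched by exact name, and the result follows the warehouse's column order. dbt-osmosis's own implementation does considerably more, so treat this as an illustration of the idea only.

```python
def sync_columns(yaml_columns, warehouse_columns):
    """Reconcile a model's documented columns with the warehouse (hypothetical helper).

    yaml_columns: list of {"name": ..., "description": ...} dicts from the YAML file.
    warehouse_columns: column names the warehouse reports, in order.
    Returns (synced_columns, added_names, removed_names).
    """
    existing = {col["name"]: col for col in yaml_columns}
    live = set(warehouse_columns)

    # Keep documented entries (and their descriptions); inject blanks for new columns.
    synced = [
        existing.get(name, {"name": name, "description": ""})
        for name in warehouse_columns
    ]
    added = [name for name in warehouse_columns if name not in existing]
    # Anything documented but no longer reported by the warehouse is stale.
    removed = [col["name"] for col in yaml_columns if col["name"] not in live]
    return synced, added, removed


if __name__ == "__main__":
    documented = [
        {"name": "payment_id", "description": "Primary key of the payment."},
        {"name": "legacy_flag", "description": "Deprecated status flag."},
    ]
    synced, added, removed = sync_columns(documented, ["payment_id", "amount", "currency"])
    print(added)    # ['amount', 'currency']
    print(removed)  # ['legacy_flag']
```

Existing descriptions survive untouched; only the column list changes, which is why this step is safe to run on every commit.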

On a project with 200+ models, running dbt-osmosis yaml refactor typically propagates descriptions to 30-50% of previously undocumented columns. The reason is simple: column names repeat across layers. customer_id appears in your base model, your intermediate joins, and your marts. If it’s documented once at the base layer, osmosis copies that description everywhere it appears downstream.
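This name-based inheritance can be modeled as a single walk over the DAG in topological order: a model inherits a description for a column it does not document yet from the first parent that has one under the same name. The sketch below uses the standard library's graphlib (Python 3.9+); propagate_descriptions and the three-model lineage are hypothetical, and dbt-osmosis's actual inheritance rules may differ from this "first parent wins" simplification.

```python
from graphlib import TopologicalSorter


def propagate_descriptions(parents, columns, descriptions):
    """Copy column descriptions downstream by matching column names (hypothetical helper).

    parents: {model: [upstream models]} lineage.
    columns: {model: [column names]} as reported by the warehouse.
    descriptions: {model: {column: description}}; absent or empty means undocumented.
    Returns a new {model: {column: description}} dict; local descriptions are never overwritten.
    """
    result = {model: dict(descriptions.get(model, {})) for model in columns}
    # static_order() yields every model after all of its upstream models,
    # so a description can flow base -> intermediate -> mart in one pass.
    for model in TopologicalSorter(parents).static_order():
        for column in columns.get(model, []):
            if result[model].get(column):
                continue  # already documented here: keep it
            for parent in parents.get(model, []):
                inherited = result.get(parent, {}).get(column)
                if inherited:
                    result[model][column] = inherited
                    break
    return result


if __name__ == "__main__":
    parents = {
        "base__crm__customers": [],
        "int__customers_joined": ["base__crm__customers"],
        "mart__customers": ["int__customers_joined"],
    }
    columns = {
        "base__crm__customers": ["customer_id", "customer__email"],
        "int__customers_joined": ["customer_id", "customer__email", "order_count"],
        "mart__customers": ["customer_id", "order_count"],
    }
    descriptions = {
        "base__crm__customers": {
            "customer_id": "Unique identifier of the customer.",
            "customer__email": "Customer email address.",
        }
    }
    result = propagate_descriptions(parents, columns, descriptions)
    print(result["mart__customers"])
    # {'customer_id': 'Unique identifier of the customer.'}
```

order_count stays undocumented in both downstream models because nothing upstream describes it. That residue is exactly the set of columns that still needs a human or AI to write a description.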

Setting Up as a Pre-Commit Hook

The real value of dbt-osmosis comes when you automate it. Set it up as a pre-commit hook and it runs on every commit, keeping YAML files in sync with your actual schema:

.pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: dbt-osmosis
        name: dbt-osmosis yaml refactor
        entry: dbt-osmosis yaml refactor
        language: system
        pass_filenames: false

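This catches the common drift problem: someone adds a column to a model, the YAML doesn’t get updated, and documentation slowly diverges from reality. With osmosis running on every commit, the YAML is brought back in line with the current schema before each change lands. Because the hook rewrites files, pre-commit marks it as failed whenever osmosis changes something; review the YAML diff, stage it, and commit again.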

Combined workflow

dbt-codegen: initial scaffolding. Generates the YAML for a new model, which you paste into a file. One-time operation per model.

dbt-osmosis: ongoing maintenance. Keeps YAML in sync with schema changes, propagates descriptions as downstream models are added, removes dropped columns.

A practical workflow:

  1. Run dbt-codegen when adding a new model to generate the initial YAML structure
  2. Write descriptions for the columns that are genuinely new (not inherited from upstream)
  3. Run dbt-osmosis yaml refactor to propagate those descriptions to all downstream models
  4. Set osmosis as a pre-commit hook so it runs automatically going forward

After this workflow, the only undocumented columns left are those carrying genuinely new business meaning that nothing upstream describes. Those are the columns that need human or AI attention.

What these tools don’t do

Neither tool writes descriptions. They solve structural problems: missing YAML files, missing columns, and descriptions present in one place but not propagated downstream. On a 200-model project, scaffolding and propagation can move coverage from 20% to 60% without writing any new descriptions, reducing the remaining gap to columns that genuinely need attention.
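Coverage here means the share of columns, across all models, that have a non-empty description. A minimal Python sketch of that arithmetic follows; documentation_coverage and the ten-column toy project are hypothetical, chosen only so the before/after numbers are easy to check by hand, and are not measured data.

```python
def documentation_coverage(columns, descriptions):
    """Fraction of columns with a non-empty description (hypothetical helper).

    columns: {model: [column names]}
    descriptions: {model: {column: description}}
    """
    total = sum(len(cols) for cols in columns.values())
    documented = sum(
        1
        for model, cols in columns.items()
        for column in cols
        if (descriptions.get(model, {}).get(column) or "").strip()
    )
    return documented / total if total else 0.0


if __name__ == "__main__":
    columns = {
        "base__crm__customers": ["customer_id", "customer__email", "created_at"],
        "int__customers_joined": ["customer_id", "customer__email", "created_at", "order_count"],
        "mart__customers": ["customer_id", "customer__email", "order_count"],
    }
    base_docs = {
        "customer_id": "Unique identifier of the customer.",
        "customer__email": "Customer email address.",
    }
    before = {"base__crm__customers": base_docs}
    # After propagation, the same two descriptions also appear downstream.
    after = {model: dict(base_docs) for model in columns}
    print(f"{documentation_coverage(columns, before):.0%}")  # 20%
    print(f"{documentation_coverage(columns, after):.0%}")   # 60%
```

The jump comes entirely from repeated column names: no new description is written between the two measurements.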

Comparison

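| Feature | dbt-codegen | dbt-osmosis |
| --- | --- | --- |
| Generates YAML from warehouse schema | Yes | Yes |
| Propagates descriptions through DAG | Partly (upstream_descriptions, at generation time only) | Yes |
| Removes stale columns | No | Yes |
| Pre-commit hook support | Not designed for it | Yes |
| Maintained by | dbt Labs | Community (z3z1ma) |
| Use case | Initial scaffolding | Ongoing maintenance |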

dbt-codegen is a dbt package installed via packages.yml; dbt-osmosis is a Python CLI installed via pip. Neither requires dbt Cloud: both work with any dbt Core project.