Note

dbt Documentation CI Enforcement

Tools and patterns for enforcing dbt documentation completeness in CI — dbt-coverage, dbt-checkpoint, dbt-score, and dbt-bouncer

Planted
dbt · data quality · automation

This note covers four tools for enforcing dbt documentation completeness in CI. Without enforcement, coverage erodes as models and columns are added without descriptions. The tools operate at different levels of granularity and can be layered.

dbt-coverage

dbt-coverage is the most straightforward enforcement tool. It calculates the percentage of columns with non-empty descriptions across your project and fails CI when coverage drops below a threshold.

# Generate a coverage report
dbt-coverage compute doc --manifest target/manifest.json
# Fail CI if coverage is below 80%
dbt-coverage compute doc --manifest target/manifest.json --cov-fail-under 0.80

--cov-fail-under 0.80 requires that 80% of all columns across all models have a non-empty description.
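Conceptually, the computation is a simple ratio over the manifest. A simplified Python sketch of what dbt-coverage measures (the real tool also merges in catalog.json so that columns present in the warehouse but absent from the YAML are counted; this sketch only reads manifest.json):

```python
import json


def doc_coverage(manifest_path: str) -> float:
    """Fraction of model columns with a non-empty description.

    Simplified sketch of dbt-coverage's doc metric: counts only
    columns declared in manifest.json, skipping non-model nodes.
    """
    with open(manifest_path) as f:
        manifest = json.load(f)

    total = documented = 0
    for node in manifest["nodes"].values():
        if node.get("resource_type") != "model":
            continue
        for column in node.get("columns", {}).values():
            total += 1
            if column.get("description", "").strip():
                documented += 1
    # A project with no columns counts as fully covered.
    return documented / total if total else 1.0
```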

The critical nuance: dbt-coverage checks for non-empty descriptions, not quality. customer_id: "The ID of the customer" passes the coverage check while being practically useless. Coverage tools tell you whether documentation exists, not whether it’s good. That’s where human review (and AI review) comes in.

Progressive Thresholds

Rather than starting at 80% from day one, many teams ratchet up the threshold as they improve:

# CI pipeline
steps:
  - name: Check documentation coverage
    run: |
      CURRENT=$(dbt-coverage compute doc --manifest target/manifest.json | grep "Total" | awk '{print $NF}')
      # Fail if coverage decreased from the last known baseline
      if (( $(echo "$CURRENT < $BASELINE_COVERAGE" | bc -l) )); then
        echo "Documentation coverage dropped from $BASELINE_COVERAGE to $CURRENT"
        exit 1
      fi

This approach prevents backsliding without requiring a specific absolute number. Each PR must maintain or improve coverage, never reduce it.
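The shell step above compares against a baseline held in an environment variable; the same ratchet can instead track a high-water mark in a committed file. A minimal Python sketch (the baseline file name and the auto-raising behavior are illustrative choices, not dbt-coverage features):

```python
from pathlib import Path


def ratchet(current: float, baseline_file: Path = Path(".coverage-baseline")) -> None:
    """Fail if coverage dropped below the stored baseline; otherwise
    raise the baseline so future PRs must keep the new level."""
    baseline = float(baseline_file.read_text()) if baseline_file.exists() else 0.0
    if current < baseline:
        raise SystemExit(
            f"Documentation coverage dropped from {baseline:.2%} to {current:.2%}"
        )
    # Coverage held or improved: store the new high-water mark.
    baseline_file.write_text(f"{max(current, baseline):.4f}")
```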

dbt-checkpoint

dbt-checkpoint catches undocumented columns before they even reach the PR stage by running as a pre-commit hook. The check-model-columns-have-desc hook validates that every column in your schema.yml files has a description:

.pre-commit-config.yaml
repos:
  - repo: https://github.com/dbt-checkpoint/dbt-checkpoint
    rev: v2.0.0
    hooks:
      - id: check-model-columns-have-desc
        name: Check model columns have descriptions

Pre-commit hooks provide faster feedback than CI — the failure surfaces at commit time rather than after the pipeline runs. This catches the common failure mode of adding a column without updating the YAML.

dbt-checkpoint includes other useful hooks beyond documentation: check-model-has-tests-by-name ensures models have minimum test coverage, check-model-has-properties-file ensures every model has a corresponding YAML file, and check-source-has-freshness validates that sources have freshness checks.
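The core of the column-description check is straightforward. A sketch of the idea, operating on an already-parsed schema.yml (the real hook parses the YAML files staged in the commit; this function and its name are illustrative):

```python
def columns_missing_descriptions(schema: dict) -> list[str]:
    """Return 'model.column' identifiers whose description is missing or blank."""
    missing = []
    for model in schema.get("models", []):
        for column in model.get("columns", []):
            # A whitespace-only or absent description counts as undocumented.
            if not column.get("description", "").strip():
                missing.append(f"{model['name']}.{column['name']}")
    return missing
```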

dbt-score

dbt-score from Picnic Technologies goes beyond binary “has description / doesn’t have description” by assigning a 0-10 quality score per model. The score considers multiple factors: documentation coverage, test coverage, model naming conventions, and other configurable quality rules.

dbt-score lint --manifest target/manifest.json

The output gives you a per-model score that helps prioritize documentation work. A model with a score of 3/10 needs more attention than one at 7/10. You can set a minimum score threshold in CI, similar to dbt-coverage but covering a broader definition of quality.

Where dbt-coverage answers “how much documentation exists?”, dbt-score answers “how good is this model overall?” Documentation is one component of a model’s quality score, alongside testing, naming, and structure.

dbt-bouncer

dbt-bouncer enforces configurable conventions across the entire project. It’s more flexible than the other tools because you define the rules:

dbt-bouncer.yml
manifest_checks:
  - name: check_model_description_populated
    include: "models/marts"
  - name: check_column_description_populated
    include: "models/marts"
  - name: check_model_has_unique_test
  - name: check_model_names
    model_name_pattern: "^(base|int|mrt)__"

dbt-bouncer is particularly useful for teams with strong naming conventions and documentation standards that vary by layer. You might require 100% documentation for mart models (which external consumers query) while being more lenient on intermediate models (which are internal implementation details).
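For example, a hypothetical dbt-bouncer.yml could apply column-level checks only to marts while intermediate models need only a model-level description (check names and the include key follow the example above; the path patterns are illustrative):

```yaml
manifest_checks:
  # Marts: external consumers query these, so require full documentation.
  - name: check_model_description_populated
    include: "models/marts"
  - name: check_column_description_populated
    include: "models/marts"
  # Intermediate models: require only a model-level description.
  - name: check_model_description_populated
    include: "models/intermediate"
```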

Layering These Tools

These tools aren’t mutually exclusive. A practical CI setup layers them, with dbt-osmosis alongside to keep YAML scaffolding in sync with the warehouse:

| Tool | Stage | What it catches |
| --- | --- | --- |
| dbt-checkpoint | Pre-commit | Missing descriptions on changed models |
| dbt-osmosis | Pre-commit | Schema drift, missing YAML columns |
| dbt-coverage | CI pipeline | Overall documentation coverage decline |
| dbt-bouncer | CI pipeline | Convention violations, missing tests |
| dbt-score | CI pipeline (optional) | Overall model quality regression |

The pre-commit hooks provide instant feedback to the developer. The CI checks provide project-wide enforcement that catches issues the local hooks might miss (like a model in a different directory affected by a schema change).

  1. Start with dbt-coverage --cov-fail-under 0.50 to establish a baseline without blocking work
  2. Add dbt-checkpoint’s check-model-columns-have-desc as a pre-commit hook
  3. Ratchet up the coverage threshold by 5-10% each month
  4. Add dbt-bouncer rules for naming conventions once coverage is stable
  5. Target 80% as a steady-state minimum, using scaffolding tools and AI documentation to close gaps
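Put together, the pipeline-stage checks can run as sequential CI steps. A hypothetical GitHub Actions sketch, assuming the project can compile in CI (warehouse credentials, tool versions, and the bare dbt-bouncer invocation are assumptions to adapt):

```yaml
name: dbt-docs-ci
on: [pull_request]
jobs:
  docs-quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install dbt-core dbt-coverage dbt-bouncer
      # Assumes profiles/credentials are configured for CI.
      - run: dbt compile  # produces target/manifest.json
      - run: dbt-coverage compute doc --manifest target/manifest.json --cov-fail-under 0.80
      - run: dbt-bouncer  # reads dbt-bouncer.yml from the repo root
```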