Coverage metrics tell you whether descriptions exist. Drift detection tells you whether they’re still accurate. A project can report 95% documentation coverage while half those descriptions silently describe the wrong thing because the underlying models changed.
Drift takes three forms, each requiring a different detection approach: columns that exist in the warehouse but not in YAML, SQL that changed without corresponding YAML updates, and source schemas that shifted upstream.
Column-level drift
The most concrete form of drift: the warehouse has columns that your YAML doesn’t know about, or your YAML describes columns that no longer exist.
dbt-checkpoint at commit time
The check-model-has-all-columns hook from dbt-checkpoint catches this during development. It compares the columns in your compiled model against the columns in your YAML file and blocks the commit if they don’t match:
repos: - repo: https://github.com/dbt-checkpoint/dbt-checkpoint rev: v2.0.6 hooks: - id: dbt-parse - id: check-model-has-all-columns files: ^models/martsThe files: regex scopes this to marts only. Requiring all-column documentation for every staging model is usually too aggressive — those models change frequently and serve as internal implementation details.
Most dbt-checkpoint hooks require a fresh manifest.json, which is why dbt-parse runs first. This adds a few seconds to each commit but catches column drift that would otherwise surface in code review (or never surface at all).
dbt-osmosis for project-wide audit
For a project-wide check beyond what pre-commit hooks catch, dbt-osmosis can run in audit mode:
dbt-osmosis yaml auditThis exits non-zero if any model’s YAML is out of sync with the warehouse — columns added, removed, or reordered. Running it as a scheduled CI job (not just on PRs) catches drift that accumulates between pull requests, like when someone modifies a source table outside the dbt workflow or when a model’s compiled output changes due to an upstream dependency change.
The refactor command (dbt-osmosis yaml refactor) goes further and actually fixes the drift by adding missing columns, removing stale ones, and propagating descriptions. But for detection purposes, yaml audit gives you the signal without making changes.
Git-based staleness
A subtler form of drift happens when model SQL changes but its documentation doesn’t. The YAML might describe the correct columns, but the descriptions are outdated because the transformation logic changed.
Consider a mart model mrt__marketing__customer_ltv.sql that was modified in five PRs since its schema YAML was last touched. There’s a reasonable chance the descriptions are outdated — a column that was “total revenue” might now be “total revenue net of refunds” after a SQL change, but the description still says the old thing.
You can detect this with a Git-based comparison. A script that compares the last-modified dates of .sql files against their corresponding YAML entries flags models where the SQL changed more recently than the documentation:
#!/bin/bash# Find models where SQL was modified more recently than schema YAMLfor sql_file in models/**/*.sql; do model_name=$(basename "$sql_file" .sql) yaml_file=$(dirname "$sql_file")/schema.yml
if [ -f "$yaml_file" ]; then sql_date=$(git log -1 --format="%at" -- "$sql_file") yaml_date=$(git log -1 --format="%at" -- "$yaml_file")
if [ "$sql_date" -gt "$yaml_date" ]; then echo "STALE: $model_name (SQL updated $(date -d @$sql_date +%Y-%m-%d), YAML last touched $(date -d @$yaml_date +%Y-%m-%d))" fi fidoneThis is a heuristic, not a proof of staleness. A SQL change that only reformats whitespace does not make documentation stale. Models with the largest gap between SQL and YAML modification dates are the most likely candidates for stale descriptions. Running this as a weekly scheduled job and posting results to Slack or a dashboard keeps staleness visible without blocking any workflow.
Limitations of the git approach
The script above treats each schema YAML file as a unit. If your YAML files contain multiple models (which is common), a change to one model’s section updates the file timestamp for all models in that file. This means the script can miss staleness for models that share a YAML file with a recently-documented model.
More precise detection would parse the YAML and compare per-model sections, but the simple file-level comparison catches the most common cases and is trivial to set up.
Schema drift for sources
Source tables sit outside your dbt project’s control. Someone on the application team adds a column to the users table, changes a column type, or drops a field — and your dbt source definitions don’t know about it.
Source freshness as a proxy signal
dbt source freshness doesn’t directly detect schema changes, but it provides a useful proxy. If a source hasn’t been refreshed on schedule, the data flowing through your models may not match what the documentation describes. Freshness failures are a signal to investigate whether the source schema changed too.
Dedicated schema drift detection
dbt_schema_drift detects when source schemas change upstream. It compares the current warehouse schema against your declared source definitions and flags differences — new columns, removed columns, changed types.
This is particularly important for sources because:
- Source changes happen outside your team’s PR process
- There’s no pre-commit hook that can catch external schema changes
- The gap between a source schema change and its discovery can be weeks or months
- Documentation written against the old schema becomes misleading immediately
Running schema drift detection as a scheduled CI job (daily or on each pipeline run) gives you early warning when sources change. When drift is detected, you can update source YAML, propagate changes downstream with dbt-osmosis, and review whether any descriptions need updating.
Layering drift detection
No single tool catches all forms of drift. A practical setup combines them:
| Drift type | Tool | When it runs | What it catches |
|---|---|---|---|
| Column mismatch | dbt-checkpoint | Pre-commit | Columns added/removed in changed models |
| Column mismatch | dbt-osmosis audit | Scheduled CI | Project-wide column drift |
| Description staleness | Git date comparison | Scheduled (weekly) | SQL changes without YAML updates |
| Source schema changes | dbt_schema_drift | Scheduled CI | Upstream schema changes |
Pre-commit hooks catch drift as it is introduced. Scheduled jobs catch drift that accumulates between PRs or from external schema changes.