Stale documentation is worse than missing documentation. Missing documentation is recognizable: an analyst who sees no description knows to investigate. Stale documentation removes that signal — a column described as “Total revenue net of refunds” that now includes tax (because the SQL changed without the YAML updating) looks authoritative. The analyst trusts it, builds a dashboard on it, and the numbers are silently wrong.
The default trajectory
dbt projects follow a common pattern: documentation written during initial development stays frozen while the project evolves — columns are added, models are refactored, business logic changes. A 2024 Informatica survey found that 79% of organizations have undocumented data pipelines. A substantial portion of the remaining 21% have stale documentation — technically present, functionally misleading.
Manual documentation updates don’t scale: the process has no feedback mechanism when someone forgets to update a description. The forgetting is invisible; the consequences are delayed.
The damage model
Stale documentation causes harm through three channels:
False confidence in analysis. An analyst who trusts a description that no longer reflects reality will produce incorrect reports without knowing it. Unlike a query error (which fails loudly), a semantic error (using the wrong definition of a metric) fails silently. The dashboard looks fine. The numbers are wrong.
Compounding AI errors. AI tools increasingly read schema descriptions to generate SQL, answer questions, and build analyses. When those descriptions are wrong, the AI confidently produces code based on wrong assumptions. The AI documentation tools that generate descriptions also read existing descriptions as context — so stale docs don’t just mislead humans, they train AI to repeat the mistakes.
Erosion of trust in documentation as a system. Once a team discovers that documentation can’t be trusted, they stop reading it. The documentation investment becomes counterproductive: it discourages future documentation efforts while providing no benefit.
Why manual discipline alone won’t fix this
When SQL changes without a documentation update, no test fails and no CI check fires (unless drift detection is configured); a reviewer catches it only by comparing the SQL diff against every YAML description. Documentation drift produces no immediate signal and blocks no pipeline, so the problem compounds silently.
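A basic drift check can be surprisingly small. The sketch below is illustrative, not a real tool: it assumes a one-YAML-file-per-model layout (`models/orders.sql` paired with `models/orders.yml`), which many projects don't follow — shared `schema.yml` files would require parsing the YAML to map models to files. The function name `flag_doc_drift` and the convention are both assumptions for the example.

```python
from pathlib import Path

def flag_doc_drift(changed_paths: list[str]) -> set[str]:
    """Given the file paths touched by a commit, return model names whose
    SQL changed without a matching YAML change.

    This is a heuristic signal, not proof of staleness: the SQL edit may
    not have touched any documented behavior. Assumes one YAML file per
    model, named after the model.
    """
    sql_changed: set[str] = set()
    yaml_changed: set[str] = set()
    for p in changed_paths:
        path = Path(p)
        if path.suffix == ".sql":
            sql_changed.add(path.stem)
        elif path.suffix in (".yml", ".yaml"):
            yaml_changed.add(path.stem)
    # Models with SQL edits but no sibling YAML edit are drift candidates.
    return sql_changed - yaml_changed
```

In CI, the list of changed paths would come from something like `git diff --name-only`, and a non-empty result would post a warning on the pull request rather than block it — drift detection flags candidates for human review, it doesn't prove the docs are wrong.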
The automation response
The fix is not more discipline — it’s layered automation that catches drift at multiple points:
- Pre-commit hooks that block commits missing descriptions for changed models
- Drift detection that flags when SQL changes without corresponding YAML updates
- Coverage tracking that catches project-wide erosion before it becomes severe
- AI remediation that generates first-draft descriptions for gaps
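The first layer — blocking commits that leave changed models undescribed — reduces to a small check over the parsed schema. The sketch below operates on a plain dict mirroring the `models:` list of a parsed dbt schema.yml; the function name `missing_descriptions` and the input shape are assumptions for illustration, and a real hook would load the YAML (or dbt's manifest) itself.

```python
def missing_descriptions(changed_models: set[str], schema: list[dict]) -> list[str]:
    """Pre-commit style check: for each changed model, report the model
    (if it lacks a description) and every column lacking one.

    `schema` mirrors the parsed `models:` list of a schema.yml file --
    an assumption about layout for this sketch.
    """
    problems: list[str] = []
    by_name = {m["name"]: m for m in schema}
    for name in sorted(changed_models):
        model = by_name.get(name)
        if model is None or not model.get("description", "").strip():
            problems.append(f"{name}: missing model description")
        if model is not None:
            for col in model.get("columns", []):
                if not col.get("description", "").strip():
                    problems.append(f"{name}.{col['name']}: missing column description")
    # A non-empty list would translate to a non-zero exit code in the hook,
    # blocking the commit until descriptions are added.
    return problems
```

Note that this layer and drift detection are complementary: the hook catches descriptions that were never written, while drift detection catches descriptions that were written and then left behind.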
Each layer addresses a different failure mode. Together, they make stale documentation the exception rather than the default trajectory. The goal isn’t perfection — it’s reducing the window between when documentation becomes stale and when someone notices.
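Coverage tracking, the project-wide layer, is just an aggregate of the same per-column check. A minimal sketch, again assuming input shaped like the parsed `models:` list of a schema.yml (the function name `description_coverage` is hypothetical):

```python
def description_coverage(models: list[dict]) -> float:
    """Fraction of columns carrying a non-empty description across all
    models. Tracked over time (e.g. reported in CI), a declining number
    catches project-wide erosion before it becomes severe.
    """
    total = documented = 0
    for model in models:
        for column in model.get("columns", []):
            total += 1
            if column.get("description", "").strip():
                documented += 1
    # An empty project counts as fully covered rather than dividing by zero.
    return documented / total if total else 1.0
```

A ratchet is a common way to use this in CI: fail the build only if coverage drops below its previous high-water mark, so the number can only improve.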
Accuracy matters more than coverage. A project with 70% coverage where every description is current is more useful than one with 100% coverage where half the descriptions are six months out of date.