
dbt Model Description Writing Patterns

Practical patterns for writing dbt model, column, and source descriptions that serve both business users and engineers — the three-question framework and when to use meta instead of description

Planted
dbt · data engineering · data quality

Useful dbt model descriptions explain business purpose rather than restating the model name. “Orders table” tells a stakeholder nothing they couldn’t already see from the model name itself. A description that covers the source system, the grain, and what is included or excluded communicates information that the name alone cannot.

The Three-Question Framework for Model Descriptions

A useful model description answers three questions:

  1. What system does this data come from?
  2. What does each row represent?
  3. What’s included or excluded?

Compare two descriptions of the same model:

# Bad: restates the model name
models:
  - name: base__shopify__orders
    description: "Orders table"

# Good: answers all three questions
models:
  - name: base__shopify__orders
    description: >
      Completed e-commerce orders from Shopify, including
      transaction details and customer information. One row
      per order. Excludes cancelled and test transactions.

The good version tells a stakeholder where the data comes from (Shopify), what the grain is (one row per order), and what’s deliberately left out (cancelled and test orders). An analyst seeing this description in BigQuery’s schema panel or their BI tool’s column browser knows immediately whether this is the table they need.

The “one row per” pattern is particularly valuable. It communicates the grain of the table without requiring someone to inspect the data or trace the SQL. “One row per customer per day” tells you this is a daily snapshot. “One row per event” tells you it’s event-level. This single phrase eliminates a category of misuse where someone aggregates data at the wrong level.

Column Description Patterns

Column descriptions follow the same principle: answer “so what?” instead of restating the column name.

Instead of…           Write…
Price amount          Price per unit in USD at time of purchase
Created timestamp     When the order was placed, in UTC
Customer segment      Marketing tier classification based on annual spend (PRD-102)
Is active             Whether the customer placed an order in the last 90 days
Revenue               Net revenue after refunds, excluding tax and shipping

The patterns that make column descriptions useful:

  • Specify units. Is amount in USD or cents? Is duration in seconds or milliseconds? If every amount column in your project includes the currency and denomination, you prevent an entire class of calculation errors.
  • Specify timezones. Is created_at in UTC, the user’s local timezone, or the server’s timezone? This matters more than most teams realize until someone builds a report that’s off by hours.
  • State key relationships. “Foreign key to dim_customers.customer_id” saves someone from tracing the joins in your SQL.
  • Explain calculation logic for derived fields. “Average order value calculated as total revenue divided by order count, excluding orders under $1” tells an analyst whether this matches their definition of AOV.
  • Call out exclusions explicitly. A status column that doesn’t include all possible statuses should say which ones are filtered. “Order status. Includes only ‘completed’ and ‘shipped’; excludes ‘cancelled’, ‘draft’, and ‘test’” prevents someone from using this model for an analysis that requires those excluded statuses.
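Applied in YAML, these patterns might look like the following sketch. The model and column names here are illustrative, not from the original examples:

```yaml
models:
  - name: fct_orders
    columns:
      - name: amount_usd
        description: "Order total in USD, in whole dollars (not cents)."
      - name: created_at
        description: "When the order was placed, in UTC."
      - name: customer_id
        description: "Foreign key to dim_customers.customer_id."
      - name: avg_order_value
        description: >
          Average order value: total revenue divided by order count,
          excluding orders under $1.
      - name: status
        description: >
          Order status. Includes only 'completed' and 'shipped';
          excludes 'cancelled', 'draft', and 'test'.
```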

When your column descriptions follow these patterns consistently, doc blocks become especially powerful. Define customer_id, created_at, and your most-repeated columns once with full context, then reference them everywhere they appear.
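As a sketch of that doc-block approach: define the description once in any Markdown file in your project with a `{% docs %}` block, then reference it from schema YAML with the `doc()` function. The block name and wording below are illustrative:

```markdown
{% docs customer_id %}
Unique customer identifier. Foreign key to
dim_customers.customer_id.
{% enddocs %}
```

```yaml
columns:
  - name: customer_id
    description: '{{ doc("customer_id") }}'
```

A change to the doc block propagates to every model that references it, which keeps repeated columns consistent.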

Source Descriptions

Source descriptions deserve the same care as model descriptions, with an emphasis on the operational details that help debug freshness and quality issues:

sources:
  - name: salesforce
    description: >
      CRM data from Salesforce, loaded via Fivetran connector
      with 15-minute sync frequency. Contains account, contact,
      opportunity, and activity objects.
    tables:
      - name: account
        description: >
          Salesforce Account records. One row per account.
          Includes both active and archived accounts.
          Synced every 15 minutes via Fivetran.

Three things make source descriptions useful: the upstream system name, the ingestion method, and the refresh cadence. When a freshness issue surfaces, an engineer seeing “loaded via Fivetran connector with 15-minute sync frequency” knows exactly where to look without digging through pipeline configs.
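Since the description promises a 15-minute sync, pairing it with dbt's source freshness config makes that expectation enforceable with `dbt source freshness`. A minimal sketch; the `loaded_at_field` value and the warn/error thresholds are assumptions, not part of the original example:

```yaml
sources:
  - name: salesforce
    loaded_at_field: _fivetran_synced  # assumed sync-timestamp column
    freshness:
      warn_after: {count: 1, period: hour}
      error_after: {count: 6, period: hour}
    tables:
      - name: account
```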

Description vs Meta: Different Jobs

dbt provides two places to attach metadata to models: description and meta. They serve different purposes and get confused often enough that the distinction matters.

description is for human-readable prose. It renders in the dbt docs site, gets pushed to warehouse comments via persist_docs, and is what analysts see when exploring your schema. Write it for people.
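For reference, `persist_docs` is enabled in `dbt_project.yml`, per project or per folder. A minimal sketch, with the project name as a placeholder:

```yaml
# dbt_project.yml
models:
  my_project:
    +persist_docs:
      relation: true  # push model descriptions to table comments
      columns: true   # push column descriptions to column comments
```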

meta is for structured key-value pairs compiled into manifest.json. It’s consumed by external tools programmatically — data catalogs, orchestrators, governance platforms, custom scripts. Write it for machines.

models:
  - name: mrt__marketing__campaign_performance
    description: >
      Daily campaign performance metrics aggregated from
      Google Ads and Meta Ads. One row per campaign per day.
    meta:
      owner: "marketing_analytics"
      contains_pii: false
      sla_hours: 4
      maturity: "production"

Ownership, PII classification, SLAs, data sensitivity tiers — these belong in meta because tooling needs to read them programmatically. What the model does, what each row represents, and what’s excluded — these belong in description because humans need to read them in context.

Mixing the two — putting ownership in descriptions or writing prose in meta — makes both less useful. The description becomes cluttered with metadata that should be machine-readable. The meta becomes a dumping ground for text that no tool can parse reliably.
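To make the machine-readability point concrete, here is a sketch of the kind of check a governance script might run against a compiled manifest: list every model flagged as containing PII. The inline `manifest` dict is a minimal stand-in for `target/manifest.json`; its keys mirror the `meta` example above:

```python
def models_with_pii(manifest: dict) -> list[str]:
    """Names of models whose meta marks them as containing PII."""
    return [
        node["name"]
        for node in manifest["nodes"].values()
        if node["resource_type"] == "model"
        and node.get("meta", {}).get("contains_pii")
    ]

# Minimal stand-in for target/manifest.json content;
# in practice: manifest = json.load(open("target/manifest.json"))
manifest = {
    "nodes": {
        "model.proj.a": {"resource_type": "model", "name": "a",
                         "meta": {"contains_pii": True, "sla_hours": 4}},
        "model.proj.b": {"resource_type": "model", "name": "b",
                         "meta": {"contains_pii": False}},
    }
}
print(models_with_pii(manifest))  # → ['a']
```

No tool could extract this signal reliably from free-text descriptions, which is the whole argument for keeping it in meta.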

Prioritization

In a project with many undocumented models, mart models are the highest-priority targets. They are what business users query, and missing or inaccurate descriptions at that layer cause the most confusion. Use dbt-codegen to generate YAML stubs with column names pulled from the warehouse. Write model descriptions first, column descriptions second. Passing upstream_descriptions: true to dbt-codegen's generate_model_yaml macro copies descriptions from parent models, so only columns whose meaning changes at the mart layer need new descriptions.
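With dbt-codegen installed as a package, generating a stub for a mart model looks roughly like this; the model name is illustrative, while generate_model_yaml and its upstream_descriptions argument come from dbt-codegen:

```shell
dbt run-operation generate_model_yaml \
  --args '{"model_names": ["mrt__marketing__campaign_performance"], "upstream_descriptions": true}'
```

The macro prints the YAML to stdout; paste it into a schema file and fill in the gaps.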