Note

dbt Unit Test YAML Syntax

Complete reference for dbt unit test YAML structure — required elements, input formats (dict, csv, sql), optional configuration, and version-specific features.

Planted
dbt · testing

Every dbt unit test lives in a YAML file with four required elements: a name, a target model, mocked inputs (given), and expected output (expect). There are three input formats and several optional configuration options.

Required Elements

Every unit test needs exactly four things: a name, a target model, input data (given), and expected output (expect).

unit_tests:
  - name: test_customer_status_logic   # Unique identifier
    model: mrt__core__customers        # Model being tested
    given:                             # Mock inputs
      - input: ref('base__crm__customers')
        rows:
          - {customer_id: 1, status: "active"}
    expect:                            # Expected output
      rows:
        - {customer_id: 1, is_active: true}

The given section accepts three kinds of inputs:

  • ref() for models — the most common case
  • source() for source tables
  • this for self-references in incremental models

A critical convenience: you only need to specify columns that your logic actually uses. If base__crm__customers has 30 columns but your test only cares about customer_id and status, those two are sufficient. dbt handles the rest by filling in defaults.
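For the `this` case, a hedged sketch of what an incremental-model test might look like (the test name, columns, and merge logic here are illustrative, not from the examples above — and forcing the incremental branch uses the `overrides` block, which dbt supports for mocking `is_incremental()` in unit tests):

```yaml
unit_tests:
  - name: test_incremental_merge   # hypothetical test
    model: mrt__core__customers
    overrides:
      macros:
        is_incremental: true       # force the incremental code path
    given:
      - input: ref('base__crm__customers')
        rows:
          - {customer_id: 1, status: "active"}
      - input: this                # mock the model's own existing table
        rows:
          - {customer_id: 1, is_active: false}
    expect:
      rows:
        - {customer_id: 1, is_active: true}
```

Without the override, dbt compiles the model's non-incremental branch and the `this` input is never consulted.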

Input Formats

dbt supports three formats for defining test data. Each has a sweet spot.

Dict Format

The default and most readable for small datasets. Each row is a YAML dictionary:

given:
  - input: ref('base__shopify__orders')
    format: dict
    rows:
      - {order_id: 1, amount: 100.00, status: "completed"}
      - {order_id: 2, amount: 50.00, status: "pending"}

Dict format is what you’ll use 90% of the time. It’s concise, easy to scan, and plays well with code review diffs. The only downside is that readability degrades when rows have many columns, since each YAML line gets long.

CSV Format

Better for larger datasets or when you want to share fixture files across tests:

given:
  - input: ref('base__shopify__orders')
    format: csv
    rows: |
      order_id,amount,status
      1,100.00,completed
      2,50.00,pending

CSV format also supports external fixture files, which is useful when the same dataset is needed by multiple tests:

given:
  - input: ref('base__shopify__orders')
    format: csv
    fixture: order_test_data

This looks for tests/fixtures/order_test_data.csv in your project. External fixtures keep your YAML files clean when test data is large, but they add indirection — someone reading the test needs to open a separate file to see the inputs.
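The fixture file itself is a plain CSV with a header row. For the example above, tests/fixtures/order_test_data.csv would contain something like:

```csv
order_id,amount,status
1,100.00,completed
2,50.00,pending
```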

SQL Format

Required in two specific situations: ephemeral models and empty table scenarios.

given:
  - input: ref('base__shopify__orders')
    format: sql
    rows: |
      select 1 as order_id, 100.00 as amount, 'completed' as status
      union all
      select 2 as order_id, 50.00 as amount, 'pending' as status

For testing zero-row scenarios (what happens when a table is empty?), SQL format is the only option:

given:
  - input: ref('base__shopify__orders')
    format: sql
    rows: |
      select
        cast(null as int64) as order_id,
        cast(null as float64) as amount
      where false

The where false trick creates a result set with the correct schema but zero rows. This is essential for testing models that need to handle empty upstream tables gracefully.

One important caveat with SQL format for ephemeral models: you must include ALL columns the model references, not just the ones relevant to your test. Ephemeral models can’t be queried for their schema, so dbt has no way to fill in defaults.

Optional Configuration

Beyond the basics, unit tests support description, tags, meta, and conditional enablement:

unit_tests:
  - name: test_revenue_calculation
    model: mrt__finance__orders
    description: "Validates gross revenue calculation including tax"
    config:
      tags: ["critical", "finance"]
      meta:
        owner: "data-team"
        ticket: "DATA-1234"
      enabled: "{{ target.name != 'prod' }}"   # v1.9+ only
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, subtotal: 100.00, tax_rate: 0.08}
    expect:
      rows:
        - {order_id: 1, gross_revenue: 108.00}

Tags are particularly useful. They enable selective test runs (dbt test --select tag:critical) and make it easy to run just the unit tests that matter for a specific domain. If you tag tests by business area (finance, marketing, core), teams can run only their tests during development.

The enabled config lets you skip tests in certain environments. This is a dbt 1.9+ feature — it won’t work on 1.8.

The meta block is freeform. Use it for ownership (owner), traceability (ticket), or any project-specific metadata your team needs.

Version Compatibility

Unit testing syntax has evolved across dbt versions, and the differences matter:

  • dbt 1.8 (May 2024): Unit testing introduced. The tests: key was renamed to data_tests: to avoid ambiguity with unit tests. If you’re upgrading from pre-1.8, rename any tests: blocks in your YAML to data_tests:.
  • dbt 1.9: Added the enabled config option for conditional test execution. New --resource-type and --exclude-resource-type flags for filtering.
  • dbt 1.11: Unit tests for disabled models are now automatically disabled — no more orphaned tests failing on models that have been turned off.

The data_tests: rename catches teams off guard during upgrades. If you see unexpected behavior after moving to 1.8+, check whether you still have tests: blocks that need renaming.
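The rename only affects schema YAML for generic data tests; a before/after sketch (model and column names here are illustrative):

```yaml
# Pre-1.8
models:
  - name: mrt__core__customers
    columns:
      - name: customer_id
        tests:          # old key
          - unique
          - not_null

# 1.8+
models:
  - name: mrt__core__customers
    columns:
      - name: customer_id
        data_tests:     # new key
          - unique
          - not_null
```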

Choosing a Format

For most teams, the decision is simple:

  • Dict format for everything under ~10 rows per input (which is almost every unit test)
  • CSV fixtures when the same large dataset is reused across multiple tests
  • SQL format only when dict/csv can’t work — ephemeral models and empty tables

Avoid mixing formats within a single test unless there’s a strong reason. Consistency makes tests easier to read and review.

The expect block supports the same formats as given, though dict format is almost always the right choice there — expected outputs are typically small, and you want them visible inline for quick comparison.
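As a sketch of the non-dict case, an expect block using csv format (reusing the revenue example from earlier) would look like:

```yaml
expect:
  format: csv
  rows: |
    order_id,gross_revenue
    1,108.00
```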