Organizing dbt Unit Tests at Scale

As your dbt project grows, so does your unit test suite. A mature project might have hundreds of unit tests across dozens of models. Without organization, running tests becomes slow and selecting relevant tests becomes tedious. The solution is a tagging strategy combined with a tiered CI pipeline.

Tag Strategy

Tags let you categorize tests and run subsets selectively. A useful tagging strategy has three dimensions:

Criticality: How important is this test for blocking deployments?

critical — Tests that must pass before any deployment
edge-case — Important but not deployment-blocking

Domain: Which part of the business does the model serve?

finance, marketing, product, operations

Test type: What category of bug does the test catch?

regression — Prevents known bugs from recurring
smoke-test — Quick sanity checks on core logic

unit_tests:
  - name: test_mrt_finance_orders_revenue_calculation
    model: mrt__finance__orders
    config:
      tags: ["critical", "finance", "regression"]
    # ...

  - name: test_mrt_finance_orders_empty_cart
    model: mrt__finance__orders
    config:
      tags: ["edge-case", "finance"]
    # ...

Keep tags flat and lowercase. Avoid encoding hierarchies in tag names, and resist tag proliferation. Three to four tags per test is the sweet spot: enough to enable useful selection, not so many that the tags themselves become maintenance overhead.

CI Pipeline Tiers

The real payoff of tagging is selective execution at different stages of your deployment pipeline:

Tier 1: PR Checks (Every Pull Request)

dbt test --select tag:critical,test_type:unit

Run only critical unit tests. This should complete in 1-2 minutes to give developers fast feedback. If a critical test fails, the PR can’t merge.

Tier 2: Merge to Main (Post-Merge)

dbt test --select test_type:unit

Run all unit tests: critical, edge-case, regression, everything. This comprehensive pass takes 5-10 minutes. A failure here means either that two individually passing PRs interact badly or that a PR broke a non-critical test that Tier 1 never ran.

Tier 3: Production Deployment

dbt build --exclude-resource-type unit_test

Don’t run unit tests in production. They exercise model logic against mocked inputs, so their outcome doesn’t depend on production data and rerunning them there only adds runtime. Data tests are what belong in production runs. This is the most important rule and the one teams most commonly violate.
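
As a rough sketch, the first two tiers could be wired into a CI workflow like the one below. GitHub Actions, the workflow file name, the Python version, and the dbt-postgres adapter are all assumptions made for illustration. Warehouse credentials, profiles.yml, and any other setup your warehouse needs are omitted.

```yaml
# .github/workflows/dbt-unit-tests.yml (hypothetical file name)
name: dbt unit tests

on:
  pull_request:
  push:
    branches: [main]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      # Adapter choice is an assumption; install the one your warehouse uses.
      - name: Install dbt
        run: pip install dbt-core dbt-postgres

      - name: Install dbt packages
        run: dbt deps

      # Tier 1: pull requests run only the critical unit tests.
      - name: Critical unit tests
        if: github.event_name == 'pull_request'
        run: dbt test --select tag:critical,test_type:unit

      # Tier 2: pushes to main run every unit test.
      - name: All unit tests
        if: github.event_name == 'push'
        run: dbt test --select test_type:unit
```

Tier 3 would live in whatever job performs the production deployment, using the dbt build command shown above.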

Selecting Tests by Domain

When you’re working on finance models, you don’t need to run marketing tests. Tags enable domain-scoped development:

# All finance unit tests
dbt test --select tag:finance,test_type:unit

# Unit tests for one specific marketing model
dbt test --select mrt__marketing__customer_attribution,test_type:unit

This is particularly useful during development. Instead of running the full test suite after every change, run only the tests relevant to the models you’re modifying. Leave the full suite to the post-merge run in CI.
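
If these selection strings start to repeat across scripts and CI jobs, dbt's YAML selectors can give them names. The sketch below assumes a selectors.yml file at the project root. The selector names are hypothetical.

```yaml
# selectors.yml (selector names are hypothetical)
selectors:
  - name: critical_unit_tests
    description: "Unit tests tagged critical (the PR-check subset)"
    definition:
      intersection:
        - method: tag
          value: critical
        - method: test_type
          value: unit

  - name: finance_unit_tests
    description: "Unit tests tagged finance"
    definition:
      intersection:
        - method: tag
          value: finance
        - method: test_type
          value: unit
```

With this in place, dbt test --selector finance_unit_tests should pick up the same tests as dbt test --select tag:finance,test_type:unit.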

File Organization

For projects with many unit tests, consider co-locating test files with the models they test:

models/
├── intermediate/
│   ├── int__events_processed.sql
│   └── _int__unit_tests.yml        # Unit tests for intermediate models
├── marts/
│   ├── finance/
│   │   ├── mrt__finance__orders.sql
│   │   └── _mrt_finance__unit_tests.yml
│   └── marketing/
│       ├── mrt__marketing__attribution.sql
│       └── _mrt_marketing__unit_tests.yml

The underscore prefix sorts the test files ahead of the model files in most directory listings. Co-location means developers find the tests right next to the models they modify. The alternative, a separate tests/unit/ directory (kept under a configured model path such as models/, where dbt looks for unit test YAML), works too, but it puts distance between models and their tests, which slows down development.
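
Co-location has a side benefit: a directory path can stand in for a domain when selecting tests, even for tests that were never tagged. A minimal sketch, assuming the layout above and a hypothetical selector name:

```yaml
# selectors.yml (selector name is hypothetical)
selectors:
  - name: finance_mart_unit_tests
    description: "Unit tests for models under models/marts/finance"
    definition:
      intersection:
        - method: path
          value: models/marts/finance
        - method: test_type
          value: unit
```

The command-line equivalent would be dbt test --select path:models/marts/finance,test_type:unit.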

The Pattern Library Reference

As your test suite grows, a quick-reference table helps developers find the right pattern for their scenario:

Pattern	Key Technique	When to Use
Incremental dual-mode	Override `is_incremental` to false, then true; mock `this` as an input	Any incremental model
SCD2 date ranges	Test derived models consuming snapshots	Snapshot consumers
Window functions	Out-of-order input rows + multiple partitions	ROW_NUMBER, LEAD/LAG, running totals
CASE WHEN boundaries	Boundary values at each threshold	Customer segments, tiers, status
Empty tables	`format: sql` with `WHERE false`	Models that join to optional tables
Null handling	Explicit null rows in fixtures	Any aggregation model
Macro testing	Ephemeral wrapper model	Shared macros with business logic

Keep this reference in your project’s contributing documentation or as a comment in your test YAML files. When a developer needs to write their first unit test, the right pattern is one lookup away.
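
To make one row of the table concrete, here is a sketch of the incremental dual-mode pattern. It assumes int__events_processed is an incremental model that, on incremental runs, keeps only rows whose event_date is later than the latest event_date already in the target table. The upstream model stg__events and the column names are hypothetical.

```yaml
unit_tests:
  # Full-refresh mode: every input row should reach the output.
  - name: test_int_events_processed_full_refresh
    model: int__events_processed
    overrides:
      macros:
        is_incremental: false
    given:
      - input: ref('stg__events')   # hypothetical upstream model
        rows:
          - {event_id: 1, event_date: 2024-01-01}
          - {event_id: 2, event_date: 2024-01-02}
    expect:
      rows:
        - {event_id: 1, event_date: 2024-01-01}
        - {event_id: 2, event_date: 2024-01-02}

  # Incremental mode: `this` mocks the rows already in the target table.
  - name: test_int_events_processed_incremental
    model: int__events_processed
    overrides:
      macros:
        is_incremental: true
    given:
      - input: ref('stg__events')
        rows:
          - {event_id: 1, event_date: 2024-01-01}
          - {event_id: 2, event_date: 2024-01-02}
      - input: this
        rows:
          - {event_id: 1, event_date: 2024-01-01}
    expect:
      # Only the rows this run would insert, not the final table state.
      rows:
        - {event_id: 2, event_date: 2024-01-02}
```

Pairing the two tests on the same model is what makes the pattern "dual-mode": a change that breaks only the incremental branch still fails a test.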