Native dbt unit tests (introduced in 1.8) are high-effort, high-precision tools. They are worth writing for roughly 1% of columns — those where transformation logic is complex enough that a bug is genuinely plausible and data tests alone would not catch the regression. The question is which specific scenarios warrant them.
The Four Scenarios That Justify Unit Tests
Complex String Parsing and Regex
If you’re extracting domains from email addresses, parsing UTM parameters from URLs, or cleaning messy free-text fields, the edge cases multiply fast. An email domain extractor needs to handle subdomains, unusual TLDs, and malformed inputs. A UTM parser needs to handle encoded characters, missing parameters, and parameter ordering variations.
Data tests won’t catch these edge cases. You can’t put a not_null test on the output column when NULL is a legitimate result (a referrer with no UTM parameters, say), so a parser that silently returns NULL for an unusual input format is indistinguishable from correct behavior. A regex-based format check on the output validates the result’s shape, not whether the extraction logic is correct.
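For concreteness, here is a minimal sketch of what such extraction logic might look like. It is illustrative only, assuming a BigQuery-style regexp_extract and the column names used in the fixture below; the actual model will differ by warehouse.

```sql
-- Hypothetical sketch of int__sessions__attributed (illustrative, not the real model).
-- Assumes BigQuery-style regexp_extract; swap in your warehouse's regex function.
select
    session_id,
    -- case-insensitive match on the utm_source query parameter;
    -- returns null when the parameter (or the referrer itself) is missing
    lower(
        regexp_extract(page_referrer, r'(?i)[?&]utm_source=([^&#]+)')
    ) as utm_source
from {{ ref('base__ga4__events') }}
```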
Unit tests let you define the edge cases explicitly and verify each one:
```yaml
unit_tests:
  - name: test_utm_source_extraction
    model: int__sessions__attributed
    given:
      - input: ref('base__ga4__events')
        rows:
          - {session_id: 1, page_referrer: "https://example.com?utm_source=google&utm_medium=cpc"}
          - {session_id: 2, page_referrer: "https://example.com?UTM_SOURCE=facebook"}  # uppercase
          - {session_id: 3, page_referrer: "https://example.com"}  # no UTM
          - {session_id: 4, page_referrer: null}  # null referrer
    expect:
      rows:
        - {session_id: 1, utm_source: "google"}
        - {session_id: 2, utm_source: "facebook"}
        - {session_id: 3, utm_source: null}
        - {session_id: 4, utm_source: null}
```

Date Calculations with Edge Cases
Fiscal year mappings, business day calculations, and timezone conversions are notorious for edge cases. A fiscal year that starts in April looks straightforward until you’re dealing with a date on March 31 near midnight in a UTC+5 timezone. A “days since last order” calculation behaves differently at month boundaries, year boundaries, and for customers with no prior orders.
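To make the example concrete, here is a minimal sketch of an April-start fiscal calendar model. The logic is an assumption for illustration; the real int__finance__fiscal_calendar may compute these fields differently (and would still need to address the timezone concerns above).

```sql
-- Hypothetical sketch of int__finance__fiscal_calendar for a fiscal year starting April 1.
-- Illustrative only; uses standard extract(), which most warehouses support.
select
    date_day,
    -- the fiscal year is named for the calendar year in which it ends (March 31)
    case
        when extract(month from date_day) >= 4 then extract(year from date_day) + 1
        else extract(year from date_day)
    end as fiscal_year,
    -- Q1 = Apr-Jun, Q2 = Jul-Sep, Q3 = Oct-Dec, Q4 = Jan-Mar
    case
        when extract(month from date_day) between 4 and 6 then 1
        when extract(month from date_day) between 7 and 9 then 2
        when extract(month from date_day) between 10 and 12 then 3
        else 4
    end as fiscal_quarter
from {{ ref('base__dates__spine') }}
```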
Define the boundary cases as unit test fixtures and you’ve permanently documented the expected behavior while catching regressions automatically:
```yaml
unit_tests:
  - name: test_fiscal_year_assignment
    model: int__finance__fiscal_calendar
    given:
      - input: ref('base__dates__spine')
        rows:
          - {date_day: '2024-03-31'}  # last day of fiscal year
          - {date_day: '2024-04-01'}  # first day of new fiscal year
          - {date_day: '2024-12-31'}  # calendar year end, mid fiscal year
          - {date_day: '2025-03-31'}  # last day of next fiscal year
    expect:
      rows:
        - {date_day: '2024-03-31', fiscal_year: 2024, fiscal_quarter: 4}
        - {date_day: '2024-04-01', fiscal_year: 2025, fiscal_quarter: 1}
        - {date_day: '2024-12-31', fiscal_year: 2025, fiscal_quarter: 3}
        - {date_day: '2025-03-31', fiscal_year: 2025, fiscal_quarter: 4}
```

Multi-Branch CASE WHEN Logic
A column whose value depends on five overlapping conditions is a maintenance hazard. When someone modifies the CASE WHEN to add a sixth category, or adjusts the priority order, they need to verify that existing behavior didn’t change — but there’s no automated verification unless unit tests are in place.
Unit tests document the expected behavior per branch and catch regressions when logic changes. They’re particularly valuable on customer segmentation, lead scoring, and order classification models where the business rules change regularly.
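As a reference point, here is a hypothetical version of segmentation logic that the fixture below could exercise. The thresholds are invented for illustration; what matters is that each test row lands in a different branch.

```sql
-- Hypothetical sketch of the segment logic in mrt__sales__customers.
-- Thresholds are invented; branch order matters because the first matching condition wins.
select
    customer_id,
    case
        when order_count = 0 then 'prospect'
        when lifetime_value >= 1000 and days_since_last_order <= 90 then 'champion'
        when lifetime_value >= 1000 and days_since_last_order > 90 then 'at_risk'
        when order_count <= 2 then 'new'
        else 'standard'
    end as segment
from {{ ref('int__customers__aggregated') }}
```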
```yaml
unit_tests:
  - name: test_customer_segment_assignment
    model: mrt__sales__customers
    given:
      - input: ref('int__customers__aggregated')
        rows:
          - {customer_id: 1, lifetime_value: 5000, order_count: 12, days_since_last_order: 15}
          - {customer_id: 2, lifetime_value: 5000, order_count: 12, days_since_last_order: 200}
          - {customer_id: 3, lifetime_value: 50, order_count: 1, days_since_last_order: 10}
          - {customer_id: 4, lifetime_value: 0, order_count: 0, days_since_last_order: null}
    expect:
      rows:
        - {customer_id: 1, segment: "champion"}
        - {customer_id: 2, segment: "at_risk"}
        - {customer_id: 3, segment: "new"}
        - {customer_id: 4, segment: "prospect"}
```

Each row tests a different branch. If a future change accidentally reclassifies champions as at-risk, the test fails immediately on the next CI run.
Window Functions
Ranking, running totals, and lag/lead calculations are notoriously tricky. Window frames in particular — ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW vs. ROWS BETWEEN 1 PRECEDING AND CURRENT ROW — produce different results and the difference isn’t always obvious from a quick code review.
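Here is a minimal sketch of the running-total model that the next fixture tests, with the frame spelled out explicitly. It is illustrative; the real model may differ.

```sql
-- Hypothetical sketch of mrt__finance__monthly_revenue.
-- The frame is stated explicitly: UNBOUNDED PRECEDING gives a true running total,
-- while 1 PRECEDING would only sum the current and previous month.
select
    month,
    revenue,
    sum(revenue) over (
        order by month
        rows between unbounded preceding and current row
    ) as cumulative_revenue
from {{ ref('int__orders__monthly') }}
```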
Mocking a small, controlled dataset lets you verify the window behaves exactly as intended:
```yaml
unit_tests:
  - name: test_revenue_running_total
    model: mrt__finance__monthly_revenue
    given:
      - input: ref('int__orders__monthly')
        rows:
          - {month: '2024-01', revenue: 1000}
          - {month: '2024-02', revenue: 1500}
          - {month: '2024-03', revenue: 800}
    expect:
      rows:
        - {month: '2024-01', revenue: 1000, cumulative_revenue: 1000}
        - {month: '2024-02', revenue: 1500, cumulative_revenue: 2500}
        - {month: '2024-03', revenue: 800, cumulative_revenue: 3300}
```

The Incremental Model Override Pattern
Unit tests become especially valuable for incremental models because the correctness of the merge logic depends on existing table state — something production data tests can’t easily replicate. The is_incremental() macro and this (the current table) can both be overridden:
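For orientation, here is a sketch of what such an incremental model might look like, showing where is_incremental() and this enter the SQL. The logic is an assumption for illustration; a real model would also need a tie-breaker for duplicates that share the same event_time.

```sql
-- Hypothetical sketch of int__events_deduplicated (illustrative only).
{{ config(materialized='incremental', unique_key='event_id') }}

with new_events as (

    select *
    from {{ ref('base__segment__events') }}
    {% if is_incremental() %}
    -- on incremental runs, only pull events newer than what the target table already holds
    where event_time > (select max(event_time) from {{ this }})
    {% endif %}

),

deduplicated as (

    select
        *,
        -- keep the latest record per event_id
        row_number() over (partition by event_id order by event_time desc) as row_num
    from new_events

)

select event_id, event_time, value
from deduplicated
where row_num = 1
```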
```yaml
unit_tests:
  - name: test_incremental_deduplication
    model: int__events_deduplicated
    overrides:
      macros:
        is_incremental: true
    given:
      - input: ref('base__segment__events')
        rows:
          - {event_id: 1, event_time: '2024-03-01 10:00:00', value: 100}
          - {event_id: 1, event_time: '2024-03-01 10:00:00', value: 150}  # duplicate
      - input: this
        rows:
          - {event_id: 0, event_time: '2024-02-28 09:00:00', value: 50}  # existing record
    expect:
      rows:
        - {event_id: 1, event_time: '2024-03-01 10:00:00', value: 150}  # latest value wins
```

The overrides.macros.is_incremental: true forces the model into incremental mode. The input: this mocks the current table state. Together, they let you test the merge path explicitly — verifying that deduplication logic selects the right record, that new records are inserted correctly, and that existing records that shouldn’t be updated aren’t.
Incremental merge bugs accumulate silently across runs and can require backfilling months of data before detection. Unit tests with mocked this state catch these bugs before they reach production.
What to Skip
The rough community consensus is that only about 1% of columns warrant unit tests. Don’t try to unit test everything.
Skip passthrough columns. If a column just selects source.column_name with no transformation, there’s nothing to test. The warehouse either retrieves the value correctly or it doesn’t — that’s not something SQL unit tests verify.
Skip simple aggregations. A SUM(revenue) or COUNT(orders) doesn’t have edge cases worth testing. Add a not_null data test instead.
Skip models you’ve already thoroughly tested via data tests. If your unique + not_null tests plus a few range checks give you confidence in a model, adding unit tests doesn’t materially reduce risk.
Don’t try to cover every possible input combination. Unit tests document intent and catch regressions; they’re not exhaustive proofs. Pick 4-6 representative cases per model, focusing on the edge cases most likely to break — boundary values, nulls, unexpected inputs.
Practical Friction
The main friction in writing unit tests is “YAML accounting” — manually crafting fixtures for models with many input columns. A model that joins three sources, each with 20 columns, requires fixtures that specify every relevant column across all three inputs.
AI-assisted fixture generation helps significantly here. Describe the scenario you want to test, paste the model SQL, and ask for the fixture YAML. It’s a good fit for this kind of structured generation task.
For production runs, always exclude unit tests:
```bash
dbt build --exclude-resource-type unit_test
```

Unit tests use mocked data and add no value in production. They belong in CI and local development only. Running them in production wastes compute and can cause confusion when tests that only ever saw clean fixtures encounter real-world messiness.
Two clear triggers for adding a unit test: fixing a production bug caused by incorrect transformation logic (regression prevention), or implementing complex business logic test-first. In both cases, the goal is testing specific anticipated failure modes, not coverage percentages.