Adrienne Vermorel
Unit Testing dbt Models: Real-World Examples and Patterns
This is Part 2 of a 3-part series on dbt unit testing. Building on the syntax from Part 1, this article covers practical patterns for incremental models, snapshots, complex SQL, and marketing analytics.
Part 1 covered the fundamentals: YAML syntax, BigQuery quirks, and CI/CD setup. Now let’s apply that knowledge to the models that actually keep you up at night: incremental logic, window functions, attribution models, and all the edge cases that slip through code review.
This article is a pattern library. Each section tackles a specific scenario with copy-paste-ready examples you can adapt to your own models. But more importantly, it explains why each pattern matters and how to think about designing these tests. Because copy-pasting YAML without understanding what you’re testing is how you end up with a green CI pipeline and broken dashboards.
Testing Incremental Models
Incremental models are the most common source of subtle bugs in dbt projects. Why? Because they have two completely different code paths—one that runs on a full refresh, another that runs during normal incremental loads—and teams often test only one of them (usually the full refresh path, since that’s what dbt run does during development).
The typical failure mode looks like this: You develop your incremental model, it works perfectly in dev. You deploy to production, run a full refresh, everything looks great. Three months later, someone notices duplicates appearing in downstream reports. The incremental logic had a bug that only manifested after months of daily runs, when a subtle edge case in your WHERE clause finally hit production data.
Unit tests let you catch these bugs before deployment by simulating both execution paths with controlled test data.
The Dual-Mode Testing Approach
Every incremental model needs at least two unit tests: one for full refresh mode, one for incremental mode. This isn’t optional—if you’re only testing one mode, you’re leaving half your logic untested.
```sql
-- models/intermediate/int__events_processed.sql
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    user_id,
    event_type,
    event_timestamp,
    processed_at
from {{ ref('base__ga4__events') }}

{% if is_incremental() %}
where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```

This model has two behaviors:
- Full refresh: Process all events from the source table
- Incremental: Only process events newer than the most recent event already in the target table
The {% if is_incremental() %} block is where bugs hide. Common mistakes include off-by-one errors (using >= vs >), timezone mismatches, and incorrect column references. Let’s write tests that would catch these issues.
Testing both modes:
```yaml
unit_tests:
  # Test 1: Full refresh behavior
  - name: test_int_events_processed_full_refresh
    model: int__events_processed
    description: "On full refresh, all source rows should be processed"
    overrides:
      macros:
        is_incremental: false
    given:
      - input: ref('base__ga4__events')
        rows:
          - {event_id: 1, user_id: 100, event_type: "click", event_timestamp: "2024-06-01 10:00:00"}
          - {event_id: 2, user_id: 100, event_type: "purchase", event_timestamp: "2024-06-02 11:00:00"}
          - {event_id: 3, user_id: 101, event_type: "click", event_timestamp: "2024-06-03 12:00:00"}
    expect:
      rows:
        - {event_id: 1, user_id: 100, event_type: "click"}
        - {event_id: 2, user_id: 100, event_type: "purchase"}
        - {event_id: 3, user_id: 101, event_type: "click"}

  # Test 2: Incremental behavior
  - name: test_int_events_processed_incremental
    model: int__events_processed
    description: "On incremental run, only new events should be inserted"
    overrides:
      macros:
        is_incremental: true
    given:
      - input: ref('base__ga4__events')
        rows:
          - {event_id: 1, user_id: 100, event_type: "click", event_timestamp: "2024-06-01 10:00:00"}
          - {event_id: 2, user_id: 100, event_type: "purchase", event_timestamp: "2024-06-02 11:00:00"}
          - {event_id: 3, user_id: 101, event_type: "click", event_timestamp: "2024-06-03 12:00:00"}
      - input: this  # Current state of the target table
        rows:
          - {event_id: 1, user_id: 100, event_type: "click", event_timestamp: "2024-06-01 10:00:00"}
    expect:
      # Only rows NEWER than max(event_timestamp) in `this`
      rows:
        - {event_id: 2, user_id: 100, event_type: "purchase"}
        - {event_id: 3, user_id: 101, event_type: "click"}
```

Let’s break down what’s happening in these tests:
Test 1 (Full Refresh):
- We set is_incremental: false in the overrides, simulating a dbt run --full-refresh
- We provide three events in the source table
- We expect all three events to appear in the output—no filtering happens
Test 2 (Incremental):
- We set is_incremental: true, simulating a normal incremental run
- We provide the same three source events, plus we mock the current state of the target table using input: this
- The this table contains one event with timestamp 2024-06-01 10:00:00
- Our model’s WHERE clause filters to events after that timestamp
- Only events 2 and 3 (with timestamps on June 2 and 3) should be processed
Critical insight: The expect block represents what gets inserted or merged, not the final table state. In the incremental test above, event_id 1 already exists in this, so only events 2 and 3 appear in the expected output. This trips up many developers who expect to see the merged result—but unit tests validate the transformation logic, not the merge operation itself.
Testing Merge Logic
The merge_update_columns config adds another layer of complexity. When a row with an existing unique_key arrives, BigQuery will update only the specified columns—leaving others unchanged. This is useful for slowly changing dimension patterns, but it’s also a common source of confusion.
Consider this scenario: You have a users table where you want to update the email when it changes, but preserve the original created_at timestamp. The merge logic handles this, but how do you verify it works correctly?
```sql
-- models/intermediate/int__users_current.sql
{{ config(
    materialized='incremental',
    unique_key='user_id',
    merge_update_columns=['email', 'updated_at']
) }}

select
    user_id,
    email,
    created_at,
    updated_at
from {{ ref('base__crm__users') }}

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

```yaml
unit_tests:
  - name: test_int_users_current_merge_update
    model: int__users_current
    description: "Existing users should have email updated, not created_at"
    overrides:
      macros:
        is_incremental: true
    given:
      - input: ref('base__crm__users')
        rows:
          - {user_id: 1, email: "new@example.com", created_at: "2024-06-15", updated_at: "2024-06-15"}
      - input: this
        rows:
          - {user_id: 1, email: "old@example.com", created_at: "2024-01-01", updated_at: "2024-01-01"}
    expect:
      rows:
        - {user_id: 1, email: "new@example.com", created_at: "2024-06-15", updated_at: "2024-06-15"}
```

Important caveat: This test validates that your transformation logic produces the correct output row. It doesn’t actually test the merge behavior—that’s BigQuery’s responsibility. What you’re testing here is: “Given this source data and this existing state, does my model produce the right row to be merged?”
The expected output shows created_at: "2024-06-15" because that’s what the source data contains. In practice, the actual merge would preserve the original created_at: "2024-01-01" from the this table. If you need to test more complex merge scenarios, consider using dbt-audit-helper to compare actual results after a test run.
Testing Late-Arriving Data
Real-world data pipelines rarely receive events in perfect chronological order. Mobile apps batch events during offline periods. Third-party APIs have processing delays. CDC systems can replay historical changes. Your incremental logic needs to handle these “late arrivals” gracefully.
A common pattern uses a lookback window: instead of only processing events newer than max(event_timestamp), you also include events that arrived recently (based on a _loaded_at column) even if their event timestamp is older. This ensures you don’t miss data that was delayed in transit.
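For context, the test below assumes a lookback-style filter roughly like this (a sketch; the lookback_hours var and the _loaded_at column are illustrative and should match whatever your model actually uses):

```sql
{% if is_incremental() %}
where
    -- Normal case: events newer than anything already processed
    event_timestamp > (select max(event_timestamp) from {{ this }})
    -- Late arrivals: anything loaded recently, even if its event_timestamp
    -- falls before the current high-water mark
    or _loaded_at >= timestamp_sub(
        current_timestamp(),
        interval {{ var('lookback_hours', 24) }} hour
    )
{% endif %}
```

Because the model declares unique_key='event_id', any rows the lookback picks up a second time are merged rather than duplicated.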
```yaml
unit_tests:
  - name: test_int_events_processed_late_arriving
    model: int__events_processed
    description: "Late-arriving events within lookback window should be captured"
    overrides:
      macros:
        is_incremental: true
      vars:
        lookback_hours: 24
    given:
      - input: ref('base__ga4__events')
        rows:
          # Event arrived late but timestamp is old
          - {event_id: 4, event_timestamp: "2024-06-01 08:00:00", _loaded_at: "2024-06-02 10:00:00"}
          # Normal new event
          - {event_id: 5, event_timestamp: "2024-06-02 09:00:00", _loaded_at: "2024-06-02 09:05:00"}
      - input: this
        rows:
          - {event_id: 1, event_timestamp: "2024-06-01 10:00:00"}
    expect:
      rows:
        - {event_id: 4, event_timestamp: "2024-06-01 08:00:00"}
        - {event_id: 5, event_timestamp: "2024-06-02 09:00:00"}
```

In this test:
- Event 4 has an old timestamp (June 1, 08:00) but a recent load time (June 2, 10:00)—it arrived late
- Event 5 is a normal new event with both recent timestamp and load time
- The this table shows we’ve already processed up to June 1, 10:00
- A naive WHERE event_timestamp > max(event_timestamp) would miss event 4
- The lookback window (controlled by the lookback_hours var) ensures we catch it
This test verifies that your late-arrival handling works correctly. Without it, you might not notice missing events until a downstream analyst asks why their numbers don’t match the source system.
Prerequisite: Before running incremental unit tests, build your incremental models with dbt run --select "config.materialized:incremental" --empty to ensure the schema exists. Unit tests need to introspect the table structure, and they’ll fail with “Not able to get columns for unit test” if the target table doesn’t exist.
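In practice, the sequence looks something like this (the unit-test selector is the same one used later in this article):

```bash
# Build empty incremental tables so unit tests can introspect their schemas
dbt run --select "config.materialized:incremental" --empty

# Then run the unit tests
dbt test --select "test_type:unit"
```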
Testing Snapshot/SCD Type 2 Logic
Snapshots (Slowly Changing Dimension Type 2) track historical changes to your data. When a customer’s email changes, instead of overwriting the old value, dbt creates a new row with validity timestamps—preserving the complete change history.
Here’s the challenge: native dbt unit tests don’t directly support snapshot resources. You can’t write a unit test with model: my_snapshot because snapshots aren’t models—they’re a separate resource type with their own execution path.
This limitation makes sense when you think about it. Snapshots detect changes over time, comparing the current source state to the previous snapshot state. Unit tests run in isolation with mocked data—there’s no “previous state” to compare against.
So how do you test snapshot-related logic? Three strategies:
- Test the models that feed snapshots — ensure your staging layer produces correct data for the snapshot to consume
- Test the models that consume snapshot output — validate your SCD2 date range calculations and is_current flags
- Test change detection logic — if you use hash-based change detection, verify the hashing works correctly
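For context, the examples in this section assume a timestamp-strategy snapshot roughly like the one below (the target schema and config values are illustrative):

```sql
-- snapshots/snp_users.sql (illustrative)
{% snapshot snp_users %}

{{
    config(
        target_schema='snapshots',
        unique_key='user_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

select * from {{ ref('base__crm__users') }}

{% endsnapshot %}
```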
Testing Pre-Snapshot Base Models
The transformation that prepares data for a snapshot is just as important as the snapshot itself. If your staging model has bugs, the snapshot will faithfully preserve those bugs forever.
```yaml
unit_tests:
  - name: test_base_crm_users_snapshot_ready
    model: base__crm__users
    description: "Base model should correctly prepare data for snapshot"
    given:
      - input: source('crm', 'users')
        rows:
          - {id: 1, name: "Alice", email: "alice@example.com", _updated_at: "2024-06-01"}
          - {id: 1, name: "Alice Smith", email: "alice@example.com", _updated_at: "2024-06-15"}
    expect:
      rows:
        - {user_id: 1, user_name: "Alice", email: "alice@example.com", updated_at: "2024-06-01"}
        - {user_id: 1, user_name: "Alice Smith", email: "alice@example.com", updated_at: "2024-06-15"}
```

This test verifies that:
- Raw column names are correctly transformed (id → user_id, name → user_name)
- The updated_at column (which the snapshot will use for change detection) is properly exposed
- Multiple versions of the same user are preserved (critical for SCD2 to work correctly)
Testing SCD2 Date Range Logic
After the snapshot runs, downstream models typically consume the snapshot output and calculate derived fields like valid_to dates and is_current flags. This is where window functions come into play—and where bugs love to hide.
If you have a model that consumes snapshot data and calculates validity periods:
```sql
-- models/intermediate/int__users_history.sql
select
    user_id,
    user_name,
    valid_from,
    coalesce(
        lead(valid_from) over (partition by user_id order by valid_from),
        '2199-12-31'
    ) as valid_to,
    case
        when lead(valid_from) over (partition by user_id order by valid_from) is null
            then true
        else false
    end as is_current
from {{ ref('snp_users') }}
```

```yaml
unit_tests:
  - name: test_int_users_history_date_ranges
    model: int__users_history
    description: "Valid_to should be the next version's valid_from, or 2199-12-31 for current"
    given:
      - input: ref('snp_users')
        rows:
          - {user_id: 1, user_name: "Alice", valid_from: "2024-01-01"}
          - {user_id: 1, user_name: "Alice Smith", valid_from: "2024-06-15"}
          - {user_id: 2, user_name: "Bob", valid_from: "2024-03-01"}
    expect:
      rows:
        - {user_id: 1, user_name: "Alice", valid_from: "2024-01-01", valid_to: "2024-06-15", is_current: false}
        - {user_id: 1, user_name: "Alice Smith", valid_from: "2024-06-15", valid_to: "2199-12-31", is_current: true}
        - {user_id: 2, user_name: "Bob", valid_from: "2024-03-01", valid_to: "2199-12-31", is_current: true}
```

The test verifies several things at once:
- Date range calculation: Alice’s first record should have valid_to = "2024-06-15" (the next version’s valid_from), while her second record uses the far-future sentinel date
- Current flag logic: Only the most recent version of each user should have is_current: true
- Single-version handling: Bob has only one record, so he’s automatically current with the sentinel valid_to
- Partition isolation: Alice’s records don’t affect Bob’s calculations (each user_id is a separate partition)
This is the kind of logic that’s nearly impossible to verify through code review alone. The window function partitioning, ordering, and frame specifications all interact in ways that only reveal bugs with actual data.
Testing Change Detection
Some teams implement their own change detection logic using row-level hashes, especially when working with sources that don’t have reliable updated_at timestamps. The idea is simple: concatenate all relevant columns, hash the result, and compare hashes between runs to detect changes.
The tricky part? Hash functions are sensitive to column ordering, null handling, and data type casting. A change that seems cosmetic (reordering columns in your hash) can completely break change detection.
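The tests below target a model along these lines (a sketch; to_hex(md5(...)) is one BigQuery option, and dbt_utils.generate_surrogate_key is a common alternative):

```sql
-- models/base/base__crm__users_with_hash.sql (sketch)
select
    id as user_id,
    name as user_name,
    email,
    -- Fixed column order and explicit null handling keep the hash stable
    to_hex(md5(concat(
        coalesce(cast(name as string), ''), '|',
        coalesce(cast(email as string), '')
    ))) as row_hash
from {{ source('crm', 'users') }}
```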
```yaml
unit_tests:
  - name: test_change_detection_hash
    model: base__crm__users_with_hash
    description: "Row hash should change when any tracked column changes"
    given:
      - input: source('crm', 'users')
        rows:
          - {id: 1, name: "Alice", email: "alice@example.com"}
          - {id: 2, name: "Bob", email: "bob@example.com"}
    expect:
      rows:
        - {user_id: 1, row_hash: "abc123"}  # Calculate the actual expected hash
        - {user_id: 2, row_hash: "def456"}
```

To write this test, you’ll need to calculate the expected hash values manually (or run your model once and capture the outputs). The test then becomes a regression safeguard—if someone accidentally modifies the hash logic, the test will catch it.
Pro tip: Consider adding a second test case where you change one column value and verify the hash changes appropriately:
```yaml
  - name: test_change_detection_hash_sensitivity
    model: base__crm__users_with_hash
    description: "Hash should differ when email changes"
    given:
      - input: source('crm', 'users')
        rows:
          - {id: 1, name: "Alice", email: "alice@example.com"}
          - {id: 1, name: "Alice", email: "alice.new@example.com"}
    expect:
      rows:
        - {user_id: 1, row_hash: "abc123"}  # Original hash
        - {user_id: 1, row_hash: "xyz789"}  # Different hash (email changed)
```

Testing Complex SQL Patterns
Some SQL constructs are inherently difficult to reason about through code review. Window functions, complex CASE WHEN chains, and regex patterns all benefit from concrete examples that demonstrate correct behavior.
Window Functions
Window functions are prime candidates for unit testing because their behavior depends on three interacting factors: partitioning (which rows belong together), ordering (sequence within each partition), and framing (which rows are included in the calculation). Get any one of these wrong and your results will be subtly incorrect in ways that are hard to spot.
Consider this model that calculates session-level metrics:
```sql
-- models/intermediate/int__users_sessions.sql
select
    user_id,
    event_timestamp,
    row_number() over (
        partition by user_id order by event_timestamp
    ) as session_event_number,
    first_value(page_path) over (
        partition by user_id order by event_timestamp
    ) as landing_page,
    sum(page_views) over (
        partition by user_id order by event_timestamp rows unbounded preceding
    ) as cumulative_page_views
from {{ ref('base__ga4__events') }}
```

```yaml
unit_tests:
  - name: test_int_users_sessions_window_functions
    model: int__users_sessions
    description: "Window functions should correctly partition and order by user"
    given:
      - input: ref('base__ga4__events')
        rows:
          # User 1's events (out of order in source)
          - {user_id: 1, event_timestamp: "2024-06-01 10:05:00", page_path: "/products", page_views: 2}
          - {user_id: 1, event_timestamp: "2024-06-01 10:00:00", page_path: "/home", page_views: 1}
          - {user_id: 1, event_timestamp: "2024-06-01 10:10:00", page_path: "/cart", page_views: 1}
          # User 2's events
          - {user_id: 2, event_timestamp: "2024-06-01 11:00:00", page_path: "/about", page_views: 3}
    expect:
      rows:
        # User 1 - ordered by timestamp
        - {user_id: 1, event_timestamp: "2024-06-01 10:00:00", session_event_number: 1, landing_page: "/home", cumulative_page_views: 1}
        - {user_id: 1, event_timestamp: "2024-06-01 10:05:00", session_event_number: 2, landing_page: "/home", cumulative_page_views: 3}
        - {user_id: 1, event_timestamp: "2024-06-01 10:10:00", session_event_number: 3, landing_page: "/home", cumulative_page_views: 4}
        # User 2
        - {user_id: 2, event_timestamp: "2024-06-01 11:00:00", session_event_number: 1, landing_page: "/about", cumulative_page_views: 3}
```

Notice how the test data is deliberately designed:
- Out-of-order source rows: User 1’s events are listed out of chronological order in the given block (10:05, 10:00, 10:10). This verifies that the ORDER BY event_timestamp actually works—if you accidentally order by event_id instead, the test would fail.
- Multiple users: Including User 2 confirms that partitioning works correctly. User 1’s events shouldn’t affect User 2’s session_event_number or landing_page.
- Cumulative calculation: The page_views values (1, 2, 1, 3) are chosen to make the running total easy to verify. If the frame clause is wrong (say, missing ROWS UNBOUNDED PRECEDING), the numbers won’t add up.
This single test verifies ROW_NUMBER, FIRST_VALUE, and SUM with a window frame—all in one pass. If any of the window specifications are wrong, at least one expected value will be incorrect.
Complex CASE WHEN Logic
Business logic often lives in CASE WHEN statements. Customer segmentation, pricing tiers, status calculations—these multi-branch conditionals are deceptively simple to write and surprisingly easy to get wrong.
The most common bugs:
- Overlapping conditions: Two branches that can both match (evaluation order matters)
- Missing boundary values: Off-by-one errors at tier thresholds
- Null handling: Forgetting that NULL >= 1000 returns NULL, not FALSE
- Implicit ELSE: Relying on the default ELSE NULL when you meant to handle all cases explicitly
The solution is systematic boundary testing—create test rows that hit every threshold:
```sql
-- models/marts/core/mrt__core__customer_segments.sql
select
    customer_id,
    total_spend,
    case
        when total_spend >= 10000 then 'platinum'
        when total_spend >= 5000 then 'gold'
        when total_spend >= 1000 then 'silver'
        when total_spend > 0 then 'bronze'
        else 'inactive'
    end as segment
from {{ ref('int__customers_summary') }}
```

```yaml
unit_tests:
  - name: test_mrt_core_customer_segments_all_tiers
    model: mrt__core__customer_segments
    description: "Each spend threshold should map to correct segment"
    given:
      - input: ref('int__customers_summary')
        rows:
          - {customer_id: 1, total_spend: 15000}  # Platinum
          - {customer_id: 2, total_spend: 10000}  # Platinum (boundary)
          - {customer_id: 3, total_spend: 9999}   # Gold (just under)
          - {customer_id: 4, total_spend: 5000}   # Gold (boundary)
          - {customer_id: 5, total_spend: 1000}   # Silver (boundary)
          - {customer_id: 6, total_spend: 1}      # Bronze
          - {customer_id: 7, total_spend: 0}      # Inactive
          - {customer_id: 8, total_spend: null}   # Inactive (null case)
    expect:
      rows:
        - {customer_id: 1, segment: "platinum"}
        - {customer_id: 2, segment: "platinum"}
        - {customer_id: 3, segment: "gold"}
        - {customer_id: 4, segment: "gold"}
        - {customer_id: 5, segment: "silver"}
        - {customer_id: 6, segment: "bronze"}
        - {customer_id: 7, segment: "inactive"}
        - {customer_id: 8, segment: "inactive"}
```

The test data is intentionally comprehensive:
- Boundary values: Customers 2, 4, and 5 test exact threshold matches (10000, 5000, 1000). If someone uses > instead of >=, these will fail.
- Just-under values: Customer 3 tests 9999 to ensure gold (not platinum) kicks in
- Edge cases: Customer 6 tests the smallest positive value, customer 7 tests zero
- Null handling: Customer 8 verifies that null spend maps to “inactive” rather than causing an error or unexpected behavior
This approach—testing boundaries, not just typical values—catches the most common bugs in conditional logic.
Regex and String Manipulation
String extraction logic is notoriously fragile. Regex patterns that work for your typical data often fail on edge cases. Unit tests are essential for documenting expected behavior and catching regressions.
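The extraction being tested here might look something like this (a BigQuery sketch; the exact regex is illustrative):

```sql
-- Extract and normalize the email domain (sketch)
select
    customer_id,
    email,
    lower(regexp_extract(email, r'@(.+)$')) as email_domain
from {{ ref('base__crm__customers') }}
```

regexp_extract returns null when the pattern doesn’t match (no @ symbol) or when the input is null, which is exactly the graceful degradation the test below expects.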
```yaml
unit_tests:
  - name: test_mrt_core_customers_email_domain_extraction
    model: mrt__core__customers
    description: "Email domain extraction handles various formats"
    given:
      - input: ref('base__crm__customers')
        rows:
          - {customer_id: 1, email: "alice@example.com"}
          - {customer_id: 2, email: "bob.smith@subdomain.company.co.uk"}
          - {customer_id: 3, email: "CAPS@DOMAIN.COM"}
          - {customer_id: 4, email: "invalid-email"}
          - {customer_id: 5, email: null}
    expect:
      rows:
        - {customer_id: 1, email_domain: "example.com"}
        - {customer_id: 2, email_domain: "subdomain.company.co.uk"}
        - {customer_id: 3, email_domain: "domain.com"}
        - {customer_id: 4, email_domain: null}
        - {customer_id: 5, email_domain: null}
```

Each test row serves a specific purpose:
- Simple case: alice@example.com — baseline behavior
- Complex domain: subdomain.company.co.uk — multiple dots, TLD edge case
- Case handling: CAPS@DOMAIN.COM — should normalize to lowercase
- Invalid input: invalid-email — no @ symbol, should gracefully return null rather than error
- Null input: null — explicit null handling
Without these tests, someone might “simplify” your regex and break the subdomain case, or change the null handling without realizing downstream models depend on it.
Marketing Analytics Testing Patterns
Marketing analytics models often contain the most complex—and highest-stakes—logic in a dbt project. Attribution determines how marketing budget gets allocated. Sessionization affects conversion rate calculations. Funnel analysis drives product decisions. When these models have bugs, the business makes decisions based on faulty data.
The patterns in this section come from real-world implementations. They’re more complex than basic unit tests, but they test logic that genuinely matters.
GA4 Sessionization
Google Analytics 4 sends raw events to BigQuery, but those events lack session-level aggregations. You need to build sessionization logic yourself—grouping events by user, detecting session boundaries (typically 30 minutes of inactivity), and calculating metrics like session duration and event counts.
This logic involves:
- Timestamp arithmetic (using timestamp_micros() to convert GA4’s microsecond timestamps)
- Window functions to calculate time gaps between events
- Session boundary detection based on configurable inactivity thresholds
- Aggregation to create session-level records from event-level data
Let’s test it:
```sql
-- models/intermediate/int__ga4_sessions.sql
with events as (
    select
        user_pseudo_id,
        ga_session_id,
        event_timestamp,
        event_name,
        -- Calculate time since previous event
        timestamp_diff(
            timestamp_micros(event_timestamp),
            lag(timestamp_micros(event_timestamp)) over (
                partition by user_pseudo_id order by event_timestamp
            ),
            minute
        ) as minutes_since_last_event
    from {{ ref('base__ga4__events') }}
),

sessionized as (
    select
        *,
        -- New session if gap > 30 minutes or first event
        case
            when minutes_since_last_event > 30 or minutes_since_last_event is null
                then 1
            else 0
        end as is_new_session
    from events
)

select
    user_pseudo_id,
    ga_session_id,
    concat(user_pseudo_id, '_', ga_session_id) as session_key,
    min(event_timestamp) as session_start,
    max(event_timestamp) as session_end,
    timestamp_diff(
        timestamp_micros(max(event_timestamp)),
        timestamp_micros(min(event_timestamp)),
        second
    ) as session_duration_seconds,
    count(*) as event_count
from sessionized
group by 1, 2, 3
```

```yaml
unit_tests:
  - name: test_int_ga4_sessions_boundaries
    model: int__ga4_sessions
    description: "Sessions should break after 30 minutes of inactivity"
    given:
      - input: ref('base__ga4__events')
        rows:
          # User 1, Session 1: events within 30 min
          - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717200000000000, event_name: "page_view"}
          - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717201800000000, event_name: "scroll"}  # +30 min
          # User 1, Session 2: gap > 30 min
          - {user_pseudo_id: "user_1", ga_session_id: 1002, event_timestamp: 1717207200000000, event_name: "page_view"}  # +90 min from first
          # User 2, Session 1
          - {user_pseudo_id: "user_2", ga_session_id: 2001, event_timestamp: 1717200000000000, event_name: "page_view"}
    expect:
      rows:
        - {session_key: "user_1_1001", session_duration_seconds: 1800, event_count: 2}
        - {session_key: "user_1_1002", session_duration_seconds: 0, event_count: 1}
        - {session_key: "user_2_2001", session_duration_seconds: 0, event_count: 1}
```

The first test verifies core sessionization behavior:
- Session 1001: Two events 30 minutes apart (exactly at the boundary) belong to the same session, resulting in 1800 seconds duration
- Session 1002: A third event 90 minutes after the first triggers a new session (gap > 30 min from session 1001’s last event)
- Session 2001: User 2’s single event creates a session with 0-second duration
The microsecond timestamps (1717200000000000) are GA4’s native format. You’ll need to calculate these values based on your test scenarios.
```yaml
  - name: test_int_ga4_sessions_cross_midnight
    model: int__ga4_sessions
    description: "Sessions spanning midnight should not break artificially"
    given:
      - input: ref('base__ga4__events')
        rows:
          - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717199400000000, event_name: "page_view"}  # 23:50
          - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717200600000000, event_name: "purchase"}   # 00:10 next day
    expect:
      rows:
        - {session_key: "user_1_1001", session_duration_seconds: 1200, event_count: 2}  # 20 minutes
```

The cross-midnight test catches a subtle bug: some implementations accidentally break sessions at date boundaries (because they partition by date instead of by user). A user browsing at 11:50 PM who purchases at 12:10 AM should have a single 20-minute session, not two separate sessions.
Attribution Modeling
Attribution models answer the question: “Which marketing channels deserve credit for this conversion?” The answer determines millions of dollars in ad spend.
Common attribution models:
- First-touch: Credit goes to the channel that first acquired the customer
- Last-touch: Credit goes to the channel immediately before conversion
- Linear: Credit is split equally among all touchpoints
- Time-decay: More credit to touchpoints closer to conversion
Each model uses window functions to identify the relevant touchpoint. Let’s test first-touch and last-touch:
```sql
-- models/marts/marketing/mrt__marketing__customer_attribution.sql
select
    customer_id,
    -- First touch
    first_value(utm_source) over (
        partition by customer_id order by session_start
    ) as first_touch_source,
    first_value(utm_medium) over (
        partition by customer_id order by session_start
    ) as first_touch_medium,
    -- Last touch (before conversion)
    last_value(utm_source) over (
        partition by customer_id order by session_start
        rows between unbounded preceding and unbounded following
    ) as last_touch_source,
    -- Conversion info
    max(case when converted then session_start end) as conversion_timestamp
from {{ ref('int__customers_sessions') }}
where converted = true
    or session_start <= (
        select min(session_start)
        from {{ ref('int__customers_sessions') }} cs2
        where cs2.customer_id = int__customers_sessions.customer_id
            and cs2.converted
    )
group by customer_id
```

```yaml
unit_tests:
  - name: test_mrt_marketing_customer_attribution_first_last_touch
    model: mrt__marketing__customer_attribution
    description: "First touch captures initial source, last touch captures converting source"
    given:
      - input: ref('int__customers_sessions')
        rows:
          # Customer journey: Facebook -> Google -> Direct (conversion)
          - {customer_id: 100, session_start: "2024-06-01 10:00:00", utm_source: "facebook", utm_medium: "paid", converted: false}
          - {customer_id: 100, session_start: "2024-06-05 14:00:00", utm_source: "google", utm_medium: "organic", converted: false}
          - {customer_id: 100, session_start: "2024-06-10 09:00:00", utm_source: "direct", utm_medium: "none", converted: true}
          # Single-touch conversion
          - {customer_id: 101, session_start: "2024-06-02 11:00:00", utm_source: "email", utm_medium: "newsletter", converted: true}
    expect:
      rows:
        - {customer_id: 100, first_touch_source: "facebook", first_touch_medium: "paid", last_touch_source: "direct"}
        - {customer_id: 101, first_touch_source: "email", first_touch_medium: "newsletter", last_touch_source: "email"}
```

This test verifies two scenarios:
Customer 100’s multi-touch journey:
- First encounter via Facebook paid ads (June 1)
- Returns via Google organic search (June 5)
- Converts via direct visit (June 10)
The expected output shows first-touch credit to Facebook (where the journey began) and last-touch credit to Direct (the final touchpoint before conversion). If your FIRST_VALUE or LAST_VALUE window functions have incorrect ordering or framing, this test will catch it.
Customer 101’s single-touch conversion: When there’s only one session before conversion, both first-touch and last-touch should be the same channel. This edge case often breaks when implementations assume at least two touchpoints.
```yaml
  - name: test_mrt_marketing_customer_attribution_no_conversion
    model: mrt__marketing__customer_attribution
    description: "Customers without conversion should not appear"
    given:
      - input: ref('int__customers_sessions')
        rows:
          - {customer_id: 200, session_start: "2024-06-01 10:00:00", utm_source: "facebook", utm_medium: "paid", converted: false}
          - {customer_id: 200, session_start: "2024-06-05 14:00:00", utm_source: "google", utm_medium: "organic", converted: false}
    expect:
      rows: []
```

The no-conversion test is equally important. Attribution only makes sense for customers who actually converted. Customer 200 had sessions but never purchased—they should be excluded from the attribution model entirely. An empty expect block with rows: [] verifies this behavior.
Conversion Funnels
Funnel analysis tracks how users progress through a defined sequence of steps—page view → add to cart → checkout → purchase. At each step, some users drop off. The conversion rate between steps is a key metric for identifying friction points.
The logic involves:
- Counting distinct users at each funnel step
- Calculating step-over-step conversion rates (what percentage of users from step N reach step N+1)
- Handling users who skip steps (directly purchase without add-to-cart)
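A funnel model implementing this might look roughly like the following (a sketch with a hardcoded step list; adapt the steps and the rate definition to your own funnel):

```sql
-- models/marts/marketing/mrt__marketing__conversion_funnel.sql (sketch)
with steps as (
    -- Funnel steps in order
    select 'page_view' as funnel_step, 1 as step_order union all
    select 'add_to_cart', 2 union all
    select 'begin_checkout', 3 union all
    select 'purchase', 4
),

step_counts as (
    select
        s.funnel_step,
        s.step_order,
        count(distinct e.user_id) as user_count
    from steps s
    left join {{ ref('base__ga4__events') }} e
        on e.event_name = s.funnel_step
    group by 1, 2
)

select
    funnel_step,
    user_count,
    -- Share of users retained from the previous step (1.0 for the first step)
    round(
        user_count / coalesce(lag(user_count) over (order by step_order), user_count),
        2
    ) as step_conversion_rate
from step_counts
```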
```yaml
unit_tests:
  - name: test_mrt_marketing_conversion_funnel_dropoff
    model: mrt__marketing__conversion_funnel
    description: "Funnel should calculate conversion rates between steps"
    given:
      - input: ref('base__ga4__events')
        rows:
          # User 1: completes full funnel
          - {user_id: 1, event_name: "page_view", event_date: "2024-06-01"}
          - {user_id: 1, event_name: "add_to_cart", event_date: "2024-06-01"}
          - {user_id: 1, event_name: "begin_checkout", event_date: "2024-06-01"}
          - {user_id: 1, event_name: "purchase", event_date: "2024-06-01"}
          # User 2: drops at checkout
          - {user_id: 2, event_name: "page_view", event_date: "2024-06-01"}
          - {user_id: 2, event_name: "add_to_cart", event_date: "2024-06-01"}
          - {user_id: 2, event_name: "begin_checkout", event_date: "2024-06-01"}
          # User 3: drops at cart
          - {user_id: 3, event_name: "page_view", event_date: "2024-06-01"}
          - {user_id: 3, event_name: "add_to_cart", event_date: "2024-06-01"}
          # User 4: only views
          - {user_id: 4, event_name: "page_view", event_date: "2024-06-01"}
    expect:
      rows:
        - {funnel_step: "page_view", user_count: 4, step_conversion_rate: 1.0}
        - {funnel_step: "add_to_cart", user_count: 3, step_conversion_rate: 0.75}
        - {funnel_step: "begin_checkout", user_count: 2, step_conversion_rate: 0.67}
        - {funnel_step: "purchase", user_count: 1, step_conversion_rate: 0.5}
```

Let’s trace through the test data:
| User | page_view | add_to_cart | begin_checkout | purchase |
|---|---|---|---|---|
| 1 | ✓ | ✓ | ✓ | ✓ |
| 2 | ✓ | ✓ | ✓ | |
| 3 | ✓ | ✓ | | |
| 4 | ✓ | | | |
Calculating conversion rates:
- page_view: 4 users, 100% (starting point)
- add_to_cart: 3 users, 75% of page_view users (3/4)
- begin_checkout: 2 users, 67% of add_to_cart users (2/3)
- purchase: 1 user, 50% of checkout users (1/2)
The test verifies both user counts and conversion rates. If your model calculates conversion rate incorrectly (dividing by total users instead of previous step users, for example), this test will fail.
Edge Case Testing Strategies
Production data is messy. Nulls appear where you don’t expect them. Tables are occasionally empty. Dates hit boundary conditions. The models that survive in production are the ones designed with these edge cases in mind.
Unit tests are your tool for documenting and enforcing edge case behavior. Instead of hoping your model handles nulls gracefully, you prove it with a test.
Null Handling
SQL’s null semantics are notoriously confusing. NULL + 100 = NULL. NULL = NULL is not TRUE. SUM() ignores nulls, and AVG() leaves null rows out of its denominator, which is rarely the average people expect. These behaviors lead to subtle bugs that only manifest when null values appear in production.
Test that your aggregations handle nulls correctly:
```yaml
unit_tests:
  - name: test_int_customers_revenue_null_handling
    model: int__customers_revenue
    description: "Null order values should be treated as zero in sum"
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {customer_id: 1, order_value: 100}
          - {customer_id: 1, order_value: null}
          - {customer_id: 1, order_value: 50}
          - {customer_id: 2, order_value: null}  # All nulls
    expect:
      rows:
        - {customer_id: 1, total_revenue: 150}
        - {customer_id: 2, total_revenue: 0}  # Or null, depending on your logic
```

This test clarifies expected behavior:
- Customer 1 has three orders: 100, null, 50. Does SUM() return 150 (ignoring null) or null (null propagation)?
- Customer 2 has only null orders. Does SUM() return 0, null, or cause an error?
The comment “Or null, depending on your logic” is intentional—your model might use COALESCE(SUM(order_value), 0) to ensure a zero instead of null. The test documents whichever behavior you’ve chosen.
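The aggregation itself can be as simple as the sketch below, assuming you want zero rather than null for customers with no usable order values:

```sql
-- models/intermediate/int__customers_revenue.sql (sketch)
select
    customer_id,
    -- SUM() skips nulls; COALESCE turns an all-null group into 0
    coalesce(sum(order_value), 0) as total_revenue
from {{ ref('base__shopify__orders') }}
group by customer_id
```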
Empty Tables
What happens when an upstream table is empty? Maybe it’s a new deployment with no historical data. Maybe a source system failed and delivered zero records. Your model should handle this gracefully—either returning zero rows or returning a row with default values.
The challenge is that format: dict with rows: [] doesn’t work—dbt can’t infer the column types from an empty list. The solution is format: sql with a WHERE false clause:
```yaml
unit_tests:
  - name: test_mrt_finance_daily_revenue_empty_orders
    model: mrt__finance__daily_revenue
    description: "Model should return zero revenue for days with no orders"
    given:
      - input: ref('base__shopify__orders')
        format: sql
        rows: |
          select
              cast(null as string) as order_id,
              cast(null as date) as order_date,
              cast(null as float64) as order_value
          where false
      - input: ref('int__dates')
        rows:
          - {date_key: "2024-06-01"}
    expect:
      rows:
        - {date_key: "2024-06-01", daily_revenue: 0, order_count: 0}
```

The SQL format uses explicit casting (cast(null as string)) to define the column schema, then WHERE false ensures zero rows are returned. This gives dbt the type information it needs without providing any data.
The test verifies that when orders are empty but dates exist, the model still produces output with zero revenue—rather than returning nothing or throwing an error. This is especially important for reporting models that need to show all dates even when no activity occurred.
Date Boundaries
Date logic is a minefield. Fiscal years that don’t align with calendar years. Leap years. Week number calculations that vary by country. Time zones that shift during daylight saving transitions. These edge cases cause real bugs in production.
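For example, a fiscal-year calculation with an April 1 start (the convention assumed in the test below) might be sketched like this; the arithmetic is illustrative:

```sql
-- models/intermediate/int__fiscal_calendar.sql (sketch, FY starts April 1)
select
    calendar_date,
    -- April onward belongs to the fiscal year that ends the following March
    extract(year from calendar_date)
        + if(extract(month from calendar_date) >= 4, 1, 0) as fiscal_year,
    -- Shift months so April = 0, then bucket into quarters 1-4
    div(mod(extract(month from calendar_date) + 8, 12), 3) + 1 as fiscal_quarter
from {{ ref('int__dates') }}
```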
A fiscal calendar model is a perfect candidate for boundary testing:
```yaml
unit_tests:
  - name: test_int_fiscal_calendar_boundaries
    model: int__fiscal_calendar
    description: "Fiscal year should handle year-end correctly (FY starts April 1)"
    given:
      - input: ref('int__dates')
        rows:
          - {calendar_date: "2024-03-31"}  # Last day of FY2024
          - {calendar_date: "2024-04-01"}  # First day of FY2025
          - {calendar_date: "2024-02-29"}  # Leap year
          - {calendar_date: "2024-12-31"}  # Calendar year end
    expect:
      rows:
        - {calendar_date: "2024-03-31", fiscal_year: 2024, fiscal_quarter: 4}
        - {calendar_date: "2024-04-01", fiscal_year: 2025, fiscal_quarter: 1}
        - {calendar_date: "2024-02-29", fiscal_year: 2024, fiscal_quarter: 4}
        - {calendar_date: "2024-12-31", fiscal_year: 2025, fiscal_quarter: 3}
```

This test assumes a fiscal year starting April 1 (common in many organizations). Let’s trace through:
- March 31, 2024: Last day of FY2024 Q4 (fiscal year ends March 31)
- April 1, 2024: First day of FY2025 Q1 (new fiscal year begins)
- February 29, 2024: Leap year date—verifies the model handles Feb 29 correctly
- December 31, 2024: Calendar year end, but mid-fiscal-year (FY2025 Q3)
If someone hardcodes a calendar year assumption or forgets to handle leap years, at least one of these test cases will fail.
Testing Macros and Reusable Logic
Macros are the functions of the dbt world—reusable logic that you call from multiple models. A discount calculation macro might be used across dozens of models. A date formatting macro might be called hundreds of times. When macro logic is wrong, the bug propagates everywhere.
The challenge: dbt unit tests target models, not macros directly. You can’t write model: my_macro in your test YAML.
The Ephemeral Model Pattern
The solution is an ephemeral wrapper model that exists solely to test the macro. Ephemeral models don’t create tables in your warehouse—they’re compiled as CTEs into downstream models. This makes them perfect for testing: they add no storage cost and exist purely for the unit test.
```sql
-- models/test_helpers/test_calculate_discount.sql
{{ config(materialized='ephemeral') }}

select
    order_value,
    customer_tier,
    {{ calculate_discount('order_value', 'customer_tier') }} as discount_amount
from {{ ref('test_source') }}
```

```yaml
unit_tests:
  - name: test_calculate_discount_macro
    model: test_calculate_discount
    given:
      - input: ref('test_source')
        format: sql
        rows: |
          select 100.00 as order_value, 'gold' as customer_tier
          union all
          select 100.00 as order_value, 'silver' as customer_tier
          union all
          select 100.00 as order_value, 'bronze' as customer_tier
    expect:
      rows:
        - {order_value: 100.00, customer_tier: "gold", discount_amount: 20.00}
        - {order_value: 100.00, customer_tier: "silver", discount_amount: 10.00}
        - {order_value: 100.00, customer_tier: "bronze", discount_amount: 5.00}
```

The wrapper model does three things:
- References a test source (also ephemeral) to provide input data
- Calls the macro being tested (calculate_discount)
- Outputs the result for comparison
The test then verifies that calculate_discount returns:
- 20% off ($20) for gold customers
- 10% off ($10) for silver customers
- 5% off ($5) for bronze customers
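For reference, a calculate_discount macro consistent with those expected values might be sketched like this (the tier percentages are illustrative):

```sql
-- macros/calculate_discount.sql (sketch)
{% macro calculate_discount(order_value_col, customer_tier_col) %}
    case {{ customer_tier_col }}
        when 'gold' then {{ order_value_col }} * 0.20
        when 'silver' then {{ order_value_col }} * 0.10
        when 'bronze' then {{ order_value_col }} * 0.05
        else 0
    end
{% endmacro %}
```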
If someone modifies the macro logic, this test will catch unintended changes. Store these test helper models in a dedicated test_helpers/ directory and exclude them from production builds.
Organizing Tests at Scale
As your dbt project grows, so does your test suite. A mature project might have hundreds of unit tests across dozens of models. Without organization, running tests becomes slow and selecting relevant tests becomes tedious.
Using Tags Strategically
Tags let you categorize tests and run subsets selectively. Consider a tagging strategy based on:
- Criticality: critical for tests that block deployment vs. nice-to-have for extras
- Domain: finance, marketing, product to match your business domains
- Test type: regression, edge-case, smoke-test
```yaml
unit_tests:
  - name: test_mrt_finance_orders_revenue_calculation
    model: mrt__finance__orders
    config:
      tags: ["critical", "finance", "regression"]
    # ...

  - name: test_mrt_finance_orders_empty_cart
    model: mrt__finance__orders
    config:
      tags: ["edge-case"]
    # ...
```

Now you can run subsets:
```bash
# Critical tests only (for quick CI feedback on every PR)
dbt test --select tag:critical,test_type:unit

# Full regression suite (nightly or before releases)
dbt test --select tag:regression,test_type:unit

# Domain-specific tests (when working on finance models)
dbt test --select tag:finance,test_type:unit
```

A common CI strategy:
- PR checks: Run tag:critical tests only (fast feedback, 1-2 minutes)
- Merge to main: Run all unit tests (comprehensive validation, 5-10 minutes)
- Production deployment: Run data tests (validate actual data, separate from unit tests)
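In GitHub Actions, for example, that split might look roughly like this (a schematic sketch; warehouse credentials and profile setup are omitted):

```yaml
# .github/workflows/dbt-tests.yml (sketch)
on:
  pull_request:
  push:
    branches: [main]

jobs:
  unit_tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install dbt-bigquery
      - run: dbt run --select "config.materialized:incremental" --empty
      # Fast feedback on PRs, full suite on main
      - run: dbt test --select tag:critical,test_type:unit
        if: github.event_name == 'pull_request'
      - run: dbt test --select "test_type:unit"
        if: github.event_name == 'push'
```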
Pattern Library Quick Reference
| Pattern | Key Technique | Example |
|---|---|---|
| Incremental full-refresh | is_incremental: false | Section 1 |
| Incremental merge | this + is_incremental: true | Section 1 |
| SCD2 date ranges | Test derived models | Section 2 |
| Window functions | Ordered input rows | Section 3 |
| All CASE branches | Boundary values | Section 3 |
| Empty tables | format: sql with WHERE false | Section 5 |
| Null handling | Explicit null rows | Section 5 |
| Macro testing | Ephemeral wrapper model | Section 6 |
Key Takeaways
Unit testing dbt models isn’t about achieving 100% coverage—it’s about testing the logic that matters most. Focus your effort on:
- Incremental models: Test both full-refresh and incremental paths. The expect block shows what gets inserted, not the final table state.
- Window functions: Design test data that validates partitioning, ordering, and framing. Use out-of-order source rows to verify sorting works correctly.
- Business logic: Test boundary values in CASE WHEN statements. Test null handling explicitly. Document expected behavior through tests.
- Marketing analytics: Attribution, sessionization, and funnels are high-stakes models where bugs directly impact business decisions. Invest in comprehensive tests.
- Edge cases: Empty tables, null values, and date boundaries cause real production issues. Unit tests prove your model handles them gracefully.
The patterns in this article are starting points. Adapt them to your specific models and business logic. The best test is one that catches a bug before it reaches production.
What’s Next
You now have a library of patterns for unit testing real-world dbt models. But unit tests are just one tool in the testing toolkit. They validate transformation logic with mocked data—but what about validating the actual data in your warehouse?
Part 3 zooms out to the strategic level: when should you use unit tests versus data tests? What about external packages like dbt-expectations and Elementary? We’ll build a decision framework to help you choose the right testing approach for each scenario—and avoid both under-testing and over-testing.