Adrienne Vermorel
Unit Testing dbt Models: Real-World Examples and Patterns
This is Part 2 of a 3-part series on dbt unit testing. Building on the syntax from Part 1, this article covers practical patterns for incremental models, snapshots, complex SQL, and marketing analytics.
Part 1 covered the fundamentals: YAML syntax, BigQuery quirks, and CI/CD setup. Now let’s apply that knowledge to the models that actually keep you up at night: incremental logic, window functions, attribution models, and all the edge cases that slip through code review.
This article is a pattern library. Each section tackles a specific scenario with copy-paste-ready examples you can adapt to your own models. But more importantly, it explains why each pattern matters and how to think about designing these tests. Because copy-pasting YAML without understanding what you’re testing is how you end up with a green CI pipeline and broken dashboards.
Testing Incremental Models
Incremental models are the most common source of subtle bugs in dbt projects. Why? Because they have two completely different code paths—one that runs on a full refresh, another that runs during normal incremental loads—and teams often test only one of them (usually the full refresh path, since that’s what dbt run does during development).
The typical failure mode looks like this: You develop your incremental model, it works perfectly in dev. You deploy to production, run a full refresh, everything looks great. Three months later, someone notices duplicates appearing in downstream reports. The incremental logic had a bug that only manifested after months of daily runs, when a subtle edge case in your WHERE clause finally hit production data.
Unit tests let you catch these bugs before deployment by simulating both execution paths with controlled test data.
The Dual-Mode Testing Approach
Every incremental model needs at least two unit tests: one for full refresh mode, one for incremental mode. This isn’t optional—if you’re only testing one mode, you’re leaving half your logic untested.
```sql
-- models/intermediate/int__events_processed.sql
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    user_id,
    event_type,
    event_timestamp,
    processed_at
from {{ ref('base__ga4__events') }}

{% if is_incremental() %}
where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```

This model has two behaviors:
- Full refresh: Process all events from the source table
- Incremental: Only process events newer than the most recent event already in the target table
The {% if is_incremental() %} block is where bugs hide. Common mistakes include off-by-one errors (using >= vs >), timezone mismatches, and incorrect column references. Let’s write tests that would catch these issues.
Testing both modes:
```yaml
unit_tests:
  # Test 1: Full refresh behavior
  - name: test_int_events_processed_full_refresh
    model: int__events_processed
    description: "On full refresh, all source rows should be processed"
    overrides:
      macros:
        is_incremental: false
    given:
      - input: ref('base__ga4__events')
        rows:
          - {event_id: 1, user_id: 100, event_type: "click", event_timestamp: "2024-06-01 10:00:00"}
          - {event_id: 2, user_id: 100, event_type: "purchase", event_timestamp: "2024-06-02 11:00:00"}
          - {event_id: 3, user_id: 101, event_type: "click", event_timestamp: "2024-06-03 12:00:00"}
    expect:
      rows:
        - {event_id: 1, user_id: 100, event_type: "click"}
        - {event_id: 2, user_id: 100, event_type: "purchase"}
        - {event_id: 3, user_id: 101, event_type: "click"}

  # Test 2: Incremental behavior
  - name: test_int_events_processed_incremental
    model: int__events_processed
    description: "On incremental run, only new events should be inserted"
    overrides:
      macros:
        is_incremental: true
    given:
      - input: ref('base__ga4__events')
        rows:
          - {event_id: 1, user_id: 100, event_type: "click", event_timestamp: "2024-06-01 10:00:00"}
          - {event_id: 2, user_id: 100, event_type: "purchase", event_timestamp: "2024-06-02 11:00:00"}
          - {event_id: 3, user_id: 101, event_type: "click", event_timestamp: "2024-06-03 12:00:00"}
      - input: this  # Current state of the target table
        rows:
          - {event_id: 1, user_id: 100, event_type: "click", event_timestamp: "2024-06-01 10:00:00"}
    expect:
      # Only rows NEWER than max(event_timestamp) in `this`
      rows:
        - {event_id: 2, user_id: 100, event_type: "purchase"}
        - {event_id: 3, user_id: 101, event_type: "click"}
```

Let’s break down what’s happening in these tests:
Test 1 (Full Refresh):
- We set is_incremental: false in the overrides, simulating a dbt run --full-refresh
- We provide three events in the source table
- We expect all three events to appear in the output—no filtering happens
Test 2 (Incremental):
- We set is_incremental: true, simulating a normal incremental run
- We provide the same three source events, plus we mock the current state of the target table using input: this
- The this table contains one event with timestamp 2024-06-01 10:00:00
- Our model’s WHERE clause filters to events after that timestamp
- Only events 2 and 3 (with timestamps on June 2 and 3) should be processed
Critical insight: The expect block represents what gets inserted or merged, not the final table state. In the incremental test above, event_id 1 already exists in this, so only events 2 and 3 appear in the expected output. This trips up many developers who expect to see the merged result—but unit tests validate the transformation logic, not the merge operation itself.
Testing Merge Logic
The merge_update_columns config adds another layer of complexity. When a row with an existing unique_key arrives, BigQuery will update only the specified columns—leaving others unchanged. This is useful for slowly changing dimension patterns, but it’s also a common source of confusion.
Consider this scenario: You have a users table where you want to update the email when it changes, but preserve the original created_at timestamp. The merge logic handles this, but how do you verify it works correctly?
```sql
-- models/intermediate/int__users_current.sql
{{ config(
    materialized='incremental',
    unique_key='user_id',
    merge_update_columns=['email', 'updated_at']
) }}

select
    user_id,
    email,
    created_at,
    updated_at
from {{ ref('base__crm__users') }}

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

```yaml
unit_tests:
  - name: test_int_users_current_merge_update
    model: int__users_current
    description: "Existing users should have email updated, not created_at"
    overrides:
      macros:
        is_incremental: true
    given:
      - input: ref('base__crm__users')
        rows:
          - {user_id: 1, email: "new@example.com", created_at: "2024-06-15", updated_at: "2024-06-15"}
      - input: this
        rows:
          - {user_id: 1, email: "old@example.com", created_at: "2024-01-01", updated_at: "2024-01-01"}
    expect:
      rows:
        - {user_id: 1, email: "new@example.com", created_at: "2024-06-15", updated_at: "2024-06-15"}
```

Important caveat: This test validates that your transformation logic produces the correct output row. It doesn’t actually test the merge behavior—that’s BigQuery’s responsibility. What you’re testing here is: “Given this source data and this existing state, does my model produce the right row to be merged?”
The expected output shows created_at: "2024-06-15" because that’s what the source data contains. In practice, the actual merge would preserve the original created_at: "2024-01-01" from the this table. If you need to test more complex merge scenarios, consider using dbt-audit-helper to compare actual results after a test run.
Testing Late-Arriving Data
Real-world data pipelines rarely receive events in perfect chronological order. Mobile apps batch events during offline periods. Third-party APIs have processing delays. CDC systems can replay historical changes. Your incremental logic needs to handle these “late arrivals” gracefully.
A common pattern uses a lookback window: instead of only processing events newer than max(event_timestamp), you also include events that arrived recently (based on a _loaded_at column) even if their event timestamp is older. This ensures you don’t miss data that was delayed in transit.
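For context, the test below assumes a lookback-style filter roughly like this (a sketch; the lookback_hours var and the _loaded_at column are illustrative and should match whatever your model actually uses):

```sql
{% if is_incremental() %}
where
    -- Normal case: events newer than anything already processed
    event_timestamp > (select max(event_timestamp) from {{ this }})
    -- Late arrivals: anything loaded recently, even if its event_timestamp
    -- falls before the current high-water mark
    or _loaded_at >= timestamp_sub(
        current_timestamp(),
        interval {{ var('lookback_hours', 24) }} hour
    )
{% endif %}
```

Because the model declares unique_key='event_id', any rows the lookback picks up a second time are merged rather than duplicated.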
```yaml
unit_tests:
  - name: test_int_events_processed_late_arriving
    model: int__events_processed
    description: "Late-arriving events within lookback window should be captured"
    overrides:
      macros:
        is_incremental: true
      vars:
        lookback_hours: 24
    given:
      - input: ref('base__ga4__events')
        rows:
          # Event arrived late but timestamp is old
          - {event_id: 4, event_timestamp: "2024-06-01 08:00:00", _loaded_at: "2024-06-02 10:00:00"}
          # Normal new event
          - {event_id: 5, event_timestamp: "2024-06-02 09:00:00", _loaded_at: "2024-06-02 09:05:00"}
      - input: this
        rows:
          - {event_id: 1, event_timestamp: "2024-06-01 10:00:00"}
    expect:
      rows:
        - {event_id: 4, event_timestamp: "2024-06-01 08:00:00"}
        - {event_id: 5, event_timestamp: "2024-06-02 09:00:00"}
```

In this test:
- Event 4 has an old timestamp (June 1, 08:00) but a recent load time (June 2, 10:00)—it arrived late
- Event 5 is a normal new event with both recent timestamp and load time
- The this table shows we’ve already processed up to June 1, 10:00
- A naive WHERE event_timestamp > max(event_timestamp) would miss event 4
- The lookback window (controlled by the lookback_hours var) ensures we catch it
This test verifies that your late-arrival handling works correctly. Without it, you might not notice missing events until a downstream analyst asks why their numbers don’t match the source system.
Prerequisite: Before running incremental unit tests, build your incremental models with dbt run --select "config.materialized:incremental" --empty to ensure the schema exists. Unit tests need to introspect the table structure, and they’ll fail with “Not able to get columns for unit test” if the target table doesn’t exist.
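In practice, the sequence looks something like this (the unit-test selector is the same one used later in this article):

```bash
# Build empty incremental tables so unit tests can introspect their schemas
dbt run --select "config.materialized:incremental" --empty

# Then run the unit tests
dbt test --select "test_type:unit"
```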
Testing Snapshot/SCD Type 2 Logic
Snapshots (Slowly Changing Dimension Type 2) track historical changes to your data. When a customer’s email changes, instead of overwriting the old value, dbt creates a new row with validity timestamps—preserving the complete change history.
Here’s the challenge: native dbt unit tests don’t directly support snapshot resources. You can’t write a unit test with model: my_snapshot because snapshots aren’t models—they’re a separate resource type with their own execution path.
This limitation makes sense when you think about it. Snapshots detect changes over time, comparing the current source state to the previous snapshot state. Unit tests run in isolation with mocked data—there’s no “previous state” to compare against.
So how do you test snapshot-related logic? Three strategies:
- Test the models that feed snapshots — ensure your staging layer produces correct data for the snapshot to consume
- Test the models that consume snapshot output — validate your SCD2 date range calculations and is_current flags
- Test change detection logic — if you use hash-based change detection, verify the hashing works correctly
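For context, the examples in this section assume a timestamp-strategy snapshot roughly like the one below (the target schema and config values are illustrative):

```sql
-- snapshots/snp_users.sql (illustrative)
{% snapshot snp_users %}

{{
    config(
        target_schema='snapshots',
        unique_key='user_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

select * from {{ ref('base__crm__users') }}

{% endsnapshot %}
```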
Testing Pre-Snapshot Base Models
The transformation that prepares data for a snapshot is just as important as the snapshot itself. If your staging model has bugs, the snapshot will faithfully preserve those bugs forever.
```yaml
unit_tests:
  - name: test_base_crm_users_snapshot_ready
    model: base__crm__users
    description: "Base model should correctly prepare data for snapshot"
    given:
      - input: source('crm', 'users')
        rows:
          - {id: 1, name: "Alice", email: "alice@example.com", _updated_at: "2024-06-01"}
          - {id: 1, name: "Alice Smith", email: "alice@example.com", _updated_at: "2024-06-15"}
    expect:
      rows:
        - {user_id: 1, user_name: "Alice", email: "alice@example.com", updated_at: "2024-06-01"}
        - {user_id: 1, user_name: "Alice Smith", email: "alice@example.com", updated_at: "2024-06-15"}
```

This test verifies that:
- Raw column names are correctly transformed (id → user_id, name → user_name)
- The updated_at column (which the snapshot will use for change detection) is properly exposed
- Multiple versions of the same user are preserved (critical for SCD2 to work correctly)
Testing SCD2 Date Range Logic
After the snapshot runs, downstream models typically consume the snapshot output and calculate derived fields like valid_to dates and is_current flags. This is where window functions come into play—and where bugs love to hide.
If you have a model that consumes snapshot data and calculates validity periods:
```sql
-- models/intermediate/int__users_history.sql
select
    user_id,
    user_name,
    valid_from,
    coalesce(
        lead(valid_from) over (partition by user_id order by valid_from),
        '2199-12-31'
    ) as valid_to,
    case
        when lead(valid_from) over (partition by user_id order by valid_from) is null
            then true
        else false
    end as is_current
from {{ ref('snp_users') }}
```

```yaml
unit_tests:
  - name: test_int_users_history_date_ranges
    model: int__users_history
    description: "Valid_to should be the next version's valid_from, or 2199-12-31 for current"
    given:
      - input: ref('snp_users')
        rows:
          - {user_id: 1, user_name: "Alice", valid_from: "2024-01-01"}
          - {user_id: 1, user_name: "Alice Smith", valid_from: "2024-06-15"}
          - {user_id: 2, user_name: "Bob", valid_from: "2024-03-01"}
    expect:
      rows:
        - {user_id: 1, user_name: "Alice", valid_from: "2024-01-01", valid_to: "2024-06-15", is_current: false}
        - {user_id: 1, user_name: "Alice Smith", valid_from: "2024-06-15", valid_to: "2199-12-31", is_current: true}
        - {user_id: 2, user_name: "Bob", valid_from: "2024-03-01", valid_to: "2199-12-31", is_current: true}
```

The test verifies several things at once:
- Date range calculation: Alice’s first record should have valid_to = "2024-06-15" (the next version’s valid_from), while her second record uses the far-future sentinel date
- Current flag logic: Only the most recent version of each user should have is_current: true
- Single-version handling: Bob has only one record, so he’s automatically current with the sentinel valid_to
- Partition isolation: Alice’s records don’t affect Bob’s calculations (each user_id is a separate partition)
This is the kind of logic that’s nearly impossible to verify through code review alone. The window function partitioning, ordering, and frame specifications all interact in ways that only reveal bugs with actual data.
Testing Change Detection
Some teams implement their own change detection logic using row-level hashes, especially when working with sources that don’t have reliable updated_at timestamps. The idea is simple: concatenate all relevant columns, hash the result, and compare hashes between runs to detect changes.
The tricky part? Hash functions are sensitive to column ordering, null handling, and data type casting. A change that seems cosmetic (reordering columns in your hash) can completely break change detection.
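The tests below target a model along these lines (a sketch; to_hex(md5(...)) is one BigQuery option, and dbt_utils.generate_surrogate_key is a common alternative):

```sql
-- models/base/base__crm__users_with_hash.sql (sketch)
select
    id as user_id,
    name as user_name,
    email,
    -- Fixed column order and explicit null handling keep the hash stable
    to_hex(md5(concat(
        coalesce(cast(name as string), ''), '|',
        coalesce(cast(email as string), '')
    ))) as row_hash
from {{ source('crm', 'users') }}
```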
```yaml
unit_tests:
  - name: test_change_detection_hash
    model: base__crm__users_with_hash
    description: "Row hash should change when any tracked column changes"
    given:
      - input: source('crm', 'users')
        rows:
          - {id: 1, name: "Alice", email: "alice@example.com"}
          - {id: 2, name: "Bob", email: "bob@example.com"}
    expect:
      rows:
        - {user_id: 1, row_hash: "abc123"}  # Calculate the actual expected hash
        - {user_id: 2, row_hash: "def456"}
```

To write this test, you’ll need to calculate the expected hash values manually (or run your model once and capture the outputs). The test then becomes a regression safeguard—if someone accidentally modifies the hash logic, the test will catch it.
Pro tip: Consider adding a second test case where you change one column value and verify the hash changes appropriately:
```yaml
  - name: test_change_detection_hash_sensitivity
    model: base__crm__users_with_hash
    description: "Hash should differ when email changes"
    given:
      - input: source('crm', 'users')
        rows:
          - {id: 1, name: "Alice", email: "alice@example.com"}
          - {id: 1, name: "Alice", email: "alice.new@example.com"}
    expect:
      rows:
        - {user_id: 1, row_hash: "abc123"}  # Original hash
        - {user_id: 1, row_hash: "xyz789"}  # Different hash (email changed)
```

Testing Complex SQL Patterns
Some SQL constructs are inherently difficult to reason about through code review. Window functions, complex CASE WHEN chains, and regex patterns all benefit from concrete examples that demonstrate correct behavior.
Window Functions
Window functions are prime candidates for unit testing because their behavior depends on three interacting factors: partitioning (which rows belong together), ordering (sequence within each partition), and framing (which rows are included in the calculation). Get any one of these wrong and your results will be subtly incorrect in ways that are hard to spot.
Consider this model that calculates session-level metrics:
```sql
-- models/intermediate/int__users_sessions.sql
select
    user_id,
    event_timestamp,
    row_number() over (
        partition by user_id order by event_timestamp
    ) as session_event_number,
    first_value(page_path) over (
        partition by user_id order by event_timestamp
    ) as landing_page,
    sum(page_views) over (
        partition by user_id order by event_timestamp rows unbounded preceding
    ) as cumulative_page_views
from {{ ref('base__ga4__events') }}
```

```yaml
unit_tests:
  - name: test_int_users_sessions_window_functions
    model: int__users_sessions
    description: "Window functions should correctly partition and order by user"
    given:
      - input: ref('base__ga4__events')
        rows:
          # User 1's events (out of order in source)
          - {user_id: 1, event_timestamp: "2024-06-01 10:05:00", page_path: "/products", page_views: 2}
          - {user_id: 1, event_timestamp: "2024-06-01 10:00:00", page_path: "/home", page_views: 1}
          - {user_id: 1, event_timestamp: "2024-06-01 10:10:00", page_path: "/cart", page_views: 1}
          # User 2's events
          - {user_id: 2, event_timestamp: "2024-06-01 11:00:00", page_path: "/about", page_views: 3}
    expect:
      rows:
        # User 1 - ordered by timestamp
        - {user_id: 1, event_timestamp: "2024-06-01 10:00:00", session_event_number: 1, landing_page: "/home", cumulative_page_views: 1}
        - {user_id: 1, event_timestamp: "2024-06-01 10:05:00", session_event_number: 2, landing_page: "/home", cumulative_page_views: 3}
        - {user_id: 1, event_timestamp: "2024-06-01 10:10:00", session_event_number: 3, landing_page: "/home", cumulative_page_views: 4}
        # User 2
        - {user_id: 2, event_timestamp: "2024-06-01 11:00:00", session_event_number: 1, landing_page: "/about", cumulative_page_views: 3}
```

Notice how the test data is deliberately designed:
- Out-of-order source rows: User 1’s events are listed out of chronological order in the given block (10:05, 10:00, 10:10). This verifies that the ORDER BY event_timestamp actually works—if you accidentally order by event_id instead, the test would fail.
- Multiple users: Including User 2 confirms that partitioning works correctly. User 1’s events shouldn’t affect User 2’s session_event_number or landing_page.
- Cumulative calculation: The page_views values (1, 2, 1, 3) are chosen to make the running total easy to verify. If the frame clause is wrong (say, missing ROWS UNBOUNDED PRECEDING), the numbers won’t add up.
This single test verifies ROW_NUMBER, FIRST_VALUE, and SUM with a window frame—all in one pass. If any of the window specifications are wrong, at least one expected value will be incorrect.
Complex CASE WHEN Logic
Business logic often lives in CASE WHEN statements. Customer segmentation, pricing tiers, status calculations—these multi-branch conditionals are deceptively simple to write and surprisingly easy to get wrong.
The most common bugs:
- Overlapping conditions: Two branches that can both match (evaluation order matters)
- Missing boundary values: Off-by-one errors at tier thresholds
- Null handling: Forgetting that NULL >= 1000 returns NULL, not FALSE
- Implicit ELSE: Relying on the default ELSE NULL when you meant to handle all cases explicitly
The solution is systematic boundary testing—create test rows that hit every threshold:
```sql
-- models/marts/core/mrt__core__customer_segments.sql
select
    customer_id,
    total_spend,
    case
        when total_spend >= 10000 then 'platinum'
        when total_spend >= 5000 then 'gold'
        when total_spend >= 1000 then 'silver'
        when total_spend > 0 then 'bronze'
        else 'inactive'
    end as segment
from {{ ref('int__customers_summary') }}
```

```yaml
unit_tests:
  - name: test_mrt_core_customer_segments_all_tiers
    model: mrt__core__customer_segments
    description: "Each spend threshold should map to correct segment"
    given:
      - input: ref('int__customers_summary')
        rows:
          - {customer_id: 1, total_spend: 15000}  # Platinum
          - {customer_id: 2, total_spend: 10000}  # Platinum (boundary)
          - {customer_id: 3, total_spend: 9999}   # Gold (just under)
          - {customer_id: 4, total_spend: 5000}   # Gold (boundary)
          - {customer_id: 5, total_spend: 1000}   # Silver (boundary)
          - {customer_id: 6, total_spend: 1}      # Bronze
          - {customer_id: 7, total_spend: 0}      # Inactive
          - {customer_id: 8, total_spend: null}   # Inactive (null case)
    expect:
      rows:
        - {customer_id: 1, segment: "platinum"}
        - {customer_id: 2, segment: "platinum"}
        - {customer_id: 3, segment: "gold"}
        - {customer_id: 4, segment: "gold"}
        - {customer_id: 5, segment: "silver"}
        - {customer_id: 6, segment: "bronze"}
        - {customer_id: 7, segment: "inactive"}
        - {customer_id: 8, segment: "inactive"}
```

The test data is intentionally comprehensive:
- Boundary values: Customers 2, 4, and 5 test exact threshold matches (10000, 5000, 1000). If someone uses > instead of >=, these will fail.
- Just-under values: Customer 3 tests 9999 to ensure gold (not platinum) kicks in
- Edge cases: Customer 6 tests the smallest positive value, customer 7 tests zero
- Null handling: Customer 8 verifies that null spend maps to “inactive” rather than causing an error or unexpected behavior
This approach—testing boundaries, not just typical values—catches the most common bugs in conditional logic.
Regex and String Manipulation
String extraction logic is notoriously fragile. Regex patterns that work for your typical data often fail on edge cases. Unit tests are essential for documenting expected behavior and catching regressions.
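The extraction being tested here might look something like this (a BigQuery sketch; the exact regex is illustrative):

```sql
-- Extract and normalize the email domain (sketch)
select
    customer_id,
    email,
    lower(regexp_extract(email, r'@(.+)$')) as email_domain
from {{ ref('base__crm__customers') }}
```

regexp_extract returns null when the pattern doesn’t match (no @ symbol) or when the input is null, which is exactly the graceful degradation the test below expects.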
```yaml
unit_tests:
  - name: test_mrt_core_customers_email_domain_extraction
    model: mrt__core__customers
    description: "Email domain extraction handles various formats"
    given:
      - input: ref('base__crm__customers')
        rows:
          - {customer_id: 1, email: "alice@example.com"}
          - {customer_id: 2, email: "bob.smith@subdomain.company.co.uk"}
          - {customer_id: 3, email: "CAPS@DOMAIN.COM"}
          - {customer_id: 4, email: "invalid-email"}
          - {customer_id: 5, email: null}
    expect:
      rows:
        - {customer_id: 1, email_domain: "example.com"}
        - {customer_id: 2, email_domain: "subdomain.company.co.uk"}
        - {customer_id: 3, email_domain: "domain.com"}
        - {customer_id: 4, email_domain: null}
        - {customer_id: 5, email_domain: null}
```

Each test row serves a specific purpose:
- Simple case: alice@example.com — baseline behavior
- Complex domain: subdomain.company.co.uk — multiple dots, TLD edge case
- Case handling: CAPS@DOMAIN.COM — should normalize to lowercase
- Invalid input: invalid-email — no @ symbol, should gracefully return null rather than error
- Null input: null — explicit null handling
Without these tests, someone might “simplify” your regex and break the subdomain case, or change the null handling without realizing downstream models depend on it.
Marketing Analytics Testing Patterns
Marketing analytics models often contain the most complex—and highest-stakes—logic in a dbt project. Attribution determines how marketing budget gets allocated. Sessionization affects conversion rate calculations. Funnel analysis drives product decisions. When these models have bugs, the business makes decisions based on faulty data.
The patterns in this section come from real-world implementations. They’re more complex than basic unit tests, but they test logic that genuinely matters.
GA4 Sessionization
Google Analytics 4 sends raw events to BigQuery, but those events lack session-level aggregations. You need to build sessionization logic yourself—grouping events by user, detecting session boundaries (typically 30 minutes of inactivity), and calculating metrics like session duration and event counts.
This logic involves:
- Timestamp arithmetic (using timestamp_micros() to convert GA4’s microsecond timestamps)
- Window functions to calculate time gaps between events
- Session boundary detection based on configurable inactivity thresholds
- Aggregation to create session-level records from event-level data
Let’s test it:
```sql
-- models/intermediate/int__ga4_sessions.sql
with events as (
    select
        user_pseudo_id,
        ga_session_id,
        event_timestamp,
        event_name,
        -- Calculate time since previous event
        timestamp_diff(
            timestamp_micros(event_timestamp),
            lag(timestamp_micros(event_timestamp)) over (
                partition by user_pseudo_id order by event_timestamp
            ),
            minute
        ) as minutes_since_last_event
    from {{ ref('base__ga4__events') }}
),

sessionized as (
    select
        *,
        -- New session if gap > 30 minutes or first event
        case
            when minutes_since_last_event > 30 or minutes_since_last_event is null
                then 1
            else 0
        end as is_new_session
    from events
)

select
    user_pseudo_id,
    ga_session_id,
    concat(user_pseudo_id, '_', ga_session_id) as session_key,
    min(event_timestamp) as session_start,
    max(event_timestamp) as session_end,
    timestamp_diff(
        timestamp_micros(max(event_timestamp)),
        timestamp_micros(min(event_timestamp)),
        second
    ) as session_duration_seconds,
    count(*) as event_count
from sessionized
group by 1, 2, 3
```

```yaml
unit_tests:
  - name: test_int_ga4_sessions_boundaries
    model: int__ga4_sessions
    description: "Sessions should break after 30 minutes of inactivity"
    given:
      - input: ref('base__ga4__events')
        rows:
          # User 1, Session 1: events within 30 min
          - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717200000000000, event_name: "page_view"}
          - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717201800000000, event_name: "scroll"}  # +30 min
          # User 1, Session 2: gap > 30 min
          - {user_pseudo_id: "user_1", ga_session_id: 1002, event_timestamp: 1717207200000000, event_name: "page_view"}  # +90 min from first
          # User 2, Session 1
          - {user_pseudo_id: "user_2", ga_session_id: 2001, event_timestamp: 1717200000000000, event_name: "page_view"}
    expect:
      rows:
        - {session_key: "user_1_1001", session_duration_seconds: 1800, event_count: 2}
        - {session_key: "user_1_1002", session_duration_seconds: 0, event_count: 1}
        - {session_key: "user_2_2001", session_duration_seconds: 0, event_count: 1}
```

The first test verifies core sessionization behavior:
- Session 1001: Two events 30 minutes apart (exactly at the boundary) belong to the same session, resulting in 1800 seconds duration
- Session 1002: A third event 90 minutes after the first triggers a new session (gap > 30 min from session 1001’s last event)
- Session 2001: User 2’s single event creates a session with 0-second duration
The microsecond timestamps (1717200000000000) are GA4’s native format. You’ll need to calculate these values based on your test scenarios.
```yaml
  - name: test_int_ga4_sessions_cross_midnight
    model: int__ga4_sessions
    description: "Sessions spanning midnight should not break artificially"
    given:
      - input: ref('base__ga4__events')
        rows:
          - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717199400000000, event_name: "page_view"}  # 23:50
          - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717200600000000, event_name: "purchase"}   # 00:10 next day
    expect:
      rows:
        - {session_key: "user_1_1001", session_duration_seconds: 1200, event_count: 2}  # 20 minutes
```

The cross-midnight test catches a subtle bug: some implementations accidentally break sessions at date boundaries (because they partition by date instead of by user). A user browsing at 11:50 PM who purchases at 12:10 AM should have a single 20-minute session, not two separate sessions.
Attribution Modeling
Attribution models answer the question: “Which marketing channels deserve credit for this conversion?” The answer determines millions of dollars in ad spend.
Common attribution models:
- First-touch: Credit goes to the channel that first acquired the customer
- Last-touch: Credit goes to the channel immediately before conversion
- Linear: Credit is split equally among all touchpoints
- Time-decay: More credit to touchpoints closer to conversion
Each model uses window functions to identify the relevant touchpoint. Let’s test first-touch and last-touch:
```sql
-- models/marts/marketing/mrt__marketing__customer_attribution.sql
select
    customer_id,
    -- First touch
    first_value(utm_source) over (
        partition by customer_id order by session_start
    ) as first_touch_source,
    first_value(utm_medium) over (
        partition by customer_id order by session_start
    ) as first_touch_medium,
    -- Last touch (before conversion)
    last_value(utm_source) over (
        partition by customer_id order by session_start
        rows between unbounded preceding and unbounded following
    ) as last_touch_source,
    -- Conversion info
    max(case when converted then session_start end) as conversion_timestamp
from {{ ref('int__customers_sessions') }}
where converted = true
    or session_start <= (
        select min(session_start)
        from {{ ref('int__customers_sessions') }} cs2
        where cs2.customer_id = int__customers_sessions.customer_id
            and cs2.converted
    )
group by customer_id
```

```yaml
unit_tests:
  - name: test_mrt_marketing_customer_attribution_first_last_touch
    model: mrt__marketing__customer_attribution
    description: "First touch captures initial source, last touch captures converting source"
    given:
      - input: ref('int__customers_sessions')
        rows:
          # Customer journey: Facebook -> Google -> Direct (conversion)
          - {customer_id: 100, session_start: "2024-06-01 10:00:00", utm_source: "facebook", utm_medium: "paid", converted: false}
          - {customer_id: 100, session_start: "2024-06-05 14:00:00", utm_source: "google", utm_medium: "organic", converted: false}
          - {customer_id: 100, session_start: "2024-06-10 09:00:00", utm_source: "direct", utm_medium: "none", converted: true}
          # Single-touch conversion
          - {customer_id: 101, session_start: "2024-06-02 11:00:00", utm_source: "email", utm_medium: "newsletter", converted: true}
    expect:
      rows:
        - {customer_id: 100, first_touch_source: "facebook", first_touch_medium: "paid", last_touch_source: "direct"}
        - {customer_id: 101, first_touch_source: "email", first_touch_medium: "newsletter", last_touch_source: "email"}
```

This test verifies two scenarios:
Customer 100’s multi-touch journey:
- First encounter via Facebook paid ads (June 1)
- Returns via Google organic search (June 5)
- Converts via direct visit (June 10)
The expected output shows first-touch credit to Facebook (where the journey began) and last-touch credit to Direct (the final touchpoint before conversion). If your FIRST_VALUE or LAST_VALUE window functions have incorrect ordering or framing, this test will catch it.
Customer 101’s single-touch conversion: When there’s only one session before conversion, both first-touch and last-touch should be the same channel. This edge case often breaks when implementations assume at least two touchpoints.
```yaml
  - name: test_mrt_marketing_customer_attribution_no_conversion
    model: mrt__marketing__customer_attribution
    description: "Customers without conversion should not appear"
    given:
      - input: ref('int__customers_sessions')
        rows:
          - {customer_id: 200, session_start: "2024-06-01 10:00:00", utm_source: "facebook", utm_medium: "paid", converted: false}
          - {customer_id: 200, session_start: "2024-06-05 14:00:00", utm_source: "google", utm_medium: "organic", converted: false}
    expect:
      rows: []
```

The no-conversion test is equally important. Attribution only makes sense for customers who actually converted. Customer 200 had sessions but never purchased—they should be excluded from the attribution model entirely. An empty expect block with rows: [] verifies this behavior.
Conversion Funnels
Funnel analysis tracks how users progress through a defined sequence of steps—page view → add to cart → checkout → purchase. At each step, some users drop off. The conversion rate between steps is a key metric for identifying friction points.
The logic involves:
- Counting distinct users at each funnel step
- Calculating step-over-step conversion rates (what percentage of users from step N reach step N+1)
- Handling users who skip steps (directly purchase without add-to-cart)
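A funnel model implementing this might look roughly like the following (a sketch with a hardcoded step list; adapt the steps and the rate definition to your own funnel):

```sql
-- models/marts/marketing/mrt__marketing__conversion_funnel.sql (sketch)
with steps as (
    -- Funnel steps in order
    select 'page_view' as funnel_step, 1 as step_order union all
    select 'add_to_cart', 2 union all
    select 'begin_checkout', 3 union all
    select 'purchase', 4
),

step_counts as (
    select
        s.funnel_step,
        s.step_order,
        count(distinct e.user_id) as user_count
    from steps s
    left join {{ ref('base__ga4__events') }} e
        on e.event_name = s.funnel_step
    group by 1, 2
)

select
    funnel_step,
    user_count,
    -- Share of users retained from the previous step (1.0 for the first step)
    round(
        user_count / coalesce(lag(user_count) over (order by step_order), user_count),
        2
    ) as step_conversion_rate
from step_counts
```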
```yaml
unit_tests:
  - name: test_mrt_marketing_conversion_funnel_dropoff
    model: mrt__marketing__conversion_funnel
    description: "Funnel should calculate conversion rates between steps"
    given:
      - input: ref('base__ga4__events')
        rows:
          # User 1: completes full funnel
          - {user_id: 1, event_name: "page_view", event_date: "2024-06-01"}
          - {user_id: 1, event_name: "add_to_cart", event_date: "2024-06-01"}
          - {user_id: 1, event_name: "begin_checkout", event_date: "2024-06-01"}
          - {user_id: 1, event_name: "purchase", event_date: "2024-06-01"}
          # User 2: drops at checkout
          - {user_id: 2, event_name: "page_view", event_date: "2024-06-01"}
          - {user_id: 2, event_name: "add_to_cart", event_date: "2024-06-01"}
          - {user_id: 2, event_name: "begin_checkout", event_date: "2024-06-01"}
          # User 3: drops at cart
          - {user_id: 3, event_name: "page_view", event_date: "2024-06-01"}
          - {user_id: 3, event_name: "add_to_cart", event_date: "2024-06-01"}
          # User 4: only views
          - {user_id: 4, event_name: "page_view", event_date: "2024-06-01"}
    expect:
      rows:
        - {funnel_step: "page_view", user_count: 4, step_conversion_rate: 1.0}
        - {funnel_step: "add_to_cart", user_count: 3, step_conversion_rate: 0.75}
        - {funnel_step: "begin_checkout", user_count: 2, step_conversion_rate: 0.67}
        - {funnel_step: "purchase", user_count: 1, step_conversion_rate: 0.5}
```

Let’s trace through the test data:
| User | page_view | add_to_cart | begin_checkout | purchase |
|---|---|---|---|---|
| 1 | ✓ | ✓ | ✓ | ✓ |
| 2 | ✓ | ✓ | ✓ | |
| 3 | ✓ | ✓ | | |
| 4 | ✓ | | | |
Calculating conversion rates:
- page_view: 4 users, 100% (starting point)
- add_to_cart: 3 users, 75% of page_view users (3/4)
- begin_checkout: 2 users, 67% of add_to_cart users (2/3)
- purchase: 1 user, 50% of checkout users (1/2)
The test verifies both user counts and conversion rates. If your model calculates conversion rate incorrectly (dividing by total users instead of previous step users, for example), this test will fail.
Edge Case Testing Strategies
Production data is messy. Nulls appear where you don’t expect them. Tables are occasionally empty. Dates hit boundary conditions. The models that survive in production are the ones designed with these edge cases in mind.
Unit tests are your tool for documenting and enforcing edge case behavior. Instead of hoping your model handles nulls gracefully, you prove it with a test.
Null Handling
SQL’s null semantics are notoriously confusing. NULL + 100 = NULL. NULL = NULL is not TRUE. SUM() ignores nulls, and AVG() leaves null rows out of its denominator, which is rarely the average people expect. These behaviors lead to subtle bugs that only manifest when null values appear in production.
Test that your aggregations handle nulls correctly:
```yaml
unit_tests:
  - name: test_int_customers_revenue_null_handling
    model: int__customers_revenue
    description: "Null order values should be treated as zero in sum"
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {customer_id: 1, order_value: 100}
          - {customer_id: 1, order_value: null}
          - {customer_id: 1, order_value: 50}
          - {customer_id: 2, order_value: null}  # All nulls
    expect:
      rows:
        - {customer_id: 1, total_revenue: 150}
        - {customer_id: 2, total_revenue: 0}  # Or null, depending on your logic
```

This test clarifies expected behavior:
- Customer 1 has three orders: 100, null, 50. Does SUM() return 150 (ignoring null) or null (null propagation)?
- Customer 2 has only null orders. Does SUM() return 0, null, or cause an error?
The comment “Or null, depending on your logic” is intentional—your model might use COALESCE(SUM(order_value), 0) to ensure a zero instead of null. The test documents whichever behavior you’ve chosen.
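The aggregation itself can be as simple as the sketch below, assuming you want zero rather than null for customers with no usable order values:

```sql
-- models/intermediate/int__customers_revenue.sql (sketch)
select
    customer_id,
    -- SUM() skips nulls; COALESCE turns an all-null group into 0
    coalesce(sum(order_value), 0) as total_revenue
from {{ ref('base__shopify__orders') }}
group by customer_id
```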
Empty Tables
What happens when an upstream table is empty? Maybe it’s a new deployment with no historical data. Maybe a source system failed and delivered zero records. Your model should handle this gracefully—either returning zero rows or returning a row with default values.
The challenge is that format: dict with rows: [] doesn’t work—dbt can’t infer the column types from an empty list. The solution is format: sql with a WHERE false clause:
```yaml
unit_tests:
  - name: test_mrt_finance_daily_revenue_empty_orders
    model: mrt__finance__daily_revenue
    description: "Model should return zero revenue for days with no orders"
    given:
      - input: ref('base__shopify__orders')
        format: sql
        rows: |
          select
              cast(null as string) as order_id,
              cast(null as date) as order_date,
              cast(null as float64) as order_value
          where false
      - input: ref('int__dates')
        rows:
          - {date_key: "2024-06-01"}
    expect:
      rows:
        - {date_key: "2024-06-01", daily_revenue: 0, order_count: 0}
```

The SQL format uses explicit casting (cast(null as string)) to define the column schema, then WHERE false ensures zero rows are returned. This gives dbt the type information it needs without providing any data.
The test verifies that when orders are empty but dates exist, the model still produces output with zero revenue—rather than returning nothing or throwing an error. This is especially important for reporting models that need to show all dates even when no activity occurred.
Date Boundaries
Date logic is a minefield. Fiscal years that don’t align with calendar years. Leap years. Week number calculations that vary by country. Time zones that shift during daylight saving transitions. These edge cases cause real bugs in production.
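For example, a fiscal-year calculation with an April 1 start (the convention assumed in the test below) might be sketched like this; the arithmetic is illustrative:

```sql
-- models/intermediate/int__fiscal_calendar.sql (sketch, FY starts April 1)
select
    calendar_date,
    -- April onward belongs to the fiscal year that ends the following March
    extract(year from calendar_date)
        + if(extract(month from calendar_date) >= 4, 1, 0) as fiscal_year,
    -- Shift months so April = 0, then bucket into quarters 1-4
    div(mod(extract(month from calendar_date) + 8, 12), 3) + 1 as fiscal_quarter
from {{ ref('int__dates') }}
```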
A fiscal calendar model is a perfect candidate for boundary testing:
```yaml
unit_tests:
  - name: test_int_fiscal_calendar_boundaries
    model: int__fiscal_calendar
    description: "Fiscal year should handle year-end correctly (FY starts April 1)"
    given:
      - input: ref('int__dates')
        rows:
          - {calendar_date: "2024-03-31"}  # Last day of FY2024
          - {calendar_date: "2024-04-01"}  # First day of FY2025
          - {calendar_date: "2024-02-29"}  # Leap year
          - {calendar_date: "2024-12-31"}  # Calendar year end
    expect:
      rows:
        - {calendar_date: "2024-03-31", fiscal_year: 2024, fiscal_quarter: 4}
        - {calendar_date: "2024-04-01", fiscal_year: 2025, fiscal_quarter: 1}
        - {calendar_date: "2024-02-29", fiscal_year: 2024, fiscal_quarter: 4}
        - {calendar_date: "2024-12-31", fiscal_year: 2025, fiscal_quarter: 3}
```

This test assumes a fiscal year starting April 1 (common in many organizations). Let’s trace through:
- March 31, 2024: Last day of FY2024 Q4 (fiscal year ends March 31)
- April 1, 2024: First day of FY2025 Q1 (new fiscal year begins)
- February 29, 2024: Leap year date—verifies the model handles Feb 29 correctly
- December 31, 2024: Calendar year end, but mid-fiscal-year (FY2025 Q3)
If someone hardcodes a calendar year assumption or forgets to handle leap years, at least one of these test cases will fail.
Testing Macros and Reusable Logic
Macros are the functions of the dbt world—reusable logic that you call from multiple models. A discount calculation macro might be used across dozens of models. A date formatting macro might be called hundreds of times. When macro logic is wrong, the bug propagates everywhere.
The challenge: dbt unit tests target models, not macros directly. You can’t write model: my_macro in your test YAML.
The Ephemeral Model Pattern
The solution is an ephemeral wrapper model that exists solely to test the macro. Ephemeral models don’t create tables in your warehouse—they’re compiled as CTEs into downstream models. This makes them perfect for testing: they add no storage cost and exist purely for the unit test.
```sql
-- models/test_helpers/test_calculate_discount.sql
{{ config(materialized='ephemeral') }}

select
    order_value,
    customer_tier,
    {{ calculate_discount('order_value', 'customer_tier') }} as discount_amount
from {{ ref('test_source') }}
```

```yaml
unit_tests:
  - name: test_calculate_discount_macro
    model: test_calculate_discount
    given:
      - input: ref('test_source')
        format: sql
        rows: |
          select 100.00 as order_value, 'gold' as customer_tier
          union all
          select 100.00 as order_value, 'silver' as customer_tier
          union all
          select 100.00 as order_value, 'bronze' as customer_tier
    expect:
      rows:
        - {order_value: 100.00, customer_tier: "gold", discount_amount: 20.00}
        - {order_value: 100.00, customer_tier: "silver", discount_amount: 10.00}
        - {order_value: 100.00, customer_tier: "bronze", discount_amount: 5.00}
```

The wrapper model does three things:
- References a test source (also ephemeral) to provide input data
- Calls the macro being tested (calculate_discount)
- Outputs the result for comparison
The test then verifies that calculate_discount returns:
- 20% off ($20) for gold customers
- 10% off ($10) for silver customers
- 5% off ($5) for bronze customers
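For reference, a calculate_discount macro consistent with those expected values might be sketched like this (the tier percentages are illustrative):

```sql
-- macros/calculate_discount.sql (sketch)
{% macro calculate_discount(order_value_col, customer_tier_col) %}
    case {{ customer_tier_col }}
        when 'gold' then {{ order_value_col }} * 0.20
        when 'silver' then {{ order_value_col }} * 0.10
        when 'bronze' then {{ order_value_col }} * 0.05
        else 0
    end
{% endmacro %}
```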
If someone modifies the macro logic, this test will catch unintended changes. Store these test helper models in a dedicated test_helpers/ directory and exclude them from production builds.
Organizing Tests at Scale
As your dbt project grows, so does your test suite. A mature project might have hundreds of unit tests across dozens of models. Without organization, running tests becomes slow and selecting relevant tests becomes tedious.
Using Tags Strategically
Tags let you categorize tests and run subsets selectively. Consider a tagging strategy based on:
- Criticality: critical for tests that block deployment vs. nice-to-have for extras
- Domain: finance, marketing, product to match your business domains
- Test type: regression, edge-case, smoke-test
```yaml
unit_tests:
  - name: test_mrt_finance_orders_revenue_calculation
    model: mrt__finance__orders
    config:
      tags: ["critical", "finance", "regression"]
    # ...

  - name: test_mrt_finance_orders_empty_cart
    model: mrt__finance__orders
    config:
      tags: ["edge-case"]
    # ...
```

Now you can run subsets:
```bash
# Critical tests only (for quick CI feedback on every PR)
dbt test --select tag:critical,test_type:unit

# Full regression suite (nightly or before releases)
dbt test --select tag:regression,test_type:unit

# Domain-specific tests (when working on finance models)
dbt test --select tag:finance,test_type:unit
```

A common CI strategy:
- PR checks: Run tag:critical tests only (fast feedback, 1-2 minutes)
- Merge to main: Run all unit tests (comprehensive validation, 5-10 minutes)
- Production deployment: Run data tests (validate actual data, separate from unit tests)
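In GitHub Actions, for example, that split might look roughly like this (a schematic sketch; warehouse credentials and profile setup are omitted):

```yaml
# .github/workflows/dbt-tests.yml (sketch)
on:
  pull_request:
  push:
    branches: [main]

jobs:
  unit_tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install dbt-bigquery
      - run: dbt run --select "config.materialized:incremental" --empty
      # Fast feedback on PRs, full suite on main
      - run: dbt test --select tag:critical,test_type:unit
        if: github.event_name == 'pull_request'
      - run: dbt test --select "test_type:unit"
        if: github.event_name == 'push'
```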
Pattern Library Quick Reference
| Pattern | Key Technique | Example |
|---|---|---|
| Incremental full-refresh | is_incremental: false | Section 1 |
| Incremental merge | this + is_incremental: true | Section 1 |
| SCD2 date ranges | Test derived models | Section 2 |
| Window functions | Ordered input rows | Section 3 |
| All CASE branches | Boundary values | Section 3 |
| Empty tables | format: sql with WHERE false | Section 5 |
| Null handling | Explicit null rows | Section 5 |
| Macro testing | Ephemeral wrapper model | Section 6 |
Key Takeaways
Unit testing dbt models isn’t about achieving 100% coverage—it’s about testing the logic that matters most. Focus your effort on:
- Incremental models: Test both full-refresh and incremental paths. The expect block shows what gets inserted, not the final table state.
- Window functions: Design test data that validates partitioning, ordering, and framing. Use out-of-order source rows to verify sorting works correctly.
- Business logic: Test boundary values in CASE WHEN statements. Test null handling explicitly. Document expected behavior through tests.
- Marketing analytics: Attribution, sessionization, and funnels are high-stakes models where bugs directly impact business decisions. Invest in comprehensive tests.
- Edge cases: Empty tables, null values, and date boundaries cause real production issues. Unit tests prove your model handles them gracefully.
The patterns in this article are starting points. Adapt them to your specific models and business logic. The best test is one that catches a bug before it reaches production.
What’s Next
You now have a library of patterns for unit testing real-world dbt models. But unit tests are just one tool in the testing toolkit. They validate transformation logic with mocked data—but what about validating the actual data in your warehouse?
Part 3 zooms out to the strategic level: when should you use unit tests versus data tests? What about external packages like dbt-expectations and Elementary? We’ll build a decision framework to help you choose the right testing approach for each scenario—and avoid both under-testing and over-testing.