
Unit Testing Incremental Models in dbt

The dual-mode testing pattern for incremental models — overriding is_incremental, mocking this, and understanding that expect blocks show inserts, not final state.

Tags: dbt, BigQuery, testing, incremental processing

Incremental models have two code paths: one for full refresh, another for normal incremental loads. Teams typically test only the full-refresh path, because that is what dbt run exercises during development, where the target table usually doesn't exist yet.

A common failure mode: the model works in dev and on full refresh, but a WHERE clause bug in the incremental path produces duplicates that accumulate across months of daily runs before anyone notices.

Unit tests catch these bugs before deployment by simulating both execution paths with controlled test data.

The Dual-Mode Testing Pattern

Every incremental model needs at least two unit tests: one for full refresh mode, one for incremental mode. This isn’t optional. If you’re only testing one mode, you’re leaving half your logic untested.

```sql
-- models/intermediate/int__events_processed.sql
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    user_id,
    event_type,
    event_timestamp,
    processed_at
from {{ ref('base__ga4__events') }}

{% if is_incremental() %}
where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```

The {% if is_incremental() %} block is where bugs hide. Common mistakes include off-by-one errors (using >= vs >), timezone mismatches, and incorrect column references.
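To make those concrete, here are sketches of each bug class as it might appear in a filter like the one above (the _loaded_at column is illustrative; the same kind of column appears in the late-arriving-data example further down):

```sql
-- Off-by-one: >= re-processes the newest already-loaded row on every run;
-- with an append strategy (no unique_key) this silently creates duplicates.
where event_timestamp >= (select max(event_timestamp) from {{ this }})

-- Timezone mismatch: converting a UTC column to local time before comparing
-- it to a UTC watermark shifts the cutoff by the offset, dropping or
-- re-reading rows near the boundary.
where datetime(event_timestamp, 'America/New_York') > (select max(event_timestamp) from {{ this }})

-- Wrong column: comparing load time against an event-time watermark
-- skips late-arriving events entirely.
where _loaded_at > (select max(event_timestamp) from {{ this }})
```

Each of these compiles and runs cleanly; only a unit test with controlled inputs makes the wrong filtering visible.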

Testing Full Refresh

```yaml
unit_tests:
  - name: test_int_events_processed_full_refresh
    model: int__events_processed
    description: "On full refresh, all source rows should be processed"
    overrides:
      macros:
        is_incremental: false
    given:
      - input: ref('base__ga4__events')
        rows:
          - {event_id: 1, user_id: 100, event_type: "click", event_timestamp: "2024-06-01 10:00:00"}
          - {event_id: 2, user_id: 100, event_type: "purchase", event_timestamp: "2024-06-02 11:00:00"}
          - {event_id: 3, user_id: 101, event_type: "click", event_timestamp: "2024-06-03 12:00:00"}
    expect:
      rows:
        - {event_id: 1, user_id: 100, event_type: "click"}
        - {event_id: 2, user_id: 100, event_type: "purchase"}
        - {event_id: 3, user_id: 101, event_type: "click"}
```

Setting is_incremental: false in overrides simulates dbt run --full-refresh. All three events appear in the output — no filtering happens.

Testing Incremental Mode

```yaml
unit_tests:
  - name: test_int_events_processed_incremental
    model: int__events_processed
    description: "On incremental run, only new events should be inserted"
    overrides:
      macros:
        is_incremental: true
    given:
      - input: ref('base__ga4__events')
        rows:
          - {event_id: 1, user_id: 100, event_type: "click", event_timestamp: "2024-06-01 10:00:00"}
          - {event_id: 2, user_id: 100, event_type: "purchase", event_timestamp: "2024-06-02 11:00:00"}
          - {event_id: 3, user_id: 101, event_type: "click", event_timestamp: "2024-06-03 12:00:00"}
      - input: this
        rows:
          - {event_id: 1, user_id: 100, event_type: "click", event_timestamp: "2024-06-01 10:00:00"}
    expect:
      rows:
        - {event_id: 2, user_id: 100, event_type: "purchase"}
        - {event_id: 3, user_id: 101, event_type: "click"}
```

Setting is_incremental: true forces incremental mode. The input: this mocks the current state of the target table — it contains one event with timestamp 2024-06-01 10:00:00. The model’s WHERE clause filters to events after that timestamp, so only events 2 and 3 appear.

The Critical Insight: Expect Shows Inserts, Not Final State

This trips up many developers. The expect block represents what gets inserted or merged, not the final table state. In the incremental test above, event_id 1 already exists in this, so only events 2 and 3 appear in the expected output. You’re testing the transformation logic, not the merge operation itself.

Testing Merge Logic

The merge_update_columns config adds complexity. When a row with an existing unique_key arrives, BigQuery updates only the specified columns, leaving others unchanged.

```sql
{{ config(
    materialized='incremental',
    unique_key='user_id',
    merge_update_columns=['email', 'updated_at']
) }}
```

```yaml
unit_tests:
  - name: test_int_users_current_merge_update
    model: int__users_current
    description: "Existing users should have email updated, not created_at"
    overrides:
      macros:
        is_incremental: true
    given:
      - input: ref('base__crm__users')
        rows:
          - {user_id: 1, email: "new@example.com", created_at: "2024-06-15", updated_at: "2024-06-15"}
      - input: this
        rows:
          - {user_id: 1, email: "old@example.com", created_at: "2024-01-01", updated_at: "2024-01-01"}
    expect:
      rows:
        - {user_id: 1, email: "new@example.com", created_at: "2024-06-15", updated_at: "2024-06-15"}
```

Important caveat: this test validates that your transformation logic produces the correct output row. It doesn’t test the merge behavior itself — that’s BigQuery’s responsibility. The expected output shows created_at: "2024-06-15" because that’s what the source data contains. In practice, the actual merge would preserve the original created_at: "2024-01-01" from this. For more complex merge scenarios, consider dbt-audit-helper to compare actual results after a test run.

Testing Late-Arriving Data

Real-world pipelines rarely receive events in perfect chronological order. A common pattern uses a lookback window: instead of only processing events newer than max(event_timestamp), you also include events that arrived recently (based on a _loaded_at column) even if their event timestamp is older.
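A sketch of such a lookback filter for the events model (assuming a lookback_hours var and a _loaded_at load-timestamp column, as in the test below; this mirrors the pattern rather than defining a canonical implementation):

```sql
{% if is_incremental() %}
where event_timestamp > (select max(event_timestamp) from {{ this }})
   -- lookback: also take rows loaded within the window, even when their
   -- event timestamps fall before the current watermark
   or _loaded_at > timestamp_sub(current_timestamp(), interval {{ var('lookback_hours', 24) }} hour)
{% endif %}
```

Because the model declares unique_key='event_id', rows re-selected by the lookback are merged rather than duplicated.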

```yaml
unit_tests:
  - name: test_int_events_processed_late_arriving
    model: int__events_processed
    description: "Late-arriving events within lookback window should be captured"
    overrides:
      macros:
        is_incremental: true
      vars:
        lookback_hours: 24
    given:
      - input: ref('base__ga4__events')
        rows:
          - {event_id: 4, event_timestamp: "2024-06-01 08:00:00", _loaded_at: "2024-06-02 10:00:00"}
          - {event_id: 5, event_timestamp: "2024-06-02 09:00:00", _loaded_at: "2024-06-02 09:05:00"}
      - input: this
        rows:
          - {event_id: 1, event_timestamp: "2024-06-01 10:00:00"}
    expect:
      rows:
        - {event_id: 4, event_timestamp: "2024-06-01 08:00:00"}
        - {event_id: 5, event_timestamp: "2024-06-02 09:00:00"}
```

Event 4 has an old timestamp (June 1, 08:00) but a recent load time (June 2, 10:00) — it arrived late. A naive WHERE event_timestamp > max(event_timestamp) would miss it. The lookback window ensures it gets captured. See Testing Late-Arriving Data Handling in dbt for deeper coverage of this testing pattern.

Prerequisite: Schema Must Exist

Before running incremental unit tests, build your incremental models with:

```bash
dbt run --select "config.materialized:incremental" --empty
```

Unit tests need to introspect the target table's structure, and they fail with "Not able to get columns for unit test" when that table doesn't exist. The --empty flag builds the tables with the correct schema without processing any data.
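With the empty tables in place, the unit tests can then be run on their own using dbt's test_type selector (available since dbt 1.8, which introduced the unit-testing framework):

```bash
# 1. Create empty incremental tables so unit tests can read their schemas
dbt run --select "config.materialized:incremental" --empty

# 2. Run only the unit tests
dbt test --select "test_type:unit"
```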

Notes on the Dual-Mode Pattern

Incremental merge bugs accumulate silently across runs. The dual-mode pattern — testing both is_incremental: false and is_incremental: true with explicit this mocking — catches these bugs before deployment, when the bad state is still recoverable.