Idempotency and lookback windows only work if you test them. Without explicit tests, a misconfigured lookback window silently misses late records for months before anyone notices. By then, the drift is hard to quantify and harder to fix.
There are two testing layers: unit tests that verify your model logic handles late arrivals correctly, and production comparison queries that detect drift between incremental and full-refresh results.
Unit Testing Late Arrivals with dbt 1.8+
dbt’s unit testing framework (added in dbt 1.8) lets you simulate the state of your target table and verify that your model handles late-arriving records correctly. The key is overriding is_incremental() to force the model into incremental mode during the test.
```yaml
unit_tests:
  - name: test_late_arrival_captured_within_window
    model: fct__events
    overrides:
      macros:
        is_incremental: true
    given:
      - input: this
        rows:
          - {event_id: 1, event_data: "original", updated_at: "2024-01-15"}
          - {event_id: 3, event_data: "existing_record", updated_at: "2024-01-14"}
      - input: ref('base__events')
        rows:
          - {event_id: 1, event_data: "updated", updated_at: "2024-01-17"}
          - {event_id: 2, event_data: "late_arrival", updated_at: "2024-01-13"}
          - {event_id: 3, event_data: "existing_record", updated_at: "2024-01-14"}
    expect:
      rows:
        - {event_id: 1, event_data: "updated"}
        - {event_id: 2, event_data: "late_arrival"}
        - {event_id: 3, event_data: "existing_record"}
```

What this test verifies: event_id: 1 gets updated (its newer version replaces the existing record); event_id: 2 is a late arrival dated January 13, before the current max of January 17 in {{ this }}, and it gets captured because it falls within the lookback window; and event_id: 3 passes through unchanged.
The is_incremental: true override is essential. Without it, the model runs in full-refresh mode and the {{ this }} reference has no meaning, making the test useless for verifying incremental behavior.
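For context, the tests in this section assume a model shaped roughly like the following. This is a hedged sketch, not a definitive implementation: the column names, the 3-day interval, and the Snowflake-style dateadd/qualify syntax are all assumptions.

```sql
-- fct__events.sql (hypothetical sketch of the model under test)
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    event_data,
    updated_at
from {{ ref('base__events') }}

{% if is_incremental() %}
-- Lookback window: reprocess the last 3 days relative to the current
-- max in the target, so late-arriving records are still picked up.
where updated_at >= (
    select dateadd('day', -3, max(updated_at)) from {{ this }}
)
{% endif %}

-- Keep only the latest version of each event so reprocessed rows
-- update rather than duplicate (works together with unique_key above).
qualify row_number() over (partition by event_id order by updated_at desc) = 1
```

The `{% if is_incremental() %}` branch is exactly what the `is_incremental: true` override forces the unit test to exercise.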
Testing Boundary Conditions
The lookback window boundary is where most bugs live. A record that arrives exactly at the boundary should be captured; one that arrives just outside should not. Test both:
```yaml
unit_tests:
  - name: test_record_at_lookback_boundary_captured
    model: fct__events
    overrides:
      macros:
        is_incremental: true
    given:
      - input: this
        rows:
          # Current max is Jan 17; a 3-day lookback makes Jan 14 the cutoff
          - {event_id: 10, event_timestamp: "2024-01-17 12:00:00", event_data: "latest"}
      - input: ref('base__events')
        rows:
          # One second inside the window: should be captured
          - {event_id: 11, event_timestamp: "2024-01-14 00:00:01", event_data: "boundary_record"}
          # One second outside the window: should be excluded
          - {event_id: 12, event_timestamp: "2024-01-13 23:59:59", event_data: "outside_window"}
    expect:
      rows:
        - {event_id: 10, event_timestamp: "2024-01-17 12:00:00", event_data: "latest"}
        - {event_id: 11, event_timestamp: "2024-01-14 00:00:01", event_data: "boundary_record"}
        # event_id: 12 must NOT appear
```

A record that lands exactly on the cutoff (2024-01-14 00:00:00) is where the filter operator matters: >= includes it, > excludes it. Pinning fixtures one second to either side of the cutoff makes the intended behavior explicit and prevents accidental changes to the filter from going unnoticed.
Testing Deduplication Within the Window
When a lookback window reprocesses records that are already in the target table, the model must update rather than duplicate them. Test this explicitly:
```yaml
unit_tests:
  - name: test_reprocessed_records_update_not_duplicate
    model: fct__events
    overrides:
      macros:
        is_incremental: true
    given:
      - input: this
        rows:
          - {event_id: 1, event_data: "stale_value", processed_at: "2024-01-16 10:00:00"}
      - input: ref('base__events')
        rows:
          # Same event_id, newer value: should update, not duplicate
          - {event_id: 1, event_data: "fresh_value", processed_at: "2024-01-17 08:00:00"}
    expect:
      rows:
        # Exactly one row, no duplicates
        - {event_id: 1, event_data: "fresh_value"}
```

This verifies that your unique_key and deduplication logic (QUALIFY/ROW_NUMBER) work together correctly when the lookback window reprocesses already-processed records.
Production Validation with audit_helper
Unit tests verify that your model logic is correct. Production validation detects whether your running model is actually drifting from source truth — which can happen when lookback windows are sized incorrectly or when data arrives outside the window.
The standard pattern: compare a recent slice of your incremental model against a full refresh of the same time period.
```sql
{% set incremental_query %}
    select * from {{ ref('fct__events') }}
    where event_date >= current_date - 7
{% endset %}

{% set full_refresh_query %}
    select * from {{ ref('base__events') }}
    where event_date >= current_date - 7
    qualify row_number() over (partition by event_id order by updated_at desc) = 1
{% endset %}

{{ audit_helper.compare_queries(
    a_query=incremental_query,
    b_query=full_refresh_query,
    primary_key='event_id'
) }}
```

The queries are built in {% set %} blocks because Jinja expressions don't nest: a {{ ref() }} inside a string argument to another {{ }} call won't render. This comparison returns rows present in one query but not the other, plus rows where the key exists in both but values differ. Non-empty results indicate drift.
Running this comparison weekly (or after any known pipeline issue) gives you early warning before drift compounds. The 7-day comparison window is intentional: it is wider than the typical 3-day lookback window, so it can catch late-arriving records that just missed the window.
Interpreting Comparison Results
audit_helper's compare_queries returns a result set flagging whether each row appears only in query A, only in query B, or in both with differing values. The categories have different meanings:
- in_a_only (incremental has it, full refresh doesn't): Usually harmless; records in your incremental model that fall outside the comparison window. Can also indicate extra records from a deduplication bug.
- in_b_only (full refresh has it, incremental doesn't): These are the dangerous ones. Records your incremental model should have but doesn't, i.e. late-arriving data that fell outside your lookback window.
- in_both_different: Records exist in both but with different values. Suggests either a missed update or deduplication logic that picked a different version.
A high count of in_b_only records within your comparison window (but outside your lookback window) is a direct signal that your window is too narrow.
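To get these counts directly rather than scanning row-level output, compare_queries accepts a summarize argument. The sketch below is hedged: the summarize flag is part of audit_helper's documented interface, but the exact output columns can vary by version, and the query names are carried over from the earlier example.

```sql
{% set incremental_query %}
    select * from {{ ref('fct__events') }}
    where event_date >= current_date - 7
{% endset %}

{% set full_refresh_query %}
    select * from {{ ref('base__events') }}
    where event_date >= current_date - 7
    qualify row_number() over (partition by event_id order by updated_at desc) = 1
{% endset %}

-- summarize=true collapses the diff into per-category counts
-- instead of returning row-level detail
{{ audit_helper.compare_queries(
    a_query=incremental_query,
    b_query=full_refresh_query,
    primary_key='event_id',
    summarize=true
) }}
```

A rising in_b_only count across weekly runs is the trend to watch; it quantifies exactly how many records are slipping past the lookback window.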
When Unit Tests Aren’t Enough
Unit tests verify logic in isolation with controlled fixture data. They can’t detect:
- Window sizing problems: If your lookback window is 3 days but 10% of records arrive within 5 days, unit tests with synthetic fixtures won’t catch the real-world miss rate.
- Source data distribution changes: The latency profile of your source system can shift over time. What worked for 3 days last year might need 7 days this year.
- Interaction effects at scale: A deduplication that works correctly on 10 test rows can have edge cases that only appear with millions of rows and a realistic distribution of duplicates.
The audit_helper comparison is what catches these. Combine unit tests for logic correctness with periodic production comparison for drift detection. Neither alone is sufficient — one tests the code, the other tests the outcome.
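One way to ground window sizing in real data is to profile arrival latency in the source directly. This is a hedged sketch: it assumes the source exposes both an event timestamp (updated_at) and a warehouse load timestamp (loaded_at), and uses Snowflake-style datediff; adapt the column names and functions to your stack.

```sql
-- Distribution of arrival lag: how many days after the event's own
-- timestamp did each record actually land in the warehouse?
select
    datediff('day', updated_at, loaded_at) as arrival_lag_days,
    count(*) as records,
    round(100.0 * count(*) / sum(count(*)) over (), 2) as pct_of_total
from {{ ref('base__events') }}
where loaded_at >= current_date - 90
group by 1
order by 1
```

If a nontrivial share of records shows a lag beyond your lookback window (say, 10% arriving at 5 days against a 3-day window), the window is too narrow regardless of how clean your unit tests are.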
For the underlying patterns these tests are validating, see the lookback window pattern and idempotent incremental models.