Late-arriving data is why incremental models drift from source truth. A record with an event timestamp of March 20 lands in your warehouse on March 24. Your model processed March 20’s data three days ago and moved on. Without a strategy to catch late arrivals, that record is missed permanently, and the gap between your model and source truth widens with every run.
This hub connects the concepts you need to handle late-arriving data correctly in dbt.
The Core Problem
Two timestamps drive the issue: event time (when something happened) and load time (when the record reached your warehouse). Standard incremental filters only look at load time or max event time — they miss records that arrived late relative to the model’s last run.
```sql
-- This misses late-arriving data
{% if is_incremental() %}
WHERE event_date > (SELECT MAX(event_date) FROM {{ this }})
{% endif %}
```

If this model ran on January 15th and captured events through January 14th, any January 13th events arriving today get skipped entirely.
Start by Measuring
Before choosing any strategy, measure your data’s actual latency profile. The distribution of arrival delays determines which patterns make sense and what window size is appropriate.
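As a rough sketch (the column names `event_at` and `_loaded_at` and the source reference are assumptions, not from the article), a one-off query can profile the delay distribution:

```sql
-- Profile how late records arrive relative to their event time.
-- Assumes an event timestamp (event_at) and a warehouse load
-- timestamp (_loaded_at); adjust to your source's actual columns.
-- DATE_DIFF is BigQuery syntax; swap in your warehouse's equivalent.
SELECT
  DATE_DIFF(DATE(_loaded_at), DATE(event_at), DAY) AS delay_days,
  COUNT(*) AS records,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_of_total
FROM {{ source('analytics', 'raw_events') }}
GROUP BY delay_days
ORDER BY delay_days
```

The point where the cumulative percentage flattens out (say, 99% of records within 3 days) is a reasonable starting candidate for the window size.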
→ Measuring Data Latency Before Choosing an Incremental Strategy
The Lookback Window
The standard defense: reprocess a rolling window of recent data on every run. Instead of filtering to new-only data, subtract a fixed window from the current maximum and reprocess everything from there.
Make it configurable so you can extend the window for backfills without changing model code:
```sql
{% set lookback_days = var('lookback_days', 3) %}

{% if is_incremental() %}
WHERE event_date >= (
  SELECT DATE_SUB(MAX(event_date), INTERVAL {{ lookback_days }} DAY)
  FROM {{ this }}
)
{% endif %}
```

→ Late-Arriving Data and the Lookback Window Pattern
Limiting Destination Scans with Incremental Predicates
On large tables, the lookback window filters source data but the merge still scans the entire destination table. Incremental predicates constrain the destination scan to the same window, keeping merge viable at scale.
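A sketch of the config (the `DBT_INTERNAL_DEST` alias is dbt's name for the destination table in the generated merge; the column, key, and window values here are placeholders, and `dateadd` is Snowflake syntax — predicates are passed through as raw SQL for your warehouse):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='event_id',
    incremental_predicates=[
        "DBT_INTERNAL_DEST.event_date >= dateadd(day, -3, current_date)"
    ]
) }}
```

Keep the predicate window at least as wide as the source-side lookback window, or the merge can silently skip rows the lookback reprocessed.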
→ Incremental Predicates for dbt Merge
Partition-Based Strategies
For time-partitioned tables, replacing entire partitions is often more efficient than row-by-row merging. The right approach differs by warehouse:
- BigQuery: `insert_overwrite` with a static partition list
- Snowflake: `delete+insert` with date predicates (not `insert_overwrite`, which replaces the entire table)
- Databricks: `replace_where` with a predicate expression
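For example, the BigQuery static-partition pattern might look like this (model and column names are illustrative, not from the article):

```sql
{% set partitions_to_replace = [
    'date_sub(current_date, interval 1 day)',
    'current_date'
] %}

{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    partitions=partitions_to_replace
) }}

SELECT * FROM {{ ref('stg_events') }}
{% if is_incremental() %}
-- Only select rows belonging to the partitions being overwritten
WHERE event_date IN ({{ partitions_to_replace | join(', ') }})
{% endif %}
```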
Each has a blind spot: records arriving after their partition’s replacement window passes get permanently missed without a full refresh or longer window.
→ dbt Incremental Strategy Configuration Patterns
→ dbt Incremental Strategy Warehouse Behaviors
→ Incremental Strategy Decision Framework
Deduplication and Idempotency
A lookback window means processing the same records multiple times. Your model must produce identical results regardless of how many times it runs. Relying solely on unique_key isn’t enough — the initial full refresh doesn’t apply merge logic. Add explicit deduplication in your SELECT:
```sql
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY event_id
  ORDER BY _loaded_at DESC
) = 1
```

When uniqueness spans multiple columns, use `generate_surrogate_key` from dbt-utils to create a single stable key.
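A sketch combining the surrogate key with deduplication (table and column names are illustrative):

```sql
SELECT
  -- Collapse the multi-column natural key into one stable hash key
  {{ dbt_utils.generate_surrogate_key(['order_id', 'line_number']) }} AS order_line_key,
  order_id,
  line_number,
  amount,
  _loaded_at
FROM {{ ref('stg_order_lines') }}
-- Keep only the most recently loaded version of each natural key
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY order_id, line_number
  ORDER BY _loaded_at DESC
) = 1
```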
→ Idempotent Incremental Models in dbt
→ dbt-utils generate_surrogate_key
Testing Late-Arriving Data Handling
dbt unit tests (1.8+) can simulate late arrivals by overriding is_incremental() and providing fixture rows for {{ this }}. In production, compare incremental results against a full-refresh baseline to detect drift.
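A sketch of such a unit test (model, column, and test names are placeholders; the `overrides` and `this` fixture keys follow dbt's unit-testing spec):

```yaml
unit_tests:
  - name: late_arrival_is_picked_up
    model: fct_events
    overrides:
      macros:
        # Force the incremental branch of the model's Jinja
        is_incremental: true
    given:
      - input: ref('stg_events')
        rows:
          # Event from March 20 that loaded four days late
          - {event_id: 1, event_date: "2024-03-20", _loaded_at: "2024-03-24"}
      - input: this
        rows:
          # Existing table already contains a newer event
          - {event_id: 2, event_date: "2024-03-22", _loaded_at: "2024-03-22"}
    expect:
      rows:
        # With an adequate lookback, the late row survives the filter
        - {event_id: 1, event_date: "2024-03-20"}
```

The expected rows depend on the model's lookback logic; with a 3-day window relative to the existing maximum, the March 20 event falls inside the window and should appear in the output.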
→ Testing Late-Arriving Data Handling in dbt
The Full Refresh Safety Net
No lookback window catches everything. Data can arrive arbitrarily late. The full_refresh: false config prevents accidental multi-hour rebuilds while keeping intentional full refreshes possible. Schedule periodic full refreshes to reset accumulated drift.
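The guard itself is a one-line model config (it can also be set at folder level in `dbt_project.yml`):

```sql
{{ config(
    materialized='incremental',
    full_refresh=false
) }}
```

With this set, passing `--full-refresh` to `dbt run` will not rebuild the model; an intentional rebuild requires temporarily removing or overriding the config.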
→ The full_refresh: false Guard in dbt
Microbatch as an Alternative
dbt’s microbatch strategy (1.9+) has built-in lookback support via the lookback config parameter. It handles filtering automatically without is_incremental() logic, and provides batch-level retry for failed periods. Trade-offs: minimum batch size is hourly, all time calculations use UTC, and batches run sequentially.
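A minimal microbatch config might look like this (model and column names are illustrative; `event_time`, `batch_size`, `lookback`, and `begin` are the actual config keys):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_at',
    batch_size='day',
    lookback=3,
    begin='2024-01-01'
) }}

-- No is_incremental() block: dbt filters each batch on event_at
-- automatically. The upstream model must also declare an event_time
-- config for its rows to be filtered per batch.
SELECT * FROM {{ ref('stg_events') }}
```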
→ dbt Microbatch Strategy Tradeoffs
Source Article
→ Late-Arriving Data in dbt: Patterns That Actually Work