
dbt Incremental Strategy Configuration Patterns

Complete, runnable dbt config blocks for each incremental strategy — merge with predicates, delete+insert on Snowflake, insert_overwrite with static partitions, and replace_where on Databricks.

Tags: dbt, bigquery, snowflake, databricks, incremental processing, data modeling

Complete config blocks for each incremental strategy, organized by warehouse and use case. For strategy selection logic, see the decision framework. For warehouse-specific behavior details, see dbt Incremental Strategy Warehouse Behaviors.

Merge with Incremental Predicates (BigQuery / Large Tables)

Use this when you need row-level updates on a table that’s grown past 100M rows. The incremental_predicates limit the destination table scan during the MERGE, which is the key to keeping merge viable at scale. See incremental predicates for the full explanation of how the two filters work together.

{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge',
    incremental_predicates=[
        "DBT_INTERNAL_DEST.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)"
    ]
) }}

SELECT
    event_id,
    event_date,
    event_name,
    user_id,
    event_properties
FROM {{ ref('base__segment__events') }}
{% if is_incremental() %}
-- 3-day lookback on the source side for late-arriving events
WHERE event_date >= (SELECT DATE_SUB(MAX(event_date), INTERVAL 3 DAY) FROM {{ this }})
{% endif %}

The is_incremental() filter covers 3 days of source data (a lookback window for late-arriving records). The predicate covers 7 days on the destination side — deliberately wider to ensure all matches are found. A destination predicate narrower than your source filter can cause the merge to miss legitimate matches.
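
For reference, here is roughly the statement dbt-bigquery renders. This is a simplified sketch: analytics.fct_events stands in for the target, and the ref() is resolved to a plain table name.

MERGE INTO analytics.fct_events AS DBT_INTERNAL_DEST
USING (
    SELECT
        event_id,
        event_date,
        event_name,
        user_id,
        event_properties
    FROM analytics.base__segment__events
    WHERE event_date >= (SELECT DATE_SUB(MAX(event_date), INTERVAL 3 DAY) FROM analytics.fct_events)
) AS DBT_INTERNAL_SOURCE
ON DBT_INTERNAL_SOURCE.event_id = DBT_INTERNAL_DEST.event_id
    -- the incremental predicate lands in the ON clause, pruning the
    -- destination scan to the last 7 days of partitions
    AND DBT_INTERNAL_DEST.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
WHEN MATCHED THEN UPDATE SET
    event_date = DBT_INTERNAL_SOURCE.event_date,
    event_name = DBT_INTERNAL_SOURCE.event_name,
    user_id = DBT_INTERNAL_SOURCE.user_id,
    event_properties = DBT_INTERNAL_SOURCE.event_properties
WHEN NOT MATCHED THEN INSERT
    (event_id, event_date, event_name, user_id, event_properties)
    VALUES (event_id, event_date, event_name, user_id, event_properties)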

Delete+Insert on Snowflake

Use this when tables exceed 100M rows on Snowflake. This is the standard replacement for merge at scale — 3.4x faster at 500M+ rows in production benchmarks. Not atomic: if the process fails between delete and insert, records are temporarily missing.

{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='delete+insert'
) }}

SELECT
    order_id,
    customer_id,
    order_date,
    updated_at,
    total_amount
FROM {{ ref('base__shopify__orders') }}
{% if is_incremental() %}
-- 3-day lookback for late-arriving orders; DATEADD is the Snowflake idiom
WHERE updated_at >= (SELECT DATEADD(DAY, -3, MAX(updated_at)) FROM {{ this }})
{% endif %}

The lookback window handles late-arriving orders. The unique_key drives the DELETE step — dbt deletes all rows in the target where order_id matches any incoming row, then inserts the full batch.
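
Concretely, dbt executes two statements. A simplified sketch of the rendered SQL, with analytics.fct_orders as a placeholder target and fct_orders__dbt_tmp approximating the staged batch dbt materializes first:

-- step 1: remove every target row whose key appears in the incoming batch
DELETE FROM analytics.fct_orders
WHERE order_id IN (
    SELECT DISTINCT order_id
    FROM fct_orders__dbt_tmp
);

-- step 2: append the incoming batch in full
INSERT INTO analytics.fct_orders
    (order_id, customer_id, order_date, updated_at, total_amount)
SELECT order_id, customer_id, order_date, updated_at, total_amount
FROM fct_orders__dbt_tmp;

The gap between the two statements is the non-atomic window mentioned above: a query that lands between them sees neither the old rows nor the new ones.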

Insert_Overwrite with Static Partitions (BigQuery)

Use this for time-partitioned fact tables on BigQuery where cost optimization matters. Static partition lists are faster than dynamic because BigQuery doesn’t need a discovery query to determine which partitions to replace.

{% set partitions_to_replace = [
    'CURRENT_DATE()',
    'DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)',
    'DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)'
] %}

{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    partitions=partitions_to_replace
) }}

SELECT
    event_id,
    event_date,
    event_name,
    user_id,
    event_properties
FROM {{ ref('base__segment__events') }}
{% if is_incremental() %}
WHERE event_date IN ({{ partitions_to_replace | join(',') }})
{% endif %}

The partitions config tells BigQuery exactly which partitions to replace. The is_incremental() WHERE clause filters the source query to match. Both lists must align — if you replace 3 days of partitions but select 7 days of source data, the extra days are wasted compute.
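
Under the hood, dbt-bigquery compiles static insert_overwrite to a merge with an always-false join condition, roughly this shape (analytics.fct_events is a placeholder):

MERGE INTO analytics.fct_events AS DBT_INTERNAL_DEST
USING (
    SELECT event_id, event_date, event_name, user_id, event_properties
    FROM analytics.base__segment__events
    WHERE event_date IN (CURRENT_DATE(),
                         DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY),
                         DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY))
) AS DBT_INTERNAL_SOURCE
ON FALSE
-- no row ever matches, so the statement reduces to: delete the listed
-- partitions, then insert the new batch, all in one atomic statement
WHEN NOT MATCHED BY SOURCE
    AND DBT_INTERNAL_DEST.event_date IN (CURRENT_DATE(),
                                         DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY),
                                         DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY))
    THEN DELETE
WHEN NOT MATCHED THEN INSERT
    (event_id, event_date, event_name, user_id, event_properties)
    VALUES (event_id, event_date, event_name, user_id, event_properties)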

The partition list acts as a built-in lookback window: replacing today plus the previous 2 days catches most late-arriving event data. To extend the window, add more entries to partitions_to_replace.

This pattern doesn’t require unique_key because entire partitions are replaced atomically — there’s no row-level matching. Duplicates within the source query will propagate into the partition, so include deduplication in your SELECT if your source can have duplicates.
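
If your source can emit duplicates, a QUALIFY clause at the end of the SELECT is usually enough. A sketch assuming a loaded_at ingestion timestamp (hypothetical, not in the model above) to pick the most recent copy of each event:

SELECT
    event_id,
    event_date,
    event_name,
    user_id,
    event_properties
FROM {{ ref('base__segment__events') }}
{% if is_incremental() %}
WHERE event_date IN ({{ partitions_to_replace | join(',') }})
{% endif %}
-- keep one row per event; loaded_at is a hypothetical ingestion timestamp
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY loaded_at DESC) = 1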

Insert_Overwrite with Dynamic Partitions (BigQuery)

Use this when you don’t know in advance which partitions will contain new data. Slower than static because BigQuery runs a discovery query first, but handles unpredictable data arrival patterns.

{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'}
) }}

SELECT
    event_id,
    event_date,
    event_name,
    user_id,
    event_properties
FROM {{ ref('base__segment__events') }}
{% if is_incremental() %}
WHERE event_date >= _dbt_max_partition
{% endif %}

_dbt_max_partition is a BigQuery scripting variable, not Jinja. It’s resolved at query time in the warehouse and contains the maximum partition value in the target table. dbt generates a DECLARE/SET block that populates this variable before the main query runs.
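
The generated script looks roughly like this, with my-project.analytics.fct_events standing in for the target table:

DECLARE _dbt_max_partition DATE DEFAULT (
    SELECT MAX(event_date)
    FROM `my-project.analytics.fct_events`
    WHERE event_date IS NOT NULL
);

-- ...followed by the insert_overwrite merge, which can now reference
-- _dbt_max_partition as an ordinary scalar variable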

Replace_Where on Databricks

Use this for date-bounded updates on Databricks with Delta Lake. More flexible than insert_overwrite because the predicate can be any expression, not just a partition boundary. Atomic.

{{ config(
    materialized='incremental',
    incremental_strategy='replace_where',
    incremental_predicates=["event_date >= CURRENT_DATE - INTERVAL 3 DAY"]
) }}

SELECT
    event_id,
    event_date,
    event_name,
    user_id,
    event_properties
FROM {{ ref('base__segment__events') }}
WHERE event_date >= CURRENT_DATE - INTERVAL 3 DAY

Notice: no is_incremental() block. The source filter applies on every run (full refresh and incremental). The incremental_predicates config controls which rows in the target get replaced. On full refresh, the predicates are ignored and the full table is rebuilt.

This means the source query always returns only 3 days of data. On a full refresh, that builds a table with only 3 days. If you need the full refresh to process all historical data, wrap the WHERE in is_incremental():

{% if is_incremental() %}
WHERE event_date >= CURRENT_DATE - INTERVAL 3 DAY
{% endif %}
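
Under the hood, dbt-databricks compiles this strategy to Delta Lake's INSERT ... REPLACE WHERE. A simplified sketch of the rendered statement, with analytics.fct_events as a placeholder:

INSERT INTO analytics.fct_events
REPLACE WHERE event_date >= CURRENT_DATE - INTERVAL 3 DAY
SELECT
    event_id,
    event_date,
    event_name,
    user_id,
    event_properties
FROM analytics.base__segment__events
WHERE event_date >= CURRENT_DATE - INTERVAL 3 DAY

Delta runs the delete-and-append as a single transaction, which is where the atomicity comes from.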

Merge with CDC Pattern (Databricks)

Use this for Change Data Capture workloads where you need to propagate inserts, updates, and deletes from a source system. The Databricks-specific matched_condition and not_matched_by_source_action options (sketched at the end of this section) enable patterns that would require custom macros on other warehouses.

{{ config(
    materialized='incremental',
    unique_key='customer_id',
    incremental_strategy='merge',
    merge_with_schema_evolution=true
) }}

SELECT
    customer_id,
    customer_name,
    email,
    updated_at,
    is_deleted
FROM {{ ref('base__crm__customers') }}
{% if is_incremental() %}
WHERE updated_at >= (SELECT MAX(updated_at) - INTERVAL 3 DAY FROM {{ this }})
{% endif %}
-- dedupe to the latest version of each customer before merging
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY customer_id
    ORDER BY updated_at DESC
) = 1

The merge_with_schema_evolution=true setting lets new columns in the source automatically propagate to the target without requiring on_schema_change configuration or ALTER TABLE statements.

The QUALIFY clause pre-deduplicates to one row per customer_id before the merge. That keeps reruns idempotent and avoids the error Delta Lake raises when multiple source rows match the same target row.
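
As a sketch of the Databricks-specific merge options mentioned above (available in dbt-databricks 1.9+; the condition strings and aliases here are illustrative):

{{ config(
    materialized='incremental',
    unique_key='customer_id',
    incremental_strategy='merge',
    source_alias='src',
    target_alias='tgt',
    -- only rewrite rows that actually changed
    matched_condition='src.updated_at > tgt.updated_at',
    -- propagate hard deletes: remove target rows absent from the source
    not_matched_by_source_action='delete'
) }}

One caution: not_matched_by_source_action='delete' removes every target row that does not appear in the incoming batch, so it is only safe when the model selects a full snapshot. Combine it with an is_incremental() lookback filter and you will delete history.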

Choosing Your Pattern

Your situation → Pattern to use
Row-level updates, growing table, BigQuery → Merge with predicates
Large table (100M+ rows), Snowflake → Delete+insert
Time-partitioned facts, BigQuery → Insert_overwrite with static partitions
Unknown partition set, BigQuery → Insert_overwrite with dynamic partitions
Date-bounded updates, Databricks → Replace_where
CDC with deletes, Databricks → Merge with schema evolution

For the full decision logic behind these choices, see the strategy decision framework. For warehouse-specific gotchas and limitations, see the warehouse behaviors note.