Platform-specific behavior for dbt incremental strategies on BigQuery, Snowflake, and Databricks. The same strategy name produces different behavior across warehouses — differences significant enough to cause silent failures or unexpected costs when porting models between platforms.
BigQuery
Merge Quirks
BigQuery’s merge requires unique_key. This isn’t just a best practice — without it, the merge fails entirely. On Snowflake and Databricks, omitting unique_key with merge produces append behavior. On BigQuery, it produces an error. If you’re porting models from Snowflake, this is one of the first things that breaks.
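A minimal merge config that satisfies the requirement (the model and column names here are hypothetical):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='event_id'  -- required by BigQuery's merge; optional on Snowflake/Databricks
) }}

select * from {{ ref('stg_events') }}  -- hypothetical staging model
```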
BigQuery does not support dynamic partition pruning in merge operations. A subquery in your incremental_predicates won’t trigger pruning:
```sql
-- This does NOT prune on BigQuery
incremental_predicates=[
    "DBT_INTERNAL_DEST.event_date = (SELECT MAX(event_date) FROM source_table)"
]

-- This DOES prune
incremental_predicates=[
    "DBT_INTERNAL_DEST.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)"
]
```

BigQuery needs literal values or deterministic date arithmetic at query planning time. Subqueries produce values at execution time, which is too late for the partition pruner. This is the same limitation that affects all BigQuery partition pruning: it’s not merge-specific, but it’s more painful in the merge context because the destination scan is the expensive part.
The require_partition_filter=true table option interacts badly with merge. If you’ve enabled this option on your target table (a common cost governance guardrail), the merge statement fails unless you also set incremental_predicates that include a partition filter. The MERGE operation doesn’t inherently know which partitions to touch, and BigQuery enforces the partition filter requirement on the destination scan.
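One hedged way to keep the two features compatible, assuming a date-partitioned target (names illustrative): pass a literal-arithmetic predicate so the destination scan carries a partition filter.

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='event_id',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    require_partition_filter=true,
    incremental_predicates=[
        -- Deterministic date arithmetic: satisfies the filter requirement and prunes
        "DBT_INTERNAL_DEST.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)"
    ]
) }}
```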
Insert_Overwrite Strengths
Insert_overwrite is where BigQuery shines. It’s the most cost-effective strategy for time-partitioned fact tables, and BigQuery’s implementation is the most mature across all three warehouses.
You have two modes:
Dynamic partitions: BigQuery queries the source to discover which partitions contain new data, then replaces those partitions atomically. The discovery query adds overhead but handles cases where you don’t know in advance which partitions will be affected (a sketch follows the static example below).
Static partitions: You explicitly list the partitions to replace. Faster than dynamic because BigQuery skips the discovery query entirely. The _dbt_max_partition variable that appears in some dynamic partition queries is BigQuery scripting, not Jinja — it runs at query time in the warehouse, not during dbt compilation.
```sql
-- Static partitions: faster, no discovery overhead
{% set partitions_to_replace = [
    'CURRENT_DATE()',
    'DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)',
    'DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)'
] %}

{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    partitions=partitions_to_replace
) }}
```

Teams switching from merge to insert_overwrite with static partitions report 100–200x cost reductions on large tables: merge scans the full destination table while insert_overwrite only touches the partitions being replaced.
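For the dynamic mode, a minimal sketch (stg_events is a hypothetical source): omit the partitions list and filter on _dbt_max_partition so only partitions with new data are rebuilt.

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'}
) }}

select * from {{ ref('stg_events') }}
{% if is_incremental() %}
  -- _dbt_max_partition resolves in the warehouse at query time,
  -- holding the destination's current max partition value
  where event_date >= _dbt_max_partition
{% endif %}
```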
What BigQuery Doesn’t Support
BigQuery doesn’t support append or delete+insert strategies. If you need append-like behavior, use merge without unique_key… which also doesn’t work on BigQuery. Your actual option is to use a post-hook to INSERT INTO without deduplication, but at that point you’re fighting the tool. For append-only patterns on BigQuery, insert_overwrite with a partition list is a cleaner path.
Snowflake
Merge Behavior
Snowflake’s merge follows standard MERGE INTO syntax and works as expected — until your data isn’t clean. The “nondeterministic merge” error surfaces when your unique_key isn’t truly unique in the source data. If two source rows match the same destination row, Snowflake doesn’t know which one to apply and throws an error rather than picking arbitrarily.
```
-- This error means your unique_key has duplicates in the source
SQL compilation error: Merge is nondeterministic:
multiple source rows matched the same target row
```

The fix is to pre-deduplicate in your SELECT (see idempotent incremental models). The error is actually a feature: it surfaces a data quality problem at merge time, instead of letting duplicate-key rows slip through and corrupt the target silently.
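A minimal deduplication sketch using Snowflake’s QUALIFY clause; order_id, updated_at, and the source model are hypothetical names:

```sql
-- Keep only the most recent source row per key before the merge runs
select *
from {{ ref('stg_orders') }}  -- hypothetical staging model
qualify row_number() over (
    partition by order_id
    order by updated_at desc
) = 1
```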
Performance degrades noticeably past 100M rows. Snowflake’s micro-partition pruning helps with incremental predicates, but the row-by-row comparison inherent to merge remains the bottleneck at scale.
Delete+Insert Advantages
Delete+insert is Snowflake’s escape hatch from merge at scale. Benchmarks show 3.4x faster execution at 500M+ rows compared to merge. The performance gap widens as tables grow because delete+insert avoids the row-by-row comparison entirely.
One configuration detail that trips people up: when unique_key is defined, you may need tmp_relation_type: table to avoid materialization issues:
```sql
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='delete+insert',
    tmp_relation_type='table'  -- forces a temp table instead of a view
) }}
```

Delete+insert is not atomic. The delete and insert are two separate operations. If the process fails between them, you have missing data until the next successful run. For most batch analytics workloads, this brief intermediate state is acceptable. For anything powering real-time decisions, it’s not.
The Insert_Overwrite Trap
This is the single most dangerous gotcha across all three warehouses: insert_overwrite on Snowflake replaces the entire table, not individual partitions. The strategy name implies partition-level replacement (and that’s exactly what it does on BigQuery and Databricks), but Snowflake’s implementation is a full table replacement.
If your incremental query selects only the last 3 days of data and you’re using insert_overwrite on Snowflake, you don’t get a table with the last 3 days updated — you get a table containing only the last 3 days. Everything else is gone.
This naming inconsistency has caused data loss for many teams. There’s no configuration that changes this behavior. On Snowflake, if you need partition-like replacement, use delete+insert with explicit date predicates in your WHERE clause.
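One hedged sketch of that pattern, assuming a date-keyed fact table (names illustrative): keying delete+insert on the date column deletes and reloads the affected days as a unit.

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='event_date'  -- deletes target rows for each date present in the new data
) }}

select * from {{ ref('stg_events') }}
{% if is_incremental() %}
  where event_date >= dateadd('day', -3, current_date())
{% endif %}
```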
Databricks
Merge on Delta Lake
Databricks merge benefits from Delta Lake’s ACID guarantees, and it offers the most configuration options of any warehouse. Beyond the basics, Databricks supports:
Schema evolution: merge_with_schema_evolution=true automatically adds new columns from the source to the target during merge operations. This is more graceful than dbt’s on_schema_change options because it happens at the Delta Lake level rather than requiring ALTER TABLE statements.
Advanced CDC patterns: matched_condition lets you add additional filters to the WHEN MATCHED clause, and not_matched_by_source_action='delete' enables automatic deletion of records that exist in the target but not in the source. This makes the merge strategy viable for CDC pipelines that need to propagate deletes.
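A hedged sketch combining those CDC options; the alias configs and column names are assumptions, so verify against the compiled SQL for your adapter version:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='customer_id',
    target_alias='t',    -- assumed config name; aliases the destination in the merge SQL
    source_alias='s',
    matched_condition='s.updated_at > t.updated_at',  -- only apply newer source rows
    not_matched_by_source_action='delete'             -- propagate deletes from the source
) }}
```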
The schema evolution option, by contrast, is a one-line config:

```sql
{{ config(
    materialized='incremental',
    unique_key='customer_id',
    incremental_strategy='merge',
    merge_with_schema_evolution=true
) }}
```

Insert_Overwrite Specifics
Databricks uses dynamic partition overwrite by default, which usually works correctly. But a v1.8.0 regression in the dbt-databricks adapter changed the behavior to full table replacement instead of partition replacement. If you’re on a version affected by this regression, set the Spark configuration explicitly:
```
spark.sql.sources.partitionOverwriteMode=DYNAMIC
```

This is a cluster-level or session-level setting, not a dbt config. If you’re debugging unexpected data loss with insert_overwrite on Databricks, this setting is the first thing to check.
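There’s no dbt-level switch for this, but one hedged workaround, assuming your cluster accepts session-level SET statements, is to issue the setting as a pre-hook so it applies to the model’s session:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by='event_date',
    pre_hook='SET spark.sql.sources.partitionOverwriteMode = DYNAMIC'
) }}
```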
Replace_Where: The Databricks-Only Option
replace_where is often the best choice for date-bounded updates on Databricks, especially on SQL Warehouses where insert_overwrite partition behavior can be less predictable. It uses Delta Lake’s INSERT INTO ... REPLACE WHERE syntax:
```sql
{{ config(
    materialized='incremental',
    incremental_strategy='replace_where',
    incremental_predicates=["event_date >= CURRENT_DATE - INTERVAL 3 DAY"]
) }}
```

The predicate can be any expression, not just a partition boundary. This makes replace_where more flexible than insert_overwrite while remaining atomic. The tradeoff: it’s Databricks-only and requires Delta Lake, so your model isn’t portable if you ever need to support another warehouse.
Cross-Warehouse Strategy Mapping
When you’re deciding which strategy to use and you support multiple warehouses, this mapping shows the practical equivalents:
| Use Case | BigQuery | Snowflake | Databricks |
|---|---|---|---|
| Row-level updates, small tables | merge | merge | merge |
| Row-level updates, large tables | merge + predicates | delete+insert | merge (Delta optimized) |
| Partition replacement | insert_overwrite | delete+insert (with date filter) | insert_overwrite or replace_where |
| Append-only | insert_overwrite (overwrite today’s partition) | append | append |
The same intent maps to different strategies across warehouses. “Replace today’s data” is insert_overwrite on BigQuery, delete+insert on Snowflake, and insert_overwrite or replace_where on Databricks. The decision framework applies per-warehouse, not as a universal choice.