Adrienne Vermorel
Merge vs. Delete+Insert vs. Insert_Overwrite: Choosing the Right dbt Strategy
Merge is the default incremental strategy in dbt. It’s also the strategy most likely to blow up your warehouse bill once tables grow past 100 million rows.
Merge does more work than you often need. When you’re replacing entire partitions of event data, row-by-row comparison is wasteful. When your unique key guarantees no duplicates exist, a full MERGE statement is overkill.
Choosing the right strategy can cut run times significantly and reduce scan costs by orders of magnitude on large tables.
This guide breaks down how each strategy works, when each makes sense, and what to watch for on BigQuery, Snowflake, and Databricks.
Strategy Availability by Warehouse
Not every strategy works on every warehouse. Here’s what you can actually use:
| Strategy | BigQuery | Snowflake | Databricks |
|---|---|---|---|
| append | ❌ | ✅ | ✅ |
| merge | ✅ (default) | ✅ (default) | ✅ (default) |
| delete+insert | ❌ | ✅ | ✅ (v1.11+) |
| insert_overwrite | ✅ | ⚠️ Full table only | ✅ |
| replace_where | ❌ | ❌ | ✅ (Delta only) |
The Snowflake limitation on insert_overwrite is the one that catches people. Unlike BigQuery and Databricks, Snowflake’s insert_overwrite replaces the entire table, not specific partitions. For partition-level replacement on Snowflake, use delete+insert instead.
How Each Strategy Works
Merge
Merge compares every incoming row against the destination table using your unique_key. Matching rows get updated. Non-matching rows get inserted.
```sql
MERGE INTO target AS DBT_INTERNAL_DEST
USING staging AS DBT_INTERNAL_SOURCE
ON DBT_INTERNAL_DEST.id = DBT_INTERNAL_SOURCE.id
WHEN MATCHED THEN UPDATE SET col1 = DBT_INTERNAL_SOURCE.col1, ...
WHEN NOT MATCHED THEN INSERT (col1, col2) VALUES (...)
```
The generated SQL scans the entire destination table unless you add incremental_predicates to limit the comparison window.
Strengths: Handles mixed inserts and updates cleanly. Atomic operation. Simplest mental model.
Weaknesses: Full table scans get expensive fast. Performance degrades significantly past 100M rows.
Delete+Insert
Delete+insert runs two separate operations: first delete all matching records, then insert the new batch.
```sql
-- Step 1: Remove existing records
DELETE FROM target
USING tmp
WHERE tmp.unique_key = target.unique_key

-- Step 2: Insert fresh data
INSERT INTO target
SELECT * FROM tmp
```
The two-step nature creates a brief intermediate state. If the process fails between delete and insert, you’ll have missing data until the next successful run.
Strengths: Faster than merge at scale (3.4x faster at 500M rows on Snowflake). Works well for batch replacements.
Weaknesses: Not atomic. Requires unique_key. Creates intermediate states on failure.
Insert_Overwrite
Insert_overwrite replaces entire partitions rather than comparing individual rows. On BigQuery, it identifies which partitions contain new data, then atomically swaps them out.
```sql
-- BigQuery determines affected partitions
DECLARE dbt_partitions_for_replacement ARRAY<DATE>;
SET (dbt_partitions_for_replacement) = (
    SELECT AS STRUCT ARRAY_AGG(DISTINCT DATE(partition_col))
    FROM source_table
);

-- Atomic partition replacement
MERGE INTO target
USING staging
ON FALSE
WHEN NOT MATCHED BY SOURCE
  AND partition_col IN UNNEST(dbt_partitions_for_replacement)
  THEN DELETE
WHEN NOT MATCHED THEN INSERT ...
```
Strengths: Extremely efficient for time-partitioned data. No row comparison overhead. Handles late-arriving data naturally within overwritten partitions.
Weaknesses: Requires partitioning. Can’t update specific rows within a partition. Behavior differs significantly across warehouses.
Replace_Where (Databricks Only)
Replace_where is Databricks’ predicate-based alternative to partition overwrite. Instead of replacing partitions, it replaces all rows matching a condition.
```sql
INSERT INTO table REPLACE WHERE date_col >= '2024-01-01'
```
Strengths: More flexible than partition-based overwrite. Works on SQL Warehouses. Atomic.
Weaknesses: Databricks only. Requires Delta Lake.
Warehouse-Specific Behaviors
BigQuery
Merge quirks:
- Requires unique_key. Without it, merge fails (it does not fall back to append behavior).
- Does not support dynamic partition pruning: subqueries in predicates won’t trigger pruning.
- Full table scans are the norm unless you add incremental_predicates or cluster your table.
Insert_overwrite strengths:
- Works with partitioned tables (the partition_by configuration is required)
- Static partitions (an explicit list) are faster than dynamic partition discovery
- The _dbt_max_partition variable is BigQuery scripting, not Jinja; it resolves at query time (see the sketch below)
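For reference, here’s a minimal sketch of the dynamic variant, reusing the event model from the configuration examples later in this guide; the lookback logic is illustrative:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'}
) }}

SELECT
    event_id,
    event_date,
    event_name
FROM {{ ref('base__segment__events') }}
{% if is_incremental() %}
-- _dbt_max_partition is declared by dbt's generated BigQuery script, so this
-- filter resolves at query time rather than at Jinja compile time
WHERE event_date >= _dbt_max_partition
{% endif %}
```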
BigQuery gotcha: You can’t use require_partition_filter=true with merge unless you also set incremental_predicates. The merge statement doesn’t know which partitions to touch.
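If you need require_partition_filter=true on a merge model, a minimal sketch of the combination looks like this, again reusing the event model from the configuration examples later on (the seven-day window is illustrative):

```sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    require_partition_filter=true,
    incremental_predicates=[
        "DBT_INTERNAL_DEST.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)"
    ]
) }}
```

The predicate gives the merge an explicit partition filter on the destination side, which is exactly what require_partition_filter demands.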
Cost impact: Teams switching from merge to insert_overwrite with static partitions report 100-200x cost reductions on large tables.
Snowflake
Merge behavior:
- Default strategy, standard MERGE INTO syntax
- Fails with a “nondeterministic merge” error if your unique_key isn’t truly unique (see the deduplication sketch after this list)
- Performance degrades noticeably past 100M rows
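One way to guard against the nondeterministic-merge error is to deduplicate the incoming batch before the merge runs. A minimal sketch, reusing the orders model from the configuration examples below; the ordering column is illustrative:

```sql
-- Keep only the latest version of each order_id in the incoming batch, so the
-- MERGE never matches two source rows against the same target row
SELECT *
FROM {{ ref('base__shopify__orders') }}
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY order_id
    ORDER BY updated_at DESC
) = 1
```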
Delete+insert advantages:
- 3.4x faster than merge at 500M+ rows in benchmarks
- Requires tmp_relation_type: table when a unique_key is defined
Critical limitation: Insert_overwrite on Snowflake replaces the entire table, not partitions. This makes it useless for incremental processing. If you need partition-level replacement on Snowflake, use delete+insert with date predicates in your WHERE clause.
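Here’s a sketch of that pattern: a date-grain unique_key turns delete+insert into partition-style replacement (delete every matching order_date, then re-insert those days). It reuses the orders model from the configuration examples later in this guide and assumes each run pulls complete days from the source:

```sql
{{ config(
    materialized='incremental',
    unique_key='order_date',
    incremental_strategy='delete+insert',
    tmp_relation_type='table'
) }}

SELECT
    order_id,
    customer_id,
    order_date,
    updated_at,
    total_amount
FROM {{ ref('base__shopify__orders') }}
{% if is_incremental() %}
WHERE order_date >= (SELECT MAX(order_date) - INTERVAL '3 DAY' FROM {{ this }})
{% endif %}
```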
Databricks
Merge on Delta Lake:
- Uses ACID-compliant MERGE INTO
- Supports schema evolution via merge_with_schema_evolution=true
- Advanced options like matched_condition and not_matched_by_source_action='delete' support CDC patterns (see the sketch after this list)
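A sketch of those options combined for a CDC-style model, assuming dbt-databricks 1.9+ where they are available; the aliases, condition, and source model are illustrative, and not_matched_by_source_action='delete' only makes sense when each run supplies a full snapshot:

```sql
-- Hypothetical CDC-style merge: update only rows that actually changed, and
-- delete target rows that are missing from the (full-snapshot) source batch
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge',
    merge_with_schema_evolution=true,
    target_alias='tgt',
    source_alias='src',
    matched_condition='src.updated_at > tgt.updated_at',
    not_matched_by_source_action='delete'
) }}

SELECT * FROM {{ ref('base__shopify__orders') }}
```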
Insert_overwrite specifics:
- Uses dynamic partition overwrite by default
- A v1.8.0 regression caused full-table replacement instead of partition replacement. Fix it by setting spark.sql.sources.partitionOverwriteMode=DYNAMIC (a sketch follows this list)
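One way to pin that setting at the model level is a pre-hook, sketched below; this assumes an all-purpose or job cluster where session-level Spark confs can be set (on SQL Warehouses, prefer replace_where):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by='event_date',
    pre_hook="SET spark.sql.sources.partitionOverwriteMode = DYNAMIC"
) }}

SELECT
    event_id,
    event_date,
    event_name
FROM {{ ref('base__segment__events') }}
```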
Replace_where: Often the best choice for date-bounded updates on SQL Warehouses where partition behavior is less predictable.
Performance Data
Real benchmarks from production systems:
| Scenario | Merge | Better Alternative | Improvement |
|---|---|---|---|
| 500M rows, Snowflake | 44 min | 13 min (delete+insert) | 3.4x |
| 192GB partition, BigQuery | 43 min | 26 min (insert_overwrite) | 1.7x |
| Full vs incremental scan, BigQuery | 24.6 GB | 500 MB | 50x data reduction |
| With incremental_predicates | 9.5s | 4s | 2.4x |
Merge works fine at smaller scales but becomes a bottleneck once tables grow.
Configuration Examples
Merge with Predicates (Large Tables)
```sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge',
    incremental_predicates=[
        "DBT_INTERNAL_DEST.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)"
    ]
) }}

SELECT
    event_id,
    event_date,
    event_name,
    user_id,
    event_properties
FROM {{ ref('base__segment__events') }}
{% if is_incremental() %}
WHERE event_date >= (SELECT MAX(event_date) - INTERVAL 3 DAY FROM {{ this }})
{% endif %}
```
The incremental_predicates filter the destination table during the merge. This enables partition pruning and avoids full table scans.
Delete+Insert on Snowflake
```sql
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='delete+insert'
) }}

SELECT
    order_id,
    customer_id,
    order_date,
    updated_at,
    total_amount
FROM {{ ref('base__shopify__orders') }}
{% if is_incremental() %}
WHERE updated_at >= (SELECT MAX(updated_at) - INTERVAL '3 DAY' FROM {{ this }})
{% endif %}
```
Insert_Overwrite with Static Partitions (BigQuery)
```sql
{% set partitions_to_replace = [
    'CURRENT_DATE()',
    'DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)',
    'DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)'
] %}

{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    partitions=partitions_to_replace
) }}

SELECT
    event_id,
    event_date,
    event_name,
    user_id,
    event_properties
FROM {{ ref('base__segment__events') }}
{% if is_incremental() %}
WHERE event_date IN ({{ partitions_to_replace | join(',') }})
{% endif %}
```
Static partitions are faster than dynamic because BigQuery doesn’t need to query the source table to determine which partitions to replace.
Replace_Where on Databricks
```sql
{{ config(
    materialized='incremental',
    incremental_strategy='replace_where',
    incremental_predicates=["event_date >= CURRENT_DATE - INTERVAL 3 DAY"]
) }}

SELECT
    event_id,
    event_date,
    event_name,
    user_id,
    event_properties
FROM {{ ref('base__segment__events') }}
WHERE event_date >= CURRENT_DATE - INTERVAL 3 DAY
```
Decision Framework
Use merge when:
- Tables are under 100M rows
- You need to update specific rows based on changing source data
- Simplicity matters more than optimization
- You’re handling CDC with mixed inserts, updates, and deletes
Use delete+insert when:
- Tables exceed 100M rows on Snowflake
- Your updates are batch-oriented (replacing all records for a time window)
- You can tolerate brief intermediate states during the two-step operation
Use insert_overwrite when:
- Tables are time-partitioned (BigQuery, Databricks)
- You’re processing event or fact data where entire days/partitions get reprocessed
- Late-arriving data falls within your overwrite window
- Cost optimization is a priority
Use replace_where when:
- You’re on Databricks with Delta Lake
- You need predicate-based replacement without partition dependencies
- SQL Warehouse behavior with insert_overwrite is unpredictable
Use append when:
- Data is truly append-only (logs, events with no updates)
- You handle deduplication downstream
- You’re on Snowflake or Databricks (BigQuery doesn’t support it)
Recommendations by Warehouse
BigQuery: Default to insert_overwrite for fact tables with time partitions. Use static partition lists when possible. Fall back to merge with incremental_predicates for dimension tables requiring row-level updates.
Snowflake: Start with merge for tables under 100M rows. Switch to delete+insert for larger tables or batch-oriented updates. Avoid insert_overwrite entirely; it doesn’t do what you’d expect.
Databricks: Merge works well with Delta Lake’s optimizations. Consider replace_where for cleaner predicate-based updates. Use insert_overwrite for partition-aligned workloads, but verify partitionOverwriteMode is set correctly.
What to Watch For
Merge at scale: Monitor query costs and run times. If a model that used to take 5 minutes now takes 30, your table probably outgrew the merge strategy.
Unique key integrity: Nulls in unique_key columns cause match failures and duplicates. Composite keys need all columns to be non-null.
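One common guard is to collapse a composite key into a single surrogate key that is never null. A sketch using dbt_utils (the package and the order-lines model are assumptions):

```sql
SELECT
    -- generate_surrogate_key hashes the components and substitutes a placeholder
    -- for NULLs, so the merge key itself is never null
    {{ dbt_utils.generate_surrogate_key(['order_id', 'order_line_id']) }} AS order_line_key,
    *
FROM {{ ref('base__shopify__order_lines') }}
```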
Schema changes: Adding columns with on_schema_change='append_new_columns' works for new data but won’t backfill historical records. Plan accordingly.
Late-arriving data: Insert_overwrite handles late data within overwritten partitions. It does not catch records arriving after the partition window closes. Periodic full refreshes remain necessary.
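A simple way to schedule that backfill is a periodic job that rebuilds the affected models with dbt’s full-refresh flag (the tag selector is illustrative):

```bash
# Weekly backfill: rebuild incremental event models from scratch so records that
# arrived after their partition window closed are picked up
dbt run --select tag:incremental_events --full-refresh
```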
The right strategy depends on your data patterns, table sizes, and warehouse. Start simple with merge, measure your costs and run times, then optimize when the numbers justify the added complexity.