Adrienne Vermorel

Merge vs. Delete+Insert vs. Insert_Overwrite: Choosing the Right dbt Strategy

Merge is the default incremental strategy in dbt. It’s also the strategy most likely to blow up your warehouse bill once tables grow past 100 million rows.

Merge often does more work than you need. When you’re replacing entire partitions of event data, row-by-row comparison is wasteful. When your unique key guarantees no duplicates exist, a full MERGE statement is overkill.

Choosing the right strategy can cut run times significantly and reduce scan costs by orders of magnitude on large tables.

This guide breaks down how each strategy works, when each makes sense, and what to watch for on BigQuery, Snowflake, and Databricks.

Strategy Availability by Warehouse

Not every strategy works on every warehouse. Here’s what you can actually use:

| Strategy | BigQuery | Snowflake | Databricks |
|---|---|---|---|
| append | ❌ | ✅ | ✅ |
| merge | ✅ (default) | ✅ (default) | ✅ (default) |
| delete+insert | ✅ (v1.11+) | ✅ | ❌ |
| insert_overwrite | ✅ | ⚠️ Full table only | ✅ |
| replace_where | ❌ | ❌ | ✅ (Delta only) |

The Snowflake limitation on insert_overwrite is the one that catches people. Unlike BigQuery and Databricks, Snowflake’s insert_overwrite replaces the entire table, not specific partitions. For partition-level replacement on Snowflake, use delete+insert instead.

How Each Strategy Works

Merge

Merge compares every incoming row against the destination table using your unique_key. Matching rows get updated. Non-matching rows get inserted.

MERGE INTO target AS DBT_INTERNAL_DEST
USING staging AS DBT_INTERNAL_SOURCE
ON DBT_INTERNAL_DEST.id = DBT_INTERNAL_SOURCE.id
WHEN MATCHED THEN UPDATE SET col1 = DBT_INTERNAL_SOURCE.col1, ...
WHEN NOT MATCHED THEN INSERT (col1, col2) VALUES (...)

The generated SQL scans the entire destination table unless you add incremental_predicates to limit the comparison window.

Strengths: Handles mixed inserts and updates cleanly. Atomic operation. Simplest mental model.

Weaknesses: Full table scans get expensive fast. Performance degrades significantly past 100M rows.

Delete+Insert

Delete+insert runs two separate operations: first delete all matching records, then insert the new batch.

-- Step 1: Remove existing records
DELETE FROM target
USING tmp
WHERE tmp.unique_key = target.unique_key
-- Step 2: Insert fresh data
INSERT INTO target SELECT * FROM tmp

The two-step nature creates a brief intermediate state. If the process fails between delete and insert, you’ll have missing data until the next successful run.

Strengths: Faster than merge at scale (3.4x faster at 500M rows on Snowflake). Works well for batch replacements.

Weaknesses: Not atomic. Requires unique_key. Creates intermediate states on failure.

Insert_Overwrite

Insert_overwrite replaces entire partitions rather than comparing individual rows. On BigQuery, it identifies which partitions contain new data, then atomically swaps them out.

-- BigQuery determines affected partitions
-- BigQuery determines affected partitions
DECLARE dbt_partitions_for_replacement ARRAY<DATE>;
SET (dbt_partitions_for_replacement) = (
    SELECT AS STRUCT ARRAY_AGG(DISTINCT DATE(partition_col))
    FROM source_table
);

-- Atomic partition replacement
MERGE INTO target USING staging ON FALSE
WHEN NOT MATCHED BY SOURCE
    AND partition_col IN UNNEST(dbt_partitions_for_replacement)
    THEN DELETE
WHEN NOT MATCHED THEN INSERT ...

Strengths: Extremely efficient for time-partitioned data. No row comparison overhead. Handles late-arriving data naturally within overwritten partitions.

Weaknesses: Requires partitioning. Can’t update specific rows within a partition. Behavior differs significantly across warehouses.

Replace_Where (Databricks Only)

Replace_where is Databricks’ predicate-based alternative to partition overwrite. Instead of replacing partitions, it replaces all rows matching a condition.

INSERT INTO table REPLACE WHERE date_col >= '2024-01-01'

Strengths: More flexible than partition-based overwrite. Works on SQL Warehouses. Atomic.

Weaknesses: Databricks only. Requires Delta Lake.

Warehouse-Specific Behaviors

BigQuery

Merge quirks:

  • Requires unique_key. Without it, merge fails (not append behavior).
  • Does not support dynamic partition pruning. Subqueries in predicates won’t trigger pruning.
  • Full table scans are the norm unless you add incremental_predicates or cluster your table.

Insert_overwrite strengths:

  • Requires a partitioned table (set via the partition_by config)
  • Static partitions (explicit list) are faster than dynamic partition discovery
  • The _dbt_max_partition variable is BigQuery scripting, not Jinja (it runs at query time)
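
For the dynamic case, a minimal sketch (reusing the event model from the configuration examples below): without a static partitions list, dbt declares _dbt_max_partition from the existing table and you filter the source against it.

{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'}
) }}

SELECT
    event_id,
    event_date,
    event_name,
    user_id,
    event_properties
FROM {{ ref('base__segment__events') }}

{% if is_incremental() %}
-- _dbt_max_partition is a BigQuery scripting variable dbt declares at run time;
-- it holds the max event_date already present in the destination table
WHERE event_date >= _dbt_max_partition
{% endif %}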

BigQuery gotcha: You can’t use require_partition_filter=true with merge unless you also set incremental_predicates. The merge statement doesn’t know which partitions to touch.
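
A hedged sketch of a merge config that satisfies both settings at once (same events model as the examples below; the 7-day and 3-day windows are illustrative):

{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='event_id',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    require_partition_filter=true,
    -- gives the MERGE an explicit partition filter on the destination side
    incremental_predicates=[
        "DBT_INTERNAL_DEST.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)"
    ]
) }}

SELECT event_id, event_date, event_name, user_id, event_properties
FROM {{ ref('base__segment__events') }}

{% if is_incremental() %}
-- filtering on CURRENT_DATE() avoids an unfiltered query against {{ this }},
-- which require_partition_filter would reject
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
{% endif %}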

Cost impact: Teams switching from merge to insert_overwrite with static partitions report 100-200x cost reductions on large tables.

Snowflake

Merge behavior:

  • Default strategy, standard MERGE INTO syntax
  • Fails with “nondeterministic merge” if your unique_key isn’t truly unique (see the dedup sketch after this list)
  • Performance degrades noticeably past 100M rows
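
One way to guard against the nondeterministic-merge error is to deduplicate inside the model before dbt runs the MERGE — a minimal sketch, assuming updated_at identifies the latest version of each order_id:

SELECT *
FROM {{ ref('base__shopify__orders') }}
-- QUALIFY keeps only the newest row per key, so the MERGE sees each order_id exactly once
QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC) = 1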

Delete+insert advantages:

  • 3.4x faster than merge at 500M+ rows in benchmarks
  • Requires tmp_relation_type: table when unique_key is defined
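
As a sketch, the Snowflake delete+insert example from later in this guide with that setting added:

{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='delete+insert',
    -- materialize the incoming batch as a table rather than a view before the delete/insert steps
    tmp_relation_type='table'
) }}

SELECT order_id, customer_id, order_date, updated_at, total_amount
FROM {{ ref('base__shopify__orders') }}

{% if is_incremental() %}
WHERE updated_at >= (SELECT MAX(updated_at) - INTERVAL '3 days' FROM {{ this }})
{% endif %}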

Critical limitation: Insert_overwrite on Snowflake replaces the entire table, not partitions. This makes it useless for incremental processing. If you need partition-level replacement on Snowflake, use delete+insert with date predicates in your WHERE clause.

Databricks

Merge on Delta Lake:

  • Uses ACID-compliant MERGE INTO
  • Supports schema evolution via merge_with_schema_evolution=true
  • Advanced options like matched_condition and not_matched_by_source_action='delete' for CDC patterns
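
A hedged sketch of the schema-evolution option on an orders model (the exact option set depends on your CDC pattern; treat this as illustrative, not a drop-in recipe):

{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='order_id',
    -- lets the MERGE add columns that appear in the source to the Delta target automatically
    merge_with_schema_evolution=true
) }}

SELECT order_id, customer_id, order_date, updated_at, total_amount
FROM {{ ref('base__shopify__orders') }}

{% if is_incremental() %}
WHERE updated_at >= CURRENT_DATE - INTERVAL 3 DAYS
{% endif %}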

Insert_overwrite specifics:

  • Uses dynamic partition overwrite by default
  • A v1.8.0 regression caused full table replacement instead of partition replacement. Fix by setting spark.sql.sources.partitionOverwriteMode=DYNAMIC
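
If you can’t change the cluster configuration, one possible per-model workaround is a pre-hook (a sketch; whether a session-level SET is honored depends on your compute):

{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by=['event_date'],
    -- pin dynamic partition overwrite for this model's session
    pre_hook="SET spark.sql.sources.partitionOverwriteMode = DYNAMIC"
) }}

SELECT event_id, event_date, event_name, user_id, event_properties
FROM {{ ref('base__segment__events') }}

{% if is_incremental() %}
WHERE event_date >= CURRENT_DATE - INTERVAL 3 DAYS
{% endif %}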

Replace_where: Often the best choice for date-bounded updates on SQL Warehouses where partition behavior is less predictable.

Performance Data

Real benchmarks from production systems:

| Scenario | Merge | Better Alternative | Improvement |
|---|---|---|---|
| 500M rows, Snowflake | 44 min | 13 min (delete+insert) | 3.4x |
| 192 GB partition, BigQuery | 43 min | 26 min (insert_overwrite) | 1.7x |
| Full vs. incremental scan, BigQuery | 24.6 GB | 500 MB | 50x data reduction |
| With incremental_predicates | 9.5 s | 4 s | 2.4x |

Merge works fine at smaller scales but becomes a bottleneck once tables grow.

Configuration Examples

Merge with Predicates (Large Tables)

{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge',
    incremental_predicates=[
        "DBT_INTERNAL_DEST.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)"
    ]
) }}

SELECT
    event_id,
    event_date,
    event_name,
    user_id,
    event_properties
FROM {{ ref('base__segment__events') }}

{% if is_incremental() %}
WHERE event_date >= (SELECT MAX(event_date) - INTERVAL 3 DAY FROM {{ this }})
{% endif %}

The incremental_predicates filter the destination table during the merge. This enables partition pruning and avoids full table scans.

Delete+Insert on Snowflake

{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='delete+insert'
) }}

SELECT
    order_id,
    customer_id,
    order_date,
    updated_at,
    total_amount
FROM {{ ref('base__shopify__orders') }}

{% if is_incremental() %}
WHERE updated_at >= (SELECT MAX(updated_at) - INTERVAL '3 days' FROM {{ this }})
{% endif %}

Insert_Overwrite with Static Partitions (BigQuery)

{% set partitions_to_replace = [
    'CURRENT_DATE()',
    'DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)',
    'DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)'
] %}

{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    partitions=partitions_to_replace
) }}

SELECT
    event_id,
    event_date,
    event_name,
    user_id,
    event_properties
FROM {{ ref('base__segment__events') }}

{% if is_incremental() %}
WHERE event_date IN ({{ partitions_to_replace | join(',') }})
{% endif %}

Static partitions are faster than dynamic because BigQuery doesn’t need to query the source table to determine which partitions to replace.

Replace_Where on Databricks

{{ config(
    materialized='incremental',
    incremental_strategy='replace_where',
    incremental_predicates=["event_date >= CURRENT_DATE - INTERVAL 3 DAY"]
) }}

SELECT
    event_id,
    event_date,
    event_name,
    user_id,
    event_properties
FROM {{ ref('base__segment__events') }}
WHERE event_date >= CURRENT_DATE - INTERVAL 3 DAY

Decision Framework

Use merge when:

  • Tables are under 100M rows
  • You need to update specific rows based on changing source data
  • Simplicity matters more than optimization
  • You’re handling CDC with mixed inserts, updates, and deletes

Use delete+insert when:

  • Tables exceed 100M rows on Snowflake
  • Your updates are batch-oriented (replacing all records for a time window)
  • You can tolerate brief intermediate states during the two-step operation

Use insert_overwrite when:

  • Tables are time-partitioned (BigQuery, Databricks)
  • You’re processing event or fact data where entire days/partitions get reprocessed
  • Late-arriving data falls within your overwrite window
  • Cost optimization is a priority

Use replace_where when:

  • You’re on Databricks with Delta Lake
  • You need predicate-based replacement without partition dependencies
  • SQL Warehouse behavior with insert_overwrite is unpredictable

Use append when:

  • Data is truly append-only (logs, events with no updates)
  • You handle deduplication downstream
  • You’re on Snowflake or Databricks (BigQuery doesn’t support it)

Recommendations by Warehouse

BigQuery: Default to insert_overwrite for fact tables with time partitions. Use static partition lists when possible. Fall back to merge with incremental_predicates for dimension tables requiring row-level updates.

Snowflake: Start with merge for tables under 100M rows. Switch to delete+insert for larger tables or batch-oriented updates. Avoid insert_overwrite entirely; it doesn’t do what you’d expect.

Databricks: Merge works well with Delta Lake’s optimizations. Consider replace_where for cleaner predicate-based updates. Use insert_overwrite for partition-aligned workloads, but verify partitionOverwriteMode is set correctly.

What to Watch For

Merge at scale: Monitor query costs and run times. If a model that used to take 5 minutes now takes 30, your table probably outgrew the merge strategy.

Unique key integrity: Nulls in unique_key columns cause match failures and duplicates. Composite keys need all columns to be non-null.
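
If parts of a composite key can be null, one common guard (assuming the dbt_utils package is installed; the order-lines model and columns here are hypothetical) is a surrogate key that coalesces its parts before hashing:

SELECT
    -- hashes the coalesced parts, so a NULL component no longer breaks merge matching
    {{ dbt_utils.generate_surrogate_key(['order_id', 'order_line_number']) }} AS order_line_key,
    order_id,
    order_line_number,
    updated_at
FROM {{ ref('base__shopify__order_lines') }}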

Schema changes: Adding columns with on_schema_change='append_new_columns' works for new data but won’t backfill historical records. Plan accordingly.
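
A minimal sketch of that setting on the events model used in the examples above:

{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge',
    -- new source columns get added to the target; existing rows stay NULL until backfilled
    on_schema_change='append_new_columns'
) }}

SELECT event_id, event_date, event_name, user_id, event_properties
FROM {{ ref('base__segment__events') }}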

Late-arriving data: Insert_overwrite handles late data within overwritten partitions. It does not catch records arriving after the partition window closes. Periodic full refreshes remain necessary.

The right strategy depends on your data patterns, table sizes, and warehouse. Start simple with merge, measure your costs and run times, then optimize when the numbers justify the added complexity.