The incremental strategy determines how new data enters an existing table, what happens on failure, and how costs scale as tables grow. The default strategy (merge) works for small-to-medium tables but becomes expensive above 100M rows without additional configuration.
This note covers what each strategy does, when each fits, and warehouse-specific behaviors that affect the choice.
Strategy Availability by Warehouse
Not every strategy works everywhere. This table saves you from configuring something your warehouse will reject.
| Strategy | BigQuery | Snowflake | Databricks |
|---|---|---|---|
| append | Not available | Available | Available |
| merge | Default | Default | Default |
| delete+insert | Not available | Available | Available (v1.11+) |
| insert_overwrite | Available | Full table only | Available |
| replace_where | Not available | Not available | Delta Lake only |
| microbatch | Available | Available | Available |
The critical gotcha: insert_overwrite on Snowflake replaces the entire table, not individual partitions. This naming inconsistency has caused data loss. On Snowflake, use delete+insert with date predicates for partition-like behavior.
What Each Strategy Does
Merge
Compares every incoming row against the destination table using your unique_key. Matching rows update; non-matching rows insert. The generated SQL is a standard MERGE INTO statement.
The problem is scan scope. Without incremental_predicates, the merge scans every row in the destination to find matches — even if you’re inserting 1,000 records into a 500M-row table. That full scan happens on every run, meaning costs scale with table size rather than with new data volume. Above 100M rows, this becomes visibly slow and expensive.
Merge is atomic: it either completes fully or rolls back. That makes it the safest option when you need row-level updates on small-to-medium tables.
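A minimal sketch of that pattern, assuming Snowflake syntax; the model name, columns, and the 7-day predicate window are illustrative:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='order_id',
    incremental_predicates=[
        "DBT_INTERNAL_DEST.order_date >= dateadd(day, -7, current_date)"
    ]
) }}

select order_id, order_date, status, amount
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- only pull source rows newer than what the target already holds
  where order_date >= (select max(order_date) from {{ this }})
{% endif %}
```

The predicate limits the generated MERGE to scanning the last 7 days of the destination instead of every row.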
Delete+Insert
Runs two separate operations: delete all matching records by unique_key, then insert the new batch. The two-step nature is the key trade-off. If the process fails between delete and insert, records are gone until the next successful run or a full refresh.
On Snowflake, delete+insert benchmarks 3.4x faster than merge on tables of 500M+ rows. The performance gap widens as tables grow because delete+insert avoids the row-by-row comparison that makes merge expensive.
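A comparable delete+insert sketch on Snowflake; the model and column names are illustrative, and unique_key here is the date column so each run replaces whole days:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='event_date'
) }}

select event_date, user_id, event_type, payload
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- every event_date present in this batch is deleted from the target, then re-inserted
  where event_date >= dateadd(day, -3, current_date)
{% endif %}
```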
Insert_Overwrite
Replaces entire partitions rather than comparing individual rows. On BigQuery, it discovers which partitions contain new data and atomically swaps them out. No row comparison, no full table scan. You can use static partition lists (faster, no discovery query) or dynamic partition discovery.
This strategy requires partitioned tables — typically time-partitioned by a date or timestamp column. It cannot update specific rows within a partition; the entire partition gets replaced.
Static partitions are the most cost-effective approach on BigQuery:
```sql
{% set partitions_to_replace = [
    'CURRENT_DATE()',
    'DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)',
    'DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)'
] %}

{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    partitions=partitions_to_replace
) }}
```

Teams switching from merge to insert_overwrite with static partitions report 100-200x cost reductions on large tables.
Replace_Where (Databricks Only)
Replaces all rows matching a predicate condition, using Delta Lake’s INSERT INTO ... REPLACE WHERE syntax. More flexible than partition-based overwrite because the predicate can be any expression, not just a partition boundary. Atomic.
```sql
{{ config(
    materialized='incremental',
    incremental_strategy='replace_where',
    incremental_predicates=["event_date >= CURRENT_DATE - INTERVAL 3 DAY"]
) }}
```

This is often the cleanest choice on Databricks for date-bounded updates, especially on SQL Warehouses where insert_overwrite partition behavior can be unpredictable.
Microbatch (dbt 1.9+)
Processes data in time-based batches — one query per hour, day, or month — without writing is_incremental() logic. You configure event_time, batch_size, and lookback; dbt handles the rest.
Key advantages over traditional incremental: batch-level retry (a failure in one day’s batch doesn’t require reprocessing the entire run), built-in backfill via --event-time-start and --event-time-end flags, and automatic upstream filtering when referenced models also have event_time configured.
Under the hood, microbatch delegates to different strategies per warehouse: insert_overwrite on BigQuery (requires partition_by), delete+insert on Snowflake, and replace_where on Databricks.
Limitations: the minimum batch size is one hour (no sub-hourly batches), all time calculations use UTC, and batches run sequentially by default.
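A minimal microbatch sketch; the model, event_time column, and begin date are illustrative:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_occurred_at',
    batch_size='day',
    lookback=3,
    begin='2024-01-01'
) }}

-- dbt filters this query to one batch window per run; no is_incremental() block needed
select event_occurred_at, user_id, event_type
from {{ ref('stg_events') }}
```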
Append
Insert-only with no deduplication. The fastest strategy because there’s no comparison or deletion step. Only appropriate when data is truly immutable — raw event logs, audit trails, append-only streams. You handle deduplication downstream. Not available on BigQuery.
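A bare append sketch, with an illustrative source and timestamp column:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='append'
) }}

-- no unique_key: every run simply inserts the new rows
select log_id, logged_at, message
from {{ ref('stg_audit_log') }}

{% if is_incremental() %}
  where logged_at > (select max(logged_at) from {{ this }})
{% endif %}
```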
Decision Framework
Start with the data pattern and table size, then factor in your warehouse.
Use merge when:
- Tables are under 100M rows
- You need row-level updates (individual records change over time)
- You’re handling CDC patterns with mixed inserts, updates, and deletes
- Atomicity matters and you can’t tolerate intermediate failure states
Add incremental_predicates as soon as table size makes full scans noticeable. This limits the destination scan to recent partitions during the merge and is the single biggest performance lever for merge on large tables.
Use delete+insert when:
- Tables exceed 100M rows on Snowflake
- Updates are batch-oriented (replacing all records for a time window)
- You can tolerate a brief intermediate state if the job fails mid-execution
- You need partition-like behavior on Snowflake (since insert_overwrite won’t help)
Use insert_overwrite when:
- Tables are time-partitioned on BigQuery or Databricks
- You’re processing event or fact data where entire partitions get reprocessed
- Cost optimization is a priority (static partitions are especially cheap)
- Late-arriving data falls within your overwrite window
Use microbatch when:
- Your data has a clear timestamp column and you process time-bounded batches
- You want built-in backfill support without writing custom scripts
- Batch-level retry would save significant reprocessing time on failure
- You’d rather configure incremental logic than write is_incremental() SQL
Use append when:
- Data is truly append-only with no corrections or updates
- Deduplication happens in a downstream layer
- You’re on Snowflake or Databricks
Warehouse-Specific Recommendations
BigQuery: Default to insert_overwrite with static partition lists for fact tables. Use merge with incremental_predicates for dimension tables needing row-level updates. Microbatch requires partition_by configuration matching your batch_size.
Snowflake: Start with merge for tables under 100M rows. Switch to delete+insert when tables grow or for batch-oriented updates. Never use insert_overwrite for incremental processing — it replaces the entire table. Microbatch uses delete+insert under the hood and works without extra configuration.
Databricks: Merge works well with Delta Lake’s ACID guarantees and supports schema evolution. Replace_where is often cleaner than insert_overwrite for date-bounded updates. Check that spark.sql.sources.partitionOverwriteMode=DYNAMIC is set if using insert_overwrite — a v1.8.0 regression caused full table replacement without it.
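One way to guard against that regression, assuming a cluster-backed connection where session properties can be set and an illustrative partition column, is a pre_hook on the model:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by='event_date',
    pre_hook="SET spark.sql.sources.partitionOverwriteMode = DYNAMIC"
) }}
```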
Performance Benchmarks
These numbers come from production systems and illustrate why strategy selection matters at scale:
| Scenario | Before | After | Improvement |
|---|---|---|---|
| 500M rows, Snowflake | 44 min | 13 min (delete+insert) | 3.4x |
| 192GB partition, BigQuery | 43 min | 26 min (insert_overwrite) | 1.7x |
| Full vs incremental scan, BigQuery | 24.6 GB scanned | 500 MB scanned | 50x reduction |
| With incremental_predicates added | 9.5s | 4s | 2.4x |
The pattern is consistent: merge works at smaller scales, but alternative strategies pull ahead significantly once tables grow past 100M rows.
Common Traps
Null values in unique_key columns cause match failures during merge, silently creating duplicates instead of updates. Every column in your unique_key must be non-null.
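One common guard is to build the key from coalesced components. A sketch using dbt_utils.generate_surrogate_key, assuming the dbt_utils package is installed and with illustrative column names:

```sql
select
    -- generate_surrogate_key hashes the columns and handles nulls consistently,
    -- so the resulting key is never null
    {{ dbt_utils.generate_surrogate_key(['customer_id', 'order_date']) }} as order_key,
    customer_id,
    order_date,
    amount
from {{ ref('stg_orders') }}
```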
Snowflake’s insert_overwrite replaces the entire table. If your incremental query selects only the last 3 days, you lose everything older. Use delete+insert instead.
No lookback window means late-arriving data gets skipped permanently. A 3-day lookback handles the majority of late arrivals in most systems. For deeper patterns, see the late-arriving data strategies article.
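A sketch of a 3-day lookback in the incremental filter, using Snowflake's dateadd and an illustrative date column:

```sql
{% if is_incremental() %}
  -- reprocess the trailing 3 days so late-arriving rows still land
  where event_date >= dateadd(day, -3, (select max(event_date) from {{ this }}))
{% endif %}
```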
Merge without incremental_predicates on large tables negates the benefit of incremental processing — you’re still scanning the whole destination on every run.
Relying solely on unique_key for deduplication fails on the initial full refresh, which doesn’t apply merge logic. Include explicit deduplication (QUALIFY/ROW_NUMBER) in your SELECT.
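A sketch of that explicit deduplication, using QUALIFY with illustrative key and ordering columns:

```sql
select *
from {{ ref('stg_orders') }}
-- keep only the latest version of each order, whether or not merge logic runs
qualify row_number() over (
    partition by order_id
    order by updated_at desc
) = 1
```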