The merge strategy’s biggest performance problem isn’t the merge itself — it’s the destination table scan. Without incremental_predicates, a MERGE statement compares every incoming row against every row in the destination table. Insert 1,000 records into a 500 million row table, and the warehouse still reads all 500 million rows on the destination side to check for matches. That full scan happens on every single run, meaning your “incremental” model still reads the entire table each time.
incremental_predicates fix this by adding WHERE filters to the destination side of the MERGE statement. The warehouse uses these filters for partition pruning, scanning only the partitions that could possibly contain matching rows.
How It Works
You configure incremental_predicates as a list of SQL expressions in your model’s config block. These expressions are injected into the MERGE statement as filters on the destination table (aliased as DBT_INTERNAL_DEST):
```sql
{{ config(
    materialized='incremental',
    unique_key='id',
    incremental_strategy='merge',
    incremental_predicates=[
        "DBT_INTERNAL_DEST.created_at > dateadd(day, -7, current_date)"
    ]
) }}
```
```sql
SELECT
    id,
    created_at,
    user_id,
    event_type
FROM {{ ref('base__analytics__events') }}
{% if is_incremental() %}
WHERE created_at > dateadd(day, -7, current_date)
{% endif %}
```

The generated MERGE statement looks something like:
```sql
MERGE INTO target AS DBT_INTERNAL_DEST
USING tmp AS DBT_INTERNAL_SOURCE
ON DBT_INTERNAL_DEST.created_at > dateadd(day, -7, current_date)
   AND DBT_INTERNAL_DEST.id = DBT_INTERNAL_SOURCE.id
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT ...
```

Notice the predicate lands in the ON clause alongside the unique_key match. This lets the warehouse prune partitions before doing the row-by-row comparison.
The Two Filters Are Doing Different Jobs
A common source of confusion: the is_incremental() WHERE clause and incremental_predicates look like they’re doing the same thing, but they operate on different tables.
- The is_incremental() WHERE clause filters the source query. It controls which rows enter the temporary staging table.
- incremental_predicates filter the destination table. They control which rows the warehouse scans during the MERGE match.
You need both. Without the source filter, you’re reading too much source data. Without the destination predicate, you’re scanning the entire target table during the merge.
```sql
-- Source filter (limits what goes INTO the temp table)
{% if is_incremental() %}
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
{% endif %}
```
```sql
-- Destination filter (limits what the MERGE scans)
{{ config(
    incremental_predicates=[
        "DBT_INTERNAL_DEST.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)"
    ]
) }}
```

The predicate window should be at least as wide as your source filter. If your source filter covers 7 days, your predicate should cover 7 days or more. A narrower predicate could cause the MERGE to miss legitimate matches on the destination side.
When to Add Predicates
The rule of thumb from practitioners: add incremental_predicates when your destination table exceeds 100M rows. Below that threshold, the full table scan during merge is usually fast enough that the added configuration isn’t worth the complexity.
Above 100M rows, the impact is dramatic. Production benchmarks show 2-4x improvements in merge runtime just from adding predicates. On BigQuery, where you pay per byte scanned, the cost reduction can be even more significant — an 88%+ reduction in destination-side bytes scanned.
BigQuery Considerations
On BigQuery, incremental_predicates interact with the require_partition_filter table option. If you’ve set require_partition_filter=true on a table and you’re using the merge strategy, the MERGE will fail unless you also provide incremental_predicates that include a partition filter. BigQuery enforces the partition filter requirement on the destination scan, and without predicates, there’s no filter to satisfy it.
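As a sketch, a BigQuery model with the requirement enabled might pair the partition config with a matching predicate like this (table and column names are illustrative):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='id',
    partition_by={"field": "event_date", "data_type": "date"},
    require_partition_filter=true,
    -- without this predicate, the MERGE's destination scan has no
    -- partition filter and BigQuery rejects the statement
    incremental_predicates=[
        "DBT_INTERNAL_DEST.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)"
    ]
) }}
```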
BigQuery also can’t perform partition pruning from subqueries. This means a predicate like DBT_INTERNAL_DEST.event_date = (SELECT MAX(event_date) FROM other_table) won’t actually trigger pruning. Use literal expressions or date arithmetic against CURRENT_DATE() instead.
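A hedged before/after sketch of that rewrite (other_table is hypothetical):

```sql
-- Won't prune: BigQuery can't resolve the subquery at planning time
-- incremental_predicates=[
--     "DBT_INTERNAL_DEST.event_date = (SELECT MAX(event_date) FROM other_table)"
-- ]

-- Prunes: a literal date-arithmetic expression the planner can evaluate
{{ config(
    incremental_predicates=[
        "DBT_INTERNAL_DEST.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)"
    ]
) }}
```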
Snowflake Considerations
On Snowflake, predicates help with micro-partition pruning. Snowflake doesn’t have explicit partitions like BigQuery, but it organizes data into micro-partitions and maintains metadata about the value ranges in each. A well-targeted predicate lets Snowflake skip micro-partitions whose ranges don’t overlap with the predicate window.
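Pruning only helps if the predicate column correlates with how data is laid out across micro-partitions. One way to encourage that is to cluster on the predicate column; a sketch with illustrative names:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='event_id',
    -- clustering on the predicate column keeps each micro-partition's
    -- min/max range for event_date narrow, so more can be skipped
    cluster_by=['event_date'],
    incremental_predicates=[
        "DBT_INTERNAL_DEST.event_date >= dateadd(day, -7, current_date)"
    ]
) }}
```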
For Snowflake tables above 500M rows, consider whether delete+insert would be a better choice entirely. At that scale, even with predicates, merge’s row-by-row comparison is slower than delete+insert’s bulk operations.
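A minimal sketch of that switch (model names illustrative):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    -- rows matching the new batch's unique_key values are bulk-deleted,
    -- then the batch is bulk-inserted -- no row-by-row matching
    unique_key='event_id'
) }}
```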
Aligning With Lookback Windows
If you’re using a lookback window pattern for late-arriving data, your incremental predicates should match or exceed the lookback window. If your lookback reprocesses 3 days of source data, the destination predicate needs to cover at least 3 days — otherwise the merge can’t find the destination rows that need updating.
```sql
{% set lookback_days = 3 %}

{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge',
    incremental_predicates=[
        "DBT_INTERNAL_DEST.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL " ~ lookback_days ~ " DAY)"
    ]
) }}

SELECT *
FROM {{ ref('base__events') }}
{% if is_incremental() %}
WHERE event_date >= (
    SELECT DATE_SUB(MAX(event_date), INTERVAL {{ lookback_days }} DAY)
    FROM {{ this }}
)
{% endif %}
```

This keeps both filters synchronized. If you change the lookback window, both the source filter and destination predicate update together.
The Alternative: Switch Strategies
incremental_predicates are a patch on merge’s fundamental limitation. If you find yourself adding predicates because your table is too large for a comfortable merge, consider whether a different strategy — insert_overwrite on BigQuery, delete+insert on Snowflake — would be a better architectural choice. Those strategies don’t need predicates because they don’t perform row-by-row comparisons against the full destination table in the first place.
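On BigQuery, for instance, insert_overwrite replaces whole partitions rather than matching individual rows, so no destination predicate is needed (a sketch with illustrative names):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    -- dynamic mode: dbt replaces every partition present in the new
    -- batch; there is no row-by-row comparison against the destination
    partition_by={"field": "event_date", "data_type": "date"}
) }}
```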
Predicates make the most sense when you genuinely need merge behavior (row-level updates on individually changing records) but your table has grown large enough that the default full scan is painful.