Converting an existing incremental model to microbatch is usually straightforward. The mechanical changes are simple: remove the `is_incremental()` block, add `event_time` and `batch_size` configuration, and let dbt handle the filtering. The harder part is knowing when the conversion makes sense and handling the first run after migration.
## When Migration Makes Sense
Not every incremental model benefits from microbatch. Before converting, check that your model meets these criteria:
- **Clear timestamp column:** Your model needs a column that cleanly defines when each record belongs in time. This becomes your `event_time`. Models that update records across arbitrary time ranges, like SCD Type 2 or complex CDC patterns, don't fit the batch model.
- **Hourly or coarser granularity:** Microbatch's minimum `batch_size` is `hour`. If your current model processes sub-hourly windows (15-minute or 5-minute intervals), microbatch won't work. See the tradeoffs note for details.
- **Time-bounded processing:** Your current `is_incremental()` logic should already be filtering by a time column. If your incremental filter is based on a non-time column (like a monotonically increasing ID or a status flag), microbatch isn't the right strategy.
- **Backfill or retry value:** Microbatch adds overhead (sequential batch execution) in exchange for batch-level retry and built-in backfill. If your model never fails and you never need to reprocess specific date ranges, that trade isn't worth it.
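To vet a candidate `event_time` column against the first criterion, it helps to check for NULL and out-of-range timestamps, since either would leave records outside every batch. A minimal sketch in BigQuery syntax; the table name `analytics.base__app__events` is a stand-in for your own:

```sql
-- Records with NULL or future timestamps won't land cleanly in any batch.
SELECT
    COUNT(*) AS total_rows,
    COUNTIF(event_occurred_at IS NULL) AS null_timestamps,
    COUNTIF(event_occurred_at > CURRENT_TIMESTAMP()) AS future_timestamps
FROM analytics.base__app__events
```

If either count is nonzero, decide how those rows should be handled (filtered out, or coalesced to a default) before converting.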
## Side-by-Side: Before and After
Here’s a typical traditional incremental model and its microbatch equivalent.
### Traditional Incremental (delete+insert)
```sql
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='date_day'
) }}

SELECT
    event_id,
    user_id,
    date_day,  -- must be selected: delete+insert matches on the unique_key column
    event_occurred_at,
    event_name,
    event_properties
FROM {{ ref('base__app__events') }}
{% if is_incremental() %}
WHERE date_day >= (
    SELECT {{ dbt.dateadd("day", -3, "max(date_day)") }}
    FROM {{ this }}
)
{% endif %}
```

### Microbatch Equivalent
```sql
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_occurred_at',
    batch_size='day',
    lookback=3,
    begin='2020-01-01'
) }}

SELECT
    event_id,
    user_id,
    event_occurred_at,
    event_name,
    event_properties
FROM {{ ref('base__app__events') }}
```

The differences:
- `is_incremental()` block removed: dbt handles time filtering automatically based on `event_time` and `batch_size`.
- `event_time` added: Points to the timestamp column that defines batch boundaries. This replaces the manual `WHERE date_day >= ...` logic.
- `lookback` replaces the manual window: The `lookback=3` parameter tells dbt to reprocess the previous 3 batches, exactly like the `dateadd("day", -3, ...)` pattern in the original.
- `begin` date added: Required for microbatch. Defines the earliest date dbt will process when doing a full build.
- `unique_key` removed: Not required for most warehouses. On BigQuery, microbatch uses `insert_overwrite` under the hood (partition replacement, not row matching). On Snowflake, it uses `delete+insert`. Only PostgreSQL's microbatch requires `unique_key` because it delegates to `merge`.
## Step-by-Step Migration
### 1. Map Your Existing Logic
Before changing any code, identify:
- **Which column is your time filter?** This becomes `event_time`. It should be the column in your `WHERE` clause that filters records by time.
- **What's your current lookback window?** If you're subtracting 3 days in your `is_incremental()` block, set `lookback=3`.
- **What granularity fits?** If you process daily, use `batch_size='day'`. If your model runs hourly and processes the last hour, use `batch_size='hour'`.
- **What's the earliest date in your table?** This becomes `begin`. Check `SELECT MIN(event_time_column) FROM your_table`.
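The earliest-date check can be widened into a small profiling query; a sketch, again with a placeholder table name:

```sql
SELECT
    MIN(event_occurred_at) AS earliest_event,  -- becomes the begin value
    MAX(event_occurred_at) AS latest_event     -- sanity check that the data is current
FROM analytics.base__app__events
```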
### 2. Add Warehouse-Specific Configuration
Microbatch uses different underlying strategies per warehouse. Some require extra config:
**BigQuery** requires `partition_by` matching your `event_time`:
```sql
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    partition_by={
        "field": "event_occurred_at",
        "data_type": "timestamp",
        "granularity": "day"
    },
    event_time='event_occurred_at',
    batch_size='day',
    begin='2020-01-01'
) }}
```

The `partition_by` granularity should match your `batch_size`. Mismatches won't break the model but reduce efficiency: a monthly partition with daily batches means multiple batches write to the same partition, losing the atomic partition-replacement benefit.
**Snowflake** and **Databricks** need no extra configuration. The underlying strategies (`delete+insert` and `replace_where`, respectively) work without additional settings.
**PostgreSQL** requires `unique_key` because microbatch delegates to `merge`:
```sql
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    unique_key='event_id',
    event_time='event_occurred_at',
    batch_size='day',
    begin='2020-01-01'
) }}
```

### 3. Configure Upstream event_time
If your upstream models (the ones referenced via `ref()`) have a timestamp column, configure `event_time` on them too. This enables automatic upstream filtering: dbt will filter source tables to match the current batch window, preventing full table scans.
```yaml
models:
  - name: base__app__events
    config:
      event_time: event_occurred_at
```

This isn't required for migration, but it's a meaningful performance optimization. Without it, each batch reads the full upstream table.
### 4. Remove Redundant Logic
Delete the `is_incremental()` block entirely. Any manual time filtering, lookback calculations, or conditional logic related to incremental processing is now handled by the microbatch configuration.
Also remove:
- Manual `dateadd`/`DATE_SUB` lookback calculations
- References to `{{ this }}` for determining the max processed date
- Any `unique_key` configuration (unless on PostgreSQL)
- Custom backfill variable overrides (microbatch has built-in backfill)
### 5. Validate With Unit Tests
Before deploying, validate that the microbatch version produces the same results as the traditional version. Run both side by side and compare outputs using unit tests or manual comparison queries:
```sql
-- Compare row counts by day
SELECT
    DATE(event_occurred_at) AS event_date,
    COUNT(*) AS row_count
FROM traditional_model
GROUP BY 1

EXCEPT DISTINCT

SELECT
    DATE(event_occurred_at) AS event_date,
    COUNT(*) AS row_count
FROM microbatch_model
GROUP BY 1;
```

If the results diverge, the most common causes are:
- **Timezone differences:** Microbatch uses UTC for batch boundaries. If your original model used local time, records near midnight will land in different batches.
- **Lookback mismatch:** Make sure `lookback` matches your original window size exactly.
- **Deduplication changes:** If your original model relied on `unique_key` for deduplication during merge, and the new microbatch delegates to `insert_overwrite` (BigQuery), you may need explicit deduplication in your SELECT.
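If deduplication is the cause, one fix is an explicit window-function dedup in the model body, replicating what the old merge on `unique_key` did implicitly. A sketch using `QUALIFY` (supported on BigQuery, Snowflake, and Databricks), assuming `event_id` was the old `unique_key`:

```sql
SELECT
    event_id,
    user_id,
    event_occurred_at,
    event_name,
    event_properties
FROM {{ ref('base__app__events') }}
-- Keep only the latest row per event_id, so duplicate source rows
-- don't survive the batch's partition replacement.
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY event_id
    ORDER BY event_occurred_at DESC
) = 1
```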
## Handling the First Run
When you first run after converting to microbatch, you have two options:
### Option A: Full Refresh
The simplest approach. Run `dbt run --full-refresh --select your_model` to rebuild the table from scratch using microbatch logic. dbt processes every batch from `begin` to now, sequentially.
For small-to-medium tables, this is fine. For very large tables, it’s slow — a table with 5 years of daily data means 1,825 sequential batch queries. Consider option B.
### Option B: Bounded Rebuild in Chunks
Use `--event-time-start` and `--event-time-end` to rebuild in manageable chunks:
```shell
# Rebuild 2024 data in two half-year chunks
dbt run --full-refresh --select int__sessions_aggregated \
  --event-time-start "2024-01-01" --event-time-end "2024-07-01"

dbt run --full-refresh --select int__sessions_aggregated \
  --event-time-start "2024-07-01" --event-time-end "2025-01-01"
```

This is especially useful on BigQuery, where each batch's compute cost is proportional to the data scanned. Processing in smaller chunks keeps individual query costs manageable and lets you retry specific chunks if something fails.
### Option C: Keep Existing Data
If your existing table structure is compatible (same columns, same partitioning), you may not need a full refresh at all. The next incremental run will use microbatch logic to process the current batch plus the configured `lookback` batches. Note that dbt does not inspect the existing data to find a resume point, so the table must already be current to within the lookback window.
This works best when:
- The table is already partitioned by the same column you're using as `event_time`
- You haven't changed any transformation logic, only the incremental strategy
- You're comfortable with the existing historical data staying as-is
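One sanity check before trusting this path: confirm that the gap between the table's newest data and now is smaller than the lookback window, since the next run only covers the current batch plus `lookback` prior batches. A sketch in Snowflake syntax; the table name `analytics.fct_events` is a placeholder:

```sql
-- If last_loaded is older than lookback_start, the next microbatch run
-- leaves a gap; close it with a bounded rebuild (Option B) instead.
SELECT
    MAX(event_occurred_at) AS last_loaded,
    DATEADD('day', -3, CURRENT_DATE) AS lookback_start  -- assuming lookback=3, batch_size='day'
FROM analytics.fct_events
```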
## Common Migration Pitfalls
**Forgetting `begin`:** Without `begin`, dbt doesn't know how far back to process on a full refresh. The model will error.
**UTC boundary surprises:** Your original model may have used local-time batch boundaries (midnight Eastern, midnight CET). Microbatch uses UTC exclusively. Records near the end of the local business day will land in different batches than before. If this matters for your reporting, convert timestamps to UTC before setting them as `event_time`.
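One way to do that conversion is to derive a UTC column in the model and point `event_time` at it. A sketch using Snowflake's `CONVERT_TIMEZONE`; the local-time column `event_occurred_at_local` and the source timezone are assumptions:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_occurred_at_utc',  -- point at the derived UTC column
    batch_size='day',
    begin='2020-01-01'
) }}

SELECT
    event_id,
    user_id,
    -- Normalize local timestamps so batch boundaries fall at UTC midnight.
    CONVERT_TIMEZONE('America/New_York', 'UTC', event_occurred_at_local)
        AS event_occurred_at_utc,
    event_name
FROM {{ ref('base__app__events') }}
```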
**PostgreSQL `unique_key` requirement:** Every other warehouse works without `unique_key` for microbatch, but PostgreSQL needs it because microbatch delegates to `merge`. Missing it causes errors, not silent failures.
**Over-aggressive `full_refresh=false`:** If you set `full_refresh=false` immediately after migration, you can't easily do a full rebuild if something goes wrong. Wait until you've validated the microbatch behavior before adding this protection.
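Once validated, the guard is a one-line config addition (shown here on the example model from earlier; the model body is unchanged):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_occurred_at',
    batch_size='day',
    begin='2020-01-01',
    full_refresh=false  -- blocks accidental --full-refresh rebuilds of this model
) }}
```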