Adrienne Vermorel

Microbatch Incremental Strategy in dbt 1.9: A Practical Guide

If you’ve written incremental models in dbt, you know the is_incremental() dance. You craft filter logic, handle edge cases for late-arriving data, and manually implement backfill procedures. It works, but it’s tedious.

dbt 1.9 introduced microbatch, an incremental strategy that handles time-partitioned processing differently. Instead of one query for all new data, microbatch runs separate queries for each time period (hour, day, month). This changes how you think about incremental models.

This guide walks through when microbatch makes sense, how to configure it, and what to watch out for across different warehouses.

How microbatch differs from traditional incremental

Traditional incremental models run a single query that processes all new data since the last run. You write the is_incremental() block, define your lookback window, and hope nothing falls through the cracks.

Microbatch flips this model. Instead of one query, dbt runs separate queries for each time batch:

Aspect              | Traditional Incremental        | Microbatch
Query structure     | Single SQL for all new data    | Separate SQL per batch
User responsibility | Write is_incremental() logic   | No conditional logic needed
Batch definition    | User-defined in SQL            | Configured via event_time, batch_size
Retry granularity   | Entire model                   | Individual batches
Backfill            | Custom logic required          | Built-in flags

If processing fails halfway through a month of data, traditional incremental reruns everything. Microbatch retries only the failed batches.

Basic configuration

A microbatch model requires three configurations: the timestamp column (event_time), the batch granularity (batch_size), and the earliest date to process (begin).

{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session__started_at',
    batch_size='day',
    begin='2020-01-01'
) }}

SELECT
    session_id,
    user_id,
    session__started_at,
    session__page_views,
    session__duration_seconds
FROM {{ ref('base__app__sessions') }}

There’s no is_incremental() block. dbt handles the filtering automatically based on your event_time column.

The batch_size options are hour, day, month, or year. Choose based on your data volume and how frequently you need updates. Daily is the most common choice for event data.

Handling late-arriving data

Late-arriving data is a constant problem with incremental models. Records show up after you’ve already processed their time period, creating gaps in your data.

Microbatch handles this with the lookback parameter:

{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_occurred_at',
    batch_size='day',
    lookback=3,
    begin='2020-01-01'
) }}

SELECT
    event_id,
    user_id,
    event_occurred_at,
    event_name,
    event_properties
FROM {{ ref('base__app__events') }}

With lookback=3, each run reprocesses the three previous batches along with any new ones. If today is January 10th and you run your model, dbt processes batches for January 7, 8, 9, and 10.

This replaces the manual lookback pattern you’d write in traditional incremental models:

-- The old way
{% if is_incremental() %}
WHERE event_date >= (
    SELECT dateadd(day, -3, max(event_date))
    FROM {{ this }}
)
{% endif %}

Warehouse-specific behavior

Microbatch uses different underlying strategies depending on your warehouse. This matters because it affects what additional configuration you might need.

Warehouse  | Strategy Used    | Additional Config
BigQuery   | insert_overwrite | partition_by required
Snowflake  | delete+insert    | None
Databricks | replace_where    | None
Redshift   | delete+insert    | None
PostgreSQL | merge            | unique_key required

BigQuery configuration

On BigQuery, microbatch requires a partition_by configuration that matches your event_time:

{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    partition_by={
        "field": "session__started_at",
        "data_type": "timestamp",
        "granularity": "day"
    },
    event_time='session__started_at',
    batch_size='day',
    begin='2020-01-01'
) }}

The partition_by granularity should match your batch_size. Mismatches won’t break your model, but they’ll reduce efficiency.
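If you move to hourly batches, keep the two aligned. A minimal sketch, reusing the session model from earlier (the begin date is an assumption):

{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    partition_by={
        "field": "session__started_at",
        "data_type": "timestamp",
        "granularity": "hour"
    },
    event_time='session__started_at',
    batch_size='hour',
    begin='2024-01-01'
) }}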

PostgreSQL configuration

PostgreSQL uses merge under the hood, so you need a unique_key:

{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    unique_key='session_id',
    event_time='session__started_at',
    batch_size='day',
    begin='2020-01-01'
) }}

Automatic upstream filtering

One of microbatch’s useful features is automatic filtering of upstream models. If a model you reference with ref() also has event_time configured, dbt automatically filters it to match the current batch.

-- If base__app__page_views has event_time configured,
-- dbt filters it automatically for each batch
SELECT
    page_view_id,
    session_id,
    user_id,
    page_url,
    viewed_at
FROM {{ ref('base__app__page_views') }}

This prevents full table scans of your source tables. Without this, each batch would read the entire upstream table and then filter.
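The upstream model only needs event_time in its own config for this to kick in. Here is a minimal sketch of what base__app__page_views might declare; viewed_at is taken from the snippet above, and the source reference is illustrative:

-- base__app__page_views.sql (upstream model; source name is illustrative)
{{ config(
    materialized='table',
    event_time='viewed_at'
) }}

SELECT
    page_view_id,
    session_id,
    user_id,
    page_url,
    viewed_at
FROM {{ source('app', 'page_views') }}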

For tables without event_time (like dimension tables), dbt reads the full table for each batch. This is usually fine for small lookup tables, but be aware of it for larger references.

To opt out of automatic filtering on a specific reference:

-- Force full table read even if event_time is configured
SELECT
    product_id,
    product_name,
    product_category
FROM {{ ref('mrt__product__products').render() }}

Backfill commands

Built-in backfill support removes a lot of manual work. Instead of writing custom scripts or overriding variables, you use CLI flags:

# Process a specific date range
dbt run --select int__sessions_aggregated --event-time-start "2024-09-01" --event-time-end "2024-09-04"
# Retry only failed batches from last run
dbt retry
# Full refresh with bounded history
dbt run --full-refresh --event-time-start "2024-01-01" --event-time-end "2024-02-01"

The --event-time-start and --event-time-end flags work with both regular runs and full refreshes. This makes it straightforward to reprocess specific time periods without touching the rest of your data.

Protecting against accidental full refreshes

Large incremental tables can be expensive to rebuild. The full_refresh config lets you block accidental full refreshes:

{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_occurred_at',
    batch_size='day',
    begin='2020-01-01',
    full_refresh=false
) }}

With full_refresh=false, the --full-refresh flag is ignored for this model, so it won't be rebuilt from scratch by accident. You can still do bounded refreshes using the --event-time-start and --event-time-end flags.
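For example, with the config above (the model name is illustrative):

# Ignored because of full_refresh=false: the model is not rebuilt from scratch
dbt run --select int__app_events --full-refresh

# Still works: bounded reprocessing of a specific window
dbt run --select int__app_events --event-time-start "2024-06-01" --event-time-end "2024-06-08"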

Migration from traditional incremental

Converting an existing incremental model to microbatch is usually straightforward.

Traditional incremental version:

{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='date_day'
) }}

SELECT
    event_id,
    user_id,
    event_occurred_at,
    CAST(event_occurred_at AS DATE) AS date_day,
    event_name,
    event_properties
FROM {{ ref('base__app__events') }}

{% if is_incremental() %}
WHERE CAST(event_occurred_at AS DATE) >= (
    SELECT {{ dbt.dateadd("day", -3, "max(date_day)") }}
    FROM {{ this }}
)
{% endif %}

Microbatch version:

{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_occurred_at',
    batch_size='day',
    lookback=3,
    begin='2020-01-01'
) }}

SELECT
    event_id,
    user_id,
    event_occurred_at,
    event_name,
    event_properties
FROM {{ ref('base__app__events') }}

You remove the is_incremental() block (and the date_day column that only existed to support it), add event_time pointing to your timestamp column, and set batch_size and lookback to match your previous logic.

When you first run after converting, you’ll likely want a full refresh to establish the batched structure. For very large tables, consider using --event-time-start and --event-time-end to rebuild in chunks.
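For example, a chunked rebuild might look like this (the model name and date ranges are illustrative):

# Rebuild history one month at a time instead of a single full refresh
dbt run --select int__app_events --event-time-start "2023-01-01" --event-time-end "2023-02-01"
dbt run --select int__app_events --event-time-start "2023-02-01" --event-time-end "2023-03-01"
dbt run --select int__app_events --event-time-start "2023-03-01" --event-time-end "2023-04-01"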

Limitations and considerations

Microbatch isn’t the right choice for every incremental model.

dbt doesn’t track which batches have been processed. If you skip runs, you’ll have data gaps. This is the same as traditional incremental, but worth remembering.

All time calculations use UTC. If your event_time column uses a different timezone, you’ll need to handle conversion.
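For example, on BigQuery you could derive a UTC timestamp upstream and point event_time at that. A sketch; the event_occurred_at_local column and the Europe/Paris timezone are assumptions:

-- Convert a local DATETIME to a UTC TIMESTAMP to use as event_time (BigQuery syntax)
SELECT
    event_id,
    TIMESTAMP(event_occurred_at_local, 'Europe/Paris') AS event_occurred_at
FROM {{ ref('base__app__events') }}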

The minimum batch size is hour. For high-frequency streaming data, traditional incremental may still be necessary.

Each batch waits for the previous one to complete by default. Setting concurrent_batches=true lets batches run in parallel, but the feature is new and may have edge cases.

If an hourly microbatch model feeds a daily microbatch model, you need careful orchestration to ensure upstream data is complete before downstream processing starts.

If you’re using the Fusion engine, microbatch isn’t supported yet.

When to use microbatch

Microbatch works well when:

  • Your data has a clear timestamp for partitioning
  • You process time-bounded batches (daily, hourly)
  • You need straightforward backfill capabilities
  • Batch-level retry would save significant reprocessing time
  • You want to simplify your incremental logic

Stick with traditional incremental when:

  • Your data doesn’t have a natural time dimension
  • You need sub-hourly processing
  • You’re doing complex deduplication that doesn’t fit the batch model
  • You’re updating records across arbitrary time ranges (not bounded batches)

Practical example: session aggregation

Here’s a complete example aggregating page views into sessions:

{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session__started_at',
    batch_size='day',
    lookback=3,
    begin='2023-01-01',
    partition_by={
        "field": "session__started_at",
        "data_type": "timestamp",
        "granularity": "day"
    }
) }}

WITH page_views AS (
    SELECT
        page_view_id,
        session_id,
        user_id,
        page_url,
        viewed_at
    FROM {{ ref('base__app__page_views') }}
),

sessions AS (
    SELECT
        session_id,
        user_id,
        MIN(viewed_at) AS session__started_at,
        MAX(viewed_at) AS session__ended_at,
        COUNT(*) AS session__page_views,
        TIMESTAMP_DIFF(MAX(viewed_at), MIN(viewed_at), SECOND) AS session__duration_seconds
    FROM page_views
    GROUP BY session_id, user_id
)

SELECT
    session_id,
    user_id,
    session__started_at,
    session__ended_at,
    session__page_views,
    session__duration_seconds
FROM sessions

Because base__app__page_views has event_time configured, dbt filters it automatically for each batch. The model processes one day at a time, reprocessing the last three days on each run to catch late-arriving events.

Summary

Microbatch simplifies time-partitioned incremental models by moving batch logic from your SQL into configuration. The trade-off is less flexibility: you’re constrained to time-based batching with predetermined granularity.

For event data and time-series processing, that constraint often matches your requirements. The built-in backfill support and batch-level retries make it worth considering for new incremental models.

For complex incremental logic that doesn’t fit the batch model, traditional is_incremental() patterns remain the better choice.