Microbatch (dbt 1.9+) is an incremental strategy with built-in backfill, batch-level retry, and declarative time-based configuration instead of is_incremental() boilerplate. It has three tradeoffs that determine whether it fits a given model.
How Microbatch Works
Instead of writing conditional is_incremental() logic, you configure time-based batching declaratively:
```sql
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session_start',
    begin='2020-01-01',
    batch_size='day',
    lookback=3
) }}

-- No is_incremental() block needed
SELECT
    session_id,
    session_start,
    user_id,
    session_duration
FROM {{ ref('base__analytics__sessions') }}
```

dbt generates separate queries for each batch period (hour, day, or month). On a normal incremental run, it processes only the latest unprocessed batches plus the lookback window. For backfills, you specify explicit time ranges:
```shell
# Targeted backfill
dbt run --event-time-start "2024-09-01" --event-time-end "2024-09-04"

# Retry only failed batches from the last run
dbt retry
```

The batch-level retry is especially valuable: if a 30-day backfill fails on day 17, you retry only day 17 instead of all 30 days.
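Under the hood, each batch is conceptually just the model's SELECT wrapped in a time filter for that batch's window. A rough sketch of what one generated daily batch looks like — illustrative only; the exact SQL dbt emits varies by adapter and underlying strategy:

```sql
-- Illustrative: one generated batch covering 2024-09-02 (boundaries are UTC)
SELECT session_id, session_start, user_id, session_duration
FROM base__analytics__sessions
WHERE session_start >= '2024-09-02 00:00:00'
  AND session_start <  '2024-09-03 00:00:00'
```

A 3-day backfill generates three of these, one per day, each executed and merged independently.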
Tradeoff 1: UTC Timezone Assumption
Microbatch uses UTC for all time calculations. Batch boundaries always fall at midnight UTC (for daily batches) or the top of the hour (for hourly batches). There’s no configuration to change this.
If your event_time column is stored in a local timezone — say US/Eastern — records near midnight local time will land in unexpected batches. An event at 11:30 PM Eastern on March 15 is 3:30 AM UTC on March 16 (Eastern is UTC-4 during daylight saving time), so it lands in the March 16 batch, not March 15.
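To see the boundary crossing concretely, here is the same shift as a standalone query, using Snowflake's `CONVERT_TIMEZONE` with a winter date, where Eastern is UTC-5 (timestamps are illustrative):

```sql
-- A late-evening Eastern event crosses the UTC date boundary
SELECT CONVERT_TIMEZONE('US/Eastern', 'UTC', '2024-01-15 23:30:00'::timestamp_ntz)
       AS session_start_utc;
-- 2024-01-16 04:30:00 — assigned to the January 16 daily batch, not January 15
```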
For most analytical workloads this doesn’t matter. Aggregations that roll up to monthly or weekly granularity absorb the boundary offset. But if your business logic depends on records landing in the correct local-date batch — say, daily revenue reporting where you close the books at midnight local time — the UTC assumption creates off-by-one-day errors.
The fix: Convert your event_time column to UTC before using microbatch. If your source data is in a local timezone:
```sql
SELECT
    session_id,
    CONVERT_TIMEZONE('US/Eastern', 'UTC', session_start) AS session_start_utc,
    user_id,
    session_duration
FROM {{ ref('base__analytics__sessions') }}
```

Then configure `event_time='session_start_utc'`. This ensures batch boundaries align with the actual UTC timestamps, and downstream reporting can convert back to local time as needed.
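With the converted column in place, the earlier config only needs its `event_time` pointed at the UTC column — a sketch, assuming the model exposes `session_start_utc` as above:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session_start_utc',  -- the UTC-converted column
    begin='2020-01-01',
    batch_size='day',
    lookback=3
) }}
```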
If converting isn’t feasible, accept that batch boundaries won’t align with your business day and design downstream logic accordingly.
Tradeoff 2: No Sub-Hourly Granularity
The smallest batch_size is hour. If you need to process data in 15-minute or 5-minute windows — common for real-time dashboards, alerting systems, or near-real-time reporting — microbatch won’t work.
This limitation is architectural. Microbatch generates one SQL query per batch, and each query has overhead: compilation, warehouse scheduling, query execution, result merging. At sub-hourly granularity, that per-query overhead dominates total runtime. A 24-hour backfill at 5-minute granularity would be 288 separate queries.
When this matters: Systems that need data freshness under 1 hour. Streaming-adjacent workloads. Models that feed operational dashboards with latency SLAs.
The alternative: Use traditional incremental logic with a manual time filter. You lose microbatch’s built-in backfill and retry capabilities, but you can filter to whatever granularity your use case requires:
```sql
{% if is_incremental() %}
WHERE event_timestamp >= (
    SELECT DATEADD(MINUTE, -15, MAX(event_timestamp))
    FROM {{ this }}
)
{% endif %}
```

Tradeoff 3: Sequential Execution
Each batch runs one after another, not in parallel. A 30-day backfill with batch_size='day' means 30 separate query executions, sequentially.
A single query processing all 30 days at once is almost always faster than 30 separate queries. Each query has fixed overhead: Jinja compilation, query planning, warehouse scheduling. Multiply that by 30 and the overhead accumulates. In practice, a 30-day traditional incremental run might take 5 minutes, while the same range as microbatch takes 15-20 minutes.
The tradeoff is explicit: microbatch trades speed for failure isolation. If day 17 of a 30-day traditional incremental run fails, you reprocess all 30 days. With microbatch, you retry only day 17. For workloads where failure is common (flaky source systems, timeout-prone queries) or reprocessing is expensive (very large daily volumes), the retry benefit can outweigh the speed cost.
When sequential execution hurts most:
- Large historical backfills (months or years of data)
- Models with low per-batch volume where per-query overhead dominates
- Time-sensitive backfill scenarios where you need data ready quickly
When it’s acceptable:
- Daily or hourly incremental runs processing only the latest batch
- Workloads with occasional failures where batch-level retry saves significant reprocessing
- Teams that value operational simplicity over raw speed
How Microbatch Delegates to Warehouse Strategies
Under the hood, microbatch doesn’t implement its own merge logic. It delegates to existing strategies per warehouse:
- BigQuery: uses `insert_overwrite` (requires `partition_by` matching your `batch_size`)
- Snowflake: uses `delete+insert`
- Databricks: uses `replace_where`
This means microbatch inherits the properties of the underlying strategy. On BigQuery, it’s atomic and partition-based. On Snowflake, it has the same atomicity caveats as delete+insert. Knowing the underlying strategy helps you understand failure modes and performance characteristics without reading microbatch-specific documentation.
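On BigQuery, for example, that delegation means the model must declare a `partition_by` whose granularity matches `batch_size` — a sketch of the pairing, reusing the session model's config from earlier:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session_start',
    begin='2020-01-01',
    batch_size='day',
    -- BigQuery: partition granularity must match batch_size
    partition_by={'field': 'session_start', 'data_type': 'timestamp', 'granularity': 'day'}
) }}
```

With this pairing, each batch maps to exactly one partition, so `insert_overwrite` can replace the partition atomically.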
Comparison With Traditional Incremental
| Aspect | Traditional Incremental | Microbatch |
|---|---|---|
| Query structure | Single SQL for all new data | Separate SQL per batch |
| Batch definition | User-defined in SQL | Configured via `event_time`, `batch_size` |
| Retry granularity | Entire model | Individual batches |
| Backfill | Custom logic or full refresh | Built-in `--event-time-start`/`--event-time-end` |
| Time filtering | `is_incremental()` block | Automatic |
| Minimum granularity | Whatever you write in SQL | 1 hour |
| Execution speed | Single query (faster) | Sequential batches (slower) |
When Microbatch Is the Right Choice
Microbatch fits best when:
- Your data has a clear timestamp column and processes in natural time windows
- You want built-in backfill without writing custom scripts or one-off SQL
- Batch-level retry would save meaningful reprocessing time when failures occur
- You prefer declarative configuration over writing `is_incremental()` logic
- Hourly or daily granularity is sufficient for your use case
It doesn’t fit when you need sub-hourly processing, when backfill speed is critical, or when your data doesn’t have a clean timestamp column to partition on.
For a broader look at how microbatch compares to other strategies, see the strategy decision framework. For implementation details and backfill commands, see the microbatch strategy guide.