With traditional incremental models, reprocessing a specific date range requires custom scripts, dbt variable overrides, or targeted full refreshes with manual partition management. Microbatch provides built-in CLI flags for targeted backfill, batch-level retry on failure, and full-refresh protection.
Targeted Backfill With CLI Flags
The --event-time-start and --event-time-end flags let you specify exact date ranges to reprocess:
```bash
# Reprocess September 1-3, 2024
dbt run --select int__sessions_aggregated \
  --event-time-start "2024-09-01" \
  --event-time-end "2024-09-04"
```

dbt generates a separate query for each batch within that range. With batch_size='day', this runs three batch queries (September 1, 2, and 3). Each batch replaces the corresponding period in the target table using the underlying warehouse strategy: insert_overwrite on BigQuery, delete+insert on Snowflake, replace_where on Databricks.
The end date is exclusive, matching standard interval conventions. --event-time-end "2024-09-04" processes up to but not including September 4th.
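A quick way to see the exclusive-end convention is to enumerate the windows a run would cover. This is a sketch of the batching arithmetic, not dbt's actual implementation:

```python
from datetime import date, timedelta

def day_batches(start: date, end: date):
    """Yield [start, end) day windows; the end date is exclusive."""
    d = start
    while d < end:
        yield d, d + timedelta(days=1)
        d += timedelta(days=1)

# --event-time-start 2024-09-01 --event-time-end 2024-09-04
windows = list(day_batches(date(2024, 9, 1), date(2024, 9, 4)))
print(windows)  # three windows: September 1, 2, and 3; the 4th is excluded
```

Because each window's exclusive end equals the next window's start, consecutive backfills over adjacent ranges never overlap or leave gaps.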
These flags work with model selection, so you can backfill a single model or a group:
```bash
# Backfill all models with the microbatch tag
dbt run --select tag:microbatch --event-time-start "2024-09-01" --event-time-end "2024-10-01"
```
```bash
# Backfill a model and all its downstream dependencies
dbt run --select int__sessions_aggregated+ --event-time-start "2024-09-01" --event-time-end "2024-09-04"
```

This replaces the custom backfill patterns that teams typically build with traditional incremental models: variable overrides, shell scripts that loop through dates, or one-off SQL that manually deletes and reinserts partitions.
Batch-Level Retry
When a microbatch run fails partway through, dbt retry picks up from the failed batch instead of starting over:
```bash
# Initial run processes 30 days, fails on day 17
dbt run --select int__sessions_aggregated \
  --event-time-start "2024-09-01" \
  --event-time-end "2024-10-01"
```
```bash
# Retry only the failed batch (day 17) and continue from there
dbt retry
```

With traditional incremental, a failure in a 30-day backfill means rerunning all 30 days; the first 16 days of work are wasted. With microbatch, days 1-16 are already committed to the target table, and retry starts from day 17.
This is especially valuable for:
- Large historical backfills where reprocessing from scratch costs real money (BigQuery bytes scanned, Snowflake compute credits)
- Flaky source systems where intermittent failures are common and retrying a single batch is much cheaper than retrying everything
- Timeout-prone queries where individual batches might fail due to resource limits but most batches complete fine
The retry mechanism works because each batch is a self-contained operation. dbt tracks which batches succeeded and which failed in the run artifacts. The dbt retry command reads those artifacts and re-executes only the failures.
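That bookkeeping can be inspected directly. As a rough sketch, a script can pull the failed windows out of target/run_results.json; note that the batch_results field name and layout here follow recent dbt artifacts but are an assumption, not a stable contract:

```python
import json

def failed_batches(run_results: dict) -> dict:
    """Map each model's unique_id to its failed batch windows."""
    failures = {}
    for result in run_results.get("results", []):
        # batch_results is assumed to hold "successful"/"failed" window lists
        batch_results = result.get("batch_results") or {}
        if batch_results.get("failed"):
            failures[result["unique_id"]] = batch_results["failed"]
    return failures

# Hypothetical excerpt of target/run_results.json after a partial failure
sample = {
    "results": [
        {
            "unique_id": "model.my_project.int__sessions_aggregated",
            "status": "error",
            "batch_results": {
                "successful": [["2024-09-01", "2024-09-02"],
                               ["2024-09-02", "2024-09-03"]],
                "failed": [["2024-09-17", "2024-09-18"]],
            },
        }
    ]
}

print(json.dumps(failed_batches(sample)))
```

This is useful for monitoring: an orchestrator can parse the artifact after a run and alert on exactly which windows need attention, rather than reporting the whole model as failed.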
Bounded Full Refresh
Traditional --full-refresh rebuilds a table from scratch, processing everything from the begin date to now. For a table with years of history, that’s expensive and slow. Microbatch lets you scope a full refresh to a specific time range:
```bash
# Full refresh only January 2024
dbt run --full-refresh \
  --select int__sessions_aggregated \
  --event-time-start "2024-01-01" \
  --event-time-end "2024-02-01"
```

This rebuilds only the specified range while leaving the rest of the table intact. It’s the difference between “rebuild everything from 2020” and “rebuild just the month that had bad data.”
This is particularly useful when:
- A source system corrected historical data for a specific period
- You changed transformation logic and need to reprocess a bounded window
- A schema change requires reprocessing but only affects data after a certain date
Protecting Against Accidental Full Refreshes
Large incremental tables can be extremely expensive to rebuild. A careless dbt run --full-refresh on a table with 3 years of event data can take hours and cost hundreds of dollars in compute. Microbatch provides a safety net:
```jinja
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_occurred_at',
    batch_size='day',
    begin='2020-01-01',
    full_refresh=false
) }}
```

With full_refresh=false, running dbt run --full-refresh on this model fails with an error rather than silently rebuilding the entire table. This is a guardrail, not a permanent lock: you can still do bounded refreshes using --event-time-start and --event-time-end.
This protection matters most for:
- Production tables where an accidental full refresh would cause hours of downtime while the table rebuilds
- Large-volume event tables where the rebuild cost is significant (think terabytes of BigQuery scans at $6.25/TB)
- Tables with no natural rebuild window, which are too large to rebuild even overnight
The full_refresh=false setting applies at the model level. You can set it globally in dbt_project.yml for all microbatch models and override it per-model where full refreshes are acceptable:
```yaml
models:
  my_project:
    marts:
      +full_refresh: false  # Protect all mart models
    staging:
      +full_refresh: true   # Staging can be rebuilt freely
```

Operational Patterns
Scheduled Backfill After Source Outages
When a source system has a known outage, you can schedule a targeted backfill for the affected window once the source recovers:
```bash
# Source was down March 15-17, data backfilled on March 20
dbt run --select tag:source_dependent \
  --event-time-start "2024-03-15" \
  --event-time-end "2024-03-18"
```

Chunked Historical Rebuild
For very large tables where processing the entire history at once would exceed resource limits or be unacceptably slow, process in monthly chunks:
```bash
# Rebuild 2024 one month at a time, stopping at the first failure
for month in $(seq 1 12); do
  start=$(printf '2024-%02d-01' "$month")
  if [ "$month" -eq 12 ]; then
    end='2025-01-01'  # December's exclusive end rolls into the next year
  else
    end=$(printf '2024-%02d-01' $((month + 1)))
  fi
  dbt run --select int__sessions_aggregated \
    --event-time-start "$start" \
    --event-time-end "$end" || break
done
```

Each chunk processes independently. If month 7 fails, months 1-6 are already committed, and you can retry just month 7.
Validating Backfill Results
After a backfill, compare the reprocessed batches against expected counts or aggregates:
```sql
-- Check that backfilled days have expected row counts
SELECT
  DATE(session__started_at) AS batch_date,
  COUNT(*) AS row_count
FROM int__sessions_aggregated
WHERE session__started_at >= '2024-09-01'
  AND session__started_at < '2024-09-04'
GROUP BY 1
ORDER BY 1;
```

This is basic, but it catches the most common backfill failures: empty batches (source data not available), duplicate batches (lookback overlap not handled correctly), and count discrepancies versus the source.
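The comparison step can also run outside the warehouse. A minimal sketch, assuming you have already fetched per-day row counts for the source and the rebuilt model (the counts below are hypothetical):

```python
def validate_backfill(source_counts: dict, target_counts: dict, dates: list) -> list:
    """Flag empty, inflated, or short batches for the given dates."""
    problems = []
    for d in dates:
        src = source_counts.get(d, 0)
        tgt = target_counts.get(d, 0)
        if tgt == 0:
            problems.append(f"{d}: empty batch (source had {src} rows)")
        elif tgt > src:
            problems.append(f"{d}: {tgt} rows vs {src} in source (possible duplicates)")
        elif tgt < src:
            problems.append(f"{d}: {tgt} rows vs {src} in source (missing rows)")
    return problems

# Hypothetical counts pulled from the source and the backfilled model
source = {"2024-09-01": 1200, "2024-09-02": 1150, "2024-09-03": 1300}
target = {"2024-09-01": 1200, "2024-09-02": 2300, "2024-09-03": 0}

for problem in validate_backfill(source, target, sorted(source)):
    print(problem)
```

For aggregated models the counts will not match the source one-to-one; in that case compare against an expected ratio or a pre-backfill snapshot instead of raw source counts.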
Comparison With Traditional Backfill Approaches
| Aspect | Traditional Incremental | Microbatch |
|---|---|---|
| Backfill mechanism | Variable overrides, custom scripts | Built-in --event-time-start/end flags |
| Failure recovery | Reprocess entire range | Retry only failed batches |
| Full refresh scope | Entire table | Bounded to specific date range |
| Protection | None built-in | full_refresh=false config |
| Granularity | Whatever your script handles | Per-batch (hour/day/month) |
With microbatch, backfill capability is part of the model configuration rather than a separate set of scripts. The standard dbt CLI handles targeted backfill without custom procedures.