
on_schema_change in dbt Incremental Models

How dbt handles column additions and removals in incremental models, the four on_schema_change options, and why none of them backfill historical data.

dbt · incremental processing · data modeling

Incremental models have a problem that table materializations don’t: the schema can change between runs. A table materialization is rebuilt from scratch every time, so a new schema simply replaces the old one. An incremental model instead writes new data into a table that already exists. When you add a column to your SELECT, the incoming data has that column but the existing table doesn’t. When you remove a column, the existing table still has it but new data doesn’t. The on_schema_change config controls how dbt handles this mismatch.

By default, dbt uses ignore, which is the option most likely to cause confusion in production. Understanding all four options — and their shared limitation — prevents the kind of silent failures that take weeks to notice.

The Four Options

ignore (default)

New columns in your model are silently dropped during the merge into the existing table. The column exists in your staging query but never makes it to the target. Removed columns cause the model to fail because dbt tries to INSERT data that doesn’t include a column the target table expects.

This is the default because it was the original behavior before schema evolution support existed. It’s the worst default for teams actively developing models — you add a column, the run succeeds, and you assume it’s there. It isn’t.
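As a concrete sketch (the model and column names here are made up), suppose an incremental model left on the default gains a new column:

-- models/fct_events.sql (hypothetical); on_schema_change not set, so 'ignore' applies
{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

select
    event_id,
    event_date,
    channel  -- newly added to the SELECT
from {{ ref('stg_events') }}
{% if is_incremental() %}
where event_date > (select max(event_date) from {{ this }})
{% endif %}

The run succeeds because the column list for the insert/merge comes from the existing target table, which has no channel column, so the new column is quietly left out of what gets written.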

fail

The model errors immediately if the schema of the incoming data doesn’t match the existing table. No data is written.

This is the safest option for production models where schema changes should go through a deliberate process. It forces you to either run --full-refresh or explicitly handle the migration. Pair it with model contracts for maximum strictness.
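A minimal sketch of that pairing, with illustrative model and column names:

# models/marts/fct_orders.yml (hypothetical)
models:
  - name: fct_orders
    config:
      materialized: incremental
      on_schema_change: fail
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: bigint
      - name: order_total
        data_type: numeric

The contract checks the model’s columns and types against the declared list at build time, while fail stops any incremental run whose incoming schema no longer matches the existing table.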

append_new_columns

New columns are added to the existing table via ALTER TABLE. Removed columns are left alone — they stay in the table with NULL values for new rows.

This is the most practical option for models under active development. You add a column to your SELECT, and on the next incremental run dbt adds it to the target table automatically. The catch: existing rows get NULL for the new column. If you need the new column populated for historical data, you need a --full-refresh.
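On the warehouse side this looks roughly like the following (illustrative DDL with assumed names, not the literal statement dbt generates; exact syntax varies by adapter):

-- dbt detects the new column in the incoming data and runs something like:
alter table analytics.fct_events add column channel varchar;
-- existing rows keep NULL for channel; only rows written by this run
-- and later runs carry a value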

sync_all_columns

Full bidirectional sync. New columns are added, removed columns are dropped. This is the most aggressive option — it actively removes columns from your target table if they’re no longer in your model’s SELECT.

Use this carefully. If you accidentally remove a column from your model (say, during a refactor), sync_all_columns will drop it from the target table on the next run. That data is gone unless you have backups or can full-refresh from source.
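The drop side is worth spelling out (again illustrative DDL with assumed names, not dbt’s literal output):

-- channel was removed from the model's SELECT during a refactor;
-- on the next incremental run, sync_all_columns runs something like:
alter table analytics.fct_events drop column channel;
-- the historical values in that column go with it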

The Shared Limitation: No Backfill

None of these options backfill historical records when a new column is added. This is the single most important thing to understand about on_schema_change.

When append_new_columns or sync_all_columns adds a column, existing rows get NULL values for that column. Only rows processed after the column was added will have values. If your table has 2 years of data and you add a channel column, only data from the next incremental run onward will have channel populated. The previous 2 years will show NULL.

| event_id | event_date | channel | |
|----------|------------|---------|--------------------------------------|
| 001 | 2024-01-15 | NULL | existed before the column was added |
| 002 | 2024-06-20 | NULL | existed before the column was added |
| 003 | 2026-03-27 | organic | processed after the column was added |

To backfill, you have two options:

  1. Full refresh: dbt run --select model_name --full-refresh. Rebuilds the entire table from scratch. Simple but expensive on large tables, and you need to be careful with [[Incremental Models in dbt|models that have full_refresh: false set]].

  2. Manual UPDATE: Run a targeted UPDATE statement outside of dbt to populate the new column for historical records. More surgical but requires warehouse access and isn’t tracked by dbt.
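A sketch of the manual route, reusing the channel example above; the table name, source relation, and join key are assumptions:

-- run directly against the warehouse, outside dbt
-- (Postgres/Redshift-style UPDATE ... FROM; exact syntax depends on the warehouse)
update analytics.fct_events tgt
set channel = src.channel
from analytics.stg_events src
where tgt.event_id = src.event_id
  and tgt.channel is null;

Because dbt never sees this statement, record it somewhere the team will find it, or the backfill becomes invisible history.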

Configuration

Set on_schema_change in your model’s config block:

{{ config(
    materialized='incremental',
    unique_key='event_id',
    on_schema_change='append_new_columns'
) }}

Or in dbt_project.yml for project-wide defaults:

models:
  my_project:
    marts:
      +on_schema_change: fail
    staging:
      +on_schema_change: append_new_columns

A reasonable pattern: fail for production marts (where schema changes should be deliberate), append_new_columns for staging and intermediate models (where iteration is frequent).

Interaction With Strategies

Schema change handling works slightly differently across incremental strategies:

  • merge: Schema changes apply to the target table before the MERGE executes. New columns are added via ALTER TABLE, then the MERGE includes them.
  • insert_overwrite: Since entire partitions are replaced, schema changes within replaced partitions are naturally handled. But unreplaced partitions still have NULLs for new columns.
  • microbatch: Each batch runs independently. If you add a column mid-backfill, only batches processed after the schema change will include the new column.
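As a sketch tying strategy and schema handling together (made-up names; the partition_by shape shown is BigQuery-style):

{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    on_schema_change='append_new_columns'
) }}

Here new columns are added to the target before partitions are swapped, so replaced partitions carry values while untouched partitions show NULL until they are reprocessed.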

When Schema Changes Signal a Full Refresh

Some schema changes shouldn’t be handled incrementally at all. If you’re changing the grain of your model (adding a column to unique_key), changing the partition scheme, or fundamentally restructuring the transformation, a --full-refresh is the right call. The on_schema_change options handle column additions and removals — they don’t handle structural changes to how data is organized in the table.

The fail option makes this explicit: if the schema changed, something significant happened, and a human should decide how to handle it. For critical models in production, that deliberateness is worth the minor friction.