Sometimes the standard table materialization isn’t safe enough. You rebuild a critical table and a dashboard query hits the gap between DROP and RENAME. Or a bad filter empties your model SQL, and dbt replaces a million-row table with zero rows. Or BigQuery drops the row-level security policies you spent an afternoon configuring because dbt recreated the table from scratch.
That’s when you build your own.
A custom materialization sounds intimidating, but it’s really just a macro with a specific structure. Once you understand the anatomy, you can create materializations that match exactly how your warehouse should build tables.
This tutorial walks through building two practical materializations: a zero-downtime table swap with validation checks, and a secured table that reapplies row-level security after every build. Both solve problems that dbt’s built-in materializations don’t address.
What dbt already handles
Before writing a custom materialization, check whether built-in options solve your problem. dbt’s incremental materialization is more flexible than it looks.
Append-only behavior is built in. Set incremental_strategy='append' to insert rows without merging or updating, and add full_refresh: false to prevent --full-refresh from dropping the table.
```
{{ config(
    materialized='incremental',
    incremental_strategy='append',
    full_refresh=false
) }}
```

Partition-level overwrites are built in too. The insert_overwrite strategy in dbt-bigquery replaces entire partitions instead of merging individual rows. It builds a temp table, identifies affected partitions, and swaps them.
```
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'}
) }}
```

The delete+insert strategy handles row-level replacement by unique key. Between these three strategies and the full_refresh config, most incremental patterns are covered without custom code.
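As a sketch, a delete+insert model looks like the following. Note this strategy is offered by adapters such as dbt-snowflake, dbt-postgres, and dbt-redshift, not dbt-bigquery; the model and column names here are hypothetical:

```
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='order_id'
) }}

SELECT order_id, status, updated_at
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
    {# Only pick up rows newer than what's already in the table #}
    WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```

On each incremental run, dbt deletes existing rows whose unique_key appears in the new batch, then inserts the batch.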
Custom materializations are for the gaps that remain: control over the swap process itself, validation before replacement, or post-build operations that go beyond what hooks can manage cleanly.
The anatomy of a materialization
Every materialization follows the same six-step structure:
```
{%- materialization my_custom_mat, adapter='default' -%}

    {# 1. SETUP - Prepare relations #}
    {% set target_relation = this.incorporate(type='table') %}
    {% set existing_relation = load_cached_relation(this) %}

    {# 2. PRE-HOOKS #}
    {{ run_hooks(pre_hooks) }}

    {# 3. MAIN SQL - Build the relation #}
    {% call statement('main') %}
        {{ sql }}
    {% endcall %}

    {# 4. POST-HOOKS #}
    {{ run_hooks(post_hooks) }}

    {# 5. CLEANUP - Grants, docs, permissions #}
    {% do apply_grants(target_relation, grant_config, should_revoke) %}
    {% do persist_docs(target_relation, model) %}

    {# 6. COMMIT AND RETURN #}
    {% do adapter.commit() %}
    {{ return({'relations': [target_relation]}) }}

{%- endmaterialization -%}
```

The key pieces:
- `this` is the target relation (database.schema.model_name)
- `sql` contains the compiled SELECT statement from the model file
- `config` holds model configuration (anything in the `config()` block)
- `statement('main')` executes SQL and makes it visible in logs
- `adapter` provides database-specific methods
You’ll work with a few adapter methods repeatedly:
```
{# Check if the relation already exists #}
{% set existing = load_cached_relation(this) %}

{# Get column info from a relation #}
{% set columns = adapter.get_columns_in_relation(target_relation) %}

{# Drop a relation #}
{% do adapter.drop_relation(old_relation) %}

{# Rename a relation #}
{% do adapter.rename_relation(temp_relation, target_relation) %}
```

The `load_cached_relation()` function returns the relation if it exists, or `none` if it doesn't. This is how you detect first-run versus subsequent runs.
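A sketch of that first-run branch, using the same built-ins that appear throughout this tutorial:

```
{% set existing_relation = load_cached_relation(this) %}

{% if existing_relation is none %}
    {# First run: nothing to swap or back up, build the table directly #}
    {% call statement('main') %}
        {{ create_table_as(false, this.incorporate(type='table'), sql) }}
    {% endcall %}
{% else %}
    {# Subsequent runs: build to a temp name, validate, then swap #}
    ...
{% endif %}
```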
Building a zero-downtime table materialization
dbt’s standard table materialization builds a temp table, drops the existing one, then renames the temp into place. This works for most models. It breaks down for production-critical tables in two ways.
There’s a brief window between DROP and RENAME where the table doesn’t exist. If a dashboard query or a scheduled extract runs during that window, it fails with a “table not found” error. For internal models, nobody notices. For the table feeding a client-facing dashboard, it’s a support ticket.
There’s also no validation step. If your model SQL returns zero rows because a source table is empty or a date filter is wrong, the standard materialization replaces the old table with an empty one. You don’t find out until someone notices missing data.
A zero-downtime materialization solves both: it builds to a temp name, validates the result, then swaps via rename. The old table stays queryable until the new one is confirmed ready.
Create macros/materializations/zero_downtime_table.sql:
```
{%- materialization zero_downtime_table, adapter='bigquery' -%}

    {# Configuration #}
    {% set min_row_count = config.get('min_row_count', 1) %}

    {# Setup relations #}
    {% set target_relation = this.incorporate(type='table') %}
    {% set existing_relation = load_cached_relation(this) %}
    {% set temp_relation = make_temp_relation(target_relation) %}
    {% set backup_relation = make_backup_relation(target_relation, 'table') %}

    {# Handle type mismatch (existing is a view, not a table) #}
    {% if existing_relation is not none and existing_relation.type != 'table' %}
        {% do adapter.drop_relation(existing_relation) %}
        {% set existing_relation = none %}
    {% endif %}

    {{ run_hooks(pre_hooks) }}

    {# Build new table under a temp name. Old table stays queryable. #}
    {% call statement('main') %}
        {{ create_table_as(false, temp_relation, sql) }}
    {% endcall %}

    {# Validate before swapping #}
    {% call statement('validate', fetch_result=True) %}
        SELECT COUNT(*) AS row_count FROM {{ temp_relation }}
    {% endcall %}
    {% set row_count = load_result('validate')['data'][0][0] %}

    {% if row_count < min_row_count %}
        {% do adapter.drop_relation(temp_relation) %}
        {{ exceptions.raise_compiler_error(
            "Validation failed: " ~ row_count ~ " rows, expected at least "
            ~ min_row_count ~ ". Old table preserved."
        ) }}
    {% endif %}

    {# Swap: rename existing to backup, then temp to target #}
    {% if existing_relation is not none %}
        {% do adapter.rename_relation(existing_relation, backup_relation) %}
    {% endif %}
    {% do adapter.rename_relation(temp_relation, target_relation) %}

    {# Drop backup after successful swap #}
    {% if existing_relation is not none %}
        {% do adapter.drop_relation(backup_relation) %}
    {% endif %}

    {{ run_hooks(post_hooks) }}

    {% set grant_config = config.get('grants') %}
    {% do apply_grants(target_relation, grant_config) %}
    {% do persist_docs(target_relation, model) %}

    {% do adapter.commit() %}
    {{ return({'relations': [target_relation]}) }}

{%- endmaterialization -%}
```

The sequence is important. The old table gets renamed to a backup name before the new one takes its place. If the build or validation fails, the old table stays untouched. The backup only gets dropped after the swap succeeds.
The window where neither old nor new table exists under the target name is the time it takes to execute two BigQuery metadata operations (rename old to backup, rename temp to target). That’s typically under a second, compared to the potentially minutes-long gap during a standard table rebuild. BigQuery doesn’t support atomic table swaps like Snowflake’s ALTER TABLE ... SWAP WITH, so this window can’t be eliminated entirely, but it’s small enough that most concurrent queries won’t hit it.
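For comparison, on a warehouse with atomic swaps the whole exchange collapses to a single metadata statement. A Snowflake sketch, with hypothetical table names:

```sql
-- Snowflake: one atomic operation, no window where the table is missing
ALTER TABLE analytics.dim_users SWAP WITH analytics.dim_users__dbt_tmp;
```

A Snowflake port of this materialization would replace the two rename calls with one statement like this and skip the backup step entirely.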
Use it in a model:
```
{{ config(
    materialized='zero_downtime_table',
    min_row_count=1000
) }}

SELECT
    user_id,
    email,
    account_status,
    created_at
FROM {{ ref('int__users_joined') }}
WHERE account_status != 'deleted'
```

If the model returns fewer than 1000 rows, dbt raises an error and the old table stays in place. The logs tell you exactly what happened.
Comparing against the previous build
A static row count threshold catches empty tables, but a relative check catches more subtle problems. You can compare the new table against the existing one to flag unexpected drops in volume:
```
{# Add this after the min_row_count check #}
{% set max_drop_pct = config.get('max_row_count_drop_pct', 50) %}

{% if existing_relation is not none and max_drop_pct is not none %}
    {% call statement('old_count', fetch_result=True) %}
        SELECT COUNT(*) FROM {{ existing_relation }}
    {% endcall %}
    {% set old_count = load_result('old_count')['data'][0][0] %}

    {% if old_count > 0 %}
        {% set drop_pct = ((old_count - row_count) * 100 / old_count) | int %}
        {% if drop_pct > max_drop_pct %}
            {% do adapter.drop_relation(temp_relation) %}
            {{ exceptions.raise_compiler_error(
                "Row count dropped " ~ drop_pct ~ "% (" ~ old_count ~ " to " ~ row_count
                ~ "), threshold is " ~ max_drop_pct ~ "%. Old table preserved."
            ) }}
        {% endif %}
    {% endif %}
{% endif %}
```

This catches the scenario where a source table goes partially empty or a filter removes too much data. If the old table held 10,000 rows and the new build produces 4,000, that's a 60% drop, which trips the default 50% threshold. The model fails safely rather than pushing incomplete data downstream.
For tables with naturally volatile row counts, set max_row_count_drop_pct to none in model config to disable the relative check while keeping the absolute minimum.
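Assuming the max_row_count_drop_pct config from the snippet above, disabling the relative check while keeping the absolute floor looks like:

```
{{ config(
    materialized='zero_downtime_table',
    min_row_count=500,
    max_row_count_drop_pct=none
) }}
```

The `config.get()` call returns `none` here, so the relative-check branch never runs, but the min_row_count validation still does.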
Building a secured table materialization
BigQuery supports row access policies that restrict which rows different user groups can see. When dbt drops and recreates a table (as the table materialization does), every row access policy on that table disappears. You need to reapply them after every build.
Post-hooks can do this. For one or two policies, they’re fine. But a table with three row access policies and eight column descriptions means a dozen separate SQL statements in post-hooks. That’s hard to maintain and easy to get wrong, especially when the same security pattern applies across thirty models.
A custom materialization makes this declarative. You define the policies in config, and the materialization applies them automatically after building the table.
Create macros/materializations/secured_table.sql:
```
{%- materialization secured_table, adapter='bigquery' -%}

    {# Setup relations #}
    {% set target_relation = this.incorporate(type='table') %}
    {% set existing_relation = load_cached_relation(this) %}

    {{ run_hooks(pre_hooks) }}

    {# Drop and rebuild #}
    {% if existing_relation is not none %}
        {% do adapter.drop_relation(existing_relation) %}
    {% endif %}

    {% call statement('main') %}
        {{ create_table_as(false, target_relation, sql) }}
    {% endcall %}

    {# Apply column descriptions from config #}
    {% set column_descriptions = config.get('column_descriptions', {}) %}
    {% for col_name, description in column_descriptions.items() %}
        {% call statement('describe_' ~ col_name) %}
            ALTER TABLE {{ target_relation }}
            ALTER COLUMN {{ col_name }}
            SET OPTIONS(description='{{ description | replace("'", "\\'") }}')
        {% endcall %}
    {% endfor %}

    {# Apply row access policies from config #}
    {% set row_access_policies = config.get('row_access_policies', []) %}
    {% for policy in row_access_policies %}
        {% call statement('policy_' ~ policy.name) %}
            CREATE OR REPLACE ROW ACCESS POLICY {{ policy.name }}
            ON {{ target_relation }}
            GRANT TO ({{ policy.grantees | join(', ') }})
            FILTER USING ({{ policy.filter_expression }})
        {% endcall %}
    {% endfor %}

    {{ run_hooks(post_hooks) }}

    {% set grant_config = config.get('grants') %}
    {% do apply_grants(target_relation, grant_config) %}
    {% do persist_docs(target_relation, model) %}

    {% do adapter.commit() %}
    {{ return({'relations': [target_relation]}) }}

{%- endmaterialization -%}
```

Use it in a model:
```
{{ config(
    materialized='secured_table',
    column_descriptions={
        'user_id': 'Unique identifier from the auth system',
        'email': 'User email address, restricted to PII-authorized groups',
        'region': 'Geographic region code used for access filtering',
        'lifetime_value': 'Total revenue attributed to this user in EUR'
    },
    row_access_policies=[
        {
            'name': 'eu_analysts_filter',
            'grantees': ['"group:eu-analysts@company.com"'],
            'filter_expression': "region = 'EU'"
        },
        {
            'name': 'us_analysts_filter',
            'grantees': ['"group:us-analysts@company.com"'],
            'filter_expression': "region = 'US'"
        },
        {
            'name': 'admin_full_access',
            'grantees': ['"group:data-admins@company.com"'],
            'filter_expression': 'TRUE'
        }
    ]
) }}

SELECT
    user_id,
    email,
    region,
    lifetime_value
FROM {{ ref('int__users_enriched') }}
```

Every time this model runs, it rebuilds with the correct column descriptions and row access policies. EU analysts only see EU rows, US analysts only see US rows, and data admins see everything. No manual steps, no forgotten post-hooks.
Why not post-hooks?
For a single model with one policy, post-hooks work:
```
{{ config(
    materialized='table',
    post_hook="CREATE OR REPLACE ROW ACCESS POLICY region_filter ON {{ this }} GRANT TO (\"group:eu-analysts@company.com\") FILTER USING (region = 'EU')"
) }}
```

This stops scaling at around three policies per model. The config block becomes unreadable, the quoting gets fragile (nested double quotes inside SQL strings inside Jinja), and applying the same pattern across multiple models means copying walls of post-hook text.
A materialization takes a config dict and generates the right DDL. Adding a new policy is one dict entry, not a carefully escaped SQL string.
Column descriptions have an alternative path: dbt’s persist_docs with columns: true reads descriptions from schema YAML files and applies them via the adapter. This works well if your team already maintains schema YAML. The materialization approach keeps descriptions in the model config alongside the SQL, which some teams prefer. Both approaches are valid.
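For reference, the YAML-driven alternative looks roughly like this. A sketch with hypothetical model and column names; persist_docs must be enabled on the model for dbt to push the descriptions to the warehouse:

```yaml
# models/schema.yml
models:
  - name: dim_users
    config:
      persist_docs:
        columns: true
    columns:
      - name: email
        description: "User email address, restricted to PII-authorized groups"
      - name: region
        description: "Geographic region code used for access filtering"
```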
Extending with data masking
BigQuery’s column-level security uses policy tags from Data Catalog. The pattern is similar to row access policies, but the DDL assigns policy tags to columns rather than creating standalone policies:
```
{# Add this block after the row access policies #}
{% set column_policy_tags = config.get('column_policy_tags', {}) %}
{% for col_name, tag_path in column_policy_tags.items() %}
    {% call statement('tag_' ~ col_name) %}
        ALTER TABLE {{ target_relation }}
        ALTER COLUMN {{ col_name }}
        SET OPTIONS(policy_tags=['{{ tag_path }}'])
    {% endcall %}
{% endfor %}
```

Then in model config:

```
{{ config(
    materialized='secured_table',
    column_policy_tags={
        'email': 'projects/my-project/locations/us/taxonomies/123/policyTags/456',
        'phone': 'projects/my-project/locations/us/taxonomies/123/policyTags/789'
    }
) }}
```
Debugging and testing materializations
Custom materializations fail in predictable ways. Here are the errors you’ll see and what causes them.
“Relation does not exist” usually means you’re referencing a relation that isn’t in the cache. Use load_cached_relation() instead of directly querying the database, or call adapter.cache.clear() if you’ve modified relations outside the cache’s knowledge.
“Transaction rolled back” or changes not persisting means you forgot adapter.commit(). Always call it before the return statement.
“Relation type mismatch” happens when the existing relation is a view but you’re trying to work with it as a table. Use this.incorporate(type='table') and handle type mismatches explicitly:
```
{% if existing_relation is not none and existing_relation.type != 'table' %}
    {% do adapter.drop_relation(existing_relation) %}
    {% set existing_relation = none %}
{% endif %}
```

Testing a new materialization
Create a simple test model that uses your materialization:
```sql
-- models/test_zero_downtime.sql
{{ config(
    materialized='zero_downtime_table',
    min_row_count=1
) }}

SELECT 1 AS id, 'test' AS value
```

Run it twice:

```shell
dbt run --select test_zero_downtime
dbt run --select test_zero_downtime
```

After both runs, query the target table. You should see one row that was rebuilt cleanly via the swap.
Check target/run/ to see the SQL dbt actually executed; target/compiled/ holds only the model's compiled SELECT, while target/run/ includes the statements the materialization wrapped around it. The debug log (logs/dbt.log, or the console when running with --debug) records every statement the materialization issued. This is invaluable for debugging because you'll see exactly what dbt tried to execute.
For the secured table, verify that row access policies were applied:
```sql
SELECT *
FROM `region-us`.INFORMATION_SCHEMA.OBJECT_PRIVILEGES
WHERE object_name = 'your_model_name'
```

When to write a custom materialization (and when not to)
Custom materializations add maintenance burden. Consider alternatives first.
Post-hooks handle many use cases. If you need a single GRANT after build or one ALTER TABLE, a hook is simpler than a materialization. Macros called from the model SQL can also generate complex statements without touching materializations.
Write a custom materialization when the pattern is complex enough that copy-pasting would be error-prone, when you need the behavior across many models, or when built-in materializations actively work against what you need. The zero-downtime pattern makes sense when you have production tables that can’t tolerate build-time gaps or unvalidated swaps. The secured table pattern makes sense when row-level security applies across dozens of models and you want a single place to maintain the policy logic.
Skip the custom materialization when a post-hook solves the problem, when only one or two models need the behavior, or when you’re the only one who’ll maintain it.
The zero-downtime pattern is the simpler of the two and a good place to learn the structure. The secured table pattern follows the same principles with more moving parts. Both patterns solve problems that come up repeatedly once your dbt project moves past development into production use, where downstream consumers and data governance actually matter.