
dbt Macros

How dbt macros work — Jinja fundamentals, writing custom macros, using dbt_utils, dispatch patterns, and when macros help vs hurt

dbt · data engineering · data modeling

Macros are dbt’s mechanism for code reuse. They are Jinja templates that generate SQL at compile time, turning repeated patterns into callable functions. A five-line currency conversion duplicated across twelve models becomes {{ cents_to_dollars('amount') }} in each one and a single source of truth in your macros/ directory.

The power is real, but so is the risk. Over-abstracted macros hide simple SQL behind layers nobody can debug. The goal is knowing when a macro genuinely helps and when inline SQL is the better choice.

Jinja Fundamentals

dbt uses Jinja2 as its templating engine with a critical twist: templates render in two phases. The parse phase builds the DAG and resolves dependencies. The execute phase generates the actual SQL sent to your warehouse. Any macro that queries the database (via run_query()) or accesses graph.nodes must guard against the parse phase using the execute flag.

Variables and Control Flow

{% set %} creates variables. Combine it with loops and conditionals to generate dynamic SQL:

{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}
{% for method in payment_methods %}
SUM(CASE WHEN payment_method = '{{ method }}' THEN amount END)
AS order__{{ method }}_amount
{{ ',' if not loop.last }}
{% endfor %}

This produces three SUM(CASE WHEN...) columns. The loop.last variable handles trailing commas, one of the most common pain points in generated SQL.
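To make that concrete, the compiled SQL (visible under target/compiled/) looks roughly like this, give or take whitespace:

```sql
SUM(CASE WHEN payment_method = 'bank_transfer' THEN amount END)
AS order__bank_transfer_amount
,
SUM(CASE WHEN payment_method = 'credit_card' THEN amount END)
AS order__credit_card_amount
,
SUM(CASE WHEN payment_method = 'gift_card' THEN amount END)
AS order__gift_card_amount
```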

Whitespace control matters for readable compiled output. Use {%- to strip whitespace before a tag and -%} to strip after.
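One way to apply this to the loop above — a sketch, since the exact spacing you want is a formatting preference:

```sql
{%- for method in payment_methods -%}
SUM(CASE WHEN payment_method = '{{ method }}' THEN amount END)
  AS order__{{ method }}_amount
{{- ',' if not loop.last }}
{%- endfor %}
```

The hyphens strip the blank lines that each {% %} tag would otherwise leave behind in the compiled output.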

Macro Anatomy

Macros live in .sql files under macros/. A basic transformation macro:

{% macro cents_to_dollars(column_name, scale=2) %}
ROUND({{ column_name }} / 100.0, {{ scale }})
{% endmacro %}

Call it with {{ cents_to_dollars('amount_cents') }} or override precision: {{ cents_to_dollars('amount_cents', scale=4) }}. Required parameters come first without defaults; optional parameters have sensible defaults. Aim for five to seven parameters maximum.

The return() function preserves data types when you need to pass values back (dictionaries, lists, integers). Content between macro tags renders as string output, which becomes the generated SQL.
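A sketch of the difference (the macro name get_audit_column_names is illustrative): without return(), a macro that "outputs" a list really emits its string representation; with return(), the caller gets an actual list it can iterate.

```sql
{% macro get_audit_column_names() %}
{{ return(['_loaded_at', '_dbt_invocation_id', '_dbt_target']) }}
{% endmacro %}

{# the caller receives a real list, not a string #}
{% for col in get_audit_column_names() %}
{{ col }}{{ ',' if not loop.last }}
{% endfor %}
```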

For macros that need flexibility without parameter explosion, use Jinja's implicit kwargs variable. Note that declaring **kwargs in a Jinja macro signature is a syntax error; simply referencing kwargs inside the macro body makes it accept extra keyword arguments:

{% macro flexible_macro(required_param) %}
{# referencing kwargs makes Jinja collect unconsumed keyword arguments here #}
{% set optional = kwargs.get('optional_param', 'default') %}
-- use optional here
{% endmacro %}

This is preferable to adding parameter fifteen to an already-long signature.

The Execute Guard

Any macro that runs queries at compile time needs this pattern:

{% macro get_column_values(table, column) %}
{% if not execute %}
{{ return([]) }}
{% endif %}
{% set query %}
SELECT DISTINCT {{ column }} FROM {{ table }}
{% endset %}
{% set results = run_query(query) %}
{{ return(results.columns[0].values()) }}
{% endmacro %}

Without the execute check, run_query() would fail during parsing when the target table might not exist yet. The empty list return during parse phase keeps dbt happy; the real query runs during execution.

Context Objects

dbt provides runtime context objects you will use constantly:

  • this — the current relation being built (.database, .schema, .identifier)
  • target — the deployment target (.name, .type, .schema), useful for environment-aware behavior
  • model — metadata about the current model (.unique_id, .config, .tags)
  • graph — project-wide node information (.nodes, .sources), only available during execution
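A small hypothetical helper combining these objects, e.g. for environment-aware logging during builds (the macro name log_build_context is illustrative):

```sql
{% macro log_build_context() %}
{% if execute %}
{# this and target are only meaningful at execution time #}
{{ log("Building " ~ this.identifier ~ " in schema " ~ this.schema ~ " on target " ~ target.name, info=true) }}
{% endif %}
{% endmacro %}
```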

Essential Package Macros

Before writing custom macros, check whether a package already solves your problem. The dbt-utils package (v1.3.3) provides battle-tested macros for common patterns.

generate_surrogate_key creates hashed keys from business columns, handling nulls consistently and preventing collision bugs:

{{ dbt_utils.generate_surrogate_key(['customer_id', 'order_date']) }} AS order_key

star generates column lists while excluding specific columns. When upstream models add columns, your model picks them up automatically:

SELECT
{{ dbt_utils.star(from=ref('base__shopify__customers'),
except=["_loaded_at", "_source"]) }}
FROM {{ ref('base__shopify__customers') }}

get_column_values enables dynamic SQL based on actual data, useful for pivot-style queries that adapt when new categories appear in production:

{% set methods = dbt_utils.get_column_values(
table=ref('base__shopify__payments'),
column='payment_method',
default=['credit_card', 'bank_transfer']
) %}

Always provide a default list as fallback, since the introspection query fails if the target relation does not exist yet (first run, new environment).
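The returned list plugs straight into the loop pattern from earlier, producing a pivot that adapts when new payment methods appear (column naming here is illustrative):

```sql
SELECT
order_id,
{% for method in methods %}
SUM(CASE WHEN payment_method = '{{ method }}' THEN amount END)
AS {{ method }}_amount
{{ ',' if not loop.last }}
{% endfor %}
FROM {{ ref('base__shopify__payments') }}
GROUP BY order_id
```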

date_spine creates continuous date sequences for time-series models that need to show zeros instead of gaps. union_relations combines tables with different schemas, aligning columns and filling missing ones with NULL.
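date_spine takes a datepart plus start and end expressions, which are passed as SQL strings; a typical call (the date range is illustrative):

```sql
{{ dbt_utils.date_spine(
    datepart="day",
    start_date="cast('2024-01-01' as date)",
    end_date="cast('2025-01-01' as date)"
) }}
```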

Since dbt-utils v1.0, all cross-database macros (date functions, string functions, type casting) moved to the dbt namespace in dbt-core. Use dbt.dateadd() instead of dbt_utils.dateadd().

Patterns Worth Writing Yourself

Some macros are too project-specific for packages but appear in project after project.

Audit columns track ETL metadata on every row:

{% macro add_audit_columns() %}
CURRENT_TIMESTAMP() AS _loaded_at,
'{{ invocation_id }}' AS _dbt_invocation_id,
'{{ target.name }}' AS _dbt_target
{% endmacro %}

The invocation ID lets you trace rows back to specific dbt runs during production incidents.
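In a model, the macro appends to the column list like any other expression (the model name here is illustrative):

```sql
SELECT
order_id,
amount,
{{ add_audit_columns() }}
FROM {{ ref('base__shopify__orders') }}
```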

Environment-aware limiting speeds up development runs without changing production behavior:

{% macro limit_data_in_dev(column_name='created_at', days=3) %}
{% if target.name == 'dev' %}
AND {{ column_name }} >= CURRENT_DATE() - {{ days }}
{% endif %}
{% endmacro %}

Add it to WHERE clauses: WHERE 1=1 {{ limit_data_in_dev() }}. In dev, you query three days. In prod, the clause disappears. No commenting and uncommenting filters before commits.

Schema generation overrides dbt’s default naming so prod gets clean schema names while dev stays isolated:

{% macro generate_schema_name(custom_schema_name, node) %}
{% if target.name == 'prod' and custom_schema_name is not none %}
{{ custom_schema_name | trim }}
{% else %}
{{ target.schema }}{% if custom_schema_name is not none %}_{{ custom_schema_name | trim }}{% endif %}
{% endif %}
{% endmacro %}

This macro is called automatically by dbt. A model with schema: marts lands in marts in prod and dbt_yourname_marts in dev.

The Dispatch Pattern for Cross-Database Macros

When your macros need to work across BigQuery, Snowflake, and Databricks, the dispatch pattern lets you write adapter-specific implementations behind a single interface:

{% macro my_dateadd(datepart, interval, from_date) %}
{{ return(adapter.dispatch('my_dateadd')(datepart, interval, from_date)) }}
{% endmacro %}
{% macro default__my_dateadd(datepart, interval, from_date) %}
dateadd({{ datepart }}, {{ interval }}, {{ from_date }})
{% endmacro %}
{% macro bigquery__my_dateadd(datepart, interval, from_date) %}
date_add({{ from_date }}, interval {{ interval }} {{ datepart }})
{% endmacro %}

dbt searches for implementations in order: adapter-specific, parent adapter, then default. Start with a default that covers most databases, then add overrides only where SQL dialects diverge. The biggest divergences are in date functions (argument ordering), array operations (UNNEST vs LATERAL FLATTEN vs EXPLODE), and safe casting (SAFE_CAST vs TRY_CAST).
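Safe casting is a good example of where an override earns its keep. A sketch (my_safe_cast is an illustrative name, and the default branch assumes a dialect with TRY_CAST):

```sql
{% macro my_safe_cast(field, type) %}
{{ return(adapter.dispatch('my_safe_cast')(field, type)) }}
{% endmacro %}

{% macro default__my_safe_cast(field, type) %}
TRY_CAST({{ field }} AS {{ type }})
{% endmacro %}

{% macro bigquery__my_safe_cast(field, type) %}
SAFE_CAST({{ field }} AS {{ type }})
{% endmacro %}
```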

Before writing custom dispatch macros, check whether dbt.dateadd(), dbt.datediff(), dbt.safe_cast(), or dbt.concat() already handle your case.

When Macros Help vs. Hurt

The Rule of Three

First occurrence: write it inline. Second occurrence: note the pattern. Third occurrence: extract to a macro. Premature abstraction creates more problems than duplication — a model that requires reading four macro files to understand a single query is unreadable.

Signs a Macro Is Doing Too Much

When a macro grows past five or six parameters, it is probably combining multiple responsibilities. A process_amount macro with flags for discount application, currency conversion, and tax calculation should be three separate macros (cents_to_dollars, apply_discount, convert_currency) that compose explicitly in the model. The longer model code is a feature — anyone reading it can see exactly what happens.
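Composed in a model, the three smaller macros read as an explicit pipeline (apply_discount and convert_currency are hypothetical; each macro emits a SQL expression, so the calls nest):

```sql
SELECT
order_id,
{{ convert_currency(
    apply_discount(
        cents_to_dollars('amount_cents'),
        'discount_pct'
    ),
    'currency_code'
) }} AS final_amount_usd
FROM {{ ref('base__shopify__orders') }}
```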

Naming Conventions

Use verb prefixes that indicate what the macro does. get_ retrieves data, generate_ creates output, format_ transforms strings. Prefix internal helpers with underscore: _build_filter_clause. Clear naming makes macros discoverable without reading the implementation.

Project Organization

One macro per file, filename matching the macro name. Override macros (generate_schema_name) live at the macros/ root. Utility macros go in utils/. Domain-specific macros group by business area. This organization mirrors the principles in the three-layer architecture: clear boundaries make the project navigable.

macros/
├── _macros.yml                # All macro documentation
├── generate_schema_name.sql   # Override macros at root
├── utils/
│   ├── string_utils.sql
│   └── date_utils.sql
├── transformations/
│   ├── finance/
│   └── marketing/
└── tests/                     # Custom generic tests

Document arguments in _macros.yml so they appear in dbt docs:

macros:
  - name: cents_to_dollars
    description: |
      Converts a cents integer to dollars with configurable precision.

      ## Usage
      ```sql
      {{ cents_to_dollars('amount_cents', scale=4) }}
      ```
    arguments:
      - name: column_name
        type: column
        description: Column containing cents
      - name: scale
        type: integer
        description: Decimal places. Defaults to 2.

Debugging

When macros misbehave, check target/compiled/ for the rendered SQL. Run dbt compile --select model_name during development. Use {{ log("Debug: " ~ my_variable, info=true) }} for runtime inspection. For hard failures, {{ exceptions.raise_compiler_error("Invalid: " ~ value) }} stops compilation with a clear message.
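These tools combine naturally into a hypothetical validation helper (validate_grain is an illustrative name) that fails fast with a readable message instead of generating broken SQL:

```sql
{% macro validate_grain(grain) %}
{% set allowed = ['day', 'week', 'month'] %}
{% if grain not in allowed %}
{# stop compilation with an explicit message rather than emit invalid SQL #}
{{ exceptions.raise_compiler_error("Invalid grain: " ~ grain ~ ". Expected one of: " ~ (allowed | join(', '))) }}
{% endif %}
{{ log("Grain validated: " ~ grain, info=true) }}
{% endmacro %}
```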

Test macros through unit tests (dbt 1.8+) or integration test models that verify output against expected values. The compiled SQL is always the source of truth.

Common Pitfalls

Nested curlies. Jinja expressions inside other expressions do not need double braces: {{ my_macro(var('x')) }} is correct, not {{ my_macro({{ var('x') }}) }}.

Variable scope in loops. Variables set inside a for loop do not persist outside it. Use namespace for counters that need to survive the loop boundary:

{% set ns = namespace(total=0) %}
{% for item in items %}
{% set ns.total = ns.total + item %}
{% endfor %}
{# ns.total is still visible here, after the loop #}
{{ ns.total }}

Missing execute guards. Any macro calling run_query() or accessing graph.nodes needs the execute check. Without it, you get cryptic parse-time errors about relations not existing.

Surrogate key migration. Moving from the old surrogate_key() to generate_surrogate_key() changes hash values for rows with nulls, which breaks incremental models and snapshots that used the old key.