10 Macros That Should Be in Every dbt Project

You’ve installed dbt-utils. You’ve seen the list of available macros. And yet most projects end up using the same three or four while writing custom SQL for patterns that already have solutions.

What follows is a short list of macros that earn their place in production projects, the ones that eliminate repeated work and prevent subtle bugs. Some come from packages, others you’ll write yourself, but all of them solve problems you’ll hit again and again.

Package macros worth installing

These macros justify adding dbt-utils and dbt-expectations to your packages.yml. They solve problems that are surprisingly hard to get right from scratch.

1. generate_surrogate_key

Creating consistent hashed keys from business keys sounds simple until you hit your first null value. Different concatenation approaches handle nulls differently, leading to collision bugs that surface months later.

SELECT
    {{ dbt_utils.generate_surrogate_key(['customer_id', 'order_date']) }} AS order__key,
    customer_id,
    order_date,
    amount
FROM {{ ref('base__shopify__orders') }}

The macro handles null values consistently, uses a delimiter to prevent collisions between ('a', 'bc') and ('ab', 'c'), and works across BigQuery, Snowflake, and Databricks. The hashing algorithm varies by adapter, but the behavior stays consistent.

One thing to watch: if you’re migrating from the old surrogate_key macro, the null handling changed. Test your keys before swapping.
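
A quick way to check is a throwaway comparison query before you cut over. Here's a minimal sketch, assuming a hypothetical model dim_orders_old that still carries the key produced by the legacy macro:

-- One-off audit: dim_orders_old and legacy__key are assumed names
SELECT COUNT(*) AS mismatched_keys
FROM {{ ref('dim_orders_old') }}
WHERE legacy__key != {{ dbt_utils.generate_surrogate_key(['customer_id', 'order_date']) }}

Zero mismatches means the swap is safe; anything else tells you exactly which rows to inspect.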

2. star

Selecting all columns except a few is a common pattern, especially when you need to exclude audit columns or sensitive fields without listing every column you want to keep.

SELECT
    {{ dbt_utils.star(from=ref('base__stripe__customers'), except=['_loaded_at', '_source_file']) }},
    CURRENT_TIMESTAMP() AS _processed_at
FROM {{ ref('base__stripe__customers') }}

This generates the full column list at compile time, excluding what you specify. When upstream models add columns, your model picks them up automatically on the next run. When a column your downstream logic still references disappears, the failure is loud and immediate instead of a quiet stream of nulls. Silent schema drift causes more production issues than loud failures.

If your project runs exclusively on BigQuery, SELECT * EXCEPT(...) achieves the same result natively. The star macro earns its place when you need cross-database compatibility or want explicit column lists in your compiled SQL for easier auditing.
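
For a sense of what the macro buys you, here's roughly what the model above compiles to, assuming the base model carries customer_id, email, and country alongside the excluded audit columns (exact quoting and relation naming vary by adapter):

-- Illustrative compiled output; column set and relation name are assumptions
SELECT
    customer_id,
    email,
    country,
    CURRENT_TIMESTAMP() AS _processed_at
FROM analytics.base__stripe__customers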

3. get_column_values

Dynamic SQL based on actual data values enables patterns that would otherwise require hardcoding or manual updates. The classic use case is pivoting a column into multiple columns.

{% set payment_methods = dbt_utils.get_column_values(
    table=ref('base__stripe__payments'),
    column='payment_method',
    order_by='COUNT(*) DESC',
    max_records=10
) %}

SELECT
    order_id,
    {% for method in payment_methods %}
    SUM(CASE WHEN payment_method = '{{ method }}' THEN amount ELSE 0 END) AS order__{{ method }}_amount
    {{ ',' if not loop.last }}
    {% endfor %}
FROM {{ ref('base__stripe__payments') }}
GROUP BY order_id

When a new payment method appears in production, your model handles it on the next run. No code change required.

The macro runs an introspective query, so it needs a live database connection. That's fine for dbt compile and dbt run, but while dbt parses the project and builds the DAG (when execute is false) it returns an empty list, or whatever you pass as the default argument, instead of hitting the warehouse.
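
If you want an explicit fallback for those cases, the macro accepts a default argument. A minimal sketch, with placeholder fallback values:

{# The fallback values below are placeholders; use whatever makes sense for your data #}
{% set payment_methods = dbt_utils.get_column_values(
    table=ref('base__stripe__payments'),
    column='payment_method',
    default=['card', 'bank_transfer']
) %}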

4. date_spine

Time-series models with date gaps need a complete date sequence. Writing this from scratch means dealing with recursive CTEs, generate_series functions, or cross joins, all of which vary by database.

{{ dbt_utils.date_spine(
    datepart="day",
    start_date="CAST('2020-01-01' AS DATE)",
    end_date="CAST('2026-12-31' AS DATE)"
) }}

This gives you one row per day. Join it to your core models to surface missing dates, calculate running totals that don’t skip gaps, or build proper time-series visualizations that show the zeros.

You can also use week, month, or hour as the datepart. For most analytics use cases, a single dates reference model using this macro is enough.
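
As an example of the join, here's a gap-free daily revenue rollup. It's a sketch that assumes a dim_dates model wrapping the macro (the spine's output column is date_day for a daily spine) plus the orders base model from earlier:

-- dim_dates is an assumed model built from date_spine
SELECT
    dates.date_day,
    COALESCE(SUM(orders.amount), 0) AS order__daily_revenue
FROM {{ ref('dim_dates') }} AS dates
LEFT JOIN {{ ref('base__shopify__orders') }} AS orders
    ON dates.date_day = CAST(orders.order_date AS DATE)
GROUP BY dates.date_day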

5. union_relations

Combining tables with slightly different schemas (sharded tables, multi-tenant data, or tables across environments) requires aligning columns that may not exist in all sources.

{{ dbt_utils.union_relations(
    relations=[
        ref('base__shopify_us__orders'),
        ref('base__shopify_eu__orders'),
        ref('base__shopify_apac__orders')
    ]
) }}

The macro finds all columns across all relations, fills in nulls where columns are missing, and adds a _dbt_source_relation column so you know which table each row came from. No more manual column list maintenance when schemas drift between regions.
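
Downstream, that column is handy for deriving attributes the source tables never had, such as region. A sketch, where the unioned model name and the LIKE patterns are assumptions about your naming:

-- base__shopify__orders_unioned is an assumed name for the unioned model
SELECT
    *,
    CASE
        WHEN _dbt_source_relation LIKE '%shopify_us%' THEN 'US'
        WHEN _dbt_source_relation LIKE '%shopify_eu%' THEN 'EU'
        WHEN _dbt_source_relation LIKE '%shopify_apac%' THEN 'APAC'
    END AS order__region
FROM {{ ref('base__shopify__orders_unioned') }}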

6. expect_column_values_to_match_regex

Data quality testing belongs at the schema level, not buried in transformation logic. dbt-expectations brings Great Expectations-style tests to dbt, with regex validation being one of the most useful.

columns:
  - name: email
    tests:
      - dbt_expectations.expect_column_values_to_match_regex:
          regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
          row_condition: "email is not null"

This catches malformed data before it reaches downstream models. When the test fails, you know exactly which column and which pattern was violated. Compare this to discovering bad emails when a marketing campaign bounces.

The package includes over 40 test types. Pattern matching, recency checks, value ranges, and cross-column validation cover most data quality scenarios without custom SQL. I cover the most useful tests in my dbt-expectations guide.
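
To give a flavor of the range, here are two more from the same package. The column names and thresholds below are placeholders:

columns:
  - name: amount          # placeholder column
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 1000000
  - name: created_at      # placeholder column
    tests:
      - dbt_expectations.expect_row_values_to_have_recent_data:
          datepart: day
          interval: 1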

Custom macros you’ll write yourself

Some patterns don’t exist in packages because they’re too specific to standardize. But they show up in project after project. Here are the ones worth having.

7. Audit columns

Every row should carry metadata about when and how it was created. A simple macro keeps this consistent across hundreds of models.

-- macros/add_audit_columns.sql
{% macro add_audit_columns() %}
    CURRENT_TIMESTAMP() AS _loaded_at,
    '{{ invocation_id }}' AS _dbt_invocation_id,
    '{{ target.name }}' AS _dbt_target
{% endmacro %}

Use it in any select statement:

SELECT
    customer_id,
    customer_name,
    {{ add_audit_columns() }}
FROM {{ ref('base__stripe__customers') }}

The invocation ID lets you trace rows back to specific dbt runs. The target name shows whether data came from dev or prod. When debugging production issues at 2 AM, you’ll be grateful for these columns.
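
Tracing a row then boils down to an ad-hoc filter. The table name and invocation ID below are placeholders:

-- Run directly against the warehouse during an incident
SELECT *
FROM analytics.dim_customers
WHERE _dbt_invocation_id = 'c0ffee00-0000-0000-0000-000000000000'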

8. Environment-aware sampling

Development runs against full production data take forever. Sampling in dev speeds up iteration without changing production behavior.

-- macros/limit_data_in_dev.sql
{% macro limit_data_in_dev(column_name='created_at', days=3) %}
    {% if target.name == 'dev' %}
    AND {{ column_name }} >= DATE_SUB(CURRENT_DATE(), INTERVAL {{ days }} DAY)
    {% endif %}
{% endmacro %}

Add it to any model’s WHERE clause:

SELECT
    event_id,
    event_name,
    event_timestamp,
    user_id
FROM {{ source('raw', 'events') }}
WHERE 1=1
    {{ limit_data_in_dev('event_timestamp', 7) }}

In dev, you get seven days of data. In prod, you get everything. The model code stays identical across environments. No more commenting and uncommenting filters before commits.
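
One caveat: DATE_SUB with INTERVAL is BigQuery syntax. If your project targets more than one warehouse, dbt's cross-database dateadd macro (shipped with dbt-core 1.2 and later) keeps the same idea portable. A sketch:

-- macros/limit_data_in_dev.sql (cross-database variant)
{% macro limit_data_in_dev(column_name='created_at', days=3) %}
    {% if target.name == 'dev' %}
    AND {{ column_name }} >= {{ dbt.dateadd('day', -days, 'current_date') }}
    {% endif %}
{% endmacro %}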

9. Unit conversions

Cents to dollars, bytes to gigabytes, milliseconds to seconds. These conversions are trivial individually, but doing them inconsistently across models causes reconciliation nightmares.

-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, scale=2) %}
    ROUND({{ column_name }} / 100.0, {{ scale }})
{% endmacro %}

Then use it in any model:

SELECT
    order_id,
    {{ cents_to_dollars('amount_cents') }} AS order__amount_dollars
FROM {{ ref('base__shopify__orders') }}

The macro is almost embarrassingly simple. That’s the point. When finance asks why two reports show different totals, you can confirm every model uses the same conversion. When you need to change the rounding behavior, you change it in one place.

10. Schema generation

dbt’s default schema naming concatenates your target schema with any custom schema you specify. Most teams want different behavior: in prod, use the custom schema directly; in dev, prefix everything with your target schema.

-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {#- Prod gets the custom schema verbatim; everything else is prefixed with the target schema -#}
    {%- if target.name == 'prod' and custom_schema_name is not none -%}
        {{ custom_schema_name | trim }}
    {%- else -%}
        {{ target.schema }}{% if custom_schema_name is not none %}_{{ custom_schema_name | trim }}{% endif %}
    {%- endif -%}
{%- endmacro %}

With this macro, a model configured with schema: marts lands in:

  • Dev: dbt_yourname_marts
  • Prod: marts

This keeps dev environments isolated while giving prod clean schema names. The macro is called automatically by dbt, so you don’t invoke it directly.
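
The custom schema itself is ordinary config. For instance, in dbt_project.yml (the project name and folder below are placeholders):

models:
  my_project:          # placeholder project name
    marts:
      +schema: marts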

Before you write another macro

The best macro is often no macro at all.

Wait for the rule of three. The first time you write similar code, just write it. The second time, note the pattern. The third time, extract to a macro. Premature abstraction creates more problems than duplication.

Favor readability over DRYness. dbt’s own style guide says it: “Favor readability when mixing Jinja with SQL, even if it means repeating some lines.” If understanding a model requires opening five macro files, you’ve optimized for the wrong thing. I explore this balance in DRY principles for dbt macros.

Watch for parameter explosion. When a macro needs seven parameters to handle all cases, it’s probably doing too much. Single-responsibility macros compose better than Swiss Army knife macros.

Test the generated SQL, not the macro. Run dbt compile --select model_name and look at what lands in target/compiled/. If the SQL looks wrong, the macro is wrong. The compiled output is the source of truth. For testing the data itself, see my testing strategy framework.

Getting started

Add the essential packages to your packages.yml:

packages:
  - package: dbt-labs/dbt_utils
    version: 1.3.3
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]

Run dbt deps to install them.

For custom macros, create a macros/ folder structure that matches how you think about the code:

macros/
├── _macros.yml              # Documentation for all macros
├── generate_schema_name.sql
└── utils/
    ├── add_audit_columns.sql
    ├── cents_to_dollars.sql
    └── limit_data_in_dev.sql

One macro per file, with the filename matching the macro name. Document arguments in _macros.yml so your team knows what’s available. This mirrors the organization from my dbt project structure guide.
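
An entry for the dev-sampling macro might look like this sketch:

version: 2

macros:
  - name: limit_data_in_dev
    description: Filters a model to recent rows when running against the dev target.
    arguments:
      - name: column_name
        type: string
        description: Date or timestamp column to filter on.
      - name: days
        type: integer
        description: Days of history to keep in dev.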

The audit columns macro is a good first addition. Once you can trace exactly which dbt run created a row during a production incident, these small investments start to compound.