dbt-utils: every function explained

dbt-utils is the most depended-on package in the dbt package ecosystem. Nearly every dbt project installs it. But most teams reach for generate_surrogate_key, maybe union_relations, and stop there. The package has 50+ macros and generic tests that cover SQL generation, data introspection, quality validation, and cross-warehouse compatibility.

This is a complete reference for dbt-utils v1.3.3 (the current release, compatible with dbt Core 1.x and Fusion 2.x). If you’re looking for a curated selection of the most useful ones, see my guide to essential dbt macros.

The v1.0 migration: what moved to dbt-core

The biggest change in dbt-utils history happened with v1.0. All cross-database macros were removed from dbt-utils and moved to the dbt namespace in dbt-core. If you’re still calling dbt_utils.datediff(), you’re using a path that no longer exists in current versions. Replace it with dbt.datediff().
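For example, a pre-v1.0 call migrates like this (the column names here are illustrative):

-- before (dbt-utils v0.x)
{{ dbt_utils.datediff("order__placed_at", "order__shipped_at", "day") }}

-- after (dbt-utils v1.0+)
{{ dbt.datediff("order__placed_at", "order__shipped_at", "day") }}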

Here’s the full list of migrated macros (use dbt.* instead of dbt_utils.*):

Date and time: dateadd, datediff, date_trunc, last_day, current_timestamp

String: concat, length, position, replace, right, split_part, string_literal, escape_single_quotes

Type casting: safe_cast, cast (dbt v1.8+), type_bigint, type_float, type_int, type_numeric, type_string, type_timestamp

Aggregation: any_value, bool_or, listagg

Set operations: except, intersect

Other: hash, cast_bool_to_text, array_append, array_concat, array_construct (dbt-core 1.3+), date (dbt v1.8+)

These macros also no longer exist in dbt-utils at all:

  • surrogate_key() replaced by generate_surrogate_key()
  • unique_where and not_null_where tests replaced by the native where config on built-in tests
  • insert_by_period materialization moved to the experimental-features repo
  • condition argument on expression_is_true replaced by native where config
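
As an example of the replacement pattern, the old unique_where test becomes the built-in unique test plus a where config (model and column names here are illustrative):

models:
  - name: mrt__finance__invoices
    columns:
      - name: invoice__id
        data_tests:
          - unique:
              config:
                where: "invoice__status != 'draft'"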

SQL generation helpers

These macros generate SQL fragments or complete queries. They’re the most commonly used part of dbt-utils.

date_spine

Generates a complete date or time series. Useful for filling gaps in time-series data.

SELECT *
FROM (
    {{ dbt_utils.date_spine(
        datepart="day",
        start_date="CAST('2024-01-01' AS DATE)",
        end_date="CAST('2025-01-01' AS DATE)"
    ) }}
)

Accepts datepart values: day, week, month, year, hour, minute.

Gotcha: the end date is exclusive. The series above stops at 2024-12-31, not 2025-01-01.

deduplicate

Removes duplicate rows by keeping one row per partition, ordered by your criteria:

SELECT *
FROM (
    {{ dbt_utils.deduplicate(
        relation=ref('base__crm__contacts'),
        partition_by='contact__id',
        order_by='contact__updated_at DESC'
    ) }}
)

Breaking changes from v1.0: group_by was renamed to partition_by, relation_alias was removed, and order_by is now mandatory (pass 1 if you don’t care about ordering). On Snowflake and BigQuery, this compiles to a QUALIFY clause. On other warehouses, it uses a ROW_NUMBER subquery.

star

Selects all columns from a relation with optional exclusions:

SELECT
    {{ dbt_utils.star(from=ref('base__shopify__orders'), except=["_fivetran_synced", "_fivetran_deleted"]) }}
FROM {{ ref('base__shopify__orders') }}

Supports relation_alias, prefix, suffix, and quote_identifiers (default True). Runs introspection at compile time, so the relation must already be materialized.
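
A sketch of those optional arguments together (the alias and prefix values here are arbitrary):

SELECT
    {{ dbt_utils.star(
        from=ref('base__shopify__orders'),
        relation_alias='orders',
        except=["_fivetran_synced"],
        prefix='shopify_'
    ) }}
FROM {{ ref('base__shopify__orders') }} AS orders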

union_relations

Unions multiple relations, automatically handling columns that exist in one but not another (fills missing columns with NULL):

{{ dbt_utils.union_relations(
    relations=[ref('base__stripe__charges_us'), ref('base__stripe__charges_eu')],
    exclude=["_loaded_at"],
    source_column_name='_dbt_source_relation'
) }}

The include and exclude parameters are mutually exclusive. Pair this with get_relations_by_pattern for dynamic table discovery (useful for sharded tables).

generate_surrogate_key

Creates a hash-based surrogate key from multiple columns:

SELECT
    {{ dbt_utils.generate_surrogate_key(['order__id', 'order__line_number']) }} AS order_line__surrogate_key,
    order__id,
    order__line_number,
    product__id,
    line__quantity,
    line__price
FROM {{ ref('base__shopify__order_lines') }}

Critical gotcha: if you’re migrating from the old surrogate_key() macro to generate_surrogate_key(), the hash values change for rows containing nulls. This breaks incremental models and snapshots that used the old key. To maintain backward compatibility, add this to your dbt_project.yml:

vars:
  surrogate_key_treat_nulls_as_empty_strings: true

pivot and unpivot

pivot creates pivot tables using CASE expressions:

SELECT
    order__date,
    {{ dbt_utils.pivot(
        column='order__status',
        values=dbt_utils.get_column_values(ref('base__shopify__orders'), 'order__status'),
        agg='COUNT',
        then_value=1,
        else_value=0,
        prefix='status_'
    ) }}
FROM {{ ref('base__shopify__orders') }}
GROUP BY 1

unpivot reverses the operation. Note: the table argument was renamed to relation in v1.0.

{{ dbt_utils.unpivot(
    relation=ref('mrt__sales__quarterly_revenue'),
    cast_to='FLOAT64',
    exclude=['region__name'],
    field_name='quarter',
    value_name='revenue'
) }}

Other SQL generators

  • group_by(n): outputs GROUP BY 1, 2, ..., n. Example: {{ dbt_utils.group_by(3) }}
  • generate_series(upper_bound): creates a 1-indexed number series. Example: {{ dbt_utils.generate_series(100) }}
  • safe_add(values): null-safe addition; a list argument is required since v1.0. Example: {{ dbt_utils.safe_add(['col_a', 'col_b']) }}
  • safe_divide(numerator, denominator): returns NULL on division by zero. Example: {{ dbt_utils.safe_divide('revenue', 'sessions') }}
  • safe_subtract(values): null-safe subtraction. Example: {{ dbt_utils.safe_subtract(['col_a', 'col_b']) }}
  • haversine_distance(lat1, lon1, lat2, lon2, unit): distance between two coordinates; unit defaults to 'mi', use 'km' for kilometers.
  • width_bucket(expr, min, max, buckets): assigns values to equal-width histogram buckets; useful for distribution analysis.
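
A quick sketch of how two of these compose in one model (the relation and column names are illustrative):

SELECT
    order__id,
    {{ dbt_utils.safe_divide('order__discount_amount', 'order__total_amount') }} AS order__discount_ratio,
    {{ dbt_utils.width_bucket('order__total_amount', 0, 1000, 10) }} AS order__amount_bucket
FROM {{ ref('base__shopify__orders') }}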

Introspective macros

These macros query your database at compile time to inform your SQL. They’re powerful but come with a catch: they run queries before your models execute, so the target relations need to already exist.

get_column_values

Retrieves distinct values from a column. Commonly paired with pivot:

{% set status_values = dbt_utils.get_column_values(
    table=ref('base__shopify__orders'),
    column='order__status',
    order_by='COUNT(*) DESC',
    max_records=50,
    default=['pending', 'completed', 'cancelled']
) %}

Gotcha: since this runs at compile time, it fails if the model doesn’t exist yet (first run, new environment). Always provide a default list as a fallback.

get_filtered_columns_in_relation

Returns column names from a relation, with optional exclusions. Only works on materialized relations, not CTEs. Returns an empty list in parsing mode (fixed in v1.3.2).
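
A minimal sketch, assuming a materialized orders model whose non-excluded columns are all numeric (names illustrative):

{% set amount_columns = dbt_utils.get_filtered_columns_in_relation(
    from=ref('base__shopify__orders'),
    except=["order__id", "_fivetran_synced"]
) %}
SELECT
    order__status,
    {% for col in amount_columns %}
    SUM({{ col }}) AS total_{{ col }}{{ "," if not loop.last }}
    {% endfor %}
FROM {{ ref('base__shopify__orders') }}
GROUP BY 1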

get_relations_by_pattern

Discovers relations matching SQL wildcards:

{% set sharded_tables = dbt_utils.get_relations_by_pattern(
    'analytics_%',
    'events_%',
    exclude='%deprecated'
) %}
{{ dbt_utils.union_relations(relations=sharded_tables) }}

This is the standard approach for consolidating sharded tables (common with BigQuery date-sharded exports).

get_query_results_as_dict and get_single_value

get_query_results_as_dict runs arbitrary SQL and returns results as a dictionary. get_single_value (new in v1.0) returns a single scalar value. Both run at compile time.

{% set row_count = dbt_utils.get_single_value(
    "SELECT COUNT(*) FROM " ~ ref('base__shopify__orders')
) %}

Use these sparingly. Every introspective macro adds a query to your compile step, which slows down dbt compile and dbt run.

Generic tests

dbt-utils ships with generic tests that go well beyond the built-in unique, not_null, accepted_values, and relationships. For Fusion (dbt 2.0) and dbt Core 1.10.6+, test arguments must be nested under an arguments: key in YAML.
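
Under that newer syntax, a dbt-utils test looks like this (model and column names are illustrative):

models:
  - name: mrt__finance__daily_revenue
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          arguments:
            combination_of_columns:
              - revenue__date
              - revenue__currency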

Key tests

unique_combination_of_columns tests composite uniqueness:

models:
  - name: mrt__finance__daily_revenue
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - revenue__date
            - revenue__currency

accepted_range validates that values fall within bounds:

columns:
  - name: order__total_amount
    data_tests:
      - dbt_utils.accepted_range:
          min_value: 0
          inclusive: true

expression_is_true evaluates any SQL expression per row:

models:
  - name: mrt__finance__invoices
    data_tests:
      - dbt_utils.expression_is_true:
          expression: "invoice__total_amount >= invoice__tax_amount"
          where: "invoice__status != 'voided'"

Note: the old condition argument was removed in v1.0. Use the native where config instead.

recency checks that data is fresh:

models:
  - name: base__stripe__charges
    data_tests:
      - dbt_utils.recency:
          datepart: hour
          field: charge__created_at
          interval: 24

Complete test reference

  • unique_combination_of_columns: composite uniqueness
  • accepted_range: value within min/max bounds
  • expression_is_true: arbitrary SQL expression per row
  • recency: data freshness by datepart/interval
  • at_least_one: at least one non-null value exists
  • not_constant: values vary (not all the same)
  • not_null_proportion: threshold-based null check (e.g., at_least: 0.95)
  • relationships_where: filtered referential integrity
  • mutually_exclusive_ranges: no overlapping ranges (both bounds must be NOT NULL)
  • sequential_values: continuity check (no gaps)
  • equal_rowcount: two relations have the same row count
  • fewer_rows_than: relation A has fewer rows than relation B
  • equality: full model comparison (with optional precision for numerics)
  • not_accepted_values: inverse of accepted_values
  • not_empty_string: non-empty strings (with trim_whitespace option)
  • cardinality_equality: same number of distinct values across two columns

Several tests support a group_by_columns parameter for segmented validation: equal_rowcount, fewer_rows_than, recency, at_least_one, not_constant, sequential_values, and not_null_proportion.
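
For example, not_null_proportion can be checked per segment rather than over the whole table (column names are illustrative):

columns:
  - name: order__discount_code
    data_tests:
      - dbt_utils.not_null_proportion:
          at_least: 0.95
          group_by_columns: ['order__channel']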

For a broader look at how these tests fit into a testing strategy, I covered that in a separate article.

Web macros and Jinja helpers

dbt-utils includes three URL extraction macros that are surprisingly useful for marketing analytics:

  • get_url_parameter(field, url_parameter) extracts query parameters from URLs
  • get_url_host(field) extracts the hostname (handles http, https, and android-app protocols)
  • get_url_path(field) extracts the page path

SELECT
    {{ dbt_utils.get_url_host('page__url') }} AS page__host,
    {{ dbt_utils.get_url_path('page__url') }} AS page__path,
    {{ dbt_utils.get_url_parameter('page__url', 'utm_source') }} AS session__utm_source
FROM {{ ref('base__ga4__events') }}

For Jinja development and debugging, log_info(message) prints to the console during compilation, pretty_time(format) returns a formatted timestamp, and slugify(string) converts text to URL-safe slugs.
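
A small sketch of these helpers together (the message and input string are arbitrary):

{{ dbt_utils.log_info("Compiling model at " ~ dbt_utils.pretty_time()) }}
{% set file_slug = dbt_utils.slugify("Paid Social / EU") %}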

Cross-warehouse behavior

dbt-utils uses adapter.dispatch() to generate warehouse-specific SQL under the hood. Most of the time this is transparent, but some macros compile differently depending on your adapter:

  • deduplicate: QUALIFY ROW_NUMBER() on Snowflake/BigQuery; ROW_NUMBER() subquery on Postgres/Redshift
  • safe_cast (now dbt.safe_cast): TRY_CAST / SAFE_CAST on Snowflake/BigQuery; regular CAST on Postgres/Redshift
  • listagg (now dbt.listagg): LISTAGG on Snowflake/BigQuery; STRING_AGG on Postgres/Redshift
  • bool_or (now dbt.bool_or): BOOL_OR / LOGICAL_OR on Snowflake/BigQuery; BOOL_OR on Postgres/Redshift
  • any_value (now dbt.any_value): ANY_VALUE on Snowflake/BigQuery; ANY() on Postgres

If you’re on a non-core adapter like Spark or Databricks, you’ll likely need to configure dispatch in your dbt_project.yml to map dbt_utils to your adapter’s shim package (like spark_utils). See my package ecosystem guide for details on dispatch configuration.
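
The dispatch config goes in dbt_project.yml and looks like this for Spark (swap in the shim package for your adapter):

dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']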

Quick reference: what’s where

If you’re unsure whether a macro lives in dbt-utils or dbt-core, here’s the rule of thumb:

Still in dbt-utils: SQL generators (date_spine, deduplicate, star, union_relations, generate_surrogate_key, pivot, unpivot), introspective macros (get_column_values, get_relations_by_pattern), generic tests, web macros, and Jinja helpers.

Moved to dbt-core (dbt.*): All cross-database type casting, date/time functions, string functions, aggregation functions, and set operations.

Removed entirely: surrogate_key(), unique_where, not_null_where, insert_by_period, and the condition argument on expression_is_true.

When in doubt, check the dbt-utils changelog on GitHub for the v1.0.0 migration guide.