dbt-utils: every function explained

dbt-utils is the most depended-on package in the dbt package ecosystem. Nearly every dbt project installs it. But most teams reach for generate_surrogate_key, maybe union_relations, and stop there. The package has 50+ macros and generic tests that cover SQL generation, data introspection, quality validation, and cross-warehouse compatibility.

This is a complete reference for dbt-utils v1.3.3 (the current release, compatible with dbt Core 1.x and Fusion 2.x). If you’re looking for a curated selection of the most useful ones, see my guide to essential dbt macros.

The v1.0 migration: what moved to dbt-core

The biggest change in dbt-utils history happened with v1.0. All cross-database macros were removed from dbt-utils and moved to the dbt namespace in dbt-core. If you’re still calling dbt_utils.datediff(), you’re using a path that no longer exists in current versions. Replace it with dbt.datediff().
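For example, a pre-v1.0 call migrates like this (the column names here are illustrative):

-- before (dbt-utils v0.x)
{{ dbt_utils.datediff("order__placed_at", "order__shipped_at", "day") }}

-- after (dbt-utils v1.0+)
{{ dbt.datediff("order__placed_at", "order__shipped_at", "day") }}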

Here’s the full list of migrated macros (use dbt.* instead of dbt_utils.*):

Date and time: dateadd, datediff, date_trunc, last_day, current_timestamp

String: concat, length, position, replace, right, split_part, string_literal, escape_single_quotes

Type casting: safe_cast, cast (dbt v1.8+), type_bigint, type_float, type_int, type_numeric, type_string, type_timestamp

Aggregation: any_value, bool_or, listagg

Set operations: except, intersect

Other: hash, cast_bool_to_text, array_append, array_concat, array_construct (dbt-core 1.3+), date (dbt v1.8+)

These macros also no longer exist in dbt-utils at all:

  • surrogate_key() replaced by generate_surrogate_key()
  • unique_where and not_null_where tests replaced by the native where config on built-in tests
  • insert_by_period materialization moved to the experimental-features repo
  • condition argument on expression_is_true replaced by native where config
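
As an example of the replacement pattern, the old unique_where test becomes the built-in unique test plus a where config (model and column names here are illustrative):

models:
  - name: mrt__finance__invoices
    columns:
      - name: invoice__id
        data_tests:
          - unique:
              config:
                where: "invoice__status != 'draft'"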

SQL generation helpers

These macros generate SQL fragments or complete queries. They’re the most commonly used part of dbt-utils.

date_spine

Generates a complete date or time series. Useful for filling gaps in time-series data.

SELECT *
FROM (
    {{ dbt_utils.date_spine(
        datepart="day",
        start_date="CAST('2024-01-01' AS DATE)",
        end_date="CAST('2025-01-01' AS DATE)"
    ) }}
)

Accepts datepart values: day, week, month, year, hour, minute.

Gotcha: the end date is exclusive. The series above stops at 2024-12-31, not 2025-01-01.

deduplicate

Removes duplicate rows by keeping one row per partition, ordered by your criteria:

SELECT *
FROM (
    {{ dbt_utils.deduplicate(
        relation=ref('base__crm__contacts'),
        partition_by='contact__id',
        order_by='contact__updated_at DESC'
    ) }}
)

Breaking changes from v1.0: group_by was renamed to partition_by, relation_alias was removed, and order_by is now mandatory (pass 1 if you don’t care about ordering). On Snowflake and BigQuery, this compiles to a QUALIFY clause. On other warehouses, it uses a ROW_NUMBER subquery.

star

Selects all columns from a relation with optional exclusions:

SELECT
    {{ dbt_utils.star(from=ref('base__shopify__orders'), except=["_fivetran_synced", "_fivetran_deleted"]) }}
FROM {{ ref('base__shopify__orders') }}

Supports relation_alias, prefix, suffix, and quote_identifiers (default True). Runs introspection at compile time, so the relation must already be materialized.
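
A sketch of those optional arguments together (the alias and prefix values here are arbitrary):

SELECT
    {{ dbt_utils.star(
        from=ref('base__shopify__orders'),
        relation_alias='orders',
        except=["_fivetran_synced"],
        prefix='shopify_'
    ) }}
FROM {{ ref('base__shopify__orders') }} AS orders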

union_relations

Unions multiple relations, automatically handling columns that exist in one but not another (fills missing columns with NULL):

{{ dbt_utils.union_relations(
    relations=[ref('base__stripe__charges_us'), ref('base__stripe__charges_eu')],
    exclude=["_loaded_at"],
    source_column_name='_dbt_source_relation'
) }}

The include and exclude parameters are mutually exclusive. Pair this with get_relations_by_pattern for dynamic table discovery (useful for sharded tables).

generate_surrogate_key

Creates a hash-based surrogate key from multiple columns:

SELECT
    {{ dbt_utils.generate_surrogate_key(['order__id', 'order__line_number']) }} AS order_line__surrogate_key,
    order__id,
    order__line_number,
    product__id,
    line__quantity,
    line__price
FROM {{ ref('base__shopify__order_lines') }}

Critical gotcha: if you’re migrating from the old surrogate_key() macro to generate_surrogate_key(), the hash values change for rows containing nulls. This breaks incremental models and snapshots that used the old key. To maintain backward compatibility, add this to your dbt_project.yml:

vars:
  surrogate_key_treat_nulls_as_empty_strings: true

pivot and unpivot

pivot creates pivot tables using CASE expressions:

SELECT
    order__date,
    {{ dbt_utils.pivot(
        column='order__status',
        values=dbt_utils.get_column_values(ref('base__shopify__orders'), 'order__status'),
        agg='COUNT',
        then_value=1,
        else_value=0,
        prefix='status_'
    ) }}
FROM {{ ref('base__shopify__orders') }}
GROUP BY 1

unpivot reverses the operation. Note: the table argument was renamed to relation in v1.0.

{{ dbt_utils.unpivot(
    relation=ref('mrt__sales__quarterly_revenue'),
    cast_to='FLOAT64',
    exclude=['region__name'],
    field_name='quarter',
    value_name='revenue'
) }}

Other SQL generators

  • group_by(n): outputs GROUP BY 1, 2, ..., n. Example: {{ dbt_utils.group_by(3) }}
  • generate_series(upper_bound): creates a 1-indexed number series. Example: {{ dbt_utils.generate_series(100) }}
  • safe_add(values): null-safe addition; a list argument is required since v1.0. Example: {{ dbt_utils.safe_add(['col_a', 'col_b']) }}
  • safe_divide(numerator, denominator): returns NULL on division by zero. Example: {{ dbt_utils.safe_divide('revenue', 'sessions') }}
  • safe_subtract(values): null-safe subtraction. Example: {{ dbt_utils.safe_subtract(['col_a', 'col_b']) }}
  • haversine_distance(lat1, lon1, lat2, lon2, unit): distance between two coordinates; unit defaults to 'mi', use 'km' for kilometers.
  • width_bucket(expr, min, max, buckets): assigns values to equal-width histogram buckets; useful for distribution analysis.
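
A quick sketch of how two of these compose in one model (the relation and column names are illustrative):

SELECT
    order__id,
    {{ dbt_utils.safe_divide('order__discount_amount', 'order__total_amount') }} AS order__discount_ratio,
    {{ dbt_utils.width_bucket('order__total_amount', 0, 1000, 10) }} AS order__amount_bucket
FROM {{ ref('base__shopify__orders') }}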

Introspective macros

These macros query your database at compile time to inform your SQL. They’re powerful but come with a catch: they run queries before your models execute, so the target relations need to already exist.

get_column_values

Retrieves distinct values from a column. Commonly paired with pivot:

{% set status_values = dbt_utils.get_column_values(
    table=ref('base__shopify__orders'),
    column='order__status',
    order_by='COUNT(*) DESC',
    max_records=50,
    default=['pending', 'completed', 'cancelled']
) %}

Gotcha: since this runs at compile time, it fails if the model doesn’t exist yet (first run, new environment). Always provide a default list as a fallback.

get_filtered_columns_in_relation

Returns column names from a relation, with optional exclusions. Only works on materialized relations, not CTEs. Returns an empty list in parsing mode (fixed in v1.3.2).
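
A minimal sketch, assuming a materialized orders model whose non-excluded columns are all numeric (names illustrative):

{% set amount_columns = dbt_utils.get_filtered_columns_in_relation(
    from=ref('base__shopify__orders'),
    except=["order__id", "_fivetran_synced"]
) %}
SELECT
    order__status,
    {% for col in amount_columns %}
    SUM({{ col }}) AS total_{{ col }}{{ "," if not loop.last }}
    {% endfor %}
FROM {{ ref('base__shopify__orders') }}
GROUP BY 1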

get_relations_by_pattern

Discovers relations matching SQL wildcards:

{% set sharded_tables = dbt_utils.get_relations_by_pattern(
    'analytics_%',
    'events_%',
    exclude='%deprecated'
) %}
{{ dbt_utils.union_relations(relations=sharded_tables) }}

This is the standard approach for consolidating sharded tables (common with BigQuery date-sharded exports).

get_query_results_as_dict and get_single_value

get_query_results_as_dict runs arbitrary SQL and returns results as a dictionary. get_single_value (new in v1.0) returns a single scalar value. Both run at compile time.

{% set row_count = dbt_utils.get_single_value(
    "SELECT COUNT(*) FROM " ~ ref('base__shopify__orders')
) %}

Use these sparingly. Every introspective macro adds a query to your compile step, which slows down dbt compile and dbt run.

Generic tests

dbt-utils ships with generic tests that go well beyond the built-in unique, not_null, accepted_values, and relationships. For Fusion (dbt 2.0) and dbt Core 1.10.6+, test arguments must be nested under an arguments: key in YAML.
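
Under that newer syntax, a dbt-utils test looks like this (model and column names are illustrative):

models:
  - name: mrt__finance__daily_revenue
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          arguments:
            combination_of_columns:
              - revenue__date
              - revenue__currency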

Key tests

unique_combination_of_columns tests composite uniqueness:

models:
  - name: mrt__finance__daily_revenue
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - revenue__date
            - revenue__currency

accepted_range validates that values fall within bounds:

columns:
  - name: order__total_amount
    data_tests:
      - dbt_utils.accepted_range:
          min_value: 0
          inclusive: true

expression_is_true evaluates any SQL expression per row:

models:
  - name: mrt__finance__invoices
    data_tests:
      - dbt_utils.expression_is_true:
          expression: "invoice__total_amount >= invoice__tax_amount"
          where: "invoice__status != 'voided'"

Note: the old condition argument was removed in v1.0. Use the native where config instead.

recency checks that data is fresh:

models:
  - name: base__stripe__charges
    data_tests:
      - dbt_utils.recency:
          datepart: hour
          field: charge__created_at
          interval: 24

Complete test reference

  • unique_combination_of_columns: composite uniqueness
  • accepted_range: value within min/max bounds
  • expression_is_true: arbitrary SQL expression per row
  • recency: data freshness by datepart/interval
  • at_least_one: at least one non-null value exists
  • not_constant: values vary (not all the same)
  • not_null_proportion: threshold-based null check (e.g., at_least: 0.95)
  • relationships_where: filtered referential integrity
  • mutually_exclusive_ranges: no overlapping ranges (both bounds must be NOT NULL)
  • sequential_values: continuity check (no gaps)
  • equal_rowcount: two relations have the same row count
  • fewer_rows_than: relation A has fewer rows than relation B
  • equality: full model comparison (with optional precision for numerics)
  • not_accepted_values: inverse of accepted_values
  • not_empty_string: non-empty strings (with trim_whitespace option)
  • cardinality_equality: same number of distinct values across two columns

Several tests support a group_by_columns parameter for segmented validation: equal_rowcount, fewer_rows_than, recency, at_least_one, not_constant, sequential_values, and not_null_proportion.
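
For example, not_null_proportion can be checked per segment rather than over the whole table (column names are illustrative):

columns:
  - name: order__discount_code
    data_tests:
      - dbt_utils.not_null_proportion:
          at_least: 0.95
          group_by_columns: ['order__channel']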

For a broader look at how these tests fit into a testing strategy, I covered that in a separate article.

Web macros and Jinja helpers

dbt-utils includes three URL extraction macros that are surprisingly useful for marketing analytics:

  • get_url_parameter(field, url_parameter) extracts query parameters from URLs
  • get_url_host(field) extracts the hostname (handles http, https, and android-app protocols)
  • get_url_path(field) extracts the page path

SELECT
    {{ dbt_utils.get_url_host('page__url') }} AS page__host,
    {{ dbt_utils.get_url_path('page__url') }} AS page__path,
    {{ dbt_utils.get_url_parameter('page__url', 'utm_source') }} AS session__utm_source
FROM {{ ref('base__ga4__events') }}

For Jinja development and debugging, log_info(message) prints to the console during compilation, pretty_time(format) returns a formatted timestamp, and slugify(string) converts text to URL-safe slugs.
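
A small sketch of these helpers together (the message and input string are arbitrary):

{{ dbt_utils.log_info("Compiling model at " ~ dbt_utils.pretty_time()) }}
{% set file_slug = dbt_utils.slugify("Paid Social / EU") %}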

Cross-warehouse behavior

dbt-utils uses adapter.dispatch() to generate warehouse-specific SQL under the hood. Most of the time this is transparent, but some macros compile differently depending on your adapter:

  • deduplicate: QUALIFY ROW_NUMBER() on Snowflake/BigQuery; ROW_NUMBER() subquery on Postgres/Redshift
  • safe_cast (now dbt.safe_cast): TRY_CAST / SAFE_CAST on Snowflake/BigQuery; regular CAST on Postgres/Redshift
  • listagg (now dbt.listagg): LISTAGG on Snowflake/BigQuery; STRING_AGG on Postgres/Redshift
  • bool_or (now dbt.bool_or): BOOL_OR / LOGICAL_OR on Snowflake/BigQuery; BOOL_OR on Postgres/Redshift
  • any_value (now dbt.any_value): ANY_VALUE on Snowflake/BigQuery; ANY() on Postgres

If you’re on a non-core adapter like Spark or Databricks, you’ll likely need to configure dispatch in your dbt_project.yml to map dbt_utils to your adapter’s shim package (like spark_utils). See my package ecosystem guide for details on dispatch configuration.
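
The dispatch config goes in dbt_project.yml and looks like this for Spark (swap in the shim package for your adapter):

dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']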

Quick reference: what’s where

If you’re unsure whether a macro lives in dbt-utils or dbt-core, here’s the rule of thumb:

Still in dbt-utils: SQL generators (date_spine, deduplicate, star, union_relations, generate_surrogate_key, pivot, unpivot), introspective macros (get_column_values, get_relations_by_pattern), generic tests, web macros, and Jinja helpers.

Moved to dbt-core (dbt.*): All cross-database type casting, date/time functions, string functions, aggregation functions, and set operations.

Removed entirely: surrogate_key(), unique_where, not_null_where, insert_by_period, and the condition argument on expression_is_true.

When in doubt, check the dbt-utils changelog on GitHub for the v1.0.0 migration guide.