dbt-utils is the most depended-on package in the dbt package ecosystem. Nearly every dbt project installs it. But most teams reach for generate_surrogate_key, maybe union_relations, and stop there. The package has 50+ macros and generic tests that cover SQL generation, data introspection, quality validation, and cross-warehouse compatibility.
This is a complete reference for dbt-utils v1.3.3 (the current release, compatible with dbt Core 1.x and Fusion 2.x). If you’re looking for a curated selection of the most useful ones, see my guide to essential dbt macros.
The v1.0 migration: what moved to dbt-core
The biggest change in dbt-utils history happened with v1.0. All cross-database macros were removed from dbt-utils and moved to the dbt namespace in dbt-core. If you’re still calling dbt_utils.datediff(), you’re using a path that no longer exists in current versions. Replace it with dbt.datediff().
Here’s the full list of migrated macros (use dbt.* instead of dbt_utils.*):
Date and time: dateadd, datediff, date_trunc, last_day, current_timestamp
String: concat, length, position, replace, right, split_part, string_literal, escape_single_quotes
Type casting: safe_cast, cast (dbt v1.8+), type_bigint, type_float, type_int, type_numeric, type_string, type_timestamp
Aggregation: any_value, bool_or, listagg
Set operations: except, intersect
Other: hash, cast_bool_to_text, array_append, array_concat, array_construct (dbt-core 1.3+), date (dbt v1.8+)
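The swap is purely a namespace change; arguments stay the same. A sketch using dbt.datediff (the model and column names here are illustrative, not from any real project):

```sql
-- Old, removed path: {{ dbt_utils.datediff(...) }}
-- Current path: same arguments, dbt namespace
SELECT
    order__id,
    {{ dbt.datediff('order__placed_at', 'order__shipped_at', 'day') }} AS fulfillment__days
FROM {{ ref('base__shopify__orders') }}
```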
These macros also no longer exist in dbt-utils at all:
- surrogate_key(): replaced by generate_surrogate_key()
- unique_where and not_null_where tests: replaced by the native where config on built-in tests
- The insert_by_period materialization: moved to the experimental-features repo
- The condition argument on expression_is_true: replaced by the native where config
SQL generation helpers
These macros generate SQL fragments or complete queries. They’re the most commonly used part of dbt-utils.
date_spine
Generates a complete date or time series. Useful for filling gaps in time-series data.
```sql
SELECT *
FROM (
    {{ dbt_utils.date_spine(
        datepart="day",
        start_date="CAST('2024-01-01' AS DATE)",
        end_date="CAST('2025-01-01' AS DATE)"
    ) }}
)
```

Accepts datepart values: day, week, month, year, hour, minute.
Gotcha: the end date is exclusive. The series above stops at 2024-12-31, not 2025-01-01.
deduplicate
Removes duplicate rows by keeping one row per partition, ordered by your criteria:
```sql
SELECT *
FROM (
    {{ dbt_utils.deduplicate(
        relation=ref('base__crm__contacts'),
        partition_by='contact__id',
        order_by='contact__updated_at DESC'
    ) }}
)
```

Breaking changes from v1.0: group_by was renamed to partition_by, relation_alias was removed, and order_by is now mandatory (pass 1 if you don’t care about ordering). On Snowflake and BigQuery, this compiles to a QUALIFY clause. On other warehouses, it uses a ROW_NUMBER subquery.
star
Selects all columns from a relation with optional exclusions:
```sql
SELECT
    {{ dbt_utils.star(
        from=ref('base__shopify__orders'),
        except=["_fivetran_synced", "_fivetran_deleted"]
    ) }}
FROM {{ ref('base__shopify__orders') }}
```

Supports relation_alias, prefix, suffix, and quote_identifiers (default True). Runs introspection at compile time, so the relation must already be materialized.
union_relations
Unions multiple relations, automatically handling columns that exist in one but not another (fills missing columns with NULL):
```sql
{{ dbt_utils.union_relations(
    relations=[ref('base__stripe__charges_us'), ref('base__stripe__charges_eu')],
    exclude=["_loaded_at"],
    source_column_name='_dbt_source_relation'
) }}
```

The include and exclude parameters are mutually exclusive. Pair this with get_relations_by_pattern for dynamic table discovery (useful for sharded tables).
generate_surrogate_key
Creates a hash-based surrogate key from multiple columns:
```sql
SELECT
    {{ dbt_utils.generate_surrogate_key(['order__id', 'order__line_number']) }} AS order_line__surrogate_key,
    order__id,
    order__line_number,
    product__id,
    line__quantity,
    line__price
FROM {{ ref('base__shopify__order_lines') }}
```

Critical gotcha: if you’re migrating from the old surrogate_key() macro to generate_surrogate_key(), the hash values change for rows containing nulls. This breaks incremental models and snapshots that used the old key. To maintain backward compatibility, add this to your dbt_project.yml:
```yaml
vars:
  surrogate_key_treat_nulls_as_empty_strings: true
```

pivot and unpivot
pivot creates pivot tables using CASE expressions:
```sql
SELECT
    order__date,
    {{ dbt_utils.pivot(
        column='order__status',
        values=dbt_utils.get_column_values(ref('base__shopify__orders'), 'order__status'),
        agg='COUNT',
        then_value=1,
        else_value=0,
        prefix='status_'
    ) }}
FROM {{ ref('base__shopify__orders') }}
GROUP BY 1
```

unpivot reverses the operation. Note: the table argument was renamed to relation in v1.0.
```sql
{{ dbt_utils.unpivot(
    relation=ref('mrt__sales__quarterly_revenue'),
    cast_to='FLOAT64',
    exclude=['region__name'],
    field_name='quarter',
    value_name='revenue'
) }}
```

Other SQL generators
| Macro | What it does | Example |
|---|---|---|
group_by(n) | Outputs GROUP BY 1, 2, ..., n | {{ dbt_utils.group_by(3) }} |
generate_series(upper_bound) | Creates a 1-indexed number series | {{ dbt_utils.generate_series(100) }} |
safe_add(values) | Null-safe addition (list argument required since v1.0) | {{ dbt_utils.safe_add(['col_a', 'col_b']) }} |
safe_divide(numerator, denominator) | Returns NULL on division by zero | {{ dbt_utils.safe_divide('revenue', 'sessions') }} |
safe_subtract(values) | Null-safe subtraction | {{ dbt_utils.safe_subtract(['col_a', 'col_b']) }} |
haversine_distance(lat1, lon1, lat2, lon2, unit) | Distance between two coordinates | unit defaults to 'mi', use 'km' for kilometers |
width_bucket(expr, min, max, buckets) | Assigns values to equal-width histogram buckets | Useful for distribution analysis |
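A sketch combining two of these helpers in one query (the model and column names are hypothetical):

```sql
SELECT
    channel__name,
    session__date,
    -- safe_divide returns NULL instead of erroring when sessions is zero
    {{ dbt_utils.safe_divide('SUM(session__revenue)', 'SUM(session__count)') }} AS session__revenue_per_session
FROM {{ ref('mrt__marketing__daily_sessions') }}
-- compiles to: GROUP BY 1, 2
{{ dbt_utils.group_by(2) }}
```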
Introspective macros
These macros query your database at compile time to inform your SQL. They’re powerful but come with a catch: they run queries before your models execute, so the target relations need to already exist.
get_column_values
Retrieves distinct values from a column. Commonly paired with pivot:
```sql
{% set status_values = dbt_utils.get_column_values(
    table=ref('base__shopify__orders'),
    column='order__status',
    order_by='COUNT(*) DESC',
    max_records=50,
    default=['pending', 'completed', 'cancelled']
) %}
```

Gotcha: since this runs at compile time, it fails if the model doesn’t exist yet (first run, new environment). Always provide a default list as a fallback.
get_filtered_columns_in_relation
Returns column names from a relation, with optional exclusions. Only works on materialized relations, not CTEs. Returns an empty list in parsing mode (fixed in v1.3.2).
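A common pattern is looping over the returned column list to apply the same aggregation everywhere. A sketch, assuming a base__shopify__orders model with these columns:

```sql
{% set amount_columns = dbt_utils.get_filtered_columns_in_relation(
    from=ref('base__shopify__orders'),
    except=['order__id', '_fivetran_synced']
) %}

SELECT
    order__id,
    {% for column in amount_columns %}
    SUM({{ column }}) AS {{ column }}_total{{ "," if not loop.last }}
    {% endfor %}
FROM {{ ref('base__shopify__orders') }}
GROUP BY 1
```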
get_relations_by_pattern
Discovers relations matching SQL wildcards:
```sql
{% set sharded_tables = dbt_utils.get_relations_by_pattern(
    'analytics_%',
    'events_%',
    exclude='%deprecated'
) %}

{{ dbt_utils.union_relations(relations=sharded_tables) }}
```

This is the standard approach for consolidating sharded tables (common with BigQuery date-sharded exports).
get_query_results_as_dict and get_single_value
get_query_results_as_dict runs arbitrary SQL and returns results as a dictionary. get_single_value (new in v1.0) returns a single scalar value. Both run at compile time.
```sql
{% set row_count = dbt_utils.get_single_value(
    "SELECT COUNT(*) FROM " ~ ref('base__shopify__orders')
) %}
```

Use these sparingly. Every introspective macro adds a query to your compile step, which slows down dbt compile and dbt run.
Generic tests
dbt-utils ships with generic tests that go well beyond the built-in unique, not_null, accepted_values, and relationships. For Fusion (dbt 2.0) and dbt Core 1.10.6+, test arguments must be nested under an arguments: key in YAML.
Key tests
unique_combination_of_columns tests composite uniqueness:
```yaml
models:
  - name: mrt__finance__daily_revenue
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - revenue__date
            - revenue__currency
```

accepted_range validates that values fall within bounds:
```yaml
columns:
  - name: order__total_amount
    data_tests:
      - dbt_utils.accepted_range:
          min_value: 0
          inclusive: true
```

expression_is_true evaluates any SQL expression per row:
```yaml
models:
  - name: mrt__finance__invoices
    data_tests:
      - dbt_utils.expression_is_true:
          expression: "invoice__total_amount >= invoice__tax_amount"
          where: "invoice__status != 'voided'"
```

Note: the old condition argument was removed in v1.0. Use the native where config instead.
recency checks that data is fresh:
```yaml
models:
  - name: base__stripe__charges
    data_tests:
      - dbt_utils.recency:
          datepart: hour
          field: charge__created_at
          interval: 24
```

Complete test reference
| Test | Purpose |
|---|---|
unique_combination_of_columns | Composite uniqueness |
accepted_range | Value within min/max bounds |
expression_is_true | Arbitrary SQL expression per row |
recency | Data freshness by datepart/interval |
at_least_one | At least one non-null value exists |
not_constant | Values vary (not all the same) |
not_null_proportion | Threshold-based null check (e.g., at_least: 0.95) |
relationships_where | Filtered referential integrity |
mutually_exclusive_ranges | No overlapping ranges (both bounds must be NOT NULL) |
sequential_values | Continuity check (no gaps) |
equal_rowcount | Two relations have the same row count |
fewer_rows_than | Relation A has fewer rows than B |
equality | Full model comparison (with optional precision for numerics) |
not_accepted_values | Inverse of accepted_values |
not_empty_string | Non-empty strings (with trim_whitespace option) |
cardinality_equality | Same number of distinct values across two columns |
Several tests support a group_by_columns parameter for segmented validation: equal_rowcount, fewer_rows_than, recency, at_least_one, not_constant, sequential_values, and not_null_proportion.
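A sketch of segmented validation with not_null_proportion, checking the null threshold separately for each currency rather than across the whole table (model and column names are hypothetical):

```yaml
columns:
  - name: revenue__amount
    data_tests:
      - dbt_utils.not_null_proportion:
          at_least: 0.95
          group_by_columns: ['revenue__currency']
```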
For a broader look at how these tests fit into a testing strategy, I covered that in a separate article.
Web macros and Jinja helpers
dbt-utils includes three URL extraction macros that are surprisingly useful for marketing analytics:
- get_url_parameter(field, url_parameter) extracts query parameters from URLs
- get_url_host(field) extracts the hostname (handles http, https, and android-app protocols)
- get_url_path(field) extracts the page path
```sql
SELECT
    {{ dbt_utils.get_url_host('page__url') }} AS page__host,
    {{ dbt_utils.get_url_path('page__url') }} AS page__path,
    {{ dbt_utils.get_url_parameter('page__url', 'utm_source') }} AS session__utm_source
FROM {{ ref('base__ga4__events') }}
```

For Jinja development and debugging, log_info(message) prints to the console during compilation, pretty_time(format) returns a formatted timestamp, and slugify(string) converts text to URL-safe slugs.
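A quick sketch of the debugging helpers together, runnable in any model or macro (the campaign names are made up for illustration):

```sql
{# Print slugified versions of a few strings during compilation #}
{% set campaign_names = ['Summer Sale 2024', 'Black Friday Promo'] %}
{% for name in campaign_names %}
    {{ dbt_utils.log_info(name ~ " -> " ~ dbt_utils.slugify(name)) }}
{% endfor %}
```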
Cross-warehouse behavior
dbt-utils uses adapter.dispatch() to generate warehouse-specific SQL under the hood. Most of the time this is transparent, but some macros compile differently depending on your adapter:
| Macro | Snowflake/BigQuery | Postgres/Redshift |
|---|---|---|
deduplicate | QUALIFY ROW_NUMBER() | ROW_NUMBER() subquery |
safe_cast (now dbt.safe_cast) | TRY_CAST / SAFE_CAST | Regular CAST |
listagg (now dbt.listagg) | LISTAGG | STRING_AGG |
bool_or (now dbt.bool_or) | BOOL_OR / LOGICAL_OR | BOOL_OR |
any_value (now dbt.any_value) | ANY_VALUE | MIN() fallback (Postgres) |
If you’re on a non-core adapter like Spark or Databricks, you’ll likely need to configure dispatch in your dbt_project.yml to map dbt_utils to your adapter’s shim package (like spark_utils). See my package ecosystem guide for details on dispatch configuration.
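The dispatch config lives in dbt_project.yml; listing the shim package first tells dbt to look there before falling back to the default dbt-utils implementation:

```yaml
dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']
```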
Quick reference: what’s where
If you’re unsure whether a macro lives in dbt-utils or dbt-core, here’s the rule of thumb:
Still in dbt-utils: SQL generators (date_spine, deduplicate, star, union_relations, generate_surrogate_key, pivot, unpivot), introspective macros (get_column_values, get_relations_by_pattern), generic tests, web macros, and Jinja helpers.
Moved to dbt-core (dbt.*): All cross-database type casting, date/time functions, string functions, aggregation functions, and set operations.
Removed entirely: surrogate_key(), unique_where, not_null_where, insert_by_period, and the condition argument on expression_is_true.
When in doubt, check the dbt-utils changelog on GitHub for the v1.0.0 migration guide.