SQL generators are the most commonly used part of dbt-utils. They don't query your data; they generate SQL fragments or complete queries at compile time, based on parameters you pass them. Unlike introspective macros, most of them have no compile-time dependency on existing relations (star, which must introspect a column list, is the exception covered below).
This note covers the generators that reshape data: date_spine, deduplicate, star, union_relations, pivot, and unpivot. For generate_surrogate_key, see the dedicated note — its null handling gotcha is significant enough to warrant separate treatment.
## date_spine
Generates a complete date or time series. The canonical use case: a time-series model that needs to show zeros for days with no activity, rather than simply omitting those days.
```sql
SELECT *
FROM (
    {{ dbt_utils.date_spine(
        datepart="day",
        start_date="CAST('2024-01-01' AS DATE)",
        end_date="CAST('2025-01-01' AS DATE)"
    ) }}
)
```

The accepted `datepart` values are: `day`, `week`, `month`, `year`, `hour`, `minute`.
Gotcha: the end date is exclusive. The series above stops at 2024-12-31, not 2025-01-01. If you want a full year including the last day, add one unit to your end date. This catches people every time.
The typical pattern is to join this against your fact table and use COALESCE to fill missing values with zero:
```sql
WITH spine AS (
    {{ dbt_utils.date_spine(
        datepart="day",
        start_date="CAST('2024-01-01' AS DATE)",
        end_date="CAST('2025-01-01' AS DATE)"
    ) }}
),

daily_orders AS (
    SELECT
        DATE(order__created_at) AS order__date,
        COUNT(*) AS order__count
    FROM {{ ref('base__shopify__orders') }}
    GROUP BY 1
)

SELECT
    spine.date_day,
    COALESCE(daily_orders.order__count, 0) AS order__count
FROM spine
LEFT JOIN daily_orders
    ON spine.date_day = daily_orders.order__date
```

For more complex date series (fiscal calendars, business days), consider the dbt-date package, which builds on top of `date_spine`.
## deduplicate
Removes duplicate rows by keeping one row per partition, ordered by your criteria:
```sql
SELECT *
FROM (
    {{ dbt_utils.deduplicate(
        relation=ref('base__crm__contacts'),
        partition_by='contact__id',
        order_by='contact__updated_at DESC'
    ) }}
)
```

This is the standard way to deduplicate raw source tables where the same record appears multiple times, often because of replication behavior. You keep the most recent version of each entity.
Breaking changes from v0.x to v1.0:
- `group_by` was renamed to `partition_by`
- `relation_alias` was removed entirely
- `order_by` is now mandatory; pass `1` if you genuinely don't care which row wins
On Snowflake and BigQuery, this compiles to a QUALIFY clause, which is the native, efficient way to express window-function-based filtering on those platforms:
```sql
SELECT *
FROM base__crm__contacts
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY contact__id
    ORDER BY contact__updated_at DESC
) = 1
```

On Postgres and Redshift (no QUALIFY), it compiles to a ROW_NUMBER() subquery. The semantics are identical; only the SQL shape differs.
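For comparison, the subquery form looks roughly like this (a simplified sketch of the shape, not the macro's exact compiled output):

```sql
SELECT *
FROM (
    SELECT
        base__crm__contacts.*,
        ROW_NUMBER() OVER (
            PARTITION BY contact__id
            ORDER BY contact__updated_at DESC
        ) AS row_num
    FROM base__crm__contacts
) AS ranked
WHERE row_num = 1
```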
Use deduplicate in base-layer models where you’re bringing in a raw snapshot table and need to establish one row per grain. Don’t use it to paper over a model that’s producing duplicates — find and fix the source of the duplication instead.
## star
Selects all columns from a relation with optional exclusions:
```sql
SELECT
    {{ dbt_utils.star(
        from=ref('base__shopify__orders'),
        except=["_fivetran_synced", "_fivetran_deleted"]
    ) }}
FROM {{ ref('base__shopify__orders') }}
```

This is invaluable for base-layer models where you want to pass through all upstream columns while dropping metadata or staging columns you don't need downstream.
Additional parameters:
- `relation_alias`: adds a table alias prefix to each column (e.g., `orders.order__id`)
- `prefix`: prepends a string to every column name
- `suffix`: appends a string to every column name
- `quote_identifiers`: defaults to `True`; set `False` if your warehouse is case-insensitive and you don't want quoted names
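For instance, combining `relation_alias` and `prefix` might look like the following sketch (note the alias has to match the alias you give the relation in the FROM clause):

```sql
-- Each passed-through column compiles to something like
-- orders.order__id AS shopify_order__id
SELECT
    {{ dbt_utils.star(
        from=ref('base__shopify__orders'),
        relation_alias='orders',
        prefix='shopify_'
    ) }}
FROM {{ ref('base__shopify__orders') }} AS orders
```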
Critical constraint: the relation must already be materialized. star runs introspection at compile time to discover the column list. If you’re running a model for the first time in a fresh environment, the upstream relation doesn’t exist yet, and the macro will fail. This is the same constraint as all introspective macros.
In practice, this means star works well in base-layer models that sit directly on top of source tables (which always exist before dbt runs), but can be fragile in intermediate models that depend on other dbt models that might not have been built yet.
## union_relations
Unions multiple relations, automatically handling columns that exist in one but not another by filling missing columns with NULL:
```sql
{{ dbt_utils.union_relations(
    relations=[
        ref('base__stripe__charges_us'),
        ref('base__stripe__charges_eu')
    ],
    exclude=["_loaded_at"],
    source_column_name='_dbt_source_relation'
) }}
```

The `source_column_name` parameter adds a column indicating which relation each row came from, which is essential for tracing records back to their source after the union.
A few details worth knowing:
- `include` and `exclude` are mutually exclusive. You can list the columns to keep, or the columns to drop, but not both.
- The macro handles column ordering: it builds a consistent column list across all relations and emits `NULL AS column_name` for any column that doesn't exist in a given relation (sketched below).
- The output is a full `UNION ALL`, not `UNION DISTINCT`. Deduplication is your responsibility downstream.
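To make the NULL filling concrete, here is a simplified sketch of the compiled shape, assuming a hypothetical `us_tax_code` column that exists only in the US relation (the real output also casts each column to a reconciled type):

```sql
SELECT
    'base__stripe__charges_us' AS _dbt_source_relation,
    charge__id,
    us_tax_code
FROM base__stripe__charges_us

UNION ALL

SELECT
    'base__stripe__charges_eu' AS _dbt_source_relation,
    charge__id,
    NULL AS us_tax_code
FROM base__stripe__charges_eu
```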
The most powerful use of union_relations is pairing it with get_relations_by_pattern for dynamic table discovery. This is the standard approach for consolidating BigQuery date-sharded tables:
```sql
{% set sharded_tables = dbt_utils.get_relations_by_pattern(
    'raw_data_%',
    'events_%',
    exclude='%deprecated'
) %}

{{ dbt_utils.union_relations(relations=sharded_tables) }}
```

This discovers all tables matching the pattern at compile time and unions them into a single model. As new shards are created, the model picks them up automatically on the next run. See dbt-utils Introspective Macros for the `get_relations_by_pattern` details.
## pivot and unpivot
pivot creates pivot tables using CASE expressions, turning a categorical column’s values into separate columns. The typical use: counting or summing a metric by category, where each category becomes its own column.
```sql
SELECT
    order__date,
    {{ dbt_utils.pivot(
        column='order__status',
        values=dbt_utils.get_column_values(ref('base__shopify__orders'), 'order__status'),
        agg='sum',
        then_value=1,
        else_value=0,
        prefix='status_'
    ) }}
FROM {{ ref('base__shopify__orders') }}
GROUP BY 1
```

Summing 1/0 values gives a per-status count; using `COUNT` as the aggregate would count every non-null CASE result, zeros included, and return the same total for every column. The `values` parameter accepts either a hardcoded list or the output of `get_column_values`. Using `get_column_values` makes the pivot dynamic: it adapts when new status values appear in the data. The trade-off is the compile-time dependency on an existing relation.
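Roughly, the pivot above compiles to one CASE expression per status value. A sketch, assuming the distinct statuses are `paid` and `refunded`:

```sql
SELECT
    order__date,
    SUM(CASE WHEN order__status = 'paid' THEN 1 ELSE 0 END) AS status_paid,
    SUM(CASE WHEN order__status = 'refunded' THEN 1 ELSE 0 END) AS status_refunded
FROM base__shopify__orders
GROUP BY 1
```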
unpivot does the reverse: it takes wide data with many columns and converts it to long format, with one column holding the original column name and another holding its value:
```sql
{{ dbt_utils.unpivot(
    relation=ref('mrt__sales__quarterly_revenue'),
    cast_to='FLOAT64',
    exclude=['region__name'],
    field_name='quarter',
    value_name='revenue'
) }}
```

Migration note from v0.x: the `table` argument was renamed to `relation` in v1.0. If you have existing `unpivot` calls with `table=`, update them.
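To visualize the reshape, assume the relation has columns `region__name`, `q1_revenue`, and `q2_revenue` (hypothetical names). Every column not listed in `exclude` becomes a (`quarter`, `revenue`) pair:

```
region__name | q1_revenue | q2_revenue       region__name | quarter    | revenue
EMEA         | 100.0      | 120.0       -->  EMEA         | q1_revenue | 100.0
                                             EMEA         | q2_revenue | 120.0
```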
## The Smaller Helpers
These are genuinely useful but come up less often:
| Macro | What it does | Example |
|---|---|---|
| `group_by(n)` | Outputs `GROUP BY 1, 2, ..., n` | `{{ dbt_utils.group_by(3) }}` |
| `generate_series(upper_bound)` | Creates a 1-indexed number series | `{{ dbt_utils.generate_series(100) }}` |
| `safe_add(values)` | Null-safe addition (list argument required since v1.0) | `{{ dbt_utils.safe_add(['col_a', 'col_b']) }}` |
| `safe_divide(numerator, denominator)` | Returns NULL on division by zero | `{{ dbt_utils.safe_divide('revenue', 'sessions') }}` |
| `safe_subtract(values)` | Null-safe subtraction | `{{ dbt_utils.safe_subtract(['col_a', 'col_b']) }}` |
| `haversine_distance(lat1, lon1, lat2, lon2, unit)` | Distance between two GPS coordinates | `unit` defaults to `'mi'`; use `'km'` for kilometers |
| `width_bucket(expr, min, max, buckets)` | Assigns values to equal-width histogram buckets | Useful for distribution analysis |
`safe_add` and `safe_subtract` deserve a note: since v1.0 they take a list argument. The old two-argument form no longer works. Use `{{ dbt_utils.safe_add(['col_a', 'col_b', 'col_c']) }}` instead of `{{ dbt_utils.safe_add('col_a', 'col_b') }}`.
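Under the hood, `safe_add` wraps each operand in COALESCE, so the compiled expression looks roughly like this (the exact shape can vary by adapter):

```sql
-- {{ dbt_utils.safe_add(['col_a', 'col_b', 'col_c']) }} compiles to roughly:
COALESCE(col_a, 0) + COALESCE(col_b, 0) + COALESCE(col_c, 0)
```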
`haversine_distance` is niche, but analytics work usually has a moment where you need to measure geographic distance: retail catchment areas, delivery radius analysis, proximity matching. This macro saves writing the formula from scratch.
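A usage sketch, with hypothetical model and column names:

```sql
SELECT
    store__id,
    customer__id,
    {{ dbt_utils.haversine_distance(
        'store__latitude', 'store__longitude',
        'customer__latitude', 'customer__longitude',
        unit='km'
    ) }} AS distance__km
FROM {{ ref('int__stores_x_customers') }}
```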
## Cross-Warehouse Compilation
Most SQL generators compile identically across adapters. The exception is deduplicate, which uses QUALIFY on Snowflake and BigQuery but a ROW_NUMBER() subquery on Postgres and Redshift. This is handled automatically by dbt's dispatch mechanism; you don't need to configure anything unless you're on a non-standard adapter like Databricks, in which case you'll need spark_utils in your dispatch configuration. See dbt Dispatch Configuration for setup details.
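For reference, the spark_utils shim is wired up in `dbt_project.yml` with a dispatch entry like this (following the spark_utils documentation):

```yaml
# Resolve dbt_utils macros from spark_utils first, falling back to dbt_utils
dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']
```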