Note

dbt-utils Introspective Macros

How dbt-utils compile-time introspection macros work — get_column_values, get_filtered_columns_in_relation, get_relations_by_pattern, get_query_results_as_dict, and get_single_value — and when they cause problems.

Planted
Tags: dbt · data engineering · data modeling

Introspective macros query your database at compile time to inform your SQL. They’re more powerful than static SQL generators — they let your models adapt to the actual state of your data. But that power comes with a constraint: the target relations must already exist when you run dbt compile or dbt run.

This is the fundamental trade-off with all introspective macros. Plan around it.

How Compile-Time Introspection Works

When dbt runs, it has two phases:

  1. Compile phase — Jinja templates are evaluated, macros run, and SQL is generated. This is when introspective macros execute their database queries.
  2. Execute phase — The compiled SQL is sent to the warehouse and run.

Introspective macros execute database queries during phase 1, before any models have been built in the current run. That means:

  • Source tables are fine (they always exist)
  • Upstream dbt models that were built in previous runs are fine
  • Upstream dbt models that haven’t been built yet will cause the macro to fail

This is not a bug — it’s the intentional constraint of compile-time introspection. Any macro that queries the database during execute rather than compile time is a different beast (that’s what run_query() in model SQL does).

get_column_values

Retrieves distinct values from a column. The canonical use case is feeding a pivot macro with the current set of category values:

{% set status_values = dbt_utils.get_column_values(
    table=ref('base__shopify__orders'),
    column='order__status',
    order_by='COUNT(*) DESC',
    max_records=50,
    default=['pending', 'completed', 'cancelled']
) %}

Parameters worth knowing:

  • order_by — controls the order of returned values, which determines column order in a pivot
  • max_records — caps the number of distinct values returned
  • default — the fallback list if the query fails (see below)

Always provide a default list. The introspection query runs against the warehouse at compile time. If the model doesn’t exist yet — first run in a new environment, new model, CI with a fresh schema — the query fails and dbt errors out. The default list prevents this by giving the macro something to return when the relation isn’t there.

-- This will break on first run in a fresh environment
{% set statuses = dbt_utils.get_column_values(
    table=ref('base__shopify__orders'),
    column='order__status'
) %}

-- This is safe
{% set statuses = dbt_utils.get_column_values(
    table=ref('base__shopify__orders'),
    column='order__status',
    default=['pending', 'confirmed', 'shipped', 'delivered', 'cancelled']
) %}

The default list is also what runs when dbt parses the project (the parse phase), since the execute flag is false during parsing. So the default list determines what SQL gets compiled during dbt parse, and the live query determines what SQL gets compiled during dbt run.
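Putting this together, the pivot use case mentioned above might look like the sketch below. The model and column names come from the earlier examples; customer_id is an assumed column, and dbt_utils.pivot is used with its defaults (a sum of 1/0 flags, i.e. a count per value):

```sql
{% set status_values = dbt_utils.get_column_values(
    table=ref('base__shopify__orders'),
    column='order__status',
    default=['pending', 'completed', 'cancelled']
) %}

select
    customer_id,
    -- one output column per status value; column order follows status_values
    {{ dbt_utils.pivot('order__status', status_values) }}
from {{ ref('base__shopify__orders') }}
group by customer_id
```

During dbt parse the three default values determine the pivoted columns; during dbt run the live query does.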

get_filtered_columns_in_relation

Returns column names from a relation, with optional exclusions. Like star, it’s useful for building dynamic column lists:

{% set columns = dbt_utils.get_filtered_columns_in_relation(
    from=ref('base__shopify__orders'),
    except=['_fivetran_synced', '_loaded_at']
) %}

Constraints:

  • Only works on materialized relations, not CTEs
  • Returns an empty list during parse mode (this was a bug, fixed in v1.3.2)

In practice, star in the SELECT clause handles most cases where you want to select all columns except a few. get_filtered_columns_in_relation is useful when you need the column list as a Jinja variable to do something else with it — passing to another macro, generating custom SQL per column, etc.
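As a sketch of the "do something else with the list" case, here is a hypothetical data-quality model that generates a null count per column — something star alone can't express:

```sql
{% set columns = dbt_utils.get_filtered_columns_in_relation(
    from=ref('base__shopify__orders'),
    except=['_fivetran_synced', '_loaded_at']
) %}

select
    {% for col in columns %}
    sum(case when {{ col }} is null then 1 else 0 end) as {{ col }}__null_count
    {%- if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('base__shopify__orders') }}
```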

get_relations_by_pattern

Discovers relations matching a SQL wildcard pattern. This is the standard approach for consolidating date-sharded tables, which are common with BigQuery GA4 exports and other Google products:

{% set sharded_tables = dbt_utils.get_relations_by_pattern(
    schema_pattern='analytics_%',
    table_pattern='events_%',
    exclude='%deprecated'
) %}

{{ dbt_utils.union_relations(relations=sharded_tables) }}

The pattern arguments use SQL wildcard syntax: % matches any sequence of characters. The exclude parameter applies as a second filter after the initial pattern match.

The returned value is a list of relation objects, which you can pass directly to union_relations or iterate over in Jinja.

When this is worth the complexity: When you have genuinely dynamic table names that grow over time. BigQuery date-sharded exports (events_20240101, events_20240102, …) are the prime example — new shards appear daily, and a pattern-based approach handles them automatically. If you have a fixed set of tables with predictable names, just hardcode them in a union_relations call.

The discovery query runs at compile time, so the tables need to exist when you run dbt. This is usually fine for sharded tables since they’re populated externally.
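Because the return value is a list of full relation objects, you can also iterate over them directly instead of passing them to union_relations. A sketch — per-shard row counts, using the standard Relation attribute identifier:

```sql
{% set sharded_tables = dbt_utils.get_relations_by_pattern(
    schema_pattern='analytics_%',
    table_pattern='events_%'
) %}

{% for rel in sharded_tables %}
select '{{ rel.identifier }}' as source_table, count(*) as row_count
from {{ rel }}
{% if not loop.last %}union all{% endif %}
{% endfor %}
```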

get_query_results_as_dict and get_single_value

These run arbitrary SQL and return the results as Jinja data structures.

get_single_value (added in v1.0) returns a single scalar value. Useful for dynamic thresholds:

{% set max_date = dbt_utils.get_single_value(
    "SELECT MAX(order__created_at) FROM " ~ ref('base__shopify__orders')
) %}
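get_single_value also accepts a default argument, which plays the same first-run safety role as in get_column_values. A sketch of an incremental-style filter built on it (the quoting of the date literal is an assumption about the warehouse's SQL dialect):

```sql
{% set max_date = dbt_utils.get_single_value(
    "select max(order__created_at) from " ~ ref('base__shopify__orders'),
    default='1970-01-01'
) %}

select *
from {{ ref('base__shopify__orders') }}
where order__created_at > '{{ max_date }}'
```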

get_query_results_as_dict runs arbitrary SQL and returns the full result set as a dictionary: each key is a column name, and each value is the list of that column's values. (Key casing follows whatever column names the warehouse returns, so it may differ between adapters.)

{% set exchange_rates = dbt_utils.get_query_results_as_dict(
    "SELECT currency__code, exchange__rate FROM " ~ ref('base__finance__exchange_rates')
) %}

Both macros run their queries at compile time. The same first-run constraint applies: the referenced relations must already exist.
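A sketch of consuming the dictionary — building a CASE expression from two parallel column lists. The key names and their casing depend on what the warehouse returns; order_id, amount, and currency_code are assumed columns on the orders model:

```sql
{% set rates = dbt_utils.get_query_results_as_dict(
    "select currency__code, exchange__rate from " ~ ref('base__finance__exchange_rates')
) %}

select
    order_id,
    case currency_code
    {% for i in range(rates['currency__code'] | length) %}
        when '{{ rates['currency__code'][i] }}' then amount * {{ rates['exchange__rate'][i] }}
    {% endfor %}
    end as amount_usd
from {{ ref('base__shopify__orders') }}
```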

Use these sparingly. Every introspective macro adds a query to your compile step. A project with ten introspective macros runs ten queries before any model builds. On a slow warehouse or with cold connections, this materially slows down dbt compile and dbt run. The query count also makes CI pipelines harder to reason about.

The performance cost is worth paying for dynamic schema discovery (get_relations_by_pattern) and pivot configurations (get_column_values). It’s usually not worth paying for things you could just hardcode or compute in the model’s own SQL.

The Execute Guard

If you’re writing custom introspective macros that call run_query(), always include the execute guard:

{% macro my_introspective_macro(table, column) %}
    {% if not execute %}
        {{ return([]) }}
    {% endif %}

    {% set query %}
        SELECT DISTINCT {{ column }} FROM {{ table }}
    {% endset %}

    {% set results = run_query(query) %}
    {{ return(results.columns[0].values()) }}
{% endmacro %}

The dbt-utils macros handle this internally. But if you’re extending or wrapping them, you need to understand why the guard exists: during dbt’s parse phase, execute is False. Any run_query() call without this guard will attempt to run against the database during parsing, when table references might not be fully resolved yet. The guard makes the macro return a safe empty value during parse and run the real query during execution.
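For completeness, a sketch of the custom macro in use. During parsing the guard returns an empty list, so the loop emits nothing — the parse-phase SQL is incomplete, which is fine because it is never executed. (Status values containing spaces or special characters would need sanitizing before use as column aliases.)

```sql
{% set statuses = my_introspective_macro(ref('base__shopify__orders'), 'order__status') %}

select
    {% for s in statuses %}
    count(case when order__status = '{{ s }}' then 1 end) as {{ s }}_count
    {%- if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('base__shopify__orders') }}
```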

This pattern is covered in more depth in dbt Macros.