
dbt Cross-Database Array Operations

How array syntax diverges across BigQuery, Snowflake, and Databricks — UNNEST vs LATERAL FLATTEN vs EXPLODE — and dispatch macros to handle it.


Array operations differ more fundamentally across warehouses than date functions or type casting. Date and casting differences mostly come down to function names and argument order; array operations use entirely different syntax and keywords, with no syntactic common ground across platforms.

dbt provides no built-in cross-database macros for arrays. The dispatch pattern is the standard approach.

The Syntax Landscape

| Operation | BigQuery | Snowflake | Databricks |
|-----------|----------|-----------|------------|
| Flatten | UNNEST(array) | LATERAL FLATTEN(input => array) | EXPLODE(array) |
| Create | [1, 2, 3] | ARRAY_CONSTRUCT(1, 2, 3) | ARRAY(1, 2, 3) |
| Contains | value IN UNNEST(array) | ARRAY_CONTAINS(value, array) | ARRAY_CONTAINS(array, value) |

Even ARRAY_CONTAINS can’t agree on argument order between Snowflake and Databricks. BigQuery sidesteps the function entirely with its IN UNNEST pattern.
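
If you need a portable containment check, the same dispatch pattern applies. Here's a minimal sketch — the macro name, the 'my_project' namespace, and the TO_VARIANT cast on Snowflake are illustrative choices, not a canonical implementation:

{% macro array_contains(value, array_expr) %}
    {{ return(adapter.dispatch('array_contains', 'my_project')(value, array_expr)) }}
{% endmacro %}

{% macro bigquery__array_contains(value, array_expr) %}
    {{ value }} IN UNNEST({{ array_expr }})
{% endmacro %}

{% macro snowflake__array_contains(value, array_expr) %}
    {# Snowflake expects the probe value as a VARIANT #}
    ARRAY_CONTAINS(TO_VARIANT({{ value }}), {{ array_expr }})
{% endmacro %}

{% macro databricks__array_contains(value, array_expr) %}
    {# note the flipped argument order relative to Snowflake #}
    ARRAY_CONTAINS({{ array_expr }}, {{ value }})
{% endmacro %}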

The flattening differences are the most impactful because they affect the FROM clause structure itself. BigQuery’s UNNEST joins implicitly with a comma. Snowflake uses LATERAL FLATTEN as a table function. Databricks uses LATERAL VIEW EXPLODE, a Hive-inherited syntax. These aren’t just different function names — they’re different grammatical structures in the query.

Dispatch Macro for Array Flattening

Here’s a dispatch macro that handles flattening arrays in a FROM clause:

{% macro flatten_array(table_alias, array_column, element_alias) %}
    {{ return(adapter.dispatch('flatten_array', 'my_project')(
        table_alias, array_column, element_alias
    )) }}
{% endmacro %}

{% macro bigquery__flatten_array(table_alias, array_column, element_alias) %}
    {{ table_alias }}, UNNEST({{ array_column }}) AS {{ element_alias }}
{% endmacro %}

{% macro snowflake__flatten_array(table_alias, array_column, element_alias) %}
    {{ table_alias }}, LATERAL FLATTEN(input => {{ array_column }}) AS {{ element_alias }}
{% endmacro %}

{% macro databricks__flatten_array(table_alias, array_column, element_alias) %}
    {{ table_alias }} LATERAL VIEW EXPLODE({{ array_column }}) AS {{ element_alias }}
{% endmacro %}

Notice that the second argument to adapter.dispatch() is a macro namespace, here the project name ('my_project'). dbt searches that namespace for the adapter-prefixed implementations, and it's what lets consumers redirect the search order to their own overrides, which matters especially if this macro lives in a package.
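
For instance, a project that installs this macro as a package could shadow individual implementations by putting itself first in the search order via the dispatch config in dbt_project.yml (project names here are illustrative):

# dbt_project.yml
dispatch:
  - macro_namespace: my_project
    search_order: ['my_consuming_project', 'my_project']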

Usage in a model:

SELECT
    e.event_id,
    param.value AS param_value
FROM {{ ref('base__app__events') }}
{{ flatten_array('e', 'e.event_params', 'param') }}

This compiles to UNNEST on BigQuery, LATERAL FLATTEN on Snowflake, or LATERAL VIEW EXPLODE on Databricks. Note that the macro emits the table alias itself (that's how it controls whether a comma follows), so the FROM line doesn't declare the alias separately. The model reads cleanly; the database-specific complexity is hidden behind the macro.
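
On Snowflake, for example, the model above compiles to something like this (the fully qualified relation name depends on your database and schema settings):

SELECT
    e.event_id,
    param.value AS param_value
FROM analytics.base.base__app__events e, LATERAL FLATTEN(input => e.event_params) AS param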

Why Arrays Are Common in Analytics Data

Array columns appear frequently in analytics engineering, particularly in event data. GA4’s BigQuery export stores event parameters as nested repeated fields that require UNNEST to query. Snowflake’s semi-structured data from JSON ingestion often contains arrays. Databricks processes arrays from Kafka streams and nested Parquet files.

If you’re building a dbt package that handles event data, array flattening is likely the first cross-database problem you’ll hit. It’s also the one where copy-pasting a Snowflake-specific pattern from Stack Overflow into a BigQuery project causes the most confusion, because the syntax is so fundamentally different that the error messages don’t help you understand what went wrong.

Nested Array Access

Flattening is just the first operation. Once you’ve flattened, you often need to access fields within the flattened elements. This diverges too:

On BigQuery, UNNEST produces a row per element, and you access struct fields directly:

SELECT
    event_name,
    param.key,
    param.value.string_value
FROM events, UNNEST(event_params) AS param

On Snowflake, LATERAL FLATTEN produces a row set with metadata columns (seq, key, path, index, this). The actual element lives in value:

SELECT
    event_name,
    f.value:key::STRING AS param_key,
    f.value:value:string_value::STRING AS param_value
FROM events, LATERAL FLATTEN(input => event_params) f

On Databricks, struct access after EXPLODE uses the same dot notation as BigQuery; what differs is the FROM clause, which uses the Hive-inherited LATERAL VIEW:

SELECT
    event_name,
    param.key,
    param.value.string_value
FROM events
LATERAL VIEW EXPLODE(event_params) AS param

The access patterns after flattening are different enough that a single dispatch macro for flattening won’t fully abstract the problem. For complex nested data, you may need dispatch macros for the field access patterns too, or design your base models to normalize the arrays into flat columns early in the DAG so downstream models don’t need to deal with array syntax at all.
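
As a sketch of what a field-access dispatch macro could look like — the name element_field is illustrative, it assumes the aliasing conventions of flatten_array above, and it only handles one level of nesting:

{% macro element_field(element_alias, field) %}
    {{ return(adapter.dispatch('element_field', 'my_project')(element_alias, field)) }}
{% endmacro %}

{% macro bigquery__element_field(element_alias, field) %}
    {{ element_alias }}.{{ field }}
{% endmacro %}

{% macro snowflake__element_field(element_alias, field) %}
    {# the flatten alias exposes the element as a VARIANT in its value column #}
    {{ element_alias }}.value:{{ field }}
{% endmacro %}

{% macro databricks__element_field(element_alias, field) %}
    {{ element_alias }}.{{ field }}
{% endmacro %}

On Snowflake the result is still a VARIANT that callers need to cast, which is part of why normalizing arrays early in the DAG is often the simpler route.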

The Pragmatic Approach

For most teams, the pragmatic strategy is:

  1. Flatten arrays in the base layer of your three-layer architecture. One model per source that handles all the array operations.
  2. Use dispatch macros in those base models so they’re portable.
  3. Keep intermediate and mart layers array-free. Downstream models work with flat, relational data that’s identical across databases.

This concentrates the cross-database complexity in a small number of models rather than spreading it throughout the project. When you add support for a new database, you only need to update the base layer dispatch implementations.
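
To make step 1 concrete, here's a minimal sketch of such a base model, assuming the flatten_array and element_field macros above and an illustrative raw source:

-- models/base/base__app__events.sql (illustrative)
SELECT
    e.event_id,
    e.event_name,
    {{ element_field('param', 'key') }} AS param_key
FROM {{ ref('raw__app__events') }}
{{ flatten_array('e', 'e.event_params', 'param') }}

Downstream models then select param_key as an ordinary flat column and never see the array syntax.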