The SQL patterns for unnesting GA4 data translate directly into dbt models. A well-structured dbt project separates GA4 concerns into base, intermediate, and mart layers, with each layer handling a specific responsibility. The base layer unnests. The intermediate layer filters and reshapes. The mart layer aggregates.
This separation matters because GA4 unnesting is expensive and repetitive. If every mart model independently reaches into event_params with correlated subqueries, you’re paying the nested scan cost for every downstream table. Concentrate the unnesting in one place, materialize it, and let everything downstream work with flat columns.
Base Layer: The Unnesting Model
The base model does two things: converts sharded tables to a partitioned table, and extracts commonly used parameters from the nested arrays.
-- models/base/ga4/base__ga4_bigquery__events.sql{{ config( materialized='incremental', partition_by={'field': 'event_date', 'data_type': 'date'}, incremental_strategy='insert_overwrite', cluster_by=['event_name'] )}}
WITH source AS ( SELECT event_date, event_timestamp, event_name, event_params, user_pseudo_id, user_properties, traffic_source, device, geo, ecommerce, items FROM {{ source('ga4', 'events') }} WHERE TRUE {% if is_incremental() %} AND PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) >= DATE_SUB( (SELECT MAX(event_date) FROM {{ this }}), INTERVAL 3 DAY ) {% else %} AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)) {% endif %})
SELECT -- Primary keys CONCAT( user_pseudo_id, '-', (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'), '-', event_timestamp ) AS event__key,
CONCAT( user_pseudo_id, '-', (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') ) AS session__key,
-- Timestamps PARSE_DATE('%Y%m%d', event_date) AS event__date, TIMESTAMP_MICROS(event_timestamp) AS event__timestamp,
-- Event identification event_name AS event__name, user_pseudo_id AS user__pseudo_id,
-- Common parameters (unnested) (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS event__ga_session_id, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS event__page_location, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS event__page_title, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_referrer') AS event__page_referrer, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS event__engagement_time_msec, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') AS event__session_engaged, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'entrances') AS event__entrances,
-- Traffic source traffic_source.source AS event__traffic_source, traffic_source.medium AS event__traffic_medium, traffic_source.name AS event__traffic_campaign,
-- Device and geo device.category AS event__device_category, device.operating_system AS event__operating_system, device.web_info.browser AS event__browser, geo.country AS event__country, geo.city AS event__city,
-- E-commerce (keep nested for intermediate layer) ecommerce, items,
-- Keep original arrays for custom extraction event_params, user_properties
FROM sourceSeveral design decisions here deserve explanation:
The event__key includes event_timestamp. GA4 doesn’t provide a unique event ID. The composite of user + session + timestamp is the closest approximation. It’s not perfectly unique — two events from the same user in the same session at the same microsecond produce a collision — but it’s sufficient for most practical purposes.
The 3-day lookback window in the incremental logic re-processes recent partitions to catch GA4’s late-arriving data. GA4 can take up to 72 hours to finalize daily tables. Without the lookback, you’d miss events that arrive after the initial export.
The ecommerce, items, event_params, and user_properties fields are preserved. The base model flattens the most common parameters but keeps the original nested structures for downstream models that need access to less common parameters or the items array. This avoids the need to join back to the raw source.
The Parameter Extraction Macro
The correlated subqueries above are repetitive. A reusable macro makes the base model more readable:
-- macros/unnest_event_param.sql{% macro unnest_event_param(param_key, value_type='string') %} (SELECT value.{{ value_type }}_value FROM UNNEST(event_params) WHERE key = '{{ param_key }}'){% endmacro %}Usage:
SELECT {{ unnest_event_param('page_location', 'string') }} AS event__page_location, {{ unnest_event_param('ga_session_id', 'int') }} AS event__ga_session_id, {{ unnest_event_param('value', 'double') }} AS event__valueFROM {{ source('ga4', 'events') }}For parameters with uncertain types, use the COALESCE variant.
Intermediate Layer: Event-Specific Models
The intermediate layer breaks out specific event types for easier downstream consumption. Each model filters the base table to a single event type and applies event-specific transformations:
-- models/intermediate/ga4/int__ga4_events__page_views.sqlSELECT event__key, session__key, event__date, event__timestamp, user__pseudo_id, event__page_location, event__page_title, event__page_referrer, REGEXP_REPLACE( REGEXP_REPLACE(event__page_location, r'^https?://[^/]+', ''), r'[\?#].*$', '' ) AS event__page_path, IFNULL(event__entrances, 0) = 1 AS event__is_landing_page, event__traffic_source, event__traffic_medium, event__device_category, event__countryFROM {{ ref('base__ga4_bigquery__events') }}WHERE event__name = 'page_view'-- models/intermediate/ga4/int__ga4_events__purchases.sqlSELECT event__key, session__key, event__date, event__timestamp, user__pseudo_id, ecommerce.transaction_id AS purchase__transaction_id, ecommerce.purchase_revenue AS purchase__revenue, ecommerce.total_item_quantity AS purchase__total_item_quantity, ecommerce.shipping_value AS purchase__shipping_value, ecommerce.tax_value AS purchase__tax_value, event__traffic_source, event__traffic_medium, event__device_category, event__country, itemsFROM {{ ref('base__ga4_bigquery__events') }}WHERE event__name = 'purchase' AND ecommerce.transaction_id IS NOT NULLThe purchase items model uses the items UNNEST pattern at this layer:
-- models/intermediate/ga4/int__ga4_events__purchase_items.sqlSELECT e.event__key, e.session__key, e.event__date, e.ecommerce.transaction_id AS purchase__transaction_id, item.item_id AS item__id, item.item_name AS item__name, item.item_brand AS item__brand, item.item_category AS item__category, item.item_category2 AS item__category2, item.item_category3 AS item__category3, item.price AS item__price, item.quantity AS item__quantity, item.item_revenue AS item__revenueFROM {{ ref('base__ga4_bigquery__events') }} AS e, UNNEST(items) AS itemWHERE event__name = 'purchase'This is the one place where FROM clause UNNEST is intentional — the item grain model exists specifically to expand the items array. The CROSS JOIN is safe here because purchase events always have items.
Mart Layer: Analytics-Ready Aggregations
The mart layer aggregates into business-ready tables:
-- models/marts/ga4/mrt__analytics__sessions.sql{{ config( materialized='table', partition_by={'field': 'session__date', 'data_type': 'date'}, cluster_by=['session__traffic_source', 'session__traffic_medium'] )}}
SELECT session__key, ANY_VALUE(user__pseudo_id) AS user__pseudo_id, MIN(event__date) AS session__date, MIN(event__timestamp) AS session__started_at, MAX(event__timestamp) AS session__ended_at, TIMESTAMP_DIFF(MAX(event__timestamp), MIN(event__timestamp), SECOND) AS session__duration_seconds, COUNT(*) AS session__events, COUNTIF(event__name = 'page_view') AS session__pageviews, SUM(event__engagement_time_msec) / 1000 AS session__engagement_seconds, MAX(event__session_engaged) = '1' AS session__is_engaged, COUNTIF(event__name = 'purchase') > 0 AS session__is_converted, SUM(CASE WHEN event__name = 'purchase' THEN ecommerce.purchase_revenue ELSE 0 END) AS session__revenue, ANY_VALUE(event__traffic_source) AS session__traffic_source, ANY_VALUE(event__traffic_medium) AS session__traffic_medium, ANY_VALUE(event__traffic_campaign) AS session__traffic_campaign, ANY_VALUE(event__device_category) AS session__device_category, ANY_VALUE(event__country) AS session__countryFROM {{ ref('base__ga4_bigquery__events') }}GROUP BY session__keyNotice that the mart references the base model directly, not the intermediate models. The session mart needs all events (not just page views or purchases), so it groups the base layer by session__key. The intermediate models are most useful for event-specific marts (page performance, product performance) or when other marts join to them.
Testing the Unnesting
Add tests that catch common UNNEST errors:
version: 2
models: - name: base__ga4_bigquery__events columns: - name: event__key tests: - unique - not_null - name: session__key tests: - not_nullThe most insidious UNNEST error in e-commerce is revenue multiplication. A custom test catches it:
-- tests/assert_revenue_not_multiplied.sqlWITH order_revenue AS ( SELECT purchase__transaction_id, purchase__revenue FROM {{ ref('int__ga4_events__purchases') }}),
item_revenue AS ( SELECT purchase__transaction_id, SUM(item__revenue) AS total_item_revenue FROM {{ ref('int__ga4_events__purchase_items') }} GROUP BY purchase__transaction_id)
SELECT o.purchase__transaction_id, o.purchase__revenue, i.total_item_revenue, ABS(o.purchase__revenue - i.total_item_revenue) AS differenceFROM order_revenue AS oJOIN item_revenue AS i ON o.purchase__transaction_id = i.purchase__transaction_idWHERE ABS(o.purchase__revenue - i.total_item_revenue) > 0.01This test fails if the sum of item-level revenue doesn’t match the order-level revenue — a strong signal that something in the UNNEST logic is duplicating or losing rows.
The Naming Convention
The double-underscore prefix in column names (event__page_location, session__key, purchase__revenue) serves two purposes. It namespaces columns by their grain or domain, making it obvious what each column represents when you see it in a downstream query. And it prevents naming collisions when multiple models are joined — event__date and session__date can coexist without ambiguity.
This convention is a choice, not a requirement. Consistency within the project matters more than the specific scheme; whichever pattern is chosen should be applied uniformly across all GA4 models.