Adrienne Vermorel
Unnesting GA4 Events: Patterns for Every Use Case
The first time you query GA4 data in BigQuery, you write something like SELECT page_location FROM events and get an error. There is no page_location column. The data you need is buried inside an array called event_params, and extracting it requires a SQL pattern that most analysts have never encountered: UNNEST.
This article provides production-ready SQL patterns for every common GA4 scenario, from basic page views to complex e-commerce funnels, plus dbt model templates to operationalize these patterns at scale. Whether you read the schema reference in Article 1 or you’re starting here, you’ll leave with copy-paste queries that actually work.
Why GA4 data requires unnesting
GA4 stores event parameters in repeated RECORD fields rather than flat columns. A single page_view event doesn’t have a page_location column. Instead, it has an event_params array containing key-value pairs like {key: 'page_location', value: {string_value: 'https://example.com/products'}}.
This design choice wasn’t arbitrary. Nested structures allow GA4 to store any number of custom parameters without schema changes, and BigQuery’s columnar storage processes nested data more efficiently than equivalent flat tables.
Three nested arrays appear in every GA4 export:
- event_params: All event-level parameters (page_location, ga_session_id, custom params)
- user_properties: User-scoped custom dimensions set via gtag or Firebase
- items: Product data for e-commerce events (item_name, item_id, price, quantity)
Each parameter stores its value in one of four typed fields: string_value, int_value, float_value, or double_value. Only one field is populated based on the detected data type. Query the wrong one and you get NULL.
The two UNNEST approaches every analyst must know
There are exactly two ways to extract data from nested arrays in BigQuery. Choosing the wrong one is the most common source of incorrect GA4 metrics.
Correlated subqueries preserve row integrity
The recommended approach for extracting specific parameters uses a subquery in the SELECT clause:
SELECT event_name, event_timestamp, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_idFROM `project.dataset.events_*`WHERE event_name = 'page_view' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'This pattern maintains one row per event regardless of how many parameters exist. An event with 20 parameters still produces exactly one output row. The subquery reaches into the array, finds the matching key, and returns the value.
Use correlated subqueries when you know which parameters you need and want to preserve the event-level grain of your data.
FROM clause UNNEST expands arrays into rows
When you need to analyze the parameters themselves (counting occurrences, discovering what parameters exist, or pivoting array contents), use UNNEST in the FROM clause:
SELECT ep.key AS parameter_name, COUNT(*) AS occurrencesFROM `project.dataset.events_*`, UNNEST(event_params) AS epWHERE _TABLE_SUFFIX = '20240115'GROUP BY parameter_nameORDER BY occurrences DESCThis pattern creates one row per array element. An event with 15 parameters becomes 15 rows, each exposing one parameter’s key and value fields.
The row multiplication is the point, since you’re deliberately expanding the array to analyze its contents. This same behavior causes errors when misapplied.
The decision is simple
Ask yourself: “Do I need to analyze the array structure itself, or do I just need specific values?”
If you need specific values, use a correlated subquery. If you need to examine or aggregate across array elements, use FROM clause UNNEST. When in doubt, correlated subqueries are safer.
CROSS JOIN versus LEFT JOIN UNNEST
The comma syntax in FROM table, UNNEST(array) is shorthand for CROSS JOIN. This matters because CROSS JOIN silently drops rows where the array is empty or NULL.
-- Implicit CROSS JOIN: Events without items disappearSELECT event_name, item.item_nameFROM `project.dataset.events_*`, UNNEST(items) AS itemWHERE event_name IN ('purchase', 'session_start')If you run this query expecting to see both purchases and session starts, you’ll only see purchases. Session start events have no items, so CROSS JOIN excludes them entirely.
Use LEFT JOIN when you need to preserve all rows:
-- LEFT JOIN: All events appear, NULL for missing itemsSELECT event_name, item.item_nameFROM `project.dataset.events_*`LEFT JOIN UNNEST(items) AS itemWHERE event_name IN ('purchase', 'session_start')For event_params, CROSS JOIN is almost always correct since every event has parameters. For items and user_properties, default to LEFT JOIN unless you specifically want to filter to only events containing those arrays.
E-commerce patterns
E-commerce analysis in GA4 requires understanding which data lives in the ecommerce RECORD (pre-aggregated order totals) versus the items array (individual product details).
Order-level metrics without UNNEST
The ecommerce field contains transaction totals that don’t require unnesting:
SELECT PARSE_DATE('%Y%m%d', event_date) AS order_date, COUNT(DISTINCT ecommerce.transaction_id) AS transactions, SUM(ecommerce.purchase_revenue) AS revenue, SUM(ecommerce.total_item_quantity) AS items_sold, SAFE_DIVIDE( SUM(ecommerce.purchase_revenue), COUNT(DISTINCT ecommerce.transaction_id) ) AS average_order_valueFROM `project.dataset.events_*`WHERE event_name = 'purchase' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'GROUP BY order_dateORDER BY order_dateThis query processes millions of purchase events efficiently because it never touches the nested arrays.
Item-level product performance
When you need product-level metrics, UNNEST the items array:
SELECT item.item_id, item.item_name, item.item_category, COUNT(DISTINCT ecommerce.transaction_id) AS orders_containing_item, SUM(item.quantity) AS units_sold, SUM(item.item_revenue) AS product_revenue, SAFE_DIVIDE(SUM(item.item_revenue), SUM(item.quantity)) AS avg_unit_priceFROM `project.dataset.events_*`, UNNEST(items) AS itemWHERE event_name = 'purchase' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'GROUP BY item.item_id, item.item_name, item.item_categoryORDER BY product_revenue DESCNotice that revenue comes from item.item_revenue, not ecommerce.purchase_revenue. If you sum ecommerce.purchase_revenue after unnesting items, you multiply the order total by the number of items in each order.
Add-to-cart and product view analysis
The same items array appears in add_to_cart, view_item, and other e-commerce events:
SELECT item.item_name, item.item_category, COUNTIF(event_name = 'view_item') AS product_views, COUNTIF(event_name = 'add_to_cart') AS add_to_carts, COUNTIF(event_name = 'purchase') AS purchases, SAFE_DIVIDE( COUNTIF(event_name = 'add_to_cart'), COUNTIF(event_name = 'view_item') ) AS view_to_cart_rate, SAFE_DIVIDE( COUNTIF(event_name = 'purchase'), COUNTIF(event_name = 'add_to_cart') ) AS cart_to_purchase_rateFROM `project.dataset.events_*`, UNNEST(items) AS itemWHERE event_name IN ('view_item', 'add_to_cart', 'purchase') AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'GROUP BY item.item_name, item.item_categoryHAVING product_views > 100ORDER BY product_views DESCComplete checkout funnel
Session-based funnel analysis requires extracting ga_session_id and combining it with user_pseudo_id to create a truly unique session identifier:
WITH funnel_events AS ( SELECT CONCAT( user_pseudo_id, '-', (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') ) AS session_id, event_name, event_timestamp FROM `project.dataset.events_*` WHERE event_name IN ( 'view_item', 'add_to_cart', 'begin_checkout', 'add_shipping_info', 'add_payment_info', 'purchase' ) AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131')
SELECT COUNT(DISTINCT CASE WHEN event_name = 'view_item' THEN session_id END) AS sessions_viewed_item, COUNT(DISTINCT CASE WHEN event_name = 'add_to_cart' THEN session_id END) AS sessions_added_to_cart, COUNT(DISTINCT CASE WHEN event_name = 'begin_checkout' THEN session_id END) AS sessions_began_checkout, COUNT(DISTINCT CASE WHEN event_name = 'add_shipping_info' THEN session_id END) AS sessions_added_shipping, COUNT(DISTINCT CASE WHEN event_name = 'add_payment_info' THEN session_id END) AS sessions_added_payment, COUNT(DISTINCT CASE WHEN event_name = 'purchase' THEN session_id END) AS sessions_purchased, SAFE_DIVIDE( COUNT(DISTINCT CASE WHEN event_name = 'purchase' THEN session_id END), COUNT(DISTINCT CASE WHEN event_name = 'view_item' THEN session_id END) ) AS overall_conversion_rateFROM funnel_eventsEngagement event patterns
Beyond e-commerce, GA4 tracks dozens of engagement events. Each requires extracting specific parameters.
Page views with clean URL parsing
SELECT PARSE_DATE('%Y%m%d', event_date) AS event_date, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_url, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS page_title, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_referrer') AS referrer, -- Extract clean path without protocol or query params REGEXP_REPLACE( REGEXP_REPLACE( (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'), r'^https?://[^/]+', '' ), r'[\?#].*$', '' ) AS page_path, -- Identify landing pages IFNULL( (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'entrances'), 0 ) = 1 AS is_landing_pageFROM `project.dataset.events_*`WHERE event_name = 'page_view' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'Session metrics with proper session keys
The ga_session_id parameter is a Unix timestamp that identifies a session, but it’s only unique within a single user. Different users can have the same ga_session_id if they started sessions at the same second. Always concatenate with user_pseudo_id:
WITH session_events AS ( SELECT CONCAT( user_pseudo_id, '-', (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') ) AS session_id, user_pseudo_id, event_timestamp, event_name, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS engagement_time_msec, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') AS session_engaged, traffic_source.source, traffic_source.medium, device.category AS device_category, geo.country FROM `project.dataset.events_*` WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131')
SELECT session_id, user_pseudo_id, MIN(TIMESTAMP_MICROS(event_timestamp)) AS session_start, MAX(TIMESTAMP_MICROS(event_timestamp)) AS session_end, TIMESTAMP_DIFF( MAX(TIMESTAMP_MICROS(event_timestamp)), MIN(TIMESTAMP_MICROS(event_timestamp)), SECOND ) AS session__duration_seconds, COUNT(*) AS session__events, COUNTIF(event_name = 'page_view') AS session__pageviews, SUM(engagement_time_msec) / 1000 AS session__engagement_seconds, MAX(session_engaged) = '1' AS session__is_engaged, ANY_VALUE(source) AS source, ANY_VALUE(medium) AS medium, ANY_VALUE(device_category) AS device_category, ANY_VALUE(country) AS countryFROM session_eventsGROUP BY session_id, user_pseudo_idScroll depth analysis
The scroll event fires when users reach 90% scroll depth (by default):
SELECT REGEXP_REPLACE( REGEXP_REPLACE( (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'), r'^https?://[^/]+', '' ), r'[\?#].*$', '' ) AS page_path, COUNT(*) AS scroll_events, COUNT(DISTINCT CONCAT( user_pseudo_id, '-', (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') )) AS sessions_with_scrollFROM `project.dataset.events_*`WHERE event_name = 'scroll' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'GROUP BY page_pathORDER BY scroll_events DESCTo calculate scroll rate, join with page view counts:
WITH page_views AS ( SELECT REGEXP_REPLACE( REGEXP_REPLACE( (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'), r'^https?://[^/]+', '' ), r'[\?#].*$', '' ) AS page_path, COUNT(*) AS pageviews FROM `project.dataset.events_*` WHERE event_name = 'page_view' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131' GROUP BY page_path),
scrolls AS ( SELECT REGEXP_REPLACE( REGEXP_REPLACE( (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'), r'^https?://[^/]+', '' ), r'[\?#].*$', '' ) AS page_path, COUNT(*) AS scroll_events FROM `project.dataset.events_*` WHERE event_name = 'scroll' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131' GROUP BY page_path)
SELECT pv.page_path, pv.pageviews, IFNULL(s.scroll_events, 0) AS scroll_events, SAFE_DIVIDE(s.scroll_events, pv.pageviews) AS scroll_rateFROM page_views pvLEFT JOIN scrolls s ON pv.page_path = s.page_pathWHERE pv.pageviews > 100ORDER BY scroll_rate DESCClick and outbound link tracking
SELECT (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_url') AS link_url, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_domain') AS link_domain, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'outbound') AS is_outbound, COUNT(*) AS clicksFROM `project.dataset.events_*`WHERE event_name = 'click' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'GROUP BY link_url, link_domain, is_outboundORDER BY clicks DESCFile download tracking
SELECT (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'file_name') AS file_name, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'file_extension') AS file_extension, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'link_url') AS download_url, COUNT(*) AS downloads, COUNT(DISTINCT user_pseudo_id) AS unique_usersFROM `project.dataset.events_*`WHERE event_name = 'file_download' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'GROUP BY file_name, file_extension, download_urlORDER BY downloads DESCVideo engagement funnel
SELECT (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'video_title') AS video_title, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'video_provider') AS video_provider, COUNTIF(event_name = 'video_start') AS starts, COUNTIF(event_name = 'video_progress' AND (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'video_percent') = 25 ) AS reached_25_pct, COUNTIF(event_name = 'video_progress' AND (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'video_percent') = 50 ) AS reached_50_pct, COUNTIF(event_name = 'video_progress' AND (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'video_percent') = 75 ) AS reached_75_pct, COUNTIF(event_name = 'video_complete') AS completions, SAFE_DIVIDE( COUNTIF(event_name = 'video_complete'), COUNTIF(event_name = 'video_start') ) AS completion_rateFROM `project.dataset.events_*`WHERE event_name IN ('video_start', 'video_progress', 'video_complete') AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'GROUP BY video_title, video_providerHAVING starts > 10ORDER BY starts DESCCustom parameters and type coercion
GA4 parameters can arrive in different value types depending on how they were sent. A parameter configured as a number might be stored as int_value in some events and double_value in others.
Universal parameter extraction with COALESCE
For string output (safe for any parameter type):
SELECT (SELECT COALESCE( value.string_value, CAST(value.int_value AS STRING), CAST(value.double_value AS STRING), CAST(value.float_value AS STRING) ) FROM UNNEST(event_params) WHERE key = 'custom_dimension' ) AS custom_dimensionFROM `project.dataset.events_*`WHERE _TABLE_SUFFIX = '20240115'For numeric calculations:
SELECT (SELECT COALESCE( value.double_value, CAST(value.int_value AS FLOAT64), value.float_value ) FROM UNNEST(event_params) WHERE key = 'custom_metric' ) AS custom_metricFROM `project.dataset.events_*`WHERE _TABLE_SUFFIX = '20240115'User properties extraction
The user_properties array follows the same structure as event_params:
SELECT user_pseudo_id, (SELECT value.string_value FROM UNNEST(user_properties) WHERE key = 'customer_tier') AS customer_tier, (SELECT value.int_value FROM UNNEST(user_properties) WHERE key = 'lifetime_purchases') AS lifetime_purchases, COUNT(*) AS eventsFROM `project.dataset.events_*`WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'GROUP BY user_pseudo_id, customer_tier, lifetime_purchasesCustom item parameters
Since October 2023, GA4 exports custom item-scoped parameters in a nested item_params array within each item. This requires double UNNEST:
SELECT item.item_name, item.item_id, (SELECT value.string_value FROM UNNEST(item.item_params) WHERE key = 'color') AS item_color, (SELECT value.string_value FROM UNNEST(item.item_params) WHERE key = 'size') AS item_size, SUM(item.quantity) AS units_soldFROM `project.dataset.events_*`, UNNEST(items) AS itemWHERE event_name = 'purchase' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'GROUP BY item.item_name, item.item_id, item_color, item_sizeORDER BY units_sold DESCReusable temporary functions
For complex queries extracting many parameters, CREATE TEMP FUNCTION reduces repetition:
CREATE TEMP FUNCTION GetParamString(params ANY TYPE, param_key STRING) AS ( (SELECT value.string_value FROM UNNEST(params) WHERE key = param_key));
CREATE TEMP FUNCTION GetParamInt(params ANY TYPE, param_key STRING) AS ( (SELECT value.int_value FROM UNNEST(params) WHERE key = param_key));
CREATE TEMP FUNCTION GetParamDouble(params ANY TYPE, param_key STRING) AS ( (SELECT COALESCE(value.double_value, CAST(value.int_value AS FLOAT64)) FROM UNNEST(params) WHERE key = param_key));
-- Now queries become more readableSELECT event_name, GetParamString(event_params, 'page_location') AS page_location, GetParamInt(event_params, 'ga_session_id') AS ga_session_id, GetParamDouble(event_params, 'value') AS event_valueFROM `project.dataset.events_*`WHERE event_name = 'purchase' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'Performance optimization strategies
GA4 tables grow large quickly. A medium-traffic site generates millions of events per month, and careless queries can scan terabytes and cost hundreds of dollars.
Always filter on _TABLE_SUFFIX
GA4 exports create daily sharded tables (events_20240115, events_20240116, etc.). The _TABLE_SUFFIX pseudo-column enables partition pruning. Without it, BigQuery scans every table in the dataset:
-- ❌ Scans ALL historical dataSELECT COUNT(*)FROM `project.dataset.events_*`WHERE event_name = 'purchase'
-- ✅ Scans only January 2024SELECT COUNT(*)FROM `project.dataset.events_*`WHERE event_name = 'purchase' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'Avoid Cartesian products from multiple UNNEST
Unnesting multiple arrays in the same query creates a Cartesian product (one event with 15 parameters and 3 items becomes 45 rows):
-- ❌ Dangerous: 15 params × 3 items = 45 rows per eventSELECT ep.key, item.item_nameFROM `project.dataset.events_*`, UNNEST(event_params) AS ep, UNNEST(items) AS itemWHERE event_name = 'purchase'If you need data from multiple arrays, use correlated subqueries or process arrays in separate CTEs:
-- ✅ Safe: One row per item, params extracted via subquerySELECT item.item_name, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id') AS transaction_idFROM `project.dataset.events_*`, UNNEST(items) AS itemWHERE event_name = 'purchase' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'When to materialize intermediate tables
For dashboards or repeated queries, pre-unnesting into a flat table can dramatically reduce costs:
CREATE OR REPLACE TABLE `project.dataset.ga4_events_flat`PARTITION BY event_dateCLUSTER BY event_name, session_idAS ( SELECT PARSE_DATE('%Y%m%d', event_date) AS event_date, TIMESTAMP_MICROS(event_timestamp) AS event_timestamp, event_name, user_pseudo_id, CONCAT( user_pseudo_id, '-', (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') ) AS session_id, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS page_title, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS engagement_time_msec, traffic_source.source, traffic_source.medium, traffic_source.name AS campaign, device.category AS device_category, device.operating_system, device.web_info.browser, geo.country, geo.city, ecommerce.transaction_id, ecommerce.purchase_revenue FROM `project.dataset.events_*` WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)));Schedule this to run daily, appending new data, and downstream queries against the flat table run faster and cost less.
dbt model patterns
The SQL patterns above translate directly into dbt models. A well-structured dbt project separates concerns into base, intermediate, and mart layers.
Base layer: unnesting
The base model unnests common parameters while preserving the original arrays for downstream use:
-- models/base/ga4/base__ga4_bigquery__events.sql{{ config( materialized='incremental', partition_by={'field': 'event_date', 'data_type': 'date'}, incremental_strategy='insert_overwrite', cluster_by=['event_name'] )}}
WITH source AS ( SELECT event_date, event_timestamp, event_name, event_params, user_pseudo_id, user_properties, traffic_source, device, geo, ecommerce, items FROM {{ source('ga4', 'events') }} WHERE TRUE {% if is_incremental() %} AND PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) >= DATE_SUB( (SELECT MAX(event_date) FROM {{ this }}), INTERVAL 3 DAY ) {% else %} AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)) {% endif %})
SELECT -- Primary keys CONCAT( user_pseudo_id, '-', (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'), '-', event_timestamp ) AS event__key,
CONCAT( user_pseudo_id, '-', (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') ) AS session__key,
-- Timestamps PARSE_DATE('%Y%m%d', event_date) AS event__date, TIMESTAMP_MICROS(event_timestamp) AS event__timestamp,
-- Event identification event_name AS event__name, user_pseudo_id AS user__pseudo_id,
-- Common parameters (unnested) (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS event__ga_session_id, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS event__page_location, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS event__page_title, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_referrer') AS event__page_referrer, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS event__engagement_time_msec, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') AS event__session_engaged, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'entrances') AS event__entrances,
-- Traffic source traffic_source.source AS event__traffic_source, traffic_source.medium AS event__traffic_medium, traffic_source.name AS event__traffic_campaign,
-- Device and geo device.category AS event__device_category, device.operating_system AS event__operating_system, device.web_info.browser AS event__browser, geo.country AS event__country, geo.city AS event__city,
-- E-commerce (keep nested for intermediate layer) ecommerce, items,
-- Keep original arrays for custom extraction event_params, user_properties
FROM sourceReusable macro for parameter extraction
-- macros/unnest_event_param.sql{% macro unnest_event_param(param_key, value_type='string') %} (SELECT value.{{ value_type }}_value FROM UNNEST(event_params) WHERE key = '{{ param_key }}'){% endmacro %}Usage in models:
SELECT {{ unnest_event_param('page_location', 'string') }} AS event__page_location, {{ unnest_event_param('ga_session_id', 'int') }} AS event__ga_session_id, {{ unnest_event_param('value', 'double') }} AS event__valueFROM {{ source('ga4', 'events') }}Intermediate layer: event-specific models
Break out specific event types for easier downstream consumption:
-- models/intermediate/ga4/int__ga4_events__page_views.sqlSELECT event__key, session__key, event__date, event__timestamp, user__pseudo_id, event__page_location, event__page_title, event__page_referrer, REGEXP_REPLACE( REGEXP_REPLACE(event__page_location, r'^https?://[^/]+', ''), r'[\?#].*$', '' ) AS event__page_path, IFNULL(event__entrances, 0) = 1 AS event__is_landing_page, event__traffic_source, event__traffic_medium, event__device_category, event__countryFROM {{ ref('base__ga4_bigquery__events') }}WHERE event__name = 'page_view'-- models/intermediate/ga4/int__ga4_events__purchases.sqlSELECT event__key, session__key, event__date, event__timestamp, user__pseudo_id, ecommerce.transaction_id AS purchase__transaction_id, ecommerce.purchase_revenue AS purchase__revenue, ecommerce.total_item_quantity AS purchase__total_item_quantity, ecommerce.shipping_value AS purchase__shipping_value, ecommerce.tax_value AS purchase__tax_value, event__traffic_source, event__traffic_medium, event__device_category, event__country, itemsFROM {{ ref('base__ga4_bigquery__events') }}WHERE event__name = 'purchase' AND ecommerce.transaction_id IS NOT NULL-- models/intermediate/ga4/int__ga4_events__purchase_items.sqlSELECT e.event__key, e.session__key, e.event__date, e.ecommerce.transaction_id AS purchase__transaction_id, item.item_id AS item__id, item.item_name AS item__name, item.item_brand AS item__brand, item.item_category AS item__category, item.item_category2 AS item__category2, item.item_category3 AS item__category3, item.price AS item__price, item.quantity AS item__quantity, item.item_revenue AS item__revenueFROM {{ ref('base__ga4_bigquery__events') }} AS e, UNNEST(items) AS itemWHERE event__name = 'purchase'Mart layer: analytics-ready aggregations
-- models/marts/ga4/mrt__analytics__sessions.sql{{ config( materialized='table', partition_by={'field': 'session__date', 'data_type': 'date'}, cluster_by=['session__traffic_source', 'session__traffic_medium'] )}}
SELECT session__key, ANY_VALUE(user__pseudo_id) AS user__pseudo_id, MIN(event__date) AS session__date, MIN(event__timestamp) AS session__started_at, MAX(event__timestamp) AS session__ended_at, TIMESTAMP_DIFF(MAX(event__timestamp), MIN(event__timestamp), SECOND) AS session__duration_seconds, COUNT(*) AS session__events, COUNTIF(event__name = 'page_view') AS session__pageviews, SUM(event__engagement_time_msec) / 1000 AS session__engagement_seconds, MAX(event__session_engaged) = '1' AS session__is_engaged, COUNTIF(event__name = 'purchase') > 0 AS session__is_converted, SUM(CASE WHEN event__name = 'purchase' THEN ecommerce.purchase_revenue ELSE 0 END) AS session__revenue, ANY_VALUE(event__traffic_source) AS session__traffic_source, ANY_VALUE(event__traffic_medium) AS session__traffic_medium, ANY_VALUE(event__traffic_campaign) AS session__traffic_campaign, ANY_VALUE(event__device_category) AS session__device_category, ANY_VALUE(event__country) AS session__countryFROM {{ ref('base__ga4_bigquery__events') }}GROUP BY session__keyTesting strategies
Add tests to catch common UNNEST errors:
version: 2
models: - name: base__ga4_bigquery__events columns: - name: event__key tests: - unique - not_null - name: session__key tests: - not_null
- name: int__ga4_events__purchases columns: - name: purchase__transaction_id tests: - unique - not_nullCustom test for revenue accuracy:
-- tests/assert_revenue_not_multiplied.sql-- This test fails if item-level revenue doesn't match order-level revenueWITH order_revenue AS ( SELECT purchase__transaction_id, purchase__revenue FROM {{ ref('int__ga4_events__purchases') }}),
item_revenue AS ( SELECT purchase__transaction_id, SUM(item__revenue) AS total_item_revenue FROM {{ ref('int__ga4_events__purchase_items') }} GROUP BY purchase__transaction_id)
SELECT o.purchase__transaction_id, o.purchase__revenue, i.total_item_revenue, ABS(o.purchase__revenue - i.total_item_revenue) AS differenceFROM order_revenue AS oJOIN item_revenue AS i ON o.purchase__transaction_id = i.purchase__transaction_idWHERE ABS(o.purchase__revenue - i.total_item_revenue) > 0.01Conclusion
Unnesting GA4 data follows predictable patterns once you internalize two core concepts: correlated subqueries for extracting specific parameters without row explosion, and explicit JOIN type awareness for arrays that may be empty.
The SQL patterns in this article handle the vast majority of GA4 analysis scenarios. For edge cases, remember that BigQuery’s nested structure is a feature optimized for exactly this kind of semi-structured event data. Working with it rather than fighting it yields better performance.