Adrienne Vermorel

Unnesting GA4 Events: Patterns for Every Use Case

The first time you query GA4 data in BigQuery, you write something like SELECT page_location FROM events and get an error. There is no page_location column. The data you need is buried inside an array called event_params, and extracting it requires a SQL pattern that most analysts have never encountered: UNNEST.

This article provides production-ready SQL patterns for every common GA4 scenario, from basic page views to complex e-commerce funnels, plus dbt model templates to operationalize these patterns at scale. Whether you read the schema reference in Article 1 or you’re starting here, you’ll leave with copy-paste queries that actually work.

Why GA4 data requires unnesting

GA4 stores event parameters in repeated RECORD fields rather than flat columns. A single page_view event doesn’t have a page_location column. Instead, it has an event_params array containing key-value pairs like {key: 'page_location', value: {string_value: 'https://example.com/products'}}.

This design choice wasn’t arbitrary. Nested structures allow GA4 to store any number of custom parameters without schema changes, and BigQuery’s columnar storage processes nested data more efficiently than equivalent flat tables.

Three nested arrays appear in every GA4 export:

  • event_params: All event-level parameters (page_location, ga_session_id, custom params)
  • user_properties: User-scoped custom dimensions set via gtag or Firebase
  • items: Product data for e-commerce events (item_name, item_id, price, quantity)

Each parameter stores its value in one of four typed fields: string_value, int_value, float_value, or double_value. Only one field is populated based on the detected data type. Query the wrong one and you get NULL.

The two UNNEST approaches every analyst must know

There are exactly two ways to extract data from nested arrays in BigQuery. Choosing the wrong one is the most common source of incorrect GA4 metrics.

Correlated subqueries preserve row integrity

The recommended approach for extracting specific parameters uses a subquery in the SELECT clause:

SELECT
event_name,
event_timestamp,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_location') AS page_location,
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id') AS ga_session_id
FROM `project.dataset.events_*`
WHERE event_name = 'page_view'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'

This pattern maintains one row per event regardless of how many parameters exist. An event with 20 parameters still produces exactly one output row. The subquery reaches into the array, finds the matching key, and returns the value.

Use correlated subqueries when you know which parameters you need and want to preserve the event-level grain of your data.

FROM clause UNNEST expands arrays into rows

When you need to analyze the parameters themselves (counting occurrences, discovering what parameters exist, or pivoting array contents), use UNNEST in the FROM clause:

SELECT
ep.key AS parameter_name,
COUNT(*) AS occurrences
FROM `project.dataset.events_*`,
UNNEST(event_params) AS ep
WHERE _TABLE_SUFFIX = '20240115'
GROUP BY parameter_name
ORDER BY occurrences DESC

This pattern creates one row per array element. An event with 15 parameters becomes 15 rows, each exposing one parameter’s key and value fields.

The row multiplication is the point, since you’re deliberately expanding the array to analyze its contents. This same behavior causes errors when misapplied.

The decision is simple

Ask yourself: “Do I need to analyze the array structure itself, or do I just need specific values?”

If you need specific values, use a correlated subquery. If you need to examine or aggregate across array elements, use FROM clause UNNEST. When in doubt, correlated subqueries are safer.

CROSS JOIN versus LEFT JOIN UNNEST

The comma syntax in FROM table, UNNEST(array) is shorthand for CROSS JOIN. This matters because CROSS JOIN silently drops rows where the array is empty or NULL.

-- Implicit CROSS JOIN: Events without items disappear
SELECT event_name, item.item_name
FROM `project.dataset.events_*`,
UNNEST(items) AS item
WHERE event_name IN ('purchase', 'session_start')

If you run this query expecting to see both purchases and session starts, you’ll only see purchases. Session start events have no items, so CROSS JOIN excludes them entirely.

Use LEFT JOIN when you need to preserve all rows:

-- LEFT JOIN: All events appear, NULL for missing items
SELECT event_name, item.item_name
FROM `project.dataset.events_*`
LEFT JOIN UNNEST(items) AS item
WHERE event_name IN ('purchase', 'session_start')

For event_params, CROSS JOIN is almost always correct since every event has parameters. For items and user_properties, default to LEFT JOIN unless you specifically want to filter to only events containing those arrays.

E-commerce patterns

E-commerce analysis in GA4 requires understanding which data lives in the ecommerce RECORD (pre-aggregated order totals) versus the items array (individual product details).

Order-level metrics without UNNEST

The ecommerce field contains transaction totals that don’t require unnesting:

SELECT
PARSE_DATE('%Y%m%d', event_date) AS order_date,
COUNT(DISTINCT ecommerce.transaction_id) AS transactions,
SUM(ecommerce.purchase_revenue) AS revenue,
SUM(ecommerce.total_item_quantity) AS items_sold,
SAFE_DIVIDE(
SUM(ecommerce.purchase_revenue),
COUNT(DISTINCT ecommerce.transaction_id)
) AS average_order_value
FROM `project.dataset.events_*`
WHERE event_name = 'purchase'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY order_date
ORDER BY order_date

This query processes millions of purchase events efficiently because it never touches the nested arrays.

Item-level product performance

When you need product-level metrics, UNNEST the items array:

SELECT
item.item_id,
item.item_name,
item.item_category,
COUNT(DISTINCT ecommerce.transaction_id) AS orders_containing_item,
SUM(item.quantity) AS units_sold,
SUM(item.item_revenue) AS product_revenue,
SAFE_DIVIDE(SUM(item.item_revenue), SUM(item.quantity)) AS avg_unit_price
FROM `project.dataset.events_*`,
UNNEST(items) AS item
WHERE event_name = 'purchase'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY item.item_id, item.item_name, item.item_category
ORDER BY product_revenue DESC

Notice that revenue comes from item.item_revenue, not ecommerce.purchase_revenue. If you sum ecommerce.purchase_revenue after unnesting items, you multiply the order total by the number of items in each order.

Add-to-cart and product view analysis

The same items array appears in add_to_cart, view_item, and other e-commerce events:

SELECT
item.item_name,
item.item_category,
COUNTIF(event_name = 'view_item') AS product_views,
COUNTIF(event_name = 'add_to_cart') AS add_to_carts,
COUNTIF(event_name = 'purchase') AS purchases,
SAFE_DIVIDE(
COUNTIF(event_name = 'add_to_cart'),
COUNTIF(event_name = 'view_item')
) AS view_to_cart_rate,
SAFE_DIVIDE(
COUNTIF(event_name = 'purchase'),
COUNTIF(event_name = 'add_to_cart')
) AS cart_to_purchase_rate
FROM `project.dataset.events_*`,
UNNEST(items) AS item
WHERE event_name IN ('view_item', 'add_to_cart', 'purchase')
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY item.item_name, item.item_category
HAVING product_views > 100
ORDER BY product_views DESC

Complete checkout funnel

Session-based funnel analysis requires extracting ga_session_id and combining it with user_pseudo_id to create a truly unique session identifier:

WITH funnel_events AS (
SELECT
CONCAT(
user_pseudo_id, '-',
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
) AS session_id,
event_name,
event_timestamp
FROM `project.dataset.events_*`
WHERE event_name IN (
'view_item', 'add_to_cart', 'begin_checkout',
'add_shipping_info', 'add_payment_info', 'purchase'
)
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
)
SELECT
COUNT(DISTINCT CASE WHEN event_name = 'view_item' THEN session_id END) AS sessions_viewed_item,
COUNT(DISTINCT CASE WHEN event_name = 'add_to_cart' THEN session_id END) AS sessions_added_to_cart,
COUNT(DISTINCT CASE WHEN event_name = 'begin_checkout' THEN session_id END) AS sessions_began_checkout,
COUNT(DISTINCT CASE WHEN event_name = 'add_shipping_info' THEN session_id END) AS sessions_added_shipping,
COUNT(DISTINCT CASE WHEN event_name = 'add_payment_info' THEN session_id END) AS sessions_added_payment,
COUNT(DISTINCT CASE WHEN event_name = 'purchase' THEN session_id END) AS sessions_purchased,
SAFE_DIVIDE(
COUNT(DISTINCT CASE WHEN event_name = 'purchase' THEN session_id END),
COUNT(DISTINCT CASE WHEN event_name = 'view_item' THEN session_id END)
) AS overall_conversion_rate
FROM funnel_events

Engagement event patterns

Beyond e-commerce, GA4 tracks dozens of engagement events. Each requires extracting specific parameters.

Page views with clean URL parsing

SELECT
PARSE_DATE('%Y%m%d', event_date) AS event_date,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_location') AS page_url,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_title') AS page_title,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_referrer') AS referrer,
-- Extract clean path without protocol or query params
REGEXP_REPLACE(
REGEXP_REPLACE(
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
r'^https?://[^/]+', ''
),
r'[\?#].*$', ''
) AS page_path,
-- Identify landing pages
IFNULL(
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'entrances'),
0
) = 1 AS is_landing_page
FROM `project.dataset.events_*`
WHERE event_name = 'page_view'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'

Session metrics with proper session keys

The ga_session_id parameter is a Unix timestamp that identifies a session, but it’s only unique within a single user. Different users can have the same ga_session_id if they started sessions at the same second. Always concatenate with user_pseudo_id:

WITH session_events AS (
SELECT
CONCAT(
user_pseudo_id, '-',
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
) AS session_id,
user_pseudo_id,
event_timestamp,
event_name,
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'engagement_time_msec') AS engagement_time_msec,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'session_engaged') AS session_engaged,
traffic_source.source,
traffic_source.medium,
device.category AS device_category,
geo.country
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
)
SELECT
session_id,
user_pseudo_id,
MIN(TIMESTAMP_MICROS(event_timestamp)) AS session_start,
MAX(TIMESTAMP_MICROS(event_timestamp)) AS session_end,
TIMESTAMP_DIFF(
MAX(TIMESTAMP_MICROS(event_timestamp)),
MIN(TIMESTAMP_MICROS(event_timestamp)),
SECOND
) AS session__duration_seconds,
COUNT(*) AS session__events,
COUNTIF(event_name = 'page_view') AS session__pageviews,
SUM(engagement_time_msec) / 1000 AS session__engagement_seconds,
MAX(session_engaged) = '1' AS session__is_engaged,
ANY_VALUE(source) AS source,
ANY_VALUE(medium) AS medium,
ANY_VALUE(device_category) AS device_category,
ANY_VALUE(country) AS country
FROM session_events
GROUP BY session_id, user_pseudo_id

Scroll depth analysis

The scroll event fires when users reach 90% scroll depth (by default):

SELECT
REGEXP_REPLACE(
REGEXP_REPLACE(
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
r'^https?://[^/]+', ''
),
r'[\?#].*$', ''
) AS page_path,
COUNT(*) AS scroll_events,
COUNT(DISTINCT CONCAT(
user_pseudo_id, '-',
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
)) AS sessions_with_scroll
FROM `project.dataset.events_*`
WHERE event_name = 'scroll'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY page_path
ORDER BY scroll_events DESC

To calculate scroll rate, join with page view counts:

WITH page_views AS (
SELECT
REGEXP_REPLACE(
REGEXP_REPLACE(
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
r'^https?://[^/]+', ''
),
r'[\?#].*$', ''
) AS page_path,
COUNT(*) AS pageviews
FROM `project.dataset.events_*`
WHERE event_name = 'page_view'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY page_path
),
scrolls AS (
SELECT
REGEXP_REPLACE(
REGEXP_REPLACE(
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
r'^https?://[^/]+', ''
),
r'[\?#].*$', ''
) AS page_path,
COUNT(*) AS scroll_events
FROM `project.dataset.events_*`
WHERE event_name = 'scroll'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY page_path
)
SELECT
pv.page_path,
pv.pageviews,
IFNULL(s.scroll_events, 0) AS scroll_events,
SAFE_DIVIDE(s.scroll_events, pv.pageviews) AS scroll_rate
FROM page_views pv
LEFT JOIN scrolls s ON pv.page_path = s.page_path
WHERE pv.pageviews > 100
ORDER BY scroll_rate DESC
SELECT
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'link_url') AS link_url,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'link_domain') AS link_domain,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'outbound') AS is_outbound,
COUNT(*) AS clicks
FROM `project.dataset.events_*`
WHERE event_name = 'click'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY link_url, link_domain, is_outbound
ORDER BY clicks DESC

File download tracking

SELECT
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'file_name') AS file_name,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'file_extension') AS file_extension,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'link_url') AS download_url,
COUNT(*) AS downloads,
COUNT(DISTINCT user_pseudo_id) AS unique_users
FROM `project.dataset.events_*`
WHERE event_name = 'file_download'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY file_name, file_extension, download_url
ORDER BY downloads DESC

Video engagement funnel

SELECT
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'video_title') AS video_title,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'video_provider') AS video_provider,
COUNTIF(event_name = 'video_start') AS starts,
COUNTIF(event_name = 'video_progress'
AND (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'video_percent') = 25
) AS reached_25_pct,
COUNTIF(event_name = 'video_progress'
AND (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'video_percent') = 50
) AS reached_50_pct,
COUNTIF(event_name = 'video_progress'
AND (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'video_percent') = 75
) AS reached_75_pct,
COUNTIF(event_name = 'video_complete') AS completions,
SAFE_DIVIDE(
COUNTIF(event_name = 'video_complete'),
COUNTIF(event_name = 'video_start')
) AS completion_rate
FROM `project.dataset.events_*`
WHERE event_name IN ('video_start', 'video_progress', 'video_complete')
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY video_title, video_provider
HAVING starts > 10
ORDER BY starts DESC

Custom parameters and type coercion

GA4 parameters can arrive in different value types depending on how they were sent. A parameter configured as a number might be stored as int_value in some events and double_value in others.

Universal parameter extraction with COALESCE

For string output (safe for any parameter type):

SELECT
(SELECT COALESCE(
value.string_value,
CAST(value.int_value AS STRING),
CAST(value.double_value AS STRING),
CAST(value.float_value AS STRING)
)
FROM UNNEST(event_params)
WHERE key = 'custom_dimension'
) AS custom_dimension
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX = '20240115'

For numeric calculations:

SELECT
(SELECT COALESCE(
value.double_value,
CAST(value.int_value AS FLOAT64),
value.float_value
)
FROM UNNEST(event_params)
WHERE key = 'custom_metric'
) AS custom_metric
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX = '20240115'

User properties extraction

The user_properties array follows the same structure as event_params:

SELECT
user_pseudo_id,
(SELECT value.string_value
FROM UNNEST(user_properties)
WHERE key = 'customer_tier') AS customer_tier,
(SELECT value.int_value
FROM UNNEST(user_properties)
WHERE key = 'lifetime_purchases') AS lifetime_purchases,
COUNT(*) AS events
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY user_pseudo_id, customer_tier, lifetime_purchases

Custom item parameters

Since October 2023, GA4 exports custom item-scoped parameters in a nested item_params array within each item. This requires double UNNEST:

SELECT
item.item_name,
item.item_id,
(SELECT value.string_value
FROM UNNEST(item.item_params)
WHERE key = 'color') AS item_color,
(SELECT value.string_value
FROM UNNEST(item.item_params)
WHERE key = 'size') AS item_size,
SUM(item.quantity) AS units_sold
FROM `project.dataset.events_*`,
UNNEST(items) AS item
WHERE event_name = 'purchase'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY item.item_name, item.item_id, item_color, item_size
ORDER BY units_sold DESC

Reusable temporary functions

For complex queries extracting many parameters, CREATE TEMP FUNCTION reduces repetition:

CREATE TEMP FUNCTION GetParamString(params ANY TYPE, param_key STRING) AS (
(SELECT value.string_value FROM UNNEST(params) WHERE key = param_key)
);
CREATE TEMP FUNCTION GetParamInt(params ANY TYPE, param_key STRING) AS (
(SELECT value.int_value FROM UNNEST(params) WHERE key = param_key)
);
CREATE TEMP FUNCTION GetParamDouble(params ANY TYPE, param_key STRING) AS (
(SELECT COALESCE(value.double_value, CAST(value.int_value AS FLOAT64))
FROM UNNEST(params) WHERE key = param_key)
);
-- Now queries become more readable
SELECT
event_name,
GetParamString(event_params, 'page_location') AS page_location,
GetParamInt(event_params, 'ga_session_id') AS ga_session_id,
GetParamDouble(event_params, 'value') AS event_value
FROM `project.dataset.events_*`
WHERE event_name = 'purchase'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'

Performance optimization strategies

GA4 tables grow large quickly. A medium-traffic site generates millions of events per month, and careless queries can scan terabytes and cost hundreds of dollars.

Always filter on _TABLE_SUFFIX

GA4 exports create daily sharded tables (events_20240115, events_20240116, etc.). The _TABLE_SUFFIX pseudo-column enables partition pruning. Without it, BigQuery scans every table in the dataset:

-- ❌ Scans ALL historical data
SELECT COUNT(*)
FROM `project.dataset.events_*`
WHERE event_name = 'purchase'
-- ✅ Scans only January 2024
SELECT COUNT(*)
FROM `project.dataset.events_*`
WHERE event_name = 'purchase'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'

Avoid Cartesian products from multiple UNNEST

Unnesting multiple arrays in the same query creates a Cartesian product (one event with 15 parameters and 3 items becomes 45 rows):

-- ❌ Dangerous: 15 params × 3 items = 45 rows per event
SELECT ep.key, item.item_name
FROM `project.dataset.events_*`,
UNNEST(event_params) AS ep,
UNNEST(items) AS item
WHERE event_name = 'purchase'

If you need data from multiple arrays, use correlated subqueries or process arrays in separate CTEs:

-- ✅ Safe: One row per item, params extracted via subquery
SELECT
item.item_name,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'transaction_id') AS transaction_id
FROM `project.dataset.events_*`,
UNNEST(items) AS item
WHERE event_name = 'purchase'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'

When to materialize intermediate tables

For dashboards or repeated queries, pre-unnesting into a flat table can dramatically reduce costs:

CREATE OR REPLACE TABLE `project.dataset.ga4_events_flat`
PARTITION BY event_date
CLUSTER BY event_name, session_id
AS (
SELECT
PARSE_DATE('%Y%m%d', event_date) AS event_date,
TIMESTAMP_MICROS(event_timestamp) AS event_timestamp,
event_name,
user_pseudo_id,
CONCAT(
user_pseudo_id, '-',
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
) AS session_id,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS page_title,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS engagement_time_msec,
traffic_source.source,
traffic_source.medium,
traffic_source.name AS campaign,
device.category AS device_category,
device.operating_system,
device.web_info.browser,
geo.country,
geo.city,
ecommerce.transaction_id,
ecommerce.purchase_revenue
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY))
);

Schedule this to run daily, appending new data, and downstream queries against the flat table run faster and cost less.

dbt model patterns

The SQL patterns above translate directly into dbt models. A well-structured dbt project separates concerns into base, intermediate, and mart layers.

Base layer: unnesting

The base model unnests common parameters while preserving the original arrays for downstream use:

-- models/base/ga4/base__ga4_bigquery__events.sql
{{
config(
materialized='incremental',
partition_by={'field': 'event_date', 'data_type': 'date'},
incremental_strategy='insert_overwrite',
cluster_by=['event_name']
)
}}
WITH source AS (
SELECT
event_date,
event_timestamp,
event_name,
event_params,
user_pseudo_id,
user_properties,
traffic_source,
device,
geo,
ecommerce,
items
FROM {{ source('ga4', 'events') }}
WHERE TRUE
{% if is_incremental() %}
AND PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) >= DATE_SUB(
(SELECT MAX(event_date) FROM {{ this }}),
INTERVAL 3 DAY
)
{% else %}
AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY))
{% endif %}
)
SELECT
-- Primary keys
CONCAT(
user_pseudo_id, '-',
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'),
'-',
event_timestamp
) AS event__key,
CONCAT(
user_pseudo_id, '-',
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
) AS session__key,
-- Timestamps
PARSE_DATE('%Y%m%d', event_date) AS event__date,
TIMESTAMP_MICROS(event_timestamp) AS event__timestamp,
-- Event identification
event_name AS event__name,
user_pseudo_id AS user__pseudo_id,
-- Common parameters (unnested)
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS event__ga_session_id,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS event__page_location,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS event__page_title,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_referrer') AS event__page_referrer,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS event__engagement_time_msec,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') AS event__session_engaged,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'entrances') AS event__entrances,
-- Traffic source
traffic_source.source AS event__traffic_source,
traffic_source.medium AS event__traffic_medium,
traffic_source.name AS event__traffic_campaign,
-- Device and geo
device.category AS event__device_category,
device.operating_system AS event__operating_system,
device.web_info.browser AS event__browser,
geo.country AS event__country,
geo.city AS event__city,
-- E-commerce (keep nested for intermediate layer)
ecommerce,
items,
-- Keep original arrays for custom extraction
event_params,
user_properties
FROM source

Reusable macro for parameter extraction

-- macros/unnest_event_param.sql
{% macro unnest_event_param(param_key, value_type='string') %}
(SELECT value.{{ value_type }}_value
FROM UNNEST(event_params)
WHERE key = '{{ param_key }}')
{% endmacro %}

Usage in models:

SELECT
{{ unnest_event_param('page_location', 'string') }} AS event__page_location,
{{ unnest_event_param('ga_session_id', 'int') }} AS event__ga_session_id,
{{ unnest_event_param('value', 'double') }} AS event__value
FROM {{ source('ga4', 'events') }}

Intermediate layer: event-specific models

Break out specific event types for easier downstream consumption:

-- models/intermediate/ga4/int__ga4_events__page_views.sql
SELECT
event__key,
session__key,
event__date,
event__timestamp,
user__pseudo_id,
event__page_location,
event__page_title,
event__page_referrer,
REGEXP_REPLACE(
REGEXP_REPLACE(event__page_location, r'^https?://[^/]+', ''),
r'[\?#].*$', ''
) AS event__page_path,
IFNULL(event__entrances, 0) = 1 AS event__is_landing_page,
event__traffic_source,
event__traffic_medium,
event__device_category,
event__country
FROM {{ ref('base__ga4_bigquery__events') }}
WHERE event__name = 'page_view'
-- models/intermediate/ga4/int__ga4_events__purchases.sql
SELECT
event__key,
session__key,
event__date,
event__timestamp,
user__pseudo_id,
ecommerce.transaction_id AS purchase__transaction_id,
ecommerce.purchase_revenue AS purchase__revenue,
ecommerce.total_item_quantity AS purchase__total_item_quantity,
ecommerce.shipping_value AS purchase__shipping_value,
ecommerce.tax_value AS purchase__tax_value,
event__traffic_source,
event__traffic_medium,
event__device_category,
event__country,
items
FROM {{ ref('base__ga4_bigquery__events') }}
WHERE event__name = 'purchase'
AND ecommerce.transaction_id IS NOT NULL
-- models/intermediate/ga4/int__ga4_events__purchase_items.sql
SELECT
e.event__key,
e.session__key,
e.event__date,
e.ecommerce.transaction_id AS purchase__transaction_id,
item.item_id AS item__id,
item.item_name AS item__name,
item.item_brand AS item__brand,
item.item_category AS item__category,
item.item_category2 AS item__category2,
item.item_category3 AS item__category3,
item.price AS item__price,
item.quantity AS item__quantity,
item.item_revenue AS item__revenue
FROM {{ ref('base__ga4_bigquery__events') }} AS e,
UNNEST(items) AS item
WHERE event__name = 'purchase'

Mart layer: analytics-ready aggregations

-- models/marts/ga4/mrt__analytics__sessions.sql
{{
config(
materialized='table',
partition_by={'field': 'session__date', 'data_type': 'date'},
cluster_by=['session__traffic_source', 'session__traffic_medium']
)
}}
SELECT
session__key,
ANY_VALUE(user__pseudo_id) AS user__pseudo_id,
MIN(event__date) AS session__date,
MIN(event__timestamp) AS session__started_at,
MAX(event__timestamp) AS session__ended_at,
TIMESTAMP_DIFF(MAX(event__timestamp), MIN(event__timestamp), SECOND) AS session__duration_seconds,
COUNT(*) AS session__events,
COUNTIF(event__name = 'page_view') AS session__pageviews,
SUM(event__engagement_time_msec) / 1000 AS session__engagement_seconds,
MAX(event__session_engaged) = '1' AS session__is_engaged,
COUNTIF(event__name = 'purchase') > 0 AS session__is_converted,
SUM(CASE WHEN event__name = 'purchase' THEN ecommerce.purchase_revenue ELSE 0 END) AS session__revenue,
ANY_VALUE(event__traffic_source) AS session__traffic_source,
ANY_VALUE(event__traffic_medium) AS session__traffic_medium,
ANY_VALUE(event__traffic_campaign) AS session__traffic_campaign,
ANY_VALUE(event__device_category) AS session__device_category,
ANY_VALUE(event__country) AS session__country
FROM {{ ref('base__ga4_bigquery__events') }}
GROUP BY session__key

Testing strategies

Add tests to catch common UNNEST errors:

models/base/ga4/schema.yml
version: 2
models:
- name: base__ga4_bigquery__events
columns:
- name: event__key
tests:
- unique
- not_null
- name: session__key
tests:
- not_null
- name: int__ga4_events__purchases
columns:
- name: purchase__transaction_id
tests:
- unique
- not_null

Custom test for revenue accuracy:

-- tests/assert_revenue_not_multiplied.sql
-- This test fails if item-level revenue doesn't match order-level revenue
WITH order_revenue AS (
SELECT
purchase__transaction_id,
purchase__revenue
FROM {{ ref('int__ga4_events__purchases') }}
),
item_revenue AS (
SELECT
purchase__transaction_id,
SUM(item__revenue) AS total_item_revenue
FROM {{ ref('int__ga4_events__purchase_items') }}
GROUP BY purchase__transaction_id
)
SELECT
o.purchase__transaction_id,
o.purchase__revenue,
i.total_item_revenue,
ABS(o.purchase__revenue - i.total_item_revenue) AS difference
FROM order_revenue AS o
JOIN item_revenue AS i ON o.purchase__transaction_id = i.purchase__transaction_id
WHERE ABS(o.purchase__revenue - i.total_item_revenue) > 0.01

Conclusion

Unnesting GA4 data follows predictable patterns once you internalize two core concepts: correlated subqueries for extracting specific parameters without row explosion, and explicit JOIN type awareness for arrays that may be empty.

The SQL patterns in this article handle the vast majority of GA4 analysis scenarios. For edge cases, remember that BigQuery’s nested structure is a feature optimized for exactly this kind of semi-structured event data. Working with it rather than fighting it yields better performance.