ServicesAboutNotesContact Get in touch →
EN FR
Note

GA4 Ecommerce Items UNNEST Pattern

How to handle GA4's nested items array in dbt — building a separate item-level grain model with intentional Cartesian UNNEST.

Planted
ga4dbtbigquerydata modelinganalytics

GA4 ecommerce data lives in the items field — a repeated RECORD containing one element per product in a given event. A single purchase event for a three-item cart produces three elements in the items array. Building accurate ecommerce analysis requires expanding this array into a separate item-level grain model.

Why a Separate Model

The rest of your GA4 pipeline works at event grain: one row per event. The sessionized events table maintains this grain carefully — correlated subqueries for event_params specifically because they don’t expand rows.

items is different. You genuinely need one row per item per event. A purchase with three products needs three rows to track each product’s revenue, quantity, and category correctly. Keeping items embedded in the event-grain table isn’t feasible — you’d need to aggregate them into arrays or denormalized columns, losing the ability to query by product.

The solution is a dedicated intermediate model at item-level grain, materialized as a view (cheap because it references an already-materialized incremental base model).

The Model

-- models/intermediate/ga4/int__ga4__event_items.sql
{{
config(
materialized='view'
)
}}
WITH events_with_items AS (
SELECT
event__key,
event__date,
event__timestamp_utc,
user__pseudo_id,
session__key,
event__name,
transaction__id,
items__array
FROM {{ ref('base__ga4__events') }}
WHERE event__name IN ('purchase', 'add_to_cart', 'remove_from_cart',
'view_item', 'begin_checkout', 'add_payment_info')
AND ARRAY_LENGTH(items__array) > 0
)
SELECT
e.event__key,
e.event__date,
e.event__timestamp_utc,
e.user__pseudo_id,
e.session__key,
e.event__name,
e.transaction__id,
-- Item details
item.item_id AS item__id,
item.item_name AS item__name,
item.item_brand AS item__brand,
item.item_category AS item__category,
item.item_category2 AS item__category2,
item.item_category3 AS item__category3,
item.item_variant AS item__variant,
item.price AS item__price,
item.quantity AS item__quantity,
item.coupon AS item__coupon,
item.item_list_name AS item__list_name,
item.item_list_index AS item__list_index,
-- Calculate item revenue
COALESCE(item.price, 0) * COALESCE(item.quantity, 1) AS item__revenue
FROM events_with_items e,
UNNEST(e.items__array) AS item

This model intentionally uses the Cartesian UNNEST pattern (FROM events_with_items e, UNNEST(e.items__array) AS item) rather than a correlated subquery. The row expansion is exactly what you want: one row per item per event.

The Event Filter

Not all GA4 events carry item data. The filter event__name IN ('purchase', 'add_to_cart', ...) does two things:

  1. Reduces the input size significantly — only ecommerce events have items
  2. Makes the model’s purpose explicit

The ARRAY_LENGTH(items__array) > 0 guard prevents UNNEST from producing rows with a null item when the array is empty. Some GA4 implementations send ecommerce events without item arrays — this prevents those from appearing in the model as rows with all-null item fields.

Connecting Items Back to Sessions

The item model preserves event__key and session__key from the base model. This makes joins natural:

-- Item revenue by session channel
SELECT
s.session__channel_grouping,
SUM(i.item__revenue) AS total_revenue,
COUNT(DISTINCT i.transaction__id) AS transactions
FROM mrt__analytics__sessions s
JOIN int__ga4__event_items i
ON s.session__key = i.session__key
WHERE i.event__name = 'purchase'
GROUP BY 1
ORDER BY 2 DESC

The sessionized events model aggregates item data back to event level:

-- In int__ga4__events_sessionized
event_items AS (
SELECT
event__key,
COUNT(*) AS event__items,
SUM(item__revenue) AS event__items_revenue
FROM {{ ref('int__ga4__event_items') }}
GROUP BY event__key
)

This pre-aggregation avoids repeated joins downstream. The sessionized table carries event__items and event__items_revenue as simple integer and float columns on every purchase event — no join required for standard ecommerce metrics.

Item Categories in GA4

GA4 supports a five-level product category hierarchy (item_category through item_category5). In practice, most implementations only use the first two or three levels. The model extracts all three meaningful levels; adjust based on what your tracking implementation actually populates.

Since October 2023, GA4 also supports item_params — a nested repeated field inside each item for custom product dimensions. This adds another layer of UNNEST complexity if your implementation uses custom item parameters. Handle it with a separate extraction CTE or an additional model if needed.

Materialization Choice: View

The item model is a view, not an incremental table. This is intentional:

  • Storage cost avoided: The base model is already materialized as an incremental table. A view on top adds no storage cost.
  • Freshness automatic: The view always reflects the latest state of the base model. No incremental logic needed.
  • Query cost acceptable: Ecommerce item queries are typically run on bounded date ranges. BigQuery’s partition pruning on the base model limits scan costs regardless of the view layer.

If item-level queries become expensive at scale, materialize the view as an incremental table with the same lookback pattern as the base model. But start with a view — the simplicity is worth it.