ServicesAboutNotesContact Get in touch →
EN FR
Note

GA4 CROSS JOIN versus LEFT JOIN UNNEST

Why the comma syntax in FROM table, UNNEST(array) silently drops rows — and when to use LEFT JOIN UNNEST to preserve events without array data.

Planted
ga4bigqueryanalyticsdata modeling

The comma syntax in FROM table, UNNEST(array) is syntactic sugar for CROSS JOIN UNNEST. The consequence is that CROSS JOIN silently drops rows where the array is empty or NULL.

This matters more than you’d expect. Not every GA4 event has every array populated. If you’re writing a query that touches events with and without a particular nested array, CROSS JOIN will quietly filter out events you intended to keep.

The Problem in Practice

Consider a query that tries to see both purchases and session starts:

-- Implicit CROSS JOIN: Events without items disappear
SELECT event_name, item.item_name
FROM `project.dataset.events_*`,
UNNEST(items) AS item
WHERE event_name IN ('purchase', 'session_start')

You expect rows for both event types. You only get purchases. session_start events have no items array, so the CROSS JOIN excludes them entirely. No error, no warning — just missing data.

This is a CROSS JOIN doing exactly what CROSS JOIN does: producing the Cartesian product of two sets. When one set is empty, the product is empty. The rows vanish.

LEFT JOIN Preserves All Rows

When you need to keep events regardless of whether the array is populated, use LEFT JOIN UNNEST explicitly:

-- LEFT JOIN: All events appear, NULL for missing items
SELECT event_name, item.item_name
FROM `project.dataset.events_*`
LEFT JOIN UNNEST(items) AS item
WHERE event_name IN ('purchase', 'session_start')

Now session_start events appear with NULL in the item.item_name column. Every event in your WHERE clause is represented in the output.

The syntax difference is small — replace the comma with LEFT JOIN and remove the comma before UNNEST. But the semantic difference is significant.

Which JOIN Type for Which Array

GA4 has three main nested arrays. The right JOIN type depends on which one you’re working with:

event_params: Use CROSS JOIN (comma syntax). Every GA4 event has parameters. An event without event_params would be a data corruption issue, not a normal state. The CROSS JOIN is safe here because the array is never empty in practice.

That said, most event_params extraction uses correlated subqueries rather than FROM clause UNNEST anyway. Correlated subqueries don’t have this JOIN type issue because they operate within the SELECT clause, not the FROM clause.

items: Default to LEFT JOIN. Only e-commerce events (purchase, add_to_cart, view_item, begin_checkout, etc.) populate the items array. If your query filters to a single e-commerce event type, CROSS JOIN is fine — every purchase event should have items. But the moment you mix event types or write a query that might encounter events without items, LEFT JOIN prevents silent row loss.

The items UNNEST pattern in dbt uses CROSS JOIN intentionally because it pre-filters to ecommerce events and includes an ARRAY_LENGTH(items) > 0 guard. That’s the right approach for a dedicated items model. For ad-hoc queries across mixed event types, LEFT JOIN is safer.

user_properties: Default to LEFT JOIN. Not every event carries user properties. If a user has no custom user properties set, the array might be empty. CROSS JOIN would drop those events.

The Mental Model

Think of it this way: if you’re querying a specific array because you want data from that array, CROSS JOIN is appropriate — you’re deliberately filtering to events that have that data. If you’re querying events and optionally enriching with array data, LEFT JOIN is correct — you want all events regardless.

The distinction maps to intent:

  • “Show me product details for purchases” → CROSS JOIN is fine (you only want events that have items)
  • “Show me all events with product details where available” → LEFT JOIN required (you want every event, with items as bonus context)

Combining with Correlated Subqueries

The cleanest pattern for mixed queries combines LEFT JOIN for one array with correlated subqueries for another. This avoids both the silent row loss problem and the Cartesian product problem of unnesting multiple arrays:

-- Safe: One row per item, params extracted via subquery
SELECT
item.item_name,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'transaction_id') AS transaction_id
FROM `project.dataset.events_*`,
UNNEST(items) AS item
WHERE event_name = 'purchase'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'

Here, CROSS JOIN on items is safe because we filter to purchase events. The event_params extraction uses a correlated subquery rather than a second UNNEST in the FROM clause. No Cartesian product, no row loss.

Common Debugging Scenario

If your GA4 query returns fewer rows than expected, check your UNNEST join type first. The symptom — correct query, plausible results, just fewer than expected — makes this hard to catch. You don’t get an error. You get a number that looks reasonable but is wrong.

The quick diagnostic: temporarily replace the comma UNNEST with LEFT JOIN UNNEST. If your row count jumps, you were silently losing rows to empty arrays.