Querying GA4 data in BigQuery efficiently requires understanding a few patterns that differ from standard partitioned table queries. GA4 uses date-sharded tables (not native partitioned tables), stores most analytical data in nested arrays, and exposes cost traps that are easy to hit accidentally.
Always Filter _TABLE_SUFFIX
GA4 uses date-sharded tables: separate tables named events_YYYYMMDD rather than a single partitioned table. The wildcard events_* syntax queries all of them at once. Without a _TABLE_SUFFIX filter, you scan your entire history — potentially years of data.
```sql
-- Without filter: scans everything, expensive
SELECT event_name, COUNT(*) AS cnt
FROM `project.analytics_123456789.events_*`
GROUP BY event_name
```
```sql
-- With filter: scans only the specified range
SELECT event_name, COUNT(*) AS cnt
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
  AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY event_name
```

The FORMAT_DATE('%Y%m%d', ...) pattern generates the YYYYMMDD string format that matches the table suffix. DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) excludes today’s data, which is still in the intraday table and may not be complete in the daily tables.
For single-day queries:
```sql
WHERE _TABLE_SUFFIX = '20260127'
```

Note the timezone caveat: _TABLE_SUFFIX follows Pacific Time, not your property timezone. See GA4 BigQuery Timezone Handling for the full implications and the pattern of widening your _TABLE_SUFFIX range by one day when working with non-US timezones.
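The widening pattern looks roughly like this — a sketch that assumes a property timezone of Europe/Berlin (substitute your own):

```sql
-- Sketch: widen the suffix range by one day on each side, then filter
-- on the event timestamp converted to the property timezone.
SELECT event_name, COUNT(*) AS cnt
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260126' AND '20260128'
  AND DATE(TIMESTAMP_MICROS(event_timestamp), 'Europe/Berlin') = DATE '2026-01-27'
GROUP BY event_name
```

The extra day of scanning is the price of getting a clean property-timezone day boundary.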
Two UNNEST Approaches
GA4’s event_params, user_properties, and items fields are REPEATED RECORDs — arrays. Extracting data from them requires UNNEST, but there are two distinct patterns with different use cases.
Inline UNNEST (Correlated Subquery)
The inline pattern extracts specific parameters while keeping one row per event:
```sql
SELECT
  event_name,
  event_timestamp,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS page_title,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
  user_pseudo_id,
  device.category,
  geo.country
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20260127'
```

Each subquery reaches into the event_params array, finds the matching key, and returns the value. An event with 20 parameters produces one output row. Use this for most queries where you know which parameters you want.
BigQuery’s query optimizer can collapse multiple correlated subqueries on the same array into a single pass, so extracting 10 parameters isn’t 10x the cost of extracting 1.
FROM Clause UNNEST (Row Expansion)
The FROM clause pattern expands the array into separate rows — one row per array element:
```sql
SELECT ep.key, COUNT(*) AS occurrences
FROM `project.analytics_123456789.events_*`,
  UNNEST(event_params) AS ep
WHERE _TABLE_SUFFIX = '20260127'
GROUP BY ep.key
ORDER BY occurrences DESC
```

An event with 15 parameters becomes 15 rows. Use this for parameter discovery (what parameters exist and how often do they appear?) and for analysis that operates across the array structure itself.
Warning on multiple arrays: If you UNNEST two arrays from the same table in the FROM clause, you get a Cartesian product. An event with 15 event_params and 3 items becomes 45 rows. This is almost never what you want. When you need data from both arrays, use inline UNNEST for one and FROM clause UNNEST for the other, or use separate CTEs.
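A sketch of the mixed approach — FROM clause UNNEST for items, inline UNNEST for event_params — so rows multiply only by the items array:

```sql
-- One output row per item, not per (item, param) pair
SELECT
  event_name,
  item.item_name,
  item.quantity,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location
FROM `project.analytics_123456789.events_*`,
  UNNEST(items) AS item
WHERE _TABLE_SUFFIX = '20260127'
  AND event_name = 'purchase'
```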
Reusable dbt Macros
In a dbt project, parameter extraction repeats constantly. Encoding it in macros reduces repetition and makes the type assumption explicit:
```sql
-- macros/ga4_param_string.sql
{% macro ga4_param_string(column, param_name) %}
  (SELECT value.string_value FROM UNNEST({{ column }}) WHERE key = '{{ param_name }}')
{% endmacro %}
```
```sql
-- macros/ga4_param_int.sql
{% macro ga4_param_int(column, param_name) %}
  (SELECT value.int_value FROM UNNEST({{ column }}) WHERE key = '{{ param_name }}')
{% endmacro %}
```

Usage in a model:
```sql
SELECT
  {{ ga4_param_int('event_params', 'ga_session_id') }} AS session_id,
  {{ ga4_param_string('event_params', 'page_location') }} AS page_location,
  {{ ga4_param_string('event_params', 'page_title') }} AS page_title
FROM {{ source('ga4', 'events') }}
WHERE _TABLE_SUFFIX = '{{ var("run_date") }}'
```

The macro name signals the expected type. When a parameter’s type is uncertain or historically inconsistent, use a COALESCE variant — see GA4 Event Params Type Detection for that pattern.
Handling Both Daily and Intraday Tables
When your pipeline needs to query against the most recent available data (intraday for today, daily for previous days):
```sql
-- Query daily tables for history, intraday for today
SELECT * FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260120' AND '20260126'

UNION ALL

SELECT * FROM `project.analytics_123456789.events_intraday_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE())
```

Remember that intraday tables lack traffic_source, user_ltv, and is_active_user. If your query references these fields, the UNION ALL will produce nulls from the intraday portion. Handle with COALESCE if needed, or document that these fields are only available for completed days.
Cost Control Practices
Avoid SELECT * on nested tables. GA4 events tables contain deeply nested structures. Selecting all columns forces BigQuery to scan the full nested structure even if you only need a few fields. Be explicit about what you need.
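For instance, naming only the leaf fields you need keeps BigQuery from scanning the heavy nested columns at all:

```sql
-- Scans only three columns; event_params, user_properties, and items
-- are never touched, so their bytes don't count against the query.
SELECT event_name, user_pseudo_id, device.category
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20260127'
```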
Dry run before executing large queries. In the BigQuery console, the bytes-processed estimate appears before you run. For queries spanning months of GA4 data, estimate the cost first (roughly bytes_processed / 1e12 × $5 under on-demand pricing) before committing.
Materialize frequent aggregations. If the same date range query runs multiple times daily (dashboard refreshes, BI tool queries), a scheduled query or incremental dbt model that pre-aggregates the data eliminates repeated scans.
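A minimal sketch of such a pre-aggregation as a scheduled query (the reporting dataset and table names are hypothetical):

```sql
-- One scan of the raw events per day instead of one per dashboard refresh
CREATE OR REPLACE TABLE `project.reporting.daily_event_counts` AS
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_date,
  event_name,
  COUNT(*) AS events
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
  AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY event_date, event_name
```

Dashboards then read the small aggregated table rather than re-scanning the raw export.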
Use APPROX_COUNT_DISTINCT for user counts. When exact precision isn’t required, APPROX_COUNT_DISTINCT(user_pseudo_id) is significantly cheaper than COUNT(DISTINCT user_pseudo_id) at scale. It also matches GA4’s own counting methodology more closely. See GA4 BigQuery Number Discrepancies for why exact counts and GA4 interface counts will never agree perfectly anyway.
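Side by side, the two counts look like this (same bytes scanned, but the approximate version is far cheaper in compute and memory at scale):

```sql
SELECT
  APPROX_COUNT_DISTINCT(user_pseudo_id) AS approx_users,  -- HyperLogLog++ estimate
  COUNT(DISTINCT user_pseudo_id) AS exact_users           -- exact, heavier shuffle
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260127'
```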
Partition your intermediate models. When materializing a base or intermediate GA4 model in dbt, partition by event_date and cluster by session_key or user_pseudo_id. Downstream queries that filter by date or user will scan only the relevant partitions instead of the full table.
```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    cluster_by=['session_key', 'event_name']
) }}
```

For the broader BigQuery cost optimization picture, see BigQuery Cost Model and BigQuery Partition Pruning Patterns.