Pre-unnesting GA4 parameters into a flat table reduces repeated query cost and improves BI tool compatibility. The pattern: unnest the parameters needed into a single materialized table, then point all downstream queries at the flat version. For ad-hoc exploration, querying the raw nested tables is sufficient; materialization pays off when multiple consumers query the same data repeatedly.
The Core Pattern
```sql
CREATE OR REPLACE TABLE `project.dataset.ga4_events_flat`
PARTITION BY event_date
CLUSTER BY event_name, session_id
AS (
  SELECT
    PARSE_DATE('%Y%m%d', event_date) AS event_date,
    TIMESTAMP_MICROS(event_timestamp) AS event_timestamp,
    event_name,
    user_pseudo_id,
    CONCAT(
      user_pseudo_id, '-',
      (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
    ) AS session_id,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS page_title,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS engagement_time_msec,
    traffic_source.source,
    traffic_source.medium,
    traffic_source.name AS campaign,
    device.category AS device_category,
    device.operating_system,
    device.web_info.browser,
    geo.country,
    geo.city,
    ecommerce.transaction_id,
    ecommerce.purchase_revenue
  FROM `project.dataset.events_*`
  WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY))
);
```

This creates a partitioned, clustered table with the most commonly queried parameters extracted as flat columns. Downstream queries against `ga4_events_flat` are standard SQL — no UNNEST, no correlated subqueries, no nested field access.
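For example, a dashboard query that would otherwise need correlated subqueries over `event_params` becomes plain SQL (a sketch using the table and columns defined above):

```sql
-- Top landing pages by sessions, last 7 days: no UNNEST required.
-- The event_date filter prunes partitions; the event_name filter
-- benefits from clustering.
SELECT
  page_location,
  COUNT(DISTINCT session_id) AS sessions,
  SUM(engagement_time_msec) / 1000 AS engagement_seconds
FROM `project.dataset.ga4_events_flat`
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
  AND event_name = 'page_view'
GROUP BY page_location
ORDER BY sessions DESC
LIMIT 20;
```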
Why This Works
The cost savings compound because:
- The UNNEST happens once. Ten dashboard panels hitting the same date range don’t each pay the nested query cost. They query flat columns.
- Partitioning enables pruning. Queries filtered by `event_date` scan only the relevant partitions, same as native partitioned tables.
- Clustering on `event_name` and `session_id` accelerates the most common query patterns: filtering by event type and grouping by session.
- BI tools work better with flat schemas. Looker Studio, Metabase, and most BI tools struggle with nested/repeated fields. A flat table eliminates the need for custom SQL in every visualization.
The tradeoff is storage. You’re duplicating data that already exists in the raw tables. For 90 days of a medium-traffic site, expect the flat table to be 30–50% of the size of the raw tables (you’re extracting a subset of parameters, not all of them). At BigQuery’s storage pricing, this is usually negligible compared to the query cost savings.
Scheduling the Refresh
The CREATE OR REPLACE approach above rebuilds the entire 90-day window every run. For a daily scheduled query, this works but isn’t optimal — you’re reprocessing 89 days of unchanged data.
Better: use INSERT with a date filter for daily appends, plus a periodic full rebuild to catch late-arriving data:
```sql
-- Daily append (run daily via scheduled query or Cloud Scheduler)
INSERT INTO `project.dataset.ga4_events_flat`
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_date,
  TIMESTAMP_MICROS(event_timestamp) AS event_timestamp,
  -- ... same columns as above ...
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
```

The daily append processes only yesterday’s data. Run a weekly full rebuild (DELETE + INSERT for the last 90 days) to handle GA4’s 72-hour late data processing window and any schema corrections.
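The weekly rebuild can be sketched as a DELETE followed by a re-insert of the same window (column list elided here, matching the daily append):

```sql
-- Weekly full rebuild: drop the trailing 90-day window, then re-insert it.
-- Catches late-arriving events and any schema corrections.
DELETE FROM `project.dataset.ga4_events_flat`
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY);

INSERT INTO `project.dataset.ga4_events_flat`
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_date,
  TIMESTAMP_MICROS(event_timestamp) AS event_timestamp,
  -- ... same columns as the daily append ...
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY));
```

Running the two statements in one scheduled query (or a multi-statement transaction) avoids a window where the table is partially empty.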
The dbt Version
This exact pattern is what a dbt incremental model formalizes. The base layer in a dbt project does the same thing with better guardrails:
```sql
{{ config(
    materialized='incremental',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    incremental_strategy='insert_overwrite',
    cluster_by=['event_name']
) }}
```

The `insert_overwrite` strategy replaces entire partitions, which handles late-arriving data cleanly. The lookback window (typically 3 days) re-processes recent partitions to catch data that arrived after the initial export. This is more robust than the manual CREATE OR REPLACE approach and integrates with the rest of your dbt DAG.
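Filling in the model body under that config might look like this (a sketch; the `source()` name and the 3-day lookback are assumptions, and the column list is abbreviated):

```sql
-- models/base/ga4_events_flat.sql (body below the config block)
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_date,
  TIMESTAMP_MICROS(event_timestamp) AS event_timestamp,
  event_name,
  user_pseudo_id,
  (SELECT value.string_value FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location
  -- ... remaining columns as in the scheduled-query version ...
FROM {{ source('ga4', 'events') }}  -- assumed source name

{% if is_incremental() %}
  -- Lookback window: re-process the last 3 days of partitions;
  -- insert_overwrite replaces them wholesale, so late-arriving
  -- events are picked up without duplicates.
  WHERE event_date >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
{% endif %}
```

On the first run, `is_incremental()` is false and the model builds the full history; subsequent runs touch only the lookback partitions.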
If you’re already running dbt, use an incremental model. If you’re not running dbt, the scheduled query approach works. The key insight is the same either way: unnest once, query flat.
What to Include in the Flat Table
Be selective. Include parameters you query frequently; exclude parameters you rarely need. The goal isn’t to flatten everything — it’s to flatten the 80% of parameters that drive 95% of your queries.
Common candidates:
- `page_location`, `page_title`, `page_referrer` (content analysis)
- `ga_session_id`, `session_engaged`, `engagement_time_msec` (session analysis)
- `entrances` (landing page identification)
- Device, geo, and traffic source fields (segmentation)
- `ecommerce.transaction_id`, `ecommerce.purchase_revenue` (e-commerce)
Keep the original event_params and items arrays in the flat table if you need occasional access to parameters you didn’t extract. This adds storage cost but prevents the need to join back to the raw tables for edge-case queries.
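If you keep `event_params` in the flat table, an occasional un-extracted parameter is still reachable with a one-off correlated subquery (the parameter and event names here are illustrative):

```sql
-- Edge-case query: pull a parameter that wasn't flattened,
-- without joining back to the raw export tables.
SELECT
  event_date,
  page_location,
  (SELECT value.string_value FROM UNNEST(event_params)
   WHERE key = 'file_extension') AS file_extension  -- illustrative parameter
FROM `project.dataset.ga4_events_flat`
WHERE event_name = 'file_download'
  AND event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY);
```

You pay the UNNEST cost only for these edge cases, not in every dashboard query.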
If you’re in a dbt project, this is the base model’s job. The intermediate and mart layers then build on the flat base without ever touching nested arrays again.
When NOT to Materialize
For one-off analysis or exploration, querying the raw nested tables directly is fine. The materialization overhead (storage, scheduling, maintenance) only pays off when:
- Multiple consumers query the same data repeatedly
- BI tools connect directly to the table
- Query costs have become a line item worth optimizing
- You’re building a multi-layer pipeline where upstream flattening benefits every downstream model
If you’re the only analyst and you run ad-hoc queries a few times a week, the raw tables with good _TABLE_SUFFIX filtering are sufficient. Materialize when repeated cost or BI tool compatibility forces the issue.
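For that ad-hoc case, disciplined `_TABLE_SUFFIX` filtering on the raw wildcard tables already limits the bytes scanned (a sketch):

```sql
-- Ad-hoc query against the raw export: restrict the wildcard scan
-- to the dates you need before doing any UNNEST work.
SELECT
  event_name,
  COUNT(*) AS events
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
  AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
GROUP BY event_name
ORDER BY events DESC;
```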