This hub traces the path from a raw GA4 BigQuery export to a working set of dbt models that preserve event-level granularity while making session-level analysis trivial. It’s the entry point for anyone building GA4 dbt models for the first time.
The core idea: don’t aggregate to sessions immediately. Build a wide intermediate model that enriches every event with session context (conversion flags, timing metrics, attribution data) without losing event-level granularity. Aggregation happens only in the final mart layer, when you actually need it.
These notes were decomposed from Your First GA4 dbt Models: From Raw Events to Sessions.
Understanding the GA4 Schema
The GA4 BigQuery schema is structurally different from Universal Analytics. Understand it before writing any dbt models.
GA4 Event Data Structure — The event-centric model: one row per event, nested event_params arrays, and the shift from UA’s session-centric architecture. This is the prerequisite for everything else.
GA4 BigQuery Export Table Types — Daily tables vs intraday tables, their timing, and why you should use daily tables for production models.
GA4 BigQuery Timezone Handling — The three timezone contexts (event_timestamp in UTC, event_date in property timezone, _TABLE_SUFFIX in Pacific Time) and why mixing them silently breaks date-range queries.
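One way to see these timezone contexts side by side is to select them in a single query. The following is a sketch only: the project and dataset names and the 'Europe/Berlin' property timezone are placeholders, not values taken from the tutorial.

```sql
-- Hypothetical project, dataset, and property timezone; substitute your own.
select
    event_date,                                          -- STRING 'YYYYMMDD', property timezone
    timestamp_micros(event_timestamp) as event_ts_utc,   -- INT64 microseconds converted to a UTC TIMESTAMP
    date(timestamp_micros(event_timestamp)) as event_date_utc,
    date(timestamp_micros(event_timestamp), 'Europe/Berlin') as event_date_property_tz,
    _table_suffix as table_suffix                        -- which daily shard the row was read from
from `my-project.analytics_123456789.events_*`
where _table_suffix between '20240101' and '20240107'
limit 100
```

Rows where event_date_utc and event_date_property_tz disagree are the ones that tend to fall out of a date-range filter written against the wrong column.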
The Session Identity Problem
GA4 Session Key Construction — Why ga_session_id alone fails as a unique identifier (it’s a Unix timestamp, so multiple users can share the same value) and how to build the correct composite key from user_pseudo_id + ga_session_id.
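The collision is easy to reproduce with inline data. This minimal sketch uses made-up users and a made-up session timestamp, and it assumes ga_session_id has already been extracted from event_params into its own column; the '-' separator is an arbitrary choice.

```sql
-- Two users whose sessions started in the same second share one ga_session_id.
with events as (
    select * from unnest([
        struct('user_a' as user_pseudo_id, 1700000000 as ga_session_id, 'page_view' as event_name),
        struct('user_b', 1700000000, 'page_view'),
        struct('user_a', 1700000000, 'scroll')
    ])
),

keyed as (
    select
        *,
        -- Composite key: user_pseudo_id + ga_session_id
        concat(user_pseudo_id, '-', cast(ga_session_id as string)) as session_key
    from events
)

select
    count(distinct ga_session_id) as sessions_by_id_only,      -- 1: both users collapse into one session
    count(distinct session_key) as sessions_by_composite_key   -- 2: one session per user
from keyed
```

Note that CONCAT returns NULL when any argument is NULL, so events without a ga_session_id end up with a NULL session_key rather than a misleading partial key.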
The Design Philosophy
Event-Grain Sessionization — Why enriching events with session context beats building session-grain tables directly. The pattern that makes downstream analysis flexible: event-sequence funnels, time-between-events metrics, and experimental attribution models all require event-level data.
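As an illustration of what "enriching events with session context" can look like, the sketch below uses window functions over inline sample data. The column names, the sample events, and the choice of 'purchase' as the conversion event are assumptions for the example, not the tutorial's actual model.

```sql
with events as (
    select * from unnest([
        struct('user_a-1700000000' as session_key,
               timestamp '2024-01-01 10:00:00+00' as event_ts,
               'session_start' as event_name),
        struct('user_a-1700000000', timestamp '2024-01-01 10:00:05+00', 'page_view'),
        struct('user_a-1700000000', timestamp '2024-01-01 10:02:00+00', 'purchase'),
        struct('user_b-1700000000', timestamp '2024-01-01 10:00:00+00', 'session_start'),
        struct('user_b-1700000000', timestamp '2024-01-01 10:00:30+00', 'page_view')
    ])
)

select
    session_key,
    event_ts,
    event_name,
    -- Position of the event within its session
    row_number() over (partition by session_key order by event_ts) as event_number,
    -- Seconds elapsed since the first event of the session
    timestamp_diff(
        event_ts,
        min(event_ts) over (partition by session_key),
        second
    ) as seconds_since_session_start,
    -- Session-level flag repeated on every event row
    countif(event_name = 'purchase') over (partition by session_key) > 0 as session_has_purchase
from events
order by session_key, event_ts
```

Every input row survives, so a later mart can still aggregate to sessions with a simple GROUP BY on session_key, while funnel and timing analyses keep access to the individual events.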
Building the Models
GA4 Sharded-to-Partitioned Base Model — The base model that converts GA4’s date-sharded export into a properly partitioned dbt model. Covers the _TABLE_SUFFIX filtering pattern, insert_overwrite strategy, and the 3-day lookback for late-arriving events.
GA4 Parameter Extraction Macro — The reusable dbt macro for extracting event_params values via correlated subqueries. Why correlated subqueries over CROSS JOIN UNNEST, and the numeric variant for handling type ambiguity.
GA4 Events Sessionized Model — The intermediate model that adds session context to every event: landing page, traffic source, conversion flags, session duration, event sequencing. The workhorse of the entire project.
GA4 Traffic Source Fields — The four traffic source locations in GA4’s export, their scopes, and which to use for session attribution. Critical for the intermediate model’s attribution columns.
GA4 Acquisition Performance Mart — The mart model that aggregates sessionized events to daily x source/medium grain. Pre-calculated conversion rates and revenue metrics, ready for dashboards.
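To make the start of this chain concrete, here is a rough sketch of a base model combining the pieces listed above: insert_overwrite on a date partition, a _TABLE_SUFFIX filter with a 3-day lookback, and correlated subqueries for parameter extraction. The file path, the source name, and the output column names are hypothetical, and the source is assumed to be declared with the identifier events_*.

```sql
-- models/base/base_ga4__events.sql (hypothetical path and model name)
{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partition_by={'field': 'event_date_dt', 'data_type': 'date'}
    )
}}

select
    parse_date('%Y%m%d', event_date) as event_date_dt,    -- partition column
    timestamp_micros(event_timestamp) as event_ts,
    event_name,
    user_pseudo_id,
    -- Correlated subqueries return one value per event without fanning out rows.
    (select ep.value.int_value
     from unnest(event_params) as ep
     where ep.key = 'ga_session_id') as ga_session_id,
    (select ep.value.string_value
     from unnest(event_params) as ep
     where ep.key = 'page_location') as page_location
from {{ source('ga4', 'events') }}  -- assumed to resolve to the events_* wildcard table
where _table_suffix not like 'intraday%'  -- daily tables only
{% if is_incremental() %}
  -- Reprocess the last 3 days so late-arriving events replace their partitions.
  and _table_suffix >= format_date('%Y%m%d', date_sub(current_date(), interval 3 day))
{% endif %}
```

In a real project the two subqueries would normally be generated by the parameter extraction macro rather than written by hand; they are inlined here only to keep the sketch in one file.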
Common Pitfalls
The tutorial covers several traps that the individual notes address in depth:
- Using ga_session_id alone → GA4 Session Key Construction
- Timezone drift between event_timestamp and event_date → GA4 BigQuery Timezone Handling
- Querying intraday tables for historical analysis → GA4 BigQuery Export Table Types
- Aggregating too early → Event-Grain Sessionization
- Not filtering with _TABLE_SUFFIX → GA4 BigQuery Query Patterns
- Extracting the wrong value type from event_params → GA4 Parameter Extraction Macro
What Comes Next
This tutorial builds the foundation. For a production-ready project with testing, documentation, and channel grouping, see the GA4 dbt Project Template Hub. For the complete sessionization reference including custom session boundaries, see the GA4 Sessionization Hub.