GA4 First dbt Models Tutorial

Hub note for building your first GA4 dbt models — from understanding the raw event schema through base, intermediate, and mart layers.

Tags: ga4, dbt, bigquery, data engineering, data modeling

This hub traces the path from a raw GA4 BigQuery export to a working set of dbt models that preserve event-level granularity while making session-level analysis trivial. It’s the entry point for anyone building GA4 dbt models for the first time.

The core idea: don’t aggregate to sessions immediately. Build a wide intermediate model that enriches every event with session context (conversion flags, timing metrics, attribution data) without losing event-level granularity. Aggregation happens only in the final mart layer, when you actually need it.

These notes were decomposed from the tutorial “Your First GA4 dbt Models: From Raw Events to Sessions”.

Understanding the GA4 Schema

The GA4 BigQuery schema is structurally different from Universal Analytics. Understanding it is a prerequisite for writing any dbt models.

GA4 Event Data Structure — The event-centric model: one row per event, nested event_params arrays, and the shift from UA’s session-centric architecture. This is the prerequisite for everything else.

GA4 BigQuery Export Table Types — Daily tables vs intraday tables, their timing, and why you should use daily tables for production models.

GA4 BigQuery Timezone Handling — The three timezone contexts (event_timestamp in UTC, event_date in property timezone, _TABLE_SUFFIX in Pacific Time) and why mixing them silently breaks date-range queries.
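To make the timezone mismatch concrete, here is a minimal Python sketch (not dbt SQL) that derives a calendar date from the same event_timestamp in two ways: once in UTC and once in the property timezone. The function name and the example timezone are assumptions for illustration; only the fact that event_timestamp is microseconds since the Unix epoch in UTC comes from the export itself.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def event_dates(event_timestamp_micros: int, property_tz: str) -> tuple[str, str]:
    """Return (utc_date, property_date) as YYYYMMDD strings.

    event_timestamp_micros: GA4 event_timestamp (microseconds since epoch, UTC).
    property_tz: IANA timezone name of the GA4 property (hypothetical input).
    """
    utc_dt = EPOCH + timedelta(microseconds=event_timestamp_micros)
    local_dt = utc_dt.astimezone(ZoneInfo(property_tz))
    return utc_dt.strftime("%Y%m%d"), local_dt.strftime("%Y%m%d")


# An event at 23:30 UTC on 15 January falls on 16 January in Paris (UTC+1 in winter).
late_evening_utc = int(
    datetime(2024, 1, 15, 23, 30, tzinfo=timezone.utc).timestamp()
) * 1_000_000
utc_date, property_date = event_dates(late_evening_utc, "Europe/Paris")
```

A date filter written against one of these values and a partition or shard filter written against another will disagree for events near midnight, which is how date-range queries break without raising an error.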

The Session Identity Problem

GA4 Session Key Construction — Why ga_session_id alone fails as a unique identifier (it’s a Unix timestamp, so multiple users can share the same value) and how to build the correct composite key from user_pseudo_id + ga_session_id.
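The collision is easy to see with two rows. The sketch below is plain Python rather than the SQL a dbt model would use, and the separator and function name are assumptions; the point is only that the composite of user_pseudo_id and ga_session_id stays unique where ga_session_id alone does not.

```python
from typing import Optional


def session_key(user_pseudo_id: Optional[str], ga_session_id: Optional[int]) -> Optional[str]:
    """Build a composite session key; None if either part is missing."""
    if user_pseudo_id is None or ga_session_id is None:
        return None
    # The "." separator is an arbitrary choice for this sketch.
    return f"{user_pseudo_id}.{ga_session_id}"


# Two different users whose sessions started in the same second.
rows = [
    {"user_pseudo_id": "111.222", "ga_session_id": 1705361400},
    {"user_pseudo_id": "333.444", "ga_session_id": 1705361400},
]

distinct_session_ids = {r["ga_session_id"] for r in rows}
distinct_session_keys = {session_key(r["user_pseudo_id"], r["ga_session_id"]) for r in rows}
```

Counting sessions by ga_session_id here would report one session; counting by the composite key reports two.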

The Design Philosophy

Event-Grain Sessionization — Why enriching events with session context beats building session-grain tables directly. The pattern that makes downstream analysis flexible: event-sequence funnels, time-between-events metrics, and experimental attribution models all require event-level data.

Building the Models

GA4 Sharded-to-Partitioned Base Model — The base model that converts GA4’s date-sharded export into a properly partitioned dbt model. Covers the _TABLE_SUFFIX filtering pattern, insert_overwrite strategy, and the 3-day lookback for late-arriving events.
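As a rough illustration of the lookback window, the Python sketch below lists the date suffixes an incremental run would reprocess. The function name is hypothetical, and whether the window ends on the run date or the day before is an assumption of this sketch, not something the note specifies.

```python
from datetime import date, timedelta


def lookback_suffixes(run_date: date, lookback_days: int = 3) -> list[str]:
    """Return YYYYMMDD suffixes from run_date - lookback_days through run_date."""
    start = run_date - timedelta(days=lookback_days)
    return [
        (start + timedelta(days=offset)).strftime("%Y%m%d")
        for offset in range(lookback_days + 1)
    ]


# A run on 2 March 2024 reprocesses the shards for 28 Feb through 2 Mar.
suffixes = lookback_suffixes(date(2024, 3, 2))
```

In the dbt model the same bounds would typically appear as a range filter on _TABLE_SUFFIX, with insert_overwrite replacing exactly those date partitions so that late-arriving events are picked up without duplicating rows.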

GA4 Parameter Extraction Macro — The reusable dbt macro for extracting event_params values via correlated subqueries. Why correlated subqueries over CROSS JOIN UNNEST, and the numeric variant for handling type ambiguity.
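To show what the macro has to do, the following Python sketch mimics the lookup on a single event row. It is an analogy, not the macro itself: the function names are hypothetical, and the choice to coalesce int_value, float_value, and double_value in the numeric variant is an assumption about how type ambiguity might be handled.

```python
from typing import Any, Optional

# One event row, shaped like the export: event_params is a list of key/value structs.
sample_event = {
    "event_name": "purchase",
    "event_params": [
        {"key": "page_location", "value": {"string_value": "https://example.com/checkout"}},
        {"key": "ga_session_id", "value": {"int_value": 1705361400}},
        {"key": "value", "value": {"int_value": None, "double_value": 12.5}},
    ],
}


def get_param(event_params: list[dict], key: str, field: str = "string_value") -> Optional[Any]:
    """Return one typed value for `key`, or None if the key is absent."""
    for param in event_params:
        if param["key"] == key:
            return param["value"].get(field)
    return None


def get_numeric_param(event_params: list[dict], key: str) -> Optional[float]:
    """Numeric variant: take the first populated numeric field (assumed behaviour)."""
    for param in event_params:
        if param["key"] == key:
            for field in ("int_value", "float_value", "double_value"):
                value = param["value"].get(field)
                if value is not None:
                    return value
    return None


page = get_param(sample_event["event_params"], "page_location")
revenue = get_numeric_param(sample_event["event_params"], "value")
```

Each call returns exactly one value per event and leaves the row count unchanged, which is the property the correlated-subquery form preserves in SQL.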

GA4 Events Sessionized Model — The intermediate model that adds session context to every event: landing page, traffic source, conversion flags, session duration, event sequencing. The workhorse of the entire project.
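The enrichment can be sketched in Python to show that the output keeps one row per event. In a dbt model this would normally be done with window functions partitioned by the session key; the column names and the default conversion event below are assumptions chosen for illustration.

```python
from collections import defaultdict


def sessionize(events: list[dict], conversion_event: str = "purchase") -> list[dict]:
    """Attach session-level context to every event without dropping any rows."""
    by_session: dict[tuple, list[dict]] = defaultdict(list)
    for event in events:
        by_session[(event["user_pseudo_id"], event["ga_session_id"])].append(event)

    enriched = []
    for (user_id, session_id), rows in by_session.items():
        rows.sort(key=lambda e: e["event_timestamp"])  # order events within the session
        start = rows[0]["event_timestamp"]
        end = rows[-1]["event_timestamp"]
        # Landing page: first non-empty page_location in the session.
        landing = next((e["page_location"] for e in rows if e.get("page_location")), None)
        converted = any(e["event_name"] == conversion_event for e in rows)
        for sequence, event in enumerate(rows, start=1):
            enriched.append({
                **event,
                "session_key": f"{user_id}.{session_id}",
                "event_sequence": sequence,
                "landing_page": landing,
                "session_duration_seconds": (end - start) / 1_000_000,  # timestamps are in microseconds
                "session_has_conversion": converted,
            })
    return enriched
```

Because every input event comes back with extra columns instead of being collapsed, funnels and time-between-events metrics can still be computed downstream.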

GA4 Traffic Source Fields — The four traffic source locations in GA4’s export, their scopes, and which to use for session attribution. Critical for the intermediate model’s attribution columns.

GA4 Acquisition Performance Mart — The mart model that aggregates sessionized events to daily x source/medium grain. Pre-calculated conversion rates and revenue metrics, ready for dashboards.
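As a sketch of the final aggregation step, the Python below rolls sessionized events up to one row per date, source, and medium. The input column names (session_key, session_source, session_medium, session_has_conversion, purchase_revenue) are hypothetical stand-ins for whatever the intermediate model exposes; the actual mart would be a GROUP BY in SQL.

```python
from collections import defaultdict


def acquisition_mart(events: list[dict]) -> list[dict]:
    """Aggregate event-grain rows to daily x source/medium grain."""
    groups = defaultdict(lambda: {"sessions": set(), "converting": set(), "revenue": 0.0})
    for event in events:
        group = groups[(event["event_date"], event["session_source"], event["session_medium"])]
        group["sessions"].add(event["session_key"])  # distinct sessions, not event counts
        if event["session_has_conversion"]:
            group["converting"].add(event["session_key"])
        group["revenue"] += event.get("purchase_revenue") or 0.0

    rows = []
    for (event_date, source, medium), group in sorted(groups.items()):
        session_count = len(group["sessions"])  # at least 1: a group only exists once it has an event
        rows.append({
            "event_date": event_date,
            "source": source,
            "medium": medium,
            "sessions": session_count,
            "converting_sessions": len(group["converting"]),
            "conversion_rate": len(group["converting"]) / session_count,
            "revenue": group["revenue"],
        })
    return rows
```

Sessions are counted as distinct session keys, so a session with many events still contributes once to the denominator of the conversion rate.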

Common Pitfalls

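The tutorial covers several traps that the individual notes address in depth, including mixing the three timezone contexts in date-range queries, treating ga_session_id alone as a unique session key, and aggregating to sessions before the mart layer.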

What Comes Next

This tutorial builds the foundation. For a production-ready project with testing, documentation, and channel grouping, see the GA4 dbt Project Template Hub. For the complete sessionization reference including custom session boundaries, see the GA4 Sessionization Hub.