GA4 Sessionization Hub

Hub note connecting all concepts involved in building session tables from GA4 BigQuery event data.

Planted
ga4, bigquery, dbt, data modeling, analytics

Sessionization is the process of adding session context to GA4’s raw event data. The Universal Analytics export was session-centric and provided pre-aggregated session tables; the GA4 export is a stream of events with no session-level structure. Any session definition must therefore be constructed in the transformation layer.

This hub connects the concepts involved in building a sessionized events table from GA4 BigQuery exports.

Core Concepts

Event-Grain Sessionization — The design philosophy: enrich events with session context instead of aggregating events into sessions. Preserves event-level detail while making session analysis trivial. The central pattern for the sessionized events model.
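
To make the event-grain idea concrete, here is a minimal sketch that runs the pattern against an in-memory SQLite database (it needs a SQLite build with window function support, 3.25 or later). The table layout, column names, and the session_key values are illustrative assumptions, not the article's actual model. Each event keeps its own row and gains session-level columns through window aggregates.

```python
import sqlite3

# Hypothetical flattened events; column names and keys are assumptions.
events = [
    ("u1-100", 1, "session_start"),
    ("u1-100", 2, "page_view"),
    ("u1-100", 3, "purchase"),
    ("u2-100", 1, "page_view"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (session_key TEXT, event_ts INTEGER, event_name TEXT)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)

# Window aggregates attach session context without collapsing rows.
sessionized = conn.execute(
    """
    SELECT
        session_key,
        event_ts,
        event_name,
        COUNT(*) OVER (PARTITION BY session_key) AS session_event_count,
        MAX(CASE WHEN event_name = 'purchase' THEN 1 ELSE 0 END)
            OVER (PARTITION BY session_key) AS session_has_purchase
    FROM events
    ORDER BY session_key, event_ts
    """
).fetchall()

# The grain is unchanged: one output row per input event.
print(len(sessionized) == len(events))
```

Because the row count is preserved, event-level questions (which page, which step) and session-level questions (did this session convert) can both be answered from the same table.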

GA4 Session Key Construction — Why ga_session_id alone fails as a session identifier and how to build the correct composite key from user_pseudo_id + ga_session_id. Edge cases: consent rejection, Measurement Protocol, cross-device limitations.
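
A small sketch of the composite key, written in Python for illustration. ga_session_id is the session's start time in Unix seconds, so two users who start a session in the same second share the same value. The function name, the "-" separator, and the choice to return None when either identifier is missing are assumptions made here; the source only states that both fields are needed.

```python
from typing import Optional


def build_session_key(
    user_pseudo_id: Optional[str], ga_session_id: Optional[int]
) -> Optional[str]:
    """Combine both identifiers; return None when either one is missing."""
    if user_pseudo_id is None or ga_session_id is None:
        # Assumption: rows without both identifiers get no session key.
        return None
    return f"{user_pseudo_id}-{ga_session_id}"


# Two different users whose sessions started in the same second.
key_a = build_session_key("1234.5678", 1700000000)
key_b = build_session_key("9999.0001", 1700000000)
no_key = build_session_key(None, 1700000000)

print(key_a, key_b, no_key)
```

With ga_session_id alone, the two sessions above would be merged into one; the composite key keeps them apart.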

GA4 Event Ordering with Batch Fields — Deterministic event sequencing using batch_event_index, batch_ordering_id, and batch_page_id. Resolves timestamp ties so funnel analysis and path analysis produce reliable results.
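
The sketch below shows one way the batch fields could break a timestamp tie. The precedence used here (timestamp, then page, then network request, then position within the batch) is an assumption for illustration; the source names the fields but not their order. It also assumes all three fields are populated.

```python
# Three events sharing one event_timestamp; values are made up.
events = [
    {"event_name": "add_to_cart", "event_timestamp": 1000,
     "batch_page_id": 1, "batch_ordering_id": 2, "batch_event_index": 1},
    {"event_name": "page_view", "event_timestamp": 1000,
     "batch_page_id": 1, "batch_ordering_id": 1, "batch_event_index": 1},
    {"event_name": "view_item", "event_timestamp": 1000,
     "batch_page_id": 1, "batch_ordering_id": 1, "batch_event_index": 2},
]


def ordering_key(event):
    """Sort key: timestamp first, then the batch fields as tie-breakers."""
    return (
        event["event_timestamp"],
        event["batch_page_id"],
        event["batch_ordering_id"],
        event["batch_event_index"],
    )


ordered_names = [e["event_name"] for e in sorted(events, key=ordering_key)]
print(ordered_names)
```

Sorting on event_timestamp alone would leave the order of these three rows undefined, which is what makes funnel and path results unstable between runs.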

GA4 Traffic Source Fields — The four traffic source locations in the export (traffic_source, collected_traffic_source, session_traffic_source_last_click, event_params legacy keys), their scopes, and when to use each. Includes the gclid correction pattern for Google Ads.
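
As a rough illustration of a gclid correction, the sketch below overrides source and medium when a Google Ads click ID is present. The field names follow collected_traffic_source in the export (gclid, manual_source, manual_medium), but the rule itself (force google / cpc) is a commonly used convention assumed here, not necessarily the exact pattern the linked note describes.

```python
def correct_traffic_source(collected):
    """Return (source, medium), overriding to google/cpc when gclid is set."""
    if collected.get("gclid"):
        # Assumption: any gclid means the click came from Google Ads.
        return ("google", "cpc")
    return (collected.get("manual_source"), collected.get("manual_medium"))


ads_click = correct_traffic_source(
    {"gclid": "abc123", "manual_source": None, "manual_medium": None}
)
email_click = correct_traffic_source(
    {"gclid": None, "manual_source": "newsletter", "manual_medium": "email"}
)

print(ads_click, email_click)
```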

Supporting Concepts

GA4 Event Data Structure — The underlying event model: nested schemas, parameter extraction, the shift from UA’s session-centric to GA4’s event-centric architecture. Prerequisite knowledge for sessionization work.
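
Parameter extraction is easier to picture with a concrete record. The sketch below models one exported event as a Python dict, with event_params as a list of key/value records whose value carries typed slots such as string_value and int_value. The helper name get_param is hypothetical; in BigQuery the same lookup is typically written as a subquery over UNNEST(event_params).

```python
def get_param(event, key, value_type="string_value"):
    """Return the typed value for a parameter key, or None if absent."""
    for param in event.get("event_params", []):
        if param["key"] == key:
            return param["value"].get(value_type)
    return None


# A made-up event shaped like a row of the export.
event = {
    "event_name": "page_view",
    "user_pseudo_id": "1234.5678",
    "event_params": [
        {"key": "ga_session_id", "value": {"int_value": 1700000000}},
        {"key": "page_location", "value": {"string_value": "https://example.com/"}},
    ],
}

session_id = get_param(event, "ga_session_id", "int_value")
page = get_param(event, "page_location")
missing = get_param(event, "ga_session_number", "int_value")

print(session_id, page, missing)
```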

Window Function Patterns for Analytics SQL — The SQL toolkit for sessionization: FIRST_VALUE for attribution propagation, ROW_NUMBER for event sequencing, named windows for performance. These patterns are the implementation mechanism for event-grain sessionization.
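
The three patterns named above can be tried locally. This sketch runs them in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later); the table and column names are illustrative assumptions. A single named window drives both ROW_NUMBER for sequencing and FIRST_VALUE for propagating the first event's source to the rest of the session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events "
    "(session_key TEXT, event_ts INTEGER, event_name TEXT, source TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("u1-100", 10, "session_start", "newsletter"),
        ("u1-100", 20, "page_view", None),
        ("u1-100", 30, "purchase", None),
        ("u2-200", 5, "page_view", "google"),
    ],
)

# One named window, reused by both functions.
rows = conn.execute(
    """
    SELECT
        session_key,
        event_name,
        ROW_NUMBER() OVER session_window AS event_number,
        FIRST_VALUE(source) OVER session_window AS session_source
    FROM events
    WINDOW session_window AS (PARTITION BY session_key ORDER BY event_ts)
    ORDER BY session_key, event_ts
    """
).fetchall()

for row in rows:
    print(row)
```

Defining the window once keeps the partition and ordering logic in a single place, so the sequencing and attribution columns cannot drift apart.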

Custom Sessionization Patterns — Building session definitions from scratch using LAG + running SUM when GA4’s 30-minute timeout doesn’t match your business needs. An alternative to using ga_session_id.
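
A minimal sketch of the LAG + running SUM approach, run in SQLite via Python (SQLite 3.25 or later). The 15-minute timeout, the user_id column, and the sample timestamps are assumptions chosen for the example. LAG finds the gap to the previous event, a flag marks gaps above the timeout, and a running SUM of the flags yields a session number per user.

```python
import sqlite3

TIMEOUT_SECONDS = 15 * 60  # hypothetical custom inactivity timeout

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event_ts INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("u1", 0), ("u1", 600), ("u1", 2000), ("u1", 2300), ("u2", 100)],
)

query = """
    WITH gaps AS (
        SELECT
            user_id,
            event_ts,
            LAG(event_ts) OVER (PARTITION BY user_id ORDER BY event_ts) AS prev_ts
        FROM events
    ),
    flagged AS (
        SELECT
            user_id,
            event_ts,
            -- A new session starts at the first event or after a long gap.
            CASE
                WHEN prev_ts IS NULL OR event_ts - prev_ts > :timeout THEN 1
                ELSE 0
            END AS is_new_session
        FROM gaps
    )
    SELECT
        user_id,
        event_ts,
        SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY event_ts)
            AS session_number
    FROM flagged
    ORDER BY user_id, event_ts
"""
sessions = conn.execute(query, {"timeout": TIMEOUT_SECONDS}).fetchall()

for row in sessions:
    print(row)
```

Changing TIMEOUT_SECONDS is the only edit needed to test a different session definition against the same events.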

Implementation Patterns

Incremental Models in dbt — The sessionized events model is typically an incremental dbt model using insert_overwrite with date partitioning. Lookback windows handle late-arriving data and ensure window functions recalculate correctly for sessions spanning partition boundaries.
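
Why the lookback matters for window functions can be shown with a toy session that crosses midnight. The sketch below is a plain-Python stand-in for ROW_NUMBER in an incremental run, not dbt code; the function name, dates, and field names are assumptions. It numbers events per session using only the partitions inside the lookback window.

```python
from datetime import date, timedelta
from itertools import groupby

# One session spanning two date partitions (made-up data).
events = [
    {"session_key": "u1-100", "event_date": date(2024, 1, 1), "event_ts": 1,
     "event_name": "session_start"},
    {"session_key": "u1-100", "event_date": date(2024, 1, 1), "event_ts": 2,
     "event_name": "page_view"},
    {"session_key": "u1-100", "event_date": date(2024, 1, 2), "event_ts": 3,
     "event_name": "purchase"},
]


def number_events(events, run_date, lookback_days):
    """Number events per session using only partitions in the lookback window."""
    start = run_date - timedelta(days=lookback_days)
    in_scope = sorted(
        (e for e in events if start <= e["event_date"] <= run_date),
        key=lambda e: (e["session_key"], e["event_ts"]),
    )
    numbered = []
    for _, group in groupby(in_scope, key=lambda e: e["session_key"]):
        for n, e in enumerate(group, start=1):
            numbered.append((e["event_name"], n))
    return numbered


without_lookback = number_events(events, date(2024, 1, 2), lookback_days=0)
with_lookback = number_events(events, date(2024, 1, 2), lookback_days=1)

print(without_lookback)
print(with_lookback)
```

Without the lookback, the purchase is numbered as the first event of its session because the earlier partition was never read; with one day of lookback it is correctly the third.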

dbt Intermediate Layer Patterns — The sessionized events model sits in the intermediate layer: it preserves event grain while adding session context. Downstream session marts aggregate this for dashboards.

dbt Mart Layer Patterns — Session-grain marts derive from the sessionized events table with a simple GROUP BY. One source of truth (the enriched event table), multiple output shapes.
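
The derivation can be sketched as a single GROUP BY over the enriched event table. The example runs in SQLite through Python; the table name sessionized_events, its columns, and the chosen metrics are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessionized_events "
    "(session_key TEXT, event_ts INTEGER, event_name TEXT, session_source TEXT)"
)
conn.executemany(
    "INSERT INTO sessionized_events VALUES (?, ?, ?, ?)",
    [
        ("u1-100", 10, "session_start", "newsletter"),
        ("u1-100", 20, "page_view", "newsletter"),
        ("u1-100", 30, "purchase", "newsletter"),
        ("u2-200", 5, "page_view", "google"),
    ],
)

# Session context already sits on every row, so the mart only aggregates.
session_mart = conn.execute(
    """
    SELECT
        session_key,
        session_source,
        MIN(event_ts) AS session_start_ts,
        COUNT(*) AS event_count,
        SUM(CASE WHEN event_name = 'page_view' THEN 1 ELSE 0 END) AS page_views
    FROM sessionized_events
    GROUP BY session_key, session_source
    ORDER BY session_key
    """
).fetchall()

for row in session_mart:
    print(row)
```

Other output shapes, such as a daily or per-source summary, would be further GROUP BY queries over the same table with no extra session logic.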

BigQuery Partition Pruning Patterns — Partition by event_date and cluster by session_key for optimal query and window function performance. Always filter on _TABLE_SUFFIX in base models to avoid full-history scans.
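
As an illustration of the _TABLE_SUFFIX filter, the helper below builds the predicate for a wildcard query over the date-sharded export tables (events_YYYYMMDD). It is a Python sketch with a hypothetical function name; in a dbt project the same bounds would more likely be produced with Jinja.

```python
from datetime import date, timedelta


def table_suffix_filter(run_date: date, lookback_days: int) -> str:
    """Build a WHERE predicate that limits a wildcard scan to recent shards."""
    start = run_date - timedelta(days=lookback_days)
    return (
        f"_TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{run_date:%Y%m%d}'"
    )


predicate = table_suffix_filter(date(2024, 3, 1), lookback_days=3)
print(predicate)
```

The resulting string would sit in the WHERE clause of the base model, so only the shards in that date range are read.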

Known Limitations

GA4 BigQuery Number Discrepancies — Why BigQuery numbers differ from the GA4 interface by 1-5%: HyperLogLog++ probabilistic counting, Consent Mode behavioral modeling, Google Signals cross-device deduplication, and data processing delays.

Article Source

These notes were decomposed from Building Session Tables from GA4 Event Data, which provides the complete end-to-end implementation including the full SQL model, dbt configuration, schema tests, and the session mart derivation pattern.