Sessionization is the process of adding session context to GA4’s raw event data. Universal Analytics exported pre-aggregated session tables (one row per session, with hits nested inside). GA4 exports one row per event, and session information survives only as event parameters such as ga_session_id. The session definition must therefore be constructed in the transformation layer.
This hub connects the concepts involved in building a sessionized events table from GA4 BigQuery exports.
Core Concepts
Event-Grain Sessionization — The design philosophy: enrich events with session context instead of aggregating events into sessions. This preserves event-level detail while reducing most session analysis to a GROUP BY on the session key. It is the central pattern for the sessionized events model.
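Here is a minimal Python sketch of the enrich-not-aggregate idea. It assumes events are already flat dicts carrying a precomputed session_key and an event_timestamp, which is a hypothetical layout and not the export schema. Every input event comes back as exactly one output row with session-level columns attached. In BigQuery, the window functions COUNT(*) OVER (PARTITION BY session_key) and MIN(event_timestamp) OVER (PARTITION BY session_key) do the same thing.

```python
from collections import defaultdict


def enrich_with_session_context(events):
    """Attach session-level columns to every event without collapsing rows.

    Roughly equivalent to adding, in BigQuery:
      MIN(event_timestamp) OVER (PARTITION BY session_key) AS session_start_ts,
      COUNT(*) OVER (PARTITION BY session_key)             AS session_event_count
    """
    session_start = {}
    session_event_count = defaultdict(int)

    # Pass 1: compute one value per session.
    for e in events:
        key, ts = e["session_key"], e["event_timestamp"]
        if key not in session_start or ts < session_start[key]:
            session_start[key] = ts
        session_event_count[key] += 1

    # Pass 2: copy the session values back onto each event (row count unchanged).
    return [
        {
            **e,
            "session_start_ts": session_start[e["session_key"]],
            "session_event_count": session_event_count[e["session_key"]],
        }
        for e in events
    ]


events = [
    {"session_key": "u1-100", "event_name": "session_start", "event_timestamp": 100},
    {"session_key": "u1-100", "event_name": "page_view", "event_timestamp": 105},
    {"session_key": "u2-100", "event_name": "page_view", "event_timestamp": 101},
]
enriched = enrich_with_session_context(events)
print(len(enriched))  # 3: one output row per input event
```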
GA4 Session Key Construction — Why ga_session_id alone fails as a session identifier: it is the session’s start timestamp in seconds, so it is unique only within one user and can collide across users. The fix is a composite key built from user_pseudo_id + ga_session_id. Edge cases: consent rejection, Measurement Protocol, cross-device limitations.
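Here is a sketch of the composite key in Python. It assumes the two inputs have already been extracted from the event row. The None policy and the "-" separator are illustrative choices, not a GA4 convention. In BigQuery the same idea can be written as CONCAT(user_pseudo_id, '-', CAST(ga_session_id AS STRING)).

```python
from typing import Optional


def build_session_key(user_pseudo_id: Optional[str], ga_session_id: Optional[int]) -> Optional[str]:
    """Composite session key: user_pseudo_id + ga_session_id.

    ga_session_id alone is a session start time in seconds, so two users who
    start a session in the same second share it; prefixing the user id
    separates them.
    """
    # Illustrative policy (an assumption): if either part is missing, emit no key
    # rather than one that could merge unrelated events. The consent and
    # Measurement Protocol edge cases may produce such rows.
    if user_pseudo_id is None or ga_session_id is None:
        return None
    return f"{user_pseudo_id}-{ga_session_id}"


print(build_session_key("1234.5678", 1700000000))  # 1234.5678-1700000000
print(build_session_key("9999.0001", 1700000000))  # 9999.0001-1700000000
```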
GA4 Event Ordering with Batch Fields — Deterministic event sequencing using batch_event_index, batch_ordering_id, and batch_page_id. Resolves timestamp ties so funnel analysis and path analysis produce reliable results.
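Here is a sketch of deterministic ordering in Python. The tie-break order used below is event_timestamp, then batch_page_id, then batch_ordering_id, then batch_event_index. That order is an assumption, so check it against Google’s export schema notes before relying on it. Missing batch fields sort first, which mirrors BigQuery’s default of NULLS FIRST for an ascending ORDER BY.

```python
def _nulls_first(value):
    # (False, None) sorts before (True, x), mirroring NULLS FIRST.
    return (value is not None, value)


def event_sort_key(event):
    """Assumed deterministic sort key for events within a session."""
    return (
        event["event_timestamp"],
        _nulls_first(event.get("batch_page_id")),
        _nulls_first(event.get("batch_ordering_id")),
        _nulls_first(event.get("batch_event_index")),
    )


# Three events sharing one timestamp; only the batch fields break the tie.
ts = 1700000000000000
events = [
    {"event_name": "scroll", "event_timestamp": ts,
     "batch_page_id": 1, "batch_ordering_id": 2, "batch_event_index": 0},
    {"event_name": "page_view", "event_timestamp": ts,
     "batch_page_id": 1, "batch_ordering_id": 1, "batch_event_index": 1},
    {"event_name": "session_start", "event_timestamp": ts,
     "batch_page_id": 1, "batch_ordering_id": 1, "batch_event_index": 0},
]
ordered = [e["event_name"] for e in sorted(events, key=event_sort_key)]
print(ordered)  # ['session_start', 'page_view', 'scroll']
```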
GA4 Traffic Source Fields — The four traffic source locations in the export (traffic_source, collected_traffic_source, session_traffic_source_last_click, event_params legacy keys), their scopes, and when to use each. Includes the gclid correction pattern for Google Ads.
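Here is a sketch of one common form of the gclid correction. It assumes the rule is "any event carrying a gclid is google / cpc". It also assumes source, medium and gclid have already been extracted, for example from collected_traffic_source. The function name and the exact override rule are hypothetical, and the article’s version may differ.

```python
from typing import Optional, Tuple


def correct_source_medium(
    source: Optional[str], medium: Optional[str], gclid: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical gclid correction: an event carrying a Google Ads click id
    is attributed to google / cpc, whatever source and medium it reported."""
    if gclid:  # treats None and "" as "no click id"
        return "google", "cpc"
    return source, medium


print(correct_source_medium("(direct)", "(none)", "Cj0KCQ-example"))  # ('google', 'cpc')
print(correct_source_medium("newsletter", "email", None))  # ('newsletter', 'email')
```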
Supporting Concepts
GA4 Event Data Structure — The underlying event model: nested schemas, parameter extraction, the shift from UA’s session-centric to GA4’s event-centric architecture. Prerequisite knowledge for sessionization work.
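GA4 stores parameters as a repeated key/value record. Each value holds one typed field: string_value, int_value, float_value or double_value. The Python sketch below mimics the usual scalar-subquery extraction over that structure. It uses a hand-built dict in place of a real export row.

```python
from typing import Any, Optional


def get_param(event: dict, key: str, value_type: str = "string_value") -> Optional[Any]:
    """Python analogue of the BigQuery scalar subquery
    (SELECT value.<value_type> FROM UNNEST(event_params) WHERE key = '<key>').
    Returns the first matching parameter's typed value, or None.
    (BigQuery would raise an error if the key matched more than one row.)"""
    for param in event.get("event_params", []):
        if param["key"] == key:
            return param["value"].get(value_type)
    return None


# Hand-built row shaped like the export's event_params (values are made up).
event = {
    "event_name": "page_view",
    "event_params": [
        {"key": "ga_session_id", "value": {"int_value": 1700000000}},
        {"key": "page_location", "value": {"string_value": "https://example.com/"}},
    ],
}
print(get_param(event, "ga_session_id", "int_value"))  # 1700000000
print(get_param(event, "page_location"))  # https://example.com/
```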
Window Function Patterns for Analytics SQL — The SQL toolkit for sessionization: FIRST_VALUE for attribution propagation, ROW_NUMBER for event sequencing, named windows for performance. These patterns are the implementation mechanism for event-grain sessionization.
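Here is a Python emulation of the two window functions over a named window w AS (PARTITION BY session_key ORDER BY event_timestamp). It assumes flat event dicts with a source field that is null on most events. To copy the first non-null source onto every event, the SQL FIRST_VALUE needs IGNORE NULLS. It also needs an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame. With the default frame, events before the first non-null source would get NULL.

```python
from collections import defaultdict


def propagate_and_number(events):
    """Per session_key, ordered by event_timestamp, emulates
      FIRST_VALUE(source IGNORE NULLS) OVER (w ROWS BETWEEN UNBOUNDED PRECEDING
                                              AND UNBOUNDED FOLLOWING) AS session_source,
      ROW_NUMBER() OVER w AS event_number
    with WINDOW w AS (PARTITION BY session_key ORDER BY event_timestamp).
    """
    by_session = defaultdict(list)  # PARTITION BY session_key
    for e in events:
        by_session[e["session_key"]].append(e)

    out = []
    for rows in by_session.values():
        rows = sorted(rows, key=lambda r: r["event_timestamp"])  # ORDER BY
        first_source = next(
            (r["source"] for r in rows if r.get("source") is not None), None
        )
        for n, r in enumerate(rows, start=1):
            out.append({**r, "session_source": first_source, "event_number": n})
    return out


events = [
    {"session_key": "u1-100", "event_timestamp": 3, "source": None},
    {"session_key": "u1-100", "event_timestamp": 1, "source": None},
    {"session_key": "u1-100", "event_timestamp": 2, "source": "google"},
]
for row in propagate_and_number(events):
    print(row["event_timestamp"], row["event_number"], row["session_source"])
# 1 1 google
# 2 2 google
# 3 3 google
```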
Custom Sessionization Patterns — Building session definitions from scratch using LAG + running SUM when GA4’s 30-minute timeout doesn’t match your business needs. An alternative to using ga_session_id.
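Here is the LAG + running SUM pattern, sketched in Python for one user’s events. It assumes timestamps are already sorted and expressed in seconds. GA4’s event_timestamp is in microseconds, so divide by 1,000,000 first. The 30-minute default mirrors GA4’s timeout and is the parameter you would change.

```python
def custom_sessionize(timestamps, timeout_seconds=30 * 60):
    """Session numbers for one user's events, sorted ascending, in seconds.

    SQL shape of the same logic:
      is_new = LAG(ts) OVER w IS NULL OR ts - LAG(ts) OVER w > timeout
      session_number = SUM(IF(is_new, 1, 0))
                       OVER (w ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    """
    session_numbers = []
    previous = None  # plays the role of LAG(ts)
    running_total = 0  # plays the role of the running SUM
    for ts in timestamps:
        is_new = previous is None or ts - previous > timeout_seconds
        if is_new:
            running_total += 1
        session_numbers.append(running_total)
        previous = ts
    return session_numbers


print(custom_sessionize([0, 600, 4000, 4100], timeout_seconds=1800))  # [1, 1, 2, 2]
```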
Implementation Patterns
Incremental Models in dbt — The sessionized events model is typically an incremental dbt model using insert_overwrite with date partitioning. Lookback windows handle late-arriving data and ensure window functions recalculate correctly for sessions spanning partition boundaries.
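The lookback arithmetic is simple but easy to get off by one. The sketch below assumes a lookback counted in whole days before the run date. It lists the date partitions an insert_overwrite run would replace. A session that starts late on one day and ends the next is fully recomputed as long as both days fall inside the list.

```python
from datetime import date, timedelta
from typing import List


def partitions_to_rebuild(run_date: date, lookback_days: int) -> List[date]:
    """Date partitions replaced by one insert_overwrite run, assuming the
    model reprocesses run_date plus the `lookback_days` days before it."""
    return [run_date - timedelta(days=d) for d in range(lookback_days, -1, -1)]


print(partitions_to_rebuild(date(2024, 3, 10), 3))
# [datetime.date(2024, 3, 7), datetime.date(2024, 3, 8), datetime.date(2024, 3, 9), datetime.date(2024, 3, 10)]
```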
dbt Intermediate Layer Patterns — The sessionized events model sits in the intermediate layer: it preserves event grain while adding session context. Downstream session marts aggregate it to session grain for dashboards.
dbt Mart Layer Patterns — Session-grain marts derive from the sessionized events table with a simple GROUP BY. One source of truth (the enriched event table), multiple output shapes.
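Here is a Python sketch of the session mart derivation. It assumes the sessionized rows already carry session_key, event_name, event_timestamp and a propagated session_source; these column names are hypothetical. Each output dict corresponds to one row of a GROUP BY session_key with MIN, MAX, COUNT and COUNTIF aggregates.

```python
def sessions_from_events(events):
    """Session-grain rows from sessionized events. SQL analogue:
      SELECT session_key,
             MIN(event_timestamp) AS session_start,
             MAX(event_timestamp) AS session_end,
             COUNT(*) AS event_count,
             COUNTIF(event_name = 'page_view') AS page_views,
             ANY_VALUE(session_source) AS session_source
      FROM sessionized_events
      GROUP BY session_key
    """
    sessions = {}
    for e in events:
        s = sessions.setdefault(
            e["session_key"],
            {
                "session_key": e["session_key"],
                "session_start": e["event_timestamp"],
                "session_end": e["event_timestamp"],
                "event_count": 0,
                "page_views": 0,
                # Already constant within a session, so any row's value will do.
                "session_source": e.get("session_source"),
            },
        )
        s["session_start"] = min(s["session_start"], e["event_timestamp"])
        s["session_end"] = max(s["session_end"], e["event_timestamp"])
        s["event_count"] += 1
        s["page_views"] += 1 if e["event_name"] == "page_view" else 0
    return list(sessions.values())


rows = [
    {"session_key": "u1-100", "event_name": "session_start", "event_timestamp": 100, "session_source": "google"},
    {"session_key": "u1-100", "event_name": "page_view", "event_timestamp": 105, "session_source": "google"},
    {"session_key": "u2-100", "event_name": "page_view", "event_timestamp": 101, "session_source": None},
]
for s in sessions_from_events(rows):
    print(s["session_key"], s["event_count"], s["page_views"])
# u1-100 2 1
# u2-100 1 1
```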
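BigQuery Partition Pruning Patterns — Partition the sessionized table by event_date and cluster it by session_key to speed up date-filtered queries and window functions partitioned by session. The raw GA4 export is sharded into daily events_YYYYMMDD tables rather than partitioned, so base models should always filter on _TABLE_SUFFIX to avoid full-history scans.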
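Here is a small Python helper that renders the shard filter for a base model. The dataset name is a hypothetical placeholder. Daily shard suffixes are eight digits. A BETWEEN on two dates therefore also excludes the events_intraday_* shards that the same wildcard matches, because their suffixes start with "intraday_".

```python
from datetime import date


def base_model_sql(dataset: str, start: date, end: date) -> str:
    """Render a base-model query over the daily GA4 shards in [start, end].

    `dataset` is a hypothetical placeholder such as "my_project.analytics_123456".
    Inputs are formatted directly into SQL, so pass only trusted values.
    """
    return (
        "SELECT *\n"
        f"FROM `{dataset}.events_*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'"
    )


print(base_model_sql("my_project.analytics_123456", date(2024, 3, 7), date(2024, 3, 10)))
# SELECT *
# FROM `my_project.analytics_123456.events_*`
# WHERE _TABLE_SUFFIX BETWEEN '20240307' AND '20240310'
```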
Known Limitations
GA4 BigQuery Number Discrepancies — Why BigQuery numbers differ from the GA4 interface by 1-5%: HyperLogLog++ probabilistic counting, Consent Mode behavioral modeling, Google Signals cross-device deduplication, and data processing delays.
Article Source
These notes were decomposed from Building Session Tables from GA4 Event Data, which provides the complete end-to-end implementation including the full SQL model, dbt configuration, schema tests, and the session mart derivation pattern.