The common approach to sessionization is building a session-grain table: one row per session with aggregated metrics like duration, page views, and revenue. This works for basic reporting but discards event-level detail.
Once aggregated to session grain, the sequence of events, timing between interactions, and specific pages viewed all collapse into summary metrics. Answering questions like “what pages do users visit before purchasing?” requires joining back to raw events. “How long after landing do users add to cart?” requires the event timestamps that were aggregated away.
The Event-Grain Alternative
Instead of building sessions as the output, add session context as columns on every event. Each row remains an individual event, but now carries its session identity, the session’s traffic source, the landing page, and its position within the session sequence.
The pattern uses window functions to propagate session-level values across every event in a session. FIRST_VALUE grabs the traffic source from the first event. MIN and MAX establish session boundaries. ROW_NUMBER assigns sequence positions. All of this happens without changing the grain — every event row stays intact.
SELECT event_date, event_timestamp, event_name, session_key,
-- Session context added via window functions MIN(event_timestamp) OVER (PARTITION BY session_key) AS session__started_at, MAX(event_timestamp) OVER (PARTITION BY session_key) AS session__ended_at, FIRST_VALUE(source IGNORE NULLS) OVER w AS session__source, FIRST_VALUE(landing_page IGNORE NULLS) OVER w AS session__landing_page, ROW_NUMBER() OVER w AS event__number_in_session
WINDOW w AS (PARTITION BY session_key ORDER BY event_timestamp)The result is an enriched event table where every row carries its full session context as additional columns.
What This Enables
Queries that require complex joins against a session-grain table become straightforward:
- “Show me all events in sessions that converted” — a WHERE clause filtering on events where
session_keyappears alongside a purchase event. - “What’s the average time from session start to first add-to-cart?” —
event_timestamp - session__started_atfiltered toevent_name = 'add_to_cart'. - “Which landing pages lead to the longest session engagement?” — a GROUP BY on
session__landing_pagewith event-level timestamps.
The Session Mart Becomes Trivial
The session-grain table you might still need for dashboards becomes a simple derivative. All session context already exists on every row, so aggregation requires no joins:
SELECT session_key, ANY_VALUE(session__source) AS session__source, ANY_VALUE(session__medium) AS session__medium, ANY_VALUE(session__landing_page) AS session__landing_page, MIN(event_timestamp) AS session__started_at, MAX(event_timestamp) AS session__ended_at, COUNT(*) AS session__events, COUNTIF(event_name = 'page_view') AS session__page_views, COUNTIF(event_name = 'purchase') AS session__purchasesFROM sessionized_eventsGROUP BY session_keyANY_VALUE works for session-scoped dimensions because the window functions already propagated the same value across all events in a session. No joins, no subqueries, just a GROUP BY on the session key.
This follows the mart layer principle: the enriched event table is an intermediate model that preserves event grain, and the session mart aggregates it for dashboard consumption. One source of truth, multiple output shapes.
The Trade-Off: Storage
The trade-off is storage. Adding session columns to every event row means a wider table. On a high-traffic property, that’s meaningful additional bytes. But BigQuery’s columnar storage means queries only read the columns they reference, so unused session columns don’t add query cost. And partition pruning limits how many days get scanned regardless of row width.
The alternative — maintaining both a session-grain table and going back to raw events for ad-hoc analysis — costs more in engineering time and introduces consistency risks when two tables disagree.
When Session-Grain Is Enough
Not every project needs event-grain sessionization. If your analytics requirements are limited to standard session metrics (counts, duration, bounce rate, conversion rate) and nobody is asking event-sequence questions, a session-grain table is simpler and sufficient.
The decision comes down to the questions you anticipate. If “what happened before the conversion?” or “what’s the event sequence in sessions from paid search?” are on the roadmap, start with event-grain. Migrating from session-grain to event-grain later means rebuilding the entire pipeline.
Implementation Flow
The practical implementation in dbt follows this lineage:
- Base model (
base__ga4__events): Extract from raw export, clean types, filter invalid records - Intermediate model (
int__ga4__events_sessionized): Add session key, session context via window functions, event positioning - Mart model (
mrt__ga4__sessions): GROUP BY session key for dashboard consumption
The intermediate model is the workhorse. It’s where sessionization happens, and it’s where incremental processing matters most because event tables grow fast. The mart is just an aggregation layer on top.
When requirements change, adding a new session-level metric means adding it to the mart’s SELECT. Changing session attribution logic means changing the intermediate model once; the mart inherits the fix automatically.