ServicesAboutNotesContact Get in touch →
EN FR
Note

Event-Grain Sessionization

Why enriching events with session context beats building session-grain tables, and how the pattern enables flexible downstream analysis.

Planted
ga4bigquerydata modelinganalytics

The common approach to sessionization is building a session-grain table: one row per session with aggregated metrics like duration, page views, and revenue. This works for basic reporting but discards event-level detail.

Once aggregated to session grain, the sequence of events, timing between interactions, and specific pages viewed all collapse into summary metrics. Answering questions like “what pages do users visit before purchasing?” requires joining back to raw events. “How long after landing do users add to cart?” requires the event timestamps that were aggregated away.

The Event-Grain Alternative

Instead of building sessions as the output, add session context as columns on every event. Each row remains an individual event, but now carries its session identity, the session’s traffic source, the landing page, and its position within the session sequence.

The pattern uses window functions to propagate session-level values across every event in a session. FIRST_VALUE grabs the traffic source from the first event. MIN and MAX establish session boundaries. ROW_NUMBER assigns sequence positions. All of this happens without changing the grain — every event row stays intact.

SELECT
event_date,
event_timestamp,
event_name,
session_key,
-- Session context added via window functions
MIN(event_timestamp) OVER (PARTITION BY session_key) AS session__started_at,
MAX(event_timestamp) OVER (PARTITION BY session_key) AS session__ended_at,
FIRST_VALUE(source IGNORE NULLS) OVER w AS session__source,
FIRST_VALUE(landing_page IGNORE NULLS) OVER w AS session__landing_page,
ROW_NUMBER() OVER w AS event__number_in_session
WINDOW w AS (PARTITION BY session_key ORDER BY event_timestamp)

The result is an enriched event table where every row carries its full session context as additional columns.

What This Enables

Queries that require complex joins against a session-grain table become straightforward:

  • “Show me all events in sessions that converted” — a WHERE clause filtering on events where session_key appears alongside a purchase event.
  • “What’s the average time from session start to first add-to-cart?”event_timestamp - session__started_at filtered to event_name = 'add_to_cart'.
  • “Which landing pages lead to the longest session engagement?” — a GROUP BY on session__landing_page with event-level timestamps.

The Session Mart Becomes Trivial

The session-grain table you might still need for dashboards becomes a simple derivative. All session context already exists on every row, so aggregation requires no joins:

SELECT
session_key,
ANY_VALUE(session__source) AS session__source,
ANY_VALUE(session__medium) AS session__medium,
ANY_VALUE(session__landing_page) AS session__landing_page,
MIN(event_timestamp) AS session__started_at,
MAX(event_timestamp) AS session__ended_at,
COUNT(*) AS session__events,
COUNTIF(event_name = 'page_view') AS session__page_views,
COUNTIF(event_name = 'purchase') AS session__purchases
FROM sessionized_events
GROUP BY session_key

ANY_VALUE works for session-scoped dimensions because the window functions already propagated the same value across all events in a session. No joins, no subqueries, just a GROUP BY on the session key.

This follows the mart layer principle: the enriched event table is an intermediate model that preserves event grain, and the session mart aggregates it for dashboard consumption. One source of truth, multiple output shapes.

The Trade-Off: Storage

The trade-off is storage. Adding session columns to every event row means a wider table. On a high-traffic property, that’s meaningful additional bytes. But BigQuery’s columnar storage means queries only read the columns they reference, so unused session columns don’t add query cost. And partition pruning limits how many days get scanned regardless of row width.

The alternative — maintaining both a session-grain table and going back to raw events for ad-hoc analysis — costs more in engineering time and introduces consistency risks when two tables disagree.

When Session-Grain Is Enough

Not every project needs event-grain sessionization. If your analytics requirements are limited to standard session metrics (counts, duration, bounce rate, conversion rate) and nobody is asking event-sequence questions, a session-grain table is simpler and sufficient.

The decision comes down to the questions you anticipate. If “what happened before the conversion?” or “what’s the event sequence in sessions from paid search?” are on the roadmap, start with event-grain. Migrating from session-grain to event-grain later means rebuilding the entire pipeline.

Implementation Flow

The practical implementation in dbt follows this lineage:

  1. Base model (base__ga4__events): Extract from raw export, clean types, filter invalid records
  2. Intermediate model (int__ga4__events_sessionized): Add session key, session context via window functions, event positioning
  3. Mart model (mrt__ga4__sessions): GROUP BY session key for dashboard consumption

The intermediate model is the workhorse. It’s where sessionization happens, and it’s where incremental processing matters most because event tables grow fast. The mart is just an aggregation layer on top.

When requirements change, adding a new session-level metric means adding it to the mart’s SELECT. Changing session attribution logic means changing the intermediate model once; the mart inherits the fix automatically.