The sessionized events model is the workhorse of a GA4 dbt project. It takes the cleaned, typed base events and adds session context to every row — landing page, traffic source, conversion flags, session duration — via window functions. Everything downstream reads from this model.
The philosophy is event-grain sessionization: preserve the one-row-per-event grain while making session-level analysis trivial by pre-computing session context. This note covers the implementation specifics.
Model Configuration
-- models/intermediate/ga4/int__ga4__events_sessionized.sql
{{ config( materialized='incremental', incremental_strategy='insert_overwrite', partition_by={ "field": "event__date", "data_type": "date", "granularity": "day" }, cluster_by=['user__pseudo_id', 'session__key', 'event__name'] )}}
{% set lookback_days = var('ga4_static_incremental_days', 3) %}The same static lookback pattern applies here. insert_overwrite replaces date partitions atomically. The 3-day lookback ensures window functions re-execute over sessions that may span the partition boundary — a session that starts on day N-3 and has events on day N-2 and N-1 needs all its events present for correct session context.
Clustering by user__pseudo_id, session__key, and event__name improves query performance for the most common access patterns: per-user analysis, per-session analysis, and event-type filtering.
CTE Structure
The model uses a five-stage CTE pipeline:
events → with_clean_urls → with_session_metrics → with_channel → final ↑ event_items (joined in final)Stage 1: Select from Base
WITH events AS ( SELECT event__key, session__key, event__date, event__timestamp_utc, user__pseudo_id, user__id, session__ga_id, session__number, event__name, page__location, page__title, event__source, event__medium, event__campaign, session__source, session__medium, session__campaign, event__engagement_time_msec, device__category, geo__country, transaction__revenue FROM {{ ref('base__ga4__events') }}
{% if is_incremental() %} WHERE event__date >= DATE_SUB(CURRENT_DATE(), INTERVAL {{ lookback_days }} DAY) {% endif %})Explicit column selection rather than SELECT * makes the model’s contract clear. Adding a new column to the base model won’t silently appear in downstream models.
Stage 2: URL Cleaning
with_clean_urls AS ( SELECT *, REGEXP_REPLACE( SPLIT(page__location, '?')[SAFE_OFFSET(0)], r'#.*$', '' ) AS page__path FROM events)page__path strips query parameters and fragments from page__location. Landing page analysis using full URLs is noisy — the same /blog/post visited via different UTM combinations would appear as separate pages. Using the path alone makes landing page aggregations meaningful.
SPLIT(url, '?')[SAFE_OFFSET(0)] gets everything before the ?. The subsequent REGEXP_REPLACE removes fragments (#section-name). SAFE_OFFSET prevents errors when the URL has no ?.
Stage 3: Session Metrics via Window Functions
with_session_metrics AS ( SELECT e.*,
-- Navigation FIRST_VALUE(page__path IGNORE NULLS) OVER w AS session__landing_page, FIRST_VALUE(page__location IGNORE NULLS) OVER w AS session__landing_page_full, LAST_VALUE(page__path IGNORE NULLS) OVER w AS session__exit_page,
-- Traffic source (first non-null in session) FIRST_VALUE( COALESCE(event__source, session__source) IGNORE NULLS ) OVER w AS session__source_final, FIRST_VALUE( COALESCE(event__medium, session__medium) IGNORE NULLS ) OVER w AS session__medium_final, FIRST_VALUE( COALESCE(event__campaign, session__campaign) IGNORE NULLS ) OVER w AS session__campaign_final,
-- Engagement counts COUNT(*) OVER p AS session__events, COUNT(CASE WHEN event__name = 'page_view' THEN 1 END) OVER p AS session__pageviews, SUM(event__engagement_time_msec) OVER p AS session__engagement_time_msec,
-- Session duration TIMESTAMP_DIFF( MAX(event__timestamp_utc) OVER p, MIN(event__timestamp_utc) OVER p, SECOND ) AS session__duration_seconds,
-- Conversion flags MAX(CASE WHEN event__name = 'purchase' THEN 1 ELSE 0 END) OVER p AS session__has_purchase, MAX(CASE WHEN event__name = 'add_to_cart' THEN 1 ELSE 0 END) OVER p AS session__has_add_to_cart, MAX(CASE WHEN event__name = 'begin_checkout' THEN 1 ELSE 0 END) OVER p AS session__has_checkout, MAX(CASE WHEN event__name = 'sign_up' THEN 1 ELSE 0 END) OVER p AS session__has_signup, MAX(CASE WHEN event__name = 'generate_lead' THEN 1 ELSE 0 END) OVER p AS session__has_lead,
-- Revenue SUM(transaction__revenue) OVER p AS session__revenue,
-- Event sequencing ROW_NUMBER() OVER w AS event__sequence_number, CASE WHEN ROW_NUMBER() OVER w = 1 THEN TRUE ELSE FALSE END AS event__is_session_start, CASE WHEN ROW_NUMBER() OVER ( PARTITION BY session__key ORDER BY event__timestamp_utc DESC ) = 1 THEN TRUE ELSE FALSE END AS event__is_session_end,
-- Time between events TIMESTAMP_DIFF( event__timestamp_utc, LAG(event__timestamp_utc) OVER w, SECOND ) AS event__seconds_since_previous
FROM with_clean_urls e
WINDOW w AS ( PARTITION BY session__key ORDER BY event__timestamp_utc ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ), p AS (PARTITION BY session__key))Two named windows encode the two types of session-scoped computation:
w(ordered) — for FIRST_VALUE, LAST_VALUE, ROW_NUMBER, LAG. Requires ordering and the full frame.p(partition only) — for MAX, COUNT, SUM. No order needed; scans the entire partition naturally.
See GA4 Window Function Pitfalls for why the ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame is required for LAST_VALUE.
Stage 4: Channel Grouping
with_channel AS ( SELECT *, {{ default_channel_grouping('session__source_final', 'session__medium_final') }} AS session__channel_grouping FROM with_session_metrics)Channel grouping is a separate CTE because the macro generates a multi-line CASE statement. Keeping it isolated makes the model readable and makes it easy to swap in a custom channel grouping macro.
Stage 5: Item Aggregation Join
event_items AS ( SELECT event__key, COUNT(*) AS event__items, SUM(item__revenue) AS event__items_revenue FROM {{ ref('int__ga4__event_items') }} {% if is_incremental() %} WHERE event__date >= DATE_SUB(CURRENT_DATE(), INTERVAL {{ lookback_days }} DAY) {% endif %} GROUP BY event__key),
final AS ( SELECT e.*, COALESCE(i.event__items, 0) AS event__items, COALESCE(i.event__items_revenue, 0) AS event__items_revenue FROM with_channel e LEFT JOIN event_items i ON e.event__key = i.event__key)
SELECT * FROM finalItem data is aggregated to event grain before the join, avoiding row expansion. The COALESCE ensures non-ecommerce events show 0 items rather than null.
Traffic Source Resolution
The COALESCE(event__source, session__source) pattern handles two different locations where GA4 stores source information:
event__source— fromevent_params, present on events tagged with UTM parameterssession__source— fromcollected_traffic_source, GA4’s session-level source
Modern GA4 implementations (July 2024+) use session_traffic_source_last_click for clean session attribution. For compatibility with older data or properties that haven’t been updated, the COALESCE approach picks up whichever source field is populated. See GA4 Traffic Source Fields for the full picture of GA4’s attribution field landscape.
What Downstream Models Get
With int__ga4__events_sessionized as their source, mart models need only simple aggregations:
-- Session mart: just GROUP BYSELECT session__key, ANY_VALUE(session__landing_page) AS session__landing_page, ANY_VALUE(session__channel_grouping) AS session__channel_grouping, ANY_VALUE(session__has_purchase) AS session__has_purchase, COUNT(*) AS session__eventsFROM int__ga4__events_sessionizedGROUP BY session__keyEvent-level questions that would require joins against separate session tables now work as simple filters:
-- Events in converting sessionsSELECT * FROM int__ga4__events_sessionizedWHERE session__has_purchase = 1 AND event__name != 'purchase'ORDER BY session__key, event__sequence_numberOne intermediate model with pre-computed session context enables flexible event-level and session-level analysis without joins in downstream models.