GA4 sends raw events to BigQuery with no session-level aggregations. The sessionization logic must be built manually — grouping events by user, detecting session boundaries based on inactivity thresholds, and calculating metrics like duration and event counts.
This logic involves timestamp arithmetic (GA4’s microsecond timestamps), window functions to calculate time gaps, configurable inactivity thresholds, and aggregation from event-level to session-level records. Bugs in this logic produce incorrect conversion rates, session durations, and bounce rates.
Testing Session Boundary Detection
The core of sessionization is deciding when one session ends and another begins — typically after 30 minutes of inactivity. Here’s a model that implements this:
-- models/intermediate/int__ga4_sessions.sqlwith events as ( select user_pseudo_id, ga_session_id, event_timestamp, event_name, timestamp_diff( timestamp_micros(event_timestamp), lag(timestamp_micros(event_timestamp)) over ( partition by user_pseudo_id order by event_timestamp ), minute ) as minutes_since_last_event from {{ ref('base__ga4__events') }}),
sessionized as ( select *, case when minutes_since_last_event > 30 or minutes_since_last_event is null then 1 else 0 end as is_new_session from events)
select user_pseudo_id, ga_session_id, concat(user_pseudo_id, '_', ga_session_id) as session_key, min(event_timestamp) as session_start, max(event_timestamp) as session_end, timestamp_diff( timestamp_micros(max(event_timestamp)), timestamp_micros(min(event_timestamp)), second ) as session_duration_seconds, count(*) as event_countfrom sessionizedgroup by 1, 2, 3unit_tests: - name: test_int_ga4_sessions_boundaries model: int__ga4_sessions description: "Sessions should break after 30 minutes of inactivity" given: - input: ref('base__ga4__events') rows: # User 1, Session 1: events within 30 min - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717200000000000, event_name: "page_view"} - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717201800000000, event_name: "scroll"} # +30 min # User 1, Session 2: gap > 30 min - {user_pseudo_id: "user_1", ga_session_id: 1002, event_timestamp: 1717207200000000, event_name: "page_view"} # +90 min from first # User 2, Session 1 - {user_pseudo_id: "user_2", ga_session_id: 2001, event_timestamp: 1717200000000000, event_name: "page_view"} expect: rows: - {session_key: "user_1_1001", session_duration_seconds: 1800, event_count: 2} - {session_key: "user_1_1002", session_duration_seconds: 0, event_count: 1} - {session_key: "user_2_2001", session_duration_seconds: 0, event_count: 1}This test verifies three core behaviors:
- Session 1001: Two events 30 minutes apart (exactly at the boundary) belong to the same session. The duration is 1800 seconds. This tests the boundary condition — is your threshold
> 30or>= 30? - Session 1002: A third event 90 minutes after the first triggers a new session (gap exceeds 30 minutes from session 1001’s last event). Duration is 0 seconds for a single-event session.
- Session 2001: User 2’s single event creates its own session with 0-second duration. This verifies session key construction and partition isolation — User 1’s events don’t affect User 2.
The microsecond timestamps (1717200000000000) are GA4’s native format. You’ll need to calculate these values based on your test scenarios. 1717200000000000 is approximately 2024-06-01 00:00:00 UTC in microseconds.
Testing Cross-Midnight Sessions
A subtle but important edge case: sessions that span midnight.
unit_tests: - name: test_int_ga4_sessions_cross_midnight model: int__ga4_sessions description: "Sessions spanning midnight should not break artificially" given: - input: ref('base__ga4__events') rows: - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717199400000000, event_name: "page_view"} # 23:50 - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717200600000000, event_name: "purchase"} # 00:10 next day expect: rows: - {session_key: "user_1_1001", session_duration_seconds: 1200, event_count: 2}This catches a real bug: some implementations accidentally break sessions at date boundaries because they partition by date instead of by user. A user browsing at 11:50 PM who purchases at 12:10 AM should have a single 20-minute session, not two separate sessions.
If your model uses sharded GA4 tables (events_YYYYMMDD) and partitions by the table date, the cross-midnight test might fail legitimately — in which case the test documents this known limitation.
What to Test Beyond the Basics
For a production sessionization model, also consider testing:
- Very short gaps: Two events 29 minutes apart should be the same session. Two events 31 minutes apart should not.
- Multiple users at the same timestamp: User 1 and User 2 both have events at the same microsecond. Partition isolation must hold.
- Single-event sessions: A user with exactly one event should have a session with 0-second duration and event_count of 1.
- Configurable thresholds: If your inactivity threshold is a variable (
var('session_timeout_minutes', 30)), test with overridden values to verify the variable is actually used.
Each of these tests a specific assumption your sessionization logic makes. When the logic changes — say, you switch from 30 to 20 minutes, or add a page-based session boundary — the tests immediately show which assumptions still hold.