
Unit Testing GA4 Sessionization

How to unit test GA4 sessionization logic in dbt: session boundary detection, cross-midnight sessions, microsecond timestamps, and single-event sessions.


GA4 sends raw events to BigQuery with no session-level aggregations, so sessionization has to be built manually: grouping events by user, detecting session boundaries from an inactivity threshold, and calculating metrics such as duration and event count.

This logic combines timestamp arithmetic on GA4’s microsecond timestamps, window functions that calculate the gap between consecutive events, a configurable inactivity threshold, and aggregation from event-level to session-level records. A bug anywhere in that chain flows straight into session-based metrics such as conversion rates, session durations, and bounce rates.

Testing Session Boundary Detection

The core of sessionization is deciding when one session ends and the next begins, typically after 30 minutes of inactivity. Here’s a model that implements this:

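```sql
-- models/intermediate/int__ga4_sessions.sql
with events as (
    select
        user_pseudo_id,
        ga_session_id,
        event_timestamp,
        event_name,
        -- gap to the user's previous event in microseconds (GA4's native unit);
        -- null for the user's first event
        event_timestamp - lag(event_timestamp) over (
            partition by user_pseudo_id
            order by event_timestamp
        ) as micros_since_last_event
    from {{ ref('base__ga4__events') }}
),

flagged as (
    select
        *,
        -- a gap strictly greater than 30 minutes starts a new session,
        -- so a gap of exactly 30 minutes stays in the current one
        case
            when micros_since_last_event is null
                or micros_since_last_event > 30 * 60 * 1000000
            then 1
            else 0
        end as is_new_session
    from events
),

sessionized as (
    select
        *,
        -- running total of session starts gives a per-user session number
        sum(is_new_session) over (
            partition by user_pseudo_id
            order by event_timestamp
        ) as session_number
    from flagged
)

select
    user_pseudo_id,
    -- keep GA4's session id for the key; min() picks one if a derived
    -- session happens to contain several ga_session_id values
    min(ga_session_id) as ga_session_id,
    concat(user_pseudo_id, '_', cast(min(ga_session_id) as string)) as session_key,
    min(event_timestamp) as session_start,
    max(event_timestamp) as session_end,
    timestamp_diff(
        timestamp_micros(max(event_timestamp)),
        timestamp_micros(min(event_timestamp)),
        second
    ) as session_duration_seconds,
    count(*) as event_count
from sessionized
-- group by the derived session, not ga_session_id, so the inactivity
-- threshold (rather than GA4's own session id) decides the boundaries
group by user_pseudo_id, session_number
```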
```yaml
unit_tests:
  - name: test_int_ga4_sessions_boundaries
    model: int__ga4_sessions
    description: "Sessions should break after 30 minutes of inactivity"
    given:
      - input: ref('base__ga4__events')
        rows:
          # User 1, Session 1: two events exactly 30 min apart
          - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717200000000000, event_name: "page_view"}
          - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717201800000000, event_name: "scroll"} # +30 min
          # User 1, Session 2: gap > 30 min
          - {user_pseudo_id: "user_1", ga_session_id: 1002, event_timestamp: 1717207200000000, event_name: "page_view"} # +90 min after previous event, +120 min from first
          # User 2, Session 1
          - {user_pseudo_id: "user_2", ga_session_id: 2001, event_timestamp: 1717200000000000, event_name: "page_view"}
    expect:
      rows:
        - {session_key: "user_1_1001", session_duration_seconds: 1800, event_count: 2}
        - {session_key: "user_1_1002", session_duration_seconds: 0, event_count: 1}
        - {session_key: "user_2_2001", session_duration_seconds: 0, event_count: 1}
```

This test verifies three core behaviors:

  • Session 1001: Two events exactly 30 minutes apart (right at the boundary) belong to the same session, so the duration is 1800 seconds. This pins down the boundary condition: is your threshold > 30 or >= 30?
  • Session 1002: A third event 90 minutes after session 1001’s last event (two hours after the first event) exceeds the 30-minute threshold and starts a new session. As a single-event session, its duration is 0 seconds.
  • Session 2001: User 2’s single event, at the same timestamp as User 1’s first event, forms its own session with a 0-second duration. This verifies session key construction and partition isolation: User 1’s events don’t affect User 2.
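To double-check expected rows before writing them into YAML, the inactivity rule can be replayed outside the warehouse. The following is a minimal pure-Python sketch, not the dbt model itself; `Event` and `expected_sessions` are hypothetical names, and building the key from the smallest `ga_session_id` in each derived session is an assumption that happens to match these fixtures.

```python
from dataclasses import dataclass

# Same rule as described above: a gap strictly greater than 30 minutes starts a new session.
SESSION_TIMEOUT_MICROS = 30 * 60 * 1_000_000


@dataclass(frozen=True)
class Event:
    user_pseudo_id: str
    ga_session_id: int
    event_timestamp: int  # microseconds since the Unix epoch, as in the GA4 export


def expected_sessions(events, timeout_micros=SESSION_TIMEOUT_MICROS):
    """Group events per user, split on inactivity gaps, and build expected rows."""
    by_user = {}
    for event in events:
        by_user.setdefault(event.user_pseudo_id, []).append(event)

    rows = []
    for user, user_events in by_user.items():
        user_events.sort(key=lambda ev: ev.event_timestamp)
        sessions = []
        prev_ts = None
        for event in user_events:
            # The user's first event, or a gap beyond the timeout, opens a new session.
            if prev_ts is None or event.event_timestamp - prev_ts > timeout_micros:
                sessions.append([])
            sessions[-1].append(event)
            prev_ts = event.event_timestamp
        for session in sessions:
            start = session[0].event_timestamp
            end = session[-1].event_timestamp
            # Hypothetical key rule: smallest GA4 session id seen in the derived session.
            session_id = min(ev.ga_session_id for ev in session)
            rows.append({
                "session_key": f"{user}_{session_id}",
                # Whole seconds; the fixtures use whole-second timestamps, so nothing is lost.
                "session_duration_seconds": (end - start) // 1_000_000,
                "event_count": len(session),
            })
    return rows


if __name__ == "__main__":
    fixture = [
        Event("user_1", 1001, 1717200000000000),
        Event("user_1", 1001, 1717201800000000),  # +30 min: same session
        Event("user_1", 1002, 1717207200000000),  # 90 min gap: new session
        Event("user_2", 2001, 1717200000000000),
    ]
    for row in expected_sessions(fixture):
        print(row)
```

Running it prints the three rows listed under `expect`. Changing `>` to `>=` in the gap check moves the 30-minute event into its own session, which is exactly the regression the 1001 fixture is meant to catch.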

The microsecond timestamps (1717200000000000) are GA4’s native format, and you’ll need to compute them for each test scenario. 1717200000000000 is exactly 2024-06-01 00:00:00 UTC; one minute is 60,000,000 microseconds, so +30 minutes is 1717201800000000 and +2 hours is 1717207200000000.
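Rather than computing these values by hand, a few lines of Python can convert between UTC datetimes and GA4-style microsecond timestamps when building fixtures. This is a sketch; `to_ga4_micros` and `from_ga4_micros` are hypothetical helper names, not part of dbt or the GA4 export.

```python
from datetime import datetime, timedelta, timezone

# GA4's event_timestamp is microseconds since the Unix epoch, in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ga4_micros(dt: datetime) -> int:
    """Convert a timezone-aware datetime to a GA4-style microsecond timestamp."""
    # Integer arithmetic on the timedelta avoids float rounding from dt.timestamp().
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def from_ga4_micros(micros: int) -> datetime:
    """Convert a GA4-style microsecond timestamp back to a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


if __name__ == "__main__":
    start = datetime(2024, 6, 1, tzinfo=timezone.utc)
    for offset_min in (0, 30, 120):
        print(offset_min, to_ga4_micros(start + timedelta(minutes=offset_min)))
    # 0 1717200000000000
    # 30 1717201800000000
    # 120 1717207200000000
```

Generating fixtures from readable datetimes keeps the intent of each row (for example, "23:50 UTC the night before") visible in whatever script produces the YAML.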

Testing Cross-Midnight Sessions

A subtle but important edge case: sessions that span midnight.

```yaml
unit_tests:
  - name: test_int_ga4_sessions_cross_midnight
    model: int__ga4_sessions
    description: "Sessions spanning midnight should not break artificially"
    given:
      - input: ref('base__ga4__events')
        rows:
          - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717199400000000, event_name: "page_view"} # 23:50
          - {user_pseudo_id: "user_1", ga_session_id: 1001, event_timestamp: 1717200600000000, event_name: "purchase"} # 00:10 next day
    expect:
      rows:
        - {session_key: "user_1_1001", session_duration_seconds: 1200, event_count: 2}
```

This catches a real bug: some implementations accidentally break sessions at date boundaries because they partition by date instead of by user. A user browsing at 11:50 PM who purchases at 12:10 AM should have a single 20-minute session, not two separate sessions.

If your model reads sharded GA4 tables (events_YYYYMMDD) and partitions by the table date, the cross-midnight test may fail for a legitimate reason; in that case, the test documents this known limitation.
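The date-partitioning failure can be reproduced without a warehouse. This hypothetical Python sketch takes the cross-midnight fixture and applies the same > 30 minute rule twice: once with the gap calculation partitioned by user, and once by user plus UTC calendar date (a stand-in for a date- or shard-based partition). `utc_date` and `count_sessions` are illustrative names only.

```python
from datetime import datetime, timezone

TIMEOUT_MICROS = 30 * 60 * 1_000_000

# Cross-midnight fixture: (user_pseudo_id, event_timestamp in microseconds).
EVENTS = [
    ("user_1", 1717199400000000),  # 2024-05-31 23:50 UTC
    ("user_1", 1717200600000000),  # 2024-06-01 00:10 UTC
]


def utc_date(micros):
    """Calendar date (UTC) of a GA4-style microsecond timestamp."""
    return datetime.fromtimestamp(micros // 1_000_000, tz=timezone.utc).date()


def count_sessions(events, partition_key):
    """Count sessions when gaps are computed within partition_key(user, ts) groups."""
    partitions = {}
    for user, ts in events:
        partitions.setdefault(partition_key(user, ts), []).append(ts)
    total = 0
    for timestamps in partitions.values():
        timestamps.sort()
        prev = None
        for ts in timestamps:
            # First event in the partition, or a gap over 30 minutes, opens a session.
            if prev is None or ts - prev > TIMEOUT_MICROS:
                total += 1
            prev = ts
    return total


if __name__ == "__main__":
    by_user = count_sessions(EVENTS, lambda user, ts: user)
    by_user_and_date = count_sessions(EVENTS, lambda user, ts: (user, utc_date(ts)))
    print(by_user, by_user_and_date)  # 1 2
```

Partitioning by user yields one session, as the unit test expects; adding the date to the partition key yields two, which is the artificial break the test is designed to catch.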

What to Test Beyond the Basics

For a production sessionization model, also consider testing:

  • Gaps on either side of the threshold: Two events 29 minutes apart should be in the same session; two events 31 minutes apart should not.
  • Multiple users at the same timestamp: User 1 and User 2 both have events at the same microsecond. Partition isolation must hold.
  • Single-event sessions: A user with exactly one event should have a session with 0-second duration and event_count of 1.
  • Configurable thresholds: If your inactivity threshold is a variable (var('session_timeout_minutes', 30)), test with overridden values to verify the variable is actually used.
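The checklist above maps onto small fixtures. As a hedged sketch, the expected session counts for each scenario can be worked out with a few lines of Python before encoding them as dbt unit tests. `sessions_per_user`, `after`, and `T0` are hypothetical helpers, and the `timeout_minutes` parameter plays the role of the `session_timeout_minutes` variable.

```python
MICROS_PER_MINUTE = 60 * 1_000_000
T0 = 1717200000000000  # 2024-06-01 00:00:00 UTC, as in the fixtures above


def sessions_per_user(events, timeout_minutes=30):
    """Count sessions per user; a gap strictly greater than timeout_minutes starts a new one."""
    timeout = timeout_minutes * MICROS_PER_MINUTE
    by_user = {}
    for user, ts in events:
        by_user.setdefault(user, []).append(ts)
    counts = {}
    for user, timestamps in by_user.items():
        timestamps.sort()
        # Every user with events has at least one session; each oversized gap adds one.
        counts[user] = 1 + sum(
            1 for prev, cur in zip(timestamps, timestamps[1:]) if cur - prev > timeout
        )
    return counts


def after(minutes):
    """Fixture timestamp `minutes` after T0."""
    return T0 + minutes * MICROS_PER_MINUTE


gap_29 = [("user_1", after(0)), ("user_1", after(29))]
gap_31 = [("user_1", after(0)), ("user_1", after(31))]
same_instant = [("user_1", T0), ("user_2", T0)]

if __name__ == "__main__":
    print(sessions_per_user(gap_29))                      # {'user_1': 1}
    print(sessions_per_user(gap_31))                      # {'user_1': 2}
    print(sessions_per_user(same_instant))                # {'user_1': 1, 'user_2': 1}
    print(sessions_per_user(gap_29, timeout_minutes=20))  # {'user_1': 2}
```

In dbt itself, the overridden-threshold case would set the variable through a `vars` entry under the unit test’s `overrides` key; with a 20-minute timeout, the same 29-minute fixture should then produce two sessions instead of one, which confirms the variable is actually read.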

Each of these tests a specific assumption your sessionization logic makes. When the logic changes (say, you switch from 30 to 20 minutes or add a page-based session boundary), the tests immediately show which assumptions still hold.