GA4 defines sessions with a 30-minute inactivity timeout. That’s a reasonable default, but your business might need something different. High-engagement apps might need 15-minute timeouts to distinguish browsing sessions from purchase sessions. Content sites might want 45-minute timeouts because readers take breaks mid-article. Some teams need campaign-based session breaks where new UTM parameters always trigger a new session, regardless of timing.
Custom sessionization gives you full control over what constitutes a “session” in your data. The technique uses window functions — specifically LAG for gap detection and a running SUM for session numbering — to build sessions from raw event-level data.
The Core Pattern: GAP Detection + Running Sum
The pattern works in two steps. First, LAG identifies session boundaries by comparing each event’s timestamp to the previous event. Then a running sum of the boundary flags assigns session numbers.
WITH events_with_gaps AS ( SELECT user_pseudo_id, event_timestamp, event_name, traffic_source.source AS traffic__source, traffic_source.medium AS traffic__medium, CASE WHEN LAG(event_timestamp) OVER ( PARTITION BY user_pseudo_id ORDER BY event_timestamp ) IS NULL THEN 1 -- First event = new session WHEN TIMESTAMP_DIFF( event_timestamp, LAG(event_timestamp) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp), MINUTE ) > 30 THEN 1 -- Gap > 30 min = new session ELSE 0 END AS event__is_new_session FROM `project.analytics_XXXXX.events_*` WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'),
sessionized AS ( SELECT user_pseudo_id, event_timestamp, event_name, traffic__source, traffic__medium, event__is_new_session, SUM(event__is_new_session) OVER ( PARTITION BY user_pseudo_id ORDER BY event_timestamp ) AS session_number FROM events_with_gaps)
SELECT user_pseudo_id, event_timestamp, event_name, traffic__source, traffic__medium, session_number, CONCAT(user_pseudo_id, '_', session_number) AS custom_session_idFROM sessionized;The first CTE identifies boundaries: the very first event for a user (where LAG returns NULL) and any event that follows more than 30 minutes of inactivity. The second CTE uses SUM(event__is_new_session) as a running total — every time the flag is 1, the session number increments by one. Events between boundaries share the same cumulative value, which becomes the session identifier.
The CONCAT(user_pseudo_id, '_', session_number) creates a globally unique session ID. This is critical — session_number alone is not unique across users.
Configuring the Timeout
The 30-minute timeout is just a constant. Change it to match your business:
-- 15-minute timeout for high-engagement appsWHEN TIMESTAMP_DIFF(..., MINUTE) > 15 THEN 1
-- 45-minute timeout for content-heavy sitesWHEN TIMESTAMP_DIFF(..., MINUTE) > 45 THEN 1
-- 2-hour timeout for B2B platforms with long workflowsWHEN TIMESTAMP_DIFF(..., HOUR) > 2 THEN 1How to pick the right timeout: look at the distribution of inter-event gaps in your data. Plot a histogram of time between consecutive events per user. You’ll typically see a bimodal distribution — a cluster of short gaps (within-session activity) and a cluster of long gaps (between sessions). Set your timeout at the valley between those clusters.
-- Analyze inter-event gap distributionSELECT CASE WHEN gap_minutes <= 5 THEN '0-5 min' WHEN gap_minutes <= 15 THEN '5-15 min' WHEN gap_minutes <= 30 THEN '15-30 min' WHEN gap_minutes <= 60 THEN '30-60 min' WHEN gap_minutes <= 120 THEN '60-120 min' ELSE '120+ min' END AS gap_bucket, COUNT(*) AS event_countFROM ( SELECT TIMESTAMP_DIFF( event_timestamp, LAG(event_timestamp) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp), MINUTE ) AS gap_minutes FROM events)WHERE gap_minutes IS NOT NULLGROUP BY 1ORDER BY 1;If the data shows a clear drop-off between 20 and 40 minutes, a 30-minute timeout makes sense. If activity clusters tightly under 10 minutes with a long tail, 15 minutes might be more appropriate.
Campaign-Based Session Splitting
Sometimes timing isn’t the only session boundary. A user who clicks a new ad (different UTM parameters) is arguably starting a new session even if it’s been only 5 minutes since their last event. This matters for attribution modeling because it determines which session gets credit for a conversion.
Add campaign-based splitting to the boundary detection:
CASE WHEN LAG(event_timestamp) OVER (w) IS NULL THEN 1 WHEN TIMESTAMP_DIFF(event_timestamp, LAG(event_timestamp) OVER (w), MINUTE) > 30 THEN 1 -- New UTM source/medium = new session WHEN traffic_source.source != LAG(traffic_source.source) OVER (w) OR traffic_source.medium != LAG(traffic_source.medium) OVER (w) THEN 1 ELSE 0END AS event__is_new_sessionBe careful with NULL handling here. If traffic_source.source is NULL for organic traffic, the != comparison returns NULL (not TRUE). Use IFNULL or COALESCE to handle missing attribution data:
WHEN IFNULL(traffic_source.source, 'none') != IFNULL(LAG(traffic_source.source) OVER (w), 'none') THEN 1Session Metrics
Once you have sessions, aggregate event-level data into session-level metrics in a subsequent CTE:
, session_metrics AS ( SELECT custom_session_id, user_pseudo_id, MIN(event_timestamp) AS session__started_at, MAX(event_timestamp) AS session__ended_at, TIMESTAMP_DIFF(MAX(event_timestamp), MIN(event_timestamp), SECOND) AS session__duration_seconds, COUNT(*) AS session__events, COUNTIF(event_name = 'page_view') AS session__pageviews FROM sessionized GROUP BY custom_session_id, user_pseudo_id)Common session-level metrics to compute:
- Duration:
TIMESTAMP_DIFF(MAX, MIN, SECOND)— note this is 0 for single-event sessions, which is a GA4 quirk too - Pageviews:
COUNTIF(event_name = 'page_view') - Engagement: Define your own rule (e.g., duration > 10 seconds OR pageviews > 1 OR conversion event present)
- Bounce: The inverse of your engagement definition
- Entry page: Use
FIRST_VALUE(page_location) OVER (PARTITION BY session_id ORDER BY event_timestamp)before aggregating - Exit page: Same with
LAST_VALUEand an explicitROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGframe
Why Not Just Use ga_session_id?
GA4’s ga_session_id (extracted from event_params) works for many cases, but it has limitations:
- It uses Google’s 30-minute timeout, which you can’t change
- It resets at midnight in the user’s timezone in some implementations, splitting sessions that span midnight
- It’s not globally unique — multiple users starting sessions at the same second share the same
ga_session_id - You can’t add custom boundary rules like campaign-based splitting
If GA4’s session definition matches your needs, use CONCAT(user_pseudo_id, '.', ga_session_id) as your session key and skip custom sessionization. Build your own only when the business logic demands it.
Cross-Device Session Stitching
When users have a user_id (logged-in state), you can build sessions across devices:
PARTITION BY COALESCE(user_id, user_pseudo_id) ORDER BY event_timestampThis groups events by authenticated user when available, falling back to anonymous user when not. The tradeoff: anonymous sessions for the same person on different devices remain separate (you can’t stitch what you can’t identify), and the moment a user logs in, their pre-login events on that device join the authenticated session.
For more sophisticated identity resolution — matching anonymous sessions to known users retroactively — you need a separate identity graph, which is beyond what sessionization alone can handle.
Production Considerations
Materialization: Custom sessionization queries are expensive because they scan all events within the date range and compute window functions across them. Materialize as an incremental dbt model with insert_overwrite on the event date partition. Reprocess the last 2-3 days to handle late-arriving events.
Testing: Validate that your custom sessions produce reasonable metrics. Compare session counts, average duration, and bounce rates against GA4’s built-in session metrics. Large discrepancies (>20%) suggest a timeout or boundary rule that doesn’t match user behavior.
Naming: Distinguish your custom sessions from GA4 sessions in column names. Use custom_session_id instead of session_id, and document the timeout and boundary rules in your model’s YAML description. Future you will thank present you when debugging a dashboard discrepancy.