The most common mistake in GA4 sessionization is using ga_session_id alone as the session identifier. This field is stored in the event_params array as an integer value, and it represents the Unix timestamp (in seconds) when the session started. The problem: multiple users can start sessions at the exact same second.
On any property with meaningful traffic, timestamp collisions are effectively certain, because every user who starts a session in the same second receives the same ga_session_id. Using only ga_session_id groups events from different users into the same “session,” corrupting every downstream metric. Attribution becomes wrong when a session contains events from users who arrived via different channels. Session duration inflates. Conversion rates distort.
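To see how often this happens in your own data, a rough check along the following lines may help. It is only a sketch: the table path `my-project.analytics_123456789.events_*` and the date range are placeholders, not values from this article, and should be replaced with your own export dataset.

```sql
-- Sketch: find ga_session_id values shared by more than one user.
-- The table path and date range are hypothetical placeholders.
WITH extracted AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value
     FROM UNNEST(event_params)
     WHERE key = 'ga_session_id') AS ga_session_id
  FROM `my-project.analytics_123456789.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240107'
)
SELECT
  ga_session_id,
  COUNT(DISTINCT user_pseudo_id) AS distinct_users
FROM extracted
WHERE ga_session_id IS NOT NULL
GROUP BY ga_session_id
HAVING COUNT(DISTINCT user_pseudo_id) > 1
ORDER BY distinct_users DESC
LIMIT 100;
```

Every row returned is a ga_session_id that would merge several users into one "session" if it were used as the key on its own.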
The Correct Pattern
The fix is mechanical: concatenate user_pseudo_id with ga_session_id to create a truly unique session key.
    CONCAT(
      user_pseudo_id,
      '.',
      CAST(
        (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
        AS STRING
      )
    ) AS session_key

This session_key becomes the foundation for all window functions in the sessionized model. Every session-scoped calculation (timestamps, attribution, event ordering) partitions by this field.
The dot separator is a convention, not a requirement. Some teams use underscores or hyphens. What matters is that the concatenation produces a value that’s unique across all users and sessions.
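As an illustration of how the key is typically used, the sketch below builds session_key in a base CTE and then partitions two window functions by it. The table path and the single-day suffix are placeholders chosen for brevity, and the output column names session_start_ts and event_sequence are made up for this example.

```sql
-- Sketch: session-scoped window functions partitioned by the composite key.
-- Table path and date are hypothetical; a real model would cover a date range
-- so that sessions crossing midnight are not split.
WITH base AS (
  SELECT
    CONCAT(
      user_pseudo_id,
      '.',
      CAST(
        (SELECT value.int_value
         FROM UNNEST(event_params)
         WHERE key = 'ga_session_id') AS STRING
      )
    ) AS session_key,
    event_name,
    event_timestamp
  FROM `my-project.analytics_123456789.events_*`
  WHERE _TABLE_SUFFIX = '20240101'
)
SELECT
  session_key,
  event_name,
  event_timestamp,
  -- Earliest event timestamp (microseconds) within the session
  MIN(event_timestamp) OVER (PARTITION BY session_key) AS session_start_ts,
  -- Position of the event within its session; ties on event_timestamp
  -- are ordered arbitrarily
  ROW_NUMBER() OVER (
    PARTITION BY session_key
    ORDER BY event_timestamp
  ) AS event_sequence
FROM base
WHERE session_key IS NOT NULL;
```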
The int_value Trap
ga_session_id lives in value.int_value, not value.string_value. This trips up people who reflexively extract event parameters as strings. Extracting from string_value returns nulls, and those nulls cascade through your session key construction, producing either null session keys (which get filtered out) or sessions where all null-key events merge into one giant “session.”
    -- WRONG: Returns NULL
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')

    -- CORRECT: Returns the integer session ID
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')

Always cast to STRING after extraction for the concatenation.
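If you are unsure which value field a parameter uses, one way to check is to count the populated fields per key. The query below is a sketch under assumptions: the table path and date are placeholders, and the extra keys (ga_session_number, page_location) are included only for comparison.

```sql
-- Sketch: see which value.* field is populated for selected parameters.
-- Table path and date are hypothetical placeholders.
SELECT
  ep.key,
  COUNTIF(ep.value.int_value IS NOT NULL) AS int_populated,
  COUNTIF(ep.value.string_value IS NOT NULL) AS string_populated,
  COUNTIF(ep.value.double_value IS NOT NULL) AS double_populated
FROM `my-project.analytics_123456789.events_*`,
  UNNEST(event_params) AS ep
WHERE _TABLE_SUFFIX = '20240101'
  AND ep.key IN ('ga_session_id', 'ga_session_number', 'page_location')
GROUP BY ep.key
ORDER BY ep.key;
```

For ga_session_id, only the int_populated column should be non-zero.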
Edge Cases That Produce Null Sessions
Events can arrive without a ga_session_id in several scenarios:
Consent rejection. When users reject tracking consent, both ga_session_id and user_pseudo_id become null. These events carry no session identity and must be filtered out of your sessionized model. They’re not recoverable — there’s no identifier to group them by.
Measurement Protocol hits. Server-side events sent via Measurement Protocol may lack session context if the implementation doesn’t pass session_id and client_id. This is an implementation gap, not a GA4 limitation.
Subproperty filtering. GA4 subproperties can filter out session_start events while keeping other events. The session ID still propagates to the remaining events, but verify this against your own property configuration rather than assuming it.
Filter these events in your base model to prevent null session keys from propagating downstream:
    WHERE (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') IS NOT NULL

This filter belongs in the earliest possible model. Null session keys break every window function that partitions by session_key, producing meaningless results rather than errors.
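Before applying the filter, it can be worth measuring how many events it would remove. The following sketch counts events missing either half of the session key, since CONCAT returns NULL when any argument is NULL. The table path and date range are placeholders, and the output column names are invented for this example.

```sql
-- Sketch: quantify events that cannot receive a session key.
-- Table path and date range are hypothetical placeholders.
WITH extracted AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value
     FROM UNNEST(event_params)
     WHERE key = 'ga_session_id') AS ga_session_id
  FROM `my-project.analytics_123456789.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240107'
)
SELECT
  COUNT(*) AS total_events,
  COUNTIF(ga_session_id IS NULL) AS missing_session_id,
  COUNTIF(user_pseudo_id IS NULL) AS missing_user_pseudo_id,
  -- Events whose composite key would be NULL
  COUNTIF(ga_session_id IS NULL OR user_pseudo_id IS NULL) AS missing_either,
  ROUND(
    SAFE_DIVIDE(
      COUNTIF(ga_session_id IS NULL OR user_pseudo_id IS NULL),
      COUNT(*)
    ) * 100,
    2
  ) AS pct_missing_either
FROM extracted;
```

A sudden jump in pct_missing_either between date ranges is a hint that consent configuration or a Measurement Protocol implementation has changed.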
Cross-Device Sessions
The user_pseudo_id is device-specific. It’s a first-party cookie identifier tied to the browser or app instance where it was set. The same person browsing on their phone and later purchasing on their laptop creates two separate pseudo IDs and therefore two separate sessions.
These sessions can’t be linked at the sessionization layer. Connecting them requires identity resolution — matching anonymous device identifiers to a known user identity when authentication events occur. That’s a separate problem from sessionization and typically involves a user stitching model that maps user_pseudo_id values to user_id values.
Until a user authenticates on both devices, their sessions remain separate. This is a fundamental limitation of device-bound identifiers, not a flaw in the session key construction.
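For orientation, a very simple stitching map might look like the sketch below. It assumes the user_id column is populated on authenticated events and resolves each user_pseudo_id to the most recently seen user_id. That rule is an illustrative choice, not a recommendation from this article, and the table path and date range are placeholders.

```sql
-- Sketch: map each device identifier to its most recently observed user_id.
-- Table path, date range, and the "latest wins" rule are assumptions.
WITH id_pairs AS (
  SELECT
    user_pseudo_id,
    user_id,
    MAX(event_timestamp) AS last_seen_ts
  FROM `my-project.analytics_123456789.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
    AND user_pseudo_id IS NOT NULL
    AND user_id IS NOT NULL
  GROUP BY user_pseudo_id, user_id
)
SELECT
  user_pseudo_id,
  -- Keep the user_id with the latest activity for this device
  ARRAY_AGG(user_id ORDER BY last_seen_ts DESC LIMIT 1)[OFFSET(0)] AS resolved_user_id
FROM id_pairs
GROUP BY user_pseudo_id;
```

Joining this map onto sessionized events adds a person-level identifier without changing the session key itself.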
Validating Your Session Key
A quick validation query confirms your session key is working correctly:
    SELECT
      session_key,
      COUNT(DISTINCT user_pseudo_id) AS distinct_users
    FROM sessionized_events
    GROUP BY session_key
    HAVING COUNT(DISTINCT user_pseudo_id) > 1

This should return zero rows. If it returns results, your session key is allowing events from multiple users to share a session, most likely because you’re using ga_session_id alone or your concatenation logic has a bug.
A second check confirms sessions have reasonable characteristics:
    SELECT
      APPROX_QUANTILES(event_count, 100)[OFFSET(50)] AS median_events,
      APPROX_QUANTILES(event_count, 100)[OFFSET(95)] AS p95_events,
      APPROX_QUANTILES(duration_seconds, 100)[OFFSET(50)] AS median_duration,
      APPROX_QUANTILES(duration_seconds, 100)[OFFSET(95)] AS p95_duration
    FROM (
      SELECT
        session_key,
        COUNT(*) AS event_count,
        (MAX(event_timestamp) - MIN(event_timestamp)) / 1000000 AS duration_seconds
      FROM sessionized_events
      GROUP BY session_key
    )

Unreasonably high event counts or durations suggest session key collisions or missing filters. A session with 10,000 events spanning 24 hours is almost certainly multiple sessions merged together.
Relationship to Custom Sessionization
The composite key described here works with GA4’s built-in session definition (30-minute inactivity timeout). If you need custom session boundaries — different timeouts, campaign-based splits — you’ll build your own session identifiers using the LAG + running SUM pattern. In that case, ga_session_id becomes just a reference point, and your custom session key replaces the composite key for all downstream analysis.
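The LAG + running SUM pattern can be sketched roughly as follows. The 15-minute timeout, the table path, the single-day suffix, and the custom_session_key column name are all assumptions made for illustration. A production model would span multiple days so that sessions crossing midnight stay intact.

```sql
-- Sketch: custom sessionization with a hypothetical 15-minute inactivity timeout.
-- Table path and date are placeholders.
WITH ordered AS (
  SELECT
    user_pseudo_id,
    event_name,
    event_timestamp,
    -- Timestamp of the same user's previous event (NULL for their first event)
    LAG(event_timestamp) OVER (
      PARTITION BY user_pseudo_id
      ORDER BY event_timestamp
    ) AS prev_ts
  FROM `my-project.analytics_123456789.events_*`
  WHERE _TABLE_SUFFIX = '20240101'
    AND user_pseudo_id IS NOT NULL
),
flagged AS (
  SELECT
    *,
    -- 1 when the gap since the previous event exceeds the timeout
    -- (event_timestamp is in microseconds)
    IF(
      prev_ts IS NULL OR event_timestamp - prev_ts > 15 * 60 * 1000000,
      1,
      0
    ) AS is_new_session
  FROM ordered
)
SELECT
  user_pseudo_id,
  event_name,
  event_timestamp,
  CONCAT(
    user_pseudo_id,
    '.',
    CAST(
      -- Running count of session starts; the RANGE frame gives events with
      -- identical timestamps the same session number
      SUM(is_new_session) OVER (
        PARTITION BY user_pseudo_id
        ORDER BY event_timestamp
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
      ) AS STRING
    )
  ) AS custom_session_key
FROM flagged;
```

The key here is a per-user session counter rather than a timestamp, so it still needs the user_pseudo_id prefix to be unique across users.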
For most GA4 analytics work, the composite user_pseudo_id.ga_session_id key is sufficient. Custom sessionization is only necessary when the GA4 session definition doesn’t match business requirements.