Counting session_start events is not a reliable way to measure sessions. The event is unreliable enough that GA4’s own reporting interface does not use it as the basis for session counts.
Why session_start Is Unreliable
The session_start event is evaluated client-side, triggered when an event includes the _ss parameter (a GA4-internal session signal). This client-side evaluation creates two distinct failure modes:
Duplicate session_start events within a single session. Under certain conditions — network interruptions, page reloads during a session, rapid navigation — GA4 fires multiple session_start events for what is logically one session. If you’re counting session_start events, you’re overcounting sessions.
Missing session_start events. In some configurations, particularly GA4 sub-properties, the session_start event may be filtered out while other session events pass through. The ga_session_id still propagates to those events (so the session still exists in the data), but there’s no corresponding session_start to count.
The sub-property case is particularly insidious: your session count from session_start will be systematically lower than the actual number of unique sessions in the data, with no obvious error signal.
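Both failure modes can be made visible by comparing the two counts side by side. The query below is a minimal sketch on invented toy rows (the user IDs, session IDs, and event names are illustrative, and ga_session_id is assumed to be already extracted into a column); in a real export you would replace the events CTE with your own table.

```sql
-- Toy data standing in for a flattened events table (hypothetical values).
WITH events AS (
  SELECT * FROM UNNEST([
    STRUCT('u1' AS user_pseudo_id, 1767225600 AS ga_session_id, 'session_start' AS event_name),
    ('u1', 1767225600, 'page_view'),
    ('u1', 1767225600, 'session_start'),  -- duplicate within the same session
    ('u1', 1767225600, 'session_start'),  -- another duplicate
    ('u2', 1767229200, 'page_view'),      -- session with no session_start at all
    ('u2', 1767229200, 'purchase'),
    ('u3', 1767232800, 'session_start'),  -- well-behaved session
    ('u3', 1767232800, 'page_view')
  ])
),
per_session AS (
  -- One row per (user, session) with the number of session_start events seen.
  SELECT
    user_pseudo_id,
    ga_session_id,
    COUNTIF(event_name = 'session_start') AS session_start_events
  FROM events
  GROUP BY user_pseudo_id, ga_session_id
)
SELECT
  COUNT(*)                          AS distinct_sessions,        -- 3
  SUM(session_start_events)         AS session_start_count,      -- 4
  COUNTIF(session_start_events > 1) AS sessions_with_duplicates, -- 1
  COUNTIF(session_start_events = 0) AS sessions_missing_start    -- 1
FROM per_session
```

Note that duplicates and missing events pull the session_start total in opposite directions, so the two totals can land close together even when individual sessions are miscounted. Checking the per-session breakdown is more informative than comparing totals alone.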
The Correct Approach: Count Distinct Session IDs
GA4’s own interface uses a different approach — it estimates session counts from unique session identifiers rather than from the session_start event. Do the same in your BigQuery queries.
The session key is a composite of user_pseudo_id and ga_session_id:
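    -- Correct: count distinct session IDs
    SELECT
      COUNT(DISTINCT CONCAT(
        user_pseudo_id,
        '.',  -- separator keeps the two parts of the key unambiguous
        CAST(
          (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
          AS STRING
        )
      )) AS sessions
    FROM `project.analytics_123456789.events_*`
    WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'

The ga_session_id value is the session's start timestamp in Unix seconds, so it is only unique per user: two users who start a session in the same second share the same value. Combining it with user_pseudo_id counts each unique combination of user and session start timestamp, which is the actual definition of a session, regardless of whether a session_start event fired once, fired multiple times, or was filtered out.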
For production models, extract the session key at the base layer so downstream queries don’t need to repeat the UNNEST logic:
    -- In your base events model
    CONCAT(
      user_pseudo_id,
      '.',
      CAST(
        (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
        AS STRING
      )
    ) AS session_key

Then counting sessions downstream is a simple COUNT(DISTINCT session_key).
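As an illustration of that downstream step, here is a minimal sketch of a daily session count. The base_events CTE and its rows are hypothetical stand-ins for a base model that already exposes event_date and session_key.

```sql
-- Hypothetical base model output: one row per event, session_key precomputed.
WITH base_events AS (
  SELECT * FROM UNNEST([
    STRUCT('20260101' AS event_date, 'u1.1767225600' AS session_key),
    ('20260101', 'u1.1767225600'),      -- second event in the same session
    ('20260101', 'u2.1767229200'),
    ('20260102', 'u1.1767312000'),
    ('20260102', CAST(NULL AS STRING))  -- event that carried no ga_session_id
  ])
)
SELECT
  event_date,
  COUNT(DISTINCT session_key) AS sessions
FROM base_events
GROUP BY event_date
ORDER BY event_date
-- 20260101 -> 2 sessions, 20260102 -> 1 session
```

Because CONCAT returns NULL when any argument is NULL, events without a ga_session_id get a NULL session_key, and COUNT(DISTINCT ...) skips them rather than lumping them into a spurious extra session. One design caveat of grouping by day: a session that spans midnight appears under both dates.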
Are session_start Events Useful At All?
They’re not entirely useless. When present, a session_start event marks when a session began, and filtering to these events is a convenient way to read the landing page, referrer, and initial traffic source. But never use their count as your session metric.
If you need session-level metrics (first page, landing traffic source, session start time), build them with window functions partitioned by session_key rather than relying on session_start events being present and accurate.
The Event-Grain Sessionization pattern handles this correctly: session attributes are derived from FIRST_VALUE() OVER (PARTITION BY session_key ORDER BY event_timestamp), which works regardless of whether session_start exists in the partition.
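A minimal sketch of that window-function approach follows. The toy rows deliberately contain no session_start event, and page_location is assumed to have been extracted into its own column at the base layer (in the raw export it lives inside event_params); all values are invented for illustration.

```sql
-- Hypothetical base model rows; note that no session_start event is present.
WITH base_events AS (
  SELECT * FROM UNNEST([
    STRUCT('u1.1767225600' AS session_key, 1767225600000000 AS event_timestamp,
           'page_view' AS event_name, '/pricing' AS page_location),
    ('u1.1767225600', 1767225660000000, 'page_view', '/signup'),
    ('u2.1767229200', 1767229200000000, 'page_view', '/blog'),
    ('u2.1767229200', 1767229290000000, 'purchase',  '/checkout')
  ])
),
with_session_attrs AS (
  SELECT
    session_key,
    -- First non-null page in the session, in event order.
    FIRST_VALUE(page_location IGNORE NULLS) OVER session_window AS landing_page,
    -- Earliest event timestamp in the session (microseconds since epoch).
    MIN(event_timestamp) OVER (PARTITION BY session_key) AS session_start_micros
  FROM base_events
  WINDOW session_window AS (
    PARTITION BY session_key
    ORDER BY event_timestamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )
)
-- Collapse from event grain to one row per session.
SELECT DISTINCT
  session_key,
  landing_page,
  TIMESTAMP_MICROS(session_start_micros) AS session_start_time
FROM with_session_attrs
ORDER BY session_key
-- u1.1767225600 -> landing_page '/pricing'
-- u2.1767229200 -> landing_page '/blog'
```

The explicit full-partition frame matters when IGNORE NULLS is used: with the default frame, rows that precede the first non-null value would still see NULL.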
Why This Matters for Reporting Consistency
If you’re building dashboards from BigQuery and your stakeholders also look at the GA4 interface, using the distinct-session-ID approach aligns your methodology with GA4’s own. You’ll still see small discrepancies between your BigQuery numbers and the interface — those are architectural and unavoidable — but you won’t introduce an additional systematic bias from counting unreliable events.
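To get a feel for how much an estimated distinct count can differ from an exact one on your own data, you can compute both in a single pass. This is only a sketch: APPROX_COUNT_DISTINCT is BigQuery's built-in approximate aggregate and is not claimed to reproduce the number the GA4 interface shows, and the table name is the same placeholder used earlier.

```sql
WITH sessions AS (
  SELECT
    CONCAT(
      user_pseudo_id,
      '.',
      CAST(
        (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
        AS STRING
      )
    ) AS session_key
  FROM `project.analytics_123456789.events_*`  -- placeholder dataset
  WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
)
SELECT
  COUNT(DISTINCT session_key)        AS sessions_exact,
  APPROX_COUNT_DISTINCT(session_key) AS sessions_approx  -- statistical estimate
FROM sessions
```

On small tables the two columns usually agree; any gap tends to show up only at larger cardinalities, which is one reason a dashboard built on exact counts may not match an estimated figure digit for digit.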
The session count from distinct IDs is an auditable definition — “unique session identifiers” — that does not depend on any single event firing correctly.