
GA4 Consent Mode Orphaned Events

How Consent Mode creates rows in GA4 BigQuery exports with null user_pseudo_id and session identifiers — what they are, how they affect counts, and same-page backstitching behavior.

ga4, bigquery, analytics, data quality

When users deny analytics_storage consent under Google’s Advanced Consent Mode, GA4 continues to collect events as “cookieless pings.” These events appear in your BigQuery export but with two critical fields stripped: user_pseudo_id is null and ga_session_id (in event_params) is null.

The result is events with no identity and no session association — orphaned rows that can’t be attributed to users or grouped into sessions.

What Orphaned Events Look Like

A standard GA4 event row has:

  • user_pseudo_id: a device identifier like 1234567.8901234
  • ga_session_id in event_params (value.int_value): a Unix timestamp like 1706400000

An orphaned cookieless ping has:

  • user_pseudo_id: NULL
  • ga_session_id in event_params: the key doesn’t exist or returns null

These rows exist in your export. They inflate your raw event count while being useless for any user-level or session-level analysis. If your event counting queries don’t filter them out, you’re overcounting events relative to your attributed analysis.
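
To gauge how many rows in a given day are orphaned, a minimal sketch along these lines may help. It reuses the placeholder table and date used elsewhere in this note and assumes orphaned rows can be identified by a null user_pseudo_id together with a missing ga_session_id parameter. The column aliases are illustrative.

```sql
-- Sketch: count orphaned rows for one export day.
-- Table name and date are placeholders; swap in your own.
WITH events AS (
  SELECT
    user_pseudo_id,
    -- The scalar subquery returns NULL when the ga_session_id key is absent.
    (SELECT value.int_value
     FROM UNNEST(event_params)
     WHERE key = 'ga_session_id') AS ga_session_id
  FROM `project.analytics_123456789.events_*`
  WHERE _TABLE_SUFFIX = '20260127'
)
SELECT
  COUNT(*) AS total_events,
  COUNTIF(user_pseudo_id IS NULL) AS null_user_events,
  COUNTIF(ga_session_id IS NULL) AS null_session_events,
  COUNTIF(user_pseudo_id IS NULL AND ga_session_id IS NULL) AS orphaned_events
FROM events;
```

If null_user_events and null_session_events differ noticeably, it may be worth inspecting the mismatched rows before settling on a single filter.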

The privacy_info RECORD tells you the consent state of each row:

privacy_info.analytics_storage -- 'Yes', 'No', or 'Unset'
privacy_info.uses_transient_token -- 'Yes' if using cookieless measurement (a string, not a boolean)

Filter or segment on these fields to separate consented from non-consented events:

-- Count only consented events
SELECT COUNT(*) AS consented_events
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20260127'
  AND privacy_info.analytics_storage = 'Yes';

-- Count all events, including orphaned pings
SELECT COUNT(*) AS total_events
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20260127';

The difference between these two numbers is the raw event volume collected without granted analytics consent, that is, rows where analytics_storage is 'No' or 'Unset'.
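
The same comparison can be made in one pass by grouping on the consent fields. This is a sketch under the same placeholder table and date. The pct_of_events alias is illustrative, and which combinations of values appear will depend on your implementation.

```sql
-- Sketch: event volume by consent state for one export day.
SELECT
  privacy_info.analytics_storage AS analytics_storage,
  privacy_info.uses_transient_token AS uses_transient_token,
  COUNT(*) AS events,
  -- Share of the day's events that falls into each consent state.
  ROUND(100 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_of_events
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20260127'
GROUP BY 1, 2
ORDER BY events DESC;
```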

Impact on Session Models

Orphaned events are the primary reason null session key filtering belongs in the earliest possible model layer. A null ga_session_id makes session key construction produce a null or malformed result, which then breaks every window function that partitions by session key.

The standard filter in your base model:

WHERE (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') IS NOT NULL
AND user_pseudo_id IS NOT NULL

This excludes orphaned consent events entirely from your sessionized models. They’re not recoverable — there’s no identifier to group them by — and including them corrupts session-level calculations.
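
To see why a null identifier breaks session keys, consider the following self-contained sketch. It assumes the session key is built by concatenating user_pseudo_id and ga_session_id, which is a common convention but not necessarily how your models do it. The two sample rows are made-up values.

```sql
-- Sketch: how a NULL identifier propagates into a session key.
-- The rows below are invented sample values, not real export data.
WITH sample_events AS (
  SELECT '1234567.8901234' AS user_pseudo_id, 1706400000 AS ga_session_id
  UNION ALL
  SELECT CAST(NULL AS STRING), CAST(NULL AS INT64)
)
SELECT
  user_pseudo_id,
  ga_session_id,
  -- CONCAT returns NULL if any argument is NULL.
  CONCAT(user_pseudo_id, '.', CAST(ga_session_id AS STRING)) AS session_key
FROM sample_events;
```

The first row yields the key 1234567.8901234.1706400000. The second yields NULL, so every orphaned row would land in the same NULL partition of any window function that partitions by session_key.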

Same-Page Backstitching

GA4 implements a specific behavior when a user denies consent but then grants it later in the same page session: the system “backstitches” consent to previously denied hits on that page.

You’ll observe this as user_pseudo_id becoming populated on events that, in an earlier state of processing, had null identifiers. This is a positive signal — more events become attributable — but it creates a pipeline timing consideration.

If you’re running incremental models with a tight lookback window, backstitching can mean that today’s processing run and tomorrow’s processing run see different data for the same events. A one-day lookback is insufficient. GA4 can keep updating daily export tables for up to 72 hours after the event date to absorb late-arriving events, and consent backstitching is one of the strongest reasons to reprocess at least that window.

Configure your incremental model accordingly:

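{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'}
) }}

-- 'base_ga4__events' is a placeholder model name. event_date is assumed to be
-- a DATE here (the raw export stores it as a 'YYYYMMDD' string).
SELECT *
FROM {{ ref('base_ga4__events') }}
{% if is_incremental() %}
-- Reprocess the last 7 days so backstitched identifiers are picked up.
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
{% endif %}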

A 7-day lookback comfortably exceeds the 72-hour window, so it covers ordinary late-arriving events as well as consent backstitching.
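
One way to check whether identifiers are being filled in after the fact is to run a query like the following on consecutive days and compare the results for the same event dates. This is a sketch only. It uses the placeholder table from earlier, and whether the counts actually change between runs is something to verify against your own property.

```sql
-- Sketch: rows with a NULL user_pseudo_id per day over the last 7 days.
-- Re-run on later days; a falling null_user_events count for the same
-- event_day would suggest identifiers were populated after first export.
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_day,
  COUNT(*) AS total_events,
  COUNTIF(user_pseudo_id IS NULL) AS null_user_events
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
    AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
GROUP BY event_day
ORDER BY event_day;
```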

Why Your BigQuery Numbers Are Lower Than the GA4 Interface

Consent Mode is one of the primary causes of the systematic discrepancy between BigQuery and GA4 interface numbers, particularly for European properties with strict consent banners.

The GA4 interface includes modeled data from Consent Mode. When users deny consent, GA4 uses machine learning to estimate their likely behavior based on patterns from consenting users. These modeled conversions and sessions appear in the interface’s reports.

This modeled data never appears in BigQuery. Your BigQuery export contains only observed events from consenting users (plus orphaned pings from denied users, which you filter out).

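The gap grows with your consent rejection rate. A European e-commerce site with aggressive consent management might see 35% consent rejection, which leaves BigQuery with usable data for roughly 65% of traffic. If GA4’s modeling recovered all of the rejected traffic, the interface would report about 1 / 0.65 ≈ 1.54 times as many sessions as BigQuery can account for; in practice the modeled figure depends on how much of that traffic GA4 is able to model. This isn’t an error in either system. It’s the designed behavior.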

There is no way to reconstruct GA4’s behavioral model from BigQuery data. Document this expectation for stakeholders: “BigQuery reports consented traffic only. The GA4 interface includes modeled estimates for non-consenting users. The difference reflects your consent rejection rate, not a data quality issue.”

Basic vs. Advanced Consent Mode

The orphaned event behavior only occurs under Advanced Consent Mode. Under Basic Consent Mode, GA4 doesn’t fire any events when consent is denied — tags don’t fire at all until consent is granted.

With Basic Mode, your BigQuery export is cleaner: no orphaned rows, no null identifiers. The trade-off is that you lose the signal about denied-consent traffic volume entirely. Advanced Mode gives you the orphaned rows (which are analytically useless for sessionization but tell you consent was denied), plus the ability for GA4 to model that traffic in the interface.

Many implementations use Advanced Mode, so many GA4 BigQuery exports contain orphaned events. Treat the null filter at the base model layer as standard practice, not an edge case.