GA4’s BigQuery export ships raw event-level data: user_pseudo_id (the device cookie) and user_id (your business identifier, when you’ve set it). When a user browses anonymously then logs in on a different device days later, GA4 treats them as separate people. Building your own stitching layer in BigQuery is the only way to recover the connected picture.
The appropriate technique depends on whether backward-in-time resolution is needed, how conservative the false-match tolerance is, and what downstream analysis will consume the stitched data.
The four techniques
Last-touch attribution (window function approach)
Last-touch assigns the most recent known user_id to all historical events from the same user_pseudo_id. This handles the typical flow: anonymous browsing followed by a login event.
WITH events_with_last_touch AS ( SELECT event_date, user_pseudo_id, user_id, event_name, event_timestamp, LAST_VALUE(user_id IGNORE NULLS) OVER ( PARTITION BY user_pseudo_id ORDER BY event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS stitched_user_id FROM `project.analytics_XXXXX.events_*` WHERE _table_suffix BETWEEN '20250101' AND '20250131')
SELECT event_date, user_pseudo_id, COALESCE(stitched_user_id, user_pseudo_id) AS resolved_user_id, event_name, TIMESTAMP_MICROS(event_timestamp) AS event_timeFROM events_with_last_touchThe IGNORE NULLS clause is essential. Without it, LAST_VALUE returns the current row’s value — null whenever user_id is absent. The window frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ensures you’re looking at all prior events in the partition.
The critical limitation: this only stitches events before the identification moment, within the date range you’re querying. If a user authenticated in December and you’re querying January, their January anonymous events won’t get stitched unless December data is in the same query. For true cross-session backstitching, you need the mapping table approach.
Full backstitching via mapping table
For complete identity resolution — applying a user_id to all events from a device regardless of when authentication occurred — build a mapping table first, then join it to events. This is what GA4 User Backstitching covers in depth.
The pattern:
- Extract all
(user_pseudo_id, user_id)pairs observed in your data whereuser_id IS NOT NULL - Resolve conflicts (multiple
user_idvalues per device) with a consistent decision rule - Join the resolved mapping back to your full events table
The result: every event from a device that ever authenticated carries the resolved user_id, regardless of whether that event happened before or after the login. Past, present, and future sessions all get connected.
First-touch attribution
First-touch uses FIRST_VALUE instead of LAST_VALUE, assigning the earliest known user_id to all events. The window frame must span the entire partition:
FIRST_VALUE(user_id IGNORE NULLS) OVER ( PARTITION BY user_pseudo_id ORDER BY event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_known_user_idUse first-touch when you’re analyzing acquisition: you want to attribute a user to their original authenticated identity, not a later account change. If someone changes their email (and therefore their user_id) six months in, first-touch ties all their events to the original identity, which is usually more useful for cohort analysis.
Last-touch is better when current identity matters — personalization, support case context, current account status.
Session-scoped stitching
Session-scoped stitching limits identity resolution to events within the same session. It’s the most conservative approach and a good starting point for validating your logic before committing to full backstitching.
WITH session_events AS ( SELECT *, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id FROM `project.analytics_XXXXX.events_*`),
stitched AS ( SELECT *, FIRST_VALUE(user_id IGNORE NULLS) OVER ( PARTITION BY user_pseudo_id, ga_session_id ORDER BY event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS session_user_id FROM session_events)
SELECT *, COALESCE(session_user_id, user_pseudo_id) AS resolved_user_id, CONCAT( COALESCE(session_user_id, user_pseudo_id), '_', CAST(ga_session_id AS STRING) ) AS unique_session_idFROM stitchedThis handles the case where a user logs in partway through a session — the login event provides the user_id, and it’s applied to all events in that session including those that came before. Note the unique_session_id construction: ga_session_id alone is not globally unique (it’s a Unix timestamp), so it must always be combined with the user identifier. See GA4 Session Key Construction for why this matters.
Choosing the right technique
| Technique | Scope | Risk level | Best for |
|---|---|---|---|
| Last-touch (window) | Forward-looking within query window | Low | Basic analytics, recent behavior, ad hoc analysis |
| First-touch | All events | Medium | Acquisition attribution, cohort analysis |
| Full backstitch | All events | Higher | Cross-device analysis, LTV calculations, Customer 360 |
| Session-scoped | Single session | Lowest | Testing stitching logic, conservative reporting |
Start with session-scoped stitching to validate your SQL and check for shared device issues. Once you’ve confirmed data quality, move to full backstitching for production.
The risk rating reflects data integrity risk, not implementation complexity. Full backstitching is “higher risk” not because it’s harder to write, but because an incorrect mapping table (from logout bugs, shared devices, or garbage user_id values) silently corrupts every downstream model that uses the stitched identity. The user_id data quality checks belong before you build the mapping table, not after.
The COALESCE fallback pattern
All four techniques share a common finishing step: when stitching fails (no user_id was ever observed for a device), fall back to user_pseudo_id as the identity:
COALESCE(resolved_user_id, user_pseudo_id) AS final_user_idThis ensures every event has a non-null identity. Anonymous users remain separate identities — tracked by their device ID rather than a business ID — rather than being collapsed into a single null group. Don’t force probabilistic matching to close this gap; the false positives from fingerprinting with GA4’s available data (device category, browser, country) are worse than accepting a lower identification rate. See Probabilistic Matching Limitations in GA4 for why.
Where stitching lives in the dbt DAG
All four techniques produce a resolved_user_id column that replaces raw user_id in downstream models. In a production dbt pipeline, the stitching layer sits between your base GA4 events model and your session/attribution marts:
base__ga4__events ↓int__ga4__identity_mapping (the mapping table — build this first) ↓int__ga4__events_stitched (apply mapping to events) ↓int__ga4__sessionsmrt__analytics__user_journeyThe identity mapping model uses merge incremental strategy (small table, updates frequently). The stitched events model uses insert_overwrite by date partition (large table, reprocess recent dates).