The user mart answers the question that session-level analysis can’t: what does a user’s complete journey across all their sessions look like? First acquisition channel, lifetime revenue, recency, session count — these require aggregating session data to user grain and resolving the identity problem that comes with GA4’s device-bound identifiers.
The Identity Problem
GA4’s user_pseudo_id is a device-level identifier — a first-party cookie tied to the browser or app instance. The same person using their phone and laptop creates two user_pseudo_id values. Sessions from different devices are separate in GA4’s data model.
When users authenticate (login, purchase, signup), GA4 can record a user_id alongside the user_pseudo_id. This authenticated identifier provides the bridge between devices. A user who browses on mobile and purchases on desktop appears as two anonymous device IDs until authentication links them.
The user mart handles two scenarios:
- Unauthenticated users: Only
user_pseudo_id. Each device is a separate “user” in the mart. - Authenticated users: A
user_idcan be stitched across multipleuser_pseudo_idvalues.
The Model
-- models/marts/ga4/mrt__analytics__users.sql
{{ config( materialized='table', cluster_by=['user__first_seen_at'] )}}
WITH user_sessions AS (
SELECT user__pseudo_id, user__id, session__key, event__date, session__started_at, session__channel_grouping, device__category, geo__country, session__has_purchase, session__revenue FROM {{ ref('mrt__analytics__sessions') }}
),
user_first_last AS (
SELECT user__pseudo_id,
-- First seen MIN(event__date) AS user__first_seen_at, FIRST_VALUE(session__channel_grouping) OVER ( PARTITION BY user__pseudo_id ORDER BY event__date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS user__first_channel, FIRST_VALUE(device__category) OVER ( PARTITION BY user__pseudo_id ORDER BY event__date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS user__first_device,
-- Last seen MAX(event__date) AS user__last_seen_at, LAST_VALUE(session__channel_grouping) OVER ( PARTITION BY user__pseudo_id ORDER BY event__date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS user__last_channel,
-- Aggregates COUNT(DISTINCT session__key) AS user__sessions, MAX(session__has_purchase) AS user__has_purchased, SUM(session__revenue) AS user__lifetime_revenue
FROM user_sessions GROUP BY user__pseudo_id
),
-- Stitch authenticated identityuser_identity AS (
SELECT DISTINCT user__pseudo_id, FIRST_VALUE(user__id IGNORE NULLS) OVER ( PARTITION BY user__pseudo_id ORDER BY event__date DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS user__stitched_id FROM user_sessions WHERE user__id IS NOT NULL
)
SELECT u.user__pseudo_id, i.user__stitched_id AS user__id, u.user__first_seen_at, u.user__last_seen_at, u.user__first_channel, u.user__first_device, u.user__last_channel, u.user__sessions, u.user__has_purchased, u.user__lifetime_revenue,
-- Derived metrics DATE_DIFF(u.user__last_seen_at, u.user__first_seen_at, DAY) AS user__lifespan_days, DATE_DIFF(CURRENT_DATE(), u.user__last_seen_at, DAY) AS user__days_since_last_visit
FROM user_first_last uLEFT JOIN user_identity i ON u.user__pseudo_id = i.user__pseudo_idThe Identity Stitching Logic
The user_identity CTE implements a simple form of user backstitching:
FIRST_VALUE(user__id IGNORE NULLS) OVER ( PARTITION BY user__pseudo_id ORDER BY event__date DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS user__stitched_idORDER BY event__date DESC with FIRST_VALUE means “take the most recent user_id associated with this user_pseudo_id.” IGNORE NULLS skips sessions where the user wasn’t authenticated. The DISTINCT in the outer query deduplicates the result to one row per user_pseudo_id.
This approach handles the case where a user authenticates with a user_id partway through their history — all their sessions, even earlier anonymous ones, get the stitched identity.
The LEFT JOIN back to user_first_last ensures users who never authenticated still appear in the mart, with a null user__id. You’re not losing unauthenticated users — they just don’t have a stitched identifier.
First and Last Touch Attribution
The window functions on user__first_channel and user__last_channel provide the two most common attribution questions at user grain:
user__first_channel: What channel originally acquired this user? Useful for acquisition cohort analysis — how do users from Paid Search compare to Organic over their lifetime?user__last_channel: What channel drove the most recent visit? Useful for re-engagement analysis.
Both use FIRST_VALUE and LAST_VALUE respectively, with explicit full-partition framing. The pattern mirrors the session-level landing/exit page logic from the sessionized model.
Note: these are first-touch and last-touch at the session level — the first/last session channel, not the first/last event UTM parameter. For finer-grained attribution, you’d need to look at event-level sources from the sessionized table.
Derived Metrics
Two derived columns make the mart immediately useful for retention analysis:
DATE_DIFF(u.user__last_seen_at, u.user__first_seen_at, DAY) AS user__lifespan_daysUsers with user__lifespan_days = 0 are one-session users — they visited once and never returned. High lifespan with low session count suggests infrequent but loyal users. High lifespan with high session count is your engaged core.
DATE_DIFF(CURRENT_DATE(), u.user__last_seen_at, DAY) AS user__days_since_last_visitThis is recency in an RFM (Recency, Frequency, Monetary) framework. Combined with user__sessions (frequency) and user__lifetime_revenue (monetary), the mart has the three components for RFM segmentation without any additional joins.
Limitations
One mart per user_pseudo_id, not per person. The mart groups by user_pseudo_id, which is device-bound. A single person using three devices appears as three rows unless they authenticated on all three. Cross-device identity resolution beyond user_id stitching requires probabilistic matching or first-party data not available in GA4 exports.
The mart snapshots current state. With materialized='table', the mart rebuilds entirely on each run. Lifetime metrics are always current. If you need point-in-time snapshots (what was a user’s lifetime value on a specific date?), you’d need to add date-based incremental logic or maintain history via dbt snapshots.
No SCD tracking. User attributes like user__first_channel are computed from historical data and stable. But user__last_channel and user__days_since_last_visit change on every run. If attribution changes are important to track, implement a Type 2 SCD via dbt snapshots.