
Identity Resolution Monitoring

Key metrics and anomaly detection SQL for monitoring a GA4 identity stitching pipeline — stitch rate, consolidation rate, shared device exposure, and week-over-week change alerts.

Planted
ga4 · bigquery · dbt · analytics · data quality

Identity resolution bugs produce no pipeline errors. A tracking change that starts sending the literal string 'null' as a user_id makes the stitch rate spike; a broken login event makes it degrade — and neither raises any system alert. Monitoring covers two layers: daily health metrics that establish the normal operating state, and anomaly detection that surfaces week-over-week changes.

Daily health metrics

Build a dashboard query that tracks these core metrics daily:

WITH daily_stats AS (
  SELECT
    event_date,
    COUNT(*) AS total_events,
    COUNTIF(is_identified) AS identified_events,
    COUNTIF(is_shared_device) AS shared_device_events,
    COUNT(DISTINCT resolved_user_id) AS unique_users,
    COUNT(DISTINCT user_pseudo_id) AS unique_devices,
    COUNT(DISTINCT session_id) AS unique_sessions
  FROM {{ ref('int__ga4__events_stitched') }}
  WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY event_date
)
SELECT
  event_date,
  total_events,
  unique_users,
  unique_devices,
  unique_sessions,
  -- Stitch rate: % of events with a resolved identity
  ROUND(identified_events * 100.0 / total_events, 2) AS stitch_rate_pct,
  -- Consolidation rate: how much stitching reduces your apparent user count
  ROUND((unique_devices - unique_users) * 100.0 / unique_devices, 2) AS consolidation_rate_pct,
  -- Shared device exposure: events that may have misattributed identity
  ROUND(shared_device_events * 100.0 / total_events, 2) AS shared_device_pct
FROM daily_stats
ORDER BY event_date DESC

Stitch rate measures what percentage of events carry a resolved user_id. Your baseline depends on how much of your traffic is authenticated. For a SaaS product where most sessions require login, expect 60-80%. For a content site with optional authentication, expect 10-30%. What matters is consistency: a sudden drop signals that the user_id implementation broke somewhere; a sudden spike signals the string 'null' bug, where garbage values are being counted as identified.
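When the stitch rate does spike, a quick triage query against the source events confirms whether literal junk values are arriving as user_id. A sketch, assuming the same GA4 source used below; the list of garbage strings is an assumption — extend it with whatever your tag layer might emit:

```sql
-- Hypothetical triage query: how often do known garbage strings
-- arrive as user_id, and since when?
SELECT
  user_id,
  MIN(event_date) AS first_seen,
  COUNT(*) AS events,
  COUNT(DISTINCT user_pseudo_id) AS devices
FROM {{ source('ga4', 'events') }}
WHERE LOWER(user_id) IN ('null', 'undefined', 'nan', '', '0', 'false')
GROUP BY user_id
ORDER BY events DESC
```

A non-empty result with a recent first_seen date usually pins the regression to a specific deployment.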

Consolidation rate tells you how much stitching reduces the raw device count. If you have 10,000 unique devices and 8,000 unique resolved users, you’ve consolidated 20% of devices into known users. This metric should grow slowly as more devices authenticate. A sudden large change indicates either a data quality issue or a significant change in user behavior.

Shared device percentage tells you how much of your data carries the risk of misattribution. For most consumer sites this is small (under 2%). For B2B tools used in shared environments or retail applications, it may be higher.

Anomaly detection

Set up alerts for week-over-week changes that exceed your tolerance:

-- Alert: stitch rate dropped more than 10% week-over-week
WITH weekly AS (
  SELECT
    DATE_TRUNC(event_date, WEEK) AS week,
    COUNTIF(is_identified) * 100.0 / COUNT(*) AS stitch_rate
  FROM {{ ref('int__ga4__events_stitched') }}
  WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 8 WEEK)
  GROUP BY 1
)
SELECT
  week,
  stitch_rate,
  LAG(stitch_rate) OVER (ORDER BY week) AS prev_week_rate,
  stitch_rate - LAG(stitch_rate) OVER (ORDER BY week) AS week_over_week_change
FROM weekly
ORDER BY week DESC

Run this as a dbt test with severity warn when the absolute change exceeds 5 percentage points and error when it exceeds 10 points. A 10-point stitch rate drop in a week is almost always an implementation issue.
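One way to wire that up (a sketch, not the only option) is a pair of dbt singular tests that return the offending weeks — dbt treats any returned row as a failure. The filename and thresholds below are illustrative; duplicate the file with a higher threshold and the default error severity for the hard limit:

```sql
-- tests/stitch_rate_wow_change_warn.sql (hypothetical filename)
{{ config(severity='warn') }}

WITH weekly AS (
  SELECT
    DATE_TRUNC(event_date, WEEK) AS week,
    COUNTIF(is_identified) * 100.0 / COUNT(*) AS stitch_rate
  FROM {{ ref('int__ga4__events_stitched') }}
  WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 8 WEEK)
  GROUP BY 1
)
SELECT week, stitch_rate, prev_week_rate
FROM (
  SELECT
    week,
    stitch_rate,
    LAG(stitch_rate) OVER (ORDER BY week) AS prev_week_rate
  FROM weekly
)
-- Warn threshold: 5 points. A sibling test with > 10 and
-- severity 'error' covers the hard limit.
WHERE ABS(stitch_rate - prev_week_rate) > 5
```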

Data quality checks on the identity mapping

Two checks catch problems in the mapping table before they propagate:

-- Check 1: Devices with implausibly many user_id values
SELECT
  'Excessive user_ids per device' AS issue,
  COUNT(*) AS affected_devices
FROM {{ ref('int__ga4__identity_mapping') }}
WHERE user_id_count > 50

UNION ALL

-- Check 2: Suspicious high-cardinality user_ids in source data
SELECT
  'Suspiciously common user_id' AS issue,
  COUNT(DISTINCT user_pseudo_id) AS affected_devices
FROM {{ source('ga4', 'events') }}
WHERE user_id IN (
  SELECT user_id
  FROM {{ source('ga4', 'events') }}
  WHERE user_id IS NOT NULL
  GROUP BY user_id
  HAVING COUNT(DISTINCT user_pseudo_id) > 500
)

The first check complements the dbt schema test on user_id_count: the schema test enforces an accepted_range with a ceiling of 100, while this check watches for the same pattern at a lower threshold of 50, catching gradual accumulation before the hard limit trips. The second check scans source data for the string 'null' pattern before it enters the pipeline.
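For reference, the schema test mentioned above might look like this in the mapping model's YAML — a sketch assuming dbt_utils, with the file path hypothetical and the bounds mirroring the numbers used in this note:

```yaml
# models/intermediate/schema.yml (hypothetical path)
models:
  - name: int__ga4__identity_mapping
    columns:
      - name: user_id_count
        tests:
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 100
```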

What the metrics reveal

Stitch rate drops sharply → user_id stopped being sent on login pages. Check for a recent deployment that broke the gtag or dataLayer push. Also check whether a new authentication flow is missing the user_id call.

Stitch rate spikes sharply → The string null bug appeared. Someone is now sending 'null' or another garbage value. Query for user_id values with implausibly high device counts.

Consolidation rate drops → More devices are mapping to unique users rather than consolidating. Could be new devices from an expanding user base (good) or cookie reset issues reducing recognized devices (bad). Check your fragmentation rate query alongside this.

Shared device percentage spikes → More devices are mapping to multiple user_id values. Could be a legitimate business reason (new retail kiosk deployment, shared team accounts) or a tracking bug affecting how user_id is set on logout.

User count diverges from device count in an unexpected direction → If unique users exceed unique devices, your session ID construction has a bug that's creating session collisions. A session should never contain events from more than one device. Run the session key validation query to diagnose.
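The session key validation query referenced above isn't shown in this note; a minimal sketch (assuming session_id is built from user_pseudo_id plus the GA4 ga_session_id parameter, which is the usual construction) flags any session attached to more than one device:

```sql
-- Sessions spanning multiple devices indicate a session key collision.
SELECT
  session_id,
  COUNT(DISTINCT user_pseudo_id) AS device_count
FROM {{ ref('int__ga4__events_stitched') }}
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY session_id
HAVING COUNT(DISTINCT user_pseudo_id) > 1
```

Any rows returned mean the session key is not device-scoped, and the user/device divergence follows from that.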

Connecting to your broader data quality infrastructure

These metrics belong in your dbt testing infrastructure alongside your other data quality checks. The stitch rate anomaly detection can run as a dbt test using dbt-utils expression_is_true or as a custom generic test:

# In your schema.yml
models:
  - name: int__ga4__events_stitched
    tests:
      - dbt_utils.expression_is_true:
          expression: "stitch_rate_pct > 0"
          condition: "event_date = CURRENT_DATE() - 1"

For teams using Elementary, applying its anomaly detection tests (for example elementary.column_anomalies) to stitch_rate_pct in your monitoring mart gives you automatic detection without writing threshold logic yourself.
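A sketch of what that could look like, assuming a monitoring mart model (the name mart__identity_monitoring is hypothetical) and Elementary's column_anomalies test with its average monitor:

```yaml
models:
  - name: mart__identity_monitoring  # hypothetical mart model
    columns:
      - name: stitch_rate_pct
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - average
```

Elementary learns the metric's normal range from history, so the warn/error thresholds above become a fallback rather than the primary alert.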

The goal is to catch identity resolution degradation within 24-48 hours, before it corrupts downstream attribution models, LTV calculations, or Customer 360 tables that consume the stitched data.