A Customer 360 model unifies data about a single customer from multiple source systems into one row. The challenge isn’t the final wide table — it’s building the identity bridge that connects systems with no shared key. The standard three-layer pattern (base, intermediate, mart) needs an explicit identity resolution layer to make this work.
The DAG structure
The architecture adds an identity layer between base and mart models:
Sources: ga4.events → base__ga4__events crm.contacts → base__crm__contacts crm.deals → base__crm__deals
Identity: base__ga4 + base__crm → int__identity_resolved → int__identity_device_bridged
Marts: int__identity + base__ga4 → mrt__marketing__web_sessions mrt__web_sessions + base__deals → mrt__marketing__attributed_touchpoints all sources + identity → mrt__core__customer_360This is the same three layers, not four. The identity models are intermediate models — they join base models together and add computed fields without changing the fundamental grain (one row per identity pair). They just happen to sit in a subdirectory that makes their special purpose obvious.
Base models
Base models do the standard work: flatten the GA4 event_params (UNNEST patterns for UTMs, page locations, session IDs), rename CRM columns to business-friendly names, and filter _fivetran_deleted records. One model per source table, materialized as views or tables.
For GA4 events specifically, the base model extracts the key fields needed for identity resolution:
-- base__ga4__events.sqlSELECT event_date, event_timestamp, event_name, user_pseudo_id, user_id, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id, privacy_info.analytics_storage AS consent__analytics_storage, -- ... other fieldsFROM {{ source('ga4', 'events') }}WHERE (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') IS NOT NULLNote the privacy_info.analytics_storage extraction. This becomes critical when your models need to respect consent requirements — only events with granted consent should flow into identity resolution that links to CRM data.
The identity layer
This is the new piece that standard dbt projects don’t have. Two models handle it:
int__identity_resolved joins form submission data (where user_pseudo_id was captured alongside an email), login events (where user_id was set), and CRM contact records. The output is a mapping table: one row per known identity pair.
-- int__identity_resolved.sql{{ config( materialized='incremental', incremental_strategy='merge', unique_key=['identifier_type', 'identifier_value']) }}
WITH form_identities AS ( -- user_pseudo_id → email → crm_contact_id SELECT DISTINCT user_pseudo_id, crm.contact_id AS crm_contact_id, 'form_submission' AS identity_source, form.submitted_at AS identified_at FROM {{ ref('base__form_submissions') }} form INNER JOIN {{ ref('base__crm__contacts') }} crm ON LOWER(TRIM(form.email)) = LOWER(TRIM(crm.email)) WHERE form.user_pseudo_id IS NOT NULL),
login_identities AS ( -- user_id → user_pseudo_id, linked via backstitching SELECT DISTINCT events.user_pseudo_id, crm.contact_id AS crm_contact_id, 'login' AS identity_source, events.event_timestamp AS identified_at FROM {{ ref('base__ga4__events') }} events INNER JOIN {{ ref('base__crm__contacts') }} crm ON events.user_id = crm.external_id WHERE events.user_id IS NOT NULL),
all_identities AS ( SELECT * FROM form_identities UNION ALL SELECT * FROM login_identities)
SELECT user_pseudo_id, crm_contact_id, identity_source, identified_atFROM all_identitiesQUALIFY ROW_NUMBER() OVER ( PARTITION BY user_pseudo_id ORDER BY identified_at DESC) = 1int__identity_device_bridged fans this out so you can go from any identifier to the canonical customer_id. If you use dbt_utils.generate_surrogate_key(), hash a combination of source system and source ID:
{{ dbt_utils.generate_surrogate_key(['source_system', 'source_id']) }} AS customer__surrogate_keyThese models use the merge incremental strategy since identity mappings are relatively small (at most one row per known identity pair) but update frequently as new form submissions and logins create new mappings.
Mart models
Mart models consume the identity bridge. The key model is mrt__core__customer_360 — the wide table that combines contact information from the CRM, deal aggregates, engagement metrics, and web analytics summaries into a single row per customer.
-- mrt__core__customer_360.sql{{ config( materialized='incremental', incremental_strategy='merge', unique_key='customer_id') }}
SELECT identity.crm_contact_id AS customer_id,
-- CRM contact fields (priority-based resolution) COALESCE(crm.email, product.email) AS customer__email, COALESCE(crm.company_name, product.company_name) AS customer__company_name, crm.job_title AS customer__job_title,
-- Deal aggregates deals.total_deal_value, deals.deals_won, deals.latest_deal_closed_at,
-- Web analytics summaries web.total_sessions, web.total_page_views, web.first_seen_at, web.last_seen_at, web.days_active
FROM {{ ref('int__identity_device_bridged') }} identityLEFT JOIN {{ ref('base__crm__contacts') }} crm ON identity.crm_contact_id = crm.contact_idLEFT JOIN {{ ref('int__deal_aggregates') }} deals ON identity.crm_contact_id = deals.contact_idLEFT JOIN {{ ref('int__web_analytics_summary') }} web ON identity.user_pseudo_id = web.user_pseudo_idThe conflict resolution patterns (COALESCE for priority-based, recency comparisons, source-specific fields) all happen in this mart or in the intermediate models that feed it.
mrt__core__customer_360 also uses merge, keyed on the canonical customer_id. It’s a wide table but rarely exceeds a few million rows, so full refreshes are an option if merge costs become a concern.
Materialization choices
The different model types in the DAG have different materialization needs:
| Model | Strategy | Rationale |
|---|---|---|
base__ga4__events | insert_overwrite, 3-day lookback | GA4 events arrive late; re-processing last 3 partitions catches stragglers |
int__identity_resolved | merge on identifier pair | Small table, frequent updates, needs deduplication |
int__identity_device_bridged | merge on identifier pair | Same pattern as identity_resolved |
mrt__core__customer_360 | merge on customer_id | Wide but small; full refresh is viable fallback |
mrt__marketing__web_sessions | insert_overwrite on session date | Time-partitioned fact table |
For CRM mart tables, partition by date fields (created_at, closed_at) and cluster by frequently filtered columns (account_id, owner_id). Clustering alone can reduce scanned data by 88% or more for incremental models on BigQuery.
The JaffleGaggle reference pattern
The dbt Labs “JaffleGaggle” reference implementation follows a similar architecture: define entities (contacts, accounts), extract email domains, flag personal vs. corporate emails, use seed files for edge cases that need human judgment (like shared email domains that aren’t really corporate), then aggregate usage data per entity into wide mart tables.
The seed file pattern is worth noting. Some identity resolution edge cases — gmail.com is personal, acme.com is corporate, but freelancer.com could be either — require human judgment that doesn’t belong in SQL. A seed file with these classifications keeps the logic auditable and maintainable.
Connecting to attribution
With identity resolution in place, you can build a touchpoint timeline per user from GA4 session data (with source, medium, and campaign from the traffic source fields), then join to CRM deal conversions using the resolved_user_id from your identity bridge and a conversion window.
The mrt__marketing__attributed_touchpoints model bridges web sessions and CRM deals:
SELECT sessions.resolved_user_id, sessions.session_date, sessions.session_source, sessions.session_medium, deals.deal_id, deals.deal_amount, deals.closed_atFROM {{ ref('mrt__marketing__web_sessions') }} sessionsINNER JOIN {{ ref('int__identity_device_bridged') }} identity ON sessions.user_pseudo_id = identity.user_pseudo_idINNER JOIN {{ ref('base__crm__deals') }} deals ON identity.crm_contact_id = deals.contact_idWHERE sessions.session_date BETWEEN DATE_SUB(deals.closed_at, INTERVAL 90 DAY) AND deals.closed_atApply attribution models (first-touch, last-touch, linear, time-decay) on the touchpoint table this produces, aggregate by channel or campaign, and you get ROI by source.