Privacy Constraints for Linked Analytics Data

GDPR and CNIL implications when linking GA4 cookie identifiers to CRM contact records — consent exemption loss, right to deletion cascades, and the architectural requirements for compliant Customer 360 models.

Tags: ga4, bigquery, dbt, analytics, data engineering

For teams operating in the EU — or processing data from EU residents — the privacy implications of linking analytics data to CRM records shape the architecture. GDPR compliance affects what can be built, how it is materialized, and which signals must be checked before processing.

GDPR is explicit that cookie identifiers can be personal data. Recital 30 of the regulation names online identifiers, including cookie identifiers, as information that can be used to profile and identify a natural person. A user_pseudo_id from GA4 is a cookie-based identifier. A hashed email is pseudonymized, not anonymous, and remains subject to GDPR requirements.

The moment you link a user_pseudo_id to a CRM contact record, you’ve created a processing activity that connects browsing behavior to an identified person. You’ve moved from “anonymous analytics” to “behavioral profiling of identified individuals.” These are fundamentally different categories under GDPR, with different legal bases, different documentation requirements, and different risk profiles.

For teams subject to CNIL guidelines (the French data protection authority), the implications are even more specific.

CNIL grants an exemption from prior cookie consent for analytics cookies, but only if the data generates anonymous statistical output and is not combined with other processing. Many European sites rely on this exemption to collect GA4 data without a consent banner blocking analytics. The exemption allows analytics-only cookies because the data stays anonymous — aggregate traffic statistics that can’t identify individuals.

If you link analytics cookies with CRM data, the consent exemption does not apply. Prior consent becomes mandatory. The analytics data is no longer anonymous — it’s associated with a known person. Every page view, every session, every scroll event becomes part of that person’s behavioral profile.

This is not a theoretical risk. CNIL issued combined fines exceeding 139 million euros between December 2022 and December 2024 for cookie-related violations. The enforcement is active and the penalties are material.

The practical implementation chain connects your consent management to your data models:

CMP (Consent Management Platform)
→ Google Consent Mode
→ GA4 cookie behavior
→ BigQuery export: privacy_info.analytics_storage
→ dbt base models: filter on consent status
→ Identity resolution: only consented events participate

Your Consent Management Platform feeds consent signals to Google Consent Mode, which tells GA4 whether to set cookies. The BigQuery export includes privacy_info.analytics_storage, which reflects the user’s consent status at the time of the event.

Your dbt base models should filter on this field when building Customer 360 models that link to CRM data:

-- In base__ga4__events or a downstream identity model
WHERE privacy_info.analytics_storage = 'Yes'

Events where analytics_storage is not 'Yes' should not participate in identity resolution. The user did not consent to having their browsing behavior linked to their identity, so passing those events through the identity bridge disregards the choice the user made, whether consent was refused or never given.

With Consent Mode v2 in Advanced mode, GA4 sends cookieless pings even when consent is denied. These pings appear in the BigQuery export but with limited identifiers — no user_pseudo_id cookie value, no ga_session_id. They’re designed for Google’s behavioral modeling in the GA4 interface, not for warehouse-level identity resolution.

Do not attempt to link these cookieless events to CRM data. They lack the identifiers needed for deterministic matching, and using behavioral signals to attempt probabilistic matching on denied-consent events would be a compliance violation.
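The eligibility rule described above can be sketched outside the warehouse as a small predicate. This is a minimal illustration in Python, not the dbt implementation: it assumes events arrive as dictionaries shaped like the GA4 export (a nested privacy_info record and a top-level user_pseudo_id), and the function name is_linkable is hypothetical.

```python
from typing import Any, Mapping


def is_linkable(event: Mapping[str, Any]) -> bool:
    """Return True only if the event may enter identity resolution.

    Assumption: the event mirrors the GA4 export shape, with a nested
    privacy_info record and a top-level user_pseudo_id.
    """
    privacy_info = event.get("privacy_info") or {}
    # Only an explicit 'Yes' counts; 'No' and missing values are excluded.
    consented = privacy_info.get("analytics_storage") == "Yes"
    # Cookieless pings carry no cookie identifier, so they cannot be matched.
    has_cookie_id = bool(event.get("user_pseudo_id"))
    return consented and has_cookie_id


# Illustrative sample rows (made-up values).
events = [
    {"user_pseudo_id": "123.456", "privacy_info": {"analytics_storage": "Yes"}},
    {"user_pseudo_id": None, "privacy_info": {"analytics_storage": "No"}},
    {"user_pseudo_id": "789.012", "privacy_info": {"analytics_storage": None}},
]
linkable_events = [e for e in events if is_linkable(e)]
```

Treating a missing consent status the same as a denial keeps the rule conservative: an event is linked only when consent is positively recorded.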

Right to deletion changes your architecture

When a customer exercises their right to deletion under GDPR (Article 17), you need to cascade that deletion across every table in your identity graph. This means:

  • The identity mapping that links their user_pseudo_id to their CRM contact must be removed
  • All downstream models that contain their PII — the Customer 360 row, attributed touchpoints, session histories — must either remove or anonymize the records
  • Incremental models that contain historical data with their identifiers need to be rebuilt

A deletion_requests pattern

One approach is a deletion_requests table that your dbt models check during each run:

-- seed or source: deletion_requests
-- Columns: contact_id, requested_at, processed_at
-- In your Customer 360 model:
SELECT
    -- Exclude the raw PII columns so the masked versions below do not
    -- create duplicate column names (BigQuery SELECT * EXCEPT syntax)
    c.* EXCEPT (customer__email, customer__name),
    CASE
        WHEN dr.contact_id IS NOT NULL THEN NULL
        ELSE c.customer__email
    END AS customer__email,
    CASE
        WHEN dr.contact_id IS NOT NULL THEN NULL
        ELSE c.customer__name
    END AS customer__name,
    -- ... null out all PII fields
    dr.contact_id IS NOT NULL AS customer__is_deleted
FROM {{ ref('int__customer__enriched') }} AS c
LEFT JOIN {{ ref('deletion_requests') }} AS dr
    ON c.customer_id = dr.contact_id
    -- Only requests not yet marked as processed take part in the join
    AND dr.processed_at IS NULL

This NULLs out PII fields in downstream models when a matching deletion request exists. It preserves non-PII aggregate data (deal counts, revenue figures) for business reporting while removing identifying information.

For incremental models, deletion is harder. A record NULLed in today’s run might still exist with PII in yesterday’s partition. Options:

  • Full refresh affected models after processing deletions (simplest, most expensive)
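  • Merge with deletion using the unique_key: the incremental run overwrites the old record with the anonymized version. This only works if the run reselects the affected rows, because an incremental filter that picks up only new data never touches the older records.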
  • Partition-level rebuild targeting only the date ranges where the affected user has data
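For the partition-level option, the first step is working out which date partitions actually hold rows for the deleted contacts. The sketch below shows that lookup in Python under stated assumptions: rows are dictionaries with hypothetical contact_id and event_date fields, and the helper name partitions_to_rebuild is illustrative, not part of dbt or BigQuery.

```python
from datetime import date
from typing import Iterable, List, Mapping, Set


def partitions_to_rebuild(
    rows: Iterable[Mapping[str, object]],
    deleted_contact_ids: Set[str],
) -> List[date]:
    """Return the sorted, de-duplicated partition dates holding deleted contacts.

    Assumption: each row has a contact_id and an event_date (the partition key).
    """
    affected = {
        row["event_date"]
        for row in rows
        if row["contact_id"] in deleted_contact_ids
    }
    return sorted(affected)


# Illustrative sample rows (made-up values).
rows = [
    {"contact_id": "c-1", "event_date": date(2024, 5, 2)},
    {"contact_id": "c-2", "event_date": date(2024, 5, 2)},
    {"contact_id": "c-1", "event_date": date(2024, 4, 30)},
    {"contact_id": "c-3", "event_date": date(2024, 6, 1)},
]
rebuild_dates = partitions_to_rebuild(rows, {"c-1"})
```

In a warehouse the same lookup would typically be a DISTINCT query over the partition column, with the resulting dates passed to whatever mechanism rebuilds those partitions.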

CNIL data retention limits

CNIL specifies a maximum cookie lifespan of 13 months and data retention of 25 months for analytics data. Your identity mapping should respect these limits:

-- Filter identity mappings to the retention window
-- (assumes identified_at is a DATE; wrap it in DATE() if it is a TIMESTAMP)
WHERE identified_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 25 MONTH)

Identities older than 25 months should age out of your identity bridge, and the associated browsing data should be anonymized or deleted. This prevents your Customer 360 from accumulating historical browsing profiles that exceed the retention period.
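The 25-month cutoff is simple calendar arithmetic, but month subtraction has an edge case when the target month is shorter than the starting day. The Python sketch below reproduces the cutoff for use in tests or ad hoc checks. It assumes the clamp-to-month-end behavior of the warehouse's date function and uses hypothetical helper names (subtract_months, is_within_retention).

```python
import calendar
from datetime import date

RETENTION_MONTHS = 25  # retention limit quoted in the text


def subtract_months(d: date, months: int) -> date:
    """Subtract whole months, clamping the day to the end of a shorter month."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def is_within_retention(identified_at: date, today: date) -> bool:
    """True if an identity mapping is still inside the retention window."""
    return identified_at >= subtract_months(today, RETENTION_MONTHS)


cutoff = subtract_months(date(2025, 1, 15), RETENTION_MONTHS)
clamped_cutoff = subtract_months(date(2025, 3, 31), RETENTION_MONTHS)
```

Here cutoff is 15 December 2022, and clamped_cutoff falls on 28 February 2023 because February 2023 has no 31st day.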

Architectural recommendations

Separate consented and unconsented paths

Don’t filter consent at the mart layer. Build the consent filter into the identity resolution layer so that non-consented events never enter the identity graph:

base__ga4__events (all events, consent status as a column)
├── int__ga4__analytics_only (unconsented, anonymous aggregates only)
└── int__ga4__consented_events (consented events only)
    → int__identity_resolved (links to CRM)
        → mrt__core__customer_360

This makes the consent boundary visible in the DAG. Anyone reviewing the project can see exactly where personal data linkage begins and which models participate.
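The two-path split can be illustrated with a short routing function. This is a sketch of the idea in Python rather than the dbt models themselves: the function name split_by_consent and the flattened page_location field are assumptions made for the example (in the raw GA4 export, page_location is an event parameter rather than a top-level column).

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def split_by_consent(
    events: Iterable[Mapping[str, Any]],
) -> Tuple[List[Mapping[str, Any]], Dict[str, int]]:
    """Route events into a consented path and an aggregate-only path.

    Consented events are kept whole for identity resolution. Everything else
    is reduced to page-level counts, so no identifier leaves this function.
    """
    consented: List[Mapping[str, Any]] = []
    page_counts: Counter = Counter()
    for event in events:
        status = (event.get("privacy_info") or {}).get("analytics_storage")
        if status == "Yes":
            consented.append(event)
        else:
            # Hypothetical flattened field; only the count is retained.
            page_counts[event.get("page_location", "(unknown)")] += 1
    return consented, dict(page_counts)


# Illustrative sample rows (made-up values).
sample_events = [
    {"user_pseudo_id": "1.1", "page_location": "/pricing",
     "privacy_info": {"analytics_storage": "Yes"}},
    {"user_pseudo_id": None, "page_location": "/pricing",
     "privacy_info": {"analytics_storage": "No"}},
    {"user_pseudo_id": None, "page_location": "/blog",
     "privacy_info": {"analytics_storage": "No"}},
    {"user_pseudo_id": None, "page_location": "/pricing",
     "privacy_info": {"analytics_storage": "No"}},
]
consented_events, anonymous_page_counts = split_by_consent(sample_events)
```

The design point is that the unconsented branch returns only counts, which mirrors the role of int__ga4__analytics_only in the diagram: downstream code has nothing it could join to CRM data.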

Document the processing activity

GDPR requires maintaining a Record of Processing Activities (ROPA). Your Customer 360 model is a processing activity. Document:

  • Purpose: Marketing attribution, customer analytics, revenue analysis
  • Legal basis: Consent (for cookie-to-CRM linkage in EEA)
  • Data categories: Browsing behavior, contact details, transaction history
  • Retention period: Per CNIL guidelines or your own data retention policy
  • Recipients: Which teams and systems access the Customer 360 data

This documentation lives outside your dbt project (in your DPO’s records), but reference it in your model descriptions so developers understand the compliance context.

The business tradeoff

Consent requirements mean some visitors will never appear in your Customer 360. In markets with strict enforcement (France, Germany, Italy), consent rates for analytics cookies typically range from 40% to 70%. That means 30-60% of your web traffic is invisible to identity resolution.

The Customer 360 covers only customers who consented to the data processing that makes it possible. A compliant partial Customer 360 is preferable to a non-compliant comprehensive one.

For the non-consented portion, you can still build anonymous aggregate analytics — traffic patterns, page performance, conversion funnels — as long as the output remains statistical and cannot identify individuals. These aggregate insights complement the identified Customer 360 without crossing consent boundaries.