ServicesAboutNotesContact Get in touch →
EN FR
Note

GA4 Channel Grouping Macro

A dbt macro that encapsulates Google's default channel grouping logic as reusable SQL, with the regex patterns and edge cases you need to know.

Planted
ga4dbtbigqueryanalyticsdata modeling

Channel grouping translates raw UTM source/medium combinations into business-readable categories: Organic Search, Paid Social, Direct, Email. GA4’s interface applies these groupings automatically using Google’s default rules. In a dbt project, you need to replicate this logic — and a macro is the right abstraction.

Why a Macro

Channel grouping is needed in at least two places: the sessionized events intermediate model and potentially any mart that includes channel attribution. Repeating the CASE statement would mean multiple maintenance points. A macro calls the logic once and generates the SQL wherever referenced.

The macro takes column references (not values) as arguments, which is the standard dbt pattern for SQL-generating macros:

-- macros/ga4/default_channel_grouping.sql
{% macro default_channel_grouping(source_column, medium_column) %}
CASE
-- Direct
WHEN {{ source_column }} IS NULL
OR {{ source_column }} = '(direct)'
OR {{ source_column }} = '(none)'
THEN 'Direct'
-- Paid Search
WHEN REGEXP_CONTAINS({{ medium_column }}, r'^(cpc|ppc|paidsearch)$')
OR ({{ source_column }} = 'google' AND {{ medium_column }} = 'sem')
THEN 'Paid Search'
-- Paid Social
WHEN REGEXP_CONTAINS({{ medium_column }}, r'^(paidsocial|paid-social|paid_social)$')
OR (REGEXP_CONTAINS({{ source_column }}, r'facebook|instagram|linkedin|twitter|tiktok')
AND REGEXP_CONTAINS({{ medium_column }}, r'^(cpc|ppc|paid)'))
THEN 'Paid Social'
-- Display
WHEN REGEXP_CONTAINS({{ medium_column }}, r'^(display|cpm|banner)$')
THEN 'Display'
-- Organic Search
WHEN {{ medium_column }} = 'organic'
OR REGEXP_CONTAINS({{ source_column }}, r'google|bing|yahoo|duckduckgo|baidu')
AND {{ medium_column }} = 'organic'
THEN 'Organic Search'
-- Organic Social
WHEN REGEXP_CONTAINS({{ source_column }}, r'facebook|instagram|linkedin|twitter|tiktok|pinterest|youtube')
AND ({{ medium_column }} IS NULL OR {{ medium_column }} IN ('social', 'referral', '(none)'))
THEN 'Organic Social'
-- Email
WHEN REGEXP_CONTAINS({{ medium_column }}, r'^(email|e-mail|e_mail)$')
THEN 'Email'
-- Affiliate
WHEN REGEXP_CONTAINS({{ medium_column }}, r'^(affiliate|affiliates)$')
THEN 'Affiliates'
-- Referral
WHEN {{ medium_column }} = 'referral'
THEN 'Referral'
-- Other
ELSE 'Other'
END
{% endmacro %}

Usage in the sessionized model:

{{ default_channel_grouping('session__source_final', 'session__medium_final') }}
AS session__channel_grouping

The Case Order Matters

The CASE statement is evaluated top to bottom, first match wins. The ordering here encodes priority decisions:

  1. Direct first — Null source is always Direct, before any medium-based logic
  2. Paid before organiccpc medium gets classified as Paid Search before the Organic Search rule can catch a Google source
  3. Paid Social medium first, then source + medium — Teams using paidsocial as medium get classified directly; teams using cpc medium with social network sources get caught by the second condition
  4. Organic Social checks source + non-paid medium — Catches facebook/instagram with social/referral medium, but not if they have a paid medium (already caught above)
  5. Referral last — Falls through after all other channel classifications

Changing this order changes the output. If you put Referral before Organic Social, some social referral traffic will be classified as Referral instead.

Where Channel Grouping Diverges from GA4’s Interface

Your macro will not match GA4’s interface exactly. The interface uses Google’s full channel grouping ruleset which is more complex and includes platform-specific logic Google doesn’t publish completely. Expect some differences in the ‘Other’ bucket.

The most common divergence points:

Branded paid search vs non-branded. Google Ads can tag campaigns with branded or non-branded signals, and the interface may distinguish these as separate channels. Your macro classifies both as Paid Search.

Cross-network and Performance Max. Newer campaign types may use medium values not covered by the standard regex patterns. When you see traffic landing in ‘Other’ that you know is paid, check the raw source/medium values and add matching conditions.

YouTube. YouTube can appear as both Organic Social (organic YouTube views) and Paid Video (paid YouTube ads with video medium). The regex catches YouTube in Organic Social; make sure your paid YouTube campaigns use a medium like cpc or paid to be caught by the Paid Social or Display rules first.

Aligning with Your UTM Conventions

The macro encodes assumptions about your UTM taxonomy. If your team uses paid-social as a medium convention for social ads, the regex r'^(paidsocial|paid-social|paid_social)$' covers it. If you use something non-standard like social_paid, it’ll fall through to ‘Other’.

Before deploying, run a validation query:

SELECT
session__source,
session__medium,
session__channel_grouping,
COUNT(*) AS sessions
FROM mrt__analytics__sessions
WHERE session__channel_grouping = 'Other'
GROUP BY 1, 2, 3
ORDER BY 4 DESC
LIMIT 50

Any significant session volumes in ‘Other’ with recognizable sources deserve investigation. Either your UTM naming doesn’t match the expected patterns (a naming convention problem) or the macro needs an additional condition.

Testing the Output

Add an accepted_values test to the column in your YAML:

- name: session__channel_grouping
tests:
- accepted_values:
values:
- 'Direct'
- 'Organic Search'
- 'Paid Search'
- 'Paid Social'
- 'Organic Social'
- 'Email'
- 'Referral'
- 'Display'
- 'Affiliates'
- 'Other'

This test will warn (or error, depending on severity) if the macro generates an unexpected channel name. Useful as a guard against future modifications that introduce a typo or new channel category without updating the accepted values list.