GA4’s BigQuery export contains traffic source data in four different locations. Each has different scope and semantics. Using the wrong one produces incorrect attribution reports — and the errors are silent. Your queries return results that look reasonable but assign conversions to the wrong channels.
Understanding which source to use for which question is the prerequisite for any attribution work on GA4 data.
The Four Traffic Source Locations
traffic_source (User Scope, First-Touch)
The traffic_source struct contains user-scoped, first-touch attribution. It records the source, medium, and campaign that originally acquired the user, regardless of how many subsequent sessions they’ve had.
```sql
traffic_source.source  -- e.g., 'google'
traffic_source.medium  -- e.g., 'organic'
traffic_source.name    -- campaign name
```

Use this for: User acquisition analysis. “Which channels bring us new users?” and “What was the first touchpoint for our highest-value customers?”
Do not use this for: Session attribution. A user acquired via paid search six months ago who returns via email today still shows google / cpc in traffic_source. That’s historically accurate but analytically misleading for session-level reporting.
This struct is not populated in intraday export tables (events_intraday_YYYYMMDD), so first-touch analysis has to run against the daily tables.
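As a minimal sketch of a user acquisition query, the following counts distinct users by first-touch source and medium. The dataset path `my_project.analytics_123456` and the date range are placeholders, not values from this article.

```sql
-- Users by first-touch channel, using the user-scoped traffic_source struct.
-- `my_project.analytics_123456` is a placeholder dataset name.
SELECT
  traffic_source.source AS first_source,
  traffic_source.medium AS first_medium,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM `my_project.analytics_123456.events_*`
-- Intraday tables have a suffix starting with 'intraday_', so this range skips them.
WHERE _TABLE_SUFFIX BETWEEN '20240701' AND '20240731'
GROUP BY first_source, first_medium
ORDER BY users DESC;
```

Counting distinct `user_pseudo_id` values rather than rows keeps the result at user grain, which matches the scope of the field.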
collected_traffic_source (Event Scope, Raw Collection)
The collected_traffic_source struct contains event-scoped raw collection data. It captures UTM parameters and click IDs exactly as GA4 collected them, without any attribution model applied.
```sql
collected_traffic_source.manual_source         -- UTM source
collected_traffic_source.manual_medium         -- UTM medium
collected_traffic_source.manual_campaign_name  -- UTM campaign
collected_traffic_source.gclid                 -- Google Ads click ID
collected_traffic_source.dclid                 -- Display & Video 360 click ID
collected_traffic_source.srsltid               -- Google Merchant Center click ID
```

Use this for: Building custom attribution models where you need raw touchpoint data. Also useful for debugging attribution discrepancies: you see exactly what GA4 collected before it applied any modeling.
Do not use this for: Standard session attribution. The raw data requires additional logic to determine session-level values (you’d need to apply FIRST_VALUE patterns yourself).
session_traffic_source_last_click (Session Scope, July 2024+)
The session_traffic_source_last_click struct contains session-scoped attribution with GA4’s last-non-direct model applied. This is the field that matches what you see in the GA4 interface.
```sql
session_traffic_source_last_click.manual_campaign.source         -- e.g., 'google'
session_traffic_source_last_click.manual_campaign.medium         -- e.g., 'cpc'
session_traffic_source_last_click.manual_campaign.campaign_name  -- e.g., 'spring_sale'
```

Use this for: Session attribution in any analysis or dashboard. It handles last-non-direct logic automatically: if a session starts as direct traffic, GA4 looks back up to 90 days for a previous non-direct source and credits that instead.
Availability: July 2024 onward. Data before this date does not have this field populated.
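Since the cutover date may not be identical for every property, it can be worth checking your own export. A minimal sketch, assuming a placeholder dataset `my_project.analytics_123456` and a date range that brackets the rollout:

```sql
-- Earliest export date on which the session-scoped source is populated.
-- `my_project.analytics_123456` is a placeholder dataset name.
SELECT
  MIN(event_date) AS first_populated_date  -- 'YYYYMMDD' string in the raw export
FROM `my_project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240601' AND '20240831'
  AND session_traffic_source_last_click.manual_campaign.source IS NOT NULL;
```

A NULL result means no event in the scanned range carries the field, in which case widen the date range before drawing conclusions.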
event_params Legacy Keys (Inconsistent)
The event_params array sometimes contains legacy source and medium keys from earlier GA4 implementations.
```sql
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium
```

Use this for: Fallback only, when other fields are unavailable. These keys are inconsistently populated and should not be your primary attribution source.
Practical Session Attribution
For data from July 2024 forward, session_traffic_source_last_click is the clear choice for session-level attribution. Extract it using FIRST_VALUE to propagate across all events in the session:
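```sql
FIRST_VALUE(session_traffic_source_last_click.manual_campaign.source IGNORE NULLS)
  OVER (PARTITION BY session_key ORDER BY event_timestamp
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS session_source,
FIRST_VALUE(session_traffic_source_last_click.manual_campaign.medium IGNORE NULLS)
  OVER (PARTITION BY session_key ORDER BY event_timestamp
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS session_medium,
FIRST_VALUE(session_traffic_source_last_click.manual_campaign.campaign_name IGNORE NULLS)
  OVER (PARTITION BY session_key ORDER BY event_timestamp
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS session_campaign
```

The IGNORE NULLS clause is critical. Not every event in a session carries attribution data. Without it, the first event might have a null source (if the session_start event was filtered), and that null propagates to every row. The explicit window frame matters for the same reason: with only ORDER BY, the default frame ends at the current row, so any event that precedes the first non-null value would still come back null.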
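The expressions here rely on a `session_key` column, which the export does not provide. One common way to derive it, shown as a sketch and not necessarily how your project defines it, is to combine `user_pseudo_id` with the `ga_session_id` event parameter. The dataset path is a placeholder.

```sql
WITH events AS (
  SELECT
    -- Assumed session_key definition: user_pseudo_id plus the ga_session_id parameter.
    -- CONCAT returns NULL when ga_session_id is missing, so those events share a NULL key.
    CONCAT(
      user_pseudo_id, '-',
      CAST((SELECT value.int_value FROM UNNEST(event_params)
            WHERE key = 'ga_session_id') AS STRING)
    ) AS session_key,
    event_timestamp,
    event_name,
    session_traffic_source_last_click.manual_campaign.source AS lc_source,
    session_traffic_source_last_click.manual_campaign.medium AS lc_medium
  FROM `my_project.analytics_123456.events_*`  -- placeholder dataset
  WHERE _TABLE_SUFFIX BETWEEN '20240701' AND '20240731'
)
SELECT
  session_key,
  event_name,
  event_timestamp,
  FIRST_VALUE(lc_source IGNORE NULLS) OVER session_window AS session_source,
  FIRST_VALUE(lc_medium IGNORE NULLS) OVER session_window AS session_medium
FROM events
WHERE session_key IS NOT NULL
-- One named window keeps the partition, ordering, and frame in a single place.
WINDOW session_window AS (
  PARTITION BY session_key
  ORDER BY event_timestamp
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);
```

The frame spans the whole partition, so every event in a session receives the same value regardless of where the first non-null source appears.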
Wrap the results in COALESCE for clean output:
```sql
COALESCE(
  FIRST_VALUE(session_traffic_source_last_click.manual_campaign.source IGNORE NULLS)
    OVER (PARTITION BY session_key ORDER BY event_timestamp
          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
  '(direct)'
) AS session_source
```

The Google Ads gclid Problem
A known issue affects Google Ads traffic. Sessions with gclid parameters sometimes appear as organic or direct because GA4 doesn’t always decode the click ID to source and medium in the export. If accurate paid search attribution matters, add a correction:
```sql
CASE
  WHEN collected_traffic_source.gclid IS NOT NULL THEN 'google'
  ELSE session_traffic_source_last_click.manual_campaign.source
END AS session_source,
CASE
  WHEN collected_traffic_source.gclid IS NOT NULL THEN 'cpc'
  ELSE session_traffic_source_last_click.manual_campaign.medium
END AS session_medium
```

This override uses collected_traffic_source.gclid as a signal that the session came from a paid Google click, regardless of what the attribution fields say. For complete Google Ads campaign data (campaign name, ad group, keywords), consider joining with Google Ads Data Transfer exports using the gclid field.
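Because collected_traffic_source is event-scoped, the gclid is typically present only on the events that carried it, so a row-level CASE can label events within one session differently. A sketch of a session-level variant follows. The `session_key` derivation and the dataset path are assumptions, and `'(none)'` is used as the medium for direct sessions.

```sql
WITH events AS (
  SELECT
    -- Assumed session_key: user_pseudo_id plus the ga_session_id event parameter.
    CONCAT(
      user_pseudo_id, '-',
      CAST((SELECT value.int_value FROM UNNEST(event_params)
            WHERE key = 'ga_session_id') AS STRING)
    ) AS session_key,
    event_timestamp,
    collected_traffic_source.gclid AS gclid,
    session_traffic_source_last_click.manual_campaign.source AS lc_source,
    session_traffic_source_last_click.manual_campaign.medium AS lc_medium
  FROM `my_project.analytics_123456.events_*`  -- placeholder dataset
  WHERE _TABLE_SUFFIX BETWEEN '20240701' AND '20240731'
),
flagged AS (
  SELECT
    session_key,
    -- TRUE for every row of a session if any of its events carried a gclid.
    LOGICAL_OR(gclid IS NOT NULL) OVER (PARTITION BY session_key) AS session_has_gclid,
    FIRST_VALUE(lc_source IGNORE NULLS) OVER session_window AS lc_session_source,
    FIRST_VALUE(lc_medium IGNORE NULLS) OVER session_window AS lc_session_medium
  FROM events
  WHERE session_key IS NOT NULL
  WINDOW session_window AS (
    PARTITION BY session_key
    ORDER BY event_timestamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )
)
-- Collapse to one row per session; all rows of a session are identical here.
SELECT DISTINCT
  session_key,
  IF(session_has_gclid, 'google', COALESCE(lc_session_source, '(direct)')) AS session_source,
  IF(session_has_gclid, 'cpc', COALESCE(lc_session_medium, '(none)')) AS session_medium
FROM flagged;
```

Evaluating the gclid flag over the whole session means the override applies to every event in that session, not only the landing event.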
Historical Data Before July 2024
For data before July 2024, session_traffic_source_last_click is empty. You have two options:
- Build session attribution from collected_traffic_source using the FIRST_VALUE pattern. This gives you raw UTM data without the last-non-direct model, so direct sessions remain direct even if the user was previously acquired via a non-direct channel.
- Start your analysis from July 2024. This avoids maintaining two code paths and is often the pragmatic choice unless historical comparison is a hard requirement.
If you maintain both code paths, consider isolating the logic in a dbt macro that switches behavior based on the event date:
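```sql
{% macro ga4_session_source(session_key, event_timestamp) %}
  {# Assumes event_date was parsed to a DATE upstream. The raw export stores it
     as a 'YYYYMMDD' string; in that case compare against '20240701' instead. #}
  CASE
    WHEN event_date >= '2024-07-01' THEN
      FIRST_VALUE(session_traffic_source_last_click.manual_campaign.source IGNORE NULLS)
        OVER (PARTITION BY {{ session_key }} ORDER BY {{ event_timestamp }}
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ELSE
      FIRST_VALUE(collected_traffic_source.manual_source IGNORE NULLS)
        OVER (PARTITION BY {{ session_key }} ORDER BY {{ event_timestamp }}
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
  END
{% endmacro %}
```

Choosing the Right Source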
| Question | Use This Field |
|---|---|
| What channel acquired this user originally? | traffic_source |
| What brought the user to this specific session? | session_traffic_source_last_click |
| What raw UTM parameters were collected? | collected_traffic_source |
| What did GA4 show in the interface? | session_traffic_source_last_click |
| Building a custom attribution model? | collected_traffic_source |
| Data before July 2024? | collected_traffic_source (with manual logic) |