Platform attribution is convenient until you need to trust it. Meta can’t see your Google touchpoints. Google’s Data-Driven Attribution is a black box that silently falls back to last-click when data thresholds aren’t met. And with iOS ATT opt-in rates hovering around 35%, a significant chunk of mobile interactions go completely untracked.
Building attribution in your data warehouse solves these problems. You control the methodology, see every touchpoint, and can audit the logic. And you can integrate data from every platform into a unified customer journey. For a broader look at why warehouse-based attribution matters and how these models fit together, see my attribution guide.
This article covers the three foundational attribution models: first-touch, last-touch, and linear.
Why these models still matter
Research shows only 14% of marketers believe last-click attribution is effective, yet it remains the default across most platforms. Simple models work fine when you understand their assumptions. They fail when applied blindly.
First-touch attribution gives 100% credit to the initial touchpoint. It answers: “What channels bring people into our funnel?” This matters for brand awareness campaigns, new customer acquisition, and top-of-funnel optimization. The bias: it ignores everything that happened between discovery and conversion.
Last-touch attribution gives 100% credit to the final touchpoint before conversion. It answers: “What closes deals?” This works for direct response campaigns, short purchase cycles, and retargeting evaluation. The bias: it ignores discovery and nurturing channels entirely.
Linear attribution distributes credit equally across all touchpoints. It answers: “What touchpoints contribute to the journey?” This provides a balanced multi-touch view when you genuinely believe all touchpoints matter. The bias: it may over-credit low-impact touchpoints in long journeys.
None of these models is “correct.” They’re lenses, each revealing different aspects of your marketing effectiveness.
Data requirements before you start
Attribution requires touchpoint-level data with specific fields. Here’s what you need:
| Field | Purpose | Example |
|---|---|---|
| user_id | User identifier (device or authenticated) | user_pseudo_id, customer_id |
| session_id | Session identifier | ga_session_id |
| timestamp | When the touchpoint occurred | event_timestamp |
| channel/source/medium | Marketing classification | ”paid search”, “email” |
| campaign | Campaign identifier | UTM campaign value |
| conversion_id | Links touchpoint to conversion | order_id |
| revenue | Conversion value | 149.99 |
The user_id field deserves special attention. GA4’s user_pseudo_id is device-based, so the same person on mobile, laptop, and desktop appears as three separate users without identity resolution. For accurate attribution, you’ll eventually need to join authenticated user_id values with anonymous device identifiers.
Lookback windows by industry
The lookback window determines how far back to consider touchpoints before a conversion. This varies dramatically by purchase cycle:
| Industry | Recommended Window |
|---|---|
| E-commerce (impulse) | 7-14 days |
| E-commerce (considered) | 30-45 days |
| B2B Mid-market | 90-180 days |
| B2B Enterprise | 180+ days |
Setting the wrong lookback window either excludes relevant touchpoints (too short) or attributes credit to unrelated marketing (too long).
First-touch attribution in SQL
First-touch attribution assigns full credit to whichever touchpoint came first in the customer journey. The SQL pattern uses ROW_NUMBER() to identify the earliest touchpoint per conversion.
WITH touchpoints_numbered AS ( SELECT user_id, transaction_id, channel, revenue, touchpoint_timestamp, conversion_timestamp, ROW_NUMBER() OVER ( PARTITION BY user_id, transaction_id ORDER BY touchpoint_timestamp ASC ) AS touch_position FROM touchpoints WHERE touchpoint_timestamp >= TIMESTAMP_SUB( conversion_timestamp, INTERVAL 30 DAY ))SELECT user_id, transaction_id, channel, revenue AS attributed_revenueFROM touchpoints_numberedWHERE touch_position = 1A few things worth noting:
PARTITION BY user_id, transaction_idgroups touchpoints by conversionORDER BY touchpoint_timestamp ASCputs earliest touchpoints firstWHERE touch_position = 1selects only the first touch- The lookback window filter (30 days here) scopes which touchpoints qualify
This gives you one row per conversion, with full revenue attributed to the discovering channel.
Last-touch attribution in SQL
Last-touch attribution is nearly identical. Change ASC to DESC in the ordering:
WITH touchpoints_numbered AS ( SELECT user_id, transaction_id, channel, revenue, touchpoint_timestamp, conversion_timestamp, ROW_NUMBER() OVER ( PARTITION BY user_id, transaction_id ORDER BY touchpoint_timestamp DESC ) AS touch_position FROM touchpoints WHERE touchpoint_timestamp >= TIMESTAMP_SUB( conversion_timestamp, INTERVAL 30 DAY ))SELECT user_id, transaction_id, channel, revenue AS attributed_revenueFROM touchpoints_numberedWHERE touch_position = 1That single change (DESC instead of ASC) switches from “first touchpoint” to “last touchpoint before conversion.”
Last-touch remains the default in most platforms because it correlates well with direct response performance. For short purchase cycles where the conversion trigger matters most, last-touch is often the right choice.
Linear attribution in SQL
Linear attribution distributes revenue equally across all touchpoints in the journey. Instead of picking one winner, every touchpoint gets the same share.
WITH touchpoints_counted AS ( SELECT user_id, transaction_id, channel, revenue, touchpoint_timestamp, conversion_timestamp, COUNT(*) OVER ( PARTITION BY user_id, transaction_id ) AS total_touches FROM touchpoints WHERE touchpoint_timestamp >= TIMESTAMP_SUB( conversion_timestamp, INTERVAL 30 DAY ))SELECT user_id, transaction_id, channel, revenue / total_touches AS attributed_revenueFROM touchpoints_countedHow the division works:
COUNT(*) OVER (PARTITION BY ...)calculates total touchpoints per conversionrevenue / total_touchesdivides the conversion value equally- Every touchpoint gets its own row with fractional credit
A customer who saw five touchpoints before a $100 purchase would result in five rows, each with $20 attributed revenue.
Aggregating for channel-level reporting
The queries above produce touchpoint-level attribution. For channel performance reporting, aggregate the results:
SELECT channel, COUNT(DISTINCT transaction_id) AS conversions_touched, SUM(attributed_revenue) AS total_attributed_revenue, SUM(attributed_revenue) / COUNT(DISTINCT transaction_id) AS avg_attributed_valueFROM linear_attribution_resultsGROUP BY channelORDER BY total_attributed_revenue DESCFor first-touch and last-touch models, conversions_touched equals the actual conversion count since each conversion maps to exactly one channel. For linear attribution, conversions_touched represents how many conversions that channel participated in, which is useful for understanding “assisted” conversions.
Building the dbt layer structure
These attribution models fit naturally into dbt’s base, intermediate, marts pattern:
models/├── base/│ └── base__ga4__events.sql # Raw event cleaning├── intermediate/│ ├── int__touchpoints_mapped.sql # Marketing touchpoint extraction│ └── int__conversion_paths_joined.sql # User journey construction├── marts/attribution/│ ├── mrt__attribution__first_touch.sql│ ├── mrt__attribution__last_touch.sql│ ├── mrt__attribution__linear.sql│ └── mrt__attribution__comparison.sqlBase layer
The base layer cleans raw events. For GA4 BigQuery exports, this means unnesting the nested structures and standardizing field names (see building GA4 dbt models for a deeper walkthrough):
-- base__ga4__events.sqlSELECT user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id, event_timestamp, event_name, traffic_source.source, traffic_source.medium, traffic_source.name AS campaignFROM {{ source('ga4', 'events') }}Intermediate layer
The intermediate layer builds the touchpoint table that attribution models consume. This is where you define what counts as a “touchpoint” and join touchpoint events to conversions from the same base model:
-- int__touchpoints_mapped.sqlWITH touchpoint_events AS ( SELECT user_pseudo_id, session_id, event_timestamp, COALESCE(source, '(direct)') AS source, COALESCE(medium, '(none)') AS medium, CONCAT(COALESCE(source, '(direct)'), ' / ', COALESCE(medium, '(none)')) AS channel FROM {{ ref('base__ga4__events') }} WHERE session_id IS NOT NULL),
conversion_events AS ( SELECT user_pseudo_id, event_timestamp, transaction_id, revenue FROM {{ ref('base__ga4__events') }} WHERE event_name = 'purchase')
SELECT t.user_pseudo_id AS user_id, c.transaction_id, t.session_id, t.event_timestamp AS touchpoint_timestamp, c.event_timestamp AS conversion_timestamp, t.source, t.medium, t.channel, c.revenueFROM touchpoint_events tJOIN conversion_events c ON t.user_pseudo_id = c.user_pseudo_id AND t.event_timestamp <= c.event_timestamp AND t.event_timestamp >= TIMESTAMP_SUB(c.event_timestamp, INTERVAL 30 DAY)Marts layer
The marts layer applies attribution logic. Each model gets its own file for clarity:
-- mrt__attribution__first_touch.sqlWITH touchpoints_numbered AS ( SELECT user_id, transaction_id, channel, source, medium, touchpoint_timestamp, revenue, ROW_NUMBER() OVER ( PARTITION BY user_id, transaction_id ORDER BY touchpoint_timestamp ASC ) AS touch_position FROM {{ ref('int__touchpoints_mapped') }})SELECT user_id, transaction_id, channel, source, medium, revenue AS attributed_revenue, 'first_touch' AS attribution_modelFROM touchpoints_numberedWHERE touch_position = 1Testing that attribution sums correctly
Add a singular dbt test to verify attribution credits sum to actual revenue. This test returns rows only when the totals diverge, which dbt treats as a failure:
-- tests/attribution_revenue_sums_correctly.sqlWITH attribution_totals AS ( SELECT SUM(attributed_revenue) AS total_attributed FROM {{ ref('mrt__attribution__linear') }}),actual_totals AS ( SELECT SUM(revenue) AS total_actual FROM {{ ref('base__ga4__events') }} WHERE event_name = 'purchase')SELECT total_attributed, total_actualFROM attribution_totalsCROSS JOIN actual_totalsWHERE ABS(total_attributed - total_actual) >= 0.01Comparing models side by side
Create a comparison view that unions all models with an identifier:
-- mrt__attribution__comparison.sqlSELECT *, 'first_touch' AS model FROM {{ ref('mrt__attribution__first_touch') }}UNION ALLSELECT *, 'last_touch' AS model FROM {{ ref('mrt__attribution__last_touch') }}UNION ALLSELECT *, 'linear' AS model FROM {{ ref('mrt__attribution__linear') }}This enables dashboard filters that let stakeholders toggle between models and see how channel credit shifts. When email looks great under last-touch but weak under first-touch, you’ve learned something about its role in the funnel.
Beyond the basics
First-touch, last-touch, and linear are building blocks. Once you have this foundation:
Position-based and time-decay models give weighted credit rather than equal credit. Position-based (like the 40-20-40 “U-shaped” model) emphasizes first and last touchpoints, while time-decay assigns more credit to recent interactions. I cover both in the next article in this series.
Data-driven models use Markov chains or Shapley values to calculate credit based on actual conversion probabilities. These require more data (hundreds of conversions minimum) but provide the most nuanced view of channel contribution. See building Markov chains and Shapley values for the implementation.
The code patterns here (window functions for positioning, aggregations for counting, joins for path construction) apply to all these more sophisticated approaches. Master these fundamentals and the advanced models become straightforward extensions.
Key takeaways
Building attribution in SQL gives you transparency and control that platform tools can’t match. Start with these three models, understand what each reveals about your marketing, and let that inform which model fits your business context.
The code is simpler than you might expect. A few window functions, a well-structured touchpoint table, and you have attribution that you actually understand.