First-Touch, Last-Touch, and Linear Attribution in SQL

Platform attribution is convenient until you need to trust it. Meta can’t see your Google touchpoints. Google’s Data-Driven Attribution is a black box that silently falls back to last-click when data thresholds aren’t met. And with iOS ATT opt-in rates hovering around 35%, a significant chunk of mobile interactions go completely untracked.

Building attribution in your data warehouse solves these problems. You control the methodology, see every touchpoint, and can audit the logic. And you can integrate data from every platform into a unified customer journey. For a broader look at why warehouse-based attribution matters and how these models fit together, see my attribution guide.

This article covers the three foundational attribution models: first-touch, last-touch, and linear.

Why these models still matter

Research shows only 14% of marketers believe last-click attribution is effective, yet it remains the default across most platforms. Simple models work fine when you understand their assumptions. They fail when applied blindly.

First-touch attribution gives 100% credit to the initial touchpoint. It answers: “What channels bring people into our funnel?” This matters for brand awareness campaigns, new customer acquisition, and top-of-funnel optimization. The bias: it ignores everything that happened between discovery and conversion.

Last-touch attribution gives 100% credit to the final touchpoint before conversion. It answers: “What closes deals?” This works for direct response campaigns, short purchase cycles, and retargeting evaluation. The bias: it ignores discovery and nurturing channels entirely.

Linear attribution distributes credit equally across all touchpoints. It answers: “What touchpoints contribute to the journey?” This provides a balanced multi-touch view when you genuinely believe all touchpoints matter. The bias: it may over-credit low-impact touchpoints in long journeys.

None of these models is “correct.” They’re lenses, each revealing different aspects of your marketing effectiveness.

Data requirements before you start

Attribution requires touchpoint-level data with specific fields. Here’s what you need:

Field	Purpose	Example
user_id	User identifier (device or authenticated)	user_pseudo_id, customer_id
session_id	Session identifier	ga_session_id
timestamp	When the touchpoint occurred	event_timestamp
channel/source/medium	Marketing classification	”paid search”, “email”
campaign	Campaign identifier	UTM campaign value
conversion_id	Links touchpoint to conversion	order_id
revenue	Conversion value	149.99

The user_id field deserves special attention. GA4’s user_pseudo_id is device-based, so the same person on mobile, laptop, and desktop appears as three separate users without identity resolution. For accurate attribution, you’ll eventually need to join authenticated user_id values with anonymous device identifiers.

Lookback windows by industry

The lookback window determines how far back to consider touchpoints before a conversion. This varies dramatically by purchase cycle:

Industry	Recommended Window
E-commerce (impulse)	7-14 days
E-commerce (considered)	30-45 days
B2B Mid-market	90-180 days
B2B Enterprise	180+ days

Setting the wrong lookback window either excludes relevant touchpoints (too short) or attributes credit to unrelated marketing (too long).

First-touch attribution in SQL

First-touch attribution assigns full credit to whichever touchpoint came first in the customer journey. The SQL pattern uses ROW_NUMBER() to identify the earliest touchpoint per conversion.

WITH touchpoints_numbered AS (
  SELECT
    user_id,
    transaction_id,
    channel,
    revenue,
    touchpoint_timestamp,
    conversion_timestamp,
    ROW_NUMBER() OVER (
      PARTITION BY user_id, transaction_id
      ORDER BY touchpoint_timestamp ASC
    ) AS touch_position
  FROM touchpoints
  WHERE touchpoint_timestamp >= TIMESTAMP_SUB(
    conversion_timestamp,
    INTERVAL 30 DAY
  )
)
SELECT
  user_id,
  transaction_id,
  channel,
  revenue AS attributed_revenue
FROM touchpoints_numbered
WHERE touch_position = 1

A few things worth noting:

PARTITION BY user_id, transaction_id groups touchpoints by conversion
ORDER BY touchpoint_timestamp ASC puts earliest touchpoints first
WHERE touch_position = 1 selects only the first touch
The lookback window filter (30 days here) scopes which touchpoints qualify

This gives you one row per conversion, with full revenue attributed to the discovering channel.

Last-touch attribution in SQL

Last-touch attribution is nearly identical. Change ASC to DESC in the ordering:

WITH touchpoints_numbered AS (
  SELECT
    user_id,
    transaction_id,
    channel,
    revenue,
    touchpoint_timestamp,
    conversion_timestamp,
    ROW_NUMBER() OVER (
      PARTITION BY user_id, transaction_id
      ORDER BY touchpoint_timestamp DESC
    ) AS touch_position
  FROM touchpoints
  WHERE touchpoint_timestamp >= TIMESTAMP_SUB(
    conversion_timestamp,
    INTERVAL 30 DAY
  )
)
SELECT
  user_id,
  transaction_id,
  channel,
  revenue AS attributed_revenue
FROM touchpoints_numbered
WHERE touch_position = 1

That single change (DESC instead of ASC) switches from “first touchpoint” to “last touchpoint before conversion.”

Last-touch remains the default in most platforms because it correlates well with direct response performance. For short purchase cycles where the conversion trigger matters most, last-touch is often the right choice.

Linear attribution in SQL

Linear attribution distributes revenue equally across all touchpoints in the journey. Instead of picking one winner, every touchpoint gets the same share.

WITH touchpoints_counted AS (
  SELECT
    user_id,
    transaction_id,
    channel,
    revenue,
    touchpoint_timestamp,
    conversion_timestamp,
    COUNT(*) OVER (
      PARTITION BY user_id, transaction_id
    ) AS total_touches
  FROM touchpoints
  WHERE touchpoint_timestamp >= TIMESTAMP_SUB(
    conversion_timestamp,
    INTERVAL 30 DAY
  )
)
SELECT
  user_id,
  transaction_id,
  channel,
  revenue / total_touches AS attributed_revenue
FROM touchpoints_counted

How the division works:

COUNT(*) OVER (PARTITION BY ...) calculates total touchpoints per conversion
revenue / total_touches divides the conversion value equally
Every touchpoint gets its own row with fractional credit

A customer who saw five touchpoints before a $100 purchase would result in five rows, each with $20 attributed revenue.

Aggregating for channel-level reporting

The queries above produce touchpoint-level attribution. For channel performance reporting, aggregate the results:

SELECT
  channel,
  COUNT(DISTINCT transaction_id) AS conversions_touched,
  SUM(attributed_revenue) AS total_attributed_revenue,
  SUM(attributed_revenue) / COUNT(DISTINCT transaction_id) AS avg_attributed_value
FROM linear_attribution_results
GROUP BY channel
ORDER BY total_attributed_revenue DESC

For first-touch and last-touch models, conversions_touched equals the actual conversion count since each conversion maps to exactly one channel. For linear attribution, conversions_touched represents how many conversions that channel participated in, which is useful for understanding “assisted” conversions.

Building the dbt layer structure

These attribution models fit naturally into dbt’s base, intermediate, marts pattern:

models/
├── base/
│   └── base__ga4__events.sql               # Raw event cleaning
├── intermediate/
│   ├── int__touchpoints_mapped.sql         # Marketing touchpoint extraction
│   └── int__conversion_paths_joined.sql    # User journey construction
├── marts/attribution/
│   ├── mrt__attribution__first_touch.sql
│   ├── mrt__attribution__last_touch.sql
│   ├── mrt__attribution__linear.sql
│   └── mrt__attribution__comparison.sql

Base layer

The base layer cleans raw events. For GA4 BigQuery exports, this means unnesting the nested structures and standardizing field names (see building GA4 dbt models for a deeper walkthrough):

-- base__ga4__events.sql
SELECT
  user_pseudo_id,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
  event_timestamp,
  event_name,
  traffic_source.source,
  traffic_source.medium,
  traffic_source.name AS campaign
FROM {{ source('ga4', 'events') }}

Intermediate layer

The intermediate layer builds the touchpoint table that attribution models consume. This is where you define what counts as a “touchpoint” and join touchpoint events to conversions from the same base model:

-- int__touchpoints_mapped.sql
WITH touchpoint_events AS (
  SELECT
    user_pseudo_id,
    session_id,
    event_timestamp,
    COALESCE(source, '(direct)') AS source,
    COALESCE(medium, '(none)') AS medium,
    CONCAT(COALESCE(source, '(direct)'), ' / ', COALESCE(medium, '(none)')) AS channel
  FROM {{ ref('base__ga4__events') }}
  WHERE session_id IS NOT NULL
),

conversion_events AS (
  SELECT
    user_pseudo_id,
    event_timestamp,
    transaction_id,
    revenue
  FROM {{ ref('base__ga4__events') }}
  WHERE event_name = 'purchase'
)

SELECT
  t.user_pseudo_id AS user_id,
  c.transaction_id,
  t.session_id,
  t.event_timestamp AS touchpoint_timestamp,
  c.event_timestamp AS conversion_timestamp,
  t.source,
  t.medium,
  t.channel,
  c.revenue
FROM touchpoint_events t
JOIN conversion_events c
  ON t.user_pseudo_id = c.user_pseudo_id
  AND t.event_timestamp <= c.event_timestamp
  AND t.event_timestamp >= TIMESTAMP_SUB(c.event_timestamp, INTERVAL 30 DAY)

Marts layer

The marts layer applies attribution logic. Each model gets its own file for clarity:

-- mrt__attribution__first_touch.sql
WITH touchpoints_numbered AS (
  SELECT
    user_id,
    transaction_id,
    channel,
    source,
    medium,
    touchpoint_timestamp,
    revenue,
    ROW_NUMBER() OVER (
      PARTITION BY user_id, transaction_id
      ORDER BY touchpoint_timestamp ASC
    ) AS touch_position
  FROM {{ ref('int__touchpoints_mapped') }}
)
SELECT
  user_id,
  transaction_id,
  channel,
  source,
  medium,
  revenue AS attributed_revenue,
  'first_touch' AS attribution_model
FROM touchpoints_numbered
WHERE touch_position = 1

Testing that attribution sums correctly

Add a singular dbt test to verify attribution credits sum to actual revenue. This test returns rows only when the totals diverge, which dbt treats as a failure:

-- tests/attribution_revenue_sums_correctly.sql
WITH attribution_totals AS (
  SELECT SUM(attributed_revenue) AS total_attributed
  FROM {{ ref('mrt__attribution__linear') }}
),
actual_totals AS (
  SELECT SUM(revenue) AS total_actual
  FROM {{ ref('base__ga4__events') }}
  WHERE event_name = 'purchase'
)
SELECT
  total_attributed,
  total_actual
FROM attribution_totals
CROSS JOIN actual_totals
WHERE ABS(total_attributed - total_actual) >= 0.01

Comparing models side by side

Create a comparison view that unions all models with an identifier:

-- mrt__attribution__comparison.sql
SELECT *, 'first_touch' AS model FROM {{ ref('mrt__attribution__first_touch') }}
UNION ALL
SELECT *, 'last_touch' AS model FROM {{ ref('mrt__attribution__last_touch') }}
UNION ALL
SELECT *, 'linear' AS model FROM {{ ref('mrt__attribution__linear') }}

This enables dashboard filters that let stakeholders toggle between models and see how channel credit shifts. When email looks great under last-touch but weak under first-touch, you’ve learned something about its role in the funnel.

Beyond the basics

First-touch, last-touch, and linear are building blocks. Once you have this foundation:

Position-based and time-decay models give weighted credit rather than equal credit. Position-based (like the 40-20-40 “U-shaped” model) emphasizes first and last touchpoints, while time-decay assigns more credit to recent interactions. I cover both in the next article in this series.

Data-driven models use Markov chains or Shapley values to calculate credit based on actual conversion probabilities. These require more data (hundreds of conversions minimum) but provide the most nuanced view of channel contribution. See building Markov chains and Shapley values for the implementation.

The code patterns here (window functions for positioning, aggregations for counting, joins for path construction) apply to all these more sophisticated approaches. Master these fundamentals and the advanced models become straightforward extensions.

Key takeaways

Building attribution in SQL gives you transparency and control that platform tools can’t match. Start with these three models, understand what each reveals about your marketing, and let that inform which model fits your business context.

The code is simpler than you might expect. A few window functions, a well-structured touchpoint table, and you have attribution that you actually understand.