Position-Based and Time-Decay Attribution in BigQuery

Linear attribution treats every touchpoint equally. That’s a reasonable starting point, but it ignores something obvious: the first touchpoint that discovered a customer and the last touchpoint that closed the deal probably matter more than the random email they opened in between.

Position-based and time-decay models address this by assigning different weights to touchpoints based on where they fall in the journey or how recently they occurred. These models require slightly more complex SQL but produce attribution that better reflects how marketing actually works.

Position-based models: weighting by journey stage

Position-based models assign credit based on a touchpoint’s location in the customer journey. The most common is the U-shaped model, which emphasizes first and last touches while still crediting the middle.

The U-shaped model (40-20-40)

The U-shaped model distributes credit as follows:

First touchpoint: 40%
Last touchpoint: 40%
All middle touchpoints: 20% split equally

The formula:

Credit(first) = 0.40
Credit(last) = 0.40
Credit(middle_i) = 0.20 / (n - 2)

This model works well for most B2C and B2B scenarios where you want to value both the channel that drove initial awareness and the channel that triggered conversion.

The W-shaped model (30-30-30-10)

B2B companies with longer sales cycles often use W-shaped attribution, which adds weight to a key middle touchpoint like lead creation or demo request:

Credit(first) = 0.30
Credit(key_middle) = 0.30
Credit(last) = 0.30
Credit(other) = 0.10 / (n - 3)

W-shaped requires identifying that key middle moment in your data, which adds implementation complexity.

Edge cases

Single-touchpoint conversions get 100% credit to that touchpoint. Two-touchpoint conversions split credit 50/50. Your SQL needs to handle these explicitly, or you’ll get incorrect results.

U-shaped attribution in BigQuery

A complete implementation of the 40-20-40 U-shaped model:

WITH touchpoints_positioned AS (
  SELECT
    user_id,
    transaction_id,
    channel,
    revenue,
    touchpoint_timestamp,
    ROW_NUMBER() OVER (
      PARTITION BY user_id, transaction_id
      ORDER BY touchpoint_timestamp ASC
    ) AS position,
    COUNT(*) OVER (
      PARTITION BY user_id, transaction_id
    ) AS total_touches
  FROM touchpoints
  WHERE touchpoint_timestamp >= TIMESTAMP_SUB(
    conversion_timestamp,
    INTERVAL 30 DAY
  )
)
SELECT
  user_id,
  transaction_id,
  channel,
  CASE
    WHEN total_touches = 1 THEN 1.0
    WHEN total_touches = 2 THEN 0.5
    WHEN position = 1 THEN 0.4
    WHEN position = total_touches THEN 0.4
    ELSE 0.2 / (total_touches - 2)
  END * revenue AS attributed_revenue
FROM touchpoints_positioned

What each piece does:

ROW_NUMBER() identifies each touchpoint’s position in the journey
COUNT(*) OVER() gives us the total touchpoints per conversion
The CASE statement handles edge cases first, then applies the 40-20-40 split
Multiplying the weight by revenue produces the attributed amount

For a customer with 5 touchpoints and a $100 conversion:

Touch 1: $40 (40%)
Touch 2: $6.67 (20% ÷ 3)
Touch 3: $6.67 (20% ÷ 3)
Touch 4: $6.67 (20% ÷ 3)
Touch 5: $40 (40%)

Industry weight variations

The 40-20-40 split isn’t universal. Adjust based on your business:

Industry	Recommended Weights
E-commerce/retail	Standard 40-20-40
B2B SaaS	W-shaped (30-30-30-10) with demo request
High-consideration products	45-10-45 to emphasize endpoints

Time-decay attribution: weighting by recency

Time-decay attribution assigns more credit to touchpoints that occurred closer to the conversion. A touchpoint from yesterday matters more than one from two weeks ago.

The exponential decay formula

Time-decay uses exponential decay with a half-life parameter:

Weight(touchpoint) = 2^(-days_before_conversion / half_life)

The half-life determines how quickly credit diminishes. With a 7-day half-life:

Day 0 (conversion day): Weight = 1.0 (100%)
Day 7: Weight = 0.5 (50%)
Day 14: Weight = 0.25 (25%)
Day 21: Weight = 0.125 (12.5%)

Google Analytics uses a 7-day half-life by default. This works reasonably well for e-commerce but may be too aggressive for B2B.

Choosing the right half-life

Half-life should roughly match your typical sales cycle length:

Industry	Half-Life	Lookback Window
B2C E-commerce (impulse)	3-7 days	7-14 days
B2C E-commerce (considered)	7-14 days	30-45 days
B2B Mid-Market	14-30 days	90-180 days
B2B Enterprise	30-45 days	180+ days

A half-life that’s too short under-credits early-funnel channels. One that’s too long approaches linear attribution, defeating the purpose.

Time-decay attribution in BigQuery

The implementation calculates raw weights, then normalizes so they sum to 1:

WITH decay_weights AS (
  SELECT
    user_id,
    transaction_id,
    channel,
    revenue,
    conversion_timestamp,
    touchpoint_timestamp,
    POW(
      0.5,
      TIMESTAMP_DIFF(conversion_timestamp, touchpoint_timestamp, MINUTE) / (7.0 * 24 * 60)
    ) AS raw_weight
  FROM touchpoints
  WHERE touchpoint_timestamp >= TIMESTAMP_SUB(
    conversion_timestamp,
    INTERVAL 30 DAY
  )
),
normalized AS (
  SELECT
    *,
    SUM(raw_weight) OVER (
      PARTITION BY user_id, transaction_id
    ) AS total_weight
  FROM decay_weights
)
SELECT
  user_id,
  transaction_id,
  channel,
  (raw_weight / total_weight) * revenue AS attributed_revenue
FROM normalized

What each piece does:

TIMESTAMP_DIFF(..., MINUTE) calculates the time gap in minutes for precision
7.0 * 24 * 60 converts the 7-day half-life to minutes (10,080 minutes)
POW(0.5, time_ratio) applies exponential decay
Dividing by total_weight normalizes so all weights sum to 1

For a customer with touchpoints at day -14, day -7, and day 0:

Day -14: raw_weight = 0.25, normalized = 0.25/1.75 = 14.3%
Day -7: raw_weight = 0.5, normalized = 0.5/1.75 = 28.6%
Day 0: raw_weight = 1.0, normalized = 1.0/1.75 = 57.1%

Parameterizing the half-life

Hard-coding the half-life makes experimentation difficult. Use a variable or configuration table:

DECLARE half_life_days FLOAT64 DEFAULT 7.0;

WITH decay_weights AS (
  SELECT
    user_id,
    transaction_id,
    channel,
    revenue,
    conversion_timestamp,
    touchpoint_timestamp,
    POW(
      0.5,
      TIMESTAMP_DIFF(conversion_timestamp, touchpoint_timestamp, HOUR) / (half_life_days * 24)
    ) AS raw_weight
  FROM touchpoints
  WHERE touchpoint_timestamp >= TIMESTAMP_SUB(
    conversion_timestamp,
    INTERVAL 30 DAY
  )
)
-- ... rest of query

This lets you test different half-life values and compare results.

Building both models in dbt

Add these models to your existing attribution marts structure:

models/
├── intermediate/
│   └── int__touchpoints.sql
├── marts/attribution/
│   ├── mrt__attribution_first_touch.sql
│   ├── mrt__attribution_last_touch.sql
│   ├── mrt__attribution_linear.sql
│   ├── mrt__attribution_position_based.sql   # New
│   └── mrt__attribution_time_decay.sql       # New

Position-based model with configurable weights

Use dbt variables to make weights configurable:

-- mrt__attribution_position_based.sql
{% set first_touch_weight = var('attribution_first_weight', 0.4) %}
{% set last_touch_weight = var('attribution_last_weight', 0.4) %}
{% set middle_weight = 1.0 - first_touch_weight - last_touch_weight %}

WITH touchpoints_positioned AS (
  SELECT
    user_id,
    transaction_id,
    channel,
    source,
    medium,
    revenue,
    touchpoint_timestamp,
    ROW_NUMBER() OVER (
      PARTITION BY user_id, transaction_id
      ORDER BY touchpoint_timestamp ASC
    ) AS position,
    COUNT(*) OVER (
      PARTITION BY user_id, transaction_id
    ) AS total_touches
  FROM {{ ref('int__touchpoints') }}
)
SELECT
  user_id,
  transaction_id,
  channel,
  source,
  medium,
  touchpoint_timestamp,
  CASE
    WHEN total_touches = 1 THEN 1.0
    WHEN total_touches = 2 THEN 0.5
    WHEN position = 1 THEN {{ first_touch_weight }}
    WHEN position = total_touches THEN {{ last_touch_weight }}
    ELSE {{ middle_weight }} / (total_touches - 2)
  END * revenue AS attributed_revenue,
  'position_based' AS attribution_model
FROM touchpoints_positioned

Configure weights in dbt_project.yml:

vars:
  attribution_first_weight: 0.4
  attribution_last_weight: 0.4
  attribution_half_life_days: 7

Time-decay model with configurable half-life

-- mrt__attribution_time_decay.sql
{% set half_life_days = var('attribution_half_life_days', 7) %}

WITH decay_weights AS (
  SELECT
    user_id,
    transaction_id,
    channel,
    source,
    medium,
    revenue,
    touchpoint_timestamp,
    conversion_timestamp,
    POW(
      0.5,
      TIMESTAMP_DIFF(conversion_timestamp, touchpoint_timestamp, HOUR) / ({{ half_life_days }} * 24.0)
    ) AS raw_weight
  FROM {{ ref('int__touchpoints') }}
),
normalized AS (
  SELECT
    *,
    SUM(raw_weight) OVER (
      PARTITION BY user_id, transaction_id
    ) AS total_weight
  FROM decay_weights
)
SELECT
  user_id,
  transaction_id,
  channel,
  source,
  medium,
  touchpoint_timestamp,
  (raw_weight / total_weight) * revenue AS attributed_revenue,
  'time_decay' AS attribution_model
FROM normalized

Testing weighted models

Your tests should verify that weights sum correctly:

-- tests/attribution_weights_sum_to_revenue.sql
WITH model_totals AS (
  SELECT
    transaction_id,
    SUM(attributed_revenue) AS total_attributed
  FROM {{ ref('mrt__attribution_position_based') }}
  GROUP BY transaction_id
),
actual_revenue AS (
  SELECT
    transaction_id,
    MAX(revenue) AS actual_revenue
  FROM {{ ref('int__touchpoints') }}
  GROUP BY transaction_id
)
SELECT
  m.transaction_id,
  m.total_attributed,
  a.actual_revenue
FROM model_totals m
JOIN actual_revenue a
  ON m.transaction_id = a.transaction_id
WHERE ABS(m.total_attributed - a.actual_revenue) > 0.01

This returns rows only when attribution doesn’t match actual revenue, making it easy to spot calculation errors.

Choosing between position-based and time-decay

These models make different assumptions about what matters in a customer journey.

Use position-based when:

You believe first and last touches are genuinely more impactful
Your funnel has clear “discovery” and “conversion” moments
You want to credit both awareness and closing channels
Sales cycles are relatively consistent in length

Use time-decay when:

Recent touchpoints demonstrably influence conversion more
You’re optimizing for immediate conversion impact
Sales cycles vary widely in length
You want to weight urgency and recency into the model

Consider running both and comparing results. When position-based and time-decay produce similar channel rankings, you have higher confidence in the findings. When they diverge significantly, that’s a signal to investigate why.

Moving toward data-driven models

Position-based and time-decay are improvements over simple first/last-touch, but they still rely on assumptions about how credit should distribute. The weights are business decisions, not derived from your actual data.

Data-driven models like Markov chains and Shapley values calculate weights based on observed conversion patterns. They answer: “Given our historical data, how much did each channel actually contribute to conversions?”

These models require more data (hundreds of conversions minimum) and more complex implementation, but they provide the most defensible attribution when you have sufficient volume.

For most companies, building solid first-touch, last-touch, linear, position-based, and time-decay models provides a strong foundation. Run them in parallel, compare results across models, and use the divergence to understand where your attribution has high confidence versus uncertainty.