Data-Driven Attribution: Building Markov Chains and Shapley Values

First-touch, last-touch, linear, position-based, time-decay: all of these attribution models share a fundamental limitation. They assign credit based on assumptions about which touchpoints matter, not measurements of which touchpoints actually drive conversions. Data-driven attribution takes a different approach: instead of assuming position determines importance, it calculates what happens to conversion probability when you remove a channel entirely.

Why Heuristic Models Fall Short

Position-based models assume first and last touches deserve more credit. That might be true for your business, or it might not. Linear models assume all touches contribute equally (also an assumption). Time-decay assumes recency correlates with influence. These are reasonable hypotheses, but they’re not validated against your actual data.

Data-driven attribution asks a fundamentally different question: if this channel didn’t exist, how many fewer conversions would we see? This is the removal effect, and it’s the foundation of both Markov chain and Shapley value attribution.

Consider what this means for budget decisions. A channel that appears in many converting journeys might seem valuable under heuristic models. But if removing it doesn’t change conversion probability (because users would have found another path), its true contribution is lower than position-based credit suggests.

Markov Chains for Attribution: The Concept

A Markov chain models customer journeys as a sequence of states where the probability of moving to the next state depends only on the current state. This is called the “Markov property” or memorylessness. The model doesn’t care how you got to your current state, only where you are now.

For attribution, states are channels plus three special states: START (beginning of journey), CONVERSION (successful outcome), and NULL (journey ends without converting). Each observed journey contributes to transition probabilities between states.

Consider a simple example with three channels: Paid Search, Email, and Direct. From historical data, you observe patterns like:

  • 40% of users who start go to Paid Search
  • 30% of users at Paid Search move to Email
  • 50% of users at Email convert

These probabilities form a transition matrix, a table where rows are “from” states and columns are “to” states. Each cell contains the probability of that transition based on your historical journey data.

The power of this representation is that you can calculate the overall probability of reaching CONVERSION from START by following all possible paths through the matrix. More importantly, you can recalculate that probability after removing a channel, which gives you the removal effect.
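To make the idea concrete, here is a minimal Python sketch that solves for the probability of reaching CONVERSION from START by fixed-point iteration. The channel names and probabilities are toy values for illustration, not real data:

```python
# Toy transition table: each row's outgoing probabilities sum to 1.
# Channel names and numbers are illustrative, not from real data.
T = {
    "START":       {"Paid Search": 0.6, "Email": 0.4},
    "Paid Search": {"Email": 0.5, "CONVERSION": 0.2, "NULL": 0.3},
    "Email":       {"Paid Search": 0.1, "CONVERSION": 0.5, "NULL": 0.4},
}

def conversion_probability(transitions):
    """P(reach CONVERSION before NULL) from START, by fixed-point iteration."""
    p = {state: 0.0 for state in transitions}
    p["CONVERSION"], p["NULL"] = 1.0, 0.0  # absorbing states
    for _ in range(1000):  # sweep until the values stabilise
        for state, outs in transitions.items():
            p[state] = sum(prob * p[dest] for dest, prob in outs.items())
    return p["START"]
```

For this toy table the overall conversion probability works out to roughly 0.503. Removing a channel from the table and re-solving gives the "without" probability that the removal effect needs.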

The Removal Effect: How Markov Attribution Works

The removal effect measures how much conversion probability drops when a channel is removed from the journey graph. The formula is straightforward:

Removal Effect(Channel X) = 1 - (P(conversion without X) / P(conversion with all channels))

Let’s work through a concrete example. Suppose your transition matrix produces these results:

  • Total conversion probability with all channels: 50%
  • Conversion probability without Paid Search: 35%
  • Conversion probability without Email: 45%
  • Conversion probability without Direct: 48%

The removal effects would be:

  • Paid Search: 1 - (0.35 / 0.50) = 30%
  • Email: 1 - (0.45 / 0.50) = 10%
  • Direct: 1 - (0.48 / 0.50) = 4%

These percentages sum to 44%, not 100%. Removal effects measure marginal contribution, and those contributions overlap. Removing Paid Search affects paths that also included Email. To get attribution shares that sum to 100% of conversions, you normalize:

Total Removal Effects = 30% + 10% + 4% = 44%
Attribution Share(Paid Search) = 30% / 44% = 68%
Attribution Share(Email) = 10% / 44% = 23%
Attribution Share(Direct) = 4% / 44% = 9%

Now multiply these shares by your total conversions to get attributed conversions per channel.
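The whole normalization step fits in a few lines of Python. The probabilities below are the worked example's; the 1,000 total conversions are a made-up figure for illustration:

```python
# Removal effects and normalised attribution shares from the worked example.
base = 0.50  # P(conversion) with all channels
without = {"Paid Search": 0.35, "Email": 0.45, "Direct": 0.48}

removal = {ch: 1 - p / base for ch, p in without.items()}   # 0.30, 0.10, 0.04
total_effect = sum(removal.values())                        # ~0.44
shares = {ch: effect / total_effect for ch, effect in removal.items()}

conversions = 1000  # hypothetical total conversions to distribute
attributed = {ch: round(conversions * share) for ch, share in shares.items()}
```

This attributes 682, 227, and 91 conversions to Paid Search, Email, and Direct respectively, which sum back to the 1,000 total.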

Implementing Markov Attribution in SQL

Full Markov chain attribution requires matrix operations that push SQL beyond its comfort zone. The practical approach is to use SQL for path extraction and data preparation, then hand off to Python for the matrix calculations.

Path Extraction in BigQuery

The first step is extracting journey paths from your touchpoint data. You need two datasets: paths that ended in conversion and paths that didn’t.

WITH touchpoints AS (
    SELECT
        user_id,
        channel,
        event_timestamp,
        conversion_id
    FROM {{ ref('int__touchpoints') }}
    WHERE event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
),

converting_paths AS (
    -- All touchpoints within the 30 days preceding each conversion
    SELECT
        t.user_id,
        t.conversion_id,
        STRING_AGG(t.channel, ' > ' ORDER BY t.event_timestamp) AS journey_path,
        'conversion' AS outcome
    FROM touchpoints t
    INNER JOIN {{ ref('int__conversions') }} c
        ON t.user_id = c.user_id
        AND t.conversion_id = c.conversion_id
    WHERE t.event_timestamp BETWEEN
        TIMESTAMP_SUB(c.conversion_timestamp, INTERVAL 30 DAY)
        AND c.conversion_timestamp
    GROUP BY t.user_id, t.conversion_id
),

non_converting_paths AS (
    -- Anti-join: users with touchpoints but no conversion at all
    SELECT
        t.user_id,
        CAST(NULL AS STRING) AS conversion_id,
        STRING_AGG(t.channel, ' > ' ORDER BY t.event_timestamp) AS journey_path,
        'null' AS outcome
    FROM touchpoints t
    LEFT JOIN {{ ref('int__conversions') }} c
        ON t.user_id = c.user_id
    WHERE c.user_id IS NULL
    GROUP BY t.user_id
)

SELECT user_id, conversion_id, journey_path, outcome FROM converting_paths
UNION ALL
SELECT user_id, conversion_id, journey_path, outcome FROM non_converting_paths

This produces rows like:

user_id    journey_path                      outcome
user_123   Paid Search > Email > Direct      conversion
user_456   Organic > Paid Search > Organic   null
user_789   Email > Email > Direct            conversion

Transition Counting

From these paths, you can calculate transition counts in SQL:

-- journey_paths is the output of the path-extraction query above
WITH path_transitions AS (
    SELECT
        journey_path,
        outcome,
        SPLIT(journey_path, ' > ') AS channels
    FROM journey_paths
),

transitions AS (
    -- START -> first channel in each path
    SELECT
        'START' AS from_state,
        channels[OFFSET(0)] AS to_state,
        COUNT(*) AS transition_count
    FROM path_transitions
    GROUP BY to_state

    UNION ALL

    -- channel -> channel, for every adjacent pair in the path
    SELECT
        channels[OFFSET(i)] AS from_state,
        channels[OFFSET(i + 1)] AS to_state,
        COUNT(*) AS transition_count
    FROM path_transitions,
        UNNEST(GENERATE_ARRAY(0, ARRAY_LENGTH(channels) - 2)) AS i
    GROUP BY from_state, to_state

    UNION ALL

    -- last channel -> CONVERSION or NULL, depending on outcome
    SELECT
        channels[OFFSET(ARRAY_LENGTH(channels) - 1)] AS from_state,
        CASE outcome WHEN 'conversion' THEN 'CONVERSION' ELSE 'NULL' END AS to_state,
        COUNT(*) AS transition_count
    FROM path_transitions
    GROUP BY from_state, to_state
)

SELECT
    from_state,
    to_state,
    transition_count,
    transition_count
        / SUM(transition_count) OVER (PARTITION BY from_state) AS transition_probability
FROM transitions
From here, the removal effect calculation requires matrix operations (specifically, computing the probability of reaching CONVERSION from START through all possible paths). The ChannelAttribution package in R or marketing-attribution-models in Python handles this efficiently with C++ backends optimized for scale.
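A minimal sketch of that handoff, in pure Python with a toy transition table standing in for the SQL output. It uses one common treatment of removal: every transition into the removed channel is redirected to NULL before re-solving the chain, so each row's probabilities still sum to 1:

```python
from collections import defaultdict

# Toy transition table; channel names and probabilities are illustrative.
T = {
    "START":       {"Paid Search": 0.6, "Email": 0.4},
    "Paid Search": {"Email": 0.5, "CONVERSION": 0.2, "NULL": 0.3},
    "Email":       {"Paid Search": 0.1, "CONVERSION": 0.5, "NULL": 0.4},
}

def absorb_prob(transitions):
    """P(reach CONVERSION before NULL) from START, by fixed-point iteration."""
    p = {state: 0.0 for state in transitions}
    p["CONVERSION"], p["NULL"] = 1.0, 0.0
    for _ in range(1000):
        for state, outs in transitions.items():
            p[state] = sum(prob * p[dest] for dest, prob in outs.items())
    return p["START"]

def removal_effect(transitions, channel):
    """Redirect every transition into `channel` to NULL, drop its row, re-solve."""
    reduced = {}
    for state, outs in transitions.items():
        if state == channel:
            continue
        merged = defaultdict(float)
        for dest, prob in outs.items():
            merged["NULL" if dest == channel else dest] += prob
        reduced[state] = dict(merged)
    return 1 - absorb_prob(reduced) / absorb_prob(transitions)

effects = {ch: removal_effect(T, ch) for ch in ("Paid Search", "Email")}
shares = {ch: e / sum(effects.values()) for ch, e in effects.items()}
```

The dedicated packages do the same computation with proper matrix algebra and far better performance; this sketch is only meant to show that nothing mysterious happens after the SQL hands off.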

Shapley Values: A Game Theory Approach

Shapley values come from cooperative game theory, originally designed to fairly divide profits among players in a coalition. For attribution, channels are players and conversions are the value being divided.

The Shapley value for a channel equals its average marginal contribution across all possible orderings of channels. Mathematically:

φᵢ = Σ_{S ⊆ N \ {i}} [ |S|! (n − |S| − 1)! / n! ] × [ v(S ∪ {i}) − v(S) ]

where N is the set of all n channels and v(S) is the conversion value achieved by coalition S.

In plain terms: for every possible subset of channels, calculate how much adding channel i increases conversion probability. Weight these contributions by how likely that subset is, then sum.

Shapley values satisfy four fairness axioms:

  1. Efficiency: Credits sum exactly to total conversions
  2. Symmetry: Channels with equal contribution get equal credit
  3. Dummy Player: Channels that add no value get zero credit
  4. Additivity: Attribution from two separate analyses can be combined

The theoretical elegance has a practical cost: calculating Shapley values requires evaluating 2^n coalitions where n is your number of channels. With 10 channels, that’s 1,024 coalitions. With 20 channels, over a million. Approximation methods using Monte Carlo sampling make this feasible at scale, but it remains more computationally intensive than Markov chains.
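The mechanics are easy to see with brute force on two channels. In this sketch the coalition values are invented; in practice v(S) would be estimated from observed journeys containing the channels in S:

```python
from itertools import combinations
from math import factorial

# Hypothetical coalition values: conversion rate when only the channels
# in the coalition are present. frozenset() is the empty coalition.
v = {
    frozenset():                    0.00,
    frozenset({"Search"}):          0.04,
    frozenset({"Email"}):           0.02,
    frozenset({"Search", "Email"}): 0.09,
}

def shapley(channels, v):
    """Exact Shapley values by enumerating every coalition per channel."""
    n = len(channels)
    phi = {}
    for ch in channels:
        others = [c for c in channels if c != ch]
        contrib = 0.0
        for size in range(n):  # coalitions of each size not containing ch
            for subset in combinations(others, size):
                S = frozenset(subset)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                contrib += weight * (v[S | {ch}] - v[S])
        phi[ch] = contrib
    return phi

phi = shapley(["Search", "Email"], v)
```

The result illustrates the efficiency axiom: the two values, 0.055 and 0.035, sum to v of the full coalition, 0.09.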

When to Use Shapley vs Markov

Markov chains excel when the sequential nature of journeys matters to your business. The transition probability from Paid Search to Email might differ meaningfully from Email to Paid Search. Markov models capture these asymmetries.

Shapley values treat channels as interchangeable coalition members without inherent ordering. This makes sense when you care more about which channels appear in journeys than the order they appear.

In practice, Markov chains are more common for three reasons: lower computational cost, intuitive interpretation because transition probabilities map to real customer behavior, and good empirical performance. Shapley values shine when you need theoretical guarantees about fairness or when regulatory requirements demand provable methodology.

Comparison with Google’s Data-Driven Attribution

Google’s Data-Driven Attribution in GA4 uses a combination of methods: conversion probability models, counterfactual analysis, and Shapley-based credit distribution enhanced with machine learning. It considers up to 50 touchpoints per conversion with a 90-day default lookback window.

There’s a critical caveat that catches many teams: DDA requires 400+ conversions per conversion type in the past 30 days, plus roughly 10,000 paths with two or more interactions. Below these thresholds, GA4 silently falls back to last-click attribution. You might think you’re using data-driven attribution when you’re actually getting last-click.

Building attribution in your warehouse offers advantages beyond avoiding silent fallbacks:

  • Transparency: You can inspect every step of the calculation
  • Customization: Adjust lookback windows, channel groupings, and model parameters
  • Integration: Combine online touchpoints with offline data, CRM interactions, and non-Google platforms
  • Auditability: Export results for validation and stakeholder review

The trade-off is implementation effort. Google’s DDA is turnkey if you meet the thresholds. Warehouse-native attribution requires building and maintaining the pipeline.

Practical Implementation Guidance

Channel Grouping

Start with 5-10 high-level channel groups. Too many channels produces sparse transition matrices and unreliable probabilities. Collapse low-volume channels (under 2% of touchpoints) into an “Other” category. You can add granularity later as data volume grows.

Grouping decisions matter: “Paid Social” as one channel behaves differently than “Facebook Ads” and “LinkedIn Ads” as separate channels. Start broad, validate model stability, then increase granularity.
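The collapse rule itself is mechanical; here is a sketch with invented touchpoint counts:

```python
# Collapse channels below 2% of touchpoint volume into "Other".
# Counts are illustrative, not real data.
touchpoint_counts = {
    "Paid Search": 41_000, "Email": 23_000, "Direct": 18_000,
    "Organic": 15_000, "Affiliate": 1_200, "Podcast": 800,
}
total = sum(touchpoint_counts.values())

grouped = {}
for channel, count in touchpoint_counts.items():
    key = channel if count / total >= 0.02 else "Other"
    grouped[key] = grouped.get(key, 0) + count
```

Run the grouping before building the transition matrix so that "Other" participates as a single state rather than several sparse ones.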

Minimum Data Requirements

For reliable Markov chain attribution, you need roughly 10 times as many transitions as unique transition types. With 10 channels (plus START, CONVERSION, NULL), you have up to 13 × 13 = 169 possible transitions. Aim for at least 1,690 total path transitions, typically achievable with a few hundred conversions.

Higher-order Markov models (where next state depends on current plus previous states) require exponentially more data. First-order models work well for most use cases.

Validation Strategy

Run Markov attribution alongside your heuristic models for the same time period. Compare channel rankings:

  • High agreement: Channels rank similarly across models. Higher confidence in results.
  • Significant divergence: One model credits a channel much more than others. Investigate why, as this is either insight or a data quality issue.

The ultimate validation is incrementality testing: hold out a portion of users from seeing a channel’s ads and measure the actual conversion difference. Use incrementality results to calibrate your attribution model, treating measured incremental lift as ground truth.

Moving Forward

Data-driven attribution measures something heuristic models assume: actual channel contribution to conversion probability. Markov chains calculate this through transition matrices and removal effects. Shapley values calculate it through coalition analysis and marginal contributions. Both approaches beat position-based guessing, though they require more data and computation.

A practical first implementation combines path extraction in SQL with Markov chain calculations in Python. Running the results alongside your existing attribution models for comparison builds stakeholder confidence, and incrementality tests provide ground truth when budget allows.

No attribution model captures perfect truth because customer decisions are influenced by factors no model can measure. But data-driven approaches get closer to reality than assuming first and last touches deserve 40% each. Aim for better budget decisions, not perfect attribution.