Shapley Value Attribution

Shapley values come from cooperative game theory, originally designed to fairly divide profits among players in a coalition. For attribution, channels are the players and conversions are the value being divided. Where Markov chains model journeys as state transitions, Shapley values treat channels as coalition members and calculate each one’s average marginal contribution.

The core calculation

The Shapley value for a channel equals its average marginal contribution across all possible orderings of channels. Mathematically:

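phi_i = SUM over all subsets S that exclude i of [|S|!(n-|S|-1)!/n!] * [v(S + {i}) - v(S)]

where n is the total number of channels and |S| is the number of channels in S.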

In plain terms: for every subset of channels S that does not already include channel i, calculate how much adding channel i increases conversion probability. Weight each contribution by the share of channel orderings in which exactly the channels in S come before i, then sum.

The function v(S) represents the conversion probability when only the channels in subset S are present. v(S + {i}) - v(S) is the marginal contribution of channel i to that subset — how much does conversion probability increase when you add channel i to the mix?
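As a minimal sketch of the formula, the snippet below computes exact Shapley values for a two-channel toy game. The function name exact_shapley and the conversion probabilities in toy_v are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from itertools import combinations
from math import factorial

def exact_shapley(channels, v):
    """Exact Shapley values; v maps a frozenset of channels to a number."""
    n = len(channels)
    phi = {}
    for i in channels:
        others = [c for c in channels if c != i]
        total = 0.0
        for size in range(n):  # subsets of the other channels, sizes 0..n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # marginal contribution of channel i to coalition s
                total += weight * (v(s | {i}) - v(s))
        phi[i] = total
    return phi

# Hypothetical conversion probabilities per coalition (illustrative only).
toy_v = {
    frozenset(): 0.0,
    frozenset({"Email"}): 0.10,
    frozenset({"Search"}): 0.20,
    frozenset({"Email", "Search"}): 0.40,
}

phi = exact_shapley(["Email", "Search"], toy_v.__getitem__)
print(phi)  # Email is about 0.15, Search about 0.25
```

Email's value is the average of its two marginal contributions: 0.10 when added to nothing and 0.20 when added to Search. The two values sum to 0.40, the conversion probability with both channels present.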

This is the flip side of the removal effect. Instead of asking “what do we lose without this channel?”, Shapley asks “what do we gain by adding it?” The two often point to similar channel rankings, but they are different calculations and are not guaranteed to agree.

The fairness axioms

Shapley values satisfy four mathematical properties that make them provably fair. The Shapley value is the only way of dividing credit that satisfies all four:

Efficiency: attributed conversions sum exactly to total conversions. No credit is lost or created. This is the same property that the normalization step in Markov attribution achieves, but Shapley values get it for free because it’s baked into the math.
Symmetry: channels with equal contribution get equal credit. If Email and SMS produce identical conversion probability improvements in every possible subset, they get identical attribution. No model bias toward one over the other.
Dummy Player: channels that add no value get zero credit. If adding Display to any combination of channels never increases conversion probability, Display gets exactly zero attributed conversions. Heuristic models like linear attribution would still give Display equal credit just for being present in the journey.
Additivity: attribution from two separate analyses can be combined. If the value being divided is itself additive, such as conversion counts or revenue, the Shapley values calculated separately for January and February sum to the Shapley values for the combined period. This does not hold for conversion rates, because the rate for a combined period is not the sum of the monthly rates. With an additive value function, this property enables incremental computation over time windows.
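The first three axioms can be checked numerically on a small example. The sketch below uses the equivalent "average over all orderings" form of the calculation. The function shapley_by_orderings and the toy game v are hypothetical: Email and SMS are constructed to be interchangeable, and Display is constructed to add nothing.

```python
from itertools import permutations

def shapley_by_orderings(channels, v):
    """Average each channel's marginal contribution over all n! orderings."""
    totals = {c: 0.0 for c in channels}
    orderings = list(permutations(channels))
    for order in orderings:
        seen = frozenset()
        for c in order:
            totals[c] += v(seen | {c}) - v(seen)
            seen = seen | {c}
    return {c: total / len(orderings) for c, total in totals.items()}

def v(coalition):
    """Hypothetical game: only the number of Email/SMS channels present matters."""
    drivers = len(coalition & {"Email", "SMS"})
    return {0: 0.0, 1: 0.10, 2: 0.16}[drivers]

channels = ["Email", "SMS", "Display"]
phi = shapley_by_orderings(channels, v)

efficiency_gap = abs(sum(phi.values()) - v(frozenset(channels)))
symmetry_gap = abs(phi["Email"] - phi["SMS"])
print(phi, efficiency_gap, symmetry_gap)
```

In this toy game Display receives zero credit (Dummy Player), Email and SMS receive the same credit (Symmetry), and the three values sum to the full-coalition value of 0.16 (Efficiency), up to floating-point rounding.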

These axioms matter beyond mathematical elegance. In regulated industries or when stakeholders demand transparent, defensible methodology, Shapley values provide guarantees that no heuristic model can match.

The computational cost

The theoretical elegance comes with a practical price tag. Calculating exact Shapley values requires evaluating 2^n coalitions where n is your number of channels.

Channels	Coalitions	Feasibility
5	32	Trivial
10	1,024	Seconds
15	32,768	Minutes
20	1,048,576	Hours
25	33,554,432	Impractical

With 10 channels, exact computation is fast. With 20, it becomes expensive. With 25+, it’s impractical without approximation.

Monte Carlo sampling makes Shapley values feasible at scale. Instead of evaluating every possible coalition, you sample random orderings of channels and compute marginal contributions for each. With enough samples (typically 1,000-10,000 iterations), the approximation converges close to the exact values. The trade-off is precision: you get estimates with confidence intervals rather than exact numbers.
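A minimal sketch of the sampling approach follows, assuming a value function that can be evaluated for any coalition. The names monte_carlo_shapley, LIFT, and toy_value are hypothetical, and the per-channel lifts are made-up numbers combined as independent chances to convert.

```python
import random

def monte_carlo_shapley(channels, v, n_samples=5000, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled channel orderings."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    totals = {c: 0.0 for c in channels}
    for _ in range(n_samples):
        order = list(channels)
        rng.shuffle(order)
        seen = frozenset()
        prev_value = v(seen)
        for c in order:
            seen = seen | {c}
            value = v(seen)
            totals[c] += value - prev_value  # marginal contribution in this ordering
            prev_value = value
    return {c: total / n_samples for c, total in totals.items()}

# Hypothetical per-channel lifts (illustrative only).
LIFT = {"Search": 0.20, "Email": 0.10, "Social": 0.05, "Display": 0.02}

def toy_value(coalition):
    """Probability that at least one channel in the coalition converts the user."""
    miss = 1.0
    for c in coalition:
        miss *= 1.0 - LIFT[c]
    return 1.0 - miss

estimates = monte_carlo_shapley(list(LIFT), toy_value)
print(estimates)
```

Each sampled ordering costs n evaluations of the value function, so the total cost is n_samples times n rather than 2^n. Because the marginal contributions within one ordering always add up to the full-coalition value, the estimates keep the efficiency property even though the individual values are approximate.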

Shapley vs. Markov: when to use which

Markov chains and Shapley values both measure channel contribution through counterfactual analysis. They differ in how they model the problem:

Markov chains excel when journey sequence matters. The transition probability from Paid Search to Email might differ meaningfully from Email to Paid Search. Markov models capture these directional asymmetries because they explicitly model transitions. If the order in which channels appear in the journey matters to your business — and it usually does in e-commerce and SaaS — Markov chains preserve that information.

Shapley values treat channels as interchangeable coalition members without inherent ordering. They answer “does this channel contribute?” rather than “does this channel contribute at this point in the journey?” This makes sense when you care more about which channels appear than the order they appear in.

In practice, Markov chains are more common for three reasons:

Lower computational cost — matrix operations vs. 2^n coalition evaluations
Intuitive interpretation — transition probabilities map to real customer behavior that stakeholders can understand and validate
Good empirical performance — removal effects produce channel valuations that align well with incrementality test results

Shapley values shine in specific scenarios:

Regulatory requirements demand provable methodology with documented fairness guarantees
Stakeholder trust depends on showing that credit is divided by stated, provable rules rather than by heuristic weighting
Channel count is manageable (under 15) making exact computation feasible
Coalition effects matter more than sequence effects — you need to understand how channels interact in groups, not just in sequences

Implementation approach

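Unlike Markov attribution, where SQL handles path extraction and Python handles matrix operations, Shapley value computation is almost entirely in Python. The SQL portion is simpler: you need conversion data segmented by which channels were present in each journey, but you don’t need the sequential path information. If the value function is a conversion probability, as described above, the same channel sets are also needed for journeys that did not convert, since they form the denominator.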

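-- For each conversion, which channels were present?
-- DISTINCT collapses repeat touches: only presence matters, not order or frequency.
SELECT
  conversion_id,
  ARRAY_AGG(DISTINCT channel) AS channels_present,
  revenue
FROM {{ ref('int__touchpoints') }}
-- Grouping by revenue assumes one revenue value per conversion_id
GROUP BY conversion_id, revenue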

Python then:

Identifies all unique channels
Generates coalitions (all subsets, or Monte Carlo samples for large channel counts)
Calculates conversion probability for each coalition
Computes marginal contributions
Weights and sums to produce Shapley values

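Libraries like shap (originally designed for ML model explanation) and custom implementations handle this. Because shap explains the output of a model, using it here means first expressing conversion as a function of channel presence, for example with a fitted model. The marketing-attribution-models Python package includes Shapley attribution alongside Markov chains.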
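The five Python steps can be sketched end to end on a handful of rows. Everything here is an assumption made for illustration: the journeys list is invented, it includes non-converting journeys with a converted flag (which the query above does not return), and v(S) is defined as the conversion rate over journeys that used only channels in S. Other definitions of v(S) are possible and will give different numbers.

```python
from itertools import combinations
from math import factorial

# Hypothetical rows: (channels_present, converted flag).
journeys = [
    ({"Email"}, 0), ({"Email"}, 1),
    ({"Search"}, 1), ({"Search"}, 0), ({"Search"}, 0),
    ({"Email", "Search"}, 1), ({"Email", "Search"}, 1), ({"Email", "Search"}, 0),
]

# Step 1: identify all unique channels.
channels = sorted(set().union(*(chs for chs, _ in journeys)))
n = len(channels)

def conversion_rate(coalition):
    """Assumed v(S): conversion rate over journeys whose channels all lie in S."""
    matching = [conv for chs, conv in journeys if chs <= coalition]
    return sum(matching) / len(matching) if matching else 0.0

# Steps 2 and 3: generate every coalition and calculate its value.
v = {
    frozenset(s): conversion_rate(frozenset(s))
    for size in range(n + 1)
    for s in combinations(channels, size)
}

# Steps 4 and 5: weight the marginal contributions and sum them.
shapley = {}
for ch in channels:
    others = [c for c in channels if c != ch]
    shapley[ch] = sum(
        factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
        * (v[frozenset(s) | {ch}] - v[frozenset(s)])
        for size in range(n)
        for s in combinations(others, size)
    )

print(shapley)  # Email is about 0.333, Search about 0.167
```

With many channels, the dictionary of all coalitions is the part that grows as 2^n, which is where sampling orderings would replace the exhaustive loop.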

Practical considerations

Channel grouping matters even more for Shapley values than for Markov chains. Every additional channel doubles the number of coalitions. Collapsing “Facebook Ads” and “Instagram Ads” into “Meta Ads” isn’t just about data sparsity — it’s about computational feasibility.

Running both Markov and Shapley on the same data provides a useful cross-check. When both methods rank channels similarly, confidence is high. When they diverge, investigate whether the disagreement is driven by sequence effects (which Markov captures but Shapley ignores) or coalition effects (which Shapley captures more rigorously). This feeds into the broader disagreement-as-signal framework.
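One simple way to run that cross-check is to compare channel ranks from the two methods and list the channels whose rank changes. The attributed-conversion figures below are placeholders rather than results from any real analysis, and the helper name ranks is hypothetical.

```python
def ranks(credit):
    """Map each channel to its rank (1 = most credit)."""
    ordered = sorted(credit, key=credit.get, reverse=True)
    return {ch: pos for pos, ch in enumerate(ordered, start=1)}

# Placeholder attributed conversions per method (illustrative only).
markov = {"Search": 410.0, "Email": 260.0, "Social": 180.0, "Display": 150.0}
shapley = {"Search": 390.0, "Email": 200.0, "Social": 250.0, "Display": 160.0}

markov_rank, shapley_rank = ranks(markov), ranks(shapley)

# Positive shift: the channel ranks lower under Shapley than under Markov.
rank_shift = {ch: shapley_rank[ch] - markov_rank[ch] for ch in markov}
disagreements = [ch for ch, shift in rank_shift.items() if shift != 0]
print(disagreements)  # ['Email', 'Social']
```

Channels that appear in the disagreement list are the ones worth inspecting for sequence effects or coalition effects.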

No attribution model captures perfect truth: customer decisions are influenced by factors no model can measure (word-of-mouth, competitor actions, life circumstances). Shapley values provide a provably fair distribution of credit among measured channels, relative to the value function you estimate from the data. The trade-off against heuristic approaches is the additional computational cost.