A Markov chain models customer journeys as a sequence of states where the probability of moving to the next state depends only on the current state. This is called the “Markov property” or memorylessness — the model doesn’t care how you got to your current state, only where you are now.
For attribution, this means you stop guessing which touchpoint positions matter (that’s what heuristic models do) and start measuring which channels actually drive conversions by calculating what happens when you remove them.
States and transitions
In an attribution Markov chain, states are marketing channels plus three special states:
- START — the beginning of every journey
- CONVERSION — the journey ends with a successful outcome
- NULL — the journey ends without converting
Each observed customer journey contributes to transition probabilities between states. If you see 1,000 journeys and 400 of them go from START to Paid Search, the transition probability from START to Paid Search is 40%.
Consider a simple example with three channels: Paid Search, Email, and Direct. From historical data, you might observe:
- 40% of users who start go to Paid Search
- 30% of users at Paid Search move to Email
- 50% of users at Email convert
These probabilities form a transition matrix — a table where rows are “from” states and columns are “to” states. Each cell contains the probability of that transition based on your historical journey data.
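Building this matrix from raw journey data is mostly counting. A minimal sketch, using a hypothetical list of journeys (channel names and paths are illustrative, not from the text's dataset): count every observed transition, then normalize each row so the outgoing probabilities from each state sum to 1.

```python
from collections import defaultdict

# Hypothetical observed journeys: each path starts at START and ends
# in CONVERSION or NULL.
journeys = [
    ["START", "Paid Search", "Email", "CONVERSION"],
    ["START", "Paid Search", "NULL"],
    ["START", "Email", "Direct", "CONVERSION"],
    ["START", "Direct", "NULL"],
    ["START", "Paid Search", "Email", "NULL"],
]

# Count every observed (from, to) transition.
counts = defaultdict(lambda: defaultdict(int))
for path in journeys:
    for src, dst in zip(path, path[1:]):
        counts[src][dst] += 1

# Normalize each row into transition probabilities.
matrix = {
    src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
    for src, dsts in counts.items()
}

print(matrix["START"])  # → {'Paid Search': 0.6, 'Email': 0.2, 'Direct': 0.2}
```

With five journeys, three of which begin START -> Paid Search, the START row recovers the 60/20/20 split directly from the counts.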
Why the transition matrix matters
The power of this representation is that you can calculate the overall probability of reaching CONVERSION from START by following all possible paths through the matrix. A customer might go START -> Paid Search -> CONVERSION, or START -> Email -> Direct -> CONVERSION, or any other combination. The matrix encodes all of these possibilities simultaneously.
More importantly, you can recalculate that total conversion probability after removing a channel entirely. When you remove Paid Search from the matrix, all transitions into and out of Paid Search disappear. Users who would have gone through Paid Search are forced into other paths — some of which lead to conversion, some to NULL.
The difference between the total conversion probability with all channels and the probability without a specific channel is the removal effect, which is the foundation of Markov chain attribution.
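The removal effect can be sketched in a few lines. This is one possible implementation, not the only one: it pushes probability mass from START through the chain until everything is absorbed by CONVERSION or NULL, then repeats the calculation with one channel's row deleted and all transitions into that channel redirected to NULL. The transition probabilities below are hypothetical.

```python
from collections import defaultdict

# Hypothetical transition matrix for a three-channel example.
matrix = {
    "START":       {"Paid Search": 0.6, "Email": 0.2, "Direct": 0.2},
    "Paid Search": {"Email": 0.3, "CONVERSION": 0.2, "NULL": 0.5},
    "Email":       {"Direct": 0.2, "CONVERSION": 0.5, "NULL": 0.3},
    "Direct":      {"CONVERSION": 0.4, "NULL": 0.6},
}

def conversion_probability(m):
    # Propagate probability mass from START until it is absorbed
    # by CONVERSION or NULL.
    mass, converted = {"START": 1.0}, 0.0
    for _ in range(1000):
        nxt = defaultdict(float)
        for state, p in mass.items():
            for dst, tp in m.get(state, {}).items():
                if dst == "CONVERSION":
                    converted += p * tp
                elif dst != "NULL":
                    nxt[dst] += p * tp
        mass = nxt
        if not mass:  # all mass absorbed
            break
    return converted

def removal_effect(m, channel):
    # Drop the channel's outgoing row; redirect transitions into it to NULL.
    reduced = {}
    for src, dsts in m.items():
        if src == channel:
            continue
        row = defaultdict(float)
        for dst, p in dsts.items():
            row["NULL" if dst == channel else dst] += p
        reduced[src] = dict(row)
    base = conversion_probability(m)
    return (base - conversion_probability(reduced)) / base

print(round(removal_effect(matrix, "Paid Search"), 3))  # → 0.534
```

Here the baseline conversion probability is 0.4204; removing Paid Search drops it to 0.196, so the channel accounts for roughly 53% of conversions under this matrix.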
The Markov property in practice
The memorylessness assumption — that the next state depends only on the current state — is a simplification. In reality, a user who arrived at Email from Paid Search probably behaves differently than one who arrived at Email from Organic. Their intent, awareness level, and likelihood of converting differ based on their history.
This simplification is what makes Markov chains computationally tractable. A first-order Markov model (where the next state depends only on the current state) requires tracking transitions between N states, giving you an N x N matrix. A second-order model (where the next state depends on the current and previous states) requires an N^2 x N table of transitions, and each additional order multiplies the parameter count by N again — the data needed to estimate the probabilities reliably grows exponentially with the order.
First-order models work well for most attribution use cases. The loss in accuracy from ignoring journey history is typically small compared to the gain from moving beyond heuristic position-based assumptions. If you have enough data, higher-order models can capture sequence effects, but the data requirements grow fast.
Markov vs. heuristic models
Position-based models assume first and last touches deserve more credit. Linear models assume all touches contribute equally. Time-decay assumes recency correlates with influence. These assumptions are not validated against observed data.
Markov chain attribution calculates the removal effect from actual journey data: if a channel is removed, how does the overall conversion probability change? A channel that appears frequently in converting journeys but does not change conversion probability when removed has lower true contribution than position-based credit suggests.
When Markov chains are the right choice
Markov chains excel when the sequential nature of journeys matters to your business. The transition probability from Paid Search to Email might differ meaningfully from Email to Paid Search. Markov models capture these asymmetries because they explicitly model the direction and probability of each transition.
In practice, Markov chains are the most common data-driven attribution approach for three reasons:
- Lower computational cost than Shapley values — you need matrix operations rather than evaluating 2^n channel coalitions
- Intuitive interpretation — transition probabilities map to real customer behavior that stakeholders can understand
- Good empirical performance — the removal effect produces channel valuations that align well with incrementality test results
The main requirement is data volume. You need enough observed journeys to estimate transition probabilities reliably. Aim for roughly 10 times as many transitions as unique transition types. With 10 channels (plus START, CONVERSION, NULL), you have up to 13 x 13 = 169 possible transitions, so you need at least 1,690 total path transitions — typically achievable with a few hundred conversions.
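The 10x rule of thumb is easy to check directly against your journey data. A minimal sketch with a hypothetical journeys list: count total observed transitions and unique transition types, and compare.

```python
def enough_data(journeys, factor=10):
    # Rule of thumb: total observed transitions should be at least
    # `factor` times the number of unique transition types.
    transitions = [
        (src, dst) for path in journeys for src, dst in zip(path, path[1:])
    ]
    n_types = len(set(transitions))
    return len(transitions) >= factor * n_types

# Hypothetical data: 24 journeys, 48 transitions, 3 unique transition types.
journeys = [["START", "Paid Search", "CONVERSION"],
            ["START", "Paid Search", "NULL"]] * 12
print(enough_data(journeys))  # → True
```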
Relationship to other approaches
Markov chains and Shapley values both measure channel contribution through counterfactual analysis — what happens when you remove a channel. They differ in how they calculate the counterfactual: Markov chains use removal effects derived from transition probabilities, while Shapley values use marginal contributions across all possible channel subsets.
Running Markov attribution alongside your existing heuristic models builds stakeholder confidence. When the models disagree, the Markov result adds a data-driven perspective to the conversation; when they all agree, confidence is high regardless of methodology.
The SQL implementation handles path extraction and transition counting in your warehouse, with the matrix operations typically moving to Python for the removal effect calculation.