Full Markov chain attribution requires matrix operations that push SQL beyond its comfort zone. The practical approach is to use SQL for path extraction and transition probability calculation, then hand off to Python for the removal effect computation. SQL does the heavy data preparation; Python handles the linear algebra.
This note covers the SQL portion — extracting journey paths from touchpoint data and computing transition counts — which is where most of the engineering effort lives.
## Path extraction in BigQuery
The first step is extracting journey paths from your touchpoint data. You need two datasets: paths that ended in conversion and paths that didn’t. Both matter for the Markov model because non-converting paths establish the NULL state transitions.
```sql
WITH touchpoints AS (
    SELECT
        user_id,
        channel,
        event_timestamp,
        conversion_id
    FROM {{ ref('int__touchpoints') }}
    WHERE event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
),

converting_paths AS (
    SELECT
        t.user_id,
        t.conversion_id,
        STRING_AGG(t.channel, ' > ' ORDER BY t.event_timestamp) AS journey_path,
        'conversion' AS outcome
    FROM touchpoints t
    INNER JOIN {{ ref('int__conversions') }} c
        ON t.user_id = c.user_id
        AND t.conversion_id = c.conversion_id
    WHERE t.event_timestamp BETWEEN
        TIMESTAMP_SUB(c.conversion_timestamp, INTERVAL 30 DAY)
        AND c.conversion_timestamp
    GROUP BY t.user_id, t.conversion_id
),

non_converting_paths AS (
    SELECT
        t.user_id,
        CAST(NULL AS STRING) AS conversion_id,
        STRING_AGG(t.channel, ' > ' ORDER BY t.event_timestamp) AS journey_path,
        'null' AS outcome
    FROM touchpoints t
    LEFT JOIN {{ ref('int__conversions') }} c
        ON t.user_id = c.user_id
    WHERE c.user_id IS NULL
    GROUP BY t.user_id
)

SELECT user_id, conversion_id, journey_path, outcome FROM converting_paths
UNION ALL
SELECT user_id, conversion_id, journey_path, outcome FROM non_converting_paths
```

This produces rows like:
| user_id | journey_path | outcome |
|---|---|---|
| user_123 | Paid Search > Email > Direct | conversion |
| user_456 | Organic > Paid Search > Organic | null |
| user_789 | Email > Email > Direct | conversion |
A few design decisions to note:
- The 90-day outer window scopes how far back you pull touchpoints. This should be wider than your attribution lookback window to capture full journey context.
- The 30-day conversion window in the `converting_paths` CTE defines which touchpoints qualify for a specific conversion. Adjust this to match your sales cycle.
- Non-converting paths include users who never converted at all. These are essential for accurate transition probabilities: without them, every path ends in conversion and the model can't distinguish high-probability from low-probability channels.
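If the windows need tuning per environment, the hard-coded intervals can be parameterized with dbt's `var()`. A sketch, assuming hypothetical variable names `touchpoint_lookback_days` and `conversion_window_days`:

```sql
-- In the touchpoints CTE (falls back to 90 days):
WHERE event_timestamp >= TIMESTAMP_SUB(
    CURRENT_TIMESTAMP(), INTERVAL {{ var('touchpoint_lookback_days', 90) }} DAY
)

-- In the converting_paths CTE (falls back to 30 days):
WHERE t.event_timestamp BETWEEN
    TIMESTAMP_SUB(c.conversion_timestamp, INTERVAL {{ var('conversion_window_days', 30) }} DAY)
    AND c.conversion_timestamp
```

Jinja renders before the SQL compiles, so BigQuery still sees a literal `INTERVAL 90 DAY`.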
## Transition counting
From the extracted paths, calculate transition counts — how often each state-to-state transition occurs. This is where the Markov model’s structure takes shape.
```sql
WITH path_transitions AS (
    SELECT
        journey_path,
        outcome,
        SPLIT(journey_path, ' > ') AS channels
    FROM journey_paths
),

transitions AS (
    -- START transitions: START to the first channel
    SELECT
        'START' AS from_state,
        channels[OFFSET(0)] AS to_state,
        COUNT(*) AS transition_count
    FROM path_transitions
    GROUP BY from_state, to_state

    UNION ALL

    -- Channel-to-channel transitions: each consecutive pair
    SELECT
        channels[OFFSET(i)] AS from_state,
        channels[OFFSET(i + 1)] AS to_state,
        COUNT(*) AS transition_count
    FROM path_transitions,
        UNNEST(GENERATE_ARRAY(0, ARRAY_LENGTH(channels) - 2)) AS i
    GROUP BY from_state, to_state

    UNION ALL

    -- Terminal transitions: last channel to CONVERSION or NULL
    SELECT
        channels[OFFSET(ARRAY_LENGTH(channels) - 1)] AS from_state,
        CASE outcome WHEN 'conversion' THEN 'CONVERSION' ELSE 'NULL' END AS to_state,
        COUNT(*) AS transition_count
    FROM path_transitions
    GROUP BY from_state, to_state
)

SELECT
    from_state,
    to_state,
    transition_count,
    transition_count / SUM(transition_count) OVER (PARTITION BY from_state)
        AS transition_probability
FROM transitions
```

This query handles three types of transitions:
- START transitions: from START to the first channel in each journey
- Channel-to-channel transitions: using `GENERATE_ARRAY` and `UNNEST` to iterate through each consecutive pair in the path
- Terminal transitions: from the last channel to either CONVERSION or NULL
The `transition_probability` column is calculated by dividing each transition count by the total transitions out of that state, using a window function. This gives you the Markov transition matrix in tabular form.
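For intuition, and as a convenient way to unit-test the SQL, the same three-part counting logic can be mirrored in plain Python. This is a sketch for checking small samples, not part of the pipeline:

```python
from collections import Counter

def count_transitions(paths):
    """Mirror of the SQL: paths is a list of (journey_path, outcome) rows."""
    counts = Counter()
    for path, outcome in paths:
        channels = path.split(" > ")
        # START transition to the first channel
        counts[("START", channels[0])] += 1
        # Channel-to-channel transitions for each consecutive pair
        for a, b in zip(channels, channels[1:]):
            counts[(a, b)] += 1
        # Terminal transition from the last channel
        terminal = "CONVERSION" if outcome == "conversion" else "NULL"
        counts[(channels[-1], terminal)] += 1
    # Normalize counts into probabilities per from_state
    totals = Counter()
    for (frm, _), n in counts.items():
        totals[frm] += n
    return {(frm, to): n / totals[frm] for (frm, to), n in counts.items()}

probs = count_transitions([
    ("Paid Search > Email > Direct", "conversion"),
    ("Organic > Paid Search > Organic", "null"),
])
# probs[("START", "Paid Search")] == 0.5, and each from_state's
# outgoing probabilities sum to 1, matching the window-function output
```

Running both this and the SQL over the same handful of paths is a quick way to catch off-by-one errors in the `GENERATE_ARRAY` bounds.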
## From SQL to Python
The transition probability table is where SQL’s job ends. The next steps — computing the overall conversion probability from START, removing each channel and recomputing, then normalizing removal effects into attribution shares — require matrix operations that SQL can’t handle efficiently.
Export the transition table and hand it to a purpose-built library:

- R: the `ChannelAttribution` package has C++ backends optimized for scale
- Python: `marketing-attribution-models` or custom NumPy implementations
The handoff is clean: SQL produces a table with `from_state`, `to_state`, and `transition_probability` columns. Python reads it, constructs the matrix, and computes removal effects.
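A minimal NumPy sketch of that Python side, assuming the exported table has been read into a dict keyed by `(from_state, to_state)`. The function name and the toy probabilities below are illustrative, not real data:

```python
import numpy as np

def conversion_probability(transitions, removed=None):
    """P(reach CONVERSION from START) in an absorbing Markov chain.
    If `removed` is a channel name, its transitions are dropped, which
    implicitly redirects that probability mass to NULL."""
    states = sorted({s for s, _ in transitions} | {t for _, t in transitions})
    transient = [s for s in states if s not in ("CONVERSION", "NULL")]
    idx = {s: i for i, s in enumerate(transient)}
    n = len(transient)
    Q = np.zeros((n, n))  # transient -> transient probabilities
    r = np.zeros(n)       # transient -> CONVERSION probabilities
    for (frm, to), p in transitions.items():
        if frm == removed or frm not in idx or to == removed or to == "NULL":
            continue
        if to == "CONVERSION":
            r[idx[frm]] += p
        else:
            Q[idx[frm], idx[to]] += p
    # Absorption probabilities: b = (I - Q)^-1 r
    b = np.linalg.solve(np.eye(n) - Q, r)
    return b[idx["START"]]

# Toy transition probabilities (hypothetical values, not query output)
transitions = {
    ("START", "Email"): 0.6, ("START", "Direct"): 0.4,
    ("Email", "Direct"): 0.5, ("Email", "NULL"): 0.5,
    ("Direct", "CONVERSION"): 0.3, ("Direct", "NULL"): 0.7,
}

p_full = conversion_probability(transitions)
removal_effects = {
    ch: 1 - conversion_probability(transitions, removed=ch) / p_full
    for ch in ("Email", "Direct")
}
total = sum(removal_effects.values())
attribution = {ch: re / total for ch, re in removal_effects.items()}
```

The three steps are exactly those named above: compute the overall conversion probability from START, recompute with each channel removed, then normalize the removal effects into attribution shares.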
## Fitting this into dbt
In a dbt project, this naturally extends the attribution comparison pattern:
```
models/
├── intermediate/
│   ├── int__touchpoints.sql
│   ├── int__conversions.sql
│   └── int__journey_paths.sql              # Path extraction query
└── marts/
    └── attribution/
        ├── mrt__attribution__transitions.sql   # Transition counting
        ├── mrt__attribution__first_touch.sql
        ├── mrt__attribution__last_touch.sql
        └── mrt__attribution__comparison.sql
```

The `int__journey_paths` model extracts paths. The `mrt__attribution__transitions` model computes transition probabilities. A Python script (or dbt Python model in dbt Core 1.3+) reads the transitions table and writes Markov attribution results back to the warehouse.
The Markov results can then be added to your comparison model alongside heuristic models, giving you a `model_type = 'markov'` row in the comparison table.
## Data quality considerations
The quality of Markov attribution depends entirely on the quality of the underlying transition data.
Channel grouping is critical. Too many channels produce sparse transition matrices in which many cells are zero or based on a handful of observations. Start with 5-10 high-level channel groups and add granularity as data volume grows.
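Channel grouping typically happens upstream, where raw touchpoints are mapped into the `channel` column. A sketch with hypothetical source table and medium values; the actual mapping depends on your tracking setup:

```sql
SELECT
    user_id,
    CASE
        WHEN medium = 'cpc' THEN 'Paid Search'
        WHEN medium = 'email' THEN 'Email'
        WHEN medium = 'organic' THEN 'Organic'
        WHEN medium IN ('social', 'paid_social') THEN 'Social'
        WHEN medium = '(none)' THEN 'Direct'
        ELSE 'Other'  -- catch-all bucket keeps the matrix dense
    END AS channel,
    event_timestamp
FROM {{ source('analytics', 'sessions') }}
```

The `ELSE 'Other'` bucket matters: unmapped mediums that each become their own state are exactly what makes the matrix sparse.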
Minimum data requirements: aim for roughly 10 times as many transitions as unique transition types. With 10 channels (plus START, CONVERSION, NULL), you have up to 13 x 13 = 169 possible transitions. That means at least 1,690 total path transitions, typically achievable with a few hundred conversions.
Higher-order models (where the next state depends on the current plus previous states) require exponentially more data. A second-order model with 10 channels has up to 169 x 13 = 2,197 possible transitions. Unless you have tens of thousands of conversions, stick with first-order.
Repeated channels in a path (Email > Email > Email) are valid and common. They represent users returning to the same channel multiple times. Don’t deduplicate consecutive touches unless you have a specific reason to — the self-transition probability (Email -> Email) is meaningful information about channel stickiness.