
dbt Attribution Packages Landscape

Open-source dbt packages and Python libraries for production-ready attribution models: Snowplow, Tasman, Rittman Analytics, ChannelAttribution, and when to build your own

Planted
dbt · bigquery · analytics · data modeling

Several open-source dbt packages provide production-ready attribution models covering SQL patterns, edge cases, and testing. Python libraries handle data-driven models (Markov chains, Shapley values) where SQL is insufficient. The relevant question is whether a package fits your data model and warehouse, and whether your team can maintain it. A package that covers 80% of a use case and needs 20% customization is usually worth adopting. A package that requires fundamental changes to work with your data model may cost more time than building from scratch.

dbt packages

Snowplow dbt-snowplow-attribution

The most mature attribution package in the ecosystem. Built on Snowplow’s behavioral data model but adaptable to other event sources.

What it includes: First-touch, last-touch, linear, position-based, and time-decay models. ROAS calculation by channel. Configurable lookback windows.

Warehouse support: Snowflake, BigQuery, Databricks, Redshift.

Best for: Teams already using Snowplow for event tracking. The package assumes Snowplow’s event schema, so adapting it to GA4 or custom event data requires mapping your events to Snowplow’s expected structure.
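
These heuristic models are all variations on splitting one unit of credit across an ordered path of touchpoints. A rough Python sketch of position-based and time-decay credit for a single converting path (illustrative only, not the package's SQL; the 40/20/40 split and 7-day half-life are assumptions, not package defaults):

```python
from datetime import datetime

# One converting path: (channel, touch timestamp), ordered by time. Illustrative data.
path = [
    ("paid_search", datetime(2024, 1, 1)),
    ("email",       datetime(2024, 1, 5)),
    ("direct",      datetime(2024, 1, 9)),   # converting touch
]
conversion_time = datetime(2024, 1, 9)

def position_based(path, first=0.4, last=0.4):
    """40/20/40 split: first and last touch get fixed shares, middle touches share the rest.
    Two-touch paths fall back to an even 50/50 split."""
    n = len(path)
    if n == 1:
        return {path[0][0]: 1.0}
    middle = (1 - first - last) / max(n - 2, 1)
    credit = {}
    for i, (channel, _) in enumerate(path):
        if n == 2:
            share = 0.5
        else:
            share = first if i == 0 else last if i == n - 1 else middle
        credit[channel] = credit.get(channel, 0.0) + share
    return credit

def time_decay(path, conversion_time, half_life_days=7):
    """Each touch weighted by 2^(-days before conversion / half-life), then normalized."""
    weights = [2 ** (-(conversion_time - ts).days / half_life_days) for _, ts in path]
    total = sum(weights)
    credit = {}
    for (channel, _), w in zip(path, weights):
        credit[channel] = credit.get(channel, 0.0) + w / total
    return credit

print(position_based(path))               # {'paid_search': 0.4, 'email': 0.2, 'direct': 0.4}
print(time_decay(path, conversion_time))  # heavier credit for touches closer to conversion
```

A lookback window amounts to filtering the path to touches within N days of the conversion before any credit is assigned.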

Snowplow dbt-snowplow-fractribution

Snowplow’s data-driven attribution package, built on top of dbt-snowplow-attribution.

What it includes: Markov chain and Shapley value attribution. Uses Python models (dbt Core 1.3+) for the matrix operations and coalition calculations that SQL can’t handle efficiently.

Best for: Teams that need data-driven attribution and are running Snowplow. The fractribution package handles the full pipeline from path extraction through removal effect calculation, which is the most complex part of implementing Markov attribution.

Limitations: Requires dbt Python model support, which means your warehouse needs to run Python (BigQuery, Snowflake, and Databricks support this; Redshift and Postgres do not).
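
If you haven't used dbt Python models, the shape is a single model() function that receives a dbt context and a session and returns a dataframe, which dbt materializes in the warehouse. A minimal skeleton (model and column names are placeholders, not models shipped by the package):

```python
# models/attribution/channel_credit.py -- minimal dbt Python model skeleton.
# Runs inside the warehouse's Python runtime (Snowpark, Dataproc, Databricks), not locally.

def model(dbt, session):
    dbt.config(materialized="table")

    # dbt.ref() returns a warehouse dataframe for an upstream dbt model.
    # "conversion_paths" is a placeholder name, not a model shipped by the package.
    paths = dbt.ref("conversion_paths")

    # On Snowflake this is a Snowpark DataFrame; .to_pandas() pulls it into pandas
    # so Python attribution libraries can operate on it. Other adapters differ slightly.
    df = paths.to_pandas()

    # ... Markov / Shapley computation would go here ...
    result = df.groupby("channel", as_index=False)["conversions"].sum()

    return result  # dbt materializes the returned dataframe as a table
```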

TasmanAnalytics/tasman-dbt-mta

A configurable multi-touch, multi-cycle attribution engine.

What it includes: Multiple heuristic models with the ability to handle multi-cycle attribution — attributing revenue not just to the first conversion but across a customer’s entire lifecycle.

Warehouse support: Snowflake, BigQuery.

Best for: Teams that need to attribute across multiple purchase cycles, not just first purchase. This is particularly relevant for subscription businesses where the value of a customer accrues over months or years, and you want to credit the channels that drove each renewal or expansion.
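
Conceptually, multi-cycle attribution means bucketing each customer's touches into cycles, each ending at the next conversion, so renewals and expansions get their own attribution paths. A small pandas sketch of the cycle assignment (illustrative data and column names, not the package's SQL):

```python
import pandas as pd

# Touches and conversions for one customer, interleaved in time. Illustrative data.
events = pd.DataFrame({
    "ts":      pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-09",
                               "2024-03-02", "2024-03-10"]),
    "channel": ["paid_search", "email", None, "email", None],
    "is_conversion": [False, False, True, False, True],
}).sort_values("ts")

# Assign each event to the cycle that ends at the next conversion:
# the cumulative count of prior conversions defines the cycle index.
events["cycle"] = events["is_conversion"].cumsum().shift(fill_value=0)

# Each cycle's touchpoint path can then be attributed independently,
# so renewal conversions credit the channels that actually preceded them.
paths = (events[~events["is_conversion"]]
         .groupby("cycle")["channel"]
         .apply(list))
print(paths)
# cycle 0 -> ['paid_search', 'email']   (first purchase)
# cycle 1 -> ['email']                  (renewal)
```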

rittmananalytics/ra_attribution

Focused on multi-source data integration for attribution.

What it includes: Attribution models that work across multiple event collection platforms — Snowplow, Segment, RudderStack. Integrates ad spend data alongside web analytics for ROAS calculation.

Best for: Teams ingesting events from multiple sources (not just GA4 or Snowplow) who need a unified attribution layer. If you’re running Segment for product analytics and GA4 for web analytics and need to combine them, this package addresses that integration challenge.

Python libraries

Data-driven attribution models require matrix operations and iterative computation that push beyond SQL’s capabilities. These libraries handle the removal effect calculation for Markov chains and the coalition evaluation for Shapley values.
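
As a toy illustration of the removal effect: estimate a first-order transition matrix from converting and non-converting paths, then measure how much the overall conversion probability drops when each channel is rerouted to the null state. Real libraries add higher-order models, absorbing-state handling, and simulation, so treat this as a sketch of the idea rather than a reference implementation.

```python
from itertools import product

# Toy converting / non-converting paths. Illustrative data only.
paths = [
    (["paid_search", "email"], True),
    (["email"], False),
    (["paid_search"], True),
    (["social", "email"], False),
    (["social", "paid_search", "email"], True),
]

CHANNELS = ["paid_search", "email", "social"]
STATES = ["start"] + CHANNELS + ["conv", "null"]

def transition_matrix(paths):
    """First-order transition probabilities estimated from the paths."""
    counts = {(a, b): 0 for a, b in product(STATES, STATES)}
    for channels, converted in paths:
        seq = ["start"] + channels + ["conv" if converted else "null"]
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    probs = {}
    for a in STATES:
        row_total = sum(counts[(a, b)] for b in STATES)
        for b in STATES:
            probs[(a, b)] = counts[(a, b)] / row_total if row_total else 0.0
    return probs

def remove_channel(probs, channel):
    """Reroute every transition into `channel` straight to the null state."""
    out = dict(probs)
    for a in STATES:
        out[(a, "null")] = out[(a, "null")] + out[(a, channel)]
        out[(a, channel)] = 0.0
    return out

def conversion_probability(probs, max_steps=50):
    """Probability of reaching 'conv' from 'start' within max_steps (truncated walk)."""
    dist = {s: 0.0 for s in STATES}
    dist["start"] = 1.0
    converted = 0.0
    for _ in range(max_steps):
        new = {s: 0.0 for s in STATES}
        for a in STATES:
            if dist[a] == 0.0:
                continue
            for b in STATES:
                new[b] += dist[a] * probs[(a, b)]
        converted += new["conv"]
        new["conv"] = new["null"] = 0.0   # absorbing states: mass stops here
        dist = new
    return converted

P = transition_matrix(paths)
base = conversion_probability(P)
removal_effect = {
    ch: 1 - conversion_probability(remove_channel(P, ch)) / base for ch in CHANNELS
}
total = sum(removal_effect.values())
markov_credit = {ch: effect / total for ch, effect in removal_effect.items()}
print(markov_credit)  # each channel's normalized share of Markov credit
```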

ChannelAttribution (R and Python)

The most established library for data-driven attribution. Originally an R package, now available in Python. Uses a C++ backend for performance at scale.

What it does: k-order Markov chain attribution with configurable model order. Handles the full pipeline from transition probability estimation through removal effect calculation and credit normalization.

Best for: Teams with meaningful data volume (thousands of converting paths) who need production-grade Markov attribution. The C++ backend makes it practical for datasets that would be slow in pure Python.

Integration pattern: Export the transition probability table from your warehouse, run ChannelAttribution in Python, write results back to the warehouse. In dbt, this can be a Python model (dbt Core 1.3+) or an external script that writes to a table the comparison model reads from.
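
A rough version of that round trip, assuming the Python ChannelAttribution package's markov_model and heuristic_models functions (path strings separated by " > "; parameter names follow the library's R-style interface, so verify against the installed version) and placeholder table and connection names:

```python
import pandas as pd
from sqlalchemy import create_engine
from ChannelAttribution import markov_model, heuristic_models

# Placeholder connection; swap in your warehouse's SQLAlchemy URL.
warehouse = create_engine("snowflake://<account>/<database>")

# 1. Export aggregated paths from the warehouse. Table and column names are
#    placeholders for whatever your dbt path model produces.
paths = pd.read_sql(
    "select path, conversions, null_conversions, revenue from analytics.conversion_paths",
    con=warehouse,
)
# Expected shape: path strings like "paid_search > email > direct" with counts of
# converting (conversions) and non-converting (null_conversions) journeys.

# 2. Run the models. Parameter names as documented for the library; check your version.
markov = markov_model(
    paths, var_path="path", var_conv="conversions",
    var_value="revenue", var_null="null_conversions", order=1,
)
heuristics = heuristic_models(
    paths, var_path="path", var_conv="conversions", var_value="revenue",
)

# 3. Write results back so the dbt comparison model can read them.
markov.to_sql("attribution_markov", con=warehouse, schema="analytics",
              if_exists="replace", index=False)
heuristics.to_sql("attribution_heuristics", con=warehouse, schema="analytics",
                  if_exists="replace", index=False)
```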

marketing-attribution-models (DP6)

A Python library covering both Markov and Shapley models alongside all heuristic approaches.

What it does: Markov chain attribution, Shapley value attribution, and all standard heuristic models (first-touch, last-touch, linear, position-based, time-decay) with customizable weights.

Best for: Teams that want a single library covering the full spectrum of attribution approaches. Useful for prototyping — you can quickly compare Markov, Shapley, and heuristic results on the same data before deciding which to productionize.
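
A sketch of that prototyping loop. Class, method, and parameter names follow the project's README as best I recall them, so treat every identifier here as an assumption to check against the repo:

```python
import pandas as pd
from marketing_attribution_models import MAM

# One row per user journey: ordered channel path and whether it converted.
# Column names and constructor parameters are assumptions; confirm against the README.
journeys = pd.DataFrame({
    "journey_id": ["j1", "j2", "j3"],
    "channels_agg": ["paid_search > email", "social > email", "paid_search"],
    "converted_agg": [True, False, True],
})

mam = MAM(journeys,
          group_channels=False,                 # paths are already aggregated per journey
          channels_colname="channels_agg",
          journey_with_conv_colname="converted_agg")

# Heuristic and data-driven results on the same journeys, for side-by-side comparison.
mam.attribution_first_click()
mam.attribution_last_click()
mam.attribution_linear()
mam.attribution_markov(transition_to_same_state=False)
mam.attribution_shapley()

# Aggregated credit per channel for every model that has been run.
print(mam.group_by_channels_models)
```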

Build vs. use a package

The decision depends on three factors:

How close is your data model to the package’s expectations? If you’re using Snowplow and considering dbt-snowplow-attribution, adoption is straightforward. If you’re using GA4 and considering a Snowplow package, you’ll spend significant time mapping data structures.

How customized does your attribution need to be? If standard heuristic models with configurable weights cover your needs, packages save time. If you need custom channel grouping logic, non-standard lookback windows per channel, or business-specific conversion definitions, you’ll be fighting the package’s assumptions.

Can your team maintain the dependency? Packages evolve. dbt version upgrades can break package compatibility. Understanding the package well enough to debug issues when they arise requires reading its source code — which means you need to understand the underlying patterns regardless.

For most teams, the practical path is:

  1. Understand the patterns by reading through SQL Attribution Patterns and dbt Attribution Comparison Pattern. You need to know how attribution works before you can evaluate whether a package implements it correctly for your context.
  2. Evaluate packages against your specific data model and warehouse. Run one on a sample of your data to see how much adaptation it requires.
  3. Build custom if the adaptation cost exceeds the build cost. The SQL for heuristic models is straightforward. The harder part — Markov chain and Shapley value computation — is where Python libraries add the most value regardless of whether you use a dbt package for the heuristic models.

The hybrid approach works well: build your own heuristic models in dbt (they’re simple and you want full control over the SQL), use a Python library for data-driven models (the math is complex and well-solved), and maintain the comparison layer yourself so you control how all results come together.
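
For the comparison layer, a common shape is one long table of channel, model, and credit that every model writes into. A minimal pandas sketch of that union (column names are placeholders; in dbt this would typically be a SQL union over the model outputs):

```python
import pandas as pd

# Channel-level credit from two of the models. Illustrative data only.
first_touch = pd.DataFrame({"channel": ["paid_search", "email"], "credit": [3.0, 1.0]})
markov      = pd.DataFrame({"channel": ["paid_search", "email"], "credit": [2.4, 1.6]})

# One long table: channel x model x credit, ready for a BI tool to pivot.
comparison = pd.concat([
    first_touch.assign(model="first_touch"),
    markov.assign(model="markov"),
], ignore_index=True)

print(comparison.pivot(index="channel", columns="model", values="credit"))
```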