
dbt Attribution Comparison Pattern

How to structure a dbt project for multi-model attribution — running first-touch, last-touch, linear, position-based, and time-decay models in parallel with a union comparison layer

Planted
dbt · bigquery · ga4 · data modeling · analytics

A multi-model attribution system in dbt needs clean separation between the attribution logic and the comparison layer. Each model runs independently, implementing one attribution approach against the same touchpoint data. A final comparison model unions them together with a model_type discriminator for dashboard consumption. This pattern follows the dbt Three-Layer Architecture — base models clean GA4 events, intermediate models build sessionized touchpoint paths, and marts implement individual attribution models plus the comparison union.

Project structure

models/
├── base/
│   └── base__ga4__events.sql              # Raw event cleaning
├── intermediate/
│   ├── int__events_sessionized.sql        # Sessionization
│   ├── int__sessions_enriched.sql         # Marketing touchpoints
│   └── int__touchpoints_pathed.sql        # User journey paths
└── marts/
    └── attribution/
        ├── mrt__attribution__first_touch.sql
        ├── mrt__attribution__last_touch.sql
        ├── mrt__attribution__linear.sql
        ├── mrt__attribution__position_based.sql
        ├── mrt__attribution__time_decay.sql
        ├── mrt__attribution__conversions.sql
        └── mrt__attribution__comparison.sql

Each mrt__attribution__* model implements one attribution approach and outputs a consistent schema: conversion__id, touchpoint__channel, touchpoint__attributed_revenue, and any additional dimensions for analysis (campaign, converted_at, user segment). The consistent schema is critical — the comparison model depends on every upstream model producing the same columns.
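Before looking at the union, it helps to see the weight logic each mart implements. This is a minimal Python sketch of the five schemes, not the project's SQL — the function name, the 40/20/40 position-based split, and the 7-day (168-hour) half-life default are illustrative assumptions:

```python
def attribution_weights(n, model_type, hours_before_conversion=None, half_life_hours=168.0):
    """Return per-touchpoint revenue weights (summing to 1.0) for a path of n touchpoints."""
    if model_type == "first_touch":
        return [1.0] + [0.0] * (n - 1)
    if model_type == "last_touch":
        return [0.0] * (n - 1) + [1.0]
    if model_type == "linear":
        return [1.0 / n] * n
    if model_type == "position_based":
        # Assumed 40/20/40 U-shape: 40% to first and last, 20% spread over the middle
        if n == 1:
            return [1.0]
        if n == 2:
            return [0.5, 0.5]
        middle = 0.2 / (n - 2)
        return [0.4] + [middle] * (n - 2) + [0.4]
    if model_type == "time_decay":
        # Weight halves for every half_life_hours further from the conversion,
        # then normalize so the weights sum to 1.0
        raw = [0.5 ** (h / half_life_hours) for h in hours_before_conversion]
        total = sum(raw)
        return [w / total for w in raw]
    raise ValueError(f"unknown model_type: {model_type}")
```

Multiplying these weights by conversion revenue yields touchpoint__attributed_revenue; the normalization in the time-decay branch is what the revenue integrity test later in this note verifies.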

The mrt__attribution__conversions model holds the ground truth: actual conversions with actual revenue. It serves as the reference table for validation tests.

Following the naming convention, all attribution models live under marts/attribution/ — organized by business domain, with the mrt__ prefix making the layer obvious.

The comparison model

The comparison model unions individual attribution models and adds an identifier:

-- mrt__attribution__comparison.sql
WITH first_touch AS (
    SELECT
        conversion__id,
        touchpoint__channel,
        touchpoint__attributed_revenue,
        conversion__converted_at,
        'first_touch' AS model_type
    FROM {{ ref('mrt__attribution__first_touch') }}
),

last_touch AS (
    SELECT
        conversion__id,
        touchpoint__channel,
        touchpoint__attributed_revenue,
        conversion__converted_at,
        'last_touch' AS model_type
    FROM {{ ref('mrt__attribution__last_touch') }}
),

linear AS (
    SELECT
        conversion__id,
        touchpoint__channel,
        touchpoint__attributed_revenue,
        conversion__converted_at,
        'linear' AS model_type
    FROM {{ ref('mrt__attribution__linear') }}
),

position_based AS (
    SELECT
        conversion__id,
        touchpoint__channel,
        touchpoint__attributed_revenue,
        conversion__converted_at,
        'position_based' AS model_type
    FROM {{ ref('mrt__attribution__position_based') }}
),

time_decay AS (
    SELECT
        conversion__id,
        touchpoint__channel,
        touchpoint__attributed_revenue,
        conversion__converted_at,
        'time_decay' AS model_type
    FROM {{ ref('mrt__attribution__time_decay') }}
)

SELECT * FROM first_touch
UNION ALL
SELECT * FROM last_touch
UNION ALL
SELECT * FROM linear
UNION ALL
SELECT * FROM position_based
UNION ALL
SELECT * FROM time_decay

This produces one row per conversion per channel per model. A single conversion with three touchpoints generates 15 rows (3 touchpoints x 5 models). The model_type column becomes the primary filter in your attribution dashboard.

Why CTEs instead of direct UNION ALL

You could write SELECT *, 'first_touch' AS model_type FROM {{ ref('mrt__attribution__first_touch') }} UNION ALL ... directly. The CTE approach is more maintainable for three reasons:

  1. Explicit column selection. If an upstream model adds a column, the comparison model won’t silently change shape. You control exactly which columns pass through.
  2. Easier debugging. You can comment out a CTE to isolate issues with a specific model.
  3. Readability. When adding a sixth model, the pattern is obvious: add a CTE, add a UNION ALL.

Row multiplication awareness

The comparison table can get large. If you have 10,000 conversions with an average of 4 touchpoints each, one model produces 40,000 rows. Five models produce 200,000 rows. This isn’t usually a problem for BigQuery, but it matters for BI tools with row limits and for cost optimization. Pre-aggregating to channel-level summaries in a downstream model reduces the row count dramatically for dashboard consumption.
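The sizing arithmetic from the paragraph above, made explicit as a back-of-envelope calculation:

```python
# Numbers from the example above — adjust to your own volumes.
conversions = 10_000
avg_touchpoints = 4
attribution_models = 5

rows_per_model = conversions * avg_touchpoints          # touchpoint rows per model
comparison_rows = rows_per_model * attribution_models   # rows after the union
print(f"{rows_per_model:,} rows per model, {comparison_rows:,} in the comparison table")
```

The comparison table grows linearly in the number of models, so each new attribution approach adds another full copy of the touchpoint-level rows.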

Adding a new model

The pattern makes adding models mechanical:

  1. Create mrt__attribution__new_model.sql implementing the attribution logic with the standard column contract.
  2. Add a CTE to the comparison model.
  3. Add a UNION ALL referencing the new CTE.
  4. The new model automatically appears as an option in dashboard filters.

No dashboard changes needed. No new data connections. The model_type dropdown in the BI tool picks up the new value automatically because it reads from the comparison table.

Validation: revenue integrity testing

Every attribution model should pass a fundamental integrity check: attributed revenue per conversion must sum to actual conversion revenue. The normalization step in each model’s SQL should guarantee this, but data quirks and edge cases can break it silently.

Add a dbt test that verifies the sum across each model:

# _attribution__models.yml
models:
  - name: mrt__attribution__comparison
    description: >
      Union of all attribution models with model_type discriminator.
      One row per conversion per channel per model.
    columns:
      - name: conversion__id
        data_tests:
          - not_null
      - name: model_type
        data_tests:
          - accepted_values:
              values:
                - first_touch
                - last_touch
                - linear
                - position_based
                - time_decay
    data_tests:
      - dbt_utils.expression_is_true:
          expression: >
            ABS(SUM(touchpoint__attributed_revenue) -
            (SELECT SUM(conversion__revenue)
             FROM {{ ref('mrt__attribution__conversions') }})) < 0.01
          group_by_columns: ['model_type']

The group_by_columns parameter runs the test once per model. If linear attribution sums correctly but position-based doesn’t, you’ll know exactly which model has the issue. The 0.01 tolerance handles floating-point rounding.
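The tolerance exists because even a clean equal split rarely survives floating-point arithmetic exactly. A quick Python illustration of the drift the 0.01 threshold absorbs:

```python
# A 1.00 conversion split linearly across 10 touchpoints: 0.10 each.
attributed = [1.00 / 10] * 10

total = sum(attributed)
print(total)                    # 0.9999999999999999 — not exactly 1.0
print(total == 1.0)             # False: an exact-equality test would fail spuriously
print(abs(total - 1.0) < 0.01)  # True: the tolerance absorbs the rounding drift
```

The same effect appears in the warehouse whenever attributed revenue is stored as a floating-point type, which is why the test compares against a tolerance rather than testing strict equality.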

This check uses the dbt-utils expression_is_true generic test as an aggregate assertion. For additional rigor, write a singular test that pinpoints the specific conversions where attribution doesn’t balance:

-- tests/assert_attribution_revenue_balances.sql
WITH attributed AS (
    SELECT
        model_type,
        conversion__id,
        SUM(touchpoint__attributed_revenue) AS total_attributed
    FROM {{ ref('mrt__attribution__comparison') }}
    GROUP BY model_type, conversion__id
),

actual AS (
    SELECT
        conversion__id,
        conversion__revenue
    FROM {{ ref('mrt__attribution__conversions') }}
)

SELECT
    a.model_type,
    a.conversion__id,
    a.total_attributed,
    c.conversion__revenue,
    ABS(a.total_attributed - c.conversion__revenue) AS diff
FROM attributed a
JOIN actual c
    ON a.conversion__id = c.conversion__id
WHERE ABS(a.total_attributed - c.conversion__revenue) > 0.01

Zero rows returned means every conversion balances across every model. Any rows returned identify exactly which conversion and which model have an issue, making debugging straightforward.
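The same balance check is useful outside the warehouse, for example when unit-testing attribution logic before it ships. A hedged Python equivalent of the singular test — the function name and the tuple/dict layout are hypothetical, not part of the dbt project:

```python
from collections import defaultdict

def unbalanced_conversions(comparison_rows, actual_revenue, tolerance=0.01):
    """Return (model_type, conversion_id, attributed, actual) tuples that fail to balance.

    comparison_rows: iterable of (model_type, conversion_id, attributed_revenue)
    actual_revenue:  dict mapping conversion_id -> actual conversion revenue
    """
    # Mirror of the SQL: SUM(attributed_revenue) GROUP BY model_type, conversion_id
    totals = defaultdict(float)
    for model_type, conversion_id, revenue in comparison_rows:
        totals[(model_type, conversion_id)] += revenue

    # Mirror of the WHERE clause: keep only rows outside the tolerance
    return [
        (model_type, cid, attributed, actual_revenue[cid])
        for (model_type, cid), attributed in totals.items()
        if abs(attributed - actual_revenue[cid]) > tolerance
    ]

# Usage: linear balances (50 + 50 = 100); first_touch under-attributes by 10.
rows = [("linear", "c1", 50.0), ("linear", "c1", 50.0), ("first_touch", "c1", 90.0)]
print(unbalanced_conversions(rows, {"c1": 100.0}))
```

As with the SQL test, an empty result means every conversion balances in every model.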

Schema contract for upstream models

To keep the comparison model reliable, enforce a consistent schema across all attribution models. A model contract prevents upstream changes from silently breaking the union:

models:
  - name: mrt__attribution__first_touch
    config:
      contract:
        enforced: true
    columns:
      - name: conversion__id
        data_type: string
        constraints:
          - type: not_null
      - name: touchpoint__channel
        data_type: string
        constraints:
          - type: not_null
      - name: touchpoint__attributed_revenue
        data_type: numeric
      - name: conversion__converted_at
        data_type: timestamp

Apply the same contract to every mrt__attribution__* model. If someone renames, retypes, or drops a column in one attribution model without updating the others, the union’s column contract drifts out of sync. Enforced contracts fail the build at the offending model, making the mismatch explicit rather than a runtime surprise downstream.

Pre-aggregation for dashboard performance

The comparison table at touchpoint level is the source of truth, but most dashboard views don’t need touchpoint-level granularity. A summary model aggregates for common dashboard queries:

-- mrt__attribution__comparison_summary.sql
SELECT
    model_type,
    touchpoint__channel,
    DATE(conversion__converted_at) AS conversion_date,
    COUNT(DISTINCT conversion__id) AS conversions,
    SUM(touchpoint__attributed_revenue) AS attributed_revenue
FROM {{ ref('mrt__attribution__comparison') }}
GROUP BY ALL

This table is dramatically smaller — one row per model per channel per day instead of one row per conversion per channel per model. Looker Studio performs better with fewer rows, and BigQuery charges for bytes scanned, so pre-aggregating avoids repeated full-table scans from the BI layer.
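BigQuery's GROUP BY ALL shorthand groups by every non-aggregated column in the SELECT list — here, the model/channel/date trio. A plain-Python sketch of the same reduction (the dict layout is hypothetical, for illustration only):

```python
from collections import defaultdict

def summarize(comparison_rows):
    """Collapse touchpoint-level rows to one row per (model_type, channel, date)."""
    distinct_conversions = defaultdict(set)
    revenue = defaultdict(float)
    for r in comparison_rows:
        key = (r["model_type"], r["channel"], r["conversion_date"])
        distinct_conversions[key].add(r["conversion_id"])  # COUNT(DISTINCT conversion__id)
        revenue[key] += r["attributed_revenue"]            # SUM(attributed_revenue)
    return {k: (len(distinct_conversions[k]), revenue[k]) for k in revenue}

# Usage: three touchpoint rows collapse to two summary rows.
rows = [
    {"model_type": "linear", "channel": "email", "conversion_date": "2024-01-01",
     "conversion_id": "c1", "attributed_revenue": 60.0},
    {"model_type": "linear", "channel": "email", "conversion_date": "2024-01-01",
     "conversion_id": "c2", "attributed_revenue": 40.0},
    {"model_type": "linear", "channel": "paid", "conversion_date": "2024-01-01",
     "conversion_id": "c1", "attributed_revenue": 40.0},
]
print(summarize(rows))
```

Note the COUNT(DISTINCT …) semantics: a conversion that touched a channel twice on the same day counts once, exactly as in the SQL summary.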

Point dashboards at this summary table for standard views. Keep the detailed comparison table available for analyst-level deep dives into specific conversions or paths.