A multi-model attribution system in dbt needs clean separation between the attribution logic and the comparison layer. Each model runs independently, implementing one attribution approach against the same touchpoint data. A final comparison model unions them together with a model_type discriminator for dashboard consumption. This pattern follows the dbt Three-Layer Architecture — base models clean GA4 events, intermediate models build sessionized touchpoint paths, and marts implement individual attribution models plus the comparison union.
Project structure
```
models/
├── base/
│   └── base__ga4__events.sql            # Raw event cleaning
├── intermediate/
│   ├── int__events_sessionized.sql      # Sessionization
│   ├── int__sessions_enriched.sql       # Marketing touchpoints
│   └── int__touchpoints_pathed.sql      # User journey paths
└── marts/
    └── attribution/
        ├── mrt__attribution__first_touch.sql
        ├── mrt__attribution__last_touch.sql
        ├── mrt__attribution__linear.sql
        ├── mrt__attribution__position_based.sql
        ├── mrt__attribution__time_decay.sql
        ├── mrt__attribution__conversions.sql
        └── mrt__attribution__comparison.sql
```
Each mrt__attribution__* model implements one attribution approach and outputs a consistent schema: conversion__id, touchpoint__channel, touchpoint__attributed_revenue, and any additional dimensions for analysis (campaign, converted_at, user segment). The consistent schema is critical: the comparison model depends on every upstream model producing the same columns.
The mrt__attribution__conversions model holds the ground truth: actual conversions with actual revenue. It serves as the reference table for validation tests.
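A minimal sketch of that reference model might look like the following. This is illustrative only: it assumes int__touchpoints_pathed carries conversion__revenue and conversion__converted_at on each touchpoint row, which is an assumption about the intermediate layer, not something the project guarantees.

```sql
-- mrt__attribution__conversions.sql (illustrative sketch; upstream column
-- layout is assumed, not confirmed by the project)
SELECT DISTINCT
    conversion__id,
    conversion__revenue,       -- actual revenue, the ground truth for tests
    conversion__converted_at
FROM {{ ref('int__touchpoints_pathed') }}
-- DISTINCT collapses the touchpoint rows back to one row per conversion
```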
Following the naming convention, all attribution models live under marts/attribution/ — organized by business domain, with the mrt__ prefix making the layer obvious.
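To make the column contract concrete, a first-touch model can be sketched roughly as below. This is a hedged sketch: the ordering column touchpoint__occurred_at and the presence of conversion__revenue on int__touchpoints_pathed are assumptions for illustration.

```sql
-- mrt__attribution__first_touch.sql (illustrative sketch;
-- touchpoint__occurred_at is a hypothetical column name)
WITH ranked AS (
    SELECT
        conversion__id,
        touchpoint__channel,
        conversion__revenue,
        conversion__converted_at,
        ROW_NUMBER() OVER (
            PARTITION BY conversion__id
            ORDER BY touchpoint__occurred_at
        ) AS touchpoint_rank
    FROM {{ ref('int__touchpoints_pathed') }}
)

SELECT
    conversion__id,
    touchpoint__channel,
    -- first touch gets 100% of the conversion's revenue
    conversion__revenue AS touchpoint__attributed_revenue,
    conversion__converted_at
FROM ranked
WHERE touchpoint_rank = 1
```

Every other mrt__attribution__* model follows the same shape; only the weighting logic changes.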
The comparison model
The comparison model unions individual attribution models and adds an identifier:
```sql
-- mrt__attribution__comparison.sql
WITH first_touch AS (
    SELECT
        conversion__id,
        touchpoint__channel,
        touchpoint__attributed_revenue,
        conversion__converted_at,
        'first_touch' AS model_type
    FROM {{ ref('mrt__attribution__first_touch') }}
),

last_touch AS (
    SELECT
        conversion__id,
        touchpoint__channel,
        touchpoint__attributed_revenue,
        conversion__converted_at,
        'last_touch' AS model_type
    FROM {{ ref('mrt__attribution__last_touch') }}
),

linear AS (
    SELECT
        conversion__id,
        touchpoint__channel,
        touchpoint__attributed_revenue,
        conversion__converted_at,
        'linear' AS model_type
    FROM {{ ref('mrt__attribution__linear') }}
),

position_based AS (
    SELECT
        conversion__id,
        touchpoint__channel,
        touchpoint__attributed_revenue,
        conversion__converted_at,
        'position_based' AS model_type
    FROM {{ ref('mrt__attribution__position_based') }}
),

time_decay AS (
    SELECT
        conversion__id,
        touchpoint__channel,
        touchpoint__attributed_revenue,
        conversion__converted_at,
        'time_decay' AS model_type
    FROM {{ ref('mrt__attribution__time_decay') }}
)

SELECT * FROM first_touch
UNION ALL
SELECT * FROM last_touch
UNION ALL
SELECT * FROM linear
UNION ALL
SELECT * FROM position_based
UNION ALL
SELECT * FROM time_decay
```
This produces one row per conversion per channel per model. A single conversion with three touchpoints generates 15 rows (3 touchpoints × 5 models). The model_type column becomes the primary filter in your attribution dashboard.
Why CTEs instead of direct UNION ALL
You could write `SELECT *, 'first_touch' AS model_type FROM {{ ref('mrt__attribution__first_touch') }} UNION ALL ...` directly. The CTE approach is more maintainable for three reasons:
- Explicit column selection. If an upstream model adds a column, the comparison model won’t silently change shape. You control exactly which columns pass through.
- Easier debugging. You can comment out a CTE to isolate issues with a specific model.
- Readability. When adding a sixth model, the pattern is obvious: add a CTE, add a UNION ALL.
Row multiplication awareness
The comparison table can get large. If you have 10,000 conversions with an average of 4 touchpoints each, one model produces 40,000 rows. Five models produce 200,000 rows. This isn’t usually a problem for BigQuery, but it matters for BI tools with row limits and for cost optimization. Pre-aggregating to channel-level summaries in a downstream model reduces the row count dramatically for dashboard consumption.
Adding a new model
The pattern makes adding models mechanical:
- Create mrt__attribution__new_model.sql implementing the attribution logic with the standard column contract.
- Add a CTE to the comparison model.
- Add a UNION ALL referencing the new CTE.
- The new model automatically appears as an option in dashboard filters.
No dashboard changes needed. No new data connections. The model_type dropdown in the BI tool picks up the new value automatically because it reads from the comparison table.
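For example, wiring in a hypothetical sixth model (called new_model here purely for illustration) amounts to two additions in the comparison file:

```sql
-- added to mrt__attribution__comparison.sql
-- (mrt__attribution__new_model is a hypothetical example)
new_model AS (
    SELECT
        conversion__id,
        touchpoint__channel,
        touchpoint__attributed_revenue,
        conversion__converted_at,
        'new_model' AS model_type
    FROM {{ ref('mrt__attribution__new_model') }}
)

-- ...and one more branch in the final union:
-- UNION ALL
-- SELECT * FROM new_model
```

One caveat: if the comparison model's accepted_values test enumerates model_type values, the new value must be added there too, or the test fails on the new rows.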
Validation: revenue integrity testing
Every attribution model should pass a fundamental integrity check: attributed revenue per conversion must sum to actual conversion revenue. The normalization step in each model’s SQL should guarantee this, but data quirks and edge cases can break it silently.
Add a dbt test that verifies the sum across each model:
```yaml
models:
  - name: mrt__attribution__comparison
    description: >
      Union of all attribution models with model_type discriminator.
      One row per conversion per channel per model.
    columns:
      - name: conversion__id
        data_tests:
          - not_null
      - name: model_type
        data_tests:
          - accepted_values:
              values:
                - first_touch
                - last_touch
                - linear
                - position_based
                - time_decay
    data_tests:
      - dbt_utils.expression_is_true:
          expression: >
            ABS(SUM(touchpoint__attributed_revenue)
                - (SELECT SUM(conversion__revenue)
                   FROM {{ ref('mrt__attribution__conversions') }})) < 0.01
          group_by_columns: ['model_type']
```
The group_by_columns parameter runs the test once per model. If linear attribution sums correctly but position-based doesn't, you'll know exactly which model has the issue. The 0.01 tolerance handles floating-point rounding.
This is a singular-test-style assertion using the dbt-utils expression_is_true generic test. For additional rigor, write a singular test that identifies specific conversions where attribution doesn’t balance:
```sql
-- tests/assert_attribution_revenue_balances.sql
WITH attributed AS (
    SELECT
        model_type,
        conversion__id,
        SUM(touchpoint__attributed_revenue) AS total_attributed
    FROM {{ ref('mrt__attribution__comparison') }}
    GROUP BY model_type, conversion__id
),

actual AS (
    SELECT
        conversion__id,
        conversion__revenue
    FROM {{ ref('mrt__attribution__conversions') }}
)

SELECT
    a.model_type,
    a.conversion__id,
    a.total_attributed,
    c.conversion__revenue,
    ABS(a.total_attributed - c.conversion__revenue) AS diff
FROM attributed a
JOIN actual c
    ON a.conversion__id = c.conversion__id
WHERE ABS(a.total_attributed - c.conversion__revenue) > 0.01
```
Zero rows returned means every conversion balances across every model. Any rows returned identify exactly which conversion and which model have an issue, making debugging straightforward.
Schema contract for upstream models
To keep the comparison model reliable, enforce a consistent schema across all attribution models. A model contract prevents upstream changes from silently breaking the union:
```yaml
models:
  - name: mrt__attribution__first_touch
    config:
      contract:
        enforced: true
    columns:
      - name: conversion__id
        data_type: string
        constraints:
          - type: not_null
      - name: touchpoint__channel
        data_type: string
        constraints:
          - type: not_null
      - name: touchpoint__attributed_revenue
        data_type: numeric
      - name: conversion__converted_at
        data_type: timestamp
```
Apply the same contract to every mrt__attribution__* model. If someone adds a column to first-touch without adding it to the others, the schemas drift apart: depending on how the comparison model selects columns, the union either breaks or silently drops the new column. Contracts surface that drift explicitly at build time rather than as a runtime surprise.
Pre-aggregation for dashboard performance
The comparison table at touchpoint level is the source of truth, but most dashboard views don’t need touchpoint-level granularity. A summary model aggregates for common dashboard queries:
```sql
-- mrt__attribution__comparison_summary.sql
SELECT
    model_type,
    touchpoint__channel,
    DATE(conversion__converted_at) AS conversion_date,
    COUNT(DISTINCT conversion__id) AS conversions,
    SUM(touchpoint__attributed_revenue) AS attributed_revenue
FROM {{ ref('mrt__attribution__comparison') }}
GROUP BY ALL
```
This table is dramatically smaller: one row per model per channel per day instead of one row per conversion per channel per model. Looker Studio performs better with fewer rows, and BigQuery charges for bytes scanned, so pre-aggregating avoids repeated full-table scans from the BI layer.
Point dashboards at this summary table for standard views. Keep the detailed comparison table available for analyst-level deep dives into specific conversions or paths.
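As a usage sketch, a typical dashboard query against the summary table compares channel revenue across models; the 30-day window is an arbitrary example, not a project requirement:

```sql
-- Channel revenue by attribution model, last 30 days (illustrative query)
SELECT
    touchpoint__channel,
    model_type,
    SUM(attributed_revenue) AS revenue
FROM {{ ref('mrt__attribution__comparison_summary') }}
WHERE conversion_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY touchpoint__channel, model_type
ORDER BY touchpoint__channel, model_type
```

Pivoting this result on model_type gives the side-by-side view that makes attribution-model disagreements visible per channel.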