Unit Testing Attribution Models in dbt

Attribution models determine which marketing channels receive credit for conversions. Common models (first-touch, last-touch, linear, time-decay) typically rely on window functions to identify or weight the relevant touchpoints. Each model needs specific test scenarios to verify that the window logic is correct.

Testing First-Touch and Last-Touch

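-- models/marts/marketing/mrt__marketing__customer_attribution.sql
with attributed as (
    select
        customer_id,
        first_value(utm_source) over (
            partition by customer_id
            order by session_start
        ) as first_touch_source,
        first_value(utm_medium) over (
            partition by customer_id
            order by session_start
        ) as first_touch_medium,
        last_value(utm_source) over (
            partition by customer_id
            order by session_start
            rows between unbounded preceding and unbounded following
        ) as last_touch_source,
        -- latest converting session per customer; null if the customer never converted
        max(case when converted then session_start end) over (
            partition by customer_id
        ) as conversion_timestamp
    from {{ ref('int__customers_sessions') }}
)

-- window functions emit one row per session, so collapse to one row per
-- customer and drop customers who never converted
select distinct
    customer_id,
    first_touch_source,
    first_touch_medium,
    last_touch_source,
    conversion_timestamp
from attributed
where conversion_timestamp is not null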

unit_tests:
  - name: test_mrt_marketing_customer_attribution_first_last_touch
    model: mrt__marketing__customer_attribution
    description: "First touch captures initial source, last touch captures converting source"
    given:
      - input: ref('int__customers_sessions')
        rows:
          # Customer journey: Facebook → Google → Direct (conversion)
          - {customer_id: 100, session_start: "2024-06-01 10:00:00", utm_source: "facebook", utm_medium: "paid", converted: false}
          - {customer_id: 100, session_start: "2024-06-05 14:00:00", utm_source: "google", utm_medium: "organic", converted: false}
          - {customer_id: 100, session_start: "2024-06-10 09:00:00", utm_source: "direct", utm_medium: "none", converted: true}
          # Single-touch conversion
          - {customer_id: 101, session_start: "2024-06-02 11:00:00", utm_source: "email", utm_medium: "newsletter", converted: true}
    expect:
      rows:
        - {customer_id: 100, first_touch_source: "facebook", first_touch_medium: "paid", last_touch_source: "direct"}
        - {customer_id: 101, first_touch_source: "email", first_touch_medium: "newsletter", last_touch_source: "email"}

This test verifies two scenarios:

Customer 100’s multi-touch journey: First encounter via Facebook paid ads (June 1), returns via Google organic (June 5), converts via direct visit (June 10). First-touch credit goes to Facebook (where the journey began). Last-touch credit goes to Direct (the final touchpoint). If your FIRST_VALUE or LAST_VALUE window functions have incorrect ordering or framing, this test catches it.

Customer 101’s single-touch conversion: When the converting session is the customer’s only session, first-touch and last-touch should resolve to the same channel. This edge case often breaks implementations that assume at least two touchpoints.

Testing the No-Conversion Exclusion

Attribution only makes sense for customers who actually converted. Non-converting visitors should be excluded entirely from the attribution model.

unit_tests:
  - name: test_mrt_marketing_customer_attribution_no_conversion
    model: mrt__marketing__customer_attribution
    description: "Customers without conversion should not appear"
    given:
      - input: ref('int__customers_sessions')
        rows:
          - {customer_id: 200, session_start: "2024-06-01 10:00:00", utm_source: "facebook", utm_medium: "paid", converted: false}
          - {customer_id: 200, session_start: "2024-06-05 14:00:00", utm_source: "google", utm_medium: "organic", converted: false}
    expect:
      rows: []

The expect block with rows: [] verifies that Customer 200, who had sessions but never converted, is excluded from the attribution output. This test is as important as the positive cases: without it, non-converting visitors might appear in attribution reports and skew channel performance metrics.

Three Test Scenarios for Any Attribution Model

Regardless of which attribution model you’re implementing, these three scenarios should always be tested:

Multi-touch journey: The typical customer who interacts with multiple channels before converting. Verifies that the right touchpoint gets credit based on the attribution model’s rules.
Single-touch conversion: A customer who converts on their first visit. First-touch and last-touch should agree, and linear attribution should give 100% of the credit to the single channel. This edge case breaks models that assume multiple touchpoints.
No conversion: Visitors who never convert should be excluded (or treated distinctly). Verifies the conversion filter works correctly.

Attribution-Specific Testing Pitfalls

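LAST_VALUE frame specification: When a window has an ORDER BY but no explicit frame, standard SQL defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE returns the value from the current row (or its last peer in the ordering), not from the last row in the partition. If your model uses LAST_VALUE, make sure it specifies ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Some warehouses deviate from the standard default, so an explicit frame is also the portable choice. The multi-touch test catches a missing frame immediately, because the last-touch value then varies from row to row instead of matching the final session.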

Touchpoint ordering: If two sessions have the same timestamp (possible with daily-grain data), the attribution assignment becomes non-deterministic, and a test on such data can pass or fail from run to run. Include a tiebreaker in your ORDER BY, and add a test case with tied timestamps to document the expected behavior.
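A tied-timestamp case might look like the sketch below. It assumes the window ORDER BY has been extended to session_start, session_id, where session_id is a hypothetical column on int__customers_sessions that the model shown earlier does not use. The fixture deliberately lists the later session first, so the test is less likely to pass by accident of input order.

```yaml
unit_tests:
  - name: test_mrt_marketing_customer_attribution_tied_timestamps
    model: mrt__marketing__customer_attribution
    description: "Sessions with identical timestamps resolve by session_id (assumed tiebreaker)"
    given:
      - input: ref('int__customers_sessions')
        rows:
          # Both sessions share a timestamp; session_id is an assumed column used only to break the tie
          - {customer_id: 300, session_id: 2, session_start: "2024-06-01 00:00:00", utm_source: "google", utm_medium: "organic", converted: true}
          - {customer_id: 300, session_id: 1, session_start: "2024-06-01 00:00:00", utm_source: "facebook", utm_medium: "paid", converted: false}
    expect:
      rows:
        # Lowest session_id wins first touch, highest wins last touch
        - {customer_id: 300, first_touch_source: "facebook", first_touch_medium: "paid", last_touch_source: "google"}
```

Without a tiebreaker in the model, either source could come back as first or last touch, so this expectation only holds once the ordering is made deterministic.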

Null UTM sources: Direct traffic often has null UTM parameters. Test whether your model treats null UTM source as “direct” or leaves it null. The test documents whichever convention you’ve chosen.
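As a sketch of documenting the convention, the test below assumes the model passes null UTM values through unchanged, which is what FIRST_VALUE and LAST_VALUE do by default. The customer ID and journey are illustrative.

```yaml
unit_tests:
  - name: test_mrt_marketing_customer_attribution_null_utm_source
    model: mrt__marketing__customer_attribution
    description: "Null UTM source on the converting session stays null (assumed convention)"
    given:
      - input: ref('int__customers_sessions')
        rows:
          - {customer_id: 400, session_start: "2024-06-01 10:00:00", utm_source: "google", utm_medium: "organic", converted: false}
          # Direct visit: no UTM parameters were captured
          - {customer_id: 400, session_start: "2024-06-03 16:00:00", utm_source: null, utm_medium: null, converted: true}
    expect:
      rows:
        - {customer_id: 400, first_touch_source: "google", first_touch_medium: "organic", last_touch_source: null}
```

If your model instead maps missing sources to “direct” (for example with a coalesce), change the expected last_touch_source to "direct" so the test records that choice.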

For more complex attribution approaches like linear or time-decay, the test data becomes more elaborate but the principle stays the same: design test journeys that would produce wrong credit distribution if the weighting logic were incorrect.
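For linear attribution, a test journey could look like the following sketch. The model name mrt__marketing__customer_linear_attribution, its one-row-per-touchpoint grain, and the credit column are all hypothetical, since the linear model is not shown here. Four touchpoints are used so that each share is exactly 0.25 and the expectation avoids rounding issues.

```yaml
unit_tests:
  - name: test_mrt_marketing_customer_linear_attribution_equal_credit
    # Hypothetical model: assumed to emit one row per customer touchpoint with a credit share
    model: mrt__marketing__customer_linear_attribution
    description: "Each of four touchpoints receives an equal 25% share of the conversion"
    given:
      - input: ref('int__customers_sessions')
        rows:
          - {customer_id: 500, session_start: "2024-06-01 10:00:00", utm_source: "facebook", utm_medium: "paid", converted: false}
          - {customer_id: 500, session_start: "2024-06-03 10:00:00", utm_source: "google", utm_medium: "organic", converted: false}
          - {customer_id: 500, session_start: "2024-06-05 10:00:00", utm_source: "email", utm_medium: "newsletter", converted: false}
          - {customer_id: 500, session_start: "2024-06-07 10:00:00", utm_source: "direct", utm_medium: "none", converted: true}
    expect:
      rows:
        # credit is an assumed output column; shares should sum to 1.0 per converting customer
        - {customer_id: 500, utm_source: "facebook", credit: 0.25}
        - {customer_id: 500, utm_source: "google", credit: 0.25}
        - {customer_id: 500, utm_source: "email", credit: 0.25}
        - {customer_id: 500, utm_source: "direct", credit: 0.25}
```

A model that divides by the wrong touchpoint count, or that gives all credit to one session, would fail this expectation.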