
dbt Unit Test Mocking Dependencies

How to mock refs, sources, macros, variables, and the 'this' keyword in dbt unit tests — with patterns for multi-join models and incremental overrides.


Unit tests isolate transformation logic by replacing every dependency — upstream models, sources, macros, variables — with controlled values. The test verifies only the transformation, not upstream data quality or macro environment differences.

Mocking Multiple Refs and Sources

Most models join multiple tables. You need to provide mock data for each dependency your model references:

```yaml
unit_tests:
  - name: test_order_enrichment
    model: mrt__sales__orders
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, customer_id: 100, product_id: 500, quantity: 2}
      - input: ref('int__customers_enriched')
        rows:
          - {customer_id: 100, customer_segment: "enterprise"}
      - input: ref('base__shopify__products')
        rows:
          - {product_id: 500, unit_price: 49.99}
    expect:
      rows:
        - {order_id: 1, customer_segment: "enterprise", order_value: 99.98}
```

The critical insight: you only need to include the columns your model actually references. If int__customers_enriched has 50 columns but your model only joins on customer_id and selects customer_segment, those two columns are sufficient in your mock. dbt fills the unspecified columns with default (null) values.

This is a significant convenience. Without it, mocking a model that joins three 30-column tables would require writing out 90 columns per test row. With it, you typically need 3-5 columns per input — just the join keys and the fields your logic actually transforms.
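To make this concrete, here is a hypothetical sketch of what mrt__sales__orders might look like (the column and model names are taken from the mock above; the exact SQL is an assumption). Only the columns that appear here need to appear in the fixtures:

```sql
-- Hypothetical sketch of mrt__sales__orders.
-- Sources can be arbitrarily wide; the mocks only need these columns.
select
    o.order_id,
    c.customer_segment,
    o.quantity * p.unit_price as order_value
from {{ ref('base__shopify__orders') }} o
join {{ ref('int__customers_enriched') }} c
    on o.customer_id = c.customer_id
join {{ ref('base__shopify__products') }} p
    on o.product_id = p.product_id
```

With quantity 2 and unit_price 49.99, the expected order_value of 99.98 in the test follows directly from the multiplication.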

For sources, use source() instead of ref():

```yaml
given:
  - input: source('salesforce', 'accounts')
    rows:
      - {id: "001ABC", name: "Acme Corp", type: "Customer"}
```

The syntax mirrors exactly how you reference sources in your model SQL: source('source_name', 'table_name').
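In the model itself, the same pair of arguments appears in the source() call — a minimal sketch (the column renames are hypothetical):

```sql
select
    id   as account_id,
    name as account_name,
    type as account_type
from {{ source('salesforce', 'accounts') }}
```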

Overriding Macros

Many models use macros for timestamps, conditional logic, or utility functions. These introduce non-determinism — a test that depends on current_timestamp() will produce different results depending on when it runs. Override them for deterministic tests:

```yaml
unit_tests:
  - name: test_created_date_logic
    model: mrt__core__customers
    overrides:
      macros:
        dbt_utils.current_timestamp: "'2024-06-15 10:00:00'"
    given:
      - input: ref('base__crm__customers')
        rows:
          - {customer_id: 1, created_at: "2024-06-01 09:00:00"}
    expect:
      rows:
        - {customer_id: 1, is_recent: true}
```

The overrides.macros block replaces the macro’s output with a static value during the test. The key is the fully qualified macro name (package + macro name), and the value is the SQL expression that should replace it.

This is particularly important for time-based logic. A model that calculates “is this customer active in the last 30 days” using current_timestamp() will behave differently on different days. By pinning the timestamp, you make the test deterministic: created_at of June 1 with a current time of June 15 is always 14 days ago, always within 30 days, always is_recent = true.
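A sketch of the model logic this test would pin down (hypothetical; the datediff syntax varies by warehouse, and the macro name matches the override above):

```sql
-- Hypothetical is_recent logic in mrt__core__customers. With the macro
-- overridden to '2024-06-15 10:00:00', a created_at of June 1 is always
-- 14 days old, so is_recent is deterministically true.
select
    customer_id,
    created_at,
    datediff('day', created_at, {{ dbt_utils.current_timestamp() }}) <= 30 as is_recent
from {{ ref('base__crm__customers') }}
```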

Overriding Variables

If your model uses var() for configurable thresholds or feature flags, override them in the test:

```yaml
unit_tests:
  - name: test_with_custom_threshold
    model: mrt__core__customers
    overrides:
      vars:
        days_threshold: 30
    given:
      - input: ref('base__crm__customers')
        rows:
          - {customer_id: 1, last_order_date: "2024-05-20"}
    expect:
      rows:
        - {customer_id: 1, is_active: true}
```

Variable overrides are simpler than macro overrides — just key-value pairs in the overrides.vars block. The variable name doesn’t need a package qualifier.

This is valuable for testing boundary conditions. If days_threshold defaults to 90, you can write one test with days_threshold: 30 and another with days_threshold: 7 to verify the logic handles different thresholds correctly without changing the model code.
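A hypothetical pair of tests probing the same boundary from both sides might look like this (in practice you would also pin the current timestamp via a macro override, as in the previous section, so the date arithmetic stays deterministic):

```yaml
unit_tests:
  - name: test_active_within_loose_threshold
    model: mrt__core__customers
    overrides:
      vars:
        days_threshold: 30
    given:
      - input: ref('base__crm__customers')
        rows:
          - {customer_id: 1, last_order_date: "2024-05-20"}
    expect:
      rows:
        - {customer_id: 1, is_active: true}

  - name: test_inactive_with_tight_threshold
    model: mrt__core__customers
    overrides:
      vars:
        days_threshold: 7
    given:
      - input: ref('base__crm__customers')
        rows:
          - {customer_id: 1, last_order_date: "2024-05-20"}
    expect:
      rows:
        - {customer_id: 1, is_active: false}
```

Same input row, two thresholds, two expected outcomes — the model code never changes.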

The this Keyword for Incremental Models

For incremental models, this represents the current state of the target table — the rows that already exist before the incremental run. This is the most powerful mocking pattern because it lets you test the merge logic explicitly:

```yaml
unit_tests:
  - name: test_incremental_dedup
    model: int__events_deduplicated
    overrides:
      macros:
        is_incremental: true
    given:
      - input: ref('base__ga4__events')
        rows:
          - {event_id: 1, event_time: "2024-06-15"}
          - {event_id: 2, event_time: "2024-06-16"}
      - input: this
        rows:
          - {event_id: 1, event_time: "2024-06-15"}  # Already exists
    expect:
      rows:
        - {event_id: 2, event_time: "2024-06-16"}  # Only new row
```

Two things are happening here:

  1. overrides.macros.is_incremental: true forces the model into incremental mode. Without this, the {% if is_incremental() %} block in your model would evaluate to false, and the test would run the full-refresh path instead of the incremental path.

  2. input: this mocks the current table state. The rows you put here represent what’s already in the target table from previous runs.

Together, they let you verify the exact behavior of your merge logic: does it correctly deduplicate? Does it handle late-arriving data? Does it update existing records when it should? Does it leave existing records untouched when it should?

Incremental merge bugs accumulate silently across runs and may require backfilling months of data. A unit test that mocks this catches the bug before it reaches production.
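For reference, the kind of model the test above would exercise might look like this (a hypothetical sketch; the not-in filter is one of several common dedup patterns):

```sql
-- Hypothetical sketch of int__events_deduplicated.
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    event_time
from {{ ref('base__ga4__events') }}
{% if is_incremental() %}
  -- Skip events already present in the target table
  where event_id not in (select event_id from {{ this }})
{% endif %}
```

With is_incremental forced to true and `this` mocked to contain event_id 1, only event_id 2 survives the filter — exactly the expect block in the test.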

What Happens When You Forget a Dependency

If your model references a ref() or source() that you haven’t mocked in the given block, dbt will throw a “node not found” error during compilation. The fix is straightforward: add the missing input to given, even if you only need it with minimal columns.

For inputs your model joins but doesn’t actually need for the specific scenario you’re testing, provide a minimal mock:

```yaml
given:
  # The input your test cares about
  - input: ref('base__shopify__orders')
    rows:
      - {order_id: 1, quantity: 2, unit_price: 50.00}
  # An input your model joins but this test doesn't care about
  - input: ref('base__shopify__customers')
    rows:
      - {customer_id: 1}  # Just the join key, minimum viable mock
```

The rule: every ref(), source(), or this in your model SQL needs a corresponding entry in given. You can minimize what you put in each one, but the entry must exist.

Practical Tips

Start with the happy path. Mock the simplest scenario first — one row per input, expected output matches basic logic. Once that passes, add edge case rows: nulls, boundary values, empty strings, negative numbers.
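A hypothetical second pass over an orders input might add edge-case rows like these (expected outputs depend on how your model handles each case, so the expect block is omitted here):

```yaml
given:
  - input: ref('base__shopify__orders')
    rows:
      - {order_id: 1, quantity: 2, unit_price: 50.00}     # happy path
      - {order_id: 2, quantity: null, unit_price: 50.00}  # null quantity
      - {order_id: 3, quantity: -1, unit_price: 50.00}    # negative quantity
      - {order_id: 4, quantity: 0, unit_price: 0.00}      # boundary values
```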

Name your test rows semantically. Inline YAML comments work (as in the incremental example above), but IDs can also communicate intent: {customer_id: 1, ...} for the normal case, {customer_id: 99, ...} for the edge case. Or add a description field to the test itself explaining what each row covers.

Don’t over-mock. If your model joins five tables but the logic you’re testing only involves two of them, the other three inputs just need join keys and nothing else. Keep the focus on what the test is actually verifying.

Test one behavior per test. Rather than one massive test with 20 rows covering every scenario, write separate tests for each behavior: test_discount_calculation, test_null_quantity_handling, test_negative_amount_edge_case. This makes failures immediately diagnostic — you know exactly what broke.