Native dbt unit tests don’t directly support snapshot resources. You can’t write a unit test with `model: my_snapshot` because snapshots aren’t models — they’re a separate resource type with their own execution path.
This limitation makes sense when you think about it. Snapshots detect changes over time, comparing the current source state to the previous snapshot state. Unit tests run in isolation with mocked data — there’s no “previous state” to compare against.
So how do you test snapshot-related logic? Three strategies, each targeting a different part of the pipeline.
Strategy 1: Test Pre-Snapshot Base Models
The transformation that prepares data for a snapshot is just as important as the snapshot itself. If your staging model has bugs, the snapshot will faithfully preserve those bugs forever.
```yaml
unit_tests:
  - name: test_base_crm_users_snapshot_ready
    model: base__crm__users
    description: "Base model should correctly prepare data for snapshot"
    given:
      - input: source('crm', 'users')
        rows:
          - {id: 1, name: "Alice", email: "alice@example.com", _updated_at: "2024-06-01"}
          - {id: 1, name: "Alice Smith", email: "alice@example.com", _updated_at: "2024-06-15"}
    expect:
      rows:
        - {user_id: 1, user_name: "Alice", email: "alice@example.com", updated_at: "2024-06-01"}
        - {user_id: 1, user_name: "Alice Smith", email: "alice@example.com", updated_at: "2024-06-15"}
```

This test verifies that:
- Raw column names are correctly transformed (`id` to `user_id`, `name` to `user_name`)
- The `updated_at` column (which the snapshot will use for change detection) is properly exposed
- Multiple versions of the same user are preserved — critical for SCD2 to work correctly
If the base model accidentally deduplicates or filters out older versions, the snapshot will never see the changes it needs to track.
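That failure mode is easy to see outside dbt. A minimal Python sketch (with hypothetical data, not dbt code) shows how an accidental "keep only the latest row per key" step collapses exactly the history the snapshot needs:

```python
# Two versions of the same user, as the raw source delivers them.
source_rows = [
    {"id": 1, "name": "Alice", "_updated_at": "2024-06-01"},
    {"id": 1, "name": "Alice Smith", "_updated_at": "2024-06-15"},
]

# A well-behaved base model only renames columns, preserving every version.
base_rows = [
    {"user_id": r["id"], "user_name": r["name"], "updated_at": r["_updated_at"]}
    for r in source_rows
]
assert len(base_rows) == 2  # both versions survive

# An accidental "dedupe to latest" keeps one row per key...
latest = {}
for r in source_rows:
    if r["id"] not in latest or r["_updated_at"] > latest[r["id"]]["_updated_at"]:
        latest[r["id"]] = r
deduped = list(latest.values())

# ...so the intermediate "Alice" version is gone before the snapshot ever runs.
assert len(deduped) == 1
assert deduped[0]["name"] == "Alice Smith"
```

The unit test above catches this because it expects both versions in the output.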
Strategy 2: Test SCD2 Date Range Consumers
After the snapshot runs, downstream models typically consume the snapshot output and calculate derived fields like `valid_to` dates and `is_current` flags. This is where window functions come into play, and where bugs love to hide.
```sql
-- models/intermediate/int__users_history.sql
select
    user_id,
    user_name,
    valid_from,
    coalesce(
        lead(valid_from) over (partition by user_id order by valid_from),
        '2199-12-31'
    ) as valid_to,
    case
        when lead(valid_from) over (partition by user_id order by valid_from) is null
            then true
        else false
    end as is_current
from {{ ref('snp_users') }}
```

```yaml
unit_tests:
  - name: test_int_users_history_date_ranges
    model: int__users_history
    description: "Valid_to should be the next version's valid_from, or 2199-12-31 for current"
    given:
      - input: ref('snp_users')
        rows:
          - {user_id: 1, user_name: "Alice", valid_from: "2024-01-01"}
          - {user_id: 1, user_name: "Alice Smith", valid_from: "2024-06-15"}
          - {user_id: 2, user_name: "Bob", valid_from: "2024-03-01"}
    expect:
      rows:
        - {user_id: 1, user_name: "Alice", valid_from: "2024-01-01", valid_to: "2024-06-15", is_current: false}
        - {user_id: 1, user_name: "Alice Smith", valid_from: "2024-06-15", valid_to: "2199-12-31", is_current: true}
        - {user_id: 2, user_name: "Bob", valid_from: "2024-03-01", valid_to: "2199-12-31", is_current: true}
```

This test verifies several things at once:
- Date range calculation: Alice’s first record has `valid_to = "2024-06-15"` (the next version’s `valid_from`), while her second record uses the far-future sentinel date.
- Current flag logic: Only the most recent version of each user has `is_current: true`.
- Single-version handling: Bob has only one record, so he’s automatically current with the sentinel `valid_to`.
- Partition isolation: Alice’s records don’t affect Bob’s calculations — each `user_id` is a separate partition.
This is the kind of logic that’s nearly impossible to verify through code review alone. The window function partitioning, ordering, and frame specifications all interact in ways that only reveal bugs with actual data.
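To make the windowing behavior concrete, here is a hedged Python re-implementation of the same `valid_to` / `is_current` derivation (a per-user sort plus a look-ahead standing in for `lead()` over a partition), run against the same mocked rows as the test:

```python
from itertools import groupby
from operator import itemgetter

SENTINEL = "2199-12-31"  # same far-future date as the model

snapshot_rows = [
    {"user_id": 1, "user_name": "Alice", "valid_from": "2024-01-01"},
    {"user_id": 1, "user_name": "Alice Smith", "valid_from": "2024-06-15"},
    {"user_id": 2, "user_name": "Bob", "valid_from": "2024-03-01"},
]

history = []
# groupby mirrors "partition by user_id"; the sort mirrors "order by valid_from"
ordered = sorted(snapshot_rows, key=itemgetter("user_id", "valid_from"))
for _, group in groupby(ordered, key=itemgetter("user_id")):
    versions = list(group)
    for i, row in enumerate(versions):
        # look-ahead to the next version's valid_from, like lead()
        nxt = versions[i + 1]["valid_from"] if i + 1 < len(versions) else None
        history.append({**row,
                        "valid_to": nxt if nxt is not None else SENTINEL,
                        "is_current": nxt is None})

assert [r["valid_to"] for r in history] == ["2024-06-15", SENTINEL, SENTINEL]
assert [r["is_current"] for r in history] == [False, True, True]
```

Stepping through this by hand is a useful way to convince yourself the unit test's expected rows are right before you commit them.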
Strategy 3: Test Change Detection Hashing
Some teams implement their own change detection using row-level hashes, especially when sources don’t have reliable `updated_at` timestamps. The idea: concatenate relevant columns, hash the result, compare hashes between runs to detect changes.
Hash functions are sensitive to column ordering, null handling, and data type casting. A seemingly cosmetic change (reordering columns in your hash) can completely break change detection.
```yaml
unit_tests:
  - name: test_change_detection_hash
    model: base__crm__users_with_hash
    description: "Row hash should change when any tracked column changes"
    given:
      - input: source('crm', 'users')
        rows:
          - {id: 1, name: "Alice", email: "alice@example.com"}
          - {id: 2, name: "Bob", email: "bob@example.com"}
    expect:
      rows:
        - {user_id: 1, row_hash: "abc123"}
        - {user_id: 2, row_hash: "def456"}
```

To write this test, you’ll need to calculate the expected hash values manually (or run your model once and capture the outputs). The test then becomes a regression safeguard — if someone accidentally modifies the hash logic, the test catches it.
A useful companion test verifies that the hash actually changes when data changes:
```yaml
unit_tests:
  - name: test_change_detection_hash_sensitivity
    model: base__crm__users_with_hash
    description: "Hash should differ when email changes"
    given:
      - input: source('crm', 'users')
        rows:
          - {id: 1, name: "Alice", email: "alice@example.com"}
          - {id: 1, name: "Alice", email: "alice.new@example.com"}
    expect:
      rows:
        - {user_id: 1, row_hash: "abc123"}
        - {user_id: 1, row_hash: "xyz789"}
```

The first test proves the hash is deterministic. The second proves it’s sensitive to changes. Together, they protect the foundation of your change detection pipeline.
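The properties those two tests pin down — determinism, sensitivity, and the fragility to column ordering mentioned earlier — can be sanity-checked in plain Python. This sketch uses md5 over pipe-delimited, null-coalesced columns as a stand-in for whatever hash expression your model actually uses:

```python
import hashlib

def row_hash(name, email):
    # Coalesce nulls and fix the column order before hashing -- both
    # choices are part of the hash's contract with previously stored rows.
    parts = [name or "", email or ""]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()

# Deterministic: the same inputs always yield the same hash.
assert row_hash("Alice", "alice@example.com") == row_hash("Alice", "alice@example.com")

# Sensitive: changing any tracked column changes the hash.
assert row_hash("Alice", "alice@example.com") != row_hash("Alice", "alice.new@example.com")

# Fragile to ordering: swapping columns produces a different hash, which
# would break comparisons against hashes stored by earlier runs -- the
# "cosmetic" refactor that silently breaks change detection.
assert row_hash("Alice", "alice@example.com") != row_hash("alice@example.com", "Alice")
```

The delimiter matters too: without one, `("ab", "c")` and `("a", "bc")` would hash identically.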
Choosing a Strategy
In practice, you’ll often use all three:
- Pre-snapshot tests prevent bad data from entering the history table permanently
- Consumer tests verify that the SCD2 windowing logic produces correct date ranges and current flags
- Hash tests protect the change detection mechanism itself
The consumer tests (Strategy 2) are typically the highest value because SCD2 date range logic is complex, hard to verify by eye, and has direct impact on how downstream queries filter historical data.