dbt Testing Pyramid

The layered testing pyramid for dbt projects: broad data test coverage at the base, targeted unit tests in the middle, anomaly detection and data diffs at the top.

Tags: dbt, testing, data quality

In dbt, data tests form the broad base of the testing pyramid, not unit tests. Data quality issues (duplicates, nulls, stale data, invalid values) are more common than logic bugs in SQL transformations, so the pyramid widens at the layer with the highest risk.

The Pyramid

              /\
             /  \  Data Diffs
            /    \ (dbt-audit-helper)
           /______\ Dev only, use sparingly
          /        \
         / Anomaly  \  Elementary
        / Detection  \ Key tables only
       /______________\
      /                \
     /   Unit Tests    \  5-10% of models
    /                    \ Complex logic only
   /______________________\
  /                        \
 /        Data Tests        \  Broad coverage
/____________________________\ Every PK, critical FKs

Each layer serves a different purpose, catches a different category of problem, and has a different cost profile. The layers are complementary, not redundant — removing any one of them creates a blind spot.

Layer 1: Data Tests (Broad Coverage)

The foundation. These are the tests that every dbt project should have, regardless of size or complexity. They’re cheap to configure (a few lines of YAML), cheap to run (per-column scans), and catch the most common class of production failures.

Generic data tests on every model:

  • Primary keys: unique + not_null on every table’s grain. Non-negotiable. Duplicate primary keys cause join explosions downstream that are painful to diagnose and expensive to fix.
  • Foreign keys: relationships on critical joins. Not every foreign key needs a test, but the ones that feed dashboards and metrics should be validated.
  • Enums: accepted_values on status and type columns. An unexpected upstream value silently breaks CASE WHEN logic downstream.
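
In schema YAML, all three of these generic tests are a few lines per model. A minimal sketch (model, column, and enum values are illustrative, not from a real project):

```yaml
# models/schema.yml (illustrative names)
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique      # the table's grain: non-negotiable
          - not_null
      - name: customer_id
        tests:
          - relationships:            # critical FK feeding dashboards
              to: ref('dim_customers')
              field: customer_id
      - name: order_status
        tests:
          - accepted_values:          # guard downstream CASE WHEN logic
              values: ['placed', 'shipped', 'completed', 'returned']
```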

dbt-expectations for targeted coverage:

  • Regex patterns for emails, URLs, product codes — anything with a known format
  • Numeric ranges for values that have known bounds (revenue > 0, conversion rate between 0 and 1)
  • Row count comparisons for critical transformations where row loss would be invisible otherwise
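
With the dbt-expectations package installed, the same YAML style covers formats, ranges, and row counts. A sketch assuming hypothetical model and column names:

```yaml
models:
  - name: fct_orders
    tests:
      # Guard against silent row loss in the transformation
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref('stg_orders')
    columns:
      - name: customer_email
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "^[^@]+@[^@]+\\.[^@]+$"   # loose email shape check
      - name: conversion_rate
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 1
```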

This layer runs on every dbt build in production. It sees real data. It catches the problems that correct code can’t prevent: source systems sending garbage, vendor APIs returning duplicates, upstream teams changing schemas without notice.

Layer 2: Unit Tests (5-10% of Models)

The middle layer. Unit tests are high-effort, high-precision tools. They use mocked inputs to verify that transformation logic produces the expected output for specific scenarios. They’re the only tool that can catch logic bugs before they reach the warehouse.

Not every model needs unit tests. The community consensus is roughly 5-10% of models warrant them — specifically the ones with complex logic where bugs are genuinely plausible:

  • Calculations with multiple branches — multi-condition CASE WHEN statements where branch ordering matters
  • Window functions — ranking, running totals, lag/lead calculations where window frame definitions are tricky
  • Custom business rules — customer segmentation, lead scoring, discount calculations with edge cases
  • Anything that’s caused bugs before — the “test on failure” pattern where a production bug gets a regression test
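
As of dbt 1.8, unit tests are defined in YAML with mocked input rows and expected output rows. A sketch for a hypothetical discount-tier model (all names and thresholds invented for illustration):

```yaml
unit_tests:
  - name: test_discount_tiering
    description: "Gold tier requires both spend and order-count thresholds"
    model: fct_customer_discounts        # hypothetical model
    given:
      - input: ref('stg_orders')         # mocked upstream rows
        rows:
          - {customer_id: 1, lifetime_spend: 5200, order_count: 12}
          - {customer_id: 2, lifetime_spend: 5200, order_count: 2}
    expect:
      rows:
        - {customer_id: 1, discount_tier: 'gold'}
        - {customer_id: 2, discount_tier: 'standard'}
```

Because inputs are mocked, the test pins down branch behavior (here, that spend alone doesn't qualify a customer for gold) independently of whatever is in the warehouse.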

Unit tests run in CI only. They use mocked data and add no value in production. Exclude them from production builds with dbt build --exclude-resource-type unit_test.

The key insight: unit tests and data tests are not competing approaches. They catch entirely different categories of problems. Your unit tests can all pass while your data tests fail (source data is broken but your code is correct) and vice versa (your code has a logic bug but the data happens to look plausible).

Layer 3: Anomaly Detection (Key Tables Only)

The upper layer. Elementary takes a fundamentally different approach from the layers below: instead of testing against rules you define, it learns what’s normal and alerts on deviations. This catches the “unknown unknowns” — anomalies you wouldn’t think to write explicit tests for.

Apply anomaly detection selectively to key business tables:

  • Volume monitoring on fact tables — alerts when row counts deviate from historical patterns. Catches silent upstream failures where an incremental model processes zero rows and succeeds.
  • Freshness on source-adjacent models — monitors time between updates adaptively, without requiring you to guess and maintain fixed thresholds.
  • Column anomalies on financial data — tracks distribution metrics (average, null count, distinct count) and flags shifts. A sudden 50% drop in average order value is technically within range bounds but signals a real problem.
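
Elementary's anomaly tests live in the same schema YAML as generic tests, with configurable training windows. A sketch, assuming illustrative table and column names:

```yaml
models:
  - name: fct_orders
    tests:
      # Learns typical daily row counts, alerts on deviations
      - elementary.volume_anomalies:
          training_period:
            period: day
            count: 30
    columns:
      - name: order_value
        tests:
          # Tracks distribution metrics over time and flags shifts
          - elementary.column_anomalies:
              column_anomalies:
                - average
                - null_count
```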

Elementary runs on every pipeline execution but at a higher compute cost than data tests because it queries historical data to establish baselines. Keep training periods reasonable (14-30 days) to balance accuracy against cost.

Layer 4: Data Diffs (Development Only)

The peak. dbt-audit-helper provides row-level comparison between two relations — purpose-built for validating migrations and refactoring.

{{ audit_helper.compare_queries(
    a_query=old_query,
    b_query=new_query,
    primary_key='customer_id'
) }}

This layer is for development and CI only. Never run data diffs in production pipelines. They’re expensive (full table scans on both relations), temporary (only relevant during the migration period), and diagnostic (they tell you what’s different, not whether it’s wrong).

Use cases:

  • Major refactoring projects where you need to prove the new version produces identical results
  • Migration from legacy scheduled queries to dbt
  • Validating that a performance optimization didn’t change outputs
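
For already-built models (rather than ad-hoc queries), compare_relations does the same job against two relations. File and model names here are hypothetical:

```sql
-- analyses/validate_customer_migration.sql (hypothetical file and models)
{{ audit_helper.compare_relations(
    a_relation=ref('customers_legacy'),
    b_relation=ref('customers'),
    primary_key='customer_id',
    summarize=true
) }}
```

Running this as a dbt analysis keeps the diff out of the production DAG while still letting it use ref() and the project's connection.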

The pyramid shape encodes an important principle: invest more in the layers with broader coverage and lower cost, less in the layers with narrower coverage and higher cost.

A typical well-tested project might look like:

Layer        Coverage                 Cost per test   Maintenance
Data tests   Every model              Low             Low — mostly YAML config
Unit tests   5-10% of models          Medium          Medium — fixtures need updating when models change
Elementary   Key business tables      Higher          Medium — tuning sensitivity and training periods
Data diffs   Active migrations only   Highest         None once migration is complete

A common mistake is inverting this pyramid — writing unit tests for everything while skipping basic data tests. A project with unique + not_null on every primary key and no unit tests catches more production incidents than a project with comprehensive unit tests and no primary key checks.

How This Relates to Layer Strategy

The pyramid describes what types of tests to use and in what proportion. The layer-by-layer strategy describes where in the DAG to place them. They’re complementary frameworks:

  • Data tests (pyramid layer 1) go everywhere, but with increasing intensity toward the edges of the DAG
  • Unit tests (pyramid layer 2) concentrate almost entirely at the mart layer where complex business logic lives
  • Elementary (pyramid layer 3) targets fact tables and source-adjacent models where volume and freshness matter most
  • Data diffs (pyramid layer 4) are used during development regardless of DAG layer

Together, the two frameworks answer both “what kind of test?” and “where does it go?” For the combined decision framework that brings both dimensions together, start with the three diagnostic questions.