
Unit Tests vs Data Tests in dbt

The two-checkpoint model for dbt testing — unit tests gate deployments by verifying transformation logic, data tests gate production by verifying data health.

Tags: dbt, testing, data quality

dbt has two fundamentally different categories of tests, and they answer different questions. Unit tests ask “does my transformation logic work?” Data tests ask “is my actual production data healthy?” Conflating the two leads to testing strategies that are either redundant or full of gaps.

The Two Checkpoints

Think of your data pipeline as having two gates:

Code changes → Unit tests → Deployment → Data tests → Dashboard

Unit tests gate your deployments. They run in CI when you push code changes. They use mocked inputs you define, not real data. They verify that your SQL transformations produce the expected output for known scenarios. If a unit test fails, the code change doesn’t deploy.
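
In CI, you can target just the unit tests with dbt's test_type node selector (introduced alongside unit tests in dbt 1.8); a minimal sketch:

dbt test --select "test_type:unit"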

Data tests gate your data. They run on every dbt build or dbt test execution against actual production data. They verify that the data flowing through your models is healthy, fresh, and within expected bounds. If a data test fails, downstream models are blocked (if severity is error) or a warning is logged (if severity is warn).

You need both. They’re not redundant — they catch entirely different categories of problems.

What Each Type Catches

| Aspect | Unit tests (dbt 1.8+) | Data tests (generic + dbt-expectations) |
| --- | --- | --- |
| What it tests | Transformation logic | Data quality |
| Input data | Mocked fixtures you define | Actual production data |
| When it runs | CI pipeline on code changes | Every dbt build/test run |
| What it catches | Logic bugs, edge cases | Data anomalies, source issues |
| Example question | "Does my CASE WHEN categorize correctly?" | "Are all values in the expected range?" |

Unit tests are high-precision, narrow-scope tools. You write them for the 5-10% of models with complex logic: intricate CASE WHEN branches, window functions, date calculations, string parsing. They’re expensive to write and maintain but catch bugs that no amount of data testing would find.

Data tests are broad-coverage, lower-precision tools. You apply them to every model: primary key checks, freshness monitoring, value range validation, format enforcement. They’re cheap to configure (a few lines of YAML) and catch problems that correct code can’t prevent.
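
As an illustration, the baseline coverage recommended later in this note really is only a few lines of YAML; a minimal sketch, assuming the mrt__sales__orders model keys on order_id and has a hypothetical currency_code column (not part of the example below):

models:
  - name: mrt__sales__orders
    columns:
      - name: order_id
        tests:
          - unique      # primary key check: no duplicates
          - not_null    # primary key check: no missing keys
      - name: currency_code
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "^[A-Z]{3}$"   # format enforcement, e.g. ISO currency codes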

The Concrete Example

Consider a model that calculates revenue by multiplying quantity by unit price.
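
The model itself isn't shown in this note, but a sketch of the relevant SQL might look like this (column and model names taken from the test fixtures below):

-- mrt__sales__orders.sql (illustrative sketch)
select
    order_id,
    quantity,
    unit_price,
    quantity * unit_price as revenue  -- the logic the unit test exercises
from {{ ref('base__shopify__orders') }}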

Your unit test verifies that the multiplication logic works:

unit_tests:
  - name: test_revenue_calculation
    model: mrt__sales__orders
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, quantity: 3, unit_price: 10.00}
          - {order_id: 2, quantity: 1, unit_price: 99.99}
    expect:
      rows:
        - {order_id: 1, revenue: 30.00}
        - {order_id: 2, revenue: 99.99}

This proves that 3 * 10.00 = 30.00. The logic is correct. But it says nothing about the actual data in production.

Your data test verifies that the resulting revenue values in production are sane:

models:
  - name: mrt__sales__orders
    columns:
      - name: revenue
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 10000000
          - dbt_expectations.expect_column_mean_to_be_between:
              min_value: 50
              max_value: 500
              config:
                severity: warn

This catches the problem when a source system suddenly sends negative quantities, or when a pricing error produces $0.01 orders for premium products, or when a data migration fills quantity with NULL and the multiplication silently produces NULL revenue. The SQL is flawless. The data is broken.

When Unit Tests Pass but Data Tests Fail

This is the scenario that justifies having both:

  • A source system starts sending NULLs where it never did before. Your unit tests pass (your mocked data doesn’t include NULLs in that column). Your data tests catch it.
  • A daily batch arrives six hours late. Your unit tests are unaffected (they don’t use real data). Your freshness test catches it; a configuration sketch follows this list.
  • An upstream team changes their pricing from dollars to cents without notice. Your unit tests pass (3 * 10.00 still equals 30.00). Your expect_column_mean_to_be_between test catches the 100x shift in average revenue.
  • A vendor API starts returning duplicate records. Your unit tests pass. Your unique test on the primary key catches the duplicates before they cause join explosions downstream.
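
The freshness check in the second bullet lives on the source definition rather than the model; a minimal sketch, assuming a shopify source whose loader stamps a _loaded_at column (both names are illustrative):

sources:
  - name: shopify
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 6, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders

dbt source freshness then warns or errors when the newest _loaded_at falls outside those windows.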

In each case, your code is correct. Your data is not. Unit tests can’t protect you here because they don’t see real data. Data tests can.

When Data Tests Pass but Unit Tests Would Have Caught the Bug

The reverse also happens:

  • You refactor a CASE WHEN statement and accidentally swap two categories. The data distribution looks normal (same ranges, same averages), so data tests pass. A unit test with explicit expected outputs for each category would have caught it; see the sketch after this list.
  • You change a window function from ROWS BETWEEN UNBOUNDED PRECEDING to ROWS BETWEEN 1 PRECEDING, and the running total is now a rolling two-row sum. For most rows the numbers look plausible. A unit test with a specific multi-row scenario would expose the error immediately.
  • You introduce an off-by-one error in a date boundary calculation. Records near month boundaries are misclassified, but they’re a tiny fraction of the total, so aggregate data tests don’t notice. A unit test targeting the boundary case catches it.
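
For the first bullet, a sketch of the kind of unit test that would catch the swap, assuming a hypothetical size_category column derived from quantity (not part of the revenue example above):

unit_tests:
  - name: test_order_size_categories
    model: mrt__sales__orders
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, quantity: 1}
          - {order_id: 2, quantity: 500}
    expect:
      rows:
        - {order_id: 1, size_category: small}  # swapped categories fail here explicitly
        - {order_id: 2, size_category: bulk}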

Practical Guidance

Unit tests belong in CI only. They use mocked data and add no value in production. Exclude them from production runs:

dbt build --exclude-resource-type unit_test

Data tests belong in production. They need real data to be meaningful. Running them in CI against a dev dataset with 100 rows is marginally useful at best. The full value comes from running them against production-scale data on every build.
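
If you prefer an explicit selection over an exclusion, the complement of the CI selector above also works here (again assuming dbt 1.8+, where test_type accepts data as well as unit):

dbt test --select "test_type:data"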

Start with data tests. They’re faster to implement (YAML configuration, not fixture writing), broader in coverage, and catch the most common class of failures (data issues, not logic bugs). Add unique + not_null on every primary key, dbt-expectations for value ranges and patterns, and freshness checks on critical models.

Add unit tests selectively. Reserve them for the models where transformation logic is complex enough that bugs are plausible and data tests alone wouldn’t catch them. The community consensus is roughly 5-10% of models warrant unit tests.

Unit tests verify transformation logic is correct. Data tests verify actual data is healthy. Neither alone is sufficient — unit tests catch code bugs before deployment; data tests catch data issues during production runs.