Adrienne Vermorel
Unit Testing vs. Data Testing: When to Use Each
This is Part 3 of a 3-part series on dbt unit testing. Building on the implementation (Part 1) and patterns (Part 2), this article provides a strategic framework for choosing the right testing approach.
You’ve learned how to write unit tests. You know the patterns for incremental models, window functions, and marketing analytics. But here’s the question that keeps coming up in code reviews: “Should this be a unit test or a data test?”
The dbt testing ecosystem has grown significantly. Between native unit tests, generic tests, singular tests, dbt-expectations, Elementary, and dbt-audit-helper, the options can feel overwhelming. This article cuts through the complexity with a clear decision framework.
The dbt Testing Taxonomy
Before we can decide when to use each test type, we need to understand what each one does.
Unit Tests
You’ve seen these throughout Parts 1 and 2. Unit tests validate transformation logic using mocked, static inputs:
```yaml
unit_tests:
  - name: test_discount_calculation
    model: mrt__finance__orders
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, subtotal: 100, discount_code: "SAVE20"}
    expect:
      rows:
        - {order_id: 1, discount_amount: 20, final_total: 80}
```

When they run: During the build phase, before materialization.
What they catch: Logic bugs, edge cases, regressions in transformation code.
Key characteristic: They never touch your actual warehouse data.
Generic Tests
Generic tests are parameterized assertions defined in YAML. dbt ships with four built-in tests:
```yaml
models:
  - name: mrt__core__customers
    columns:
      - name: customer_id
        data_tests:
          - unique
          - not_null
      - name: customer_status
        data_tests:
          - accepted_values:
              values: ['active', 'churned', 'pending']
      - name: account_manager_id
        data_tests:
          - relationships:
              to: ref('mrt__hr__employees')
              field: employee_id
```

When they run: After materialization, on actual warehouse data.
What they catch: Data integrity issues—duplicates, nulls, invalid values, broken foreign keys.
Key characteristic: They validate data, not logic.
Singular Tests
Singular tests are custom SQL queries in the tests/ folder. A test passes if the query returns zero rows:
```sql
-- tests/assert_no_orphaned_orders.sql
-- Orders should always have a valid customer
select o.order_id
from {{ ref('mrt__finance__orders') }} o
left join {{ ref('mrt__core__customers') }} c
  on o.customer_id = c.customer_id
where c.customer_id is null
```

When to use: Complex business rules that don't fit generic test parameters.
Data Tests: The Umbrella Term
“Data tests” refers to both generic and singular tests—anything that runs on actual warehouse data via dbt test. The key distinction from unit tests:
| Aspect | Unit Tests | Data Tests |
|---|---|---|
| Input data | Mocked, static | Actual warehouse data |
| Tests for | Logic correctness | Data quality |
| When to run | CI/development | Every pipeline run |
| Warehouse cost | Minimal (with --empty) | Proportional to data size |
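In practice, dbt's `test_type` selector method (available since dbt 1.8, which introduced native unit tests) lets you run the two groups separately:

```bash
# CI: run only unit tests (mocked inputs, minimal warehouse cost)
dbt test --select "test_type:unit"

# Pipeline runs: run only data tests (actual warehouse data)
dbt test --select "test_type:data"
```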
External Testing Packages
The dbt ecosystem offers powerful packages that extend native testing capabilities.
dbt-expectations
Inspired by Great Expectations, this package provides 60+ test types organized by category:
```yaml
# Installation
packages:
  - package: calogica/dbt_expectations
    version: ">=0.10.0"
```

Table-level tests:

```yaml
models:
  - name: mrt__finance__orders
    data_tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000
          max_value: 1000000
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref('base__shopify__orders')
```

Column-level tests:

```yaml
columns:
  - name: email
    data_tests:
      - dbt_expectations.expect_column_values_to_match_regex:
          regex: "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$"
  - name: order_value
    data_tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 100000
  - name: order_date
    data_tests:
      - dbt_expectations.expect_column_values_to_be_of_type:
          column_type: date
```

Distribution tests:

```yaml
columns:
  - name: order_value
    data_tests:
      - dbt_expectations.expect_column_mean_to_be_between:
          min_value: 50
          max_value: 200
      - dbt_expectations.expect_column_proportion_of_unique_values_to_be_between:
          min_value: 0.8
```

Best for: Comprehensive, explicit assertions when you know exactly what to check.
dbt-utils
The official dbt Labs utility package includes battle-tested testing macros:
```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: ">=1.0.0"
```

Key tests:

```yaml
models:
  - name: mrt__finance__orders
    data_tests:
      # Row count comparison
      - dbt_utils.equal_rowcount:
          compare_model: ref('base__shopify__orders')
      # Data freshness
      - dbt_utils.recency:
          datepart: day
          field: created_at
          interval: 1
      # Non-overlapping date ranges
      - dbt_utils.mutually_exclusive_ranges:
          lower_bound_column: valid_from
          upper_bound_column: valid_to
          partition_by: customer_id
    columns:
      - name: revenue
        data_tests:
          # Arbitrary SQL expression
          - dbt_utils.expression_is_true:
              expression: ">= 0"
```

Best for: Simple, reliable tests that work across all warehouses. Low maintenance burden.
Elementary
Elementary takes a different approach: statistical anomaly detection instead of hardcoded thresholds.
```yaml
packages:
  - package: elementary-data/elementary
    version: ">=0.14.0"
```

Instead of specifying "row count must be between X and Y," Elementary learns what's normal and alerts on deviations:

```yaml
models:
  - name: mrt__finance__orders
    data_tests:
      # Alerts if row count deviates significantly from historical pattern
      - elementary.volume_anomalies:
          timestamp_column: created_at
          training_period:
            period: day
            count: 30
      # Alerts if data arrives late
      - elementary.freshness_anomalies:
          timestamp_column: created_at
      # Monitors column-level metrics (nulls, cardinality, etc.)
      - elementary.column_anomalies:
          column_name: order_value
      # Detects schema changes
      - elementary.schema_changes
```

How it works: Elementary calculates a Z-score (how many standard deviations a metric is from its historical mean) and flags values that deviate too far. No need to guess thresholds.
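The underlying idea is easy to sketch in SQL. The following is a minimal illustration of the Z-score math for daily row counts, not Elementary's actual implementation; the model name reuses the earlier examples, and the 3-sigma cutoff is an assumption:

```sql
-- Minimal Z-score sketch: is today's row count anomalous vs. history?
with daily_counts as (
    select date(created_at) as day, count(*) as row_count
    from {{ ref('mrt__finance__orders') }}
    group by 1
),

training as (
    -- "Training period": historical days only, excluding today
    select avg(row_count) as mean_count, stddev(row_count) as std_count
    from daily_counts
    where day < current_date()
)

select
    d.day,
    d.row_count,
    (d.row_count - t.mean_count) / nullif(t.std_count, 0) as z_score
from daily_counts d
cross join training t
where d.day = current_date()
  -- Flag as anomalous when more than 3 standard deviations from the mean
  and abs((d.row_count - t.mean_count) / nullif(t.std_count, 0)) > 3
```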
Best for: Catching “unknown unknowns,” anomalies you wouldn’t think to write explicit tests for.
dbt-audit-helper
Purpose-built for migrations and refactoring validation:
```yaml
packages:
  - package: dbt-labs/audit_helper
    version: ">=0.9.0"
```

```sql
-- analyses/compare_legacy_to_new.sql
{% set old_query %}
    select * from {{ ref('legacy_customers') }}
{% endset %}

{% set new_query %}
    select * from {{ ref('mrt__core__customers') }}
{% endset %}

{{ audit_helper.compare_queries(
    a_query=old_query,
    b_query=new_query,
    primary_key='customer_id'
) }}
```

Output shows exactly which rows differ and how:

```
| in_a  | in_b  | count |
|-------|-------|-------|
| true  | true  | 9950  |  -- Matching rows
| true  | false | 30    |  -- Only in legacy
| false | true  | 20    |  -- Only in new
```

For BigQuery specifically, use the hash-based comparison for better performance:

```sql
{{ audit_helper.quick_are_queries_identical(
    a_query=old_query,
    b_query=new_query
) }}
```

Best for: Validating that refactored models produce identical results to legacy queries.
Package Comparison Summary
| Package | Best For | Setup Effort | Maintenance | BigQuery |
|---|---|---|---|---|
| dbt-expectations | Explicit, comprehensive assertions | Medium | Medium (many options) | Full support |
| dbt-utils | Simple, reliable standard tests | Low | Low | Full support |
| Elementary | Anomaly detection, observability | Medium | Medium | Full support |
| dbt-audit-helper | Migration validation | Low | Low | Full support |
The Decision Framework
Now for the practical part: how do you decide which testing approach to use?
Question 1: What Are You Testing?
| Testing For | Recommended Approach | Why |
|---|---|---|
| Transformation logic correctness | Unit tests | Mocked inputs isolate logic from data issues |
| Primary key integrity | Generic tests (unique + not_null) | Simple, built-in, runs on actual data |
| Referential integrity | Generic tests (relationships) | Built-in foreign key validation |
| Data freshness SLAs | dbt-utils recency OR Elementary | Explicit threshold vs. adaptive detection |
| Volume anomalies | Elementary (volume_anomalies) | No need to guess thresholds |
| Schema drift | Elementary (schema_changes) | Automatic monitoring |
| Complex business rules | Singular tests | Full SQL flexibility |
| Regex/pattern validation | Unit tests OR dbt-expectations | Logic testing vs. data validation |
| Migration accuracy | dbt-audit-helper | Row-level comparison |
| Statistical distribution | dbt-expectations | Mean, median, percentile checks |
Question 2: When Do You Want to Catch Issues?
| Timing | Test Type | Rationale |
|---|---|---|
| Before bad data enters warehouse | Unit tests | Catch logic bugs in CI, before merge |
| After transformation, before downstream | Data tests | Gate failed models, prevent propagation |
| Continuously in production | Elementary | Ongoing anomaly monitoring |
| During development/refactoring | dbt-audit-helper | Validate changes against baseline |
Question 3: What’s Your Cost/Complexity Tolerance?
| Approach | Setup Effort | Runtime Cost | Maintenance |
|---|---|---|---|
| Unit tests only | Medium | Low (mocked data) | Low |
| Generic tests only | Low | Medium (table scans) | Low |
| dbt-expectations | Medium | Medium | Medium (many test options) |
| Elementary | High (requires setup) | Higher (historical queries) | Medium |
| Full stack | High | Highest | Ongoing |
Decision Tree
```
Is the issue about LOGIC or DATA?
├── LOGIC (calculation, transformation, edge case)
│   └── → Unit test
│
└── DATA (integrity, quality, freshness)
    ├── Do you know the exact threshold/rule?
    │   ├── Yes, simple rule (unique, not null)
    │   │   └── → Generic test
    │   ├── Yes, complex rule
    │   │   └── → Singular test or dbt-expectations
    │   └── No, want adaptive detection
    │       └── → Elementary
    └── Are you validating a migration?
        └── → dbt-audit-helper
```

The dbt Testing Pyramid
Like the software testing pyramid, dbt projects benefit from a layered approach:
```
              /\
             /  \       Data Diffs (dbt-audit-helper)
            /    \      Dev only, use sparingly
           /______\
          /        \
         / Anomaly  \   Elementary
        / Detection  \  Key tables only
       /______________\
      /                \
     /    Unit Tests    \   5-10% of models
    /                    \  Complex logic only
   /______________________\
  /                        \
 /        Data Tests        \   Broad coverage
/____________________________\  Every PK, critical FKs
```

Recommended Distribution
Unit tests (~5-10% of models): Focus on models with complex logic:
- Calculations with multiple branches
- Window functions
- Custom business rules
- Anything that’s caused bugs before
Generic data tests (broad coverage):
- Primary keys: `unique` + `not_null` on every table
- Foreign keys: `relationships` on critical joins
- Enums: `accepted_values` on status/type columns
dbt-expectations (targeted):
- Regex patterns for emails, URLs, codes
- Numeric ranges for values that have known bounds
- Row count comparisons for critical transformations
Elementary (key business tables):
- Volume monitoring on fact tables
- Freshness on source-adjacent models
- Column anomalies on financial data
dbt-audit-helper (development only):
- Major refactoring projects
- Migration from legacy scheduled queries
- Never in production pipelines
Anti-Patterns to Avoid
Unit testing everything: Diminishing returns. Focus on complex logic, not simple passthrough columns.
Only testing happy paths: Your unit tests should include nulls, empty strings, boundary values, and edge cases.
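For example, here is a minimal sketch reusing the discount model from earlier (the test name is hypothetical, and the expected outputs assume the model floors discounts at zero and treats missing codes as no discount):

```yaml
unit_tests:
  - name: test_discount_edge_cases  # hypothetical edge-case test
    model: mrt__finance__orders
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, subtotal: 100, discount_code: "SAVE20"}  # happy path
          - {order_id: 2, subtotal: 100, discount_code: null}      # null code
          - {order_id: 3, subtotal: 100, discount_code: ""}        # empty string
          - {order_id: 4, subtotal: 0, discount_code: "SAVE20"}    # boundary: zero subtotal
    expect:
      rows:
        - {order_id: 1, discount_amount: 20}
        - {order_id: 2, discount_amount: 0}
        - {order_id: 3, discount_amount: 0}
        - {order_id: 4, discount_amount: 0}
```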
Hardcoding thresholds that drift: If you write `expect_table_row_count_to_be_between` with `min_value: 1000, max_value: 2000`, you'll need to update it as data grows. Consider Elementary for adaptive thresholds.
Testing warehouse functions: Don’t unit test SUM() or DATE_TRUNC(). BigQuery tests those extensively. Test your own logic.
Performance and Cost Implications
When Tests Run
| Test Type | Typical Trigger | BigQuery Cost |
|---|---|---|
| Unit tests | CI only | Minimal (use --empty) |
| Generic tests | Every dbt build | Per-table scans |
| dbt-expectations | Every dbt build | Per-table scans |
| Elementary | Every run + training | Historical queries (higher) |
| dbt-audit-helper | Dev/CI only | Full table scans (highest) |
Optimizing for BigQuery
Unit tests: Always use the --empty flag in CI:
```bash
dbt run --select +test_type:unit --empty
dbt test --select test_type:unit
```

Generic tests: Consider partition filters for large tables. Create a custom test wrapper:
```sql
-- macros/partition_aware_unique.sql
{% test partition_aware_unique(model, column_name, partition_column, lookback_days=7) %}

select {{ column_name }}
from {{ model }}
where {{ partition_column }} >= date_sub(current_date(), interval {{ lookback_days }} day)
group by 1
having count(*) > 1

{% endtest %}
```
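Once defined, the macro is applied like any generic test; here is a hypothetical usage on the orders model from earlier:

```yaml
models:
  - name: mrt__finance__orders
    columns:
      - name: order_id
        data_tests:
          - partition_aware_unique:
              partition_column: created_at
              lookback_days: 7
```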
Elementary: Configure appropriate training periods to balance accuracy and cost:

```yaml
- elementary.volume_anomalies:
    training_period:
      period: day
      count: 14  # 2 weeks, not 90 days
```

dbt-audit-helper: Use the summary output for large tables:

```sql
{{ audit_helper.compare_queries(
    a_query=old_query,
    b_query=new_query,
    primary_key='customer_id',
    summarize=true  -- Just counts, not a full row-level diff
) }}
```

Excluding Tests from Production
Unit tests add no value in production (the inputs are mocked). Exclude them:
```bash
# Production deployment
dbt build --exclude-resource-type unit_test
```

Or use environment variables:

```bash
export DBT_EXCLUDE_RESOURCE_TYPES=unit_test
dbt build
```

For selective test execution by environment, use dbt_project.yml:

```yaml
tests:
  my_project:
    +enabled: "{{ target.name != 'prod' or var('run_all_tests', false) }}"
```
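With this config in place, tests are disabled in prod by default; the `run_all_tests` variable provides an escape hatch for a one-off verification run:

```bash
# Opt back in to all tests on prod for a single run
dbt build --vars '{run_all_tests: true}'
```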
Building a Testing Strategy

For New Projects
Start simple and expand:
- Week 1: Add `unique` + `not_null` to all primary keys
- Week 2: Add unit tests for your 3 most complex models
- Week 3: Add `relationships` tests for critical foreign keys
- Month 2: Evaluate Elementary for key business tables
- Ongoing: Add unit tests when bugs are discovered (regression prevention)
For Existing Projects
Audit and prioritize:
- Inventory: Which models have tests? Which have none?
- Risk assessment: Which untested models are most critical?
- Quick wins: Add generic tests to all primary keys (low effort, high value)
- Unit test candidates: Models with complex logic, recent bugs, or upcoming refactors
- Observability: Consider Elementary for tables with historical data quality issues
For Migrations (Scheduled Queries → dbt)
dbt-audit-helper is essential:
```sql
-- Step 1: Create comparison
{% set legacy_query %}
    select * from `project.dataset.legacy_scheduled_query_output`
{% endset %}

{% set dbt_query %}
    select * from {{ ref('mrt__finance__new_model') }}
{% endset %}

-- Step 2: Run comparison
{{ audit_helper.compare_all_columns(
    a_query=legacy_query,
    b_query=dbt_query,
    primary_key='id'
) }}
```

Workflow:
- Run comparison, identify all differences
- Investigate: Are differences bugs or intentional improvements?
- Unit test the new logic (not just parity)
- Document intentional changes
- Sign off with stakeholders
The “Test on Failure” Pattern
When a bug is discovered:
- Write the unit test first — reproduce the bug with mocked data
- Verify it fails — confirms you’ve captured the issue
- Fix the model
- Verify the test passes
- Document the scenario in the test description
```yaml
unit_tests:
  - name: test_discount_negative_quantity_bug
    model: mrt__finance__orders
    description: |
      Regression test for BUG-1234: Negative quantities were causing
      discount calculations to return negative values. Fixed 2024-06-15.
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, quantity: -1, unit_price: 100, discount_rate: 0.1}
    expect:
      rows:
        - {order_id: 1, discount_amount: 0}  # Should be 0, not -10
```

This builds a regression test suite organically, focused on actual failure modes.
Key Lessons
- Manual test addition doesn’t scale: Automate where possible (Elementary) and prioritize ruthlessly
- Unknown unknowns matter: Explicit tests only catch issues you anticipate; anomaly detection catches the rest
- Test ownership is critical: Use `meta.owner` to ensure tests are maintained:
```yaml
models:
  - name: mrt__finance__orders
    config:
      meta:
        owner: "finance-data-team"
        slack_channel: "#data-finance-alerts"
```

Quick Reference: Test Selection Cheat Sheet
| Scenario | Use This | Example |
|---|---|---|
| Complex calculation | Unit test | Revenue with discounts, taxes, fees |
| Primary key check | unique + not_null | Every model’s ID column |
| Foreign key check | relationships | order.customer_id → customer.id |
| Enum validation | accepted_values | status in [‘active’, ‘pending’, ‘closed’] |
| Regex pattern | dbt_expectations.expect_column_values_to_match_regex | Email format |
| Numeric bounds | dbt_expectations.expect_column_values_to_be_between | order_value 0-100000 |
| Row count stability | Elementary volume_anomalies | Fact tables |
| Data freshness | dbt_utils.recency or Elementary | Source-adjacent staging |
| Schema monitoring | Elementary schema_changes | All models |
| Custom business rule | Singular test | "No orders without line items" |
| Migration validation | dbt-audit-helper | Scheduled query → dbt comparison |
This concludes the dbt unit testing series. For questions or feedback, feel free to reach out.