Adrienne Vermorel

Unit Testing vs. Data Testing: When to Use Each

This is Part 3 of a 3-part series on dbt unit testing. Building on the implementation (Part 1) and patterns (Part 2), this article provides a strategic framework for choosing the right testing approach.


You’ve learned how to write unit tests. You know the patterns for incremental models, window functions, and marketing analytics. But here’s the question that keeps coming up in code reviews: “Should this be a unit test or a data test?”

The dbt testing ecosystem has grown significantly. Between native unit tests, generic tests, singular tests, dbt-expectations, Elementary, and dbt-audit-helper, the options can feel overwhelming. This article cuts through the complexity with a clear decision framework.

The dbt Testing Taxonomy

Before we can decide when to use each test type, we need to understand what each one does.

Unit Tests

You’ve seen these throughout Parts 1 and 2. Unit tests validate transformation logic using mocked, static inputs:

unit_tests:
  - name: test_discount_calculation
    model: mrt__finance__orders
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, subtotal: 100, discount_code: "SAVE20"}
    expect:
      rows:
        - {order_id: 1, discount_amount: 20, final_total: 80}

When they run: During the build phase, before materialization.

What they catch: Logic bugs, edge cases, regressions in transformation code.

Key characteristic: They never touch your actual warehouse data.
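For context, here is a minimal sketch of the kind of transformation logic such a test validates (the SAVE20 rule below is an assumption for illustration, not the actual model):

-- models/mrt__finance__orders.sql (illustrative sketch only)
select
    order_id,
    subtotal,
    -- Assumed rule: the SAVE20 code grants a 20% discount
    case when discount_code = 'SAVE20' then subtotal * 0.2 else 0 end as discount_amount,
    subtotal - case when discount_code = 'SAVE20' then subtotal * 0.2 else 0 end as final_total
from {{ ref('base__shopify__orders') }}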

Generic Tests

Generic tests are parameterized assertions defined in YAML. dbt ships with four built-in tests:

models:
  - name: mrt__core__customers
    columns:
      - name: customer_id
        data_tests:
          - unique
          - not_null
      - name: customer_status
        data_tests:
          - accepted_values:
              values: ['active', 'churned', 'pending']
      - name: account_manager_id
        data_tests:
          - relationships:
              to: ref('mrt__hr__employees')
              field: employee_id

When they run: After materialization, on actual warehouse data.

What they catch: Data integrity issues—duplicates, nulls, invalid values, broken foreign keys.

Key characteristic: They validate data, not logic.
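Under the hood, dbt compiles each generic test into a query that returns failing rows. The unique test above, for example, compiles to roughly this:

-- Approximate compiled SQL for the unique test on customer_id
select
    customer_id
from {{ ref('mrt__core__customers') }}
where customer_id is not null
group by customer_id
having count(*) > 1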

Singular Tests

Singular tests are custom SQL queries in the tests/ folder. A test passes if the query returns zero rows:

-- tests/assert_no_orphaned_orders.sql
-- Orders should always have a valid customer
select o.order_id
from {{ ref('mrt__finance__orders') }} o
left join {{ ref('mrt__core__customers') }} c on o.customer_id = c.customer_id
where c.customer_id is null

When to use: Complex business rules that don’t fit generic test parameters.
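For example, a rule like "no orders without line items" (which reappears in the cheat sheet at the end of this article) could be expressed as follows, assuming a hypothetical mrt__finance__order_lines model:

-- tests/assert_orders_have_line_items.sql
-- Every order should have at least one line item
-- (assumes a mrt__finance__order_lines model with an order_id column)
select o.order_id
from {{ ref('mrt__finance__orders') }} o
left join {{ ref('mrt__finance__order_lines') }} l on o.order_id = l.order_id
where l.order_id is null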

Data Tests: The Umbrella Term

“Data tests” refers to both generic and singular tests—anything that runs on actual warehouse data via dbt test. The key distinction from unit tests:

| Aspect | Unit Tests | Data Tests |
|--------|------------|------------|
| Input data | Mocked, static | Actual warehouse data |
| Tests for | Logic correctness | Data quality |
| When to run | CI/development | Every pipeline run |
| Warehouse cost | Minimal (with --empty) | Proportional to data size |
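Since dbt 1.8, the test_type selector lets you run each group separately:

# Run only unit tests (mocked inputs, no warehouse data)
dbt test --select "test_type:unit"

# Run only data tests (scans actual warehouse tables)
dbt test --select "test_type:data"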

External Testing Packages

The dbt ecosystem offers powerful packages that extend native testing capabilities.

dbt-expectations

Inspired by Great Expectations, this package provides 60+ test types organized by category:

# Installation
packages:
  - package: calogica/dbt_expectations
    version: ">=0.10.0"

Table-level tests:

models:
  - name: mrt__finance__orders
    data_tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000
          max_value: 1000000
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref('base__shopify__orders')

Column-level tests:

columns:
  - name: email
    data_tests:
      - dbt_expectations.expect_column_values_to_match_regex:
          regex: "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$"
  - name: order_value
    data_tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 100000
  - name: order_date
    data_tests:
      - dbt_expectations.expect_column_values_to_be_of_type:
          column_type: date

Distribution tests:

columns:
  - name: order_value
    data_tests:
      - dbt_expectations.expect_column_mean_to_be_between:
          min_value: 50
          max_value: 200
      - dbt_expectations.expect_column_proportion_of_unique_values_to_be_between:
          min_value: 0.8

Best for: Comprehensive, explicit assertions when you know exactly what to check.

dbt-utils

The official dbt Labs utility package includes battle-tested testing macros:

packages:
  - package: dbt-labs/dbt_utils
    version: ">=1.0.0"

Key tests:

models:
  - name: mrt__finance__orders
    data_tests:
      # Row count comparison
      - dbt_utils.equal_rowcount:
          compare_model: ref('base__shopify__orders')
      # Data freshness
      - dbt_utils.recency:
          datepart: day
          field: created_at
          interval: 1
      # Non-overlapping date ranges
      - dbt_utils.mutually_exclusive_ranges:
          lower_bound_column: valid_from
          upper_bound_column: valid_to
          partition_by: customer_id
    columns:
      - name: revenue
        data_tests:
          # Arbitrary SQL expression
          - dbt_utils.expression_is_true:
              expression: ">= 0"

Best for: Simple, reliable tests that work across all warehouses. Low maintenance burden.

Elementary

Elementary takes a different approach: ML-powered anomaly detection without hardcoded thresholds.

packages:
  - package: elementary-data/elementary
    version: ">=0.14.0"

Instead of specifying “row count must be between X and Y,” Elementary learns what’s normal and alerts on deviations:

models:
  - name: mrt__finance__orders
    data_tests:
      # Alerts if row count deviates significantly from historical pattern
      - elementary.volume_anomalies:
          timestamp_column: created_at
          training_period:
            period: day
            count: 30
      # Alerts if data arrives late
      - elementary.freshness_anomalies:
          timestamp_column: created_at
      # Monitors column-level metrics (nulls, cardinality, etc.)
      - elementary.column_anomalies:
          column_name: order_value
      # Detects schema changes
      - elementary.schema_changes

How it works: Elementary calculates a Z-score (how many standard deviations a metric is from its historical mean). No need to guess thresholds.
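Conceptually, the volume check looks something like this simplified sketch (not Elementary's actual implementation):

-- Simplified illustration of Z-score anomaly detection on daily row counts
with daily_counts as (
    select date(created_at) as day, count(*) as row_count
    from {{ ref('mrt__finance__orders') }}
    group by 1
),

stats as (
    select avg(row_count) as mean_count, stddev(row_count) as stddev_count
    from daily_counts
)

select
    d.day,
    d.row_count,
    (d.row_count - s.mean_count) / nullif(s.stddev_count, 0) as z_score
from daily_counts d
cross join stats s
-- Flag days more than 3 standard deviations from the historical mean
where abs((d.row_count - s.mean_count) / nullif(s.stddev_count, 0)) > 3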

Best for: Catching “unknown unknowns,” anomalies you wouldn’t think to write explicit tests for.

dbt-audit-helper

Purpose-built for migrations and refactoring validation:

packages:
  - package: dbt-labs/audit_helper
    version: ">=0.9.0"

Once installed, write a comparison in an analysis file:

-- analyses/compare_legacy_to_new.sql
{% set old_query %}
    select * from {{ ref('legacy_customers') }}
{% endset %}

{% set new_query %}
    select * from {{ ref('mrt__core__customers') }}
{% endset %}

{{ audit_helper.compare_queries(
    a_query=old_query,
    b_query=new_query,
    primary_key='customer_id'
) }}

Output shows exactly which rows differ and how:

| in_a  | in_b  | count |
|-------|-------|-------|
| true  | true  | 9950  |  -- Matching rows
| true  | false | 30    |  -- Only in legacy
| false | true  | 20    |  -- Only in new

For BigQuery specifically, use the hash-based comparison for better performance:

{{ audit_helper.quick_are_queries_identical(
    a_query=old_query,
    b_query=new_query
) }}

Best for: Validating that refactored models produce identical results to legacy queries.

Package Comparison Summary

| Package | Best For | Setup Effort | Maintenance | BigQuery |
|---------|----------|--------------|-------------|----------|
| dbt-expectations | Explicit, comprehensive assertions | Medium | Medium (many options) | Full support |
| dbt-utils | Simple, reliable standard tests | Low | Low | Full support |
| Elementary | Anomaly detection, observability | Medium | Medium | Full support |
| dbt-audit-helper | Migration validation | Low | Low | Full support |

The Decision Framework

Now for the practical part: how do you decide which testing approach to use?

Question 1: What Are You Testing?

| Testing For | Recommended Approach | Why |
|-------------|----------------------|-----|
| Transformation logic correctness | Unit tests | Mocked inputs isolate logic from data issues |
| Primary key integrity | Generic tests (unique + not_null) | Simple, built-in, runs on actual data |
| Referential integrity | Generic tests (relationships) | Built-in foreign key validation |
| Data freshness SLAs | dbt-utils recency OR Elementary | Explicit threshold vs. adaptive detection |
| Volume anomalies | Elementary (volume_anomalies) | No need to guess thresholds |
| Schema drift | Elementary (schema_changes) | Automatic monitoring |
| Complex business rules | Singular tests | Full SQL flexibility |
| Regex/pattern validation | Unit tests OR dbt-expectations | Logic testing vs. data validation |
| Migration accuracy | dbt-audit-helper | Row-level comparison |
| Statistical distribution | dbt-expectations | Mean, median, percentile checks |

Question 2: When Do You Want to Catch Issues?

| Timing | Test Type | Rationale |
|--------|-----------|-----------|
| Before bad data enters warehouse | Unit tests | Catch logic bugs in CI, before merge |
| After transformation, before downstream | Data tests | Gate failed models, prevent propagation |
| Continuously in production | Elementary | Ongoing anomaly monitoring |
| During development/refactoring | dbt-audit-helper | Validate changes against baseline |

Question 3: What’s Your Cost/Complexity Tolerance?

| Approach | Setup Effort | Runtime Cost | Maintenance |
|----------|--------------|--------------|-------------|
| Unit tests only | Medium | Low (mocked data) | Low |
| Generic tests only | Low | Medium (table scans) | Low |
| dbt-expectations | Medium | Medium | Medium (many test options) |
| Elementary | High (requires setup) | Higher (historical queries) | Medium |
| Full stack | High | Highest | Ongoing |

Decision Tree

Is the issue about LOGIC or DATA?
├── LOGIC (calculation, transformation, edge case)
│   └── → Unit test
└── DATA (integrity, quality, freshness)
    ├── Do you know the exact threshold/rule?
    │   ├── Yes, simple rule (unique, not null)
    │   │   └── → Generic test
    │   ├── Yes, complex rule
    │   │   └── → Singular test or dbt-expectations
    │   └── No, want adaptive detection
    │       └── → Elementary
    └── Are you validating a migration?
        └── → dbt-audit-helper

The dbt Testing Pyramid

Like the software testing pyramid, dbt projects benefit from a layered approach:

              /\
             /  \      Data Diffs
            /    \     (dbt-audit-helper)
           /______\    Dev only, use sparingly
          /        \
         / Anomaly  \      Elementary
        / Detection  \     Key tables only
       /______________\
      /                \
     /    Unit Tests    \      5-10% of models
    /                    \     Complex logic only
   /______________________\
  /                        \
 /        Data Tests        \      Broad coverage
/____________________________\     Every PK, critical FKs

Unit tests (~5-10% of models): Focus on models with complex logic:

  • Calculations with multiple branches
  • Window functions
  • Custom business rules
  • Anything that’s caused bugs before

Generic data tests (broad coverage):

  • Primary keys: unique + not_null on every table
  • Foreign keys: relationships on critical joins
  • Enums: accepted_values on status/type columns

dbt-expectations (targeted):

  • Regex patterns for emails, URLs, codes
  • Numeric ranges for values that have known bounds
  • Row count comparisons for critical transformations

Elementary (key business tables):

  • Volume monitoring on fact tables
  • Freshness on source-adjacent models
  • Column anomalies on financial data

dbt-audit-helper (development only):

  • Major refactoring projects
  • Migration from legacy scheduled queries
  • Never in production pipelines

Anti-Patterns to Avoid

Unit testing everything: Diminishing returns. Focus on complex logic, not simple passthrough columns.

Only testing happy paths: Your unit tests should include nulls, empty strings, boundary values, and edge cases.
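For instance, the discount test from earlier could be extended with edge-case rows (the expected outputs below assume that null and empty discount codes yield a zero discount):

unit_tests:
  - name: test_discount_edge_cases
    model: mrt__finance__orders
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, subtotal: 100, discount_code: null}
          - {order_id: 2, subtotal: 0, discount_code: "SAVE20"}
          - {order_id: 3, subtotal: 100, discount_code: ""}
    expect:
      rows:
        - {order_id: 1, discount_amount: 0, final_total: 100}
        - {order_id: 2, discount_amount: 0, final_total: 0}
        - {order_id: 3, discount_amount: 0, final_total: 100}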

Hardcoding thresholds that drift: If you write expect_table_row_count_to_be_between(min=1000, max=2000), you’ll need to update it as data grows. Consider Elementary for adaptive thresholds.

Testing warehouse functions: Don’t unit test SUM() or DATE_TRUNC(). BigQuery tests those extensively. Test your own logic.

Performance and Cost Implications

When Tests Run

| Test Type | Typical Trigger | BigQuery Cost |
|-----------|-----------------|---------------|
| Unit tests | CI only | Minimal (use --empty) |
| Generic tests | Every dbt build | Per-table scans |
| dbt-expectations | Every dbt build | Per-table scans |
| Elementary | Every run + training | Historical queries (higher) |
| dbt-audit-helper | Dev/CI only | Full table scans (highest) |

Optimizing for BigQuery

Unit tests: Always use the --empty flag in CI:

dbt run --select +test_type:unit --empty
dbt test --select test_type:unit
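In CI, the two commands pair naturally. Here is a GitHub Actions sketch (the job name and the dbt-bigquery adapter are assumptions; adapt to your runner and profiles setup):

# .github/workflows/ci.yml (illustrative sketch)
jobs:
  dbt-unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install dbt-bigquery
      - run: dbt deps
      # Build upstream models as empty tables, then run only unit tests
      - run: dbt run --select +test_type:unit --empty
      - run: dbt test --select test_type:unit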

Generic tests: Consider partition filters for large tables. Create a custom test wrapper:

-- macros/partition_aware_unique.sql
{% test partition_aware_unique(model, column_name, partition_column, lookback_days=7) %}
select {{ column_name }}
from {{ model }}
where {{ partition_column }} >= date_sub(current_date(), interval {{ lookback_days }} day)
group by 1
having count(*) > 1
{% endtest %}
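The wrapper is then applied like any generic test, with its parameters set in YAML:

models:
  - name: mrt__finance__orders
    columns:
      - name: order_id
        data_tests:
          - partition_aware_unique:
              partition_column: created_at
              lookback_days: 7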

Elementary: Configure appropriate training periods to balance accuracy and cost:

- elementary.volume_anomalies:
    training_period:
      period: day
      count: 14  # 2 weeks, not 90 days

dbt-audit-helper: Summarize comparisons for large tables instead of producing a full row-level diff:

-- summarize=true returns aggregate match counts, not every differing row
{{ audit_helper.compare_queries(
    a_query=old_query,
    b_query=new_query,
    primary_key='customer_id',
    summarize=true
) }}

Excluding Tests from Production

Unit tests add no value in production (the inputs are mocked). Exclude them:

# Production deployment
dbt build --exclude-resource-type unit_test

Or use environment variables:

export DBT_EXCLUDE_RESOURCE_TYPES=unit_test
dbt build

For selective test execution by environment, use dbt_project.yml:

# dbt_project.yml
tests:
  my_project:
    +enabled: "{{ target.name != 'prod' or var('run_all_tests', false) }}"
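With this config, data tests are disabled in prod by default; the var provides an escape hatch for a one-off full run:

# One-off: run all tests in prod despite the enabled guard
dbt build --vars '{"run_all_tests": true}'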

Building a Testing Strategy

For New Projects

Start simple and expand:

  1. Week 1: Add unique + not_null to all primary keys
  2. Week 2: Add unit tests for your 3 most complex models
  3. Week 3: Add relationships tests for critical foreign keys
  4. Month 2: Evaluate Elementary for key business tables
  5. Ongoing: Add unit tests when bugs are discovered (regression prevention)

For Existing Projects

Audit and prioritize:

  1. Inventory: Which models have tests? Which have none?
  2. Risk assessment: Which untested models are most critical?
  3. Quick wins: Add generic tests to all primary keys (low effort, high value)
  4. Unit test candidates: Models with complex logic, recent bugs, or upcoming refactors
  5. Observability: Consider Elementary for tables with historical data quality issues

For Migrations (Scheduled Queries → dbt)

dbt-audit-helper is essential:

-- analyses/validate_migration.sql
-- Step 1: Point at the legacy scheduled-query output and the new model
{% set legacy_relation = api.Relation.create(
    database='project',
    schema='dataset',
    identifier='legacy_scheduled_query_output'
) %}

-- Step 2: Run the column-by-column comparison
{{ audit_helper.compare_all_columns(
    a_relation=legacy_relation,
    b_relation=ref('mrt__finance__new_model'),
    primary_key='id'
) }}

Note that compare_all_columns takes relations rather than query strings, which is why the legacy table is wrapped with api.Relation.create.

Workflow:

  1. Run comparison, identify all differences
  2. Investigate: Are differences bugs or intentional improvements?
  3. Unit test the new logic (not just parity)
  4. Document intentional changes
  5. Sign off with stakeholders

The “Test on Failure” Pattern

When a bug is discovered:

  1. Write the unit test first — reproduce the bug with mocked data
  2. Verify it fails — confirms you’ve captured the issue
  3. Fix the model
  4. Verify the test passes
  5. Document the scenario in the test description

unit_tests:
  - name: test_discount_negative_quantity_bug
    model: mrt__finance__orders
    description: |
      Regression test for BUG-1234: Negative quantities were causing
      discount calculations to return negative values. Fixed 2024-06-15.
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, quantity: -1, unit_price: 100, discount_rate: 0.1}
    expect:
      rows:
        - {order_id: 1, discount_amount: 0}  # Should be 0, not -10

This builds a regression test suite organically, focused on actual failure modes.

Key Lessons

  1. Manual test addition doesn’t scale: Automate where possible (Elementary) and prioritize ruthlessly
  2. Unknown unknowns matter: Explicit tests only catch issues you anticipate; anomaly detection catches the rest
  3. Test ownership is critical: Use meta.owner to ensure tests are maintained:

models:
  - name: mrt__finance__orders
    config:
      meta:
        owner: "finance-data-team"
        slack_channel: "#data-finance-alerts"

Quick Reference: Test Selection Cheat Sheet

| Scenario | Use This | Example |
|----------|----------|---------|
| Complex calculation | Unit test | Revenue with discounts, taxes, fees |
| Primary key check | unique + not_null | Every model's ID column |
| Foreign key check | relationships | order.customer_id → customer.id |
| Enum validation | accepted_values | status in ['active', 'pending', 'closed'] |
| Regex pattern | dbt_expectations.expect_column_values_to_match_regex | Email format |
| Numeric bounds | dbt_expectations.expect_column_values_to_be_between | order_value 0-100000 |
| Row count stability | Elementary volume_anomalies | Fact tables |
| Data freshness | dbt_utils.recency or Elementary | Source-adjacent staging |
| Schema monitoring | Elementary schema_changes | All models |
| Custom business rule | Singular test | "No orders without line items" |
| Migration validation | dbt-audit-helper | Scheduled query → dbt comparison |

This concludes the dbt unit testing series. For questions or feedback, feel free to reach out.