Adrienne Vermorel
Unit Testing vs. Data Testing: When to Use Each
This is Part 3 of a 3-part series on dbt unit testing. Building on the implementation (Part 1) and patterns (Part 2), this article provides a strategic framework for choosing the right testing approach.
You’ve learned how to write unit tests. You know the patterns for incremental models, window functions, and marketing analytics. But here’s the question that keeps coming up in code reviews: “Should this be a unit test or a data test?”
The dbt testing ecosystem has grown significantly. Between native unit tests, generic tests, singular tests, dbt-expectations, Elementary, and dbt-audit-helper, the options can feel overwhelming. This article cuts through the complexity with a clear decision framework.
The dbt Testing Taxonomy
Before we can decide when to use each test type, we need to understand what each one does.
Unit Tests
You’ve seen these throughout Parts 1 and 2. Unit tests validate transformation logic using mocked, static inputs:
```yaml
unit_tests:
  - name: test_discount_calculation
    model: mrt__finance__orders
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, subtotal: 100, discount_code: "SAVE20"}
    expect:
      rows:
        - {order_id: 1, discount_amount: 20, final_total: 80}
```

When they run: During the build phase, before materialization.
What they catch: Logic bugs, edge cases, regressions in transformation code.
Key characteristic: They never touch your actual warehouse data.
Generic Tests
Generic tests are parameterized assertions defined in YAML. dbt ships with four built-in tests:
```yaml
models:
  - name: mrt__core__customers
    columns:
      - name: customer_id
        data_tests:
          - unique
          - not_null
      - name: customer_status
        data_tests:
          - accepted_values:
              values: ['active', 'churned', 'pending']
      - name: account_manager_id
        data_tests:
          - relationships:
              to: ref('mrt__hr__employees')
              field: employee_id
```

When they run: After materialization, on actual warehouse data.
What they catch: Data integrity issues—duplicates, nulls, invalid values, broken foreign keys.
Key characteristic: They validate data, not logic.
Singular Tests
Singular tests are custom SQL queries in the tests/ folder. A test passes if the query returns zero rows:
```sql
-- tests/assert_no_orphaned_orders.sql
-- Orders should always have a valid customer
select o.order_id
from {{ ref('mrt__finance__orders') }} o
left join {{ ref('mrt__core__customers') }} c
  on o.customer_id = c.customer_id
where c.customer_id is null
```

When to use: Complex business rules that don't fit generic test parameters.
Data Tests: The Umbrella Term
“Data tests” refers to both generic and singular tests—anything that runs on actual warehouse data via dbt test. The key distinction from unit tests:
| Aspect | Unit Tests | Data Tests |
|---|---|---|
| Input data | Mocked, static | Actual warehouse data |
| Tests for | Logic correctness | Data quality |
| When to run | CI/development | Every pipeline run |
| Warehouse cost | Minimal (with --empty) | Proportional to data size |
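In practice, dbt's `test_type` selector method (available since dbt 1.8, which introduced native unit tests) lets you run the two groups separately:

```bash
# CI: run only unit tests (mocked inputs, minimal warehouse cost)
dbt test --select "test_type:unit"

# Pipeline runs: run only data tests (actual warehouse data)
dbt test --select "test_type:data"
```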
External Testing Packages
The dbt ecosystem offers powerful packages that extend native testing capabilities.
dbt-expectations
Inspired by Great Expectations, this package provides 60+ test types organized by category:
```yaml
# Installation
packages:
  - package: calogica/dbt_expectations
    version: ">=0.10.0"
```

Table-level tests:

```yaml
models:
  - name: mrt__finance__orders
    data_tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000
          max_value: 1000000
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref('base__shopify__orders')
```

Column-level tests:

```yaml
columns:
  - name: email
    data_tests:
      - dbt_expectations.expect_column_values_to_match_regex:
          regex: "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$"
  - name: order_value
    data_tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 100000
  - name: order_date
    data_tests:
      - dbt_expectations.expect_column_values_to_be_of_type:
          column_type: date
```

Distribution tests:

```yaml
columns:
  - name: order_value
    data_tests:
      - dbt_expectations.expect_column_mean_to_be_between:
          min_value: 50
          max_value: 200
      - dbt_expectations.expect_column_proportion_of_unique_values_to_be_between:
          min_value: 0.8
```

Best for: Comprehensive, explicit assertions when you know exactly what to check.
dbt-utils
The official dbt Labs utility package includes battle-tested testing macros:
```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: ">=1.0.0"
```

Key tests:

```yaml
models:
  - name: mrt__finance__orders
    data_tests:
      # Row count comparison
      - dbt_utils.equal_rowcount:
          compare_model: ref('base__shopify__orders')
      # Data freshness
      - dbt_utils.recency:
          datepart: day
          field: created_at
          interval: 1
      # Non-overlapping date ranges
      - dbt_utils.mutually_exclusive_ranges:
          lower_bound_column: valid_from
          upper_bound_column: valid_to
          partition_by: customer_id
    columns:
      - name: revenue
        data_tests:
          # Arbitrary SQL expression
          - dbt_utils.expression_is_true:
              expression: ">= 0"
```

Best for: Simple, reliable tests that work across all warehouses. Low maintenance burden.
Elementary
Elementary takes a different approach: statistical anomaly detection instead of hardcoded thresholds.
```yaml
packages:
  - package: elementary-data/elementary
    version: ">=0.14.0"
```

Instead of specifying "row count must be between X and Y," Elementary learns what's normal and alerts on deviations:

```yaml
models:
  - name: mrt__finance__orders
    data_tests:
      # Alerts if row count deviates significantly from historical pattern
      - elementary.volume_anomalies:
          timestamp_column: created_at
          training_period:
            period: day
            count: 30
      # Alerts if data arrives late
      - elementary.freshness_anomalies:
          timestamp_column: created_at
      # Monitors column-level metrics (nulls, cardinality, etc.)
      - elementary.column_anomalies:
          column_name: order_value
      # Detects schema changes
      - elementary.schema_changes
```

How it works: Elementary calculates a Z-score (how many standard deviations a metric is from its historical mean) and flags values that deviate too far. No need to guess thresholds.
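The underlying idea is easy to sketch in SQL. The following is a minimal illustration of the Z-score math for daily row counts, not Elementary's actual implementation; the model name reuses the earlier examples, and the 3-sigma cutoff is an assumption:

```sql
-- Minimal Z-score sketch: is today's row count anomalous vs. history?
with daily_counts as (
    select date(created_at) as day, count(*) as row_count
    from {{ ref('mrt__finance__orders') }}
    group by 1
),

training as (
    -- "Training period": historical days only, excluding today
    select avg(row_count) as mean_count, stddev(row_count) as std_count
    from daily_counts
    where day < current_date()
)

select
    d.day,
    d.row_count,
    (d.row_count - t.mean_count) / nullif(t.std_count, 0) as z_score
from daily_counts d
cross join training t
where d.day = current_date()
  -- Flag as anomalous when more than 3 standard deviations from the mean
  and abs((d.row_count - t.mean_count) / nullif(t.std_count, 0)) > 3
```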
Best for: Catching “unknown unknowns,” anomalies you wouldn’t think to write explicit tests for.
dbt-audit-helper
Purpose-built for migrations and refactoring validation:
```yaml
packages:
  - package: dbt-labs/audit_helper
    version: ">=0.9.0"
```

```sql
-- analyses/compare_legacy_to_new.sql
{% set old_query %}
    select * from {{ ref('legacy_customers') }}
{% endset %}

{% set new_query %}
    select * from {{ ref('mrt__core__customers') }}
{% endset %}

{{ audit_helper.compare_queries(
    a_query=old_query,
    b_query=new_query,
    primary_key='customer_id'
) }}
```

Output shows exactly which rows differ and how:

```
| in_a  | in_b  | count |
|-------|-------|-------|
| true  | true  | 9950  |  -- Matching rows
| true  | false | 30    |  -- Only in legacy
| false | true  | 20    |  -- Only in new
```

For BigQuery specifically, use the hash-based comparison for better performance:

```sql
{{ audit_helper.quick_are_queries_identical(
    a_query=old_query,
    b_query=new_query
) }}
```

Best for: Validating that refactored models produce identical results to legacy queries.
Package Comparison Summary
| Package | Best For | Setup Effort | Maintenance | BigQuery |
|---|---|---|---|---|
| dbt-expectations | Explicit, comprehensive assertions | Medium | Medium (many options) | Full support |
| dbt-utils | Simple, reliable standard tests | Low | Low | Full support |
| Elementary | Anomaly detection, observability | Medium | Medium | Full support |
| dbt-audit-helper | Migration validation | Low | Low | Full support |
The Decision Framework
Now for the practical part: how do you decide which testing approach to use?
Question 1: What Are You Testing?
| Testing For | Recommended Approach | Why |
|---|---|---|
| Transformation logic correctness | Unit tests | Mocked inputs isolate logic from data issues |
| Primary key integrity | Generic tests (unique + not_null) | Simple, built-in, runs on actual data |
| Referential integrity | Generic tests (relationships) | Built-in foreign key validation |
| Data freshness SLAs | dbt-utils recency OR Elementary | Explicit threshold vs. adaptive detection |
| Volume anomalies | Elementary (volume_anomalies) | No need to guess thresholds |
| Schema drift | Elementary (schema_changes) | Automatic monitoring |
| Complex business rules | Singular tests | Full SQL flexibility |
| Regex/pattern validation | Unit tests OR dbt-expectations | Logic testing vs. data validation |
| Migration accuracy | dbt-audit-helper | Row-level comparison |
| Statistical distribution | dbt-expectations | Mean, median, percentile checks |
Question 2: When Do You Want to Catch Issues?
| Timing | Test Type | Rationale |
|---|---|---|
| Before bad data enters warehouse | Unit tests | Catch logic bugs in CI, before merge |
| After transformation, before downstream | Data tests | Gate failed models, prevent propagation |
| Continuously in production | Elementary | Ongoing anomaly monitoring |
| During development/refactoring | dbt-audit-helper | Validate changes against baseline |
Question 3: What’s Your Cost/Complexity Tolerance?
| Approach | Setup Effort | Runtime Cost | Maintenance |
|---|---|---|---|
| Unit tests only | Medium | Low (mocked data) | Low |
| Generic tests only | Low | Medium (table scans) | Low |
| dbt-expectations | Medium | Medium | Medium (many test options) |
| Elementary | High (requires setup) | Higher (historical queries) | Medium |
| Full stack | High | Highest | Ongoing |
Decision Tree
```
Is the issue about LOGIC or DATA?
├── LOGIC (calculation, transformation, edge case)
│   └── → Unit test
│
└── DATA (integrity, quality, freshness)
    ├── Do you know the exact threshold/rule?
    │   ├── Yes, simple rule (unique, not null)
    │   │   └── → Generic test
    │   ├── Yes, complex rule
    │   │   └── → Singular test or dbt-expectations
    │   └── No, want adaptive detection
    │       └── → Elementary
    └── Are you validating a migration?
        └── → dbt-audit-helper
```

The dbt Testing Pyramid
Like the software testing pyramid, dbt projects benefit from a layered approach:
```
              /\
             /  \       Data Diffs (dbt-audit-helper)
            /    \      Dev only, use sparingly
           /______\
          /        \
         / Anomaly  \   Elementary
        / Detection  \  Key tables only
       /______________\
      /                \
     /    Unit Tests    \   5-10% of models
    /                    \  Complex logic only
   /______________________\
  /                        \
 /        Data Tests        \   Broad coverage
/____________________________\  Every PK, critical FKs
```

Recommended Distribution
Unit tests (~5-10% of models): Focus on models with complex logic:
- Calculations with multiple branches
- Window functions
- Custom business rules
- Anything that’s caused bugs before
Generic data tests (broad coverage):
- Primary keys: `unique` + `not_null` on every table
- Foreign keys: `relationships` on critical joins
- Enums: `accepted_values` on status/type columns
dbt-expectations (targeted):
- Regex patterns for emails, URLs, codes
- Numeric ranges for values that have known bounds
- Row count comparisons for critical transformations
Elementary (key business tables):
- Volume monitoring on fact tables
- Freshness on source-adjacent models
- Column anomalies on financial data
dbt-audit-helper (development only):
- Major refactoring projects
- Migration from legacy scheduled queries
- Never in production pipelines
Anti-Patterns to Avoid
Unit testing everything: Diminishing returns. Focus on complex logic, not simple passthrough columns.
Only testing happy paths: Your unit tests should include nulls, empty strings, boundary values, and edge cases.
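For example, here is a minimal sketch reusing the discount model from earlier (the test name is hypothetical, and the expected outputs assume the model floors discounts at zero and treats missing codes as no discount):

```yaml
unit_tests:
  - name: test_discount_edge_cases  # hypothetical edge-case test
    model: mrt__finance__orders
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, subtotal: 100, discount_code: "SAVE20"}  # happy path
          - {order_id: 2, subtotal: 100, discount_code: null}      # null code
          - {order_id: 3, subtotal: 100, discount_code: ""}        # empty string
          - {order_id: 4, subtotal: 0, discount_code: "SAVE20"}    # boundary: zero subtotal
    expect:
      rows:
        - {order_id: 1, discount_amount: 20}
        - {order_id: 2, discount_amount: 0}
        - {order_id: 3, discount_amount: 0}
        - {order_id: 4, discount_amount: 0}
```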
Hardcoding thresholds that drift: If you write `expect_table_row_count_to_be_between` with `min_value: 1000, max_value: 2000`, you'll need to update it as data grows. Consider Elementary for adaptive thresholds.
Testing warehouse functions: Don’t unit test SUM() or DATE_TRUNC(). BigQuery tests those extensively. Test your own logic.
Performance and Cost Implications
When Tests Run
| Test Type | Typical Trigger | BigQuery Cost |
|---|---|---|
| Unit tests | CI only | Minimal (use --empty) |
| Generic tests | Every dbt build | Per-table scans |
| dbt-expectations | Every dbt build | Per-table scans |
| Elementary | Every run + training | Historical queries (higher) |
| dbt-audit-helper | Dev/CI only | Full table scans (highest) |
Optimizing for BigQuery
Unit tests: Always use the --empty flag in CI:
```bash
dbt run --select +test_type:unit --empty
dbt test --select test_type:unit
```

Generic tests: Consider partition filters for large tables. Create a custom test wrapper:
```sql
-- macros/partition_aware_unique.sql
{% test partition_aware_unique(model, column_name, partition_column, lookback_days=7) %}

select {{ column_name }}
from {{ model }}
where {{ partition_column }} >= date_sub(current_date(), interval {{ lookback_days }} day)
group by 1
having count(*) > 1

{% endtest %}
```
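Once defined, the macro is applied like any generic test; here is a hypothetical usage on the orders model from earlier:

```yaml
models:
  - name: mrt__finance__orders
    columns:
      - name: order_id
        data_tests:
          - partition_aware_unique:
              partition_column: created_at
              lookback_days: 7
```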
Elementary: Configure appropriate training periods to balance accuracy and cost:

```yaml
- elementary.volume_anomalies:
    training_period:
      period: day
      count: 14  # 2 weeks, not 90 days
```

dbt-audit-helper: Use the summary output for large tables:

```sql
{{ audit_helper.compare_queries(
    a_query=old_query,
    b_query=new_query,
    primary_key='customer_id',
    summarize=true  -- Just counts, not a full row-level diff
) }}
```

Excluding Tests from Production
Unit tests add no value in production (the inputs are mocked). Exclude them:
```bash
# Production deployment
dbt build --exclude-resource-type unit_test
```

Or use environment variables:

```bash
export DBT_EXCLUDE_RESOURCE_TYPES=unit_test
dbt build
```

For selective test execution by environment, use dbt_project.yml:

```yaml
tests:
  my_project:
    +enabled: "{{ target.name != 'prod' or var('run_all_tests', false) }}"
```
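With this config in place, tests are disabled in prod by default; the `run_all_tests` variable provides an escape hatch for a one-off verification run:

```bash
# Opt back in to all tests on prod for a single run
dbt build --vars '{run_all_tests: true}'
```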
Building a Testing Strategy

For New Projects
Start simple and expand:
- Week 1: Add `unique` + `not_null` to all primary keys
- Week 2: Add unit tests for your 3 most complex models
- Week 3: Add `relationships` tests for critical foreign keys
- Month 2: Evaluate Elementary for key business tables
- Ongoing: Add unit tests when bugs are discovered (regression prevention)
For Existing Projects
Audit and prioritize:
- Inventory: Which models have tests? Which have none?
- Risk assessment: Which untested models are most critical?
- Quick wins: Add generic tests to all primary keys (low effort, high value)
- Unit test candidates: Models with complex logic, recent bugs, or upcoming refactors
- Observability: Consider Elementary for tables with historical data quality issues
For Migrations (Scheduled Queries → dbt)
dbt-audit-helper is essential:
```sql
-- Step 1: Create comparison
{% set legacy_query %}
    select * from `project.dataset.legacy_scheduled_query_output`
{% endset %}

{% set dbt_query %}
    select * from {{ ref('mrt__finance__new_model') }}
{% endset %}

-- Step 2: Run comparison
{{ audit_helper.compare_all_columns(
    a_query=legacy_query,
    b_query=dbt_query,
    primary_key='id'
) }}
```

Workflow:
- Run comparison, identify all differences
- Investigate: Are differences bugs or intentional improvements?
- Unit test the new logic (not just parity)
- Document intentional changes
- Sign off with stakeholders
The “Test on Failure” Pattern
When a bug is discovered:
- Write the unit test first — reproduce the bug with mocked data
- Verify it fails — confirms you’ve captured the issue
- Fix the model
- Verify the test passes
- Document the scenario in the test description
```yaml
unit_tests:
  - name: test_discount_negative_quantity_bug
    model: mrt__finance__orders
    description: |
      Regression test for BUG-1234: Negative quantities were causing
      discount calculations to return negative values. Fixed 2024-06-15.
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, quantity: -1, unit_price: 100, discount_rate: 0.1}
    expect:
      rows:
        - {order_id: 1, discount_amount: 0}  # Should be 0, not -10
```

This builds a regression test suite organically, focused on actual failure modes.
Key Lessons
- Manual test addition doesn’t scale: Automate where possible (Elementary) and prioritize ruthlessly
- Unknown unknowns matter: Explicit tests only catch issues you anticipate; anomaly detection catches the rest
- Test ownership is critical: Use `meta.owner` to ensure tests are maintained:
```yaml
models:
  - name: mrt__finance__orders
    config:
      meta:
        owner: "finance-data-team"
        slack_channel: "#data-finance-alerts"
```

Quick Reference: Test Selection Cheat Sheet
| Scenario | Use This | Example |
|---|---|---|
| Complex calculation | Unit test | Revenue with discounts, taxes, fees |
| Primary key check | unique + not_null | Every model’s ID column |
| Foreign key check | relationships | order.customer_id → customer.id |
| Enum validation | accepted_values | status in [‘active’, ‘pending’, ‘closed’] |
| Regex pattern | dbt_expectations.expect_column_values_to_match_regex | Email format |
| Numeric bounds | dbt_expectations.expect_column_values_to_be_between | order_value 0-100000 |
| Row count stability | Elementary volume_anomalies | Fact tables |
| Data freshness | dbt_utils.recency or Elementary | Source-adjacent staging |
| Schema monitoring | Elementary schema_changes | All models |
| Custom business rule | Singular test | "No orders without line items" |
| Migration validation | dbt-audit-helper | Scheduled query → dbt comparison |
This concludes the dbt unit testing series. For questions or feedback, feel free to reach out.