A three-question framework for choosing the right dbt testing approach among native unit tests, generic tests, singular tests, dbt-expectations, Elementary, and dbt-audit-helper. Answer the questions in order to identify the appropriate tool for any scenario.
Question 1: What Are You Testing?
The first question is the most important. Different categories of problems call for fundamentally different tools.
| Testing for | Recommended approach | Why |
|---|---|---|
| Transformation logic correctness | Unit tests | Mocked inputs isolate logic from data issues |
| Primary key integrity | Generic tests (unique + not_null) | Simple, built-in, runs on actual data |
| Referential integrity | Generic tests (relationships) | Built-in foreign key validation |
| Data freshness SLAs | dbt-utils recency OR Elementary | Explicit threshold vs. adaptive detection |
| Volume anomalies | Elementary (volume_anomalies) | No need to guess thresholds |
| Schema drift | Elementary (schema_changes) | Automatic monitoring |
| Complex business rules | Singular tests | Full SQL flexibility |
| Regex/pattern validation | Unit tests OR dbt-expectations | Logic testing vs. data validation |
| Migration accuracy | dbt-audit-helper | Row-level comparison |
| Statistical distribution | dbt-expectations | Mean, median, percentile checks |
The pattern here: if you’re testing logic (is my SQL correct?), you want unit tests with mocked data. If you’re testing data (is my production data healthy?), you want data tests against the warehouse. If you’re testing for surprises (is something weird happening that I didn’t anticipate?), you want Elementary’s anomaly detection.
Question 2: When Do You Want to Catch Issues?
Timing matters as much as tool choice. Different tools run at different points in the pipeline lifecycle, and placing them correctly determines whether you catch issues before or after damage is done.
| Timing | Test type | Rationale |
|---|---|---|
| Before bad data enters the warehouse | Unit tests | Catch logic bugs in CI, before merge |
| After transformation, before downstream | Data tests | Gate failed models, prevent propagation |
| Continuously in production | Elementary | Ongoing anomaly monitoring |
| During development/refactoring | dbt-audit-helper | Validate changes against baseline |
Unit tests gate deployments. They run in CI when you push code changes. If a unit test fails, the code change doesn’t merge. They never see production data, which means they can’t protect you from data quality issues — but they catch the logic bugs that no amount of data testing would find.
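To make that concrete, here is a minimal sketch of a native unit test (dbt Core 1.8+). The `fct_orders` model, its columns, and the discount logic are hypothetical; the point is that the inputs are mocked rows, not warehouse data:

```yaml
# models/marts/_unit_tests.yml  (hypothetical file and model names)
unit_tests:
  - name: test_discount_is_applied
    description: "Net amount should reflect the discount percentage."
    model: fct_orders
    given:
      - input: ref('stg_orders')
        rows:
          - {order_id: 1, gross_amount: 100.0, discount_pct: 0.10}
          - {order_id: 2, gross_amount: 50.0, discount_pct: 0.0}
    expect:
      rows:
        - {order_id: 1, net_amount: 90.0}
        - {order_id: 2, net_amount: 50.0}
```

In dbt 1.8+ these run as part of `dbt build` or via `dbt test --select "test_type:unit"`, so a failing case blocks the merge before any production data is touched.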
Data tests gate production data flow. They run on every dbt build against actual warehouse data. A failed data test with error severity blocks downstream models from running. They catch the problems that correct code can’t prevent: source systems sending NULLs where they never did before, duplicate records from vendor APIs, sudden shifts in data volume.
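As a sketch, the severity config that makes a data test block downstream models sits in the model's YAML. The model and column names below are illustrative, and `error` is also dbt's default severity:

```yaml
# models/marts/_schema.yml  (hypothetical names)
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
              config:
                severity: error   # fail the build and stop downstream models
```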
Elementary runs continuously in production, learning patterns from historical data. It catches the “unknown unknowns” — anomalies you wouldn’t think to write explicit tests for.
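A sketch of the Elementary tests named above, assuming the `elementary` package is installed and its own models have been built. The model name and timestamp column are placeholders; check the exact arguments against Elementary's docs:

```yaml
# models/marts/_schema.yml  (hypothetical model and column)
models:
  - name: fct_orders
    tests:
      - elementary.volume_anomalies:
          timestamp_column: created_at   # lets Elementary bucket rows over time
      - elementary.schema_changes
```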
dbt-audit-helper is a development tool. Use it when refactoring models or migrating from legacy systems. It proves that your changes produce identical results. It has no place in production pipelines.
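A typical pattern is to drop a comparison into an analysis file during the migration and delete it afterwards. The call below follows dbt-audit-helper's `compare_relations` interface; the legacy relation coordinates and primary key are assumptions:

```sql
-- analyses/audit_orders_migration.sql  (hypothetical path and names)
{% set legacy_orders = api.Relation.create(
    database="analytics", schema="legacy", identifier="orders") %}

{{ audit_helper.compare_relations(
    a_relation=legacy_orders,
    b_relation=ref('fct_orders'),
    primary_key="order_id"
) }}
```

Compile or preview the analysis, confirm the row-level match, and remove it once the migration is signed off.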
Question 3: What’s Your Cost and Complexity Tolerance?
Every testing approach has a setup cost, a runtime cost, and an ongoing maintenance cost. The right answer for a solo consultant managing 30 models is different from a data team managing 500.
| Approach | Setup effort | Runtime cost | Maintenance |
|---|---|---|---|
| Unit tests only | Medium | Low (mocked data) | Low |
| Generic tests only | Low | Medium (table scans) | Low |
| dbt-expectations | Medium | Medium | Medium (many test options) |
| Elementary | High (package install plus its own models) | Higher (historical queries) | Medium |
| Full stack | High | Highest | High (ongoing) |
The honest recommendation: start with generic tests (low effort, broad coverage) and add complexity only when you have evidence it’s needed. A test suite of unique + not_null on every primary key catches more production incidents than a sophisticated Elementary deployment that nobody maintains.
The Decision Tree
When you’re staring at a model in a code review and asking “what kind of test does this need?”, walk through this tree:
```
Is the issue about LOGIC or DATA?
|
+-- LOGIC (calculation, transformation, edge case)
|   +-- Is the logic complex enough to plausibly have bugs?
|       +-- Yes --> Unit test
|       +-- No (simple passthrough, basic aggregation) --> Skip
|
+-- DATA (integrity, quality, freshness)
    +-- Do you know the exact threshold or rule?
    |   +-- Yes, simple rule (unique, not null, accepted values)
    |   |   +-- Generic test
    |   +-- Yes, complex rule (regex, cross-column, custom SQL)
    |   |   +-- Singular test or dbt-expectations
    |   +-- No, want adaptive detection
    |       +-- Elementary
    +-- Are you validating a migration?
        +-- dbt-audit-helper
```

The tree starts with the fundamental split: logic vs. data. This single distinction eliminates half the options immediately. If you're testing logic, unit tests are the only tool that makes sense — data tests can't verify logic because they don't control their inputs. If you're testing data, unit tests are irrelevant because they never see real data.
Within the data branch, the split is between explicit rules (you know what “correct” looks like) and adaptive detection (you want the system to learn what “normal” looks like). Explicit rules belong in generic tests, singular tests, or dbt-expectations. Adaptive detection belongs in Elementary.
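To ground the explicit-rules branch, the "no orders without line items" rule from the cheat sheet below is a natural singular test. Table and column names here are assumptions; a singular test passes when the query returns zero rows:

```sql
-- tests/assert_every_order_has_line_items.sql  (hypothetical name)
-- Flags any order with no matching line items.
select o.order_id
from {{ ref('fct_orders') }} as o
left join {{ ref('fct_order_line_items') }} as li
    on o.order_id = li.order_id
where li.order_id is null
```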
Quick Reference: Test Selection Cheat Sheet
For the code reviewer who just needs a fast answer:
| Scenario | Use this | Example |
|---|---|---|
| Complex calculation | Unit test | Revenue with discounts, taxes, fees |
| Primary key check | unique + not_null | Every model’s ID column |
| Foreign key check | relationships | order.customer_id to customer.id |
| Enum validation | accepted_values | status in ['active', 'pending', 'closed'] |
| Regex pattern | dbt_expectations.expect_column_values_to_match_regex | Email format |
| Numeric bounds | dbt_expectations.expect_column_values_to_be_between | order_value 0-100000 |
| Row count stability | Elementary volume_anomalies | Fact tables |
| Data freshness | dbt_utils.recency or Elementary | Source-adjacent staging |
| Schema monitoring | Elementary schema_changes | All models |
| Custom business rule | Singular test | "No orders without line items" |
| Migration validation | dbt-audit-helper | Legacy scheduled query vs. dbt model |
This cheat sheet is a compressed version of the framework above. For uncertain cases, the three questions — what are you testing, when do you want to catch it, and what is the cost and complexity tolerance — provide the fuller analysis.
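For the two dbt-expectations rows above, the tests attach like any other generic test. The models, columns, regex, and bounds in this sketch are illustrative:

```yaml
# models/staging/_schema.yml  (hypothetical models and columns)
models:
  - name: stg_customers
    columns:
      - name: email
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "^[^@]+@[^@]+\\.[^@]+$"
  - name: fct_orders
    columns:
      - name: order_value
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100000
```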