Adrienne Vermorel
dbt Testing Strategy: A Framework for Every Project
Most data teams fall into one of two traps. The first group writes tests for everything, generating so many alerts that the team learns to ignore them. Real issues slip through unnoticed. The second group writes almost no tests, relying on stakeholders to report problems after dashboards break. Both approaches fail for the same reason: they treat testing as a checkbox rather than a strategy.
After auditing dozens of dbt projects, I’ve seen the same pattern emerge. The teams with reliable data don’t necessarily have more tests; they have better-placed tests with clear ownership and appropriate severity. This article synthesizes the full landscape of dbt testing tools in 2026 and provides an opinionated framework that scales from your first project to enterprise-grade pipelines.
Understanding the Testing Taxonomy
Before diving into implementation, it’s worth clarifying terminology that often causes confusion. dbt now offers three distinct mechanisms for validation, each serving a different purpose.
Data tests validate the quality of your actual data after models materialize. When you add a unique test to a column, dbt queries the built table and checks whether duplicates exist. These tests answer: “Is my data correct?”
Unit tests, introduced in dbt 1.8, validate your transformation logic before materialization. You provide mock inputs and expected outputs, and dbt verifies your SQL produces the right results without touching production data. These tests answer: “Is my logic correct?”
Model contracts enforce schema guarantees at build time. When you enable a contract, dbt refuses to build the model if the output doesn’t match your declared column names and data types. Contracts answer: “Is my schema stable?”
The decision matrix is straightforward: use data tests for quality validation on every model, unit tests for complex business logic you need to verify, and contracts for public-facing models where schema stability matters to downstream consumers.
Within data tests, you’ll encounter two flavors. Generic tests are reusable and configured in YAML (the four built-in tests plus anything from packages). Singular tests are one-off SQL files in your tests/ directory that return zero rows on success. Use singular tests when validation logic is unique to one model or requires complex cross-table logic that generic tests can’t express.
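As an illustration, a singular test is just a SQL file that returns the offending rows; the refunds model referenced below is hypothetical, while mrt__sales__orders appears earlier in this article:

```sql
-- tests/assert_refunds_do_not_exceed_order_total.sql
-- Hypothetical singular test: returns any refund larger than its order total.
-- Zero rows returned means the test passes.
SELECT
    o.order_id,
    o.order_total,
    r.refund_amount
FROM {{ ref('mrt__sales__orders') }} AS o
JOIN {{ ref('mrt__sales__refunds') }} AS r
    ON o.order_id = r.order_id
WHERE r.refund_amount > o.order_total
```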
Built-in dbt Tests: The Foundation
dbt ships with four generic tests that form the backbone of any testing strategy. Applied correctly, these catch the majority of data integrity issues before they propagate downstream.
The unique test validates that no duplicate values exist in a column. Apply it to every primary key without exception. The not_null test ensures required fields are populated; pair it with unique on primary keys. The accepted_values test constrains categorical columns to expected values, particularly valuable for status fields where unexpected values from upstream systems break downstream logic. The relationships test validates referential integrity, ensuring foreign key values exist in their parent table.
```yaml
models:
  - name: mrt__sales__orders
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
      - name: customer_id
        data_tests:
          - not_null
          - relationships:
              to: ref('mrt__sales__customers')
              field: customer_id
      - name: status
        data_tests:
          - accepted_values:
              values: ['pending', 'confirmed', 'shipped', 'delivered', 'cancelled']
```

This baseline (unique and not_null on every primary key, relationships on critical foreign keys) should be non-negotiable. It catches join explosions from duplicate keys, broken references from orphaned foreign keys, and unexpected nulls that cause downstream failures.
When built-in tests don’t suffice, custom generic tests let you create reusable, parameterized validations for your organization. Create them as macros in tests/generic/ using the {% test %} block syntax:
```sql
-- tests/generic/test_is_positive.sql
{% test is_positive(model, column_name) %}

SELECT *
FROM {{ model }}
WHERE {{ column_name }} < 0

{% endtest %}
```

Once defined, apply it like any generic test:
```yaml
columns:
  - name: quantity
    data_tests:
      - is_positive
```

Common custom tests worth implementing: is_positive for numeric fields that should never be negative, not_empty_string for text fields where empty strings are invalid, date_not_in_future for timestamps that shouldn’t exceed the current date, and valid_email_format for basic email validation.
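As an example, date_not_in_future follows the same pattern as is_positive; this is a sketch, and current_timestamp may need swapping for your warehouse’s equivalent function:

```sql
-- tests/generic/test_date_not_in_future.sql
-- Sketch of a custom generic test: returns (and therefore fails on) any row
-- whose timestamp lies in the future.
{% test date_not_in_future(model, column_name) %}

SELECT *
FROM {{ model }}
WHERE {{ column_name }} > current_timestamp

{% endtest %}
```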
Native Unit Tests: Testing Logic, Not Just Data
Native unit tests, introduced in dbt 1.8, let you validate that your SQL transformations produce correct results against controlled inputs. This enables true test-driven development for data.
Unit tests are defined in YAML, typically in your schema files alongside the model they test:
```yaml
unit_tests:
  - name: test_customer_lifetime_value_calculation
    description: "Verify LTV calculation handles edge cases correctly"
    model: mrt__sales__customers
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {customer_id: 1, order_total: 100.00, order_date: '2024-01-15'}
          - {customer_id: 1, order_total: 250.00, order_date: '2024-02-20'}
          - {customer_id: 2, order_total: 0.00, order_date: '2024-01-10'}
      - input: ref('base__shopify__customers')
        rows:
          - {customer_id: 1, created_at: '2024-01-01'}
          - {customer_id: 2, created_at: '2024-01-01'}
    expect:
      rows:
        - {customer_id: 1, lifetime_value: 350.00, order_count: 2}
        - {customer_id: 2, lifetime_value: 0.00, order_count: 1}
```

The given block mocks your input refs with static data. The expect block defines what your model should output. dbt builds the model using only the mocked inputs and compares actual output to expected output.
Not every model needs unit tests. They’re high-effort, high-precision tools best reserved for specific scenarios:
Complex string parsing or regex: If you’re extracting domains from emails, parsing UTM parameters from URLs, or cleaning messy text fields, unit tests verify your regex handles edge cases.
Date calculations with edge cases: Fiscal year mappings, business day calculations, and timezone conversions all have edge cases that are easy to get wrong. Unit tests with specific dates catch these before production.
Multi-branch CASE WHEN logic: When a column’s value depends on multiple conditions, unit tests document the expected behavior for each branch and catch regressions when logic changes.
Window functions: Ranking, running totals, and lag/lead calculations are notoriously tricky. Mocking a small dataset lets you verify the window behaves correctly.
The community consensus suggests roughly 1% of columns warrant unit tests, specifically those with complex, high-business-impact calculations. Don’t try to unit test everything; the maintenance burden isn’t worth it.
For incremental models, unit tests become even more valuable. You can override the is_incremental() macro and mock this (the current table state) to verify merge logic:
```yaml
unit_tests:
  - name: test_incremental_deduplication
    model: int__events_deduplicated
    overrides:
      macros:
        is_incremental: true
    given:
      - input: ref('base__segment__events')
        rows:
          - {event_id: 1, event_time: '2024-03-01 10:00:00', value: 100}
          - {event_id: 1, event_time: '2024-03-01 10:00:00', value: 150} # duplicate
      - input: this
        rows:
          - {event_id: 0, event_time: '2024-02-28 09:00:00', value: 50} # existing
    expect:
      rows:
        - {event_id: 1, event_time: '2024-03-01 10:00:00', value: 150} # latest value wins
```

The main friction point is “YAML accounting” (manually crafting fixtures for models with many columns). AI-assisted fixture generation helps significantly here. For production runs, exclude unit tests entirely with dbt build --exclude-resource-type unit_test; they’re development and CI tools only.
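Put differently, a typical split (assuming dbt 1.8+ node selection) is to run unit tests while developing and in CI, and to skip them on the production schedule:

```bash
# Development / CI: run only the unit tests
dbt test --select "test_type:unit"

# Production: build and test everything except unit tests
dbt build --exclude-resource-type unit_test
```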
Model Contracts: Schema Stability for Consumers
Model contracts serve a fundamentally different purpose than tests. While tests validate after the fact, contracts prevent builds when the output schema doesn’t match your declaration. This provides guarantees to downstream consumers (BI tools, reverse ETL, external teams) that your model’s structure won’t change unexpectedly.
```yaml
models:
  - name: mrt__sales__customers
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: int64
        constraints:
          - type: not_null
          - type: primary_key
      - name: email
        data_type: string
      - name: created_at
        data_type: timestamp
      - name: lifetime_value
        data_type: numeric
```

When contract.enforced is true, dbt validates at build time that your model produces exactly these columns with exactly these types. If you add a column to your SQL but forget to add it to the contract, the build fails. If a column comes back as float64 instead of int64, the build fails.
Use contracts on public/exposed models only: marts that BI tools query, tables that feed reverse ETL, datasets shared with external teams. Don’t bother with contracts on base or intermediate models; they add overhead without benefit for internal transformations.
Combined with model versions, contracts enable non-breaking schema evolution. You can introduce a v2 of a model with a new schema while maintaining v1 for existing consumers, giving them time to migrate.
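A rough sketch of what that looks like with dbt’s model versions syntax (column details abbreviated; the v1/v2 split below is illustrative):

```yaml
models:
  - name: mrt__sales__customers
    latest_version: 2
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: int64
      - name: email
        data_type: string
      - name: lifetime_value
        data_type: numeric
    versions:
      - v: 1
        columns:
          - include: all
            exclude: [lifetime_value]  # illustrative: v1 predates this column
      - v: 2  # inherits the top-level column spec
```

Downstream models can then pin a version explicitly, for example ref('mrt__sales__customers', v=1), until they are ready to migrate.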
One caveat: constraint enforcement varies by warehouse. Snowflake and BigQuery often treat constraints as metadata only. They won’t prevent constraint violations at insert time. Postgres and Redshift may actually enforce them. The not_null constraint is most reliably enforced across platforms.
The Testing Package Ecosystem
The built-in tests cover basics, but packages extend dbt’s capabilities dramatically.
dbt-utils is essential for every project. Maintained by dbt Labs, it provides tests that should arguably be built-in:
- equal_rowcount: Compare row counts between two relations (invaluable during refactoring)
- expression_is_true: Test any SQL expression you can write
- recency: Verify data freshness without source freshness checks
- unique_combination_of_columns: Test composite key uniqueness
```yaml
models:
  - name: mrt__sales__order_items
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - order_id
            - line_item_id
      - dbt_utils.expression_is_true:
          expression: "quantity > 0"
      - dbt_utils.recency:
          datepart: day
          field: created_at
          interval: 1
```

dbt-expectations ports over 50 tests from the Great Expectations Python library, enabling statistical and pattern-based validation:
- expect_column_values_to_be_between: Range validation with min/max
- expect_column_mean_to_be_between: Statistical distribution checks
- expect_column_values_to_match_regex: Pattern matching
- expect_table_row_count_to_equal_other_table: Cross-table validation
The killer feature is the row_condition parameter, enabling conditional testing:
```yaml
- dbt_expectations.expect_column_values_to_not_be_null:
    column_name: shipping_date
    row_condition: "status = 'shipped'" # only test shipped orders
```

Note that following a fork in late 2024, the package is now maintained by Metaplane as dbt_expectations.
Elementary takes a fundamentally different approach: anomaly detection instead of static thresholds. Rather than defining “fail if more than 100 nulls,” Elementary learns patterns from your historical data and alerts when metrics deviate beyond expected ranges.
```yaml
models:
  - name: mrt__sales__orders
    data_tests:
      - elementary.volume_anomalies:
          timestamp_column: created_at
          where: "status != 'cancelled'"
      - elementary.freshness_anomalies:
          timestamp_column: updated_at
      - elementary.column_anomalies:
          column_name: order_total
          anomaly_sensitivity: 3 # z-score threshold
```

Elementary also provides schema_changes detection, alerting on unexpected column additions, deletions, or type changes. It includes a CLI that generates observability reports. For teams experiencing alert fatigue from threshold-based tests, Elementary’s dynamic approach reduces noise while catching issues static tests miss.
dbt-audit-helper is essential during migrations and refactoring. Its compare_queries macro performs row-by-row comparison between two queries, classifying results as perfect match, modified, or missing:
```sql
{% set old_query %}
    SELECT * FROM {{ ref('legacy_customers') }}
{% endset %}

{% set new_query %}
    SELECT * FROM {{ ref('mrt__sales__customers') }}
{% endset %}

{{ audit_helper.compare_queries(
    a_query=old_query,
    b_query=new_query,
    primary_key='customer_id'
) }}
```

Use it when migrating legacy SQL to dbt or validating that refactored logic produces identical results.
Layer-by-Layer Testing Strategy
The most effective testing framework applies different tests at each layer of your DAG. Testing intensity should increase toward the edges: sources where issues enter your pipeline, and marts where issues exit to consumers.
At the sources layer, test only what’s fixable upstream. Apply source freshness checks with appropriate severity (error for business-critical sources, warn for informational ones). Test for unique and not_null on primary keys only if duplicates or nulls are actually removable in the source system. If they’re not fixable at source, remove the test and handle the issue in base models instead.
```yaml
sources:
  - name: salesforce
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: accounts
        loaded_at_field: systemmodstamp
        columns:
          - name: id
            data_tests:
              - unique
              - not_null
```

At the base layer, establish your “clean contract” (the promise that downstream models receive well-typed, deduplicated, null-handled data). Don’t test your cleanup: if your base model filters nulls, adding a not_null test is redundant and adds execution time. Instead, test for business-specific anomalies: values outside acceptable ranges, unexpected categorical values, volume anomalies.
```yaml
models:
  - name: base__shopify__orders
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
      - name: order_total
        data_tests:
          - dbt_utils.expression_is_true:
              expression: ">= 0"
      - name: currency
        data_tests:
          - accepted_values:
              values: ['USD', 'EUR', 'GBP', 'CAD']
```

At the intermediate layer, test the consequences of joins and aggregations. Apply primary key tests to any model where you’ve changed granularity. If you’ve aggregated from line items to orders, the order_id should now be unique. Use relationships tests where you’ve joined tables. Consider expression tests for sanity-checking aggregation results.
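As a sketch (the model name here is hypothetical), an intermediate model that rolls line items up to one row per order might be tested like this:

```yaml
models:
  - name: int__sales__order_totals  # hypothetical model: one row per order after aggregation
    columns:
      - name: order_id
        data_tests:
          - unique      # the new grain must hold after aggregation
          - not_null
          - relationships:
              to: ref('base__shopify__orders')
              field: order_id
      - name: order_total
        data_tests:
          - dbt_utils.expression_is_true:
              expression: ">= 0"  # sanity check on the aggregated amount
```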
At the marts layer, invest most heavily. This is where unit tests earn their keep: customer segmentation rules, financial calculations, forecasting logic. Apply comprehensive generic tests on all exposed columns. Enable model contracts for schema stability. Focus testing on net-new calculated columns; don’t re-test passthrough fields that were already validated upstream.
Severity Management and the Broken Window Philosophy
The distinction between warn and error severity seems minor but dramatically impacts how your team relates to tests. An error blocks execution; a warn logs a message and continues. The pattern that works: error in CI, warn in production.
```yaml
data_tests:
  - unique:
      config:
        severity: "{{ 'error' if target.name == 'ci' else 'warn' }}"
```

For more nuance, use conditional severity thresholds:
```yaml
data_tests:
  - not_null:
      config:
        severity: error
        error_if: ">1000"
        warn_if: ">10"
```

This creates a two-stage system: early warnings for investigation when you exceed 10 nulls, hard failures when the situation becomes severe at 1000.
The Broken Window Theory applies directly to data testing: never tolerate chronically failing tests. Teams that accept a few perpetual failures quickly accept more, until the entire test suite becomes background noise. When a test fails, you have four options:
- Fix the underlying issue in the data or logic
- Update test expectations if the original threshold was wrong
- Tag as under-investigation with a deadline and owner
- Delete the test if it provides no actionable value
What you cannot do is leave it failing indefinitely.
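If you take the under-investigation route, make that state visible in the project rather than in someone’s memory. One possible convention (the tag name and meta keys below are illustrative, not built-in dbt semantics) is to downgrade severity, tag the test, and record an owner and deadline:

```yaml
data_tests:
  - accepted_values:
      values: ['pending', 'confirmed', 'shipped', 'delivered', 'cancelled']
      config:
        severity: warn                 # downgraded while the incident is investigated
        tags: ['under_investigation']  # selectable, e.g. --exclude tag:under_investigation
        meta:
          owner: "finance-analytics"
          review_by: "2026-01-31"      # hypothetical deadline
```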
Alert routing matters as much as severity. Tag models with ownership and criticality in their meta:
```yaml
models:
  - name: mrt__finance__revenue
    config:
      meta:
        owner: "finance-analytics"
        criticality: "high"
```

Then route alerts accordingly: page on-call for critical model failures, Slack for high-priority, daily digest emails for medium, weekly review for low. Without clear ownership, test failures become everyone’s problem, which means no one’s problem.
Advanced Patterns
Testing incremental models requires extra attention because correctness depends on existing table state. Always set a unique_key to ensure idempotency. Without it, rerunning the same incremental model creates duplicates.
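A minimal sketch of such a config (the model and key names are illustrative; the merge strategy assumes a warehouse that supports it):

```sql
-- Illustrative incremental model: unique_key makes reruns idempotent,
-- because matching rows are updated instead of inserted again.
{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge'
) }}

SELECT
    event_id,
    event_time,
    value
FROM {{ ref('base__segment__events') }}
```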
For late-arriving data, implement a lookback window:
```sql
{% if is_incremental() %}
WHERE event_time >= (
    SELECT TIMESTAMP_SUB(MAX(event_time), INTERVAL {{ var('lookback_days', 3) }} DAY)
    FROM {{ this }}
)
{% endif %}
```

This reprocesses the last N days on each run, catching records that arrived after their logical timestamp.
Source freshness checks should run at least twice as often as your tightest SLA requires. If you promise stakeholders data within 6 hours, check freshness at least every 3 hours. For large tables, add a filter to limit scanning:
```yaml
sources:
  - name: events
    tables:
      - name: raw_events
        loaded_at_field: _loaded_at
        freshness:
          warn_after: {count: 2, period: hour}
          error_after: {count: 6, period: hour}
        filter: "_loaded_at >= current_date - 1"
```

Schema change detection combines contracts (build-time enforcement) with Elementary’s schema_changes test (runtime alerting). Contracts catch issues you explicitly define; Elementary catches unexpected changes you haven’t anticipated. Together they provide comprehensive schema governance.
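The Elementary side is a one-line addition per model; a minimal sketch:

```yaml
models:
  - name: mrt__sales__orders
    data_tests:
      - elementary.schema_changes  # alerts on added, removed, or retyped columns
```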
Referential integrity at scale requires performance optimization. For tables with billions of rows, testing every foreign key relationship becomes expensive. Strategies include:
- Filter to recent data, for example WHERE created_at >= CURRENT_DATE - 30 (see the sketch after this list)
- Sample the data: test a random subset rather than the full table
- Use the dbt_constraints package to create native database constraints where supported
- Parallelize with higher thread counts
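For the first strategy, dbt’s standard where test config restricts the rows the test scans; the 30-day window and the date arithmetic below are illustrative and may need adjusting for your warehouse:

```yaml
columns:
  - name: customer_id
    data_tests:
      - relationships:
          to: ref('mrt__sales__customers')
          field: customer_id
          config:
            where: "created_at >= current_date - 30"  # only validate recently created rows
```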
Testing vs. Observability
Testing and observability are complementary approaches to data quality. Testing catches known unknowns, conditions you’ve anticipated and written rules for. Observability catches unknown unknowns, anomalies you couldn’t have predicted.
A production-down scenario illustrates the difference. Tests might catch “null values in customer_id” (a known failure mode you test for). Observability catches “row count dropped 80% compared to typical Wednesday volumes” (an anomaly you didn’t explicitly test for).
The practical maturity path:
- Start with generic tests + dbt-utils: Cover primary keys, foreign keys, basic constraints
- Add dbt-expectations: When you need statistical validation or conditional tests
- Layer in Elementary: When your test suite matures and you want anomaly detection
- Consider commercial platforms: Monte Carlo, Metaplane, or others when open-source reaches its limits
The most mature teams build a “self-improving system”: when observability tools detect an anomaly, they convert it into a permanent dbt test. This continuously expands coverage based on real incidents rather than speculation about what might break.
The Opinionated Framework
Below is a concrete implementation path based on your project’s maturity.
Starting out (0-50 models):
- Add unique and not_null to every primary key, no exceptions
- Add relationships tests on critical foreign keys
- Configure source freshness on all ingestion sources
- Use dbt build to run tests immediately after each model builds
- Set all tests to severity: error initially. You want to know about every failure
Maturing (50-200 models):
- Implement layer-specific testing based on the strategy above
- Add dbt_project_evaluator to enforce coverage thresholds
- Set up Slim CI with --select state:modified+ to test only changed models (a command sketch follows this list)
- Introduce conditional severity thresholds for noisy tests
- Establish clear ownership with meta.owner tags
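The Slim CI job referenced above typically boils down to a single command, assuming the previous production manifest has been downloaded locally (./prod-artifacts is a placeholder path):

```bash
# Build and test only models changed since the last production run, plus their children;
# unchanged upstream models are deferred to the production objects.
dbt build --select state:modified+ --defer --state ./prod-artifacts
```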
Advanced (200+ models):
- Write native unit tests for your most complex business logic (target ~1% of columns)
- Deploy Elementary for anomaly detection
- Enable model contracts on all public-facing marts
- Build automated severity classification based on downstream impact
- Implement tiered alerting: page, Slack, digest, weekly review
Conclusion
Effective dbt testing isn’t about maximizing coverage. It’s about strategic placement, appropriate severity, and clear ownership. Test intensively at the edges of your pipeline where issues enter (sources) and exit (marts). Use the right tool for each scenario: generic tests for integrity, unit tests for logic, contracts for schema stability, observability for unknowns.
The teams with the best data quality share common traits. They never normalize failing tests. They route alerts to owners who can act. They continuously convert incidents into permanent tests. And they treat data quality as a feature to be designed, not a burden to be endured.
Start with the basics (unique and not_null on every primary key) and build from there. The framework scales from a single analytics engineer to enterprise data platforms. What matters is starting with intention and maintaining discipline as you grow.