dbt test severity and scheduling determine whether a failure blocks the pipeline and when tests run. Misconfigured severity or unconstrained test execution leads to either silent failures or pipelines that stop unnecessarily.
Severity Levels
dbt supports two severity levels for tests: error (the default) and warn.
error stops the pipeline. If you run dbt build, an error-severity test failure prevents downstream models from running. Use this for problems that would produce incorrect data if the pipeline continued:
- Primary key violations (duplicates cause join explosions)
- Freshness failures on critical tables (stale data is worse than no data)
- NULL values in required fields that feed dashboards
- Value ranges where violations indicate data corruption
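As a sketch, a primary-key check on a hypothetical orders model relies on the default error severity, so no config block is needed (the model and column names here are illustrative):

```yaml
models:
  - name: orders            # illustrative model name
    columns:
      - name: order_id      # primary key; duplicates would corrupt downstream joins
        tests:
          - unique          # error severity is the default, so no config is required
          - not_null
```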
warn logs the failure but lets the pipeline continue. Use this for problems that need investigation but shouldn’t block production:
```yaml
columns:
  - name: order_value
    tests:
      - dbt_expectations.expect_column_mean_to_be_between:
          min_value: 50
          max_value: 200
          config:
            severity: warn
```

Statistical tests like expect_column_mean_to_be_between are natural candidates for warn severity. A shift in average order value might signal a real problem, or it might reflect a seasonal trend, a successful promotion, or a market shift. You want visibility without halting production while you investigate.
Severity Guidelines by Test Type
| Test category | Recommended severity | Reasoning |
|---|---|---|
| Primary key (unique, not_null) | error | Duplicates and nulls in keys corrupt all downstream joins |
| Freshness (expect_row_values_to_have_recent_data) | error on critical tables, warn on others | Stale data in a finance mart is an emergency; stale data in a marketing staging table can wait |
| Format (expect_column_values_to_match_regex) | error | Invalid formats typically indicate upstream schema changes |
| Range (expect_column_values_to_be_between) | error | Out-of-range values usually indicate data corruption |
| Statistical (expect_column_mean_to_be_between) | warn | Distribution shifts need investigation, not pipeline halts |
| Volume (expect_table_row_count_to_be_between) | warn or error | Depends on how tightly you can define expected bounds |
| Completeness (expect_row_values_to_have_data_for_every_n_datepart) | warn | Missing days may reflect source delays, not failures |
The goal is a test suite where error means “stop everything and fix this now” and warn means “look at this today.” If your team ignores warnings, you have too many of them. If your pipeline stops daily, you have too many errors on volatile tests.
Performance on BigQuery
Data quality tests run SQL against your warehouse. On BigQuery’s on-demand pricing, every test scans data and costs money. Some tests are cheap (a COUNT(*) with a partition filter). Others are expensive (a full-table statistical computation). Managing this cost is a first-class concern.
Filter on Partition Columns
If your table is partitioned by date, always use row_condition to filter on the partition column:
```yaml
- dbt_expectations.expect_column_values_to_be_between:
    min_value: 0
    max_value: 1000000
    row_condition: "event_date >= current_date() - 30"
```

Without this filter, BigQuery scans the entire table. With it, BigQuery prunes to the last 30 partitions. On a table with years of event data, this can reduce bytes scanned — and cost — by 90% or more.
This is especially important for these tests, which tend to be the most expensive:
- expect_row_values_to_have_data_for_every_n_datepart (generates a date spine and joins)
- expect_column_mean_to_be_between (aggregates across all rows)
- expect_column_values_to_match_regex (evaluates a function on every row)
- expect_table_row_count_to_equal_other_table (full count on two tables)
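The same row_condition pattern applies to these expensive tests. For example, a mean check can be restricted to recent partitions rather than aggregating the full table (column name and bounds here are illustrative):

```yaml
- dbt_expectations.expect_column_mean_to_be_between:
    min_value: 50
    max_value: 200
    # aggregate only the last 30 partitions instead of every row in the table
    row_condition: "event_date >= current_date() - 30"
```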
Run Expensive Tests Only in Production
Some tests add value in production but are wasteful in development or CI. Use dbt’s enabled config with target-aware Jinja:
```yaml
- dbt_expectations.expect_column_mean_to_be_between:
    min_value: 50
    max_value: 200
    config:
      enabled: "{{ target.name == 'prod' }}"
```

This compiles the test only when running against the prod target. In development and CI, the test is skipped entirely — no compilation, no warehouse query, no cost.
Use this pattern for:
- Statistical tests that need production-scale data to be meaningful
- Volume checks where development datasets are intentionally small
- Cross-table comparisons where both tables must be fully materialized
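These configs compose. A sketch of a cross-table count check that runs only in production and warns rather than errors (the comparison model name is an illustrative assumption):

```yaml
- dbt_expectations.expect_table_row_count_to_equal_other_table:
    compare_model: ref('stg_orders')             # illustrative upstream model
    config:
      enabled: "{{ target.name == 'prod' }}"     # skip in dev and CI
      severity: warn                             # investigate, don't block the run
```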
Tag-Based Test Scheduling
For tests that should run on a schedule rather than every build, use dbt tags:
```yaml
- dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
    date_col: event_date
    date_part: day
    config:
      tags: ['slow', 'daily']
```

Then in your CI pipeline, exclude slow tests:

```shell
dbt test --exclude tag:slow
```

And in your daily production run, include everything:

```shell
dbt test
```

Or run slow tests on their own schedule:

```shell
dbt test --select tag:slow
```

This gives you fast CI feedback (under 5 minutes) while still running comprehensive validation daily. The tag names are arbitrary — slow, daily, weekly, expensive, nightly — pick whatever convention your team will actually use.
Estimating Test Costs
A rough framework for BigQuery on-demand pricing ($6.25 per TB scanned as of 2026):
| Test type | Typical scan | Cost on a 100GB table |
|---|---|---|
| unique / not_null (single column) | Column only | ~$0.01 |
| expect_column_values_to_be_between (with partition filter) | Filtered partition | ~$0.001 |
| expect_column_values_to_be_between (no filter) | Full column | ~$0.01 |
| expect_column_mean_to_be_between (full table) | Full column | ~$0.01 |
| expect_row_values_to_have_data_for_every_n_datepart (full table) | Full table + date spine | ~$0.06 |
| expect_table_row_count_to_equal_other_table | Two full tables | ~$0.12 |
Individual tests are cheap. The cost adds up when you have 200+ tests running multiple times per day across dozens of models. Partition filtering and tag-based scheduling are the two highest-leverage optimizations.
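As a back-of-envelope sketch, the aggregate monthly cost follows directly from tests × runs × average cost per test; the inputs below are illustrative assumptions, not measured values:

```python
# Rough monthly cost of a dbt test suite on BigQuery on-demand pricing.
# All inputs are illustrative assumptions, not measured values.

def monthly_test_cost(num_tests: int, runs_per_day: int,
                      avg_cost_per_test: float, days: int = 30) -> float:
    """Total cost = tests * runs/day * cost per test * days."""
    return num_tests * runs_per_day * avg_cost_per_test * days

# 200 tests, 3 scheduled runs per day, ~$0.02 average per test
cost = monthly_test_cost(200, 3, 0.02)
print(f"${cost:.2f} per month")  # 200 * 3 * 0.02 * 30 = $360.00
```

At that scale, halving the average per-test scan via partition filters has the same effect as halving the test count.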
Test Placement by Layer
Where you place tests in your three-layer architecture affects both coverage and cost.
Sources and base models: Focus on freshness, schema validation, and basic format checks. This is where problems enter your pipeline. Catching them here prevents cascading failures downstream.
```yaml
models:
  - name: base__ga4__events
    tests:
      - dbt_expectations.expect_row_values_to_have_recent_data:
          datepart: hour
          interval: 48
    columns:
      - name: user_pseudo_id
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: '^[0-9]+\.[0-9]+$'
```

Intermediate models: Focus on join integrity and transformation validation. Verify that joins don’t drop or duplicate rows unexpectedly. These tests are often the first signal that something broke upstream.
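A sketch of a join-integrity check on a hypothetical intermediate model, comparing its row count against its upstream staging table (both model names are illustrative):

```yaml
models:
  - name: int_orders_joined                # illustrative intermediate model
    tests:
      # the join should neither drop nor duplicate order rows
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref('stg_orders') # illustrative upstream model
```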
Marts: Focus on business rules and aggregation sanity checks. These tests protect your final outputs — the tables that BI tools query and stakeholders rely on.
```yaml
models:
  - name: mrt__marketing__campaign_performance
    columns:
      - name: roas
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100
              row_condition: "spend > 0"
```

The general principle: test intensity should increase toward the edges of your DAG. Sources are where problems enter. Marts are where they reach consumers. The middle layers get lighter coverage focused on join integrity.
Starting Point for New Projects
When adding dbt-expectations to an existing project, a minimal set of three tests on the most important model covers the most common gaps native dbt tests miss:
- Freshness: expect_row_values_to_have_recent_data on the main timestamp column
- Format: expect_column_values_to_match_regex on one key identifier column
- Range: expect_column_values_to_be_between on one numeric KPI column
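Assuming a daily-grain orders mart (all model, column, and regex values below are illustrative), the three starter tests might look like:

```yaml
models:
  - name: mrt__finance__orders             # illustrative: your most important mart
    tests:
      - dbt_expectations.expect_row_values_to_have_recent_data:
          column_name: order_date          # main timestamp column
          datepart: day
          interval: 2                      # fail if no rows in the last 2 days
    columns:
      - name: order_id
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: '^[A-Z]{3}-[0-9]+$'   # illustrative ID format
      - name: order_value
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100000            # illustrative KPI bounds
```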
Expand from there based on which failures actually occur in production.