dbt test severity and scheduling determine whether a failure blocks the pipeline and when tests run. Misconfigured severity or unconstrained test execution leads to either silent failures or pipelines that stop unnecessarily.
Severity Levels
dbt supports two severity levels for tests: error (the default) and warn.
error stops the pipeline. If you run dbt build, an error-severity test failure prevents downstream models from running. Use this for problems that would produce incorrect data if the pipeline continued:
- Primary key violations (duplicates cause join explosions)
- Freshness failures on critical tables (stale data is worse than no data)
- NULL values in required fields that feed dashboards
- Value ranges where violations indicate data corruption
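As a sketch, a primary-key check on a hypothetical orders model relies on the default error severity, so no config block is needed (the model and column names here are illustrative):

```yaml
models:
  - name: orders            # illustrative model name
    columns:
      - name: order_id      # primary key; duplicates would corrupt downstream joins
        tests:
          - unique          # error severity is the default, so no config is required
          - not_null
```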
warn logs the failure but lets the pipeline continue. Use this for problems that need investigation but shouldn’t block production:
```yaml
columns:
  - name: order_value
    tests:
      - dbt_expectations.expect_column_mean_to_be_between:
          min_value: 50
          max_value: 200
          config:
            severity: warn
```

Statistical tests like expect_column_mean_to_be_between are natural candidates for warn severity. A shift in average order value might signal a real problem, or it might reflect a seasonal trend, a successful promotion, or a market shift. You want visibility without halting production while you investigate.
Severity Guidelines by Test Type
| Test category | Recommended severity | Reasoning |
|---|---|---|
| Primary key (unique, not_null) | error | Duplicates and nulls in keys corrupt all downstream joins |
| Freshness (expect_row_values_to_have_recent_data) | error on critical tables, warn on others | Stale data in a finance mart is an emergency; stale data in a marketing staging table can wait |
| Format (expect_column_values_to_match_regex) | error | Invalid formats typically indicate upstream schema changes |
| Range (expect_column_values_to_be_between) | error | Out-of-range values usually indicate data corruption |
| Statistical (expect_column_mean_to_be_between) | warn | Distribution shifts need investigation, not pipeline halts |
| Volume (expect_table_row_count_to_be_between) | warn or error | Depends on how tightly you can define expected bounds |
| Completeness (expect_row_values_to_have_data_for_every_n_datepart) | warn | Missing days may reflect source delays, not failures |
The goal is a test suite where error means “stop everything and fix this now” and warn means “look at this today.” If your team ignores warnings, you have too many of them. If your pipeline stops daily, you have too many errors on volatile tests.
Performance on BigQuery
Data quality tests run SQL against your warehouse. On BigQuery’s on-demand pricing, every test scans data and costs money. Some tests are cheap (a COUNT(*) with a partition filter). Others are expensive (a full-table statistical computation). Managing this cost is a first-class concern.
Filter on Partition Columns
If your table is partitioned by date, always use row_condition to filter on the partition column:
```yaml
- dbt_expectations.expect_column_values_to_be_between:
    min_value: 0
    max_value: 1000000
    row_condition: "event_date >= current_date() - 30"
```

Without this filter, BigQuery scans the entire table. With it, BigQuery prunes to the last 30 partitions. On a table with years of event data, this can reduce bytes scanned — and cost — by 90% or more.
This is especially important for these tests, which tend to be the most expensive:
- expect_row_values_to_have_data_for_every_n_datepart (generates a date spine and joins)
- expect_column_mean_to_be_between (aggregates across all rows)
- expect_column_values_to_match_regex (evaluates a function on every row)
- expect_table_row_count_to_equal_other_table (full count on two tables)
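The same row_condition pattern applies to these expensive tests. For example, a mean check can be restricted to recent partitions rather than aggregating the full table (column name and bounds here are illustrative):

```yaml
- dbt_expectations.expect_column_mean_to_be_between:
    min_value: 50
    max_value: 200
    # aggregate only the last 30 partitions instead of every row in the table
    row_condition: "event_date >= current_date() - 30"
```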
Run Expensive Tests Only in Production
Some tests add value in production but are wasteful in development or CI. Use dbt’s enabled config with target-aware Jinja:
```yaml
- dbt_expectations.expect_column_mean_to_be_between:
    min_value: 50
    max_value: 200
    config:
      enabled: "{{ target.name == 'prod' }}"
```

This compiles the test only when running against the prod target. In development and CI, the test is skipped entirely — no compilation, no warehouse query, no cost.
Use this pattern for:
- Statistical tests that need production-scale data to be meaningful
- Volume checks where development datasets are intentionally small
- Cross-table comparisons where both tables must be fully materialized
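These configs compose. A sketch of a cross-table count check that runs only in production and warns rather than errors (the comparison model name is an illustrative assumption):

```yaml
- dbt_expectations.expect_table_row_count_to_equal_other_table:
    compare_model: ref('stg_orders')             # illustrative upstream model
    config:
      enabled: "{{ target.name == 'prod' }}"     # skip in dev and CI
      severity: warn                             # investigate, don't block the run
```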
Tag-Based Test Scheduling
For tests that should run on a schedule rather than every build, use dbt tags:
```yaml
- dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
    date_col: event_date
    date_part: day
    config:
      tags: ['slow', 'daily']
```

Then in your CI pipeline, exclude slow tests:

```shell
dbt test --exclude tag:slow
```

And in your daily production run, include everything:

```shell
dbt test
```

Or run slow tests on their own schedule:

```shell
dbt test --select tag:slow
```

This gives you fast CI feedback (under 5 minutes) while still running comprehensive validation daily. The tag names are arbitrary — slow, daily, weekly, expensive, nightly — pick whatever convention your team will actually use.
Estimating Test Costs
A rough framework for BigQuery on-demand pricing ($6.25 per TB scanned as of 2026):
| Test type | Typical scan | Cost on a 100GB table |
|---|---|---|
| unique / not_null (single column) | Column only | ~$0.01 |
| expect_column_values_to_be_between (with partition filter) | Filtered partition | ~$0.001 |
| expect_column_values_to_be_between (no filter) | Full column | ~$0.01 |
| expect_column_mean_to_be_between (full table) | Full column | ~$0.01 |
| expect_row_values_to_have_data_for_every_n_datepart (full table) | Full table + date spine | ~$0.06 |
| expect_table_row_count_to_equal_other_table | Two full tables | ~$0.12 |
Individual tests are cheap. The cost adds up when you have 200+ tests running multiple times per day across dozens of models. Partition filtering and tag-based scheduling are the two highest-leverage optimizations.
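As a back-of-envelope sketch, the aggregate monthly cost follows directly from tests × runs × average cost per test; the inputs below are illustrative assumptions, not measured values:

```python
# Rough monthly cost of a dbt test suite on BigQuery on-demand pricing.
# All inputs are illustrative assumptions, not measured values.

def monthly_test_cost(num_tests: int, runs_per_day: int,
                      avg_cost_per_test: float, days: int = 30) -> float:
    """Total cost = tests * runs/day * cost per test * days."""
    return num_tests * runs_per_day * avg_cost_per_test * days

# 200 tests, 3 scheduled runs per day, ~$0.02 average per test
cost = monthly_test_cost(200, 3, 0.02)
print(f"${cost:.2f} per month")  # 200 * 3 * 0.02 * 30 = $360.00
```

At that scale, halving the average per-test scan via partition filters has the same effect as halving the test count.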
Test Placement by Layer
Where you place tests in your three-layer architecture affects both coverage and cost.
Sources and base models: Focus on freshness, schema validation, and basic format checks. This is where problems enter your pipeline. Catching them here prevents cascading failures downstream.
```yaml
models:
  - name: base__ga4__events
    tests:
      - dbt_expectations.expect_row_values_to_have_recent_data:
          datepart: hour
          interval: 48
    columns:
      - name: user_pseudo_id
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: '^[0-9]+\.[0-9]+$'
```

Intermediate models: Focus on join integrity and transformation validation. Verify that joins don’t drop or duplicate rows unexpectedly. These tests are often the first signal that something broke upstream.
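A sketch of a join-integrity check on a hypothetical intermediate model, comparing its row count against its upstream staging table (both model names are illustrative):

```yaml
models:
  - name: int_orders_joined                # illustrative intermediate model
    tests:
      # the join should neither drop nor duplicate order rows
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref('stg_orders') # illustrative upstream model
```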
Marts: Focus on business rules and aggregation sanity checks. These tests protect your final outputs — the tables that BI tools query and stakeholders rely on.
```yaml
models:
  - name: mrt__marketing__campaign_performance
    columns:
      - name: roas
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100
              row_condition: "spend > 0"
```

The general principle: test intensity should increase toward the edges of your DAG. Sources are where problems enter. Marts are where they reach consumers. The middle layers get lighter coverage focused on join integrity.
Starting Point for New Projects
When adding dbt-expectations to an existing project, a minimal set of three tests on the most important model covers the most common gaps native dbt tests miss:
- Freshness: expect_row_values_to_have_recent_data on the main timestamp column
- Format: expect_column_values_to_match_regex on one key identifier column
- Range: expect_column_values_to_be_between on one numeric KPI column
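Assuming a daily-grain orders mart (all model, column, and regex values below are illustrative), the three starter tests might look like:

```yaml
models:
  - name: mrt__finance__orders             # illustrative: your most important mart
    tests:
      - dbt_expectations.expect_row_values_to_have_recent_data:
          column_name: order_date          # main timestamp column
          datepart: day
          interval: 2                      # fail if no rows in the last 2 days
    columns:
      - name: order_id
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: '^[A-Z]{3}-[0-9]+$'   # illustrative ID format
      - name: order_value
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100000            # illustrative KPI bounds
```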
Expand from there based on which failures actually occur in production.