ServicesAboutNotesContact Get in touch →
EN FR
Note

dbt-expectations Test Reference

A categorized reference of the highest-value dbt-expectations tests — table-level, pattern, range, multi-column, and completeness — with BigQuery-ready YAML examples.

Planted
dbtdata qualitytesting

A reference for the highest-value dbt-expectations tests by category — tests that fill gaps native dbt tests cannot cover. All examples use BigQuery-compatible SQL. dbt-expectations ships 50+ tests total; this note covers the most broadly applicable.

Table-Level Tests

expect_row_values_to_have_recent_data

Native dbt only offers freshness checks on sources. This test works on any model, catching stale data before dashboards show outdated numbers.

models:
- name: mrt__sales__orders
columns:
- name: order_timestamp
tests:
- dbt_expectations.expect_row_values_to_have_recent_data:
datepart: hour
interval: 24

This fails if no rows have an order_timestamp within the last 24 hours. For GA4 data, which typically has a 24-48 hour delay, set interval: 48. For daily batch models, use datepart: day with interval: 2 to allow for weekends or holidays.

expect_table_row_count_to_equal_other_table

Transformations should not silently drop rows. This test catches it when they do:

models:
- name: mrt__sales__orders
tests:
- dbt_expectations.expect_table_row_count_to_equal_other_table:
compare_model: ref('base__shopify__orders')

If your base model has 50,000 rows and your mart has 49,000, something went wrong in between. This is particularly useful during refactoring — when you rewrite a model, the row count should match the original unless you have a specific reason for it to differ.

expect_table_row_count_to_be_between

Detects unexpected volume changes. If your daily batch normally contains 10,000-100,000 rows and suddenly has 500, you want to know before anyone opens a dashboard:

models:
- name: base__ga4__events
tests:
- dbt_expectations.expect_table_row_count_to_be_between:
min_value: 10000
max_value: 100000

Query your data’s historical volumes to set reasonable bounds. If you’d rather not maintain static thresholds, Elementary’s volume_anomalies test learns the pattern automatically. Use expect_table_row_count_to_be_between when you know the bounds; use Elementary when you want adaptive detection.

Pattern Validation

expect_column_values_to_match_regex

Native dbt has no format validation tests. This test covers that category:

columns:
- name: customer_email
tests:
- dbt_expectations.expect_column_values_to_match_regex:
regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'

Other common patterns:

  • GA4 user_pseudo_id: '^[0-9]+\\.[0-9]+$'
  • GA4 event_name: '^[a-z_]+$'
  • Product SKU like PRD-1234: '^PRD-[0-9]{4}$'
  • ISO country code: '^[A-Z]{2}$'

Always combine regex tests with row_condition on nullable columns. Without it, NULL values fail the regex and produce false positives.

expect_column_values_to_match_like_pattern

When regex is overkill, use SQL LIKE patterns instead. Simpler to read, sufficient for prefix/suffix checks:

columns:
- name: product_sku
tests:
- dbt_expectations.expect_column_values_to_match_like_pattern:
like_pattern: 'PRD-%'

Value Range Validation

expect_column_values_to_be_between

Validates that values fall within defined bounds. Applies to numeric and date columns:

columns:
- name: order_value
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
max_value: 1000000
- name: conversion_rate
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
max_value: 1
- name: event_date
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: "'2020-01-01'"
max_value: "current_date()"

Note the quoting for string and date values: the outer quotes are YAML, the inner quotes are SQL. "'2020-01-01'" becomes '2020-01-01' in the compiled query.

expect_column_mean_to_be_between

Catches distribution shifts that row-level range checks miss. Individual values may all be within bounds while the overall distribution is anomalous:

columns:
- name: order_value
tests:
- dbt_expectations.expect_column_mean_to_be_between:
min_value: 50
max_value: 200

Query the data first to establish reasonable bounds (SELECT AVG(order_value) FROM your_model). Bounds should be wide enough that normal fluctuations pass, narrow enough that real anomalies fail.

For columns where the mean is meaningless (bimodal distributions, heavy outliers), pair this with expect_column_median_to_be_between or use Elementary’s column_anomalies instead.

Multi-Column Validation

expect_compound_columns_to_be_unique

Native unique only works on a single column. For tables with a composite primary key:

models:
- name: mrt__sales__order_lines
tests:
- dbt_expectations.expect_compound_columns_to_be_unique:
column_list: ["order_id", "line_item_id"]

Apply this to every table with a composite primary key. Join explosions from duplicate composite keys are among the hardest bugs to diagnose because they silently inflate metrics rather than producing errors.

expect_column_pair_values_A_to_be_greater_than_B

Validates business logic that spans two columns:

models:
- name: mrt__finance__subscriptions
tests:
- dbt_expectations.expect_column_pair_values_A_to_be_greater_than_B:
column_A: end_date
column_B: start_date
or_equal: true
row_condition: "end_date is not null"

Common use cases:

  • shipped_date > order_date
  • total_amount >= subtotal
  • updated_at >= created_at
  • end_date >= start_date

The or_equal: true parameter matters — without it, records where both dates are equal (same-day subscriptions, instant shipments) would fail.

Completeness Tests

expect_row_values_to_have_data_for_every_n_datepart

Detects gaps in time series data. If you’re missing an entire day of GA4 events, this test fails:

columns:
- name: event_date
tests:
- dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
date_col: event_date
date_part: day
test_start_date: "'2024-01-01'"
test_end_date: "current_date() - 1"

Always specify date bounds. Without test_start_date and test_end_date, this test scans your entire table history. On a large GA4 dataset, that means scanning millions of rows to check for gaps going back to the beginning of time. Bound it to a recent window (90 days, a year) to keep costs manageable.

For BigQuery specifically, use partition-aware date expressions:

test_start_date: "date_sub(current_date(), interval 90 day)"
test_end_date: "date_sub(current_date(), interval 2 day)"

The interval 2 day end bound accounts for GA4’s typical processing delay — you don’t want the test failing because today’s data hasn’t arrived yet.

Quick Reference

TestUse case
expect_row_values_to_have_recent_dataFreshness checks on any model
expect_table_row_count_to_equal_other_tableVerify transformations don’t drop rows
expect_table_row_count_to_be_betweenDetect volume anomalies
expect_column_values_to_match_regexFormat validation (emails, IDs, SKUs)
expect_column_values_to_match_like_patternSimple prefix/suffix format checks
expect_column_values_to_be_betweenValue range checks (numeric and date)
expect_column_mean_to_be_betweenDistribution shift detection
expect_compound_columns_to_be_uniqueComposite primary key validation
expect_column_pair_values_A_to_be_greater_than_BCross-column relationship validation
expect_row_values_to_have_data_for_every_n_datepartTime series completeness

For test configuration patterns (severity levels, performance optimization, prod-only execution), see dbt Test Severity and Performance Tuning. For the row_condition parameter that works across nearly all of these tests, see dbt-expectations row_condition Pattern.