
dbt-expectations row_condition Pattern

How the row_condition parameter in dbt-expectations enables conditional test filtering — applying tests to specific segments without custom SQL.

Tags: dbt, data quality, testing

Almost every test in the dbt-expectations package supports a row_condition parameter — a SQL WHERE clause that filters which rows the test applies to. This parameter removes the need for many custom singular tests by enabling conditional filtering directly in YAML.

The Problem It Solves

Real data isn’t uniform. A not_null check on account_id makes sense for active subscriptions but not for cancelled ones. An email regex is meaningful where emails exist but produces false positives on nullable fields. A value range for order_value differs between countries or order types.

Without row_condition, you have three bad options:

  1. Skip the test entirely because it doesn’t apply to all rows. You lose coverage on the rows where it does apply.
  2. Write a custom singular test with a WHERE clause baked in. This works but doesn’t scale — you end up with a tests/ directory full of one-off SQL files.
  3. Restructure your model to split segments into separate tables or add filtering columns. This changes your data model to accommodate testing, which is backwards.

row_condition provides a fourth option: apply any pre-built test to a specific segment of data, declaratively in YAML.

Core Patterns

Nullable Field Validation

The most common use case. You want to validate the format of a column, but NULLs are acceptable (the field is optional). Without row_condition, NULL values get tested against the regex and fail:

```yaml
columns:
  - name: email
    tests:
      - dbt_expectations.expect_column_values_to_match_regex:
          regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
          row_condition: "email is not null"
```

This says: “every email that exists must be properly formatted, but it’s fine if email is NULL.” Without the row_condition, you’d need to choose between skipping the test or accepting false positives.
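For orientation, here is how such a column block sits inside a complete schema file. This is a sketch: the model name `customers` and the file path are illustrative, not part of the original example:

```yaml
# models/schema.yml -- model name and path are illustrative
version: 2

models:
  - name: customers
    columns:
      - name: email
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'
              row_condition: "email is not null"
```

Everything in the examples below slots into the `columns:` section of a file like this.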

Status-Based Filtering

Business rules often depend on the state of a record. An active subscription must have an account. A shipped order must have a tracking number. A completed payment must have a non-null amount:

```yaml
columns:
  - name: account_id
    tests:
      - dbt_expectations.expect_column_values_to_not_be_null:
          row_condition: "subscription_status = 'active'"
  - name: tracking_number
    tests:
      - dbt_expectations.expect_column_values_to_not_be_null:
          row_condition: "shipping_status = 'shipped'"
  - name: payment_amount
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 1000000
          row_condition: "payment_status = 'completed'"
```

Each test enforces a rule that only makes sense for a specific subset of records. The YAML is readable, declarative, and doesn’t require any custom SQL.

Segment-Specific Ranges

Different segments of your data may have different valid ranges. Order values in France differ from order values in the US. B2B transactions differ from B2C. Premium tier metrics differ from free tier:

```yaml
columns:
  - name: order_value
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 50000
          row_condition: "country_code = 'FR' and order_status = 'completed'"
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 100000
          row_condition: "country_code = 'US' and order_status = 'completed'"
```

You can stack multiple tests on the same column with different row_condition values. Each test evaluates independently. This gives you segment-specific validation without splitting your model into separate tables.

Partition-Aware Filtering for BigQuery

On BigQuery, row_condition doubles as a performance optimization tool. If your table is partitioned by date, filtering on the partition column avoids full table scans:

```yaml
columns:
  - name: order_value
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 1000000
          row_condition: "event_date >= current_date() - 30"
```

Without this filter, the test scans the entire table. With it, BigQuery prunes to the last 30 days of partitions. On a table with years of history, this can reduce test cost by 90%+.

This pattern is especially important for the more expensive tests like expect_column_mean_to_be_between and expect_row_values_to_have_data_for_every_n_datepart, where full table scans on large datasets can be genuinely costly.
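As a sketch, the same partition filter applied to one of those aggregate tests — the column name and bounds here are illustrative, and `date_sub` is the more explicit BigQuery spelling of the date arithmetic:

```yaml
- dbt_expectations.expect_column_mean_to_be_between:
    min_value: 10
    max_value: 500
    # Illustrative bounds; the filter prunes to recent partitions
    row_condition: "event_date >= date_sub(current_date(), interval 30 day)"
```

The aggregate still computes only over the filtered rows, so pick a window long enough to be statistically meaningful for your data.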

How It Works Under the Hood

The row_condition parameter compiles to a WHERE clause in the generated SQL. When you write:

```yaml
- dbt_expectations.expect_column_values_to_be_between:
    min_value: 0
    max_value: 100
    row_condition: "status = 'active'"
```

The compiled test SQL is roughly:

```sql
SELECT COUNT(*)
FROM your_model
WHERE status = 'active'
  AND (column_value < 0 OR column_value > 100)
```

This means row_condition supports any valid SQL expression your warehouse understands: AND/OR logic, IN lists, subqueries (though subqueries in test conditions can be fragile), date functions, and NULL checks.
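For instance, a condition mixing an IN list with a date function compiles just as readily. This example is illustrative (column names are hypothetical, and `date_trunc` is written in BigQuery syntax):

```yaml
- dbt_expectations.expect_column_values_to_not_be_null:
    # Only rows for shipped/delivered orders from the current month are tested
    row_condition: "order_status in ('shipped', 'delivered') and order_date >= date_trunc(current_date(), month)"
```

Because the string is passed through to the warehouse, any syntax error surfaces at test run time, not at compile time — another reason to keep conditions simple.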

When Not to Use It

row_condition is not a substitute for proper data modeling. If you find yourself writing the same complex row_condition across ten tests, that’s a signal to create a filtered intermediate model or a view that pre-applies the filter. The test YAML should express what you’re validating, not replicate transformation logic.

Similarly, avoid row_condition expressions so complex that they become their own source of bugs. If the condition spans multiple lines or uses nested logic, a singular test (a standalone SQL file) is more maintainable. You get syntax highlighting, version control diffs, and the ability to run the query directly for debugging.
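As a sketch of that escape hatch, a singular test is just a SELECT that returns violating rows — the file name, model, and columns below are hypothetical:

```sql
-- tests/assert_enterprise_orders_within_contract_limits.sql (hypothetical)
-- A singular test fails if this query returns any rows.
select *
from {{ ref('orders') }}
where customer_tier = 'enterprise'
  and order_status in ('completed', 'invoiced')
  and order_value > contract_max_order_value
```

The logic that would have been crammed into a row_condition string lives in a real SQL file you can lint, diff, and run directly against the warehouse.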

The sweet spot is conditions of 1-2 predicates: "status = 'active'", "email is not null", "country_code = 'FR' and order_date >= '2024-01-01'". These are readable at a glance and unlikely to harbor subtle bugs.