The test reference catalogs what each test does. This note shows how to apply those tests to a real BigQuery analytics project. The examples use GA4 event data and advertising performance models — two of the most common data sources in marketing analytics — but the patterns generalize to any BigQuery project.
## GA4 Events Base Model
GA4 data has known quirks: 24-48 hour processing delays, event names that should follow a snake_case convention, a `user_pseudo_id` with a specific numeric format, and timestamps that should never be in the future. A base model test suite should encode all of these:
```yaml
version: 2

models:
  - name: base__ga4__events
    description: "Base GA4 events with basic cleaning applied"
    tests:
      # Table should have recent data (accounting for GA4's 24-48h delay)
      - dbt_expectations.expect_row_values_to_have_recent_data:
          datepart: hour
          interval: 48
      # No missing days in the time series
      - dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
          date_col: event_date
          date_part: day
          test_start_date: "date_sub(current_date(), interval 90 day)"
          test_end_date: "date_sub(current_date(), interval 2 day)"
    columns:
      - name: event_id
        description: "Unique event identifier"
        tests:
          - unique
          - not_null
      - name: event_timestamp
        description: "Event timestamp in UTC"
        tests:
          - not_null
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: "'2020-01-01 00:00:00'"
              max_value: "current_timestamp()"
      - name: event_name
        description: "GA4 event name"
        tests:
          - not_null
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: '^[a-z_]+$'
```

A few things to notice:
The freshness interval is 48 hours, not 24. GA4’s export to BigQuery has a well-documented delay. Setting this to 24 hours would produce false positives every time the export runs late, which erodes trust in the test suite. Match the interval to the source’s actual SLA, not an ideal.
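The same SLA can also be encoded one step earlier with dbt's built-in source freshness check. A minimal sketch, assuming the raw export is declared as a dbt source named ga4 (the dataset name is a placeholder) and that GA4's microsecond `event_timestamp` can serve as the loaded-at field:

```yaml
version: 2

sources:
  - name: ga4
    schema: analytics_123456789  # placeholder GA4 export dataset
    # GA4 stores event_timestamp as INT64 microseconds; convert it to a TIMESTAMP
    loaded_at_field: "timestamp_micros(event_timestamp)"
    freshness:
      warn_after: {count: 48, period: hour}   # matches the export's actual SLA
      error_after: {count: 72, period: hour}  # hard failure only when truly stuck
    tables:
      - name: events  # placeholder; GA4 exports are date-sharded in practice
```

With this in place, `dbt source freshness` warns at 48 hours and only errors once the export is a full day past its SLA.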
The time series completeness test bounds are partition-aware: `date_sub(current_date(), interval 90 day)` as the start and `date_sub(current_date(), interval 2 day)` as the end. The 90-day lookback avoids scanning the entire table history (which can be years of events). The 2-day buffer on the end avoids failing because today's and yesterday's data haven't finished processing. On BigQuery, these date expressions enable partition pruning, keeping costs low.
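To see why the pruning matters, the completeness test's scan has roughly the following shape. The table path is a placeholder, and this assumes the base model is partitioned on `event_date`:

```sql
-- Roughly the shape of the scan behind the completeness test. Because
-- event_date is the partition column, the WHERE clause lets BigQuery
-- prune every partition outside the 90-day window.
select
  event_date,
  count(*) as events
from `my-project.analytics.base__ga4__events`  -- placeholder path
where event_date between date_sub(current_date(), interval 90 day)
  and date_sub(current_date(), interval 2 day)
group by event_date
order by event_date
```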
The `event_name` regex is strict. GA4 event names should be lowercase snake_case. If you see `Page_View` or `purchase-complete` in production, something is misconfigured in the GA4 property or a custom event was named incorrectly. Catching this at the base layer prevents downstream models from silently dropping events they don't recognize.
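When the regex test fails, a quick aggregation shows exactly which names broke convention and how often (the table path is a placeholder):

```sql
-- Event names that violate the lowercase snake_case convention
select
  event_name,
  count(*) as occurrences
from `my-project.analytics.base__ga4__events`  -- placeholder path
where not regexp_contains(event_name, r'^[a-z_]+$')
group by event_name
order by occurrences desc
```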
## Advertising Performance Mart
An ads performance mart aggregates data across platforms (Google Ads, Meta Ads, etc.) into a unified daily grain. The test suite validates the composite key, recency, value ranges, and business-rule sanity:
```yaml
version: 2

models:
  - name: mrt__marketing__ads_performance
    description: "Daily ads performance by campaign"
    tests:
      # Composite primary key
      - dbt_expectations.expect_compound_columns_to_be_unique:
          column_list: ["date", "platform", "campaign_id"]
      # Should have recent data
      - dbt_expectations.expect_row_values_to_have_recent_data:
          datepart: day
          interval: 2
    columns:
      - name: date
        tests:
          - not_null
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: "'2023-01-01'"
              max_value: "current_date()"
      - name: platform
        tests:
          - not_null
          - accepted_values:
              values: ['google_ads', 'meta_ads', 'tiktok_ads', 'linkedin_ads']
      - name: campaign_id
        tests:
          - not_null
      - name: impressions
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 1000000000
      - name: clicks
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100000000
      - name: spend
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 10000000
          # Warn if average daily spend seems off
          - dbt_expectations.expect_column_mean_to_be_between:
              min_value: 10
              max_value: 100000
              row_condition: "date >= date_sub(current_date(), interval 30 day)"
              config:
                severity: warn
      - name: conversions
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 1000000
      - name: roas
        description: "Return on ad spend"
        tests:
          # ROAS should be between 0 and 100 (where it's calculable)
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100
              row_condition: "spend > 0"
              config:
                severity: warn
```

Key patterns in this example:
The composite key uses `expect_compound_columns_to_be_unique` rather than a single-column `unique` test. The table's grain is one row per date-platform-campaign combination. Native dbt's `unique` test can't express this. Without composite key validation, duplicate rows silently inflate metrics — a campaign appears to have double the spend, double the impressions, and suddenly your ROAS calculations are meaningless.
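A grain violation is also easy to confirm by hand with a grouped count on the intended key (the table path is a placeholder):

```sql
-- Any row returned here duplicates the date-platform-campaign grain
select
  date,
  platform,
  campaign_id,
  count(*) as row_count
from `my-project.marts.mrt__marketing__ads_performance`  -- placeholder path
group by date, platform, campaign_id
having count(*) > 1
order by row_count desc
```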
The `spend` column has both a row-level range check and a statistical check. The range check (0 to 10,000,000) catches obviously invalid individual values. The mean check (10 to 100,000) catches distribution shifts where individual values are fine but the overall pattern is wrong. A source that suddenly sends all zero-spend rows would pass the range check but fail the mean check.

The mean check uses `row_condition` for partition filtering. `date >= date_sub(current_date(), interval 30 day)` limits the statistical computation to the last 30 days. Without this, BigQuery scans the entire table to compute the average — expensive and increasingly irrelevant as you accumulate historical data. The partition filter keeps costs low and makes the statistic reflect recent behavior rather than all-time averages.
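In plain SQL, the mean check computes roughly the following (placeholder table path). Note the same 30-day filter in the `WHERE` clause, which is what keeps the scan small on a date-partitioned table:

```sql
-- Equivalent of the mean assertion: the test fails if this value
-- falls outside the 10 to 100,000 band
select avg(spend) as mean_spend
from `my-project.marts.mrt__marketing__ads_performance`  -- placeholder path
where date >= date_sub(current_date(), interval 30 day)
```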
ROAS validation uses `row_condition: "spend > 0"` to avoid dividing by zero or testing rows where ROAS is undefined. The `warn` severity acknowledges that ROAS outliers might be real (a viral campaign with minimal spend can have a legitimate 50x ROAS) — they need investigation, not pipeline halts.
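When the warning fires, a triage query along these lines (placeholder path) pulls the flagged rows so an analyst can decide whether each outlier is a data bug or a genuinely efficient campaign:

```sql
-- Rows the ROAS test flags: positive spend but ROAS outside the 0-100 band
select date, platform, campaign_id, spend, conversions, roas
from `my-project.marts.mrt__marketing__ads_performance`  -- placeholder path
where spend > 0
  and (roas < 0 or roas > 100)
order by roas desc
```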
## Test Placement by DAG Layer
Where you place tests matters as much as which tests you choose. The three-layer architecture suggests different testing priorities at each layer.
### Base Layer: Catch Problems at the Source
Focus on freshness, schema validation, and format checks. Problems caught here prevent cascading failures through your entire DAG:
```yaml
models:
  - name: base__ga4__events
    tests:
      - dbt_expectations.expect_row_values_to_have_recent_data:
          datepart: hour
          interval: 48
    columns:
      - name: user_pseudo_id
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: '^[0-9]+\.[0-9]+$'
```

Base models are your first line of defense. A format change in `user_pseudo_id` caught here produces a clear, immediate signal. The same problem caught at the mart layer, after joins and aggregations, produces a confusing cascade of downstream failures.
### Intermediate Layer: Validate Join Integrity
Intermediate models primarily join and reshape data. The critical tests are row count comparisons (joins shouldn’t silently drop rows) and referential integrity:
```yaml
models:
  - name: int__orders__enriched
    tests:
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref('base__shopify__orders')
```

If the base has 50,000 orders and the intermediate has 49,000 after enrichment, a join condition is dropping rows. This test surfaces the issue before the missing orders propagate to marts and dashboards.
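When the row count comparison fails, an anti-join pinpoints the dropped rows. A sketch assuming `order_id` is the join key, with placeholder table paths:

```sql
-- Base orders with no match in the enriched model: these are the dropped rows
select o.order_id
from `my-project.staging.base__shopify__orders` as o
left join `my-project.intermediate.int__orders__enriched` as e
  on o.order_id = e.order_id
where e.order_id is null
```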
### Mart Layer: Enforce Business Rules
Marts are the tables stakeholders query. Tests here protect final outputs with business-rule validation and aggregation sanity checks:
```yaml
models:
  - name: mrt__marketing__campaign_performance
    columns:
      - name: roas
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100
              row_condition: "spend > 0"
```

The general principle: test intensity should increase toward the edges of your DAG. Sources are where problems enter. Marts are where they reach consumers. The middle layers get lighter coverage focused on join integrity.
## Starting Point: Three Tests on One Model
After installing dbt-expectations, a practical starting point is three tests on the most critical model:
1. Freshness. `expect_row_values_to_have_recent_data` on the main timestamp column — catches stale data.

   ```yaml
   columns:
     - name: updated_at
       tests:
         - dbt_expectations.expect_row_values_to_have_recent_data:
             datepart: hour
             interval: 24
   ```

2. Format. `expect_column_values_to_match_regex` on one key identifier column — catches upstream format changes.

   ```yaml
   columns:
     - name: customer_email
       tests:
         - dbt_expectations.expect_column_values_to_match_regex:
             regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
             row_condition: "customer_email is not null"
   ```

3. Range. `expect_column_values_to_be_between` on one numeric KPI column — catches impossible values.

   ```yaml
   columns:
     - name: revenue
       tests:
         - dbt_expectations.expect_column_values_to_be_between:
             min_value: 0
             max_value: 10000000
   ```

Expand from these based on what breaks in production — incidents should drive test coverage rather than anticipated failure modes.
## Expanding Coverage
Once your first three tests are running, use this priority order to expand:
- Primary keys on every model — `unique` + `not_null` (or `expect_compound_columns_to_be_unique` for composite keys). These are non-negotiable.
- Freshness on every source-adjacent model — `expect_row_values_to_have_recent_data` on base models that feed from external sources.
- Range checks on every KPI column — `expect_column_values_to_be_between` on metrics that feed dashboards (the first three priorities are sketched together below this list).
- Format validation on identifiers — `expect_column_values_to_match_regex` on columns with known formats (emails, SKUs, IDs).
- Statistical checks on critical aggregates — `expect_column_mean_to_be_between` with `severity: warn` on high-value metrics.
- Time series completeness — `expect_row_values_to_have_data_for_every_n_datepart` on models where missing days would go unnoticed.
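Put together, the first three priorities can be sketched on a single model. The model and column names below are hypothetical, and in practice the freshness test would more often sit on the base model feeding this mart:

```yaml
models:
  - name: mrt__finance__daily_revenue  # hypothetical model
    tests:
      # Priority 1: composite primary key
      - dbt_expectations.expect_compound_columns_to_be_unique:
          column_list: ["date", "currency"]
    columns:
      - name: updated_at
        tests:
          # Priority 2: freshness on the main timestamp
          - dbt_expectations.expect_row_values_to_have_recent_data:
              datepart: hour
              interval: 24
      - name: revenue
        tests:
          # Priority 3: range check on the KPI
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 10000000
```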
Every production incident should result in a new test that prevents recurrence.