The test reference catalogs what each test does. This note shows how to apply those tests to a real BigQuery analytics project. The examples use GA4 event data and advertising performance models — two of the most common data sources in marketing analytics — but the patterns generalize to any BigQuery project.
## GA4 Events Base Model
GA4 data has known quirks: 24-48 hour processing delays, event names that should follow a snake_case convention, a `user_pseudo_id` with a specific numeric format, and timestamps that should never be in the future. A base model test suite should encode all of these:
```yaml
version: 2

models:
  - name: base__ga4__events
    description: "Base GA4 events with basic cleaning applied"
    tests:
      # Table should have recent data (accounting for GA4's 24-48h delay)
      - dbt_expectations.expect_row_values_to_have_recent_data:
          datepart: hour
          interval: 48
      # No missing days in the time series
      - dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
          date_col: event_date
          date_part: day
          test_start_date: "date_sub(current_date(), interval 90 day)"
          test_end_date: "date_sub(current_date(), interval 2 day)"
    columns:
      - name: event_id
        description: "Unique event identifier"
        tests:
          - unique
          - not_null
      - name: event_timestamp
        description: "Event timestamp in UTC"
        tests:
          - not_null
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: "'2020-01-01 00:00:00'"
              max_value: "current_timestamp()"
      - name: event_name
        description: "GA4 event name"
        tests:
          - not_null
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: '^[a-z_]+$'
```

A few things to notice:
The freshness interval is 48 hours, not 24. GA4’s export to BigQuery has a well-documented delay. Setting this to 24 hours would produce false positives every time the export runs late, which erodes trust in the test suite. Match the interval to the source’s actual SLA, not an ideal.
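The same SLA can also be encoded one step earlier with dbt's built-in source freshness check. A minimal sketch, assuming the raw export is declared as a dbt source named ga4 (the dataset name is a placeholder) and that GA4's microsecond `event_timestamp` can serve as the loaded-at field:

```yaml
version: 2

sources:
  - name: ga4
    schema: analytics_123456789  # placeholder GA4 export dataset
    # GA4 stores event_timestamp as INT64 microseconds; convert it to a TIMESTAMP
    loaded_at_field: "timestamp_micros(event_timestamp)"
    freshness:
      warn_after: {count: 48, period: hour}   # matches the export's actual SLA
      error_after: {count: 72, period: hour}  # hard failure only when truly stuck
    tables:
      - name: events  # placeholder; GA4 exports are date-sharded in practice
```

With this in place, `dbt source freshness` warns at 48 hours and only errors once the export is a full day past its SLA.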
The time series completeness test bounds are partition-aware: `date_sub(current_date(), interval 90 day)` as the start and `date_sub(current_date(), interval 2 day)` as the end. The 90-day lookback avoids scanning the entire table history (which can be years of events). The 2-day buffer on the end avoids failing because today's and yesterday's data haven't finished processing. On BigQuery, these date expressions enable partition pruning, keeping costs low.
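To see why the pruning matters, the completeness test's scan has roughly the following shape. The table path is a placeholder, and this assumes the base model is partitioned on `event_date`:

```sql
-- Roughly the shape of the scan behind the completeness test. Because
-- event_date is the partition column, the WHERE clause lets BigQuery
-- prune every partition outside the 90-day window.
select
  event_date,
  count(*) as events
from `my-project.analytics.base__ga4__events`  -- placeholder path
where event_date between date_sub(current_date(), interval 90 day)
  and date_sub(current_date(), interval 2 day)
group by event_date
order by event_date
```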
The `event_name` regex is strict. GA4 event names should be lowercase snake_case. If you see `Page_View` or `purchase-complete` in production, something is misconfigured in the GA4 property or a custom event was named incorrectly. Catching this at the base layer prevents downstream models from silently dropping events they don't recognize.
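When the regex test fails, a quick aggregation shows exactly which names broke convention and how often (the table path is a placeholder):

```sql
-- Event names that violate the lowercase snake_case convention
select
  event_name,
  count(*) as occurrences
from `my-project.analytics.base__ga4__events`  -- placeholder path
where not regexp_contains(event_name, r'^[a-z_]+$')
group by event_name
order by occurrences desc
```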
## Advertising Performance Mart
An ads performance mart aggregates data across platforms (Google Ads, Meta Ads, etc.) into a unified daily grain. The test suite validates the composite key, recency, value ranges, and business-rule sanity:
```yaml
version: 2

models:
  - name: mrt__marketing__ads_performance
    description: "Daily ads performance by campaign"
    tests:
      # Composite primary key
      - dbt_expectations.expect_compound_columns_to_be_unique:
          column_list: ["date", "platform", "campaign_id"]
      # Should have recent data
      - dbt_expectations.expect_row_values_to_have_recent_data:
          datepart: day
          interval: 2
    columns:
      - name: date
        tests:
          - not_null
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: "'2023-01-01'"
              max_value: "current_date()"
      - name: platform
        tests:
          - not_null
          - accepted_values:
              values: ['google_ads', 'meta_ads', 'tiktok_ads', 'linkedin_ads']
      - name: campaign_id
        tests:
          - not_null
      - name: impressions
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 1000000000
      - name: clicks
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100000000
      - name: spend
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 10000000
          # Warn if average daily spend seems off
          - dbt_expectations.expect_column_mean_to_be_between:
              min_value: 10
              max_value: 100000
              row_condition: "date >= date_sub(current_date(), interval 30 day)"
              config:
                severity: warn
      - name: conversions
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 1000000
      - name: roas
        description: "Return on ad spend"
        tests:
          # ROAS should be between 0 and 100 (where it's calculable)
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100
              row_condition: "spend > 0"
              config:
                severity: warn
```

Key patterns in this example:
The composite key uses `expect_compound_columns_to_be_unique` rather than a single-column `unique` test. The table's grain is one row per date-platform-campaign combination. Native dbt's `unique` test can't express this. Without composite key validation, duplicate rows silently inflate metrics — a campaign appears to have double the spend, double the impressions, and suddenly your ROAS calculations are meaningless.
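A grain violation is also easy to confirm by hand with a grouped count on the intended key (the table path is a placeholder):

```sql
-- Any row returned here duplicates the date-platform-campaign grain
select
  date,
  platform,
  campaign_id,
  count(*) as row_count
from `my-project.marts.mrt__marketing__ads_performance`  -- placeholder path
group by date, platform, campaign_id
having count(*) > 1
order by row_count desc
```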
The `spend` column has both a row-level range check and a statistical check. The range check (0 to 10,000,000) catches obviously invalid individual values. The mean check (10 to 100,000) catches distribution shifts where individual values are fine but the overall pattern is wrong. A source that suddenly sends all zero-spend rows would pass the range check but fail the mean check.

The mean check uses `row_condition` for partition filtering. `date >= date_sub(current_date(), interval 30 day)` limits the statistical computation to the last 30 days. Without this, BigQuery scans the entire table to compute the average — expensive and increasingly irrelevant as you accumulate historical data. The partition filter keeps costs low and makes the statistic reflect recent behavior rather than all-time averages.
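In plain SQL, the mean check computes roughly the following (placeholder table path). Note the same 30-day filter in the `WHERE` clause, which is what keeps the scan small on a date-partitioned table:

```sql
-- Equivalent of the mean assertion: the test fails if this value
-- falls outside the 10 to 100,000 band
select avg(spend) as mean_spend
from `my-project.marts.mrt__marketing__ads_performance`  -- placeholder path
where date >= date_sub(current_date(), interval 30 day)
```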
ROAS validation uses `row_condition: "spend > 0"` to avoid dividing by zero or testing rows where ROAS is undefined. The `warn` severity acknowledges that ROAS outliers might be real (a viral campaign with minimal spend can have a legitimate 50x ROAS) — they need investigation, not pipeline halts.
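When the warning fires, a triage query along these lines (placeholder path) pulls the flagged rows so an analyst can decide whether each outlier is a data bug or a genuinely efficient campaign:

```sql
-- Rows the ROAS test flags: positive spend but ROAS outside the 0-100 band
select date, platform, campaign_id, spend, conversions, roas
from `my-project.marts.mrt__marketing__ads_performance`  -- placeholder path
where spend > 0
  and (roas < 0 or roas > 100)
order by roas desc
```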
## Test Placement by DAG Layer
Where you place tests matters as much as which tests you choose. The three-layer architecture suggests different testing priorities at each layer.
### Base Layer: Catch Problems at the Source
Focus on freshness, schema validation, and format checks. Problems caught here prevent cascading failures through your entire DAG:
```yaml
models:
  - name: base__ga4__events
    tests:
      - dbt_expectations.expect_row_values_to_have_recent_data:
          datepart: hour
          interval: 48
    columns:
      - name: user_pseudo_id
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: '^[0-9]+\.[0-9]+$'
```

Base models are your first line of defense. A format change in `user_pseudo_id` caught here produces a clear, immediate signal. The same problem caught at the mart layer, after joins and aggregations, produces a confusing cascade of downstream failures.
### Intermediate Layer: Validate Join Integrity
Intermediate models primarily join and reshape data. The critical tests are row count comparisons (joins shouldn’t silently drop rows) and referential integrity:
```yaml
models:
  - name: int__orders__enriched
    tests:
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref('base__shopify__orders')
```

If the base has 50,000 orders and the intermediate has 49,000 after enrichment, a join condition is dropping rows. This test surfaces the issue before the missing orders propagate to marts and dashboards.
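When the row count comparison fails, an anti-join pinpoints the dropped rows. A sketch assuming `order_id` is the join key, with placeholder table paths:

```sql
-- Base orders with no match in the enriched model: these are the dropped rows
select o.order_id
from `my-project.staging.base__shopify__orders` as o
left join `my-project.intermediate.int__orders__enriched` as e
  on o.order_id = e.order_id
where e.order_id is null
```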
### Mart Layer: Enforce Business Rules
Marts are the tables stakeholders query. Tests here protect final outputs with business-rule validation and aggregation sanity checks:
```yaml
models:
  - name: mrt__marketing__campaign_performance
    columns:
      - name: roas
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100
              row_condition: "spend > 0"
```

The general principle: test intensity should increase toward the edges of your DAG. Sources are where problems enter. Marts are where they reach consumers. The middle layers get lighter coverage focused on join integrity.
## Starting Point: Three Tests on One Model
After installing dbt-expectations, a practical starting point is three tests on the most critical model:
1. Freshness. `expect_row_values_to_have_recent_data` on the main timestamp column — catches stale data.

   ```yaml
   columns:
     - name: updated_at
       tests:
         - dbt_expectations.expect_row_values_to_have_recent_data:
             datepart: hour
             interval: 24
   ```

2. Format. `expect_column_values_to_match_regex` on one key identifier column — catches upstream format changes.

   ```yaml
   columns:
     - name: customer_email
       tests:
         - dbt_expectations.expect_column_values_to_match_regex:
             regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
             row_condition: "customer_email is not null"
   ```

3. Range. `expect_column_values_to_be_between` on one numeric KPI column — catches impossible values.

   ```yaml
   columns:
     - name: revenue
       tests:
         - dbt_expectations.expect_column_values_to_be_between:
             min_value: 0
             max_value: 10000000
   ```

Expand from these based on what breaks in production — incidents should drive test coverage rather than anticipated failure modes.
## Expanding Coverage
Once your first three tests are running, use this priority order to expand:
- Primary keys on every model — `unique` + `not_null` (or `expect_compound_columns_to_be_unique` for composite keys). These are non-negotiable.
- Freshness on every source-adjacent model — `expect_row_values_to_have_recent_data` on base models that feed from external sources.
- Range checks on every KPI column — `expect_column_values_to_be_between` on metrics that feed dashboards (the first three priorities are sketched together below this list).
- Format validation on identifiers — `expect_column_values_to_match_regex` on columns with known formats (emails, SKUs, IDs).
- Statistical checks on critical aggregates — `expect_column_mean_to_be_between` with `severity: warn` on high-value metrics.
- Time series completeness — `expect_row_values_to_have_data_for_every_n_datepart` on models where missing days would go unnoticed.
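Put together, the first three priorities can be sketched on a single model. The model and column names below are hypothetical, and in practice the freshness test would more often sit on the base model feeding this mart:

```yaml
models:
  - name: mrt__finance__daily_revenue  # hypothetical model
    tests:
      # Priority 1: composite primary key
      - dbt_expectations.expect_compound_columns_to_be_unique:
          column_list: ["date", "currency"]
    columns:
      - name: updated_at
        tests:
          # Priority 2: freshness on the main timestamp
          - dbt_expectations.expect_row_values_to_have_recent_data:
              datepart: hour
              interval: 24
      - name: revenue
        tests:
          # Priority 3: range check on the KPI
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 10000000
```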
Every production incident should result in a new test that prevents recurrence.