dbt Testing Anti-Patterns


Four common testing mistakes in dbt projects — over-testing with unit tests, happy-path-only coverage, drifting hardcoded thresholds, and testing warehouse functions — each with a specific fix.

Anti-Pattern 1: Unit Testing Everything

The instinct from software engineering is understandable: aim for high test coverage. In dbt, this instinct leads to diminishing returns fast.

A model that selects five columns from a source table and renames them has nothing to unit test. The SQL is SELECT column_a AS better_name FROM source_table. There’s no logic. There’s no edge case. A unit test that mocks column_a = 'hello' and expects better_name = 'hello' proves that dbt can execute a SELECT statement — which is not in question.

The same applies to simple aggregations. SUM(revenue), COUNT(DISTINCT customer_id), MAX(created_at) — these are warehouse functions. BigQuery tests them extensively. Your unit test budget should go to the 5-10% of models where transformation logic is complex enough that bugs are genuinely plausible.

What to do instead: Apply the decision framework. If the logic isn’t complex enough to plausibly have bugs, skip the unit test. Add a not_null data test on the output column instead — it’s cheaper, runs against real data, and catches a broader class of problems.
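A minimal sketch of that data test, using the rename example above (the model name is illustrative):

models:
  - name: base__app__customers  # hypothetical model that renames column_a to better_name
    columns:
      - name: better_name
        data_tests:
          - not_null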

Where unit tests earn their keep: Multi-branch CASE WHEN logic, window functions with specific frame definitions, date calculations with boundary conditions, string parsing with regex, incremental merge logic. These are the models where a subtle bug hides for weeks before someone notices the numbers are wrong.
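For models like these, even a small fixture pays for itself. A sketch for a hypothetical tier model whose logic says WHEN lifetime_value > 100 THEN 'high':

unit_tests:
  - name: test_tier_boundary_at_100
    model: mrt__crm__customer_tiers  # hypothetical model, multi-branch CASE WHEN on lifetime_value
    given:
      - input: ref('base__crm__customers')
        rows:
          - {customer_id: 1, lifetime_value: 99}
          - {customer_id: 2, lifetime_value: 100}  # exactly at the boundary
          - {customer_id: 3, lifetime_value: 101}
    expect:
      rows:
        - {customer_id: 1, tier: 'standard'}
        - {customer_id: 2, tier: 'standard'}  # strict > excludes 100
        - {customer_id: 3, tier: 'high'}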

Anti-Pattern 2: Only Testing Happy Paths

Your unit tests should make you uncomfortable. If every fixture row represents a clean, well-formed input that exercises the expected path, you’re testing that your code works when everything goes right. That’s the easy case. The hard case is what happens when things go wrong.

The inputs that break SQL transformations are predictable:

  • NULLs where you don’t expect them. A CASE WHEN status = 'active' doesn’t match NULL — it falls through to the ELSE branch. Is that what you intended?
  • Empty strings that behave differently from NULLs in string operations. CONCAT('prefix_', NULL) returns NULL, but CONCAT('prefix_', '') returns 'prefix_'.
  • Boundary values at the edges of your CASE WHEN conditions. If your logic says WHEN amount > 100 THEN 'high', what happens at exactly 100?
  • Zero values in denominators. A revenue-per-customer calculation where customer_count = 0 produces a division-by-zero error or an infinite value, depending on the warehouse.
  • Negative values that your source system “never sends” until it does. A discount calculation that assumes positive quantities will produce negative discounts when the quantity is negative.
A unit test whose fixture deliberately includes those inputs alongside the happy path:

unit_tests:
  - name: test_discount_handles_edge_cases
    model: mrt__finance__orders
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, quantity: 3, unit_price: 100, discount_rate: 0.1}    # happy path
          - {order_id: 2, quantity: -1, unit_price: 100, discount_rate: 0.1}   # negative qty
          - {order_id: 3, quantity: 0, unit_price: 100, discount_rate: 0.1}    # zero qty
          - {order_id: 4, quantity: 3, unit_price: 100, discount_rate: null}   # null discount
          - {order_id: 5, quantity: null, unit_price: 100, discount_rate: 0.1} # null qty
    expect:
      rows:
        - {order_id: 1, discount_amount: 30}
        - {order_id: 2, discount_amount: 0}  # clamped to zero
        - {order_id: 3, discount_amount: 0}
        - {order_id: 4, discount_amount: 0}  # no discount rate = no discount
        - {order_id: 5, discount_amount: 0}  # null qty = no discount

What to do instead: For every unit test, include at least one fixture row with NULL inputs, one with boundary values, and one with an input your source system “never sends.” These three additional rows per test catch the bugs that actually reach production.

The best trigger for adding edge-case rows: a production bug. When a bug is discovered, reproduce it as a unit test fixture before fixing the code. Verify the test fails. Fix the model. Verify the test passes. This builds a regression suite organically, focused on actual failure modes rather than imagined ones.

Anti-Pattern 3: Hardcoding Thresholds That Drift

This one is subtle because it works perfectly when you first deploy it:

- dbt_expectations.expect_table_row_count_to_be_between:
    min_value: 1000
    max_value: 2000

Your table has 1,500 rows. The test passes. Six months later, your table has 2,100 rows because the business is growing. The test fails. Not because anything is wrong — because your threshold is stale.

The team updates the max to 3,000. Another six months, another failure, another manual update. Eventually someone sets it to 10,000,000 out of frustration, and the test stops catching anything.

This is the threshold maintenance treadmill. Any test with hardcoded bounds on a metric that naturally changes over time will either require constant updates or get widened to the point of uselessness.

What to do instead: For metrics that drift over time (row counts, aggregate values, distribution statistics), use Elementary’s anomaly detection instead of static thresholds.

# Instead of hardcoded bounds:
- dbt_expectations.expect_table_row_count_to_be_between:
    min_value: 1000
    max_value: 2000

# Use adaptive detection:
- elementary.volume_anomalies:
    timestamp_column: created_at
    training_period:
      period: day
      count: 30

Elementary learns what’s normal from your historical data and alerts when metrics deviate beyond expected ranges. No hardcoded thresholds to maintain. No manual updates as data grows.

When static thresholds are still right: Values that have genuine, fixed bounds. Revenue should be non-negative. Conversion rates should be between 0 and 1. A percentage column should be between 0 and 100. These bounds don’t drift — they’re domain constraints. Use dbt_expectations.expect_column_values_to_be_between for these.
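A sketch of such a domain-constraint test (the column name is illustrative):

columns:
  - name: conversion_rate
    data_tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 1  # a rate outside [0, 1] is a bug, not growth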

Anti-Pattern 4: Testing Warehouse Functions

Don’t unit test SUM(). Don’t unit test DATE_TRUNC(). Don’t unit test COALESCE(). These are warehouse-level functions that BigQuery, Snowflake, and every other data warehouse test extensively in their own test suites. They work. A unit test proving that summing 10, 20, and 30 yields 60 is testing BigQuery, not your code.

The boundary between “testing the warehouse” and “testing your logic” can be fuzzy. Here’s the rule of thumb: if you replaced your specific values with different values and the test would still be meaningful, you’re testing the warehouse function. If the test only makes sense with your specific values because of the business logic around them, you’re testing your logic.

# Testing the warehouse (don't do this):
unit_tests:
  - name: test_sum_works
    model: mrt__finance__monthly_revenue
    given:
      - input: ref('base__orders')
        rows:
          - {month: '2024-01', revenue: 100}
          - {month: '2024-01', revenue: 200}
    expect:
      rows:
        - {month: '2024-01', total_revenue: 300}

# Testing your logic (do this):
unit_tests:
  - name: test_revenue_excludes_cancelled_refunds
    model: mrt__finance__monthly_revenue
    given:
      - input: ref('base__orders')
        rows:
          - {month: '2024-01', revenue: 100, status: 'completed'}
          - {month: '2024-01', revenue: 200, status: 'cancelled'}
          - {month: '2024-01', revenue: 50, status: 'refunded', refund_type: 'partial'}
    expect:
      rows:
        - {month: '2024-01', total_revenue: 50}  # 100 - 50 partial refund
The first test proves SUM works. The second test proves your filtering and refund logic produces the correct revenue figure. The second test has value because the business rules around what counts as revenue are complex and could plausibly be wrong.

What to do instead: If the model contains no decisions or branches that could plausibly be wrong — if it is a straightforward application of warehouse functions to clean inputs — skip the unit test. Use data tests on the output instead: not_null on the aggregated column, expect_column_values_to_be_between for sanity bounds, unique on the grain.
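As a sketch, the monthly revenue model from above covered with data tests instead of a unit test:

models:
  - name: mrt__finance__monthly_revenue
    columns:
      - name: month
        data_tests:
          - unique    # one row per month: the grain
          - not_null
      - name: total_revenue
        data_tests:
          - not_null
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0  # sanity bound: revenue should not be negative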