
Dataform Testing Limitations

Dataform's built-in assertions cover three scenarios — uniqueness, null checks, and row conditions. Everything else requires custom implementation.

Tags: dataform, bigquery, dbt, testing, data quality

Testing is the most significant capability gap between Dataform and dbt. Dataform provides three types of built-in assertions. dbt’s ecosystem provides dozens of test types across multiple packages, plus native unit testing. The difference is structural, not incremental.

What Dataform Provides

Dataform’s assertion system is configured directly in the SQLX config block:

config {
  type: "table",
  assertions: {
    uniqueKey: ["customer_id"],
    nonNull: ["customer_id", "email"],
    rowConditions: ['email LIKE "%@%.%"']
  }
}

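uniqueKey validates that the specified column (or column combination) contains no duplicates. For a single column this is equivalent to dbt’s unique generic test; for a combination of columns, the closest match is dbt-utils’ unique_combination_of_columns.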

nonNull ensures listed columns contain no null values. Equivalent to dbt’s not_null generic test.

rowConditions evaluates arbitrary SQL expressions against every row. Any row where the condition evaluates to false is flagged. This is the most flexible of the three, roughly comparable to dbt-utils’ expression_is_true.

That is the complete built-in testing vocabulary. Three assertion types covering the most basic data quality checks.
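Because rowConditions accepts any boolean SQL expression, simple range and pattern checks can live in the same config block. The following is a minimal sketch rather than a recommended rule set: the source table base_customers, the columns lifetime_value and conversion_rate, and the deliberately loose email pattern are all assumptions made for illustration.

```sql
config {
  type: "table",
  assertions: {
    uniqueKey: ["customer_id"],
    nonNull: ["customer_id", "email"],
    rowConditions: [
      "lifetime_value >= 0",
      "conversion_rate BETWEEN 0 AND 1",
      "REGEXP_CONTAINS(email, r'^[^@ ]+@[^@ ]+[.][^@ ]+$')"
    ]
  }
}

-- Hypothetical source table and columns.
-- The first two row conditions are range checks; the third is a loose
-- pattern check using BigQuery's REGEXP_CONTAINS.
SELECT
  customer_id,
  email,
  lifetime_value,
  conversion_rate
FROM ${ref("base_customers")}
```

Each expression is still handwritten. Nothing here is a named, documented test that another project could pick up and reuse.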

What Dataform Lacks

Compare this to what the dbt testing ecosystem provides:

Referential Integrity

dbt’s relationships test validates that foreign key values exist in their parent table. Dataform has no built-in equivalent. To check referential integrity, you must write a custom assertion file — a separate SQL query that returns rows violating the relationship. For a project with dozens of foreign key relationships, this means dozens of manually maintained assertion files.
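A hand-written equivalent of a relationships test might look like the sketch below. The tables mrt_orders and mrt_customers and the customer_id join key are assumed names. The pattern is a LEFT JOIN that keeps only child rows with no matching parent.

```sql
config {
  type: "assertion",
  schema: "assertions"
}

-- Every mrt_orders.customer_id should exist in mrt_customers.
-- Any row returned here is an orphaned foreign key and fails the assertion.
SELECT
  o.order_id,
  o.customer_id
FROM ${ref("mrt_orders")} AS o
LEFT JOIN ${ref("mrt_customers")} AS c
  ON o.customer_id = c.customer_id
WHERE o.customer_id IS NOT NULL  -- null foreign keys are skipped, as dbt's relationships test does
  AND c.customer_id IS NULL
```

One file of this shape is needed per relationship, with the table and column names edited by hand each time.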

Statistical and Pattern Validation

The dbt_expectations package provides 50+ tests modeled on the Great Expectations Python library:

  • expect_column_values_to_be_between — range validation catching impossible values like negative revenue or conversion rates above 1.0
  • expect_column_values_to_match_regex — pattern validation for emails, SKUs, phone numbers
  • expect_column_mean_to_be_between — distribution shift detection where individual values pass but aggregates signal problems
  • expect_row_values_to_have_recent_data — freshness checks on any model

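None of these have named Dataform equivalents. Row-level checks such as ranges and regex matches can be squeezed into rowConditions, but you write the expression yourself each time. Aggregate checks such as mean bounds or freshness must be written from scratch as custom assertions.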
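As an illustration, a hand-rolled stand-in for expect_column_mean_to_be_between could be written as a custom assertion. The bounds of 20 and 200 and the use of final_total on mrt__finance__orders are assumptions chosen for the example, not values from a real project.

```sql
config {
  type: "assertion",
  schema: "assertions"
}

-- Returns one row (and therefore fails) when the average order total
-- falls outside the assumed band, or when the table is empty.
SELECT avg_final_total
FROM (
  SELECT AVG(final_total) AS avg_final_total
  FROM ${ref("mrt__finance__orders")}
)
WHERE avg_final_total IS NULL
  OR avg_final_total NOT BETWEEN 20 AND 200
```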

Anomaly Detection

Elementary provides adaptive anomaly detection using historical patterns rather than static thresholds:

  • volume_anomalies — row count deviation from historical patterns
  • freshness_anomalies — adaptive monitoring of update frequency
  • column_anomalies — statistical tracking of column-level metrics
  • schema_changes — unexpected column additions, deletions, or type changes

This entire category of “unknown unknowns” testing does not exist in the Dataform world.
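To give a sense of the effort involved, a crude approximation of a volume check can be hand-written as a custom assertion. The sketch below is not how Elementary works internally. It is a plain three-sigma rule over a trailing 30-day window, and the order_date column, the window length, and the threshold are all assumptions.

```sql
config {
  type: "assertion",
  schema: "assertions"
}

-- Daily row counts for the trailing 30 days (order_date is a hypothetical DATE column).
WITH daily_counts AS (
  SELECT
    order_date,
    COUNT(*) AS row_count
  FROM ${ref("mrt__finance__orders")}
  WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    AND order_date < CURRENT_DATE()
  GROUP BY order_date
),

-- Baseline statistics, excluding the day under test (yesterday).
baseline AS (
  SELECT
    AVG(row_count) AS mean_count,
    STDDEV(row_count) AS stddev_count
  FROM daily_counts
  WHERE order_date < DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
)

-- Fails when yesterday's volume is more than three standard deviations from the baseline.
SELECT
  d.order_date,
  d.row_count,
  b.mean_count,
  b.stddev_count
FROM daily_counts AS d
CROSS JOIN baseline AS b
WHERE d.order_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  AND ABS(d.row_count - b.mean_count) > 3 * b.stddev_count
```

A day with no rows at all produces no entry in daily_counts, so this sketch passes silently in exactly the case that matters most. Handling that, along with seasonality, is further hand-written work.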

Unit Testing

dbt 1.8 introduced native unit tests that validate transformation logic against mocked inputs instead of production data:

unit_tests:
  - name: test_discount_calculation
    model: mrt__finance__orders
    given:
      - input: ref('base__shopify__orders')
        rows:
          - {order_id: 1, subtotal: 100, discount_code: "SAVE20"}
    expect:
      rows:
        - {order_id: 1, discount_amount: 20, final_total: 80}

Unit tests answer “is my SQL correct?” rather than “is my data healthy?” Dataform has no equivalent mechanism. You cannot test transformation logic in isolation without running it against real data.

Custom Assertions in Dataform

Dataform’s escape hatch is the custom assertion file. You create a .sqlx file with type: "assertion" that returns rows violating a condition:

config {
  type: "assertion",
  schema: "assertions"
}

-- Fails if any customer has negative lifetime value
SELECT customer_id, lifetime_value
FROM ${ref("mrt_customers")}
WHERE lifetime_value < 0

If the query returns rows, the assertion fails. This is equivalent to dbt’s singular tests. It works, but every assertion is a separate file with handwritten SQL. There is no built-in parameterization and no test library to draw from; any reuse has to come from JavaScript helpers you write and maintain yourself.

The Devoteam dataform-assertions package is one of the few third-party options available, providing some reusable assertion patterns. But it is a single community package set against dbt’s ecosystem of hundreds of packages.

The Practical Impact

For projects with basic testing needs — primary key uniqueness, null checks, a handful of business rules — Dataform’s assertions are adequate. Many small projects genuinely do not need statistical distribution tests or anomaly detection.

The gap becomes painful as complexity grows. A project with 100+ models, dozens of foreign key relationships, and business-critical data quality requirements will spend significant engineering time building custom assertions that dbt teams get out of the box. That engineering time is a real cost that offsets Dataform’s zero licensing price.

The testing gap is self-reinforcing. Teams that lack easy testing tools tend to write fewer tests. The friction of writing custom assertions for every new test type means many valid tests never get written. In dbt, adding a relationships test is a few lines of YAML. In Dataform, it requires a new file with a custom query. The difference in friction determines the difference in coverage.