Validating a dbt migration requires proving the new system produces the same results as the old one. The recommended approach is parallel execution — running both systems against the same source data — with three levels of comparison per model. The goal is identical outputs, provably.
Parallel Execution Setup
Keep your existing system (Dataform, legacy dbt project, or whatever you’re migrating from) running while building out the replacement. Configure the new dbt project to write to a separate dataset so both systems produce outputs from the same source data:
```yaml
models:
  my_project:
    +schema: dbt_migration_validation
```

This creates a mirror of your production outputs in a validation dataset. Both systems read the same sources. Both produce the same models. The only difference should be the dataset name.
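One caveat: by default, dbt appends a custom schema to the target schema, so the dataset may come out as something like `analytics_dbt_migration_validation` rather than the literal name. If you want the configured name used verbatim, dbt's documentation shows an override of the `generate_schema_name` macro; a sketch of that pattern:

```sql
-- macros/generate_schema_name.sql
-- Use the configured schema name verbatim instead of appending it to target.schema.
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```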
Run both systems on the same schedule for at least two weeks. One week catches most issues. Two weeks catches edge cases around weekly patterns, month boundaries, and weekend data that behaves differently from weekday data.
Comparison Queries
For each migrated model, run three levels of validation.
Level 1: Row Counts
The most basic check. If row counts differ, something is fundamentally wrong.
```sql
SELECT 'old_system' AS source, COUNT(*) AS row_count
FROM old_dataset.model_name
UNION ALL
SELECT 'new_dbt' AS source, COUNT(*) AS row_count
FROM dbt_migration_validation.model_name;
```

Row count mismatches usually trace back to:
- Different deduplication logic (especially in incremental models)
- Filter conditions that translated incorrectly
- Source declarations pointing to slightly different tables
- Timezone handling differences that shift records across date boundaries
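When totals differ, a per-day breakdown usually localizes the cause faster than staring at a single number. A sketch, assuming the model has a `created_at` timestamp (column and table names are illustrative):

```sql
-- Compare daily row counts; the mismatched days point at the offending logic.
SELECT
  COALESCE(old_run.day, new_run.day) AS day,
  old_run.row_count AS old_count,
  new_run.row_count AS new_count
FROM (
  SELECT DATE(created_at) AS day, COUNT(*) AS row_count
  FROM old_dataset.model_name
  GROUP BY day
) AS old_run
FULL OUTER JOIN (
  SELECT DATE(created_at) AS day, COUNT(*) AS row_count
  FROM dbt_migration_validation.model_name
  GROUP BY day
) AS new_run
  ON old_run.day = new_run.day
-- Pass an explicit zone, e.g. DATE(created_at, 'America/New_York'),
-- if timezone boundary shifts are the suspect.
WHERE old_run.row_count IS DISTINCT FROM new_run.row_count
ORDER BY 1;
```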
Level 2: Aggregate Metrics
Row counts match, but do the numbers add up? Compare key business metrics:
```sql
SELECT
  ABS(old_run.total_revenue - new_run.total_revenue)     AS revenue_diff,
  ABS(old_run.order_count - new_run.order_count)         AS order_diff,
  ABS(old_run.avg_order_value - new_run.avg_order_value) AS aov_diff
FROM (
  SELECT
    SUM(revenue) AS total_revenue,
    COUNT(*)     AS order_count,
    AVG(revenue) AS avg_order_value
  FROM old_dataset.orders
) AS old_run
CROSS JOIN (
  SELECT
    SUM(revenue) AS total_revenue,
    COUNT(*)     AS order_count,
    AVG(revenue) AS avg_order_value
  FROM dbt_migration_validation.orders
) AS new_run;
```

Aggregate differences without row count differences point to:
- Numeric precision changes (especially relevant when converting JavaScript arithmetic to Jinja)
- NULL handling differences in SUM/AVG calculations (see the example after this list)
- Rounding applied at different stages
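The NULL case is worth internalizing because SQL's behavior is easy to forget: SUM and AVG skip NULLs entirely, so a system that coalesces NULLs to zero before aggregating will disagree. A minimal illustration:

```sql
-- AVG divides by the count of non-NULL values, not the row count.
WITH t AS (
  SELECT revenue FROM UNNEST([10.0, NULL, 20.0]) AS revenue
)
SELECT
  AVG(revenue)              AS avg_skipping_nulls,  -- 15.0
  AVG(COALESCE(revenue, 0)) AS avg_counting_nulls   -- 10.0
FROM t;
```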
Level 3: Row-Level Comparison
The most thorough check. Use dbt-audit-helper or manual EXCEPT queries to find exact row-level differences:
```sql
-- Find rows in old that aren't in new
SELECT * FROM old_dataset.model_name
EXCEPT DISTINCT
SELECT * FROM dbt_migration_validation.model_name;
```
```sql
-- Find rows in new that aren't in old
SELECT * FROM dbt_migration_validation.model_name
EXCEPT DISTINCT
SELECT * FROM old_dataset.model_name;
```

For tables with many columns, EXCEPT DISTINCT is expensive but definitive. For targeted comparison, pick the primary key plus the columns most likely to diverge (calculated fields, aggregations, timestamps).
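The targeted version joins on the primary key and flags only the suspect columns. A sketch, assuming `order_id` is the key and `revenue` is the column under suspicion:

```sql
-- Flag rows where a suspect column diverges (NULL-safe comparison).
SELECT
  COALESCE(old_run.order_id, new_run.order_id) AS order_id,
  old_run.revenue AS old_revenue,
  new_run.revenue AS new_revenue
FROM old_dataset.orders AS old_run
FULL OUTER JOIN dbt_migration_validation.orders AS new_run
  ON old_run.order_id = new_run.order_id
WHERE old_run.revenue IS DISTINCT FROM new_run.revenue;
```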
The dbt-audit-helper package provides compare_relations and compare_column_values macros that automate this pattern and produce summary statistics.
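A sketch of the compare_relations pattern, assuming the package is installed via packages.yml (table and key names here are illustrative):

```sql
-- analyses/compare_orders.sql
{% set old_relation = adapter.get_relation(
      database=target.database,
      schema="old_dataset",
      identifier="orders"
) %}

{{ audit_helper.compare_relations(
      a_relation=old_relation,
      b_relation=ref("orders"),
      primary_key="order_id"
) }}
```

Run it as an analysis; the output summarizes how many rows match exactly, appear in only one relation, or share a key but differ in values.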
ML Pipeline Regression Testing
If machine learning models consume your transformation outputs, standard row-level comparison is necessary but not sufficient. ML models are sensitive to changes that look trivial in a data comparison.
The cautionary tale: one team found that JavaScript and Jinja template behavior differed in ways that broke fraud detection model retraining. Numerical precision, null handling, and timestamp formatting diverged in subtle ways that didn’t show up in aggregate comparisons but shifted feature distributions enough to degrade model performance.
Run the ML training pipeline against dbt outputs before completing the migration. Compare:
- Model performance metrics (accuracy, precision, recall, AUC) between training on old and new data
- Feature distribution statistics (mean, median, standard deviation, percentiles) for each input feature (a query sketch follows this list)
- Prediction distributions on a held-out test set
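For the feature-level comparison, the same UNION ALL pattern from the row-count check scales up to distribution statistics. A sketch for one feature, with `ml_features` and `feature_value` as illustrative names:

```sql
-- Distribution statistics per system; diff the two output rows.
SELECT
  'old_system' AS source,
  AVG(feature_value)    AS mean,
  STDDEV(feature_value) AS std_dev,
  APPROX_QUANTILES(feature_value, 100)[OFFSET(50)] AS median,
  APPROX_QUANTILES(feature_value, 100)[OFFSET(99)] AS p99
FROM old_dataset.ml_features
UNION ALL
SELECT
  'new_dbt',
  AVG(feature_value),
  STDDEV(feature_value),
  APPROX_QUANTILES(feature_value, 100)[OFFSET(50)],
  APPROX_QUANTILES(feature_value, 100)[OFFSET(99)]
FROM dbt_migration_validation.ml_features;
```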
If performance metrics shift by more than your acceptable tolerance (typically 1-2% for most production models), investigate the feature-level differences to find the root cause.
This testing adds time to the migration. It also prevents the scenario where you declare the migration complete, retire the old system, and discover three weeks later that model performance degraded because a generated WHERE clause threshold shifted by 0.0001.
Phased Cutover Strategy
Don’t migrate everything at once. Work in phases:
Phase 1: Base models. Convert source declarations and base models first. These are mechanical conversions with clear validation criteria. Run comparisons. Fix discrepancies.
Phase 2: Intermediate models. Convert joins and business logic. This is where most bugs hide. Validate aggregate metrics carefully.
Phase 3: Mart models. Convert the consumer-facing tables last. These are what stakeholders actually see, so differences here are the most visible.
Phase 4: Cutover. Point downstream consumers (BI tools, reverse ETL, ML pipelines) at the new datasets one at a time. Keep the old system running for at least one more week as a rollback option.
Each phase should have explicit sign-off criteria: row counts match, aggregate metrics match within tolerance, and no row-level differences beyond expected rounding.
When “Close Enough” Is the Right Answer
Perfect equivalence is sometimes impossible. Floating-point arithmetic, timezone handling, and NULL propagation can all leave both systems technically correct while still producing slightly different outputs.
Document these known differences. If the old system rounds at step 3 and the new system rounds at step 5, the final values may differ by a fraction of a cent. That’s acceptable if stakeholders agree. What’s not acceptable is undocumented differences that surface as “the numbers changed” after cutover.
Create a migration validation log — a document listing every model, its validation status, any known differences, and the sign-off from the model’s owner. This becomes your audit trail and your rollback reference if something surfaces post-migration.
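One possible shape for the log (columns and entries are illustrative):

| Model | Row counts | Aggregates | Row-level | Known differences | Sign-off |
|-------|------------|------------|-----------|-------------------|----------|
| stg_orders | pass | pass | pass | none | owner, date |
| fct_revenue | pass | pass (within 0.01%) | minor diffs | rounding moved from step 3 to step 5 | owner, date |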