Validating a dbt migration requires proving the new system produces the same results as the old one. The recommended approach is parallel execution — running both systems against the same source data — with three levels of comparison per model. The goal is identical outputs, provably.
Parallel Execution Setup
Keep your existing system (Dataform, legacy dbt project, or whatever you’re migrating from) running while building out the replacement. Configure the new dbt project to write to a separate dataset so both systems produce outputs from the same source data:
```yaml
models:
  my_project:
    +schema: dbt_migration_validation
```

This creates a mirror of your production outputs in a validation dataset. Both systems read the same sources. Both produce the same models. The only difference should be the dataset name.
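One caveat: by default, dbt appends a custom schema to the target schema, so the dataset may come out as something like `analytics_dbt_migration_validation` rather than the literal name. If you want the configured name used verbatim, dbt's documentation shows an override of the `generate_schema_name` macro; a sketch of that pattern:

```sql
-- macros/generate_schema_name.sql
-- Use the configured schema name verbatim instead of appending it to target.schema.
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```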
Run both systems on the same schedule for at least two weeks. One week catches most issues. Two weeks catches edge cases around weekly patterns, month boundaries, and weekend data that behaves differently from weekday data.
Comparison Queries
For each migrated model, run three levels of validation.
Level 1: Row Counts
The most basic check. If row counts differ, something is fundamentally wrong.
```sql
SELECT 'old_system' AS source, COUNT(*) AS row_count
FROM old_dataset.model_name
UNION ALL
SELECT 'new_dbt' AS source, COUNT(*) AS row_count
FROM dbt_migration_validation.model_name;
```

Row count mismatches usually trace back to:
- Different deduplication logic (especially in incremental models)
- Filter conditions that translated incorrectly
- Source declarations pointing to slightly different tables
- Timezone handling differences that shift records across date boundaries
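When totals differ, a per-day breakdown usually localizes the cause faster than staring at a single number. A sketch, assuming the model has a `created_at` timestamp (column and table names are illustrative):

```sql
-- Compare daily row counts; the mismatched days point at the offending logic.
SELECT
  COALESCE(old_run.day, new_run.day) AS day,
  old_run.row_count AS old_count,
  new_run.row_count AS new_count
FROM (
  SELECT DATE(created_at) AS day, COUNT(*) AS row_count
  FROM old_dataset.model_name
  GROUP BY day
) AS old_run
FULL OUTER JOIN (
  SELECT DATE(created_at) AS day, COUNT(*) AS row_count
  FROM dbt_migration_validation.model_name
  GROUP BY day
) AS new_run
  ON old_run.day = new_run.day
-- Pass an explicit zone, e.g. DATE(created_at, 'America/New_York'),
-- if timezone boundary shifts are the suspect.
WHERE old_run.row_count IS DISTINCT FROM new_run.row_count
ORDER BY 1;
```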
Level 2: Aggregate Metrics
Row counts match, but do the numbers add up? Compare key business metrics:
```sql
SELECT
  ABS(old_run.total_revenue - new_run.total_revenue)     AS revenue_diff,
  ABS(old_run.order_count - new_run.order_count)         AS order_diff,
  ABS(old_run.avg_order_value - new_run.avg_order_value) AS aov_diff
FROM (
  SELECT
    SUM(revenue) AS total_revenue,
    COUNT(*)     AS order_count,
    AVG(revenue) AS avg_order_value
  FROM old_dataset.orders
) AS old_run
CROSS JOIN (
  SELECT
    SUM(revenue) AS total_revenue,
    COUNT(*)     AS order_count,
    AVG(revenue) AS avg_order_value
  FROM dbt_migration_validation.orders
) AS new_run;
```

Aggregate differences without row count differences point to:
- Numeric precision changes (especially relevant when converting JavaScript arithmetic to Jinja)
- NULL handling differences in SUM/AVG calculations (see the example after this list)
- Rounding applied at different stages
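The NULL case is worth internalizing because SQL's behavior is easy to forget: SUM and AVG skip NULLs entirely, so a system that coalesces NULLs to zero before aggregating will disagree. A minimal illustration:

```sql
-- AVG divides by the count of non-NULL values, not the row count.
WITH t AS (
  SELECT revenue FROM UNNEST([10.0, NULL, 20.0]) AS revenue
)
SELECT
  AVG(revenue)              AS avg_skipping_nulls,  -- 15.0
  AVG(COALESCE(revenue, 0)) AS avg_counting_nulls   -- 10.0
FROM t;
```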
Level 3: Row-Level Comparison
The most thorough check. Use dbt-audit-helper or manual EXCEPT queries to find exact row-level differences:
```sql
-- Find rows in old that aren't in new
SELECT * FROM old_dataset.model_name
EXCEPT DISTINCT
SELECT * FROM dbt_migration_validation.model_name;
```
```sql
-- Find rows in new that aren't in old
SELECT * FROM dbt_migration_validation.model_name
EXCEPT DISTINCT
SELECT * FROM old_dataset.model_name;
```

For tables with many columns, EXCEPT DISTINCT is expensive but definitive. For targeted comparison, pick the primary key plus the columns most likely to diverge (calculated fields, aggregations, timestamps).
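The targeted version joins on the primary key and flags only the suspect columns. A sketch, assuming `order_id` is the key and `revenue` is the column under suspicion:

```sql
-- Flag rows where a suspect column diverges (NULL-safe comparison).
SELECT
  COALESCE(old_run.order_id, new_run.order_id) AS order_id,
  old_run.revenue AS old_revenue,
  new_run.revenue AS new_revenue
FROM old_dataset.orders AS old_run
FULL OUTER JOIN dbt_migration_validation.orders AS new_run
  ON old_run.order_id = new_run.order_id
WHERE old_run.revenue IS DISTINCT FROM new_run.revenue;
```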
The dbt-audit-helper package provides compare_relations and compare_column_values macros that automate this pattern and produce summary statistics.
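A sketch of the compare_relations pattern, assuming the package is installed via packages.yml (table and key names here are illustrative):

```sql
-- analyses/compare_orders.sql
{% set old_relation = adapter.get_relation(
      database=target.database,
      schema="old_dataset",
      identifier="orders"
) %}

{{ audit_helper.compare_relations(
      a_relation=old_relation,
      b_relation=ref("orders"),
      primary_key="order_id"
) }}
```

Run it as an analysis; the output summarizes how many rows match exactly, appear in only one relation, or share a key but differ in values.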
ML Pipeline Regression Testing
If machine learning models consume your transformation outputs, standard row-level comparison is necessary but not sufficient. ML models are sensitive to changes that look trivial in a data comparison.
The cautionary tale: one team found that JavaScript and Jinja template behavior differed in ways that broke fraud detection model retraining. Numerical precision, null handling, and timestamp formatting diverged in subtle ways that didn’t show up in aggregate comparisons but shifted feature distributions enough to degrade model performance.
Run the ML training pipeline against dbt outputs before completing the migration. Compare:
- Model performance metrics (accuracy, precision, recall, AUC) between training on old and new data
- Feature distribution statistics (mean, median, standard deviation, percentiles) for each input feature (a query sketch follows this list)
- Prediction distributions on a held-out test set
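For the feature-level comparison, the same UNION ALL pattern from the row-count check scales up to distribution statistics. A sketch for one feature, with `ml_features` and `feature_value` as illustrative names:

```sql
-- Distribution statistics per system; diff the two output rows.
SELECT
  'old_system' AS source,
  AVG(feature_value)    AS mean,
  STDDEV(feature_value) AS std_dev,
  APPROX_QUANTILES(feature_value, 100)[OFFSET(50)] AS median,
  APPROX_QUANTILES(feature_value, 100)[OFFSET(99)] AS p99
FROM old_dataset.ml_features
UNION ALL
SELECT
  'new_dbt',
  AVG(feature_value),
  STDDEV(feature_value),
  APPROX_QUANTILES(feature_value, 100)[OFFSET(50)],
  APPROX_QUANTILES(feature_value, 100)[OFFSET(99)]
FROM dbt_migration_validation.ml_features;
```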
If performance metrics shift by more than your acceptable tolerance (typically 1-2% for most production models), investigate the feature-level differences to find the root cause.
This testing adds time to the migration. It also prevents the scenario where you declare the migration complete, retire the old system, and discover three weeks later that model performance degraded because a generated WHERE clause threshold shifted by 0.0001.
Phased Cutover Strategy
Don’t migrate everything at once. Work in phases:
Phase 1: Base models. Convert source declarations and base models first. These are mechanical conversions with clear validation criteria. Run comparisons. Fix discrepancies.
Phase 2: Intermediate models. Convert joins and business logic. This is where most bugs hide. Validate aggregate metrics carefully.
Phase 3: Mart models. Convert the consumer-facing tables last. These are what stakeholders actually see, so differences here are the most visible.
Phase 4: Cutover. Point downstream consumers (BI tools, reverse ETL, ML pipelines) at the new datasets one at a time. Keep the old system running for at least one more week as a rollback option.
Each phase should have explicit sign-off criteria: row counts match, aggregate metrics match within tolerance, and no row-level differences beyond expected rounding.
When “Close Enough” Is the Right Answer
Perfect equivalence is sometimes impossible. Floating-point arithmetic, timezone handling, and NULL propagation can all leave both systems technically correct while still producing slightly different outputs.
Document these known differences. If the old system rounds at step 3 and the new system rounds at step 5, the final values may differ by a fraction of a cent. That’s acceptable if stakeholders agree. What’s not acceptable is undocumented differences that surface as “the numbers changed” after cutover.
Create a migration validation log — a document listing every model, its validation status, any known differences, and the sign-off from the model’s owner. This becomes your audit trail and your rollback reference if something surfaces post-migration.
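One possible shape for the log (columns and entries are illustrative):

| Model | Row counts | Aggregates | Row-level | Known differences | Sign-off |
|-------|------------|------------|-----------|-------------------|----------|
| stg_orders | pass | pass | pass | none | owner, date |
| fct_revenue | pass | pass (within 0.01%) | minor diffs | rounding moved from step 3 to step 5 | owner, date |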