The migration from dbt to Dataform follows a predictable sequence, but the effort is not uniformly distributed. Steps 1 and 2 — auditing and running the automated tool — are fast. Step 3 — converting macros to JavaScript includes — is where projects slow down or stall entirely. Steps 4 through 6 are mechanical but require careful execution.
Before starting, verify that migration makes sense for your project. Check whether you rely on features that have no Dataform equivalent: snapshots, heavy package usage, microbatch incremental strategy, or Slim CI. If any of these apply significantly, migration may cost more than it saves.
Step 1: Audit Your Project
Before touching any code, understand exactly what you have. Surprises during migration are expensive.
```bash
# Count models by type
find models -name "*.sql" | wc -l

# List macro files
find macros -name "*.sql"

# Check packages.yml for dependencies
cat packages.yml
```

Document:
- Total model count — scales the overall effort estimate
- Number of custom macros — the primary cost driver; budget 40-60% of total migration time if you have 20+ custom macros
- External packages used — each package dependency is a conversion task; dbt_utils is manageable, dbt_expectations is significant, source-specific packages (dbt_ga4, dbt_shopify) are substantial
- Incremental model strategies — standard `updated_at > last_run` logic is low complexity; anything involving microbatch or complex merge logic needs manual work
- Snapshot tables — each snapshot needs a manual SCD2 implementation in Dataform
- CI/CD setup — if you’re relying on dbt Cloud’s Slim CI, budget time to rebuild CI automation
This audit determines whether you’re looking at a 1-2 week project or a 2-3 month one. Don’t skip it.
Step 2: Run the Automated Migration Tool
The ra_dbt_to_dataform tool handles the mechanical parts of conversion:
```bash
# Clone the migration tool
git clone https://github.com/rittmananalytics/ra_dbt_to_dataform.git
cd ra_dbt_to_dataform

# Install dependencies
pip install -r requirements.txt

# Run migration
python migrate.py --dbt-project /path/to/dbt --output /path/to/dataform
```

The tool converts:
- Model references — `{{ ref('model') }}` to `${ref("model")}`
- Source declarations — generating Dataform declaration files from your `sources.yml`
- Common `dbt_utils` functions — `surrogate_key`, `star`, and a subset of date utilities
- Basic incremental logic — `{% if is_incremental() %}` to `${when(incremental(), ...)}`
It won’t handle:
- Seeds (CSV files become BigQuery tables + declaration files, requiring manual work)
- Snapshots (no equivalent; manual SCD2 implementation required)
- Complex custom macros (the tool uses GPT-4 for macro translation — an acknowledgment that Jinja-to-JavaScript conversion is too irregular for deterministic tooling)
- Semantic layer definitions
Treat the tool output as a starting point, not a finished product. Plan to review every converted file. The tool handles the structural conversion reliably but can produce JavaScript include files that compile but behave differently from the original Jinja. Numeric precision, null handling, and string coercion behave differently between the two templating systems — see JavaScript vs Jinja in Analytics Engineering for the specifics.
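One concrete class of divergence worth checking in every reviewed include: JavaScript template literals silently stringify `undefined` and `null` into the generated SQL, whereas Jinja typically renders an undefined variable as an empty string or raises, depending on configuration. A minimal sketch (function names are illustrative, not tool output):

```javascript
// A converted include can "compile" while emitting broken SQL: interpolating
// an undefined value into a template literal produces the literal text
// "undefined" in the query.
function coalesceToZero(column) {
  return `COALESCE(${column}, 0)`;
}

// Defensive variant: fail at compile time instead of emitting bad SQL.
function coalesceToZeroStrict(column) {
  if (column === undefined || column === null) {
    throw new Error("coalesceToZero: column is required");
  }
  return `COALESCE(${column}, 0)`;
}

module.exports = { coalesceToZero, coalesceToZeroStrict };
```

Adding guards like this during review is cheaper than debugging `undefined` appearing in BigQuery error messages later.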
Step 3: Convert Macros to JavaScript Includes
This step dominates the migration timeline for any non-trivial project. Every custom Jinja macro becomes a JavaScript function in Dataform’s includes/ directory.
The conversion pattern is consistent for utility functions:
dbt macro (`macros/generate_surrogate_key.sql`):

```sql
{% macro generate_surrogate_key(field_list) %}
  TO_HEX(MD5(CONCAT(
    {% for field in field_list %}
      COALESCE(CAST({{ field }} AS STRING), '')
      {% if not loop.last %}, '|', {% endif %}
    {% endfor %}
  )))
{% endmacro %}
```

Dataform JavaScript (`includes/utils.js`):

```javascript
function generateSurrogateKey(fields) {
  const fieldExpressions = fields
    .map(f => `COALESCE(CAST(${f} AS STRING), '')`)
    .join(", '|', ");
  return `TO_HEX(MD5(CONCAT(${fieldExpressions})))`;
}

module.exports = { generateSurrogateKey };
```

Usage in SQLX:
```sqlx
config { type: "table" }

js {
  const { generateSurrogateKey } = require("includes/utils");
}

SELECT
  ${generateSurrogateKey(["customer_id", "order__created_at"])} AS surrogate_key,
  customer_id,
  order__created_at,
  order__total_usd
FROM ${ref("base__source__orders")}
```

Simple utility functions like this translate cleanly. The painful cases:
- Macros that call `run_query()` — Jinja macros can execute SQL at compile time. Dataform has no equivalent; the logic must be restructured.
- Macros using dbt context objects — `this`, `target`, `model`, and `graph` have no direct Dataform equivalents.
- Package macros — Any macro from dbt_utils, dbt_expectations, or other packages must be reimplemented or replaced with custom assertions.
- JavaScript methods with no Jinja equivalent — The reverse problem: Jinja has `for` loops, `if` blocks, `set`, and `namespace`, but no `.map()`, `.filter()`, or `.reduce()`. If your macros use complex array or object manipulation, you rewrite algorithms, not just syntax.
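To make the last point concrete: a Jinja macro that loops over a list (often with a `namespace` accumulator) to build pivot columns collapses to a plain `.map().join()` in JavaScript. A hypothetical example (the function, column, and status names are illustrative):

```javascript
// Hypothetical include: generate one SUM(CASE ...) pivot column per status
// value -- the kind of macro that needs a for loop in Jinja but is a
// one-liner with map/join in JavaScript.
function pivotByStatus(statuses) {
  return statuses
    .map(
      (s) =>
        `SUM(CASE WHEN order__status = '${s}' THEN 1 ELSE 0 END) AS ${s}_count`
    )
    .join(",\n  ");
}

module.exports = { pivotByStatus };
```

The syntax translation here is easy; the budget risk is in macros where the Jinja version encodes an algorithm (nested loops, mutable state via `namespace`) that has to be rethought, not transliterated.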
Budget conservatively. A project with 20 custom macros and several package dependencies can spend most of its migration timeline on this step alone.
Step 4: Recreate Tests as Assertions
Dataform’s built-in assertions handle three cases inline in the config block:
```sqlx
config {
  type: "table",
  assertions: {
    uniqueKey: ["order_id"],
    nonNull: ["order_id", "customer_id", "order__created_at"],
    rowConditions: [
      "order__total_usd >= 0",
      "order__created_at <= CURRENT_DATE()"
    ]
  }
}
```

This covers uniqueness, nulls, and simple row-level conditions — equivalent to dbt’s `unique`, `not_null`, and `expression_is_true` tests.
For anything beyond these three types — referential integrity, regex pattern matching, distribution checks, cross-table comparisons — you need separate assertion files:
```sqlx
-- definitions/assertions/assert_valid_customer_emails.sqlx
config { type: "assertion" }

SELECT
  customer_id,
  customer__email
FROM ${ref("mrt__marketing__customers")}
WHERE customer__email NOT LIKE '%@%.%'
   OR customer__email IS NULL
```

If the query returns rows, the assertion fails. This is Dataform’s equivalent of dbt’s singular tests. It works, but every non-standard test is a new file with handwritten SQL. There’s no parameterization or test library to draw from.
Tests from dbt_expectations that need custom assertion files include:
- Distribution checks (mean/median within range)
- Regex pattern validation
- Cross-table comparisons (referential integrity)
- Freshness checks beyond what’s in the assertions block
See Dataform Testing Limitations for the full scope of what you lose on the testing side.
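One way to claw back some parameterization is a small test library of your own in `includes/`. A sketch under that assumption (the helper name is hypothetical, not a Dataform API):

```javascript
// Hypothetical include (includes/test_lib.js): returns assertion SQL that
// selects rows where a column is null or fails a regex. Returned rows mean
// failure, matching Dataform's assertion semantics.
function assertMatchesPattern(table, column, pattern) {
  return `
SELECT ${column}
FROM ${table}
WHERE ${column} IS NULL
   OR NOT REGEXP_CONTAINS(${column}, r'${pattern}')`.trim();
}

module.exports = { assertMatchesPattern };
```

Each new pattern check then becomes a one-line assertion file body instead of handwritten SQL, which recovers a little of what dbt_expectations gave you for free.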
Step 5: Set Up Orchestration
dbt Cloud has built-in scheduling. Dataform does not — or rather, it has basic scheduling via workflow configurations, but triggering from git events, building PR environments, and running on custom schedules requires external orchestration.
Two GCP-native options:
Cloud Composer (managed Airflow) — For complex pipelines with conditional branching, cross-system dependencies, or advanced retry logic:
```python
from airflow.providers.google.cloud.operators.dataform import (
    DataformCreateCompilationResultOperator,
    DataformCreateWorkflowInvocationOperator,
)

compile_task = DataformCreateCompilationResultOperator(
    task_id="compile",
    project_id="my-project",
    region="us-central1",
    repository_id="my-repo",
)

run_task = DataformCreateWorkflowInvocationOperator(
    task_id="run",
    project_id="my-project",
    region="us-central1",
    repository_id="my-repo",
    compilation_result="{{ task_instance.xcom_pull('compile') }}",
)
```

Composer runs $300-400/month minimum. See Cloud Composer Cost and Capabilities for when that cost is justified.
Cloud Scheduler + Cloud Workflows — For simpler scheduling needs without Composer’s overhead:
```yaml
main:
  steps:
    - compile:
        call: http.post
        args:
          url: https://dataform.googleapis.com/v1beta1/projects/PROJECT/locations/REGION/repositories/REPO/compilationResults
          auth:
            type: OAuth2
    - run:
        call: http.post
        args:
          url: https://dataform.googleapis.com/v1beta1/projects/PROJECT/locations/REGION/repositories/REPO/workflowInvocations
          auth:
            type: OAuth2
```

Cloud Workflows Orchestration costs $0.01 per 1,000 steps — essentially free for typical transformation pipelines. Cloud Scheduler handles the cron trigger. This combination covers most Dataform scheduling needs without Composer’s fixed cost.
If you need CI-like behavior (build only changed models on PRs), you’ll need GitHub Actions or Cloud Build calling the Dataform REST API. Budget additional setup time for that infrastructure.
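If you take the GitHub Actions or Cloud Build route, the API call itself is small. A hedged sketch (project, region, repository, and token handling are placeholders; assumes Node 18+ for the global `fetch`):

```javascript
// Hypothetical CI step: trigger a Dataform workflow invocation via the REST
// API. All identifiers and the token source are placeholders.
const BASE = "https://dataform.googleapis.com/v1beta1";

function invocationUrl(project, region, repo) {
  return `${BASE}/projects/${project}/locations/${region}/repositories/${repo}/workflowInvocations`;
}

async function triggerRun(project, region, repo, compilationResult, token) {
  const res = await fetch(invocationUrl(project, region, repo), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    // References a compilation result created by an earlier API call.
    body: JSON.stringify({ compilationResult }),
  });
  if (!res.ok) throw new Error(`Dataform API error: ${res.status}`);
  return res.json();
}

module.exports = { invocationUrl, triggerRun };
```

The harder part of "build only changed models on PRs" is not this call but computing the changed-model selection yourself, since there is no Slim CI equivalent doing it for you.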
Step 6: Parallel Run and Validate
Don’t cut over immediately. Run both systems simultaneously until you’re confident the outputs match.
- Deploy the Dataform project to a separate dataset (e.g., `analytics_dataform`)
- Schedule both dbt and Dataform to run on the same cadence
- Compare outputs using row counts and checksums
A practical validation query:
```sql
SELECT
  'dbt' AS source,
  COUNT(*) AS row_count,
  FARM_FINGERPRINT(TO_JSON_STRING(ARRAY_AGG(t ORDER BY order_id))) AS checksum
FROM `analytics.mrt__sales__orders` t
UNION ALL
SELECT
  'dataform' AS source,
  COUNT(*) AS row_count,
  FARM_FINGERPRINT(TO_JSON_STRING(ARRAY_AGG(t ORDER BY order_id))) AS checksum
FROM `analytics_dataform.mrt__sales__orders` t
```

- Validate downstream dashboards against both sources
- Monitor for 2-4 weeks before decommissioning dbt
The dbt Migration Validation Patterns note covers this step in depth — comparison query levels (row counts, aggregates, row-level EXCEPT queries), ML pipeline regression testing, and phased cutover strategy. Read that before declaring the migration complete.
The most common surprises during parallel running: incremental model state doesn’t transfer (the first Dataform run must be a full refresh on any incremental model, which can be expensive for large tables), and subtle behavioral differences in JavaScript vs Jinja arithmetic that produce slightly different results on calculated fields.