You’ve been running Dataform on BigQuery, and it’s worked fine. The SQL compiles, the schedules run, and your stakeholders get their data on time. But something has shifted. Maybe your company acquired a Snowflake shop. Maybe you’re tired of implementing testing logic that dbt packages handle out of the box. Maybe you’ve noticed that every analytics engineer job posting lists dbt as a requirement.
Whatever the reason, you’re considering a migration. This guide walks through the process from evaluation to validation, covering both the mechanical conversions and the parts that take real effort.
When migration actually makes sense
Not every Dataform project should migrate. The decision comes down to three questions.
Are you going multi-warehouse? Dataform only works with BigQuery. If your organization is adopting Snowflake, Databricks, or Redshift alongside BigQuery, dbt’s adapter architecture becomes essential. This is the clearest migration signal.
Do you need the ecosystem? dbt’s package hub offers 200+ packages covering everything from data quality testing (dbt_expectations, Elementary) to marketing attribution (Velir’s dbt-ga4) to utility functions (dbt_utils). If you’re manually implementing functionality that exists as a dbt package, migration pays for itself in reduced maintenance.
Does career portability matter? dbt proficiency appears in most analytics engineering job postings. Dataform expertise remains valuable but concentrated in GCP-heavy organizations. If your team members care about resume building, dbt experience has broader applicability.
When to stay put
Migration carries real costs. Stay with Dataform if:
- Your JavaScript includes are complex. Dataform’s ability to generate models programmatically with standard JavaScript has no direct dbt equivalent. Converting sophisticated `.js` files that dynamically create dozens of models requires substantial rewriting.
- ML pipelines depend on templating behavior. One team’s migration took two months, plus three additional weeks fixing issues where JavaScript templating differed from Jinja in ways that broke model retraining. The fraudulent transactions that slipped through during that period cost more than years of licensing fees.
- Your use case is simple. If you’re running basic transformation models without complex incremental logic, Dataform’s free tier on BigQuery makes economic sense.
The break-even calculation: if migration takes three months of engineering time and you’re comparing against dbt Cloud licensing at $100/user/month, you need a 10-person team running for over two years before licensing savings alone justify the switch. Multi-warehouse needs or ecosystem requirements change this math immediately.
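The arithmetic behind that claim can be sketched in a few lines. The loaded engineering cost below is an assumption for illustration, not a figure from the text:

```python
# Rough break-even sketch: one-off migration cost vs. cumulative licensing spend.
# The $100k/year loaded engineering cost is an illustrative assumption.
LOADED_ENG_COST_PER_MONTH = 100_000 / 12

def break_even_months(migration_months: float, team_size: int,
                      price_per_user_per_month: float = 100.0) -> float:
    """Months of dbt Cloud licensing that equal the one-off migration cost."""
    migration_cost = migration_months * LOADED_ENG_COST_PER_MONTH
    monthly_licensing = team_size * price_per_user_per_month
    return migration_cost / monthly_licensing

# Three months of engineering vs. a 10-person team at $100/user/month:
print(round(break_even_months(3, 10)))  # 25 months -- a bit over two years
```

Adjust the loaded cost to your own numbers; the shape of the calculation is what matters.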
Understanding the conceptual translation
Before touching code, understand how Dataform concepts map to dbt. Some translations are direct; others require rethinking your approach.
Direct equivalents
| Dataform | dbt | Notes |
|---|---|---|
| `${ref("model")}` | `{{ ref('model') }}` | Same concept, different syntax |
| `config { type: "table" }` | `{{ config(materialized='table') }}` | Materialization declaration |
| `config { type: "view" }` | `{{ config(materialized='view') }}` | Default in both tools |
| `.sqlx` files | `.sql` files | File extension change |
| `definitions/` folder | `models/` folder | Directory naming |
Conceptual shifts
Sources vs. declarations. Dataform uses declaration files to define source tables. dbt uses YAML source definitions that serve the same purpose but add freshness checks and documentation in a single location.
Dataform declaration:
```js
declare({
  database: "my-project",
  schema: "analytics_123456789",
  name: "events_*"
});
```

dbt source:

```yaml
sources:
  - name: ga4
    database: my-project
    schema: analytics_123456789
    tables:
      - name: events
        identifier: "events_*"
        freshness:
          warn_after: {count: 24, period: hour}
```

Jinja vs. JavaScript templating. Dataform lets you write actual JavaScript (loops, conditionals, functions) anywhere in your project. dbt uses Jinja2 templating, which looks similar but behaves differently. This philosophical gap, explored in my JavaScript vs Jinja comparison, affects every aspect of migration.
Dataform conditional:
```sql
${when(incremental(), `AND updated_at > (SELECT MAX(updated_at) FROM ${self()})`)}
```

dbt conditional:

```sql
{% if is_incremental() %}
  AND updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```

The syntax is different but manageable. The real challenge comes with dynamic model generation, which we’ll cover later.
The gaps you must plan for
Some Dataform features have no direct dbt equivalent:
- Seeds. dbt seeds are CSV files that load as tables. Dataform has no equivalent, so you likely have BigQuery tables serving this purpose that you’ll declare as sources.
- Snapshots. dbt’s snapshot feature for SCD Type 2 tables requires manual implementation in Dataform. If you’ve built custom SCD logic, you’ll convert it to dbt’s simpler snapshot syntax.
- Packages. Dataform has no package ecosystem. Any custom includes you’ve written in JavaScript need conversion to dbt macros or replacement with existing packages.
Setting up your dbt environment for BigQuery
With the conceptual mapping clear, set up your dbt project.
dbt Core vs. dbt Cloud
For BigQuery teams migrating from Dataform, start with dbt Core unless you need Cloud-specific features immediately (see my dbt Core vs Cloud comparison for details). Reasons:
- Zero additional cost during migration (you’re already accustomed to Dataform’s free tier)
- Local development matches your current workflow
- Upgrade to Cloud later if you need the semantic layer, Mesh, or enterprise governance
Install dbt with the BigQuery adapter:
```shell
pip install dbt-bigquery
```

Project initialization
Create a new dbt project:
```shell
dbt init my_project
```

Configure your BigQuery connection in `~/.dbt/profiles.yml`:
```yaml
my_project:
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: dbt_dev
      threads: 4
      location: US
  target: dev
```

Directory structure comparison
Your Dataform project structure maps to dbt like this:
```
# Dataform            # dbt
definitions/          models/
  sources/              base/
  staging/              intermediate/
  reporting/            marts/
includes/             macros/
dataform.json         dbt_project.yml
```

The `models/` directory replaces `definitions/`. Organize by layer (base, intermediate, marts) rather than by source system to match dbt conventions. For a deeper look, see my guide to dbt model layers.
Converting your models step by step
Work through conversion in phases, validating each before moving to the next.
Phase 1: Basic table and view conversions
Start with simple models that have no incremental logic or complex templating. These are mechanical conversions.
Dataform:
```sql
config {
  type: "table",
  schema: "reporting"
}

SELECT
  customer_id,
  SUM(order_total) AS lifetime_value
FROM ${ref("base_orders")}
GROUP BY 1
```

dbt:
```sql
{{ config(
    materialized='table',
    schema='reporting'
) }}

SELECT
  customer_id,
  SUM(order_total) AS customer__lifetime_value
FROM {{ ref('base__shopify__orders') }}
GROUP BY 1
```

The conversion is straightforward: change the config block syntax, swap `${ref()}` for `{{ ref() }}`, and save with a `.sql` extension.
Phase 2: Source declarations
Replace Dataform declarations with dbt source YAML files. Create one source file per source system in your base folder.
```yaml
version: 2

sources:
  - name: ga4_raw
    database: "{{ var('ga4_project') }}"
    schema: "{{ var('ga4_dataset') }}"
    tables:
      - name: events
        identifier: "events_*"
        description: "Raw GA4 event export"
```

In your models, replace source references:
```sql
-- Dataform: ${ref("analytics_123456789", "events_*")}
-- dbt:
SELECT
  event_date,
  event_timestamp,
  event_name,
  user_pseudo_id
FROM {{ source('ga4_raw', 'events') }}
```

Define the variables in `dbt_project.yml`:
```yaml
vars:
  ga4_project: my-gcp-project
  ga4_dataset: analytics_123456789
```

Phase 3: Incremental models
Dataform’s incremental syntax translates to dbt’s is_incremental() macro (covered in depth in my dbt incremental models guide):
Dataform:
```sql
config {
  type: "incremental",
  uniqueKey: ["event_id"],
  updatePartitionFilter: "event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)"
}

SELECT
  event_id,
  event_date,
  event_name,
  user_pseudo_id
FROM ${ref("base_events")}
${when(incremental(), `WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)`)}
```

dbt:
```sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    partition_by={
      'field': 'event_date',
      'data_type': 'date'
    },
    incremental_strategy='merge'
) }}

SELECT
  event_id,
  event_date,
  event_name,
  user_pseudo_id
FROM {{ ref('base__ga4__events') }}

{% if is_incremental() %}
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
{% endif %}
```

Key differences:
- dbt requires explicit `partition_by` configuration for BigQuery
- The `incremental_strategy` defaults to `merge` for BigQuery, but making it explicit improves readability
- `updatePartitionFilter` becomes part of your `WHERE` clause logic
Phase 4: Testing migration
Dataform’s inline assertions become dbt’s YAML-based tests.
Dataform:
```sql
config {
  type: "table",
  assertions: {
    uniqueKey: ["customer_id"],
    nonNull: ["customer_id", "email"],
    rowConditions: ['email LIKE "%@%.%"']
  }
}
```

dbt:
```yaml
version: 2

models:
  - name: mrt__sales__customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: customer__email
        tests:
          - not_null
```

For custom row conditions, use the dbt_expectations package:
```yaml
      - name: customer__email
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "^.+@.+\\..+$"
```

Install packages by adding to `packages.yml`:
```yaml
packages:
  - package: calogica/dbt_expectations
    version: 0.10.4
```

Then run `dbt deps` to install.
Handling the hard parts
The previous sections cover mechanical conversions. Now for the parts that require actual thought.
Macro conversion
Dataform’s JavaScript includes become dbt macros. The logic remains similar; the syntax changes significantly.
Dataform include:
```js
function unnest_event_param(param_name, value_type = 'string_value') {
  return `(SELECT value.${value_type} FROM UNNEST(event_params) WHERE key = '${param_name}')`;
}

module.exports = { unnest_event_param };
```

dbt macro:
```sql
-- macros/unnest_event_param.sql
{% macro unnest_event_param(param_name, value_type='string_value') %}
(SELECT value.{{ value_type }} FROM UNNEST(event_params) WHERE key = '{{ param_name }}')
{% endmacro %}
```

Usage changes from `${helpers.unnest_event_param('page_location')}` to `{{ unnest_event_param('page_location') }}`.
Dynamic model generation
Dataform’s JavaScript excels at dynamic model generation, while dbt has no clean equivalent. If you have code like this:
```js
const countries = ["US", "GB", "FR", "DE"];

countries.forEach(country => {
  publish(`reporting_${country}`)
    .dependencies(["source_table"])
    .query(ctx => `SELECT * FROM ${ctx.ref("source_table")} WHERE country = '${country}'`);
});
```

Your options:
- Write individual models. If you have few variations, just create separate `.sql` files.
- Use dbt_codegen. Generate the YAML and SQL files once, then maintain them manually.
- External preprocessing. Use a Python script to generate `.sql` files before dbt runs.
For most cases, option 1 is the right answer. Resist the urge to over-engineer.
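If you genuinely need option 3, the preprocessing step stays small. A sketch mirroring the per-country Dataform example above (output directory and filenames are illustrative):

```python
from pathlib import Path

# Mirror the Dataform forEach loop: one generated .sql file per country.
COUNTRIES = ["US", "GB", "FR", "DE"]

# Doubled braces survive .format() and become Jinja braces in the output.
TEMPLATE = """SELECT *
FROM {{{{ ref('source_table') }}}}
WHERE country = '{country}'
"""

def generate_country_models(models_dir: str) -> list:
    """Write one reporting model per country; return the filenames created."""
    out_dir = Path(models_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for country in COUNTRIES:
        path = out_dir / f"reporting_{country.lower()}.sql"
        path.write_text(TEMPLATE.format(country=country))
        written.append(path.name)
    return written

# Run this before `dbt run`, e.g. as a CI step:
print(generate_country_models("models/reporting"))
```

Keep the generator in version control next to the project so the generated files can always be reproduced.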
Pre-operations and post-operations
Dataform’s pre_operations and post_operations become dbt hooks:
Dataform:
```sql
config { type: "table" }

pre_operations {
  DELETE FROM ${self()} WHERE date < DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
}
```

dbt:
```sql
{{ config(
    materialized='table',
    pre_hook="DELETE FROM {{ this }} WHERE date < DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)"
) }}
```

For complex pre/post operations that span multiple statements, consider custom materializations instead.
Validating the migration
Don’t flip the switch all at once. Run systems in parallel and validate thoroughly.
Parallel execution
Keep Dataform running while building out dbt. Configure dbt to write to a separate dataset:
```yaml
models:
  my_project:
    +schema: dbt_migration_validation
```

Run both systems against the same source data.
Comparison queries
For each migrated model, run validation:
```sql
-- Row count comparison
SELECT 'dataform' AS source, COUNT(*) AS row_count FROM dataform_dataset.model_name
UNION ALL
SELECT 'dbt' AS source, COUNT(*) AS row_count FROM dbt_migration_validation.model_name;
```
```sql
-- Column-level comparison for key metrics
SELECT
  ABS(d.total_revenue - t.total_revenue) AS revenue_diff,
  ABS(d.order_count - t.order_count) AS order_diff
FROM (SELECT SUM(revenue) AS total_revenue, COUNT(*) AS order_count
      FROM dataform_dataset.orders) d
CROSS JOIN (SELECT SUM(revenue) AS total_revenue, COUNT(*) AS order_count
            FROM dbt_migration_validation.orders) t;
```

ML pipeline regression testing
If machine learning models consume your transformation outputs, test thoroughly. The cautionary tale from earlier bears repeating: one team found that JavaScript and Jinja template behavior differed in ways that broke model retraining. Numerical precision, null handling, and timestamp formatting can all diverge in subtle ways.
Run your ML training pipeline against dbt outputs before declaring victory. Compare model performance metrics, not just row counts.
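Comparing metrics is best done with a tolerance rather than exact equality, since small numeric drift is expected. A sketch (metric names and the 0.5% tolerance are illustrative):

```python
# Compare ML evaluation metrics from the two pipelines with a relative tolerance.
# Metric names and the 0.5% default tolerance are illustrative assumptions.
def metrics_match(dataform_metrics: dict, dbt_metrics: dict,
                  rel_tol: float = 0.005) -> list:
    """Return the names of metrics that diverge by more than rel_tol."""
    diverged = []
    for name, old_value in dataform_metrics.items():
        new_value = dbt_metrics.get(name)
        if new_value is None:
            diverged.append(name)  # metric missing from the dbt-trained run
        elif abs(new_value - old_value) > rel_tol * abs(old_value):
            diverged.append(name)
    return diverged

# Example: AUC holds steady, precision drifts beyond tolerance.
print(metrics_match({"auc": 0.91, "precision": 0.80},
                    {"auc": 0.9105, "precision": 0.77}))  # ['precision']
```

Any diverged metric should block cutover until the underlying transformation difference is explained.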
The realistic timeline
Based on project complexity, here’s what to expect:
| Project size | Timeline | Primary effort |
|---|---|---|
| Small (~20 models) | 1-2 weeks | Mostly automated conversion |
| Medium (50-100 models) | 2-4 weeks | Macro conversion, testing setup |
| Large (100+ models, complex macros) | 2-3 months | JavaScript rewrite, validation |
| Enterprise with ML dependencies | 3-6 months | Parallel running, stakeholder sign-off |
What extends timelines:
- Complex JavaScript includes requiring manual rewriting
- Custom incremental strategies beyond simple merge
- ML pipelines requiring regression validation
- Stakeholder approval processes
- Parallel running requirements for compliance
Available tooling
Two open-source tools can accelerate migration:
ra_dbt_to_dataform - Despite the name suggesting the opposite direction, the same team provides patterns for conversion. Uses GPT-4 for complex macro conversion.
dataform-to-dbt - Node.js tool that handles refs, assertions, and view materializations. Run with npx dataform-to-dbt. Limitations: doesn’t handle JavaScript includes, incremental models, or complex pre-operations.
These tools handle perhaps 60-70% of a typical project. Plan for the remainder to be manual work.
Making the decision
Migration from Dataform to dbt is neither universally right nor universally wrong. It comes down to your specific situation.
Migrate when you’re going multi-warehouse, need the package ecosystem, or want enterprise features like the semantic layer and Mesh. The 2-3 month investment pays dividends in reduced maintenance and expanded capabilities.
Stay put when your Dataform project is stable, you’re BigQuery-only with no plans to change, and you’re not missing functionality. The “if it ain’t broke” principle applies.
Reconsider the timeline when you have complex JavaScript generation, ML pipeline dependencies, or stakeholders requiring extended parallel running. The two-month estimate becomes six months in these scenarios.
The tools transform SQL identically at the warehouse level. What differs is the ecosystem, the commercial model, and the career implications. Make the choice based on your organization’s trajectory, not on which syntax looks cleaner.