The dbt Cloud licensing bill arrives and someone on your team asks the question: “Couldn’t we just use Dataform? It’s free.”
At $100/user/month, a 10-person analytics team pays $12,000 annually for dbt Cloud. Dataform offers comparable transformation capabilities at zero licensing cost for BigQuery users. The math looks compelling. But migration decisions deserve more scrutiny than a simple cost comparison.
This guide covers the migration process for both simple and complex dbt projects, maps concepts between tools, and lays out when migration makes sense (and when it doesn’t).
Why teams consider migrating from dbt to Dataform
Three factors typically drive migration conversations:
Licensing costs. dbt Cloud’s $100/user/month adds up. Teams paying for features they don’t fully use (like the semantic layer or advanced CI) feel the pinch most acutely. dbt Core remains free, but you lose compilation speed improvements from dbt Fusion and native CI/CD features.
GCP consolidation. Dataform lives inside the BigQuery console. For teams already committed to GCP, having transformations managed alongside the warehouse simplifies operations. Native IAM integration, Dataplex metadata, and Cloud Composer orchestration create a cohesive stack.
The 2025 landscape shift. The dbt-Fivetran merger announced in October 2025 signals industry consolidation. Some teams worry about pricing changes or strategic pivots. Meanwhile, Google continues steady investment in Dataform: not aggressive feature development, but reliable maintenance and compliance certifications (SOC 1/2/3, HIPAA, ISO 27001).
None of these factors alone justifies migration. The question is whether your specific situation tips the balance.
Before you start: the migration reality check
Migration timelines vary dramatically based on project complexity:
| Project Profile | Timeline | Effort Distribution |
|---|---|---|
| Small (~20 models, no custom macros) | 1-2 weeks | 80% automated, 20% validation |
| Medium (~50-100 models, some macros) | 2-4 weeks | 60% automated, 40% macro conversion |
| Large (100+ models, heavy macro usage) | 2-3 months | Manual rewrite of programmatic logic |
| Enterprise (packages, ML pipelines) | 3-6 months | Parallel running, stakeholder sign-off |
What you’ll lose
Before committing to migration, understand the ecosystem gaps:
- Package ecosystem: dbt has 200+ packages on hub.getdbt.com. Dataform has no centralized package hub.
- Testing depth: dbt_expectations provides 50+ tests. Dataform’s built-in assertions cover uniqueness, nulls, and row conditions. That’s it.
- CI/CD maturity: dbt Cloud offers Slim CI with a few clicks. Dataform requires manual setup with Cloud Build or GitHub Actions.
- Editor tooling: The dbt Power User extension has 1M+ installs. No comparable Dataform extension exists for Cursor or VS Code.
- Job market value: dbt skills appear in most analytics engineer job postings. Dataform expertise remains niche.
If these losses matter for your team, migration may cost more than it saves.
Mapping dbt concepts to Dataform
The core concepts translate, but syntax differs substantially.
Reference syntax
dbt:
```sql
SELECT
    customer_id,
    customer__name,
    customer__email,
    customer__status
FROM {{ ref('base__source__customers') }}
WHERE customer__status IN {{ var('active_statuses') }}
```

Dataform:

```sql
SELECT
    customer_id,
    customer__name,
    customer__email,
    customer__status
FROM ${ref("base__source__customers")}
WHERE customer__status IN ${dataform.projectConfig.vars.active_statuses}
```

Incremental models
dbt:
```sql
{{ config(materialized='incremental', unique_key='order_id') }}

SELECT
    order_id,
    customer_id,
    order__total_usd,
    order__created_at,
    order__updated_at
FROM {{ ref('base__source__orders') }}
{% if is_incremental() %}
WHERE order__updated_at > (SELECT MAX(order__updated_at) FROM {{ this }})
{% endif %}
```

Dataform:

```sql
config {
    type: "incremental",
    uniqueKey: ["order_id"]
}

SELECT
    order_id,
    customer_id,
    order__total_usd,
    order__created_at,
    order__updated_at
FROM ${ref("base__source__orders")}
${when(incremental(),
    `WHERE order__updated_at > (SELECT MAX(order__updated_at) FROM ${self()})`)}
```

Schema tests vs assertions
dbt (schema.yml):
```yaml
models:
  - name: mrt__marketing__customers
    columns:
      - name: customer_id
        tests: [unique, not_null]
      - name: customer__email
        tests:
          - not_null
          - unique
```

Dataform (inline):

```sql
config {
  type: "table",
  assertions: {
    uniqueKey: ["customer_id"],
    nonNull: ["customer_id", "customer__email"]
  }
}
```

Conversion complexity reference
| dbt Feature | Dataform Equivalent | Complexity |
|---|---|---|
| {{ ref('model') }} | ${ref('table')} | Low (automated) |
| {{ source('schema','table') }} | Declaration files | Medium |
| Jinja macros | JavaScript includes | High (manual) |
| YAML schema tests | Inline assertions | Medium |
| is_incremental() | when(incremental(), ...) | Low |
| Seeds (CSV files) | BigQuery tables + declarations | Medium |
| Snapshots (SCD2) | Manual implementation | High |
| dbt packages | No equivalent | Critical gap |
The features that don’t translate
Some dbt capabilities require significant rework or have no Dataform equivalent.
Snapshots
dbt snapshots implement SCD Type 2 automatically. Dataform has no built-in snapshot functionality. You’ll need to implement slowly changing dimensions manually using incremental tables with custom merge logic.
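For illustration, here is one minimal sketch of that manual approach: an incremental table that appends only changed rows, from which SCD2 valid_from/valid_to ranges can be derived downstream with window functions. All table and column names are hypothetical, and this covers the append step only, not deletes.

```sql
config {
  type: "incremental",
  uniqueKey: ["customer_id", "valid_from"]
}

-- Hypothetical change-history table: each run appends rows that changed since
-- the last load, approximating dbt's timestamp snapshot strategy. Downstream,
-- LEAD(valid_from) OVER (PARTITION BY customer_id ORDER BY valid_from) gives valid_to.
SELECT
  customer_id,
  customer__status,
  customer__updated_at AS valid_from
FROM ${ref("base__source__customers")}
${when(incremental(),
  `WHERE customer__updated_at > (SELECT MAX(valid_from) FROM ${self()})`)}
```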
dbt packages
If your project uses dbt_utils, dbt_expectations, or dbt_date, expect manual conversion. Common dbt_utils functions like surrogate_key or star need JavaScript equivalents or inline SQL. The migration tool handles some dbt_utils functions, but coverage is incomplete.
Microbatch incremental strategy
dbt’s microbatch processing, introduced in 2024, has no Dataform equivalent. If you’re processing data in time-bounded batches for performance or cost control, you’ll need to restructure your approach.
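One common workaround, sketched below under hypothetical table and column names, is a plain incremental model that reprocesses a fixed trailing window on each run. It approximates time-bounded batches, but without dbt's per-batch retry or parallelism.

```sql
config {
  type: "incremental",
  uniqueKey: ["event_id"]
}

-- Reprocess a trailing window instead of discrete microbatches; the 3-day
-- lookback is an arbitrary example, tuned to your late-arrival profile.
SELECT
  event_id,
  event__timestamp,
  event__name
FROM ${ref("base__source__events")}
${when(incremental(),
  `WHERE event__timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 DAY)`)}
```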
Slim CI
dbt Cloud’s Slim CI builds only modified models plus their dependents, creating automatic PR schemas. Replicating this in Dataform requires calling the Dataform REST API from external CI tools. Possible, but significantly more setup.
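As a rough sketch of that external setup, a GitHub Actions job could call the Dataform REST API on each pull request. Everything here is illustrative: PROJECT, REGION, REPO, and the token secret are placeholders, and unlike Slim CI this compiles and runs the whole project rather than only modified models.

```yaml
# Hypothetical GitHub Actions workflow: compile a PR branch via the Dataform API.
name: dataform-ci
on: pull_request
jobs:
  run-dataform:
    runs-on: ubuntu-latest
    steps:
      - name: Compile Dataform project from the PR branch
        run: |
          # PROJECT/REGION/REPO and GCP_ACCESS_TOKEN are placeholders.
          BASE="https://dataform.googleapis.com/v1beta1/projects/PROJECT/locations/REGION/repositories/REPO"
          curl -s -X POST "$BASE/compilationResults" \
            -H "Authorization: Bearer ${{ secrets.GCP_ACCESS_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{"gitCommitish": "${{ github.head_ref }}"}'
          # The compilation result name from the response then feeds a second
          # POST to "$BASE/workflowInvocations" to actually run the project.
```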
Step-by-step migration process
1. Audit your project
Before touching any code, inventory what you have:
```bash
# Count models
find models -name "*.sql" | wc -l

# List macro files
find macros -name "*.sql"

# Check packages.yml for dependencies
cat packages.yml
```

Document:
- Total model count
- Number of custom macros
- External packages used
- Incremental model strategies
- Snapshot tables
- CI/CD complexity
2. Run the automated migration tool
The ra_dbt_to_dataform tool handles basic conversion:
```bash
# Clone the migration tool
git clone https://github.com/rittmananalytics/ra_dbt_to_dataform.git
cd ra_dbt_to_dataform

# Install dependencies
pip install -r requirements.txt

# Run migration (uses GPT-4 for complex macro conversion)
python migrate.py --dbt-project /path/to/dbt --output /path/to/dataform
```

The tool converts:
- Model references (ref() syntax)
- Source declarations
- Common dbt_utils functions
- Basic incremental logic
It won’t handle:
- Seeds
- Snapshots
- Complex custom macros
- Semantic layer definitions
3. Convert macros to JavaScript includes
Dataform uses JavaScript files for reusable logic. Create an includes/ directory for shared functions.
dbt macro (generate_surrogate_key.sql):
```sql
{% macro generate_surrogate_key(field_list) %}
    TO_HEX(MD5(CONCAT(
        {% for field in field_list %}
            COALESCE(CAST({{ field }} AS STRING), ''){% if not loop.last %}, '|',{% endif %}
        {% endfor %}
    )))
{% endmacro %}
```

Dataform JavaScript (includes/utils.js):

```javascript
function generateSurrogateKey(fields) {
  const fieldExpressions = fields
    .map(f => `COALESCE(CAST(${f} AS STRING), '')`)
    .join(", '|', ");
  return `TO_HEX(MD5(CONCAT(${fieldExpressions})))`;
}

module.exports = { generateSurrogateKey };
```

Usage in SQLX:

```sql
config { type: "table" }

js {
  const { generateSurrogateKey } = require("includes/utils");
}

SELECT
  ${generateSurrogateKey(["customer_id", "order__created_at"])} AS surrogate_key,
  customer_id,
  order__created_at,
  order__total_usd
FROM ${ref("base__source__orders")}
```

4. Recreate tests as assertions
Dataform’s built-in assertions handle basic cases:
```sql
config {
  type: "table",
  assertions: {
    uniqueKey: ["order_id"],
    nonNull: ["order_id", "customer_id", "order__created_at"],
    rowConditions: [
      "order__total_usd >= 0",
      "order__created_at <= CURRENT_DATE()"
    ]
  }
}
```

For complex validations, create separate assertion files:

```sql
-- definitions/assertions/assert_valid_customer_emails.sqlx
config { type: "assertion" }

SELECT customer_id, customer__email
FROM ${ref("mrt__marketing__customers")}
WHERE customer__email NOT LIKE '%@%.%'
   OR customer__email IS NULL
```

Tests from dbt_expectations like distribution checks, regex patterns, or cross-table comparisons require custom assertion files.
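For example, a dbt_expectations-style regex check can be rewritten as a custom assertion using BigQuery's REGEXP_CONTAINS. The model name and pattern below are illustrative.

```sql
-- definitions/assertions/assert_customer_email_format.sqlx
-- Illustrative stand-in for dbt_expectations' regex-match test: the assertion
-- fails if any rows are returned, i.e. any email misses the pattern.
config { type: "assertion" }

SELECT customer_id, customer__email
FROM ${ref("mrt__marketing__customers")}
WHERE NOT REGEXP_CONTAINS(customer__email, r'^[^@\s]+@[^@\s]+\.[^@\s]+$')
```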
5. Set up orchestration
Dataform workflows don’t trigger from git events natively. Choose an orchestration approach:
Cloud Composer (managed Airflow):
```python
from airflow.providers.google.cloud.operators.dataform import (
    DataformCreateCompilationResultOperator,
    DataformCreateWorkflowInvocationOperator,
)

compile_task = DataformCreateCompilationResultOperator(
    task_id="compile",
    project_id="my-project",
    region="us-central1",
    repository_id="my-repo",
)

run_task = DataformCreateWorkflowInvocationOperator(
    task_id="run",
    project_id="my-project",
    region="us-central1",
    repository_id="my-repo",
    compilation_result="{{ task_instance.xcom_pull('compile') }}",
)
```

Cloud Scheduler + Workflows:

```yaml
main:
  steps:
    - compile:
        call: http.post
        args:
          url: https://dataform.googleapis.com/v1beta1/projects/PROJECT/locations/REGION/repositories/REPO/compilationResults
          auth:
            type: OAuth2
    - run:
        call: http.post
        args:
          url: https://dataform.googleapis.com/v1beta1/projects/PROJECT/locations/REGION/repositories/REPO/workflowInvocations
```

6. Parallel run and validate
Don’t cut over immediately. Run both pipelines simultaneously:
- Deploy the Dataform project to a separate dataset (e.g., analytics_dataform)
- Schedule both dbt and Dataform to run on the same cadence
- Compare outputs using row counts and checksums
- Validate downstream dashboards against both sources
- Monitor for 2-4 weeks before decommissioning dbt
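The comparison step is easy to script once you have row counts and checksums per model from each dataset. A minimal, warehouse-agnostic sketch (model names and stats are illustrative):

```python
def compare_pipeline_outputs(dbt_stats, dataform_stats):
    """Flag models whose (row_count, checksum) pairs differ between pipelines.

    Both arguments map model name -> (row_count, checksum), collected from
    validation queries run against each dataset.
    """
    mismatches = {}
    for model, stats in dbt_stats.items():
        other = dataform_stats.get(model)
        if other is None:
            mismatches[model] = "missing from Dataform dataset"
        elif stats != other:
            mismatches[model] = f"dbt={stats} vs dataform={other}"
    for model in dataform_stats:
        if model not in dbt_stats:
            mismatches[model] = "missing from dbt dataset"
    return mismatches


# Illustrative numbers: the customers model drifted between pipelines.
dbt = {"mrt__sales__orders": (120000, -842), "mrt__marketing__customers": (9500, 311)}
df = {"mrt__sales__orders": (120000, -842), "mrt__marketing__customers": (9501, 290)}
print(compare_pipeline_outputs(dbt, df))
```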
```sql
-- Validation query
SELECT
  'dbt' AS source,
  COUNT(*) AS row_count,
  FARM_FINGERPRINT(TO_JSON_STRING(ARRAY_AGG(t ORDER BY order_id))) AS checksum
FROM `analytics.mrt__sales__orders` t
UNION ALL
SELECT
  'dataform' AS source,
  COUNT(*) AS row_count,
  FARM_FINGERPRINT(TO_JSON_STRING(ARRAY_AGG(t ORDER BY order_id))) AS checksum
FROM `analytics_dataform.mrt__sales__orders` t
```

When migration doesn’t make sense
Migration isn’t always the right call. Strongly reconsider if:
You rely heavily on dbt packages. Converting dbt_utils, dbt_expectations, or specialized packages like dbt-ga4 requires substantial effort. If packages provide significant value, that value disappears post-migration.
Multi-warehouse is on the roadmap. Dataform only supports BigQuery. If there’s any chance you’ll add Snowflake, Databricks, or another warehouse in the next 2-3 years, dbt’s adapter ecosystem becomes valuable.
Team career development matters. dbt skills are near-universal in analytics engineer job postings. Dataform expertise is niche. If your team values career portability, dbt experience serves them better.
You use complex incremental strategies. Microbatch processing, sophisticated merge logic, or late-arriving data patterns are easier in dbt. Dataform’s incremental support is basic.
Migration cost exceeds licensing savings. A 2-3 month migration for a senior engineer costs $30,000-$50,000 in salary alone. Add debugging, validation, and downstream fixes. Compare to $12,000/year in licensing. If migration takes longer than 2 years to pay back, reconsider.
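That payback arithmetic is worth making explicit. A quick sketch using the ballpark figures above (your numbers will differ):

```python
def payback_years(migration_cost_usd, annual_license_savings_usd):
    """Years until cumulative licensing savings cover the one-off migration cost."""
    return migration_cost_usd / annual_license_savings_usd


# Ballpark from above: ~$40k of migration effort vs $12k/year in dbt Cloud savings.
years = payback_years(40_000, 12_000)
print(f"Payback: {years:.1f} years")  # roughly 3.3 years, beyond the 2-year threshold
```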
Making the decision
Use this framework to evaluate your situation:
Migrate to Dataform when:
- 100% BigQuery commitment with no multi-cloud plans
- Fewer than 50 models with minimal custom macros
- No reliance on dbt packages beyond dbt_utils basics
- Cost pressure is acute (startup, constrained budget)
- Team prefers JavaScript over Jinja
- GCP integration matters more than ecosystem breadth
Stay with dbt when:
- Using dbt Cloud features actively (semantic layer, Mesh, advanced CI)
- Heavy macro usage makes conversion painful
- ML pipelines depend on specific templating behavior
- Team is growing and needs hiring leverage
- Multi-warehouse strategy is possible
- Migration payback exceeds 2 years
The honest question
Ask yourself one question: Is your organization BigQuery-forever?
If yes (genuinely, strategically, for the foreseeable future) Dataform offers a credible path to eliminate licensing costs. The tool is mature, Google’s investment is steady, and BigQuery integration is excellent.
If there’s uncertainty, dbt’s flexibility carries option value. The licensing cost might be worth paying for the ability to change course later.
Neither answer is wrong. The mistake is migrating for cost savings alone without accounting for the ecosystem trade-offs and conversion effort. Run the numbers with realistic timelines, not optimistic ones.