Migrating from dbt to Dataform: A Practical Decision Framework

The dbt Cloud licensing bill arrives and someone on your team asks the question: “Couldn’t we just use Dataform? It’s free.”

At $100/user/month, a 10-person analytics team pays $12,000 annually for dbt Cloud. Dataform offers comparable transformation capabilities at zero licensing cost for BigQuery users. The math looks compelling. But migration decisions deserve more scrutiny than a simple cost comparison.

This guide covers the migration process for both simple and complex dbt projects, maps concepts between tools, and lays out when migration makes sense (and when it doesn’t).

Why teams consider migrating from dbt to Dataform

Three factors typically drive migration conversations:

Licensing costs. dbt Cloud’s $100/user/month adds up. Teams paying for features they don’t fully use (like the semantic layer or advanced CI) feel the pinch most acutely. dbt Core remains free, but you lose compilation speed improvements from dbt Fusion and native CI/CD features.

GCP consolidation. Dataform lives inside the BigQuery console. For teams already committed to GCP, having transformations managed alongside the warehouse simplifies operations. Native IAM integration, Dataplex metadata, and Cloud Composer orchestration create a cohesive stack.

The 2025 landscape shift. The dbt-Fivetran merger announced in October 2025 signals industry consolidation. Some teams worry about pricing changes or strategic pivots. Meanwhile, Google continues steady investment in Dataform: not aggressive feature development, but reliable maintenance and compliance certifications (SOC 1/2/3, HIPAA, ISO 27001).

None of these factors alone justify migration. The question is whether your specific situation tips the balance.

Before you start: the migration reality check

Migration timelines vary dramatically based on project complexity:

| Project profile | Timeline | Effort distribution |
| --- | --- | --- |
| Small (~20 models, no custom macros) | 1-2 weeks | 80% automated, 20% validation |
| Medium (~50-100 models, some macros) | 2-4 weeks | 60% automated, 40% macro conversion |
| Large (100+ models, heavy macro usage) | 2-3 months | Manual rewrite of programmatic logic |
| Enterprise (packages, ML pipelines) | 3-6 months | Parallel running, stakeholder sign-off |

What you’ll lose

Before committing to migration, understand the ecosystem gaps:

  • Package ecosystem: dbt has 200+ packages on hub.getdbt.com. Dataform has no centralized package hub.
  • Testing depth: dbt_expectations provides 50+ tests. Dataform’s built-in assertions cover uniqueness, nulls, and row conditions. That’s it.
  • CI/CD maturity: dbt Cloud offers Slim CI with a few clicks. Dataform requires manual setup with Cloud Build or GitHub Actions.
  • Editor tooling: The dbt Power User extension has 1M+ installs. No comparable Dataform extension exists for Cursor or VS Code.
  • Job market value: dbt skills appear in most analytics engineer job postings. Dataform expertise remains niche.

If these losses matter for your team, migration may cost more than it saves.

Mapping dbt concepts to Dataform

The core concepts translate, but syntax differs substantially.

Reference syntax

dbt:

SELECT
  customer_id,
  customer__name,
  customer__email,
  customer__status
FROM {{ ref('base__source__customers') }}
WHERE customer__status IN {{ var('active_statuses') }}

Dataform:

SELECT
  customer_id,
  customer__name,
  customer__email,
  customer__status
FROM ${ref("base__source__customers")}
WHERE customer__status IN ${dataform.projectConfig.vars.active_statuses}
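For the variable reference to resolve, active_statuses must be declared at the project level. A minimal sketch, assuming the variable holds a prerendered SQL list (Dataform vars are strings; the project and dataset names are illustrative):

```yaml
# workflow_settings.yaml (or the "vars" key in a legacy dataform.json)
defaultProject: my-project
defaultDataset: analytics
vars:
  active_statuses: "('active', 'trial')"
```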

Incremental models

dbt:

{{ config(materialized='incremental', unique_key='order_id') }}

SELECT
  order_id,
  customer_id,
  order__total_usd,
  order__created_at,
  order__updated_at
FROM {{ ref('base__source__orders') }}

{% if is_incremental() %}
WHERE order__updated_at > (SELECT MAX(order__updated_at) FROM {{ this }})
{% endif %}

Dataform:

config {
  type: "incremental",
  uniqueKey: ["order_id"]
}

SELECT
  order_id,
  customer_id,
  order__total_usd,
  order__created_at,
  order__updated_at
FROM ${ref("base__source__orders")}
${when(incremental(), `WHERE order__updated_at > (SELECT MAX(order__updated_at) FROM ${self()})`)}

Schema tests vs assertions

dbt (schema.yml):

models:
  - name: mrt__marketing__customers
    columns:
      - name: customer_id
        tests: [unique, not_null]
      - name: customer__email
        tests:
          - not_null
          - unique

Dataform (inline):

config {
  type: "table",
  assertions: {
    uniqueKey: ["customer_id"],
    nonNull: ["customer_id", "customer__email"]
  }
}

Conversion complexity reference

| dbt feature | Dataform equivalent | Complexity |
| --- | --- | --- |
| {{ ref('model') }} | ${ref("table")} | Low (automated) |
| {{ source('schema', 'table') }} | Declaration files | Medium |
| Jinja macros | JavaScript includes | High (manual) |
| YAML schema tests | Inline assertions | Medium |
| is_incremental() | when(incremental(), ...) | Low |
| Seeds (CSV files) | BigQuery tables + declarations | Medium |
| Snapshots (SCD2) | Manual implementation | High |
| dbt packages | No equivalent | Critical gap |
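The source-to-declaration mapping deserves a concrete example. Each dbt source() entry becomes a small declaration file that downstream models can ref(); a sketch with illustrative project, schema, and table names:

```sqlx
-- definitions/sources/base__source__customers.sqlx
config {
  type: "declaration",
  database: "my-project",
  schema: "raw",
  name: "customers"
}
```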

The features that don’t translate

Some dbt capabilities require significant rework or have no Dataform equivalent.

Snapshots

dbt snapshots implement SCD Type 2 automatically. Dataform has no built-in snapshot functionality. You’ll need to implement slowly changing dimensions manually using incremental tables with custom merge logic.
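A common workaround is an incremental table that appends a new version row whenever a tracked column changes. A simplified sketch (illustrative names; it tracks a single column and ignores deletes and late-arriving updates):

```sqlx
-- definitions/dim__customers_history.sqlx
config {
  type: "incremental",
  uniqueKey: ["customer_id", "valid_from"]
}

SELECT
  customer_id,
  customer__status,
  CURRENT_TIMESTAMP() AS valid_from
FROM ${ref("base__source__customers")} src
${when(incremental(), `
WHERE NOT EXISTS (
  SELECT 1 FROM ${self()} hist
  WHERE hist.customer_id = src.customer_id
    AND hist.customer__status = src.customer__status
)`)}
```

Closing each version's validity window (a valid_to column) takes a separate post-operation or a view over this history table.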

dbt packages

If your project uses dbt_utils, dbt_expectations, or dbt_date, expect manual conversion. Common dbt_utils functions like surrogate_key or star need JavaScript equivalents or inline SQL. The migration tool handles some dbt_utils functions, but coverage is incomplete.

Microbatch incremental strategy

dbt’s microbatch processing, introduced in 2024, has no Dataform equivalent. If you’re processing data in time-bounded batches for performance or cost control, you’ll need to restructure your approach.

Slim CI

dbt Cloud’s Slim CI builds only modified models plus their dependents, creating automatic PR schemas. Replicating this in Dataform requires calling the Dataform REST API from external CI tools. Possible, but significantly more setup.
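A homegrown Slim CI usually has two parts: derive the changed models from the git diff, then pass them as included targets when invoking the workflow through the API. A sketch of the first part in Python (the definitions/ path convention and the filename-equals-target assumption reflect Dataform defaults, but verify them against your repo):

```python
from pathlib import PurePosixPath

def changed_targets(changed_files):
    """Map changed .sqlx files under definitions/ to Dataform target names."""
    targets = []
    for path in changed_files:
        p = PurePosixPath(path)
        if p.suffix == ".sqlx" and p.parts[0] == "definitions":
            # By default, a Dataform target is named after its file
            targets.append(p.stem)
    return targets

# In CI, the file list would come from: git diff --name-only origin/main...HEAD
print(changed_targets([
    "definitions/marts/mrt__sales__orders.sqlx",
    "includes/utils.js",
    "README.md",
]))
```

The resulting names go into the workflow invocation's included targets; computing their downstream dependents (dbt's state:modified+) is additional work on top.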

Step-by-step migration process

1. Audit your project

Before touching any code, inventory what you have:

# Count models by type
find models -name "*.sql" | wc -l
# List macro files
find macros -name "*.sql"
# Check packages.yml for dependencies
cat packages.yml

Document:

  • Total model count
  • Number of custom macros
  • External packages used
  • Incremental model strategies
  • Snapshot tables
  • CI/CD complexity

2. Run the automated migration tool

The ra_dbt_to_dataform tool handles basic conversion:

# Clone the migration tool
git clone https://github.com/rittmananalytics/ra_dbt_to_dataform.git
cd ra_dbt_to_dataform
# Install dependencies
pip install -r requirements.txt
# Run migration (uses GPT-4 for complex macro conversion)
python migrate.py --dbt-project /path/to/dbt --output /path/to/dataform

The tool converts:

  • Model references (ref() syntax)
  • Source declarations
  • Common dbt_utils functions
  • Basic incremental logic

It won’t handle:

  • Seeds
  • Snapshots
  • Complex custom macros
  • Semantic layer definitions

3. Convert macros to JavaScript includes

Dataform uses JavaScript files for reusable logic. Create an includes/ directory for shared functions.

dbt macro (generate_surrogate_key.sql):

{% macro generate_surrogate_key(field_list) %}
TO_HEX(MD5(CONCAT({% for field in field_list %}COALESCE(CAST({{ field }} AS STRING), ''){% if not loop.last %}, '|', {% endif %}{% endfor %})))
{% endmacro %}

Dataform JavaScript (includes/utils.js):

function generateSurrogateKey(fields) {
  const fieldExpressions = fields
    .map(f => `COALESCE(CAST(${f} AS STRING), '')`)
    .join(", '|', ");
  return `TO_HEX(MD5(CONCAT(${fieldExpressions})))`;
}

module.exports = { generateSurrogateKey };

Usage in SQLX:

config { type: "table" }

js {
  const { generateSurrogateKey } = require("includes/utils");
}

SELECT
  ${generateSurrogateKey(["customer_id", "order__created_at"])} AS surrogate_key,
  customer_id,
  order__created_at,
  order__total_usd
FROM ${ref("base__source__orders")}

4. Recreate tests as assertions

Dataform’s built-in assertions handle basic cases:

config {
  type: "table",
  assertions: {
    uniqueKey: ["order_id"],
    nonNull: ["order_id", "customer_id", "order__created_at"],
    rowConditions: [
      "order__total_usd >= 0",
      "order__created_at <= CURRENT_DATE()"
    ]
  }
}

For complex validations, create separate assertion files:

-- definitions/assertions/assert_valid_customer_emails.sqlx
config { type: "assertion" }

SELECT
  customer_id,
  customer__email
FROM ${ref("mrt__marketing__customers")}
WHERE customer__email NOT LIKE '%@%.%'
  OR customer__email IS NULL

Tests from dbt_expectations like distribution checks, regex patterns, or cross-table comparisons require custom assertion files.
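For instance, a cross-table comparison (a dbt_expectations staple) becomes an assertion that returns rows only when the check fails — here, when the mart has dropped orders that exist in the source (table names are illustrative):

```sqlx
-- definitions/assertions/assert_no_dropped_orders.sqlx
config { type: "assertion" }

SELECT COUNT(*) AS missing_orders
FROM ${ref("base__source__orders")} src
LEFT JOIN ${ref("mrt__sales__orders")} mrt USING (order_id)
WHERE mrt.order_id IS NULL
HAVING COUNT(*) > 0
```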

5. Set up orchestration

Dataform workflows don’t trigger from git events natively. Choose an orchestration approach:

Cloud Composer (managed Airflow):

from airflow.providers.google.cloud.operators.dataform import (
    DataformCreateCompilationResultOperator,
    DataformCreateWorkflowInvocationOperator,
)

compile_task = DataformCreateCompilationResultOperator(
    task_id="compile",
    project_id="my-project",
    region="us-central1",
    repository_id="my-repo",
    compilation_result={"git_commitish": "main"},
)

run_task = DataformCreateWorkflowInvocationOperator(
    task_id="run",
    project_id="my-project",
    region="us-central1",
    repository_id="my-repo",
    workflow_invocation={
        "compilation_result": "{{ task_instance.xcom_pull('compile')['name'] }}"
    },
)

compile_task >> run_task

Cloud Scheduler + Workflows:

main:
  steps:
    - compile:
        call: http.post
        args:
          url: https://dataform.googleapis.com/v1beta1/projects/PROJECT/locations/REGION/repositories/REPO/compilationResults
          auth:
            type: OAuth2
          body:
            gitCommitish: main
        result: compilation
    - run:
        call: http.post
        args:
          url: https://dataform.googleapis.com/v1beta1/projects/PROJECT/locations/REGION/repositories/REPO/workflowInvocations
          auth:
            type: OAuth2
          body:
            compilationResult: ${compilation.body.name}

6. Parallel run and validate

Don’t cut over immediately. Run both pipelines simultaneously:

  1. Deploy Dataform project to a separate dataset (e.g., analytics_dataform)
  2. Schedule both dbt and Dataform to run on the same cadence
  3. Compare outputs using row counts and checksums
  4. Validate downstream dashboards against both sources
  5. Monitor for 2-4 weeks before decommissioning dbt
-- Validation query
SELECT
  'dbt' AS source,
  COUNT(*) AS row_count,
  FARM_FINGERPRINT(TO_JSON_STRING(ARRAY_AGG(t ORDER BY order_id))) AS checksum
FROM `analytics.mrt__sales__orders` t

UNION ALL

SELECT
  'dataform' AS source,
  COUNT(*) AS row_count,
  FARM_FINGERPRINT(TO_JSON_STRING(ARRAY_AGG(t ORDER BY order_id))) AS checksum
FROM `analytics_dataform.mrt__sales__orders` t

When migration doesn’t make sense

Migration isn’t always the right call. Strongly reconsider if:

You rely heavily on dbt packages. Converting dbt_utils, dbt_expectations, or specialized packages like dbt-ga4 requires substantial effort. If packages provide significant value, that value disappears post-migration.

Multi-warehouse is on the roadmap. Dataform only supports BigQuery. If there’s any chance you’ll add Snowflake, Databricks, or another warehouse in the next 2-3 years, dbt’s adapter ecosystem becomes valuable.

Team career development matters. dbt skills are near-universal in analytics engineer job postings. Dataform expertise is niche. If your team values career portability, dbt experience serves them better.

You use complex incremental strategies. Microbatch processing, sophisticated merge logic, or late-arriving data patterns are easier in dbt. Dataform’s incremental support is basic.

Migration cost exceeds licensing savings. A 2-3 month migration for a senior engineer costs $30,000-$50,000 in salary alone. Add debugging, validation, and downstream fixes. Compare to $12,000/year in licensing. If migration takes longer than 2 years to pay back, reconsider.
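The payback arithmetic is worth scripting so the assumptions stay visible (the figures below are this article's examples, not benchmarks):

```python
def payback_years(migration_cost_usd, annual_license_savings_usd):
    """Years until licensing savings cover the one-off migration cost."""
    return migration_cost_usd / annual_license_savings_usd

# 10 analysts at $100/user/month -> $12,000/year in dbt Cloud savings
annual_savings = 10 * 100 * 12
print(f"{payback_years(40_000, annual_savings):.1f} years")
```

At the $30,000 end of the range the payback is 2.5 years; either way it lands beyond the 2-year threshold.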

Making the decision

Use this framework to evaluate your situation:

Migrate to Dataform when:

  • 100% BigQuery commitment with no multi-cloud plans
  • Fewer than 50 models with minimal custom macros
  • No reliance on dbt packages beyond dbt_utils basics
  • Cost pressure is acute (startup, constrained budget)
  • Team prefers JavaScript over Jinja
  • GCP integration matters more than ecosystem breadth

Stay with dbt when:

  • Using dbt Cloud features actively (semantic layer, Mesh, advanced CI)
  • Heavy macro usage makes conversion painful
  • ML pipelines depend on specific templating behavior
  • Team is growing and needs hiring leverage
  • Multi-warehouse strategy is possible
  • Migration payback exceeds 2 years

The honest question

Ask yourself one question: Is your organization BigQuery-forever?

If yes (genuinely, strategically, for the foreseeable future), Dataform offers a credible path to eliminating licensing costs. The tool is mature, Google’s investment is steady, and BigQuery integration is excellent.

If there’s uncertainty, dbt’s flexibility carries option value. The licensing cost might be worth paying for the ability to change course later.

Neither answer is wrong. The mistake is migrating for cost savings alone without accounting for the ecosystem trade-offs and conversion effort. Run the numbers with realistic timelines, not optimistic ones.