You’ve been running Dataform on BigQuery, and it’s worked fine. The SQL compiles, the schedules run, and your stakeholders get their data on time. But something has shifted. Maybe your company acquired a Snowflake shop. Maybe you’re tired of implementing testing logic that dbt packages handle out of the box. Maybe you’ve noticed that every analytics engineer job posting lists dbt as a requirement.
Whatever the reason, you’re considering a migration. This guide walks through the process from evaluation to validation, covering both the mechanical conversions and the parts that take real effort.
When migration actually makes sense
Not every Dataform project should migrate. The decision comes down to three questions.
Are you going multi-warehouse? Dataform only works with BigQuery. If your organization is adopting Snowflake, Databricks, or Redshift alongside BigQuery, dbt’s adapter architecture becomes essential. This is the clearest migration signal.
Do you need the ecosystem? dbt’s package hub offers 200+ packages covering everything from data quality testing (dbt_expectations, Elementary) to marketing attribution (Velir’s dbt-ga4) to utility functions (dbt_utils). If you’re manually implementing functionality that exists as a dbt package, migration pays for itself in reduced maintenance.
Does career portability matter? dbt proficiency appears in most analytics engineering job postings. Dataform expertise remains valuable but concentrated in GCP-heavy organizations. If your team members care about resume building, dbt experience has broader applicability.
When to stay put
Migration carries real costs. Stay with Dataform if:
- Your JavaScript includes are complex. Dataform’s ability to generate models programmatically with standard JavaScript has no direct dbt equivalent. Converting sophisticated `.js` files that dynamically create dozens of models requires substantial rewriting.
- ML pipelines depend on templating behavior. One team’s migration took two months, plus three additional weeks fixing issues where JavaScript templating differed from Jinja in ways that broke model retraining. The fraudulent transactions that slipped through during that period cost more than years of licensing fees.
- Your use case is simple. If you’re running basic transformation models without complex incremental logic, Dataform’s free tier on BigQuery makes economic sense.
The break-even calculation: if migration takes three months of engineering time and you’re comparing against dbt Cloud licensing at $100/user/month, you need a 10-person team running for over two years before licensing savings alone justify the switch. Multi-warehouse needs or ecosystem requirements change this math immediately.
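The arithmetic behind that claim can be sketched in a few lines. The loaded engineering cost below is an assumption for illustration, not a figure from the text:

```python
# Rough break-even sketch: one-off migration cost vs. cumulative licensing spend.
# The $100k/year loaded engineering cost is an illustrative assumption.
LOADED_ENG_COST_PER_MONTH = 100_000 / 12

def break_even_months(migration_months: float, team_size: int,
                      price_per_user_per_month: float = 100.0) -> float:
    """Months of dbt Cloud licensing that equal the one-off migration cost."""
    migration_cost = migration_months * LOADED_ENG_COST_PER_MONTH
    monthly_licensing = team_size * price_per_user_per_month
    return migration_cost / monthly_licensing

# Three months of engineering vs. a 10-person team at $100/user/month:
print(round(break_even_months(3, 10)))  # 25 months -- a bit over two years
```

Adjust the loaded cost to your own numbers; the shape of the calculation is what matters.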
Understanding the conceptual translation
Before touching code, understand how Dataform concepts map to dbt. Some translations are direct; others require rethinking your approach.
Direct equivalents
| Dataform | dbt | Notes |
|---|---|---|
| `${ref("model")}` | `{{ ref('model') }}` | Same concept, different syntax |
| `config { type: "table" }` | `{{ config(materialized='table') }}` | Materialization declaration |
| `config { type: "view" }` | `{{ config(materialized='view') }}` | Default in both tools |
| `.sqlx` files | `.sql` files | File extension change |
| `definitions/` folder | `models/` folder | Directory naming |
Conceptual shifts
Sources vs. declarations. Dataform uses declaration files to define source tables. dbt uses YAML source definitions that serve the same purpose but add freshness checks and documentation in a single location.
Dataform declaration:
```js
declare({
  database: "my-project",
  schema: "analytics_123456789",
  name: "events_*"
});
```

dbt source:

```yaml
sources:
  - name: ga4
    database: my-project
    schema: analytics_123456789
    tables:
      - name: events
        identifier: "events_*"
        freshness:
          warn_after: {count: 24, period: hour}
```

Jinja vs. JavaScript templating. Dataform lets you write actual JavaScript (loops, conditionals, functions) anywhere in your project. dbt uses Jinja2 templating, which looks similar but behaves differently. This philosophical gap, explored in my JavaScript vs Jinja comparison, affects every aspect of migration.
Dataform conditional:
```sql
${when(incremental(), `AND updated_at > (SELECT MAX(updated_at) FROM ${self()})`)}
```

dbt conditional:

```sql
{% if is_incremental() %}
  AND updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```

The syntax is different but manageable. The real challenge comes with dynamic model generation, which we’ll cover later.
The gaps you must plan for
Some Dataform features have no direct dbt equivalent:
- Seeds. dbt seeds are CSV files that load as tables. Dataform has no equivalent, so you likely have BigQuery tables serving this purpose that you’ll declare as sources.
- Snapshots. dbt’s snapshot feature for SCD Type 2 tables requires manual implementation in Dataform. If you’ve built custom SCD logic, you’ll convert it to dbt’s simpler snapshot syntax.
- Packages. Dataform has no package ecosystem. Any custom includes you’ve written in JavaScript need conversion to dbt macros or replacement with existing packages.
Setting up your dbt environment for BigQuery
With the conceptual mapping clear, set up your dbt project.
dbt Core vs. dbt Cloud
For BigQuery teams migrating from Dataform, start with dbt Core unless you need Cloud-specific features immediately (see my dbt Core vs Cloud comparison for details). Reasons:
- Zero additional cost during migration (you’re already accustomed to Dataform’s free tier)
- Local development matches your current workflow
- Upgrade to Cloud later if you need the semantic layer, Mesh, or enterprise governance
Install dbt with the BigQuery adapter:
```shell
pip install dbt-bigquery
```

Project initialization
Create a new dbt project:
```shell
dbt init my_project
```

Configure your BigQuery connection in `~/.dbt/profiles.yml`:
```yaml
my_project:
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: dbt_dev
      threads: 4
      location: US
  target: dev
```

Directory structure comparison
Your Dataform project structure maps to dbt like this:
```
# Dataform            # dbt
definitions/          models/
  sources/              base/
  staging/              intermediate/
  reporting/            marts/
includes/             macros/
dataform.json         dbt_project.yml
```

The `models/` directory replaces `definitions/`. Organize by layer (base, intermediate, marts) rather than by source system to match dbt conventions. For a deeper look, see my guide to dbt model layers.
Converting your models step by step
Work through conversion in phases, validating each before moving to the next.
Phase 1: Basic table and view conversions
Start with simple models that have no incremental logic or complex templating. These are mechanical conversions.
Dataform:
```sql
config {
  type: "table",
  schema: "reporting"
}

SELECT
  customer_id,
  SUM(order_total) AS lifetime_value
FROM ${ref("base_orders")}
GROUP BY 1
```

dbt:
```sql
{{ config(
    materialized='table',
    schema='reporting'
) }}

SELECT
  customer_id,
  SUM(order_total) AS customer__lifetime_value
FROM {{ ref('base__shopify__orders') }}
GROUP BY 1
```

The conversion is straightforward: change the config block syntax, swap `${ref()}` for `{{ ref() }}`, and save with a `.sql` extension.
Phase 2: Source declarations
Replace Dataform declarations with dbt source YAML files. Create one source file per source system in your base folder.
```yaml
version: 2

sources:
  - name: ga4_raw
    database: "{{ var('ga4_project') }}"
    schema: "{{ var('ga4_dataset') }}"
    tables:
      - name: events
        identifier: "events_*"
        description: "Raw GA4 event export"
```

In your models, replace source references:
```sql
-- Dataform: ${ref("analytics_123456789", "events_*")}
-- dbt:
SELECT
  event_date,
  event_timestamp,
  event_name,
  user_pseudo_id
FROM {{ source('ga4_raw', 'events') }}
```

Define the variables in `dbt_project.yml`:
```yaml
vars:
  ga4_project: my-gcp-project
  ga4_dataset: analytics_123456789
```

Phase 3: Incremental models
Dataform’s incremental syntax translates to dbt’s is_incremental() macro (covered in depth in my dbt incremental models guide):
Dataform:
```sql
config {
  type: "incremental",
  uniqueKey: ["event_id"],
  updatePartitionFilter: "event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)"
}

SELECT
  event_id,
  event_date,
  event_name,
  user_pseudo_id
FROM ${ref("base_events")}
${when(incremental(), `WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)`)}
```

dbt:
```sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    partition_by={
      'field': 'event_date',
      'data_type': 'date'
    },
    incremental_strategy='merge'
) }}

SELECT
  event_id,
  event_date,
  event_name,
  user_pseudo_id
FROM {{ ref('base__ga4__events') }}

{% if is_incremental() %}
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
{% endif %}
```

Key differences:
- dbt requires explicit `partition_by` configuration for BigQuery
- The `incremental_strategy` defaults to `merge` for BigQuery, but making it explicit improves readability
- `updatePartitionFilter` becomes part of your `WHERE` clause logic
Phase 4: Testing migration
Dataform’s inline assertions become dbt’s YAML-based tests.
Dataform:
```sql
config {
  type: "table",
  assertions: {
    uniqueKey: ["customer_id"],
    nonNull: ["customer_id", "email"],
    rowConditions: ['email LIKE "%@%.%"']
  }
}
```

dbt:
```yaml
version: 2

models:
  - name: mrt__sales__customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: customer__email
        tests:
          - not_null
```

For custom row conditions, use the dbt_expectations package:
```yaml
      - name: customer__email
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "^.+@.+\\..+$"
```

Install packages by adding to `packages.yml`:
```yaml
packages:
  - package: calogica/dbt_expectations
    version: 0.10.4
```

Then run `dbt deps` to install.
Handling the hard parts
The previous sections cover mechanical conversions. Now for the parts that require actual thought.
Macro conversion
Dataform’s JavaScript includes become dbt macros. The logic remains similar; the syntax changes significantly.
Dataform include:
```js
function unnest_event_param(param_name, value_type = 'string_value') {
  return `(SELECT value.${value_type} FROM UNNEST(event_params) WHERE key = '${param_name}')`;
}

module.exports = { unnest_event_param };
```

dbt macro:
```sql
-- macros/unnest_event_param.sql
{% macro unnest_event_param(param_name, value_type='string_value') %}
(SELECT value.{{ value_type }} FROM UNNEST(event_params) WHERE key = '{{ param_name }}')
{% endmacro %}
```

Usage changes from `${helpers.unnest_event_param('page_location')}` to `{{ unnest_event_param('page_location') }}`.
Dynamic model generation
Dataform’s JavaScript excels at dynamic model generation, while dbt has no clean equivalent. If you have code like this:
```js
const countries = ["US", "GB", "FR", "DE"];

countries.forEach(country => {
  publish(`reporting_${country}`)
    .dependencies(["source_table"])
    .query(ctx => `SELECT * FROM ${ctx.ref("source_table")} WHERE country = '${country}'`);
});
```

Your options:
- Write individual models. If you have few variations, just create separate `.sql` files.
- Use dbt_codegen. Generate the YAML and SQL files once, then maintain them manually.
- External preprocessing. Use a Python script to generate `.sql` files before dbt runs.
For most cases, option 1 is the right answer. Resist the urge to over-engineer.
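If you genuinely need option 3, the preprocessing step stays small. A sketch mirroring the per-country Dataform example above (output directory and filenames are illustrative):

```python
from pathlib import Path

# Mirror the Dataform forEach loop: one generated .sql file per country.
COUNTRIES = ["US", "GB", "FR", "DE"]

# Doubled braces survive .format() and become Jinja braces in the output.
TEMPLATE = """SELECT *
FROM {{{{ ref('source_table') }}}}
WHERE country = '{country}'
"""

def generate_country_models(models_dir: str) -> list:
    """Write one reporting model per country; return the filenames created."""
    out_dir = Path(models_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for country in COUNTRIES:
        path = out_dir / f"reporting_{country.lower()}.sql"
        path.write_text(TEMPLATE.format(country=country))
        written.append(path.name)
    return written

# Run this before `dbt run`, e.g. as a CI step:
print(generate_country_models("models/reporting"))
```

Keep the generator in version control next to the project so the generated files can always be reproduced.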
Pre-operations and post-operations
Dataform’s pre_operations and post_operations become dbt hooks:
Dataform:
```sql
config { type: "table" }

pre_operations {
  DELETE FROM ${self()} WHERE date < DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
}
```

dbt:
```sql
{{ config(
    materialized='table',
    pre_hook="DELETE FROM {{ this }} WHERE date < DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)"
) }}
```

For complex pre/post operations that span multiple statements, consider custom materializations instead.
Validating the migration
Don’t flip the switch all at once. Run systems in parallel and validate thoroughly.
Parallel execution
Keep Dataform running while building out dbt. Configure dbt to write to a separate dataset:
```yaml
models:
  my_project:
    +schema: dbt_migration_validation
```

Run both systems against the same source data.
Comparison queries
For each migrated model, run validation:
```sql
-- Row count comparison
SELECT 'dataform' AS source, COUNT(*) AS row_count FROM dataform_dataset.model_name
UNION ALL
SELECT 'dbt' AS source, COUNT(*) AS row_count FROM dbt_migration_validation.model_name;
```
```sql
-- Column-level comparison for key metrics
SELECT
  ABS(d.total_revenue - t.total_revenue) AS revenue_diff,
  ABS(d.order_count - t.order_count) AS order_diff
FROM (SELECT SUM(revenue) AS total_revenue, COUNT(*) AS order_count
      FROM dataform_dataset.orders) d
CROSS JOIN (SELECT SUM(revenue) AS total_revenue, COUNT(*) AS order_count
            FROM dbt_migration_validation.orders) t;
```

ML pipeline regression testing
If machine learning models consume your transformation outputs, test thoroughly. The cautionary tale from earlier bears repeating: one team found that JavaScript and Jinja template behavior differed in ways that broke model retraining. Numerical precision, null handling, and timestamp formatting can all diverge in subtle ways.
Run your ML training pipeline against dbt outputs before declaring victory. Compare model performance metrics, not just row counts.
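Comparing metrics is best done with a tolerance rather than exact equality, since small numeric drift is expected. A sketch (metric names and the 0.5% tolerance are illustrative):

```python
# Compare ML evaluation metrics from the two pipelines with a relative tolerance.
# Metric names and the 0.5% default tolerance are illustrative assumptions.
def metrics_match(dataform_metrics: dict, dbt_metrics: dict,
                  rel_tol: float = 0.005) -> list:
    """Return the names of metrics that diverge by more than rel_tol."""
    diverged = []
    for name, old_value in dataform_metrics.items():
        new_value = dbt_metrics.get(name)
        if new_value is None:
            diverged.append(name)  # metric missing from the dbt-trained run
        elif abs(new_value - old_value) > rel_tol * abs(old_value):
            diverged.append(name)
    return diverged

# Example: AUC holds steady, precision drifts beyond tolerance.
print(metrics_match({"auc": 0.91, "precision": 0.80},
                    {"auc": 0.9105, "precision": 0.77}))  # ['precision']
```

Any diverged metric should block cutover until the underlying transformation difference is explained.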
The realistic timeline
Based on project complexity, here’s what to expect:
| Project size | Timeline | Primary effort |
|---|---|---|
| Small (~20 models) | 1-2 weeks | Mostly automated conversion |
| Medium (50-100 models) | 2-4 weeks | Macro conversion, testing setup |
| Large (100+ models, complex macros) | 2-3 months | JavaScript rewrite, validation |
| Enterprise with ML dependencies | 3-6 months | Parallel running, stakeholder sign-off |
What extends timelines:
- Complex JavaScript includes requiring manual rewriting
- Custom incremental strategies beyond simple merge
- ML pipelines requiring regression validation
- Stakeholder approval processes
- Parallel running requirements for compliance
Available tooling
Two open-source tools can accelerate migration:
ra_dbt_to_dataform - Despite the name suggesting the opposite direction, the same team provides patterns for conversion. Uses GPT-4 for complex macro conversion.
dataform-to-dbt - Node.js tool that handles refs, assertions, and view materializations. Run with npx dataform-to-dbt. Limitations: doesn’t handle JavaScript includes, incremental models, or complex pre-operations.
These tools handle perhaps 60-70% of a typical project. Plan for the remainder to be manual work.
Making the decision
Migration from Dataform to dbt is neither universally right nor universally wrong. It comes down to your specific situation.
Migrate when you’re going multi-warehouse, need the package ecosystem, or want enterprise features like the semantic layer and Mesh. The 2-3 month investment pays dividends in reduced maintenance and expanded capabilities.
Stay put when your Dataform project is stable, you’re BigQuery-only with no plans to change, and you’re not missing functionality. The “if it ain’t broke” principle applies.
Reconsider the timeline when you have complex JavaScript generation, ML pipeline dependencies, or stakeholders requiring extended parallel running. The two-month estimate becomes six months in these scenarios.
The tools transform SQL identically at the warehouse level. What differs is the ecosystem, the commercial model, and the career implications. Make the choice based on your organization’s trajectory, not on which syntax looks cleaner.