Dynamic model generation means writing a JavaScript loop once to produce dozens of fully functional, dependency-tracked models. It is a structural difference between Dataform and dbt, not a matter of syntax or of where config blocks live.
The Pattern
Dataform’s .js files in the definitions/ directory run as JavaScript during compilation. They have access to the publish() function, which creates a model in the DAG. Anything you can express as a loop in JavaScript, you can express as a set of models in Dataform.
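For orientation, a single static publish() call defines exactly one model, much as a standalone .sqlx file would. A minimal sketch, with hypothetical table and column names:

```javascript
// definitions/reporting_orders.js
// One publish() call adds one node to the DAG. The table and column
// names here are illustrative, not taken from the examples below.
publish("reporting_orders")
  .type("table")
  .query(ctx => `
    SELECT order_id, customer_id, order_total
    FROM ${ctx.ref("stg__orders")}
  `);
```

Nothing stops you from calling publish() inside a loop, which is where the pattern gets its leverage.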
The canonical example: country-specific reporting tables.
```javascript
const countries = ["US", "GB", "FR", "DE", "JP", "AU"];

countries.forEach(country => {
  publish(`reporting_${country}`)
    .dependencies(["base__ga4__events"])
    .query(ctx => `
      SELECT
        event_id,
        event_name,
        event_timestamp,
        user_pseudo_id
      FROM ${ctx.ref("base__ga4__events")}
      WHERE country_code = '${country}'
    `);
});
```

Six models. One file. Add "CA" to the array and recompile — seven models. Remove one — five models. Dataform’s DAG engine treats each published model as a first-class node with proper dependency tracking, scheduling, and assertion support.
The same pattern works for any systematic variation:
```javascript
const sources = [
  { name: "stripe", schema: "stripe_prod" },
  { name: "shopify", schema: "shopify_production" },
  { name: "salesforce", schema: "salesforce_data" }
];

sources.forEach(({ name, schema }) => {
  publish(`stg__${name}__orders`)
    .type("view")
    .query(ctx => `
      SELECT *
      FROM ${ctx.ref(schema, "orders")}
      WHERE _fivetran_deleted IS FALSE
    `);
});
```

Three staging views. Change the sources array, change the models. The project configuration drives the model set rather than the file system.
More Complex Generation
The pattern scales to more complex cases. Incremental models with assertions, generated from a configuration object:
```javascript
const tenants = [
  { id: "tenant_a", project: "project-a", dataset: "analytics_123" },
  { id: "tenant_b", project: "project-b", dataset: "analytics_456" }
];

tenants.forEach(tenant => {
  publish(`events_${tenant.id}`, {
    type: "incremental",
    uniqueKey: ["event_id"],
    bigquery: {
      partitionBy: "DATE(event_timestamp)",
      clusterBy: ["user_pseudo_id"]
    },
    assertions: {
      uniqueKey: ["event_id"],
      nonNull: ["event_id", "event_timestamp"]
    }
  }).query(ctx => `
    SELECT
      event_id,
      event_name,
      event_timestamp,
      user_pseudo_id
    FROM ${ctx.ref(tenant.project, tenant.dataset, "events_*")}
    ${ctx.when(
      ctx.incremental(),
      `WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM ${ctx.self()})`
    )}
  `);
});
```

Two models, each incremental with partitioning, clustering, and uniqueness assertions — all generated from a configuration array. Add a tenant, get a model.
This pattern is practical for:
- Multi-tenant SaaS analytics where each customer has a separate schema
- Regional data pipelines where the same transformation runs per geography
- Multi-source pipelines where the same staging pattern applies to many source tables
- A/B testing frameworks where separate models track control and treatment populations (sketched below)
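A minimal sketch of that last case, assuming a hypothetical experiments array and an events table with experiment_id and variant columns (both illustrative):

```javascript
// Hypothetical experiment configuration; experiment IDs and the
// experiment_id / variant columns are illustrative.
const experiments = [
  { id: "checkout_redesign", variants: ["control", "treatment"] },
  { id: "onboarding_flow", variants: ["control", "treatment"] }
];

experiments.forEach(exp => {
  exp.variants.forEach(variant => {
    publish(`ab__${exp.id}__${variant}`)
      .type("view")
      .query(ctx => `
        SELECT user_pseudo_id, event_name, event_timestamp
        FROM ${ctx.ref("base__ga4__events")}
        WHERE experiment_id = '${exp.id}'
          AND variant = '${variant}'
      `);
  });
});
```

Two experiments with two variants each yields four models; retiring an experiment removes its models on the next compile.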
What dbt Teams Do Instead
dbt has no equivalent mechanism. Jinja runs inside a single .sql file and produces a single model: a Jinja loop can expand into repeated SQL within that one model, but it cannot create files or add nodes to the DAG programmatically. This is a hard constraint.
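To see the constraint concretely, here is a generic dbt-side sketch (not from any particular project) using the same country list as above; the Jinja loop expands at compile time, but everything lands in one relation:

```sql
-- models/reporting_all_countries.sql (illustrative)
-- The loop runs at compile time, but all branches land in this single model.
{% set countries = ["US", "GB", "FR", "DE", "JP", "AU"] %}

{% for country in countries %}
SELECT
    event_id,
    event_name,
    event_timestamp,
    user_pseudo_id,
    country_code
FROM {{ ref('base__ga4__events') }}
WHERE country_code = '{{ country }}'
{% if not loop.last %}UNION ALL{% endif %}
{% endfor %}
```

The result is one reporting_all_countries model, not six per-country models that can be scheduled, tested, or selected independently.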
The practical workarounds, from most to least recommended:
Write Individual Files
For fewer than 10 variations, just write them. Ten nearly identical SQL files are repetitive, but each model is visible in the DAG, has its own schema.yml entry, and can be selected and run independently. The duplication is real; so is the debuggability.
The break-even point where writing individual files becomes unreasonable varies by team. Some draw the line at 5, some at 20. It depends on how often the set changes and how similar the models are.
dbt_codegen for One-Time Scaffolding
dbt_codegen can generate YAML and model SQL from existing warehouse tables. This handles the initial creation of a large set of similar models, but it is a one-time scaffold, not ongoing generation. If you add a new country, you run codegen again, check the output, and commit the new file. The process is manual and error-prone at scale.
This is the right answer when the model set is stable. It is the wrong answer when the configuration changes frequently.
External Preprocessing
The closest equivalent to Dataform’s approach: a Python or shell script that generates .sql files before dbt run. Your CI pipeline runs the generation step, dbt picks up the generated files, and the DAG includes the generated models.
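A minimal sketch of such a generator, written here in Node.js to stay consistent with the JavaScript examples above (the point holds equally for Python or shell); the file paths, country list, and referenced model are illustrative:

```javascript
// generate_models.js -- hypothetical generator run in CI before `dbt run`.
// Writes one dbt model file per country; dbt then compiles the generated
// .sql files like any hand-written model.
const fs = require("fs");
const path = require("path");

const countries = ["US", "GB", "FR", "DE", "JP", "AU"];
const outDir = path.join("models", "reporting", "generated");
fs.mkdirSync(outDir, { recursive: true });

countries.forEach(country => {
  const sql = `SELECT
  event_id,
  event_name,
  event_timestamp,
  user_pseudo_id
FROM {{ ref('base__ga4__events') }}
WHERE country_code = '${country}'
`;
  fs.writeFileSync(
    path.join(outDir, `reporting_${country.toLowerCase()}.sql`),
    sql
  );
});
```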
The disadvantage is that dbt has no awareness of the generation step. If the generator and the models fall out of sync, dbt runs successfully against stale generated files. You have added a build step that lives outside dbt’s graph and requires separate documentation, testing, and maintenance.
Some teams use this pattern successfully. It requires discipline about treating the generator as part of the codebase, not a one-time script.
dbt run-operation with Codegen
For some patterns, dbt run-operation generate_model_yaml (from the dbt_codegen package) scaffolds schema.yml entries for existing models. The scope is limited to what the codegen package supports, and the result still requires manual file management.
Dynamic model generation is a capability advantage for Dataform on specific use cases: multi-tenant platforms, large-scale regional pipelines, projects where the model set is driven by configuration rather than a stable schema.
Most analytics engineering projects do not need it. Standard dimensional modeling has a stable model set that changes infrequently. The decision between Dataform and dbt rarely depends on this feature alone — platform commitment, testing requirements, ecosystem maturity, and team skills are more often the dominant factors.
The JavaScript vs Jinja in Analytics Engineering note covers the migration challenges when projects have relied on dynamic generation and need to move to dbt. Migration cost is high, which is one of the primary reasons teams should evaluate how much they will rely on this capability before committing to Dataform.