Half of all Dagster Cloud users run dbt. That adoption reflects how well the integration fits analytics engineering workflows. The dagster-dbt integration treats each dbt model as a first-class data asset with automatic lineage, freshness tracking, and quality checks. For analytics engineers who already think in ref() dependencies and model layers, this maps directly to how you work.
My Dagster fundamentals guide covers the platform’s core concepts. This article goes deeper on the dbt integration itself: how the mapping works, how to customize it, and what you gain over running dbt on its own.
How dagster-dbt maps models to assets
The dagster-dbt package reads your dbt project’s manifest.json and creates one Dagster asset per model, seed, and snapshot. Dependencies from ref() and source() calls in your SQL become edges in the Dagster asset graph. The compute function runs dbt build under the hood.
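The dependency data comes straight from the manifest. Here is a minimal sketch of the structure dagster-dbt consumes — the node names are illustrative, but dbt does record `ref()` and `source()` relationships in the manifest's `parent_map`:

```python
# Illustrative slice of dbt's manifest.json: parent_map records which
# nodes each model ref()s, and dagster-dbt turns each pair into an edge
manifest = {
    "parent_map": {
        "model.analytics.mrt__marketing__campaign_performance": [
            "model.analytics.int__google_ads__clicks",
            "model.analytics.int__meta_ads__impressions",
        ],
    }
}

def edges(manifest: dict) -> list[tuple[str, str]]:
    # One (upstream, downstream) edge per parent listed for each node
    return [
        (parent, child)
        for child, parents in manifest["parent_map"].items()
        for parent in parents
    ]

for upstream, downstream in edges(manifest):
    print(f"{upstream} -> {downstream}")
```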
In code, the setup is simple:
```python
from pathlib import Path

from dagster_dbt import DbtCliResource, DbtProject, dbt_assets

my_project = DbtProject(project_dir=Path("./transform"))
my_project.prepare_if_dev()


@dbt_assets(manifest=my_project.manifest_path)
def my_dbt_assets(context, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()
```

Three things happen here. DbtProject points Dagster at your dbt project directory. prepare_if_dev() runs dbt parse to generate manifest.json during local development (in production, you build the manifest at deploy time with dagster-dbt project prepare-and-package). The @dbt_assets decorator reads that manifest and creates one asset per node.
When you run dagster dev and open localhost:3000, every dbt model appears in the asset catalog with its upstream and downstream dependencies visualized. If mrt__marketing__campaign_performance depends on int__google_ads__clicks and int__meta_ads__impressions, the graph shows those relationships automatically, pulled straight from your SQL ref() calls.
Customizing the mapping with DagsterDbtTranslator
The default mapping works for most projects, but the DagsterDbtTranslator lets you control how dbt nodes become Dagster assets. You subclass it and override specific methods.
Custom asset keys
By default, Dagster uses the dbt model name as the asset key. If your project follows a layered naming convention with prefixes like base__, int__, and mrt__, the defaults work well. But if you need to map models into specific Dagster groups or adjust key prefixes:
```python
from pathlib import Path

from dagster import AssetKey
from dagster_dbt import DagsterDbtTranslator, DbtCliResource, dbt_assets


class CustomTranslator(DagsterDbtTranslator):
    def get_asset_key(self, dbt_resource_props):
        # Group assets by dbt model path: marts/marketing/model → ["marketing", "model"]
        node_path = Path(dbt_resource_props["path"])
        components = [*node_path.parts[1:-1], node_path.stem]
        return AssetKey(components)

    def get_group_name(self, dbt_resource_props):
        # Use the top-level dbt folder as the Dagster group
        return Path(dbt_resource_props["path"]).parts[0]


@dbt_assets(
    manifest=my_project.manifest_path,
    dagster_dbt_translator=CustomTranslator(),
)
def my_dbt_assets(context, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()
```

Tags, owners, and metadata from dbt meta
Dagster reads dbt’s meta configuration. Properties you set in your schema.yml carry through to the Dagster UI:
```yaml
models:
  - name: mrt__finance__monthly_revenue
    meta:
      dagster:
        owners: ["team:finance", "adrienne@example.com"]
        group: finance
        tags:
          - daily
          - critical
    columns:
      - name: revenue__total_usd
        description: "Total revenue in USD for the month"
```

Tags from dbt map to Dagster tags. Owners specified in meta.dagster.owners appear in the asset catalog and can be used to filter the lineage graph by team.
Filtering which models become assets
Not every dbt node needs to be a Dagster asset. Ephemeral models don’t produce tables, so they have no materialization to track. To control which nodes become assets, use dbt selection syntax in the @dbt_assets decorator’s select and exclude parameters:
```python
@dbt_assets(
    manifest=my_project.manifest_path,
    select="tag:dagster",  # Only models tagged with 'dagster' in dbt
)
def my_dbt_assets(context, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()
```

This gives you fine-grained control. Tag critical models in dbt, and only those become tracked assets in Dagster.
dbt tests as asset checks
Since Dagster 1.7, dbt tests are automatically pulled in as asset checks. Every not_null, unique, accepted_values, and custom data test appears in the Dagster UI as a quality check attached to the relevant asset. No extra configuration needed.
What does this look like in practice? When you materialize a dbt model, its tests run as part of the build. In the Dagster UI, each asset shows a health badge. Green means materialized and all checks passed. Red means a check failed, and you can click through to see exactly which test failed and why.
Schema tests and data tests behave slightly differently. Schema tests (like not_null on a column) attach to the specific model they test. Generic data tests that reference multiple models attach to the primary model.
You can configure check severity through dbt’s existing severity config. A test with severity: warn won’t block downstream materializations, while severity: error will. This maps to Dagster’s blocking vs non-blocking check distinction, so your existing dbt test configuration carries over without changes.
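As a sketch of how that carries over, here is a severity override in schema.yml — the model and column names are illustrative, but the config block follows dbt’s standard test severity syntax:

```yaml
models:
  - name: mrt__finance__monthly_revenue
    columns:
      - name: revenue__total_usd
        tests:
          - not_null:
              config:
                severity: warn   # surfaces as a non-blocking check in Dagster
      - name: month_id
        tests:
          - unique:
              config:
                severity: error  # a failure blocks downstream materializations
```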
For teams already using dbt testing strategies, this means one UI for both execution and quality. No more checking dbt Cloud for test results and a separate tool for pipeline health.
Freshness policies and scheduling
Dagster tracks asset freshness, not just execution timestamps. You can define freshness expectations in your dbt project’s meta section:
```yaml
models:
  - name: mrt__marketing__daily_spend
    meta:
      dagster:
        freshness_policy:
          maximum_lag_minutes: 360  # Must be less than 6 hours old
```

This tells Dagster that mrt__marketing__daily_spend should never be more than 6 hours stale. If it is, the UI shows a freshness alert, and you can configure notifications through Dagster+ alerts.
For schedule-based orchestration, build_schedule_from_dbt_selection creates a Dagster schedule directly from dbt selection syntax:
from dagster_dbt import build_schedule_from_dbt_selection
```python
daily_dbt_schedule = build_schedule_from_dbt_selection(
    [my_dbt_assets],
    job_name="daily_dbt_job",
    cron_schedule="0 6 * * *",
    dbt_select="tag:daily",
)
```

This runs all dbt models tagged daily at 6 AM. For more flexibility, automation conditions and sensors let you move beyond fixed schedules entirely.
A sensor can watch for an upstream event (a Fivetran sync completing, a file landing in GCS, a dlt pipeline finishing) and trigger dbt only when fresh data is available. No more running dbt at 6 AM and hoping the upstream load finished in time.
from dagster import sensor, RunRequest
```python
@sensor(job=my_dbt_job)
def fivetran_complete_sensor(context):
    # Check if Fivetran sync completed since last check
    if fivetran_sync_completed():
        yield RunRequest(run_key=f"fivetran-{context.cursor}")
```

Project setup: two paths
The Components approach (recommended for new projects)
Since the 1.12 cycle, Dagster recommends the DbtProjectComponent for new dbt integrations. The dg CLI scaffolds everything:
```bash
dg scaffold defs dagster_dbt.DbtProjectComponent transform \
  --project-path ./transform
```

This creates a defs.yaml that handles manifest compilation and caching automatically. The component generates all assets from your dbt project with minimal Python.
The traditional approach
For existing projects or teams that want more control, the traditional scaffold generates Python files directly:
```bash
dagster-dbt project scaffold \
  --project-name my_project \
  --dbt-project-dir ./transform
```

This produces a Python package with the @dbt_assets decorator, a DbtProject definition, and resource configuration. You edit the Python files directly to customize behavior.
Manifest handling
The manifest is the bridge between dbt and Dagster. During local development, prepare_if_dev() or the Components framework handles manifest generation transparently. For production deploys, you build the manifest as part of your CI/CD pipeline. A typical approach:
```bash
# In CI/CD pipeline
dbt deps --project-dir ./transform
dagster-dbt project prepare-and-package \
  --file ./my_project/project.py
```

This compiles the manifest once at deploy time, so Dagster doesn’t need to run dbt parse in production.
Selective materializations and state-based runs
You don’t have to materialize every dbt model on every run. In the Dagster UI, you can select specific assets and materialize just those, plus their downstream dependencies. This is useful for ad-hoc fixes or targeted backfills.
For CI/CD workflows, Dagster supports state-based selection. Raul Salamanca documented on Medium how he replicated dbt Cloud’s state:modified behavior in Dagster for a BigQuery environment, running only changed models and their downstreams. This keeps CI builds fast even in large projects.
For time-based partitioning, Dagster’s partition system works with dbt incremental models. You can define daily partitions and materialize specific date ranges, which maps to passing --vars to dbt with the partition date.
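The plumbing for that --vars handoff can be sketched without any Dagster machinery. The run_date variable name is an assumption; your incremental models would read it with var('run_date') in their WHERE clause:

```python
import json

def dbt_build_args(partition_key: str) -> list[str]:
    # Serialize the partition date into dbt's --vars flag so incremental
    # models can filter new rows on it
    return ["build", "--vars", json.dumps({"run_date": partition_key})]

print(dbt_build_args("2024-06-01"))
```

Inside a partitioned @dbt_assets function, context.partition_key supplies the date, and the resulting argument list is passed to dbt.cli().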
Branch deployments for dbt CI/CD
Dagster+’s branch deployments are particularly useful for dbt teams. When you open a PR, Dagster automatically creates an ephemeral preview environment with your modified dbt models and their lineage. You can see exactly which assets changed, inspect the new dependency graph, and even run test materializations before merging.
This is conceptually similar to dbt Cloud’s CI jobs that run modified models on PR, but with broader scope. Branch deployments cover the entire asset graph, not just dbt models. If your PR changes a Python asset upstream of a dbt model, you see the full impact.
For teams practicing thorough dbt testing, branch deployments add a visual review layer. Reviewers can inspect the asset graph diff alongside the code diff.
The full-stack picture
dbt transformation is rarely the whole pipeline. Data arrives from sources, gets transformed, and then feeds downstream systems. Dagster’s value shows when all of these participate in one asset graph.
A production pipeline on BigQuery typically looks like:
- Ingestion: A Fivetran sync or dlt pipeline loads raw data
- dbt transformation: Models run only when upstream data is fresh
- Python assets: ML feature engineering, custom aggregations, API-driven enrichment
- Serving: Sensors trigger Looker dashboard refreshes or reverse ETL exports
The cross-system dependency management is the part that no cron job or Cloud Run scheduler can replicate. When your dbt transformation should only run after a Fivetran sync completes, and your BI refresh should only trigger after specific mart models are fresh, you need an orchestrator that understands the data dependencies, not just the execution order.
When to choose this over dbt Cloud
dbt Cloud has a built-in scheduler that handles cron jobs, source freshness checks, and CI builds on PRs. For teams doing only SQL transformations with no cross-system dependencies, it works fine.
But dbt Cloud can’t orchestrate non-dbt tasks (ingestion, Python processing, API calls), manage cross-system dependencies (trigger dbt after a sync), run partitioned executions by date or region, or provide unified lineage across your full data stack.
The cost math often favors Dagster. A dbt project with 100 models running daily uses roughly 3,000 credits per month, fitting comfortably in Dagster+ Solo at $10/month. dbt Cloud Starter costs $100/user/month. For a 3-person team, that’s $300/month on dbt Cloud versus $100/month on Dagster+ Starter, and with Dagster you get orchestration across your entire pipeline, not just dbt.
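The credit estimate is easy to reproduce, assuming one materialization per model per day and one credit per materialization (the metering model assumed here):

```python
models = 100
runs_per_day = 1
days_per_month = 30

# One credit per asset materialization (assumed metering)
credits_per_month = models * runs_per_day * days_per_month
print(credits_per_month)  # 3000
```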
The October 2025 Fivetran-dbt Labs merger adds a strategic dimension. As dbt Cloud becomes part of Fivetran’s integrated platform, teams that want vendor independence in their orchestration layer have additional reason to keep orchestration separate. Dagster or Airflow as the orchestration layer gives you the freedom to swap ingestion, transformation, and serving tools independently.
If your team writes only SQL, never needs cross-system coordination, and doesn’t care about asset-level lineage beyond what dbt provides, dbt Cloud is sufficient. The moment you need Python processing, event-driven triggers, or a unified view from source to dashboard, the dagster-dbt integration handles it well. Expect a few weeks of ramp-up on the Python and Dagster concepts, but Dagster University and the growing community make it manageable.