Dagster fundamentals for analytics engineers

Most orchestrators were built for data engineers managing infrastructure. They think in tasks, dependencies, and retries. Analytics engineers think in tables, models, and freshness. That gap is important, because picking an orchestrator that doesn’t match how your team works creates friction at every level: writing pipelines, debugging failures, onboarding new people.

Dagster closes that gap. Its core abstraction is the asset: a persistent data object like a BigQuery table or a dbt model. If you already work with dbt, Dagster’s mental model will feel familiar. Every ref() in your SQL becomes a dependency edge. Every model becomes a tracked asset with lineage, freshness history, and health status.

This guide covers what you need to know to evaluate and start using Dagster as an analytics engineer on dbt + BigQuery + GCP.

The asset-centric mental model

Traditional orchestrators like Airflow define what tasks to run and in what order. The orchestrator doesn’t know what data those tasks produce. A successful task run means the code executed without errors, not that the data is correct or fresh.

Dagster flips this. You define what data should exist and how to produce it. Each asset is a persistent data object (a BigQuery table, a GCS file, a dbt model) with a Python function that creates or updates it. The orchestrator tracks when each asset was last materialized, whether it’s fresh, and whether its upstream dependencies have changed.

This distinction plays out in real debugging scenarios. In a task-based system, “the pipeline succeeded at 8am” tells you tasks ran. In Dagster, you can ask: “Is the mrt__finance__revenue table current? When was it last materialized? Did all upstream assets complete first?” The system knows the answer because it tracks data state, not just execution state.

For analytics engineers who already think in dbt models and ref() dependencies, this is a natural fit. You’re already defining assets (each model produces a table). Dagster simply gives the orchestrator awareness of what you’re building, not just the commands you’re running.

Core concepts you need

Dagster has several abstractions, but analytics engineers can work effectively with just five.

Software-Defined Assets

The central building block. An asset is a Python function decorated with @dg.asset that produces a persistent data object. Each asset has a unique key, upstream dependencies (inferred from function arguments), and metadata like owners, tags, and freshness policies.

```python
import dagster as dg
import pandas as pd


@dg.asset(
    group_name="finance",
    owners=["team:analytics"],
)
def mrt__finance__daily_revenue(
    context: dg.AssetExecutionContext,
    base__stripe__payments: pd.DataFrame,
) -> pd.DataFrame:
    """Daily revenue aggregated from Stripe payments."""
    return base__stripe__payments.groupby("payment_date").agg(
        total_revenue=("amount_usd", "sum")
    )
```

The function argument base__stripe__payments tells Dagster this asset depends on another asset with that key. Dagster builds the dependency graph automatically.

Resources

External connections configured centrally and injected into assets. A BigQueryResource, a DbtCliResource, or a GCS client are all resources. You define them once and swap them between environments (dev uses a test dataset, prod uses the real one).

```python
import dagster as dg
from dagster_dbt import DbtCliResource
from dagster_gcp import BigQueryResource

defs = dg.Definitions(
    assets=[mrt__finance__daily_revenue],
    resources={
        "bigquery": BigQueryResource(project="my-gcp-project"),
        "dbt": DbtCliResource(project_dir="./transform"),
    },
)
```

Schedules and sensors

Schedules trigger materializations on cron expressions, much like a dbt Cloud job schedule. Sensors trigger materializations in response to events: a new file landing in GCS, a Fivetran sync completing, or an upstream asset finishing its materialization.

Sensors are where Dagster pulls ahead of cron-based approaches. Instead of running dbt on a fixed schedule and hoping upstream data has arrived, a sensor can watch for the actual data landing event and trigger the transformation only when there’s something new to process.

The Definitions object

The top-level registry that tells Dagster what exists: all your assets, resources, schedules, and sensors. Think of it as the dbt_project.yml equivalent. One Definitions object per code location, and Dagster reads it to build the asset graph.

Components

The newest major abstraction, reaching GA in the Dagster 1.12 cycle (2025). Components are YAML-configured or lightweight Python objects that generate assets, checks, and schedules. The DbtProjectComponent is the flagship example: point it at a dbt project, and it generates all assets from your manifest automatically.

Components lower the barrier to entry for SQL-first practitioners who’d rather write YAML than Python decorators.

The dbt integration

The dagster-dbt package treats dbt models as first-class assets, going further than any other orchestrator’s dbt integration. 50% of Dagster Cloud users run dbt, the highest adoption rate in the space.

How it works

Each dbt model, seed, and snapshot becomes an individual Dagster asset. Dependencies are derived from ref() and source() calls in your SQL. The compute function runs dbt build under the hood. In code:

```python
from pathlib import Path

from dagster_dbt import DbtCliResource, DbtProject, dbt_assets

my_project = DbtProject(project_dir=Path("./transform"))
my_project.prepare_if_dev()


@dbt_assets(manifest=my_project.manifest_path)
def my_dbt_assets(context, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()
```

prepare_if_dev() generates the manifest.json during local development by running dbt parse. For production, dagster-dbt project prepare-and-package builds the manifest at deploy time.

dbt tests become asset checks

Since Dagster 1.7, dbt tests are automatically pulled in as Dagster asset checks by default. Every schema test and data test appears in the Dagster UI as a quality check attached to the relevant asset. dbt resource tags map to Dagster asset tags, and dbt group owners map to Dagster owners. No extra configuration needed.

This gives you a single UI for both execution and quality. Instead of checking dbt Cloud for test results and a separate monitoring tool for pipeline health, everything lives in one place.

The Components approach

For new projects in 2025+, the recommended path uses the dg CLI to scaffold a DbtProjectComponent:

```shell
dg scaffold defs dagster_dbt.DbtProjectComponent transform \
  --project-path ./transform
```

This creates a defs.yaml that handles manifest compilation, caching, and asset generation automatically. Less Python boilerplate and a more declarative configuration.
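The generated file looks roughly like this. Treat it as a sketch: the exact field names depend on your dagster-dbt version, so check the scaffolded output rather than copying this verbatim.

```yaml
# transform/defs.yaml -- sketch only; fields may differ by version
type: dagster_dbt.DbtProjectComponent
attributes:
  project: '{{ project_root }}/transform'
```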

Beyond dbt: the full-stack pipeline

The real value of Dagster over simpler orchestration approaches shows when your pipeline extends beyond transformation.

A typical Dagster + dbt + BigQuery pipeline:

  1. Ingestion: A sensor detects that a Fivetran or dlt sync has completed
  2. Transformation: dbt runs only when upstream data is actually fresh
  3. Python processing: ML features, custom aggregations, or API calls that SQL can’t handle
  4. Downstream triggers: Sensors kick off BI dashboard refreshes or reverse ETL exports

All of these participate in the same dependency graph. A single asset lineage view shows the full path from raw source to final dashboard.

If your team only runs dbt build on a schedule with no upstream or downstream dependencies, a Cloud Run job on a cron trigger is simpler and cheaper. Dagster earns its place when multiple systems need to coordinate.

The Dagster UI

The web UI (launched via dagster dev on localhost:3000) is where Dagster stands out most clearly for analytics engineers.

Asset Catalog: A searchable, filterable list of every asset in your project. Filter by group, code location, tags, or owners. Each asset shows its materialization history, metadata, and health status.

Global Asset Lineage: The full dependency graph across all assets, not just dbt models. You can overlay facets (owners, health status, automation conditions) to answer questions like “which assets owned by the finance team are currently stale?”

Run Details: Gantt charts showing execution timing, structured event logs, and compute logs. Failed runs show exactly which asset failed and why, with one-click re-execution of just the failed assets and their downstreams.

Health Indicators: Color-coded badges (materialized, stale, failed, fresh) on every asset. At a glance, you can tell whether your data is current without reading logs.

Dagster+ Pro adds BigQuery cost tracking per asset (answering “which models cost the most to run?”), column-level lineage, and a catalog mode designed for less-technical stakeholders who need visibility without the full engineering interface.

Pricing and GCP deployment

The credit model

Dagster+ pricing runs on credits, where 1 credit equals 1 asset materialization or 1 op execution:

| Plan    | Price         | Credits/month | Users   |
| ------- | ------------- | ------------- | ------- |
| Solo    | $10/mo        | 7,500         | 1       |
| Starter | $100/mo       | 30,000        | Up to 3 |
| Pro     | Contact Sales | Custom        | Custom  |

Overage beyond plan credits costs $0.03 per credit. Serverless compute adds $0.005 per compute minute on Solo/Starter.

For context, a dbt project with 100 models running daily uses ~3,000 credits per month (100 models x 30 days). The Solo tier at $10/month handles that with room to spare. Compare this to dbt Cloud Starter at $100/user/month or Cloud Composer 3 starting around $377/month even when idle.
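The arithmetic is easy to sanity-check yourself, using the plan numbers quoted above (Solo at $10/month with 7,500 included credits and $0.03/credit overage):

```python
# Back-of-the-envelope Dagster+ credit estimate, one credit per
# asset materialization, using the Solo plan numbers quoted above.
def monthly_cost(models: int, runs_per_day: int,
                 plan_price: float = 10.0,
                 included_credits: int = 7_500,
                 overage_rate: float = 0.03) -> float:
    credits = models * runs_per_day * 30
    overage = max(0, credits - included_credits)
    return plan_price + overage * overage_rate


print(monthly_cost(100, 1))  # 3,000 credits: fits in the Solo plan
print(monthly_cost(100, 4))  # 12,000 credits: overage kicks in
```

The same function makes the pricing-surprise risk discussed later concrete: quadrupling materialization frequency takes a 100-model project from the $10 base to well past it.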

The free open-source version (Apache 2.0) works for self-hosting but lacks RBAC, branch deployments, alerts to external services, and catalog search.

Deploying on GCP

Two deployment modes:

Serverless: Fully managed by Dagster. Best for workloads that orchestrate external services (dbt, Fivetran, BigQuery) rather than running heavy compute. Limited to 4 CPUs per node.

Hybrid: Dagster hosts the control plane. Execution runs in your infrastructure via a Kubernetes agent on GKE, using Dagster’s Helm chart. Authentication works through Workload Identity. Run and event storage goes to Cloud SQL PostgreSQL, with GCS for I/O manager persistence.

For GCP-native teams, the Hybrid model on GKE fits cleanly into an existing GCP data platform architecture. A community dagster-contrib-gcp package also supports executing runs as Cloud Run jobs for teams that prefer serverless compute.

The learning curve

G2 reviewers consistently flag a steep learning curve, and that reputation is earned. For analytics engineers coming from SQL-only dbt workflows, the friction shows up in a few specific places:

Python proficiency required. You need comfort with decorators, type hints, and Python project structure. If your team writes only SQL and Jinja, there’s a ramp-up period.

Conceptual overhead. The shift from “I write SQL models” to “I define software-defined assets with resources, I/O managers, and configs” takes time to internalize. Assets, resources, definitions, ops, jobs: the vocabulary is large.

Manifest management. Understanding how manifest.json is generated, cached, and used across environments trips up newcomers. The Components approach reduces this friction, but it’s still a thing.

Pricing surprises. Each asset materialization AND each op execution counts as a credit. High-frequency materializations or large dbt projects with many models can consume credits faster than expected. Monitor your usage early.

The best onboarding path: Dagster University offers free courses covering Dagster Essentials, Dagster & dbt, Testing, and Data Ingestion. The Erewhon case study is encouraging: a one-person data team with a non-technical background built an entire data platform using Dagster University, YouTube, and ChatGPT. The learning curve is steep but not insurmountable.

Is it right for your team?

Dagster fits best when your team is dbt-centric, your pipeline extends beyond transformation (ingestion, Python processing, BI refresh), and you need asset-level lineage across the full stack. Teams of 2-15 analytics engineers get the most value.

Consider alternatives when:

  • You only run dbt on a schedule with no upstream or downstream dependencies. A cron trigger on Cloud Run costs $0-3/month and requires no new tooling.
  • You need the broadest integration ecosystem. Airflow’s 90+ provider packages cover more services than any alternative.
  • Speed of setup matters most. Prefect lets you orchestrate Python functions with less framework overhead, though its dbt integration is more operational than semantic.
  • Your organization already runs Airflow at scale. Airflow 3.0 added its own @asset decorator, and the astronomer-cosmos package provides solid dbt integration. Migration may not be justified.

The right moment to adopt Dagster is when you have 5+ interconnected data sources, cross-team collaboration needs, SLA commitments on data freshness, or event-driven scheduling requirements. Before that threshold, simpler approaches work fine and let you invest your complexity budget elsewhere. For a broader view of how Dagster compares in the current market, see my 2026 orchestration landscape overview.