Dagster GCP Deployment

How to deploy Dagster on GCP — Serverless vs Hybrid modes, GKE with Helm, Workload Identity authentication, Cloud SQL for storage, and the community Cloud Run option.

Dagster+ offers two deployment modes on GCP. The choice depends on whether you want Dagster to manage compute entirely, or whether you need execution to stay within your own infrastructure.

Serverless Mode

Dagster hosts everything: your code runs on Dagster’s infrastructure. You push code; Dagster handles deployment, scaling, and execution.

Best for: Workloads that orchestrate external services rather than running heavy compute. If your Dagster pipeline primarily tells BigQuery to run queries (via dbt), triggers Fivetran syncs, and writes metadata, the actual compute is minimal — BigQuery and Fivetran do the heavy lifting. Dagster Serverless handles the orchestration layer cheaply.

Limitations:

  • Limited to 4 CPUs per node. For most dbt + BigQuery workflows this is fine, since the compute-intensive work happens in BigQuery, not in Dagster.
  • Code and data transit through Dagster’s infrastructure. If your security requirements mandate that all execution stays within your VPC, Serverless doesn’t fit.
  • Serverless compute costs ($0.005/minute) add to the credit-based pricing.

Setup: Minimal. Connect your Git repository, configure resources pointing at your GCP project, and Dagster handles the rest. No GKE cluster, no Helm chart, no infrastructure to manage.

For analytics engineering teams on dbt + BigQuery where the pipeline is primarily orchestration (scheduling dbt builds, coordinating Fivetran syncs, triggering downstream refreshes), Serverless is the simpler path to production deployment.
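
If you deploy from CI instead of the GitHub integration, the dagster-cloud CLI can push code locations to Serverless. A minimal sketch, assuming the CLI's Python-executable deploy command; the location, package, and organization names are placeholders:

# Deploy a code location to Dagster+ Serverless from CI (all names are placeholders)
pip install dagster-cloud
export DAGSTER_CLOUD_ORGANIZATION=my-org
dagster-cloud serverless deploy-python-executable \
  --location-name my_dbt_project \
  --package-name my_dbt_project \
  --api-token "$DAGSTER_CLOUD_API_TOKEN"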

Hybrid Mode

Dagster hosts the control plane. Execution runs in your infrastructure. The Dagster+ control plane manages the web UI, scheduling, sensor evaluation, and run coordination. The actual computation — running dbt, executing Python assets, calling external APIs — happens on a Kubernetes agent in your GCP project.

Best for: Teams with security requirements (data stays in your VPC), heavy compute needs (more than 4 CPUs), or existing GKE infrastructure they want to reuse.

GKE Deployment

The standard Hybrid deployment runs a Dagster agent on Google Kubernetes Engine using Dagster’s official Helm chart:

helm repo add dagster https://dagster-io.github.io/helm
helm install dagster dagster/dagster \
  --set dagsterCloud.deployment=prod \
  --set dagsterCloud.agentToken=$DAGSTER_AGENT_TOKEN
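
The chart assumes a GKE cluster with Workload Identity enabled. If you are starting from scratch, a minimal sketch of provisioning one (cluster name and sizing are placeholders):

# Small GKE cluster with Workload Identity enabled via --workload-pool
gcloud container clusters create dagster-agent-cluster \
  --project my-project \
  --zone us-central1-a \
  --num-nodes 1 \
  --machine-type e2-standard-4 \
  --workload-pool my-project.svc.id.goog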

The Helm chart deploys:

  • Agent pod that polls Dagster+ for runs to execute
  • Worker pods that spin up for each run, execute your code, and shut down
  • Configuration for resource limits, service accounts, and secrets

Authentication: Workload Identity

The recommended authentication pattern for GCP uses Workload Identity, which maps a Kubernetes service account to a GCP service account. Your Dagster agent pods authenticate to BigQuery, GCS, and other GCP services without service account keys.

# Helm values for Workload Identity
serviceAccount:
  create: true
  annotations:
    iam.gke.io/gcp-service-account: dagster-agent@my-project.iam.gserviceaccount.com
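
If the GCP service account does not exist yet, create it first (a one-time step, reusing the example names above):

gcloud iam service-accounts create dagster-agent \
  --project my-project \
  --display-name "Dagster agent"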

On the GCP side, bind the GCP service account to the Kubernetes service account:

gcloud iam service-accounts add-iam-policy-binding \
  dagster-agent@my-project.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:my-project.svc.id.goog[dagster/dagster-agent]"

This follows the same Application Default Credentials (ADC) resolution pattern used in Cloud Run Jobs. The service account needs:

  • roles/bigquery.dataEditor and roles/bigquery.jobUser for dbt execution
  • roles/storage.objectViewer for reading from GCS (if your pipeline uses GCS staging)
  • Any additional roles required by your Python assets
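
Granting those roles is a one-time setup step. A sketch, reusing the project and service account names from the examples above:

# Grant the required roles to the agent's GCP service account
PROJECT=my-project
SA="dagster-agent@${PROJECT}.iam.gserviceaccount.com"
for role in roles/bigquery.dataEditor roles/bigquery.jobUser roles/storage.objectViewer; do
  gcloud projects add-iam-policy-binding "$PROJECT" \
    --member "serviceAccount:${SA}" \
    --role "$role"
done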

Storage: Cloud SQL + GCS

Dagster needs persistent storage for run history, event logs, and asset metadata. In Hybrid mode, you provide this:

  • Cloud SQL PostgreSQL for the run storage and event log storage. A small Cloud SQL instance (db-f1-micro or db-g1-small) handles most workloads at $10-30/month.
  • GCS bucket for I/O manager persistence — when assets pass data between steps, the serialized data lives in GCS.
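
Provisioning both takes a few gcloud commands. A sketch using the instance, database, and user names from the Helm values below (the bucket name is a placeholder):

# Cloud SQL instance, database, and user for Dagster run storage
gcloud sql instances create dagster-db \
  --database-version POSTGRES_15 \
  --tier db-g1-small \
  --region us-central1
gcloud sql databases create dagster --instance dagster-db
gcloud sql users create dagster --instance dagster-db --password "$DB_PASSWORD"

# GCS bucket for I/O manager persistence
gcloud storage buckets create gs://my-project-dagster-io --location us-central1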

# Helm values for Cloud SQL
postgresql:
  enabled: false  # Don't deploy PostgreSQL in the cluster
dagsterDaemon:
  env:
    - name: DAGSTER_PG_HOST
      value: /cloudsql/my-project:us-central1:dagster-db
    - name: DAGSTER_PG_DB
      value: dagster
    - name: DAGSTER_PG_USER
      value: dagster
    - name: DAGSTER_PG_PASSWORD
      valueFrom:
        secretKeyRef:
          name: dagster-db-credentials
          key: password

For Cloud SQL connectivity, use the Cloud SQL Auth Proxy as a sidecar container in the Dagster agent pod. This handles encrypted connections without exposing the database to the public internet.
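
The sidecar runs the Cloud SQL Auth Proxy (v2) binary. In Unix-socket mode the socket path lines up with the DAGSTER_PG_HOST value above; a sketch of the invocation:

# Creates a Unix socket at /cloudsql/my-project:us-central1:dagster-db,
# matching DAGSTER_PG_HOST in the Helm values above
cloud-sql-proxy my-project:us-central1:dagster-db --unix-socket /cloudsql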

Cloud Run Option (Community)

A community-maintained dagster-contrib-gcp package supports executing Dagster runs as Cloud Run jobs instead of GKE pods. This appeals to teams that prefer serverless compute and want to avoid managing a Kubernetes cluster.

The trade-offs versus GKE:

  • Simpler infrastructure. No GKE cluster to manage.
  • Cold start latency. Cloud Run jobs take seconds to spin up, which adds to execution time.
  • Less control. Kubernetes offers fine-grained resource configuration, scheduling, and pod affinity that Cloud Run doesn’t support.
  • Community-maintained. Not an official Dagster integration, so support and maintenance depend on community contributors.

For small teams running dbt + BigQuery workflows where the orchestration layer is lightweight, Cloud Run execution is a reasonable choice. For teams with heavy compute requirements or complex infrastructure needs, GKE is more robust.

Choosing a Mode

| Factor                    | Serverless                    | Hybrid (GKE)                   | Hybrid (Cloud Run)            |
| ------------------------- | ----------------------------- | ------------------------------ | ----------------------------- |
| Setup complexity          | Minimal                       | High                           | Medium                        |
| Infrastructure management | None                          | GKE + Cloud SQL                | Cloud SQL                     |
| Data residency            | Dagster’s infra               | Your GCP project               | Your GCP project              |
| Max compute per node      | 4 CPUs                        | Configurable                   | Cloud Run limits              |
| Monthly infra cost        | Included in pricing           | $50-200+ (GKE + Cloud SQL)     | $10-50 (Cloud SQL)            |
| Best for                  | Orchestration-light workloads | Enterprise, security-sensitive | Small teams, simple pipelines |

For GCP-native analytics engineering teams, the decision comes down to security requirements and infrastructure. If data can transit through Dagster’s infrastructure and compute needs are modest, Serverless is the simpler option. If data must stay in a VPC or GKE is already in use for other workloads, Hybrid on GKE is a natural fit.

The pricing note covers the cost implications of each mode, and the GCP orchestration framework positions Dagster relative to GCP-native alternatives.