Dagster GCP Deployment

How to deploy Dagster on GCP — Serverless vs Hybrid modes, GKE with Helm, Workload Identity authentication, Cloud SQL for storage, and the community Cloud Run option.

Dagster+ offers two deployment modes on GCP. The choice depends on whether you want Dagster to manage compute entirely, or whether you need execution to stay within your own infrastructure.

Serverless Mode

Dagster hosts everything: your code runs on Dagster’s infrastructure. You push code; Dagster handles deployment, scaling, and execution.

Best for: Workloads that orchestrate external services rather than running heavy compute. If your Dagster pipeline primarily tells BigQuery to run queries (via dbt), triggers Fivetran syncs, and writes metadata, the actual compute is minimal — BigQuery and Fivetran do the heavy lifting. Dagster Serverless handles the orchestration layer cheaply.

Limitations:

  • Limited to 4 CPUs per node. For most dbt + BigQuery workflows this is fine, since the compute-intensive work happens in BigQuery, not in Dagster.
  • Code and data transit through Dagster’s infrastructure. If your security requirements mandate that all execution stays within your VPC, Serverless doesn’t fit.
  • Serverless compute costs ($0.005/minute) add to the credit-based pricing.

Setup: Minimal. Connect your Git repository, configure resources pointing at your GCP project, and Dagster handles the rest. No GKE cluster, no Helm chart, no infrastructure to manage.

For analytics engineering teams on dbt + BigQuery where the pipeline is primarily orchestration (scheduling dbt builds, coordinating Fivetran syncs, triggering downstream refreshes), Serverless is the simpler path to production deployment.
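
If you deploy from CI instead of the GitHub integration, the dagster-cloud CLI can push code locations to Serverless. A minimal sketch, assuming the CLI's Python-executable deploy command; the location, package, and organization names are placeholders:

# Deploy a code location to Dagster+ Serverless from CI (all names are placeholders)
pip install dagster-cloud
export DAGSTER_CLOUD_ORGANIZATION=my-org
dagster-cloud serverless deploy-python-executable \
  --location-name my_dbt_project \
  --package-name my_dbt_project \
  --api-token "$DAGSTER_CLOUD_API_TOKEN"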

Hybrid Mode

Dagster hosts the control plane. Execution runs in your infrastructure. The Dagster+ control plane manages the web UI, scheduling, sensor evaluation, and run coordination. The actual computation — running dbt, executing Python assets, calling external APIs — happens on a Kubernetes agent in your GCP project.

Best for: Teams with security requirements (data stays in your VPC), heavy compute needs (more than 4 CPUs), or existing GKE infrastructure they want to reuse.

GKE Deployment

The standard Hybrid deployment runs a Dagster agent on Google Kubernetes Engine using Dagster’s official Helm chart:

helm repo add dagster https://dagster-io.github.io/helm
helm install dagster dagster/dagster \
  --set dagsterCloud.deployment=prod \
  --set dagsterCloud.agentToken=$DAGSTER_AGENT_TOKEN
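
The chart assumes a GKE cluster with Workload Identity enabled. If you are starting from scratch, a minimal sketch of provisioning one (cluster name and sizing are placeholders):

# Small GKE cluster with Workload Identity enabled via --workload-pool
gcloud container clusters create dagster-agent-cluster \
  --project my-project \
  --zone us-central1-a \
  --num-nodes 1 \
  --machine-type e2-standard-4 \
  --workload-pool my-project.svc.id.goog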

The Helm chart deploys:

  • Agent pod that polls Dagster+ for runs to execute
  • Worker pods that spin up for each run, execute your code, and shut down
  • Configuration for resource limits, service accounts, and secrets

Authentication: Workload Identity

The recommended authentication pattern for GCP uses Workload Identity, which maps a Kubernetes service account to a GCP service account. Your Dagster agent pods authenticate to BigQuery, GCS, and other GCP services without service account keys.

# Helm values for Workload Identity
serviceAccount:
  create: true
  annotations:
    iam.gke.io/gcp-service-account: dagster-agent@my-project.iam.gserviceaccount.com
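
If the GCP service account does not exist yet, create it first (a one-time step, reusing the example names above):

gcloud iam service-accounts create dagster-agent \
  --project my-project \
  --display-name "Dagster agent"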

On the GCP side, bind the GCP service account to the Kubernetes service account:

gcloud iam service-accounts add-iam-policy-binding \
  dagster-agent@my-project.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:my-project.svc.id.goog[dagster/dagster-agent]"

This follows the same Application Default Credentials (ADC) resolution pattern used in Cloud Run Jobs. The service account needs:

  • roles/bigquery.dataEditor and roles/bigquery.jobUser for dbt execution
  • roles/storage.objectViewer for reading from GCS (if your pipeline uses GCS staging)
  • Any additional roles required by your Python assets
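
Granting those roles is a one-time setup step. A sketch, reusing the project and service account names from the examples above:

# Grant the required roles to the agent's GCP service account
PROJECT=my-project
SA="dagster-agent@${PROJECT}.iam.gserviceaccount.com"
for role in roles/bigquery.dataEditor roles/bigquery.jobUser roles/storage.objectViewer; do
  gcloud projects add-iam-policy-binding "$PROJECT" \
    --member "serviceAccount:${SA}" \
    --role "$role"
done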

Storage: Cloud SQL + GCS

Dagster needs persistent storage for run history, event logs, and asset metadata. In Hybrid mode, you provide this:

  • Cloud SQL PostgreSQL for the run storage and event log storage. A small Cloud SQL instance (db-f1-micro or db-g1-small) handles most workloads at $10-30/month.
  • GCS bucket for I/O manager persistence — when assets pass data between steps, the serialized data lives in GCS.
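
Provisioning both takes a few gcloud commands. A sketch using the instance, database, and user names from the Helm values below (the bucket name is a placeholder):

# Cloud SQL instance, database, and user for Dagster run storage
gcloud sql instances create dagster-db \
  --database-version POSTGRES_15 \
  --tier db-g1-small \
  --region us-central1
gcloud sql databases create dagster --instance dagster-db
gcloud sql users create dagster --instance dagster-db --password "$DB_PASSWORD"

# GCS bucket for I/O manager persistence
gcloud storage buckets create gs://my-project-dagster-io --location us-central1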

# Helm values for Cloud SQL
postgresql:
  enabled: false  # Don't deploy PostgreSQL in the cluster
dagsterDaemon:
  env:
    - name: DAGSTER_PG_HOST
      value: /cloudsql/my-project:us-central1:dagster-db
    - name: DAGSTER_PG_DB
      value: dagster
    - name: DAGSTER_PG_USER
      value: dagster
    - name: DAGSTER_PG_PASSWORD
      valueFrom:
        secretKeyRef:
          name: dagster-db-credentials
          key: password

For Cloud SQL connectivity, use the Cloud SQL Auth Proxy as a sidecar container in the Dagster agent pod. This handles encrypted connections without exposing the database to the public internet.
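
The sidecar runs the Cloud SQL Auth Proxy (v2) binary. In Unix-socket mode the socket path lines up with the DAGSTER_PG_HOST value above; a sketch of the invocation:

# Creates a Unix socket at /cloudsql/my-project:us-central1:dagster-db,
# matching DAGSTER_PG_HOST in the Helm values above
cloud-sql-proxy my-project:us-central1:dagster-db --unix-socket /cloudsql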

Cloud Run Option (Community)

A community-maintained dagster-contrib-gcp package supports executing Dagster runs as Cloud Run jobs instead of GKE pods. This appeals to teams that prefer serverless compute and want to avoid managing a Kubernetes cluster.

The trade-offs versus GKE:

  • Simpler infrastructure. No GKE cluster to manage.
  • Cold start latency. Cloud Run jobs take seconds to spin up, which adds to execution time.
  • Less control. Kubernetes offers fine-grained resource configuration, scheduling, and pod affinity that Cloud Run doesn’t support.
  • Community-maintained. Not an official Dagster integration, so support and maintenance depend on community contributors.

For small teams running dbt + BigQuery workflows where the orchestration layer is lightweight, Cloud Run execution is a reasonable choice. For teams with heavy compute requirements or complex infrastructure needs, GKE is more robust.

Choosing a Mode

| Factor                    | Serverless                    | Hybrid (GKE)                   | Hybrid (Cloud Run)            |
| ------------------------- | ----------------------------- | ------------------------------ | ----------------------------- |
| Setup complexity          | Minimal                       | High                           | Medium                        |
| Infrastructure management | None                          | GKE + Cloud SQL                | Cloud SQL                     |
| Data residency            | Dagster’s infra               | Your GCP project               | Your GCP project              |
| Max compute per node      | 4 CPUs                        | Configurable                   | Cloud Run limits              |
| Monthly infra cost        | Included in pricing           | $50-200+ (GKE + Cloud SQL)     | $10-50 (Cloud SQL)            |
| Best for                  | Orchestration-light workloads | Enterprise, security-sensitive | Small teams, simple pipelines |

For GCP-native analytics engineering teams, the decision comes down to security requirements and infrastructure. If data can transit through Dagster’s infrastructure and compute needs are modest, Serverless is the simpler option. If data must stay in a VPC or GKE is already in use for other workloads, Hybrid on GKE is a natural fit.

The pricing note covers the cost implications of each mode, and the GCP orchestration framework positions Dagster relative to GCP-native alternatives.