Deploying dbt Core on Cloud Run Jobs: Complete Setup Guide

Cloud Functions times out after 9 minutes. Cloud Composer costs $300-400 per month just to exist. For most dbt Core deployments on GCP, neither option makes sense.

Cloud Run Jobs fills this gap. With execution limits up to 168 hours, container flexibility, and pay-per-execution pricing, it handles dbt workloads at a fraction of Composer’s cost. A typical deployment runs under $5 per month.

This guide walks through the complete setup: Dockerfile, authentication with Workload Identity, scheduling, event-driven triggers, and monitoring.

Why Cloud Run Jobs for dbt

The execution limit alone makes Cloud Run Jobs viable where Cloud Functions isn’t. Cloud Functions 1st gen maxes out at 9 minutes (540 seconds) for HTTP-triggered functions, and even 2nd gen caps HTTP functions at 60 minutes. Cloud Run Jobs allows up to 168 hours (7 full days), which covers any reasonable dbt run.

Container support means you control the environment completely. Pin exact versions of dbt, adapters, and dependencies. Include custom packages. Run pre- or post-dbt scripts in the same container.

The pricing model rewards efficiency. You pay only for execution time, not for idle infrastructure. A daily dbt run that takes 10 minutes costs pennies. Compare that to Composer’s minimum $300-400 monthly for an environment sitting idle 23+ hours per day.

When does Composer justify its cost? When you need complex orchestration spanning multiple systems: extraction, transformation, reverse ETL, with sophisticated retry logic and backfill capabilities. For a detailed comparison, see Cloud Run Jobs vs. Composer for dbt. If your dbt project runs independently on a schedule, Cloud Run Jobs is the better choice.

Repository and container strategy

Separate your dbt project from your Docker image definition. This two-repository approach enables independent development cycles. Data analysts update SQL models without touching infrastructure. Platform engineers update the container without modifying transformation logic.

Structure looks like this:

dbt-project-repo/
├── models/
├── macros/
├── tests/
├── dbt_project.yml
└── profiles.yml

dbt-runner-repo/
├── Dockerfile
├── cloudbuild.yaml
└── scripts/
    └── run-dbt.sh

The runner repository builds an image that clones the dbt project at runtime, or you can bake the models into the image during build. Runtime cloning adds flexibility; baking in models adds reproducibility. For most teams, baking models into the image during CI/CD provides better version control.
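If you choose runtime cloning, the entrypoint script (scripts/run-dbt.sh in the runner repo) might look like this sketch — DBT_GIT_REPO and DBT_GIT_REF are assumed environment variables you would set on the job, not part of the setup above:

```shell
#!/usr/bin/env bash
# Hypothetical entrypoint for the runtime-cloning variant.
# DBT_GIT_REPO (required) and DBT_GIT_REF (optional) are assumed env vars.
set -euo pipefail

clone_and_build() {
  # Shallow-clone the dbt project at the requested ref, then build
  git clone --depth 1 --branch "${DBT_GIT_REF:-main}" "${DBT_GIT_REPO}" /app/dbt_project
  cd /app/dbt_project
  dbt deps
  dbt build --profiles-dir /app
}

# Only execute when a repo is configured (inside the container at runtime)
if [ -n "${DBT_GIT_REPO:-}" ]; then
  clone_and_build
fi
```

Note that this variant requires git in the runtime image stage, not just the builder stage.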

The Dockerfile

Multi-stage builds keep images small while ensuring reproducibility:

# Build stage
FROM python:3.11-slim AS builder
WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    && rm -rf /var/lib/apt/lists/*

# Install dbt with pinned versions
RUN pip install --no-cache-dir \
    dbt-core==1.9.0 \
    dbt-bigquery==1.9.0

# Copy dbt project
COPY dbt_project/ /app/dbt_project/
COPY profiles.yml /app/profiles.yml

# Runtime stage
FROM python:3.11-slim
WORKDIR /app

# Copy installed packages and the dbt entrypoint from the builder
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder /usr/local/bin/dbt /usr/local/bin/dbt

# Copy dbt project
COPY --from=builder /app/dbt_project /app/dbt_project
COPY --from=builder /app/profiles.yml /app/profiles.yml

# Set working directory to dbt project
WORKDIR /app/dbt_project

# Default command
CMD ["dbt", "build", "--profiles-dir", "/app"]

Pin exact versions. Using dbt-core==1.9.0 ensures reproducible builds; a latest tag creates debugging nightmares when behavior changes between runs.

Build and push to Artifact Registry:

# Create repository if it doesn't exist
gcloud artifacts repositories create dbt-images \
  --repository-format=docker \
  --location=us-central1 \
  --description="dbt Docker images"

# Build and push
gcloud builds submit \
  --tag us-central1-docker.pkg.dev/PROJECT_ID/dbt-images/dbt-runner:v1.0.0

Configuring profiles.yml for Cloud Run

The profiles.yml tells dbt how to connect to BigQuery. On Cloud Run, use OAuth with Workload Identity (no service account keys to manage):

dbt_project:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: oauth
      project: "{{ env_var('GCP_PROJECT') }}"
      dataset: "{{ env_var('DBT_DATASET', 'analytics') }}"
      location: "{{ env_var('BQ_LOCATION', 'US') }}"
      threads: 4
      job_execution_timeout_seconds: 300
      priority: interactive
      job_retries: 1

The method: oauth setting tells dbt-bigquery to use the default credentials provided by the runtime environment. On Cloud Run, that’s the attached service account’s credentials, obtained automatically through Workload Identity.

Environment variable substitution with env_var() keeps the profiles.yml environment-agnostic. The same image works across dev, staging, and prod by changing environment variables at deploy time.
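As a sketch of what that looks like in practice — the job name dbt-staging and dataset analytics_staging below are illustrative, not part of the setup above:

```shell
# Same image, different environment: only env vars change at deploy time.
# "dbt-staging" and "analytics_staging" are illustrative names.
gcloud run jobs create dbt-staging \
  --image=us-central1-docker.pkg.dev/$PROJECT_ID/dbt-images/dbt-runner:v1.0.0 \
  --region=us-central1 \
  --service-account=$SA_EMAIL \
  --set-env-vars="GCP_PROJECT=$PROJECT_ID,DBT_DATASET=analytics_staging,BQ_LOCATION=US"

# Or change a single variable on an existing job without redeploying the image
gcloud run jobs update dbt-daily \
  --region=us-central1 \
  --update-env-vars="DBT_DATASET=analytics_v2"
```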

Setting up IAM and Workload Identity

Create a dedicated service account for your dbt workload:

# Set variables
export PROJECT_ID=your-project-id
export SA_NAME=dbt-runner
export SA_EMAIL=$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com

# Create service account
gcloud iam service-accounts create $SA_NAME \
  --display-name="dbt Cloud Run Runner" \
  --description="Service account for dbt Cloud Run Jobs"

Grant the minimum required permissions. For BigQuery transformations:

# BigQuery permissions for running dbt
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:$SA_EMAIL" \
  --role="roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:$SA_EMAIL" \
  --role="roles/bigquery.jobUser"

# If dbt needs to create datasets
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:$SA_EMAIL" \
  --role="roles/bigquery.dataOwner"

Workload Identity works automatically on Cloud Run. The service account you attach to the job provides OAuth credentials without any key files. This eliminates the security risk of long-lived credentials and the operational burden of key rotation.
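One quick sanity check: from inside a running container on Cloud Run, the metadata server reports which identity the job actually holds. This endpoint exists only on GCP runtimes, so it won't work locally:

```shell
# Inside the container on Cloud Run: prints the attached service account email
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"
```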

Deploying the Cloud Run Job

Create the Cloud Run Job with your container:

gcloud run jobs create dbt-daily \
  --image=us-central1-docker.pkg.dev/$PROJECT_ID/dbt-images/dbt-runner:v1.0.0 \
  --region=us-central1 \
  --service-account=$SA_EMAIL \
  --memory=2Gi \
  --cpu=2 \
  --max-retries=2 \
  --task-timeout=3600 \
  --set-env-vars="GCP_PROJECT=$PROJECT_ID,DBT_DATASET=analytics,BQ_LOCATION=US"

Key configuration choices:

Memory and CPU: Start with 2GB memory and 2 CPUs. dbt’s memory usage scales with model complexity and parallelism. If you see out-of-memory errors, increase memory. If runs take too long, increase CPUs and threads in profiles.yml together.

Task timeout: Set this higher than your longest expected run, with buffer for variance. 3600 seconds (1 hour) works for most projects. The maximum is 168 hours.

Max retries: Setting --max-retries=2 means Cloud Run will retry failed executions twice. dbt’s exit code (non-zero on failure) triggers this automatically.
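If you wrap the CMD in a script, dbt's exit codes are worth distinguishing: 0 is success, 1 is a handled error (such as a failing model or test), and 2 is an internal dbt error. A small helper along these lines — classify_dbt_exit is a hypothetical name — lets a wrapper branch on the outcome:

```shell
# Hypothetical helper: map a dbt exit code to a label a wrapper can act on.
# 0 = success, 1 = handled error (failed model/test), 2 = internal dbt error.
classify_dbt_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "model-or-test-failure" ;;
    2) echo "internal-error" ;;
    *) echo "unknown" ;;
  esac
}

# Usage in a run script:
#   dbt build --profiles-dir /app
#   classify_dbt_exit "$?"
classify_dbt_exit 1   # prints "model-or-test-failure"
```

A non-zero code still propagates to Cloud Run and triggers the retry; the helper just makes the logs explicit about why.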

Test the job manually:

gcloud run jobs execute dbt-daily --region=us-central1 --wait

The --wait flag blocks until completion, showing logs in your terminal.
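Past executions can be inspected afterwards:

```shell
# List recent executions and their status
gcloud run jobs executions list \
  --job=dbt-daily \
  --region=us-central1 \
  --limit=5

# Drill into one execution (EXECUTION_NAME comes from the list output)
gcloud run jobs executions describe EXECUTION_NAME \
  --region=us-central1
```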

Scheduling with Cloud Scheduler

Cloud Scheduler triggers your dbt job on a cron schedule. First, create a service account for the scheduler with permission to invoke the job:

# Create scheduler service account
gcloud iam service-accounts create dbt-scheduler \
  --display-name="dbt Scheduler Invoker"

# Grant permission to invoke the Cloud Run Job
gcloud run jobs add-iam-policy-binding dbt-daily \
  --region=us-central1 \
  --member="serviceAccount:dbt-scheduler@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/run.invoker"

Create the scheduled job:

gcloud scheduler jobs create http dbt-daily-schedule \
  --location=us-central1 \
  --schedule="0 6 * * *" \
  --uri="https://us-central1-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/$PROJECT_ID/jobs/dbt-daily:run" \
  --http-method=POST \
  --oauth-service-account-email=dbt-scheduler@$PROJECT_ID.iam.gserviceaccount.com

Common cron patterns:

  • 0 6 * * * (daily at 6 AM)
  • 0 */4 * * * (every 4 hours)
  • 0 6 * * 1-5 (weekdays at 6 AM)
  • */30 * * * * (every 30 minutes)

Schedules default to UTC, regardless of the scheduler's location. Set a timezone explicitly with --time-zone="America/New_York" if you want local-time semantics.
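For example, to pin the schedule created above to Eastern time:

```shell
# Pin the existing schedule to a named timezone instead of the UTC default
gcloud scheduler jobs update http dbt-daily-schedule \
  --location=us-central1 \
  --time-zone="America/New_York"
```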

Event-driven triggers with Eventarc

For pipelines where dbt should run when upstream data arrives, Eventarc provides event-driven triggering.

Trigger dbt when a file lands in Cloud Storage:

gcloud eventarc triggers create dbt-on-upload \
  --location=us-central1 \
  --destination-run-job=dbt-daily \
  --destination-run-region=us-central1 \
  --event-filters="type=google.cloud.storage.object.v1.finalized" \
  --event-filters="bucket=your-data-bucket" \
  --service-account=$SA_EMAIL

Trigger dbt when a BigQuery load job completes:

gcloud eventarc triggers create dbt-on-bq-load \
  --location=us-central1 \
  --destination-run-job=dbt-daily \
  --destination-run-region=us-central1 \
  --event-filters="type=google.cloud.audit.log.v1.written" \
  --event-filters="serviceName=bigquery.googleapis.com" \
  --event-filters="methodName=google.cloud.bigquery.v2.JobService.InsertJob" \
  --service-account=$SA_EMAIL

Event-driven patterns work well for near-real-time data freshness. Combine with scheduled runs as a fallback to ensure models run even if events are missed.

Monitoring and alerting

Cloud Run Jobs automatically sends container output to Cloud Logging. Configure dbt to output useful logs:

# In your CMD or run script: JSON-formatted logs parse cleanly in Cloud Logging
dbt --log-format json build --profiles-dir /app

Create a log-based alert for failures:

gcloud logging metrics create dbt-failures \
  --description="dbt Cloud Run Job failures" \
  --log-filter='resource.type="cloud_run_job" AND resource.labels.job_name="dbt-daily" AND severity>=ERROR'

gcloud alpha monitoring policies create \
  --display-name="dbt Job Failed" \
  --condition-display-name="Error rate > 0" \
  --condition-filter='metric.type="logging.googleapis.com/user/dbt-failures" AND resource.type="cloud_run_job"' \
  --if="> 0" \
  --notification-channels=YOUR_CHANNEL_ID
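When the alert fires, the same filter pulls recent error entries straight from the CLI:

```shell
# Show the last 20 error-level log entries from the job
gcloud logging read \
  'resource.type="cloud_run_job" AND resource.labels.job_name="dbt-daily" AND severity>=ERROR' \
  --limit=20 \
  --format="table(timestamp, textPayload)"
```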

Key metrics to track in Cloud Monitoring:

  • Execution count: run.googleapis.com/job/completed_execution_count
  • Execution duration: run.googleapis.com/job/completed_execution_duration
  • Memory utilization: run.googleapis.com/job/memory/utilization

Set alerts on duration anomalies. A run that normally takes 10 minutes suddenly taking 45 minutes often indicates a problem before it becomes a failure.

Complete deployment script

Here’s everything in one script:

#!/bin/bash
set -e

export PROJECT_ID=your-project-id
export REGION=us-central1
export SA_NAME=dbt-runner
export SA_EMAIL=$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com
export IMAGE=us-central1-docker.pkg.dev/$PROJECT_ID/dbt-images/dbt-runner:v1.0.0

# Create service account
gcloud iam service-accounts create $SA_NAME \
  --display-name="dbt Cloud Run Runner" \
  --project=$PROJECT_ID || true

# Grant BigQuery permissions
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:$SA_EMAIL" \
  --role="roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:$SA_EMAIL" \
  --role="roles/bigquery.jobUser"

# Create Artifact Registry repository
gcloud artifacts repositories create dbt-images \
  --repository-format=docker \
  --location=$REGION \
  --project=$PROJECT_ID || true

# Build and push image
gcloud builds submit \
  --tag $IMAGE \
  --project=$PROJECT_ID

# Create Cloud Run Job
gcloud run jobs create dbt-daily \
  --image=$IMAGE \
  --region=$REGION \
  --service-account=$SA_EMAIL \
  --memory=2Gi \
  --cpu=2 \
  --max-retries=2 \
  --task-timeout=3600 \
  --set-env-vars="GCP_PROJECT=$PROJECT_ID,DBT_DATASET=analytics" \
  --project=$PROJECT_ID

# Create scheduler service account
gcloud iam service-accounts create dbt-scheduler \
  --display-name="dbt Scheduler Invoker" \
  --project=$PROJECT_ID || true
gcloud run jobs add-iam-policy-binding dbt-daily \
  --region=$REGION \
  --member="serviceAccount:dbt-scheduler@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/run.invoker" \
  --project=$PROJECT_ID

# Create schedule
gcloud scheduler jobs create http dbt-daily-schedule \
  --location=$REGION \
  --schedule="0 6 * * *" \
  --uri="https://$REGION-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/$PROJECT_ID/jobs/dbt-daily:run" \
  --http-method=POST \
  --oauth-service-account-email=dbt-scheduler@$PROJECT_ID.iam.gserviceaccount.com \
  --project=$PROJECT_ID

echo "Deployment complete. Test with:"
echo "gcloud run jobs execute dbt-daily --region=$REGION --wait"

Cost breakdown

For a daily dbt job running 15 minutes with 2 vCPUs and 2GB memory:

  • Cloud Run execution: ~$0.50/month
  • Cloud Scheduler: ~$0.10/month (3 free jobs, $0.10 per additional)
  • Artifact Registry storage: ~$0.50/month
  • Cloud Build: Free tier covers most usage
  • Total: Under $5/month

Compare to Cloud Composer 3’s minimum $300-400/month for an idle environment. Cloud Run Jobs costs 1-2% of Composer for straightforward dbt orchestration.


Cloud Run Jobs covers most dbt Core deployment needs on GCP. Container flexibility, 7-day execution limits, and pay-per-execution pricing remove the tradeoff between simplicity and capability that defined earlier options.

When do you outgrow this setup? When orchestration complexity genuinely demands Airflow: multi-system pipelines with sophisticated dependencies, backfill requirements, or compliance mandates for task-level audit logging. Most teams reach that threshold later than they expect, if at all.

For a broader view of where Cloud Run Jobs fits in the GCP data platform landscape, see GCP Data Platform Architecture: Strategic Patterns for 2026.