
dbt Service Account Setup for Multi-Project GCP Architectures

How to create and configure a dbt service account when your source data, transformation output, and compute infrastructure live in separate GCP projects.

Tags: dbt · gcp · bigquery · data engineering · automation

The typical dbt service account setup assumes a single GCP project: dbt reads from tables, writes to tables, all in one place. But real GCP architectures often span multiple projects — raw data in source projects, transformation output in a dedicated transform project, and compute infrastructure (Cloud Functions or Cloud Run) potentially in its own project.

When projects multiply, the IAM setup gets more specific. The service account needs different roles in each project, and the gcloud commands to set this up are tedious to run manually and easy to get wrong.

The Three-Project Pattern

The common setup for a GCP data platform with Cloud Functions has three projects playing distinct roles:

Source projects host the raw data dbt reads from. This might be one project, or several if your data comes from multiple teams or products. The dbt service account needs to read data here but should never write anything. Roles needed:

  • roles/bigquery.dataViewer — read table data
  • roles/bigquery.jobUser — run query jobs (BigQuery compute is billed to the project where you run the job, not where the data lives, so you need job permissions in source projects if you’re querying there)
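A quick way to sanity-check the source-project grants is a dry-run query issued as the service account. The project, dataset, and table names below are placeholders, and impersonation requires `roles/iam.serviceAccountTokenCreator` on the service account:

```shell
# Dry-run a query in a source project while impersonating the dbt SA.
# Succeeds only if the SA can read the table's metadata there.
bq query \
  --project_id="source-project-id-1" \
  --impersonate_service_account="dbt-transform-sa@transform-project-id.iam.gserviceaccount.com" \
  --use_legacy_sql=false \
  --dry_run \
  'SELECT COUNT(*) FROM `source-project-id-1.raw_events.events`'
```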

Transform project hosts the datasets where dbt creates tables and views — your base, intermediate, and mart layers. The service account needs full write access here. Roles needed:

  • roles/bigquery.dataEditor — create, update, and delete tables and views
  • roles/bigquery.user — run jobs and list datasets

Function project is where the Cloud Function runs. The service account needs permission to invoke the function, since Cloud Scheduler authenticates as this identity when it triggers a run. Roles needed:

  • roles/cloudfunctions.invoker — allows Cloud Scheduler (or other callers) to trigger the function

In many deployments the transform project and function project are the same. Keeping them separate only makes sense if your infrastructure is managed at the organization level with strict project-per-service policies.
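On the dbt side, this split shows up in the source definitions: each source declares the project it lives in via the database key, while models build in the project configured in profiles.yml. A minimal sketch (dataset and table names are placeholders):

```yaml
# models/sources.yml — sources point at the source projects,
# while compiled models are written to the transform project.
version: 2

sources:
  - name: product_events
    database: source-project-id-1   # GCP project that owns the raw data
    schema: raw_events              # BigQuery dataset
    tables:
      - name: events
  - name: billing
    database: source-project-id-2
    schema: raw_billing
    tables:
      - name: invoices
```

With this in place, `{{ source('product_events', 'events') }}` compiles to a fully qualified cross-project table reference.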

The Setup Script

This script handles all of the above from Cloud Shell. Set the variables for your environment, then run it once:

Terminal window
# Stop at the first failed command
set -e

# Variables to set up
TRANSFORM_PROJECT_ID="transform-project-id"
FUNCTION_PROJECT_ID="function-project-id"
SOURCE_PROJECT_IDS=("source-project-id-1" "source-project-id-2" "source-project-id-N")
SERVICE_ACCOUNT_NAME="dbt-transform-sa"
SERVICE_ACCOUNT_DISPLAY_NAME="dbt Transformation Service Account"

# Create the service account in the transform project
gcloud iam service-accounts create "$SERVICE_ACCOUNT_NAME" \
  --description="Service account for dbt data transformations" \
  --display-name="$SERVICE_ACCOUNT_DISPLAY_NAME" \
  --project="$TRANSFORM_PROJECT_ID"

# Format the service account email
SERVICE_ACCOUNT_EMAIL="${SERVICE_ACCOUNT_NAME}@${TRANSFORM_PROJECT_ID}.iam.gserviceaccount.com"
echo "Service account $SERVICE_ACCOUNT_EMAIL created."

# Grant read and job permissions in each source project
for PROJECT_ID in "${SOURCE_PROJECT_IDS[@]}"; do
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
    --role="roles/bigquery.dataViewer"
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
    --role="roles/bigquery.jobUser"
  echo "Roles assigned in source project $PROJECT_ID."
done

# Grant write and job permissions in the transform project
gcloud projects add-iam-policy-binding "$TRANSFORM_PROJECT_ID" \
  --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
  --role="roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding "$TRANSFORM_PROJECT_ID" \
  --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
  --role="roles/bigquery.user"
echo "Roles assigned in the transformation project."

# Grant the invoker role in the project where the Cloud Function will run
gcloud projects add-iam-policy-binding "$FUNCTION_PROJECT_ID" \
  --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
  --role="roles/cloudfunctions.invoker"
echo "Cloud Function Invoker role assigned."

echo "Setup complete."

The service account gets created in the transform project, which is a reasonable home for it — that’s where the bulk of its work happens. From there, it gets granted roles in each source project via cross-project IAM bindings, which is a standard GCP pattern.
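Once the script has run, the resulting bindings can be spot-checked per project. This only reads IAM policy, so it is safe to run repeatedly:

```shell
# List every role the dbt SA holds in each project it should touch.
SERVICE_ACCOUNT_EMAIL="dbt-transform-sa@transform-project-id.iam.gserviceaccount.com"
for PROJECT_ID in "transform-project-id" "source-project-id-1"; do
  echo "--- $PROJECT_ID ---"
  gcloud projects get-iam-policy "$PROJECT_ID" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
    --format="table(bindings.role)"
done
```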

Why Not Just Use One Project?

If you’re reading this and your setup is a single project, you don’t need any of this complexity. Create one service account, give it bigquery.dataEditor and bigquery.jobUser on the project, and you’re done.
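For that single-project case, the whole setup collapses to a handful of commands (project and account names are placeholders):

```shell
# One project: create the SA, then grant write plus job permissions.
PROJECT_ID="my-project-id"
gcloud iam service-accounts create dbt-sa --project="$PROJECT_ID"
for ROLE in roles/bigquery.dataEditor roles/bigquery.jobUser; do
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="serviceAccount:dbt-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role="$ROLE"
done
```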

The multi-project setup matters when:

  • Data ownership is distributed. Different teams own their raw data in separate projects and control who can read it. Your dbt service account gets explicit access to the datasets it needs, without blanket access to everything.
  • Billing is separated. Query costs in GCP are billed to the project that runs the query. If data engineering compute should show up separately from product or analytics costs, running queries from a dedicated transform project isolates those costs.
  • Security boundaries are enforced at the organization level. Some organizations require that production data lives in projects with stricter controls than the ones data engineers use day-to-day. The source data stays in a locked-down project; the service account only gets read access to what dbt actually needs.

The BigQuery IAM Patterns note covers the underlying permission model — the separation between data access roles and compute roles — in more depth. The multi-project pattern here is an application of those principles across project boundaries.

Attaching the Service Account

When deploying the Cloud Function, attach the service account explicitly:

Terminal window
gcloud functions deploy dbt_run \
  --region=europe-west1 \
  --service-account=dbt-transform-sa@transform-project-id.iam.gserviceaccount.com \
  --gen2 \
  --runtime=python310 \
  --entry-point=run_dbt \
  --trigger-http \
  --timeout=3500 \
  --memory=1G

The --service-account flag sets which identity the function runs as. With method: oauth in profiles.yml, dbt-bigquery picks up that identity through Application Default Credentials automatically — no key files needed.
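The corresponding profiles.yml is short; a sketch, assuming jobs should run in the transform project (profile and dataset names are placeholders):

```yaml
# profiles.yml — oauth picks up Application Default Credentials,
# which inside the function resolve to the attached service account.
dbt_transform:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: oauth
      project: transform-project-id   # where jobs run and models are written
      dataset: dbt_base               # default dataset for models
      location: europe-west1
      threads: 4
```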

For Cloud Run Jobs, the equivalent flag is --service-account on gcloud run jobs create. The IAM requirements differ in one detail: instead of roles/cloudfunctions.invoker, the scheduler service account needs roles/run.invoker on the Cloud Run Job. (Gen2 Cloud Functions are themselves backed by Cloud Run, so granting roles/run.invoker on the function's underlying service is the usual fix for invocation permission errors there, too.) The underlying pattern — separate read-only permissions on source projects, write permissions on the transform project — is identical.
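A sketch of the Cloud Run Jobs equivalent, assuming a container image for the dbt project already exists (the image path, job name, and scheduler service account are placeholders):

```shell
# Create the job running as the dbt SA...
gcloud run jobs create dbt-run \
  --image="europe-west1-docker.pkg.dev/transform-project-id/dbt/dbt-runner:latest" \
  --region=europe-west1 \
  --service-account="dbt-transform-sa@transform-project-id.iam.gserviceaccount.com" \
  --project="function-project-id"

# ...and allow the scheduler's service account to invoke it.
gcloud run jobs add-iam-policy-binding dbt-run \
  --region=europe-west1 \
  --member="serviceAccount:scheduler-sa@function-project-id.iam.gserviceaccount.com" \
  --role="roles/run.invoker" \
  --project="function-project-id"
```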