
BigQuery Multi-Environment Patterns

Three patterns for separating dev, staging, and production in BigQuery — separate projects, dataset prefixes, and central data lake with department marts.

Tags: bigquery · dbt · gcp · data engineering

BigQuery environment separation typically requires at least two environments (development and production) and often three (development, staging, production). Google Cloud recommends separate GCP projects for the strongest isolation: independent billing, quotas, security boundaries, and slot reservations. Simpler approaches suit smaller teams.

Pattern A: Separate Projects per Environment

Organization
└── Data Platform Folder
    ├── analytics-dev
    ├── analytics-staging
    └── analytics-prod

Each environment gets its own project. Development work happens in analytics-dev. Staging validates changes in analytics-staging. Production serves dashboards and reports from analytics-prod.

This separation provides:

  • Independent billing so you can see exactly what development costs versus production.
  • Isolated quotas so a runaway dev query can’t impact production. Each project has its own slot allocation.
  • Clear security boundaries. Production service accounts never touch dev data, and vice versa.
  • Separate slot reservations. If you use capacity pricing, you can reserve dedicated slots for production while dev uses on-demand.

The downside is complexity: more service accounts, more IAM policies, more profiles.yml targets. For teams larger than three or four people this is generally the appropriate choice; smaller teams can use simpler patterns.
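Project creation itself is quick; a minimal sketch with the gcloud CLI, assuming a hypothetical folder ID and billing account ID (both placeholders):

```shell
# Create one project per environment under the Data Platform folder
# and link each to the shared billing account.
# 123456789 and AAAAAA-BBBBBB-CCCCCC are placeholders.
for env in dev staging prod; do
  gcloud projects create "analytics-${env}" --folder=123456789
  gcloud billing projects link "analytics-${env}" \
    --billing-account=AAAAAA-BBBBBB-CCCCCC
done
```

The ongoing cost is not creating the projects but maintaining the per-project IAM policies and service accounts afterwards.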

In dbt’s profiles.yml, this means separate targets pointing to different projects:

my_analytics:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth  # auth not shown here; oauth (gcloud ADC) assumed
      project: analytics-dev
      dataset: "dbt_{{ env_var('USER', 'developer') }}"
      location: US
    prod:
      type: bigquery
      method: oauth
      project: analytics-prod
      dataset: analytics
      location: US
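With a profile like this, the environment is chosen at invocation time via the target name:

```shell
# Local development: builds into dbt_<username> in analytics-dev.
# (dev is the default target, so plain `dbt run` does the same.)
dbt run --target dev

# Production deployment, typically run by CI rather than by hand.
dbt run --target prod
```

Because each developer's dataset name includes their username, concurrent dev builds in analytics-dev don't collide.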

Pattern B: Separate Datasets Within One Project

analytics-data
├── dev_base
├── dev_marts
├── prod_base
└── prod_marts

A single project contains all environments, separated by dataset naming. Development models write to dev_* datasets; production writes to prod_* datasets.

This is simpler to manage. One billing account, one set of quotas, fewer service accounts. Dataset-level permissions still isolate access: you can grant developers write access to dev_* datasets while restricting production datasets to automated service accounts.

Downsides: shared quotas (heavy dev usage affects prod performance) and less clear billing separation. Appropriate for small teams (1–3 people), early-stage projects, or situations where environment isolation is not critical.
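The dataset-level split described above can be expressed in BigQuery's SQL DCL; a sketch, assuming a hypothetical developers group:

```shell
# Developers get write access to the dev marts dataset...
# (the group address is a placeholder)
bq query --use_legacy_sql=false \
  'GRANT `roles/bigquery.dataEditor`
   ON SCHEMA `analytics-data.dev_marts`
   TO "group:developers@example.com"'

# ...but only read access to the production marts.
bq query --use_legacy_sql=false \
  'GRANT `roles/bigquery.dataViewer`
   ON SCHEMA `analytics-data.prod_marts`
   TO "group:developers@example.com"'
```

Write access to prod_* is then reserved for the automated service account that runs deployments.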

Pattern C: Central Data Lake with Department Data Marts

data-lake-storage (central team)
└── raw datasets, base datasets
          ↓ (Data Viewer access)
department projects
├── marketing-analytics (Marketing team)
├── finance-analytics (Finance team)
└── product-analytics (Product team)

A central data engineering team manages ingestion, cleaning, and core modeling in a dedicated storage project. Department teams run their own transformations and queries in separate projects, building domain-specific marts.

Storage costs go to the central project. Compute costs flow to department billing accounts. This cleanly separates platform costs (“how much does our data infrastructure cost?”) from consumption costs (“how much does the marketing team spend on analytics?”).

The central team grants bigquery.dataViewer on curated datasets to department projects. Each department manages their own dbt projects, their own models, their own dashboards, sourcing from the central data lake.

This pattern requires more organizational coordination and scales to large enterprises with chargeback requirements. Project-level billing separation makes cost attribution straightforward.
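The grant from the central team is a single statement per dataset and consumer; a sketch, assuming a curated dataset named base and a hypothetical dbt service account in the marketing project:

```shell
# Central team grants read-only access on a curated dataset to the
# marketing team's dbt service account (dataset and account names
# are placeholders).
bq query --use_legacy_sql=false \
  'GRANT `roles/bigquery.dataViewer`
   ON SCHEMA `data-lake-storage.base`
   TO "serviceAccount:dbt-runner@marketing-analytics.iam.gserviceaccount.com"'
```

The department already holds job-execution rights in its own project, so nothing else is needed on the consuming side.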

Cross-Project Querying

Querying across projects requires no special configuration. Just use fully qualified table names:

SELECT
  c.customer_id,
  c.customer_name,
  o.order_id,
  o.order_date
FROM `other-project.shared_dataset.mrt__sales__customers` c
JOIN `my-project.local_dataset.mrt__sales__orders` o
  ON c.customer_id = o.customer_id

Same-region cross-project queries incur zero network transfer costs. The data doesn’t move between projects; BigQuery routes the query to where the data lives. This is a direct consequence of the separation of storage and compute — Dremel can read from Colossus regardless of which project “owns” the data.

Permissions required:

  • bigquery.dataViewer on the source dataset (to read the data)
  • bigquery.jobUser on the project running the query (to execute queries)

The querying project pays for compute. See BigQuery IAM Patterns for the full role breakdown.
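The billed project is simply the one the job runs in; with the bq CLI that's the --project_id flag (table name here is illustrative):

```shell
# The job runs in, and is billed to, marketing-analytics, even though
# the data lives in data-lake-storage.
bq query --project_id=marketing-analytics --use_legacy_sql=false \
  'SELECT COUNT(*) FROM `data-lake-storage.base.customers`'
```

This is what makes the Pattern C cost split work: storage bills to the lake project, every query bills to whichever project issued it.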

Which Pattern to Choose

The decision comes down to team size and organizational needs:

| Factor | Pattern A (Projects) | Pattern B (Datasets) | Pattern C (Central Lake) |
| --- | --- | --- | --- |
| Team size | 4+ people | 1–3 people | Multiple teams |
| Billing isolation | Full | Limited | Full, by department |
| Quota isolation | Full | None | Full |
| Setup complexity | High | Low | High |
| Operational overhead | Moderate | Low | High |
| Best for | Standard teams | Solo/small teams | Enterprise orgs |

Most teams start with Pattern B and move to Pattern A when team size or isolation requirements warrant it. Pattern C adds value only with genuine multi-team chargeback requirements.

Regardless of pattern, keep all environments in the same region. A single query cannot reference datasets in different locations, so with dev in US and prod in EU you could never validate dev changes against production-shaped data, which defeats the purpose of having environments at all.
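A quick sanity check before wiring environments together; dataset names here are illustrative:

```shell
# The "location" field must match across environments (e.g. both "US").
bq show --format=prettyjson analytics-dev:analytics | grep '"location"'
bq show --format=prettyjson analytics-prod:analytics | grep '"location"'
```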