GCP IAM Least Privilege for Data Teams

IAM debt accumulates on GCP data platforms through Editor roles granted for convenience, shared service accounts reused to avoid creating one per workload, and service account keys committed to repositories to unblock deployments. Fixing it starts with auditing what is currently configured; the remaining work is implementing least-privilege patterns, securing data at the column and row level, and monitoring so the debt does not reaccumulate. The notes in this hub cover those four phases in sequence.

Phase 1: Audit

IAM Debt Audit for GCP Data Platforms — Run these queries first. Three bash/SQL scripts that surface the highest-risk items: principals with Editor or Owner roles, service accounts with downloadable keys, and service accounts shared across multiple workloads. The results become your remediation backlog.
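As a minimal sketch of the first audit check (the actual scripts live in the linked note), the Python below scans a project IAM policy for Owner and Editor bindings. It assumes the policy was exported as JSON, for example with `gcloud projects get-iam-policy PROJECT_ID --format=json`; the function name and sample data are illustrative.

```python
import json

# Basic roles that grant broad access across every service in the project.
HIGH_RISK_ROLES = {"roles/owner", "roles/editor"}


def find_high_risk_grants(policy: dict) -> list[tuple[str, str]]:
    """Return sorted (member, role) pairs bound to Owner or Editor.

    `policy` is assumed to have the JSON shape printed by
    `gcloud projects get-iam-policy PROJECT_ID --format=json`.
    """
    findings = []
    for binding in policy.get("bindings", []):
        role = binding.get("role", "")
        if role in HIGH_RISK_ROLES:
            for member in binding.get("members", []):
                findings.append((member, role))
    return sorted(findings)


if __name__ == "__main__":
    # Hypothetical exported policy, trimmed to the fields the check reads.
    sample = json.loads("""
    {"bindings": [
      {"role": "roles/editor",
       "members": ["user:alice@example.com",
                   "serviceAccount:etl@demo-project.iam.gserviceaccount.com"]},
      {"role": "roles/bigquery.dataViewer",
       "members": ["group:analysts@example.com"]}
    ]}
    """)
    for member, role in find_high_risk_grants(sample):
        print(f"{role}\t{member}")
```

Each printed row is one backlog item: replace the basic role with the narrowest predefined role that principal actually needs.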

Phase 2: Implement Least-Privilege Patterns

2-Layer RBAC with Google Groups — Bind IAM roles to groups representing job functions, not individual users. When someone joins, add them to the group. When they leave, remove them. The role bindings never change. Includes IAM conditions for scoping access to specific dataset prefixes.
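A sketch of one such binding, assuming the JSON policy format used by `gcloud projects get-iam-policy` and `set-iam-policy`: the hypothetical helper below grants a role to a Google Group and scopes it with an IAM condition on the BigQuery resource name, so the group only reaches datasets whose IDs start with a given prefix. Conditional bindings require policy version 3; check the exact resource-name format against the BigQuery IAM conditions documentation before relying on it.

```python
import json


def group_binding_for_dataset_prefix(
    group_email: str, role: str, project_id: str, dataset_prefix: str
) -> dict:
    """Build a conditional IAM binding that grants `role` to a Google Group,
    limited to BigQuery datasets (and their tables) whose IDs start with
    `dataset_prefix`."""
    resource_prefix = f"projects/{project_id}/datasets/{dataset_prefix}"
    return {
        "role": role,
        "members": [f"group:{group_email}"],
        "condition": {
            "title": f"datasets-{dataset_prefix.rstrip('_')}",
            "description": f"Only datasets whose IDs start with {dataset_prefix}",
            "expression": f'resource.name.startsWith("{resource_prefix}")',
        },
    }


if __name__ == "__main__":
    binding = group_binding_for_dataset_prefix(
        "sales-analysts@example.com", "roles/bigquery.dataViewer",
        "demo-project", "sales_",
    )
    # Merge this into the project policy, set "version": 3, then apply it.
    print(json.dumps(binding, indent=2))
```

Onboarding and offboarding then touch only group membership; the binding itself stays fixed.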

Per-Workload Service Account Naming Conventions — One service account per workload, named with a compute-platform prefix (crj-, cmp-, wlif-). When a service account name appears in INFORMATION_SCHEMA.JOBS, it tells you exactly which workload ran the query and where to look. Also covers service account impersonation for local development without distributing keys.
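The convention is easy to enforce mechanically. The sketch below checks a service account email against GCP's account-ID rules (6 to 30 characters of lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen) and maps its prefix to a platform. Reading `crj-`, `cmp-` and `wlif-` as Cloud Run jobs, Cloud Composer and Workload Identity Federation is our assumption; substitute your own table.

```python
import re

# Assumed prefix meanings for illustration; replace with your own convention.
PREFIX_TO_PLATFORM = {
    "crj-": "Cloud Run jobs",
    "cmp-": "Cloud Composer",
    "wlif-": "Workload Identity Federation",
}

# Service account IDs: 6-30 chars of lowercase letters, digits and hyphens,
# starting with a letter and not ending with a hyphen.
ACCOUNT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def classify_service_account(email: str) -> str:
    """Return the platform implied by a service account's name prefix.

    Raises ValueError for names that break the convention, so a CI check or a
    report over INFORMATION_SCHEMA.JOBS.user_email can flag them."""
    account_id, _, domain = email.partition("@")
    if not domain.endswith(".iam.gserviceaccount.com"):
        raise ValueError(f"not a user-managed service account: {email}")
    if not ACCOUNT_ID_RE.fullmatch(account_id):
        raise ValueError(f"invalid service account ID: {account_id}")
    for prefix, platform in PREFIX_TO_PLATFORM.items():
        if account_id.startswith(prefix):
            return platform
    raise ValueError(f"no recognized platform prefix: {account_id}")


if __name__ == "__main__":
    for email in [
        "crj-orders-loader@demo-project.iam.gserviceaccount.com",
        "shared-etl@demo-project.iam.gserviceaccount.com",
    ]:
        try:
            print(email, "->", classify_service_account(email))
        except ValueError as err:
            print(email, "-> FLAG:", err)
```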

Workload Identity Federation for CI/CD — Replace service account keys in GitHub Actions and other CI systems with keyless OIDC authentication. There are no credentials to store, rotate, or leak: each run exchanges the OIDC token its CI system issues for a short-lived access token that expires after one hour by default, and the next run gets a fresh one.
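The GCP side of this setup reduces to a few resource names. The sketch below assembles them for one GitHub repository: the provider path the CI job authenticates against, an attribute condition that rejects tokens from other repositories, and the `roles/iam.workloadIdentityUser` binding on the target service account. It assumes the provider maps `attribute.repository` from the `repository` claim of GitHub's OIDC token; the helper and its parameter names are hypothetical.

```python
def github_wif_settings(
    project_number: str, pool_id: str, provider_id: str, repository: str
) -> dict:
    """Assemble Workload Identity Federation names for one GitHub repository.

    Assumes the pool's provider maps attribute.repository=assertion.repository.
    `repository` is "owner/name"."""
    pool = (
        f"projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}"
    )
    return {
        # Full provider resource name that the CI job authenticates against.
        "provider": f"{pool}/providers/{provider_id}",
        # CEL condition on the provider: only tokens from this repository pass.
        "attribute_condition": f"assertion.repository == '{repository}'",
        # Binding for the target service account's own IAM policy, letting
        # federated identities from this repository impersonate it.
        "binding": {
            "role": "roles/iam.workloadIdentityUser",
            "members": [
                f"principalSet://iam.googleapis.com/{pool}"
                f"/attribute.repository/{repository}"
            ],
        },
    }


if __name__ == "__main__":
    settings = github_wif_settings(
        "123456789012", "github", "github-oidc", "example-org/dbt-project"
    )
    for key, value in settings.items():
        print(key, value)
```

Because the binding names a repository rather than a key, a run that cannot present a valid GitHub token for that repository cannot impersonate the service account.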

Phase 3: Data-Level Security

BigQuery Column-Level Security with Policy Tags — Tag sensitive columns in Data Catalog and control access via IAM rather than views. No more creating filtered views to hide SSN columns; the tag enforces access at the storage layer for any query against the table.
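A minimal sketch of the tagging step, assuming the JSON schema format that `bq show --schema --format=prettyjson` prints: each sensitive field gets a `policyTags` entry naming a policy tag resource of the form `projects/PROJECT/locations/LOCATION/taxonomies/TAXONOMY_ID/policyTags/TAG_ID`. The helper is hypothetical and only handles top-level columns. Readers then need the Fine-Grained Reader role (`roles/datacatalog.categoryFineGrainedReader`) on the tag to read the column.

```python
import copy
import json


def apply_policy_tags(schema: list[dict], tags_by_column: dict[str, str]) -> list[dict]:
    """Return a copy of a BigQuery JSON schema with policy tags attached.

    `tags_by_column` maps top-level column names to policy tag resource names.
    Raises KeyError if a mapped column is missing, so typos fail loudly."""
    missing = set(tags_by_column) - {field["name"] for field in schema}
    if missing:
        raise KeyError(f"columns not in schema: {sorted(missing)}")
    tagged = copy.deepcopy(schema)  # leave the caller's schema untouched
    for field in tagged:
        tag = tags_by_column.get(field["name"])
        if tag is not None:
            field["policyTags"] = {"names": [tag]}
    return tagged


if __name__ == "__main__":
    schema = [
        {"name": "customer_id", "type": "INTEGER", "mode": "REQUIRED"},
        {"name": "ssn", "type": "STRING", "mode": "NULLABLE"},
    ]
    tag = "projects/demo-project/locations/us/taxonomies/1234/policyTags/5678"
    # Write the result to a schema file and apply it to the table.
    print(json.dumps(apply_policy_tags(schema, {"ssn": tag}), indent=2))
```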

BigQuery Row Access Policies — Replace per-segment views (sales_emea, sales_apac) with dynamic row filtering that applies automatically based on the querying user's identity. Use SESSION_USER() in the filter expression for manager-sees-their-team patterns that reference live lookup tables.
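The policies themselves are BigQuery DDL. The hypothetical helper below renders a `CREATE OR REPLACE ROW ACCESS POLICY` statement; the usage shows a static regional filter and a `SESSION_USER()` filter, where the second assumes the table carries a `manager_email` column (a simplification of the lookup-table pattern). Table, group and column names are illustrative.

```python
def row_access_policy_ddl(
    policy_name: str, table: str, grantees: list[str], filter_expr: str
) -> str:
    """Render BigQuery DDL for one row access policy.

    `grantees` use IAM member syntax, e.g. "group:sales-emea@example.com".
    `filter_expr` is a SQL boolean expression evaluated per row."""
    grantee_list = ", ".join(f'"{grantee}"' for grantee in grantees)
    return (
        f"CREATE OR REPLACE ROW ACCESS POLICY {policy_name}\n"
        f"ON `{table}`\n"
        f"GRANT TO ({grantee_list})\n"
        f"FILTER USING ({filter_expr});"
    )


if __name__ == "__main__":
    # Static filter: replaces a sales_emea view.
    print(row_access_policy_ddl(
        "emea_only", "demo-project.sales.orders",
        ["group:sales-emea@example.com"], "region = 'EMEA'",
    ))
    # Dynamic filter: each manager sees only rows that name them.
    print(row_access_policy_ddl(
        "own_team_only", "demo-project.sales.orders",
        ["group:sales-managers@example.com"], "manager_email = SESSION_USER()",
    ))
```

Once a table has any row access policy, principals not covered by one see no rows, so grant a policy with `FILTER USING (TRUE)` to any group that needs full visibility.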

BigQuery Dynamic Data Masking — Keep sensitive columns queryable without exposing their values. With the SHA-256 hash rule, analysts can JOIN on email addresses without reading the actual values. Masking rules include a deterministic hash, type-appropriate default values, and nullify, alongside format-specific rules such as email and last-four-characters masks.
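Why a deterministic hash keeps joins working can be shown without BigQuery. The Python below masks an `email` column in two hypothetical tables with SHA-256 and joins on the masked values: equal emails produce equal digests, so the join matches without either raw address appearing. It illustrates the property only and does not reproduce BigQuery's exact hash output.

```python
import hashlib


def sha256_mask(value: str) -> str:
    """Deterministic mask: the same input always yields the same hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_column(rows: list[dict], column: str) -> list[dict]:
    """Return copies of `rows` with `column` replaced by its SHA-256 mask."""
    return [{**row, column: sha256_mask(row[column])} for row in rows]


def join_masked(orders: list[dict], customers: list[dict]) -> list[tuple[int, str]]:
    """Inner-join on the masked email column, as an analyst would see it.

    Assumes at most one customer row per email."""
    plans_by_email = {c["email"]: c["plan"] for c in customers}
    return [
        (o["order_id"], plans_by_email[o["email"]])
        for o in orders
        if o["email"] in plans_by_email
    ]


if __name__ == "__main__":
    orders = mask_column(
        [{"order_id": 1, "email": "ana@example.com"},
         {"order_id": 2, "email": "bo@example.com"}],
        "email",
    )
    customers = mask_column([{"email": "ana@example.com", "plan": "pro"}], "email")
    print(join_masked(orders, customers))
```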

Phase 4: Prevent Reaccumulation

IAM Drift Monitoring for GCP — IAM debt reaccumulates without active monitoring. IAM Recommender analyzes 90 days of usage to flag over-permissioned principals. INFORMATION_SCHEMA.JOBS surfaces unexpected service accounts and access patterns. Audit log sinks to BigQuery capture every IAM change for review.
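One lightweight drift check, sketched under the assumption that you snapshot the exported project policy on a schedule: compare the current bindings against a reviewed baseline and report every (role, member, condition) triple that appeared or disappeared. The function names are hypothetical; the audit log sink remains the record of who made each change.

```python
def binding_set(policy: dict) -> set[tuple[str, str, str]]:
    """Flatten an IAM policy into (role, member, condition expression) triples."""
    triples = set()
    for binding in policy.get("bindings", []):
        condition = binding.get("condition", {}).get("expression", "")
        for member in binding.get("members", []):
            triples.add((binding["role"], member, condition))
    return triples


def diff_policies(baseline: dict, current: dict) -> dict[str, list]:
    """Report bindings added or removed since the baseline snapshot."""
    before, after = binding_set(baseline), binding_set(current)
    return {"added": sorted(after - before), "removed": sorted(before - after)}


if __name__ == "__main__":
    baseline = {"bindings": [
        {"role": "roles/bigquery.dataViewer", "members": ["group:analysts@example.com"]},
    ]}
    current = {"bindings": [
        {"role": "roles/bigquery.dataViewer", "members": ["group:analysts@example.com"]},
        {"role": "roles/editor", "members": ["user:contractor@example.com"]},
    ]}
    drift = diff_policies(baseline, current)
    for change in drift["added"]:
        print("ADDED", change)
```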

The BigQuery IAM Patterns note covers the underlying role model — the distinction between data access and compute access that makes BigQuery’s IAM work differently from most GCP services.
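To make the split concrete: running a query needs a compute role on the project billed for the job and a data role on the dataset being read, and neither alone is enough. The sketch below encodes that check over two policies, both assumed to be shaped like exported IAM policy JSON. The role sets are partial lists chosen for illustration, group membership and inheritance are ignored, and the helper names are hypothetical.

```python
# Partial role lists for illustration only; the BigQuery IAM documentation
# lists every role carrying each permission.
COMPUTE_ROLES = {"roles/bigquery.jobUser", "roles/bigquery.user", "roles/bigquery.admin"}
DATA_READ_ROLES = {
    "roles/bigquery.dataViewer",
    "roles/bigquery.dataEditor",
    "roles/bigquery.dataOwner",
    "roles/bigquery.admin",
}


def roles_of(member: str, policy: dict) -> set[str]:
    """Roles bound directly to `member` (no group expansion or inheritance)."""
    return {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }


def can_query(member: str, billing_project_policy: dict, dataset_policy: dict) -> bool:
    """True only if the member holds both compute and data access."""
    has_compute = bool(roles_of(member, billing_project_policy) & COMPUTE_ROLES)
    has_data = bool(roles_of(member, dataset_policy) & DATA_READ_ROLES)
    return has_compute and has_data


if __name__ == "__main__":
    analyst = "user:ana@example.com"
    project = {"bindings": [{"role": "roles/bigquery.jobUser", "members": [analyst]}]}
    dataset = {"bindings": [{"role": "roles/bigquery.dataViewer", "members": [analyst]}]}
    print(can_query(analyst, project, dataset))
```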

For dbt teams: row access policies and policy tags disappear when dbt drops and recreates tables. Secured Table Materialization in dbt solves this by reapplying security configuration as part of the materialization step.

For local development authentication without keys: GCP Application Default Credentials explains the ADC credential chain and how impersonation fits into it.