GCP IAM for Data Teams: Least Privilege Done Right

The IAM configuration that got your data platform running isn’t the IAM configuration that should keep it running. Every team I’ve audited has accumulated what I call IAM debt: Editor roles granted because they “just worked,” shared service accounts because creating new ones felt bureaucratic, and service account keys committed to repositories because the deployment needed to ship.

This debt compounds. When you can’t answer “who can access this table?” or explain a spike in BigQuery costs from queries nobody remembers running, you’re paying interest on shortcuts taken months or years ago.

Fixing IAM debt requires two phases: audit what you have, then implement patterns that prevent reaccumulation. This guide covers both, with specific focus on BigQuery and the data platform services around it.


The IAM audit reveals what you’ve been ignoring

Start with a baseline understanding of your current state. Run these queries against your GCP organization to surface the most common problems.

List all principals with Editor or Owner roles at the project level:

gcloud projects get-iam-policy YOUR_PROJECT_ID \
  --flatten="bindings[].members" \
  --format="table(bindings.role,bindings.members)" \
  --filter="bindings.role:(roles/editor OR roles/owner)"

Find service accounts with keys:

gcloud iam service-accounts list --project=YOUR_PROJECT_ID \
  --format="value(email)" | while read sa; do
  keys=$(gcloud iam service-accounts keys list --iam-account="$sa" \
    --format="value(name)" --filter="keyType=USER_MANAGED" 2>/dev/null)
  if [ -n "$keys" ]; then
    echo "Service account with keys: $sa"
  fi
done

Identify service accounts used across multiple workloads by checking recent BigQuery job history:

SELECT
  user_email,
  COUNT(DISTINCT job_id) AS job_count,
  COUNT(DISTINCT REGEXP_EXTRACT(query, r'FROM `([^`]+)`')) AS tables_accessed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND user_email LIKE '%.iam.gserviceaccount.com'
GROUP BY user_email
ORDER BY job_count DESC

A service account running hundreds of different queries against dozens of tables is almost certainly shared across workloads. That’s your first remediation target.


The 2-layer RBAC pattern structures access effectively

The core mistake teams make is binding IAM roles directly to individual users. This creates management overhead (add permissions one by one for each new hire) and audit confusion (why does this specific person have Data Editor on this dataset?).

The 2-layer pattern separates object access from job function:

Layer 1: Predefined IAM roles handle what actions are possible on BigQuery resources. The three you’ll use most:

  • roles/bigquery.dataViewer reads tables and views
  • roles/bigquery.dataEditor reads, writes, and deletes table data
  • roles/bigquery.dataOwner provides full control including sharing

Layer 2: Google Groups represent job functions. Create groups like:

  • data-loaders@yourdomain.com for service accounts and humans that write raw data
  • data-engineers@yourdomain.com for those who transform and model data
  • data-analysts@yourdomain.com for those who query production tables but can’t modify

Bind roles to groups, not individuals. When someone joins the team, add them to the appropriate group. When they leave, remove them. The role bindings never change.

# Grant the analysts group viewer access to production datasets
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="group:data-analysts@yourdomain.com" \
  --role="roles/bigquery.dataViewer" \
  --condition='expression=resource.name.startsWith("projects/YOUR_PROJECT_ID/datasets/prod_"),title=prod-datasets-only'

The IAM condition restricts the role to datasets with the prod_ prefix. Analysts get production data access without being able to read development datasets.


Per-workload service accounts replace shared credentials

The shared service account problem is pervasive. A single etl-service-account@project.iam.gserviceaccount.com runs Airflow DAGs, Cloud Run Jobs, scheduled queries, and maybe a few cron jobs on a Compute Engine instance nobody remembers creating.

When something breaks or costs spike, you can’t determine which workload caused it. When you need to rotate credentials, you risk breaking everything.

Create one service account per workload, with permissions scoped to exactly what that workload needs.

Naming convention that makes logs readable:

crj-dbt-daily@project.iam.gserviceaccount.com # Cloud Run Job for daily dbt
cmp-extraction-dag@project.iam.gserviceaccount.com # Composer DAG for extraction
wlif-github-actions@project.iam.gserviceaccount.com # Workload Identity Federation for CI

The prefix indicates the compute platform (crj for Cloud Run Jobs, cmp for Composer, wlif for Workload Identity Federation). When you see a query in INFORMATION_SCHEMA.JOBS, the service account name tells you exactly which workload ran it.
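Standing up one of these accounts takes two commands. A minimal sketch, with project ID and display name as placeholders, granting only the ability to run BigQuery jobs (dataset-level read/write grants come on top of this, scoped per dataset):

```shell
# Create the per-workload service account
gcloud iam service-accounts create crj-dbt-daily \
  --project=YOUR_PROJECT_ID \
  --display-name="Cloud Run Job: daily dbt build"

# Grant only what the workload needs at the project level:
# permission to create BigQuery jobs, nothing more.
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:crj-dbt-daily@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
```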

Replace keys with impersonation:

Service account keys are credentials that can be copied, committed to repositories, and used from anywhere. Service account impersonation grants temporary credentials that require the caller to authenticate first.

# Grant a user permission to impersonate a service account
gcloud iam service-accounts add-iam-policy-binding \
  crj-dbt-daily@project.iam.gserviceaccount.com \
  --member="user:engineer@yourdomain.com" \
  --role="roles/iam.serviceAccountTokenCreator"

Now the engineer can run dbt locally with production credentials:

gcloud auth application-default login \
  --impersonate-service-account=crj-dbt-daily@project.iam.gserviceaccount.com
dbt run

The audit log shows both the human identity and the service account they impersonated. No keys to rotate. No credentials to leak.
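If engineers run dbt this way routinely, the impersonation can also live in the dbt profile itself rather than in application-default credentials. A sketch assuming the dbt-bigquery adapter, with the profile, project, and dataset names as placeholders:

```yaml
# profiles.yml — dbt-bigquery supports impersonation natively
analytics:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth  # the engineer's own identity authenticates first
      project: YOUR_PROJECT_ID
      dataset: dbt_dev
      impersonate_service_account: crj-dbt-daily@project.iam.gserviceaccount.com
```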


Column-level security uses policy tags, not views

Before policy tags, teams created views to hide sensitive columns:

-- Don't do this anymore
CREATE VIEW safe_customers AS
SELECT customer_id, signup_date, country
-- SSN and email deliberately omitted
FROM raw_customers

This creates maintenance burden (new columns require view updates), performance overhead (views on views on views), and governance gaps (who remembers which views hide which columns?).

Data Catalog policy tags provide column-level security at the storage layer. Setting them up takes four steps.

1. Create a taxonomy with hierarchical categories:

gcloud data-catalog taxonomies create "PII" \
  --location=us \
  --description="Personally identifiable information"

gcloud data-catalog taxonomies policy-tags create "High_Sensitivity" \
  --taxonomy="projects/YOUR_PROJECT/locations/us/taxonomies/PII" \
  --description="SSN, passport numbers, financial accounts"

gcloud data-catalog taxonomies policy-tags create "Medium_Sensitivity" \
  --parent-policy-tag="projects/YOUR_PROJECT/locations/us/taxonomies/PII/policyTags/High_Sensitivity" \
  --description="Email, phone, address"

2. Enable access control on the taxonomy:

gcloud data-catalog taxonomies set-iam-policy \
  "projects/YOUR_PROJECT/locations/us/taxonomies/PII" \
  policy.json

3. Tag columns in your BigQuery schemas. You can do this in the Console, via API, or through Terraform:

resource "google_bigquery_table" "customers" {
  # ... table config ...
  schema = jsonencode([
    {
      name = "email"
      type = "STRING"
      policyTags = {
        names = ["projects/YOUR_PROJECT/locations/us/taxonomies/PII/policyTags/Medium_Sensitivity"]
      }
    }
  ])
}

4. Grant access to users who need to see tagged columns:

gcloud data-catalog taxonomies policy-tags add-iam-policy-binding \
  "projects/YOUR_PROJECT/locations/us/taxonomies/PII/policyTags/Medium_Sensitivity" \
  --member="group:data-analysts@yourdomain.com" \
  --role="roles/datacatalog.categoryFineGrainedReader"

Tag at the highest appropriate level in your hierarchy. If a group should see all Medium_Sensitivity data, grant access at that level. The permission cascades to child tags. Managing ten categories is tractable; managing hundreds of individual column tags is not.


Row-level security filters without maintaining separate views

Column-level security controls which fields a user sees. Row-level security controls which records.

The traditional approach creates filtered views per segment:

-- Regional views nobody wants to maintain
CREATE VIEW sales_emea AS SELECT * FROM sales WHERE region = 'EMEA';
CREATE VIEW sales_apac AS SELECT * FROM sales WHERE region = 'APAC';
CREATE VIEW sales_americas AS SELECT * FROM sales WHERE region = 'AMERICAS';

BigQuery Row Access Policies replace this with dynamic filtering based on the querying user:

CREATE ROW ACCESS POLICY region_filter_emea
ON project.dataset.sales
GRANT TO ("group:sales-emea@yourdomain.com")
FILTER USING (region = 'EMEA');

CREATE ROW ACCESS POLICY region_filter_apac
ON project.dataset.sales
GRANT TO ("group:sales-apac@yourdomain.com")
FILTER USING (region = 'APAC');

Now when a user in the sales-emea group queries the sales table, they only see EMEA rows. No view management. The filter applies automatically.

For more dynamic filtering based on user attributes, use SESSION_USER():

CREATE ROW ACCESS POLICY manager_sees_their_team
ON project.dataset.employee_metrics
GRANT TO ("group:managers@yourdomain.com")
FILTER USING (manager_email = SESSION_USER());

CREATE ROW ACCESS POLICY hr_sees_everyone
ON project.dataset.employee_metrics
GRANT TO ("group:hr-admins@yourdomain.com")
FILTER USING (TRUE);

Managers see metrics for their direct reports. HR admins see every row through a second policy. Note that filter expressions can't contain subqueries or reference other tables, so "who counts as an HR admin" belongs in group membership, which HR can maintain directly, rather than in a lookup table.

Row and column security combine cleanly. A sales analyst might see only EMEA rows (row-level) and not see the margin column (column-level). Both policies apply independently.


Dynamic data masking shows structure without exposing values

Sometimes analysts need to work with sensitive data’s structure without seeing actual values. They’re building queries, testing joins, or validating data quality on synthetic examples.

Dynamic data masking extends column-level security. Instead of blocking access entirely, it shows obscured values:

Data policies are created through the BigQuery Data Policy API, the Console, or Terraform. Attached to the Medium_Sensitivity policy tag from the previous section, a masking policy looks roughly like:

resource "google_bigquery_datapolicy_data_policy" "pii_mask" {
  location         = "us"
  data_policy_id   = "pii_mask"
  policy_tag       = "projects/YOUR_PROJECT/locations/us/taxonomies/PII/policyTags/Medium_Sensitivity"
  data_policy_type = "DATA_MASKING_POLICY"
  data_masking_policy {
    predefined_expression = "SHA256" # shows a hash instead of the actual email
  }
}

Grant maskedReader to analysts who should see masked data:

gcloud bigquery data-policies add-iam-policy-binding \
  --data-policy=pii_mask \
  --location=us \
  --member="group:data-analysts@yourdomain.com" \
  --role="roles/bigquery.maskedReader"

Now analysts see a7f3d2e1b4c5... instead of user@example.com. They can write queries, test joins on the hashed values, and verify row counts. When they need actual values for a specific use case, they request elevated access through your approval workflow.

Masking options include:

  • SHA256 produces a deterministic hash (same input = same output, useful for joins)
  • DEFAULT_MASKING_VALUE returns type-appropriate defaults (empty string, 0, null)
  • NULLIFY always returns null
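Because SHA256 masking is deterministic, equality joins written by a masked reader still match on the underlying values. A sketch with illustrative table names:

```sql
-- A masked reader sees hashes in both email columns, but equal emails
-- produce equal hashes, so the join behaves the same as on raw values.
SELECT s.signup_date, o.order_total
FROM project.dataset.signups AS s
JOIN project.dataset.orders AS o
  ON s.email = o.email
```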

Assign masked reader at the data policy level, not the project level. Project-level grants give masked access to all masked columns in the project, which likely exceeds what any single analyst needs.


Common security shortcuts to eliminate

These patterns appear in almost every audit. Each seems reasonable in isolation but creates cumulative risk.

Service account keys in repositories

The “easy” way to authenticate CI/CD. The secure way: Workload Identity Federation. GitHub Actions, GitLab CI, and most CI platforms support OIDC authentication that exchanges their tokens for GCP credentials without storing keys.

# GitHub Actions with Workload Identity Federation
- uses: google-github-actions/auth@v2
  with:
    workload_identity_provider: 'projects/123456/locations/global/workloadIdentityPools/github/providers/github'
    service_account: 'wlif-github-actions@project.iam.gserviceaccount.com'

BigQuery Data Viewer without Job User

A common confusion. roles/bigquery.dataViewer lets users see table metadata and data. But running a query requires roles/bigquery.jobUser to create jobs. Users with only Data Viewer can browse tables in the Console but can’t actually query them.

Always grant both, or use roles/bigquery.user which includes job creation plus the ability to create datasets for personal use.
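Granting both roles to a group looks like this (the group name is a placeholder):

```shell
# Data Viewer: read table data and metadata
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="group:data-analysts@yourdomain.com" \
  --role="roles/bigquery.dataViewer"

# Job User: create query jobs — without this, Data Viewer can only browse
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="group:data-analysts@yourdomain.com" \
  --role="roles/bigquery.jobUser"
```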

Policy tags at project level

Granting roles/datacatalog.categoryFineGrainedReader at the project level gives access to all policy tags in that project. If you later add a new tag category for more sensitive data, everyone with project-level access can see it automatically.

Grant at the policy tag level for principle of least privilege.

Default Compute Engine service account

The default service account (PROJECT_NUMBER-compute@developer.gserviceaccount.com) is granted the Editor role automatically unless your organization enforces the iam.automaticIamGrantsForDefaultServiceAccounts policy constraint. Don't use it for workloads.

Create dedicated service accounts for each Compute Engine workload. If legacy systems depend on the default account, plan a migration.
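Attaching a dedicated account happens at instance creation. A sketch, with instance and account names as placeholders:

```shell
gcloud compute instances create etl-worker-1 \
  --zone=us-central1-a \
  --service-account=gce-etl-worker@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --scopes=cloud-platform  # let IAM roles, not legacy scopes, bound access
```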


Monitoring signals catch permission drift

IAM debt reaccumulates. Someone grants Editor “temporarily” and forgets to remove it. A new workload reuses an existing service account because it was faster. Quarterly reviews prevent drift.

IAM Recommender analyzes actual usage and identifies over-privileged principals:

gcloud recommender recommendations list \
  --project=YOUR_PROJECT_ID \
  --location=global \
  --recommender=google.iam.policy.Recommender

It might tell you that a service account holding the Data Editor role has only read data for the past 90 days, meaning you can safely reduce it to Data Viewer.

INFORMATION_SCHEMA.JOBS shows who runs what in BigQuery:

SELECT
  user_email,
  DATE(creation_time) AS query_date,
  COUNT(*) AS query_count,
  SUM(total_bytes_billed) / POW(10, 12) AS tb_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
GROUP BY 1, 2
ORDER BY tb_billed DESC

Unexpected service accounts or users appearing in this list warrant investigation.

Audit logs capture all IAM changes. Set up a log sink to BigQuery and query for recent permission grants:

SELECT
  timestamp,
  protoPayload.authenticationInfo.principalEmail AS grantor,
  protoPayload.serviceData.policyDelta.bindingDeltas
FROM `your_audit_dataset.cloudaudit_googleapis_com_activity`
WHERE protoPayload.methodName = 'SetIamPolicy'
  AND timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
ORDER BY timestamp DESC
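If audit logs aren't flowing to BigQuery yet, creating the sink is one command. A sketch, with the sink and dataset names as placeholders:

```shell
gcloud logging sinks create iam-audit-to-bq \
  bigquery.googleapis.com/projects/YOUR_PROJECT_ID/datasets/your_audit_dataset \
  --log-filter='logName:"cloudaudit.googleapis.com%2Factivity"'

# The sink's writer identity needs Data Editor on the destination dataset;
# gcloud prints it on creation, or retrieve it later:
gcloud logging sinks describe iam-audit-to-bq --format="value(writerIdentity)"
```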

Teams that monitor these signals typically achieve 30-40% cost reductions in their first quarter of focused optimization, largely by identifying and removing unnecessary access that enabled runaway queries.


The path forward

Cleaning up IAM debt feels tedious until you experience an incident that clean IAM would have prevented: a leaked service account key, a query that scanned petabytes because someone had broader access than needed, an audit that couldn’t determine who accessed sensitive data.

The audit queries in this guide surface the highest-risk items: Editor roles, shared service accounts, and service account keys. From there, migrate one workload at a time to per-workload service accounts with minimal permissions, and add policy tags for sensitive columns.

The goal is knowing who can access what, and making sure that answer is intentional rather than accumulated accident.