BigQuery Column-Level Security with Policy Tags

Replace view-based column hiding with Data Catalog policy tags — storage-layer security that survives schema changes and doesn't require view maintenance.

The traditional approach to hiding sensitive columns is a view that omits them:

-- Don't do this anymore
CREATE VIEW safe_customers AS
SELECT
  customer_id,
  signup_date,
  country
  -- SSN and email deliberately omitted
FROM raw_customers;

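This creates three problems: maintenance burden (every new column requires a view update), performance overhead (views on views compound), and governance gaps (who knows which views hide which columns, and why?). The failure mode depends on how the view is written. With an explicit column list like the one above, a new non-sensitive column stays invisible until someone updates the view. With the common SELECT * EXCEPT (ssn, email) shortcut, a new date_of_birth column on raw_customers is silently exposed until someone notices.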
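The governance gap can be made concrete with a small sketch. It assumes you can list the table's columns, the view's columns, and the columns someone has recorded as deliberately hidden; the function name and the sample lists are illustrative, not part of any BigQuery API.

```python
def untracked_columns(table_columns, view_columns, known_sensitive):
    """Return table columns that nobody has classified yet.

    A column is "untracked" when it is neither exposed by the view
    nor recorded as deliberately hidden.
    """
    reviewed = set(view_columns) | set(known_sensitive)
    return sorted(set(table_columns) - reviewed)


# Hypothetical column lists; in practice they could be read from
# INFORMATION_SCHEMA.COLUMNS and from the view definition.
table_columns = ["customer_id", "signup_date", "country", "ssn", "email", "date_of_birth"]
view_columns = ["customer_id", "signup_date", "country"]
known_sensitive = ["ssn", "email"]

print(untracked_columns(table_columns, view_columns, known_sensitive))
# ['date_of_birth']
```

With views, this bookkeeping lives outside the warehouse and has to be rerun after every schema change.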

Data Catalog policy tags solve this at the storage layer. The column itself becomes tagged, and access to that column is governed by IAM — independently of how the table is queried. No view maintenance required.

Setting Up a Taxonomy

Policy tags live within a taxonomy — a hierarchical classification of sensitivity levels. Design your taxonomy to match your actual data governance requirements, not an aspirational one.

A practical PII taxonomy:

# Create the taxonomy
gcloud data-catalog taxonomies create "PII" \
  --location=us \
  --description="Personally identifiable information"

# Create sensitivity levels
gcloud data-catalog taxonomies policy-tags create "High_Sensitivity" \
  --taxonomy="projects/YOUR_PROJECT/locations/us/taxonomies/PII" \
  --description="SSN, passport numbers, financial account numbers"

gcloud data-catalog taxonomies policy-tags create "Medium_Sensitivity" \
  --parent-policy-tag="projects/YOUR_PROJECT/locations/us/taxonomies/PII/policyTags/High_Sensitivity" \
  --description="Email, phone number, home address"

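The parent-child relationship matters for permission grants: granting access at High_Sensitivity cascades to its child tags, so a reader of High_Sensitivity can also read Medium_Sensitivity columns. Design the hierarchy so that “more access” means being granted at a higher level in the tree, not by accumulating individual tag grants. Note that in real resource names the taxonomy and policy tag segments are server-generated numeric IDs, not display names; PII and High_Sensitivity appear in the paths here as readable placeholders.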
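The cascade can be sketched as a walk up the tag tree. This is a plain-Python model of the inheritance rule described above, not a call into Data Catalog; the PARENTS and DIRECT_GRANTS tables and the group names are illustrative assumptions.

```python
# Child tag -> parent tag; None marks a top-level tag.
PARENTS = {
    "High_Sensitivity": None,
    "Medium_Sensitivity": "High_Sensitivity",
}

# Direct grants only, as they would appear in each tag's own IAM bindings.
DIRECT_GRANTS = {
    "High_Sensitivity": {"group:data-engineers@yourdomain.com"},
    "Medium_Sensitivity": {"group:data-analysts@yourdomain.com"},
}


def effective_readers(tag):
    """Collect members granted on the tag itself or on any ancestor."""
    readers = set()
    current = tag
    while current is not None:
        readers |= DIRECT_GRANTS.get(current, set())
        current = PARENTS[current]
    return readers


print(sorted(effective_readers("Medium_Sensitivity")))
print(sorted(effective_readers("High_Sensitivity")))
```

In this model the engineers reach Medium_Sensitivity through the parent grant, while the analysts never reach High_Sensitivity.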

Enabling Access Control

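Creating the taxonomy doesn’t automatically enforce access. Enforcement is a setting on the taxonomy itself (the FINE_GRAINED_ACCESS_CONTROL policy type, shown as “Enforce access control” in the console), and it must be turned on before the tags block anything. Separately, set the taxonomy’s IAM policy to control who can manage it: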

gcloud data-catalog taxonomies set-iam-policy \
  "projects/YOUR_PROJECT/locations/us/taxonomies/PII" \
  policy.json

Here, policy.json specifies who can manage the taxonomy (your data governance team) but deliberately omits roles/datacatalog.categoryFineGrainedReader. That grant goes on individual policy tags, not on the taxonomy itself.
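A minimal sketch of generating such a policy file follows. It assumes the management role intended here is roles/datacatalog.categoryAdmin (Policy Tag Admin) and uses a hypothetical governance group; adjust both to your setup.

```python
import json


def taxonomy_policy(admin_members):
    """Build an IAM policy that grants taxonomy administration only.

    No fine-grained reader binding is included: read access is
    granted on individual policy tags instead.
    """
    return {
        "bindings": [
            {
                # Assumed management role (Policy Tag Admin).
                "role": "roles/datacatalog.categoryAdmin",
                "members": sorted(admin_members),
            }
        ]
    }


# Hypothetical group name.
policy = taxonomy_policy(["group:data-governance@yourdomain.com"])
print(json.dumps(policy, indent=2))
```

Because set-iam-policy replaces the whole policy, it is usually safer to start from the output of get-iam-policy and edit that than to write a file from scratch.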

Tagging Columns

Tag columns in BigQuery schema definitions. Terraform is the most maintainable approach for production schemas:

resource "google_bigquery_table" "customers" {
  dataset_id = "raw"
  table_id   = "customers"
  project    = var.project_id

  schema = jsonencode([
    {
      name = "customer_id"
      type = "STRING"
      # No policy tag — ID is not sensitive
    },
    {
      name = "email"
      type = "STRING"
      policyTags = {
        names = ["projects/${var.project_id}/locations/us/taxonomies/PII/policyTags/Medium_Sensitivity"]
      }
    },
    {
      name = "ssn"
      type = "STRING"
      policyTags = {
        names = ["projects/${var.project_id}/locations/us/taxonomies/PII/policyTags/High_Sensitivity"]
      }
    }
  ])
}

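Once tagged, any user who queries the table without the appropriate fine-grained reader role gets an access-denied error when the query references that column, including through SELECT *. Columns without a tag remain readable, so the same user can still select customer_id. The tag enforces access at query time, not at permission-grant time.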
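The query-time check can be modeled in a few lines. This is a simulation of the behavior, not BigQuery's implementation; the schema mirrors the table above, with tag names shortened to display names for readability.

```python
SCHEMA = [
    {"name": "customer_id", "type": "STRING"},
    {"name": "email", "type": "STRING", "policyTags": {"names": ["Medium_Sensitivity"]}},
    {"name": "ssn", "type": "STRING", "policyTags": {"names": ["High_Sensitivity"]}},
]


def denied_columns(schema, selected, readable_tags):
    """Return the selected columns whose policy tags the user cannot read."""
    denied = []
    for field in schema:
        if field["name"] not in selected:
            continue
        tags = field.get("policyTags", {}).get("names", [])
        if any(tag not in readable_tags for tag in tags):
            denied.append(field["name"])
    return denied


all_columns = [field["name"] for field in SCHEMA]
analyst_tags = {"Medium_Sensitivity"}

# SELECT * touches ssn, so the whole query would be rejected.
print(denied_columns(SCHEMA, all_columns, analyst_tags))
# Naming only readable columns leaves nothing to deny.
print(denied_columns(SCHEMA, ["customer_id", "email"], analyst_tags))
```

A non-empty result corresponds to a failed query, which is why analysts under this model need to name columns explicitly rather than rely on SELECT *.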

Granting Access

Grant datacatalog.categoryFineGrainedReader on the specific policy tag, not the project:

# Grant Medium_Sensitivity access to the analysts group
gcloud data-catalog taxonomies policy-tags add-iam-policy-binding \
  "projects/YOUR_PROJECT/locations/us/taxonomies/PII/policyTags/Medium_Sensitivity" \
  --member="group:data-analysts@yourdomain.com" \
  --role="roles/datacatalog.categoryFineGrainedReader"

# Grant High_Sensitivity access only to data engineers
gcloud data-catalog taxonomies policy-tags add-iam-policy-binding \
  "projects/YOUR_PROJECT/locations/us/taxonomies/PII/policyTags/High_Sensitivity" \
  --member="group:data-engineers@yourdomain.com" \
  --role="roles/datacatalog.categoryFineGrainedReader"

The common mistake is granting categoryFineGrainedReader at the project level, which gives access to every policy tag in the project. If you later add a Financial_Data taxonomy, everyone with the project-level grant can automatically read the columns tagged with it. Grant at the tag level for actual least privilege.
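One way to catch this mistake is to scan the project's IAM policy for the role. The sketch below assumes the policy has been exported as JSON (for example with gcloud projects get-iam-policy YOUR_PROJECT --format=json); the sample policy and its members are made up for illustration.

```python
import json

FINE_GRAINED_READER = "roles/datacatalog.categoryFineGrainedReader"


def project_level_readers(policy):
    """Return members holding the fine-grained reader role in this policy."""
    members = set()
    for binding in policy.get("bindings", []):
        if binding.get("role") == FINE_GRAINED_READER:
            members.update(binding.get("members", []))
    return sorted(members)


# Made-up project policy, in the shape of an exported IAM policy.
sample_policy = json.loads("""
{
  "bindings": [
    {"role": "roles/bigquery.dataViewer",
     "members": ["group:data-analysts@yourdomain.com"]},
    {"role": "roles/datacatalog.categoryFineGrainedReader",
     "members": ["group:data-analysts@yourdomain.com", "user:someone@yourdomain.com"]}
  ]
}
""")

print(project_level_readers(sample_policy))
```

Any member this returns can read every tagged column in the project, regardless of the tag-level bindings.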

Governance Properties

With policy tags, column-level security is centralized in Data Catalog rather than distributed across view definitions. You can answer “who can see email addresses?” by looking at the Medium_Sensitivity policy tag’s IAM bindings — one place, complete answer.

When you add a new sensitive column to a table, tag it on creation. The access control is immediate and doesn’t require creating or updating views. When the sensitivity classification of a column changes (say, a column that used to contain anonymized IDs now contains real identifiers), change the tag on that column and the new protection applies to every query against it. When the policy for a whole class of data changes, update the IAM bindings on the tag once and the change covers every column, in every table, that carries it.

For the dbt use case, policy tags need to survive table rebuilds. A dbt rebuild drops and recreates the table, so tags assigned through the schema definition persist only when something outside dbt reasserts that schema, such as Terraform or explicit BigQuery schema files. If you’re managing tables entirely through dbt, see Secured Table Materialization in dbt for the pattern that explicitly reapplies policy tags after each rebuild.
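The reapply step amounts to merging a desired column-to-tag mapping back into the rebuilt table's schema. The sketch below shows only that merge on plain dictionaries; the function name, the mapping, and the placeholder resource names are assumptions, and pushing the patched schema back to BigQuery (through the bq tool or a client library) is left out.

```python
import copy


def reapply_policy_tags(schema, desired_tags):
    """Return a copy of the schema with policy tags set from desired_tags.

    schema: list of BigQuery field dicts (name, type, ...).
    desired_tags: column name -> policy tag resource name.
    """
    patched = copy.deepcopy(schema)
    for field in patched:
        tag = desired_tags.get(field["name"])
        if tag is not None:
            field["policyTags"] = {"names": [tag]}
    return patched


# Schema as it might look right after a rebuild: no tags left.
rebuilt_schema = [
    {"name": "customer_id", "type": "STRING"},
    {"name": "email", "type": "STRING"},
    {"name": "ssn", "type": "STRING"},
]

# Placeholder resource names; real ones contain generated IDs.
desired_tags = {
    "email": "projects/YOUR_PROJECT/locations/us/taxonomies/PII/policyTags/Medium_Sensitivity",
    "ssn": "projects/YOUR_PROJECT/locations/us/taxonomies/PII/policyTags/High_Sensitivity",
}

patched_schema = reapply_policy_tags(rebuilt_schema, desired_tags)
```

Keeping the mapping in version control gives the rebuild a single source of truth for which columns are sensitive.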

Tags also combine with BigQuery Row Access Policies and BigQuery Dynamic Data Masking. A user might see only certain rows (row policy), have email visible as a hash rather than plaintext (masking), and be blocked entirely from the SSN column (policy tag). Each layer applies independently.
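The layering can be illustrated with a toy query function. It is a simulation under simplifying assumptions (a single country filter as the row policy, SHA-256 hashing as the masking rule, made-up rows), not a description of how BigQuery evaluates these features internally.

```python
import hashlib

# Made-up sample rows.
ROWS = [
    {"customer_id": "c1", "country": "FR", "email": "a@example.com", "ssn": "111-11-1111"},
    {"customer_id": "c2", "country": "US", "email": "b@example.com", "ssn": "222-22-2222"},
]


def query(rows, columns, allowed_country, masked, blocked):
    """Apply the three layers independently to a simple column selection."""
    # Policy tag layer: referencing a blocked column fails the whole query.
    denied = [c for c in columns if c in blocked]
    if denied:
        raise PermissionError(f"Access denied on columns: {denied}")

    result = []
    for row in rows:
        # Row access policy layer: rows outside the filter are not returned.
        if row["country"] != allowed_country:
            continue
        out = {}
        for c in columns:
            value = row[c]
            # Masking layer: the column is visible, but only as a hash.
            if c in masked:
                value = hashlib.sha256(value.encode("utf-8")).hexdigest()
            out[c] = value
        result.append(out)
    return result


visible = query(ROWS, ["customer_id", "email"], "FR", masked={"email"}, blocked={"ssn"})
print(visible)
```

Here the user gets one row, with the email replaced by its hash, and any attempt to select ssn raises an error before rows are read.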