ServicesAboutNotesContact Get in touch →
EN FR
Note

BigQuery Dynamic Data Masking

Show sensitive column structure without exposing values — SHA256 hashing, nullification, and default masking for analysts who need to write queries but not read PII.

Planted
bigquerygcpdata engineering

Column-level policy tags create a hard boundary: users either see the column or they don’t. That binary works well for high-sensitivity data (SSNs, financial account numbers) where there’s no legitimate reason for most users to see even the structure of the values.

But sometimes analysts need to work with sensitive data without seeing actual values. They’re building queries that join on email addresses, testing whether their logic handles null values correctly, or validating that row counts match between environments. Blocking the column entirely prevents legitimate work. Exposing actual values violates privacy policy.

Dynamic data masking covers this gap. Instead of blocking access, it shows obscured values — hashes, defaults, or nulls — that let analysts work with the column’s shape without seeing its content.

How Masking Works

Masking builds on top of column-level policy tags. The column must already be tagged with a policy tag. You then create a data policy on that column that defines what masked users see:

CREATE DATA POLICY pii_email_mask
ON project.dataset.customers
COLUMN email
USING SHA256;

Grant the maskedReader role to groups who should see masked data:

Terminal window
gcloud bigquery data-policies add-iam-policy-binding \
--data-policy=pii_email_mask \
--location=us \
--member="group:data-analysts@yourdomain.com" \
--role="roles/bigquery.maskedReader"

Now when analysts query the email column, they see a7f3d2e1b4c5... instead of user@example.com. The hash is always the same for the same input, which matters for joining.

Users with the full categoryFineGrainedReader grant on the underlying policy tag see actual values. Users with maskedReader see the mask. Users with neither see an error.

Masking Options

Three masking approaches, each with different use cases:

SHA256 — deterministic hashing. The same email always produces the same hash. This means analysts can:

  • Write queries that JOIN on email (the join works because the hash is consistent)
  • Count distinct emails
  • Identify whether two rows have the same email

They cannot recover the original email from the hash, and they cannot match the hash to emails in other systems (external email lists, CRM exports) where values are unmasked.

Use SHA256 when the column is used as a join key and analysts legitimately need to join on it, just not read the actual value.

DEFAULT_MASKING_VALUE — returns type-appropriate defaults. Strings become empty strings, numbers become 0, dates become the epoch, booleans become false. This obscures both the value and the fact that a value exists.

Use default masking when analysts need to know the column exists and its type, but don’t need to distinguish between specific masked rows.

NULLIFY — always returns NULL. The column appears in the schema, but every row shows NULL for masked users.

Use nullification when you want the column completely invisible in terms of values, but still want analysts to be able to reference the column name in queries without errors.

Grant at the Data Policy Level, Not the Project

The common over-permissioning mistake with masking:

Terminal window
# DON'T do this — grants maskedReader on ALL masked columns in the project
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="group:data-analysts@yourdomain.com" \
--role="roles/bigquery.maskedReader"

Project-level maskedReader gives masked access to every data policy in the project. If you add a new data policy for a more sensitive column next quarter, everyone with project-level access automatically sees that column (masked). That’s not least privilege.

Grant at the data policy level:

Terminal window
gcloud bigquery data-policies add-iam-policy-binding \
--data-policy=pii_email_mask \
--location=us \
--member="group:data-analysts@yourdomain.com" \
--role="roles/bigquery.maskedReader"

This grants masked access only to that specific data policy. Adding new data policies doesn’t automatically extend access.

Designing the Access Tiers

With masking in place, you have three tiers for sensitive columns:

TierGrantWhat they see
No accessNo grantQuery error on the column
Masked accessmaskedReader on data policyHash / empty / null
Full accesscategoryFineGrainedReader on policy tagActual values

Assign tiers to groups based on legitimate need:

  • Data analysts get masked access for columns they need to query but not read
  • Data engineers get full access for columns involved in transformations
  • Compliance / legal get full access as needed for specific investigations
  • BI tools and dashboards depend on what the dashboard shows — masked if aggregated, full if individual records

With masking in place, analysts can perform joins, aggregations, row counts, and query validation without full PII access. Full access becomes a deliberate, auditable escalation rather than the default grant.

Combining with Row Policies

Masking and row access policies compose. An analyst might see only the rows for their region (row policy) and see the email column as a hash (masking). Both policies apply at query time, independently.

A complex access model:

  • Analysts see only rows from their assigned region
  • They see email as SHA256 (joinable but not readable)
  • They are blocked entirely from the SSN column

That’s three separate controls: one row access policy, one data policy with SHA256 masking, one policy tag with no masked reader grant. Each is managed and audited independently.