Adrienne Vermorel

BigQuery Cost Optimization: The 80/20 Guide

Shopify once discovered a query that would have cost them $949,000 per month. The fix? Adding clustering on the columns used in WHERE clauses. That single change reduced bytes scanned from 75 GB to 508 MB, a roughly 150x improvement from a configuration change that takes minutes to implement.

This story captures the essence of BigQuery cost optimization: the biggest wins come from a small set of techniques that most teams either skip or misconfigure. Meanwhile, teams spend hours debating storage tiers when 85-90% of their BigQuery bill comes from compute costs.

This guide covers the 20% of optimizations that deliver 80% of savings. We’ll skip the basics (you probably already know what partitioning is). Instead, we’ll focus on the configurations that actually move the needle, the anti-patterns that create $20,000 surprise bills, and the dbt-specific strategies that remain underexplored in most BigQuery guides.


Part 1: Understanding Where Your Money Actually Goes

The BigQuery Cost Breakdown Most Teams Get Wrong

BigQuery costs fall into three buckets: compute (bytes scanned), storage, and streaming ingestion. Here’s what most teams miss: compute typically represents 85-90% of total spend. Yet optimization efforts often start with storage cleanup.

The math makes this clear. At $6.25 per TiB for on-demand queries, scanning 10 TB daily costs $1,875 per month in compute alone. That same 10 TB stored costs $200 per month. A 50% reduction in bytes scanned saves $937 monthly; a 50% reduction in storage saves $100.

The key mental model: BigQuery charges for data read, not data returned. Adding LIMIT 1000 to a query doesn’t reduce costs because BigQuery still scans the entire dataset before applying the limit. Selecting 5 columns instead of 50 from a wide table can cut costs by 90%. The query result size is irrelevant; the scan size determines your bill.

How to Find Your Cost Culprits

Before optimizing anything, identify where money actually goes. This query surfaces your top 10 most expensive queries from the past 30 days:

SELECT
  user_email,
  query,
  total_bytes_processed,
  ROUND(total_bytes_processed / POW(1024, 4), 2) AS tib_processed,
  ROUND(total_bytes_processed / POW(1024, 4) * 6.25, 2) AS estimated_cost_usd,
  creation_time,
  total_slot_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
  AND total_bytes_processed > 0
ORDER BY total_bytes_processed DESC
LIMIT 10;

For cost attribution by user or service account, aggregate the data:

SELECT
  user_email,
  COUNT(*) AS query_count,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4), 2) AS total_tib,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4) * 6.25, 2) AS total_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
GROUP BY user_email
ORDER BY total_cost_usd DESC
LIMIT 20;

Run these queries weekly. The results often reveal that 3-5 queries or users account for 70%+ of compute spend. Focus optimization efforts there first.


Part 2: The Three Levers That Control 90% of Query Costs

Partitioning: Your First Line of Defense

Partitioning divides a table into segments based on a column’s values, allowing BigQuery to skip irrelevant segments entirely. A well-partitioned table with proper query filters typically reduces bytes scanned by 70-90%.

When to partition: Tables exceeding 100 GB with consistent filtering patterns on a specific column. Google’s recommender only suggests partitioning for tables above this threshold because smaller tables don’t benefit enough to justify the metadata overhead.

Partition types:

  • Time-unit partitioning (DAY, MONTH, YEAR): Best for event data, logs, and transactions where you filter on a timestamp
  • Integer-range partitioning: Useful for ID-based filtering when time isn’t relevant
  • Ingestion-time partitioning: Partitions by load time rather than data content; useful when source data lacks a reliable timestamp

The 10,000 partition limit: BigQuery allows maximum 10,000 partitions per table. Daily partitioning covers about 27 years; hourly partitioning exhausts the limit in about 14 months. Choose granularity based on your retention period and query patterns.

Creating a partitioned table with required partition filters:

CREATE TABLE `project.dataset.events`
(
  event_id STRING,
  user_id STRING,
  event_type STRING,
  event_timestamp TIMESTAMP,
  properties JSON
)
PARTITION BY DATE(event_timestamp)
CLUSTER BY user_id, event_type
OPTIONS (
  require_partition_filter = true
);

The require_partition_filter = true option prevents accidental full-table scans. Queries without a filter on the partition column will fail rather than scan everything.
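For illustration, with the events table above in place, the behavior looks like this (a sketch; the date range is arbitrary):

-- Fails: no filter on the partition column
SELECT COUNT(*) FROM `project.dataset.events`;

-- Runs: the range on event_timestamp lets BigQuery prune partitions
SELECT COUNT(*)
FROM `project.dataset.events`
WHERE event_timestamp >= TIMESTAMP('2024-01-01')
  AND event_timestamp < TIMESTAMP('2024-01-08');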

In dbt, configure partitioning in your model:

{{ config(
    materialized='table',
    partition_by={
      "field": "event_timestamp",
      "data_type": "timestamp",
      "granularity": "day"
    },
    require_partition_filter=true
) }}

SELECT
  event_id,
  user_id,
  event_type,
  event_timestamp,
  properties
FROM {{ source('raw', 'events') }}

Clustering: The Compounding Multiplier

Clustering sorts data within partitions (or across the entire table for non-partitioned tables) based on up to four columns. Unlike partitioning, clustering doesn’t create hard boundaries. It organizes data to minimize bytes scanned when filtering on clustered columns.

A table can be both partitioned and clustered, and this combination delivers the best results. Partitioning provides coarse-grained filtering (skip entire date ranges), while clustering provides fine-grained filtering within those partitions (skip blocks that don’t contain relevant user_ids).

Column order matters significantly. BigQuery prioritizes the first clustering column, then the second within blocks of the first, and so on. Place your most frequently filtered column first.

Minimum table size: Clustering benefits tables larger than 64 MB. Smaller tables fit in a single block anyway, making clustering pointless.

The dry run gotcha: Unlike partitioning, clustering doesn’t provide accurate cost estimates in dry runs. A dry run might estimate 50 GB scanned, but the actual query scans 5 GB thanks to clustering. Conservative estimates shouldn’t discourage you from clustering.
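To see what clustering actually saves, compare bytes processed and billed for recent jobs after they run; a quick sketch against the same JOBS view used in Part 1:

-- Dry-run estimates ignore clustering; post-run numbers reflect it
SELECT
  job_id,
  ROUND(total_bytes_processed / POW(1024, 3), 2) AS gb_processed,
  ROUND(total_bytes_billed / POW(1024, 3), 2) AS gb_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY'
ORDER BY creation_time DESC
LIMIT 10;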

CREATE TABLE `project.dataset.user_events`
(
  event_date DATE,
  user_id STRING,
  event_type STRING,
  event_name STRING,
  properties JSON
)
PARTITION BY event_date
CLUSTER BY user_id, event_type;

In dbt, combine partitioning and clustering:

{{ config(
    materialized='incremental',
    partition_by={
      "field": "event_date",
      "data_type": "date"
    },
    cluster_by=["user_id", "event_type"],
    require_partition_filter=true
) }}

Shopify’s famous cost reduction came entirely from adding clustering. Their query filtered on columns that weren’t clustered, forcing BigQuery to scan 75 GB. After adding clustering on those filter columns, scans dropped to 508 MB: a 150x reduction that prevented $949,000 in monthly costs.

Column Selection: The Forgotten Multiplier

BigQuery’s columnar storage means each column is stored separately. Selecting all 50 columns from a table costs 50x more than selecting a single column. This makes SELECT * one of the most expensive patterns in BigQuery.

Consider a 5 TB table with 10 columns of roughly equal size:

Query                Bytes Scanned    Cost
SELECT *             5 TB             $31.25
SELECT col1, col2    1 TB             $6.25
SELECT col1          500 GB           $3.13

The LIMIT myth: Many teams assume LIMIT 1000 reduces costs. BigQuery scans the full dataset first, then applies the limit to the output. This query costs exactly the same whether you limit to 10 rows or 10 million:

-- Both queries scan identical bytes
SELECT * FROM large_table LIMIT 10;
SELECT * FROM large_table LIMIT 10000000;

Practical alternatives to SELECT *:

  1. Preview tab: The BigQuery console’s preview tab displays sample data without any query costs (completely free)
  2. TABLESAMPLE: For statistical sampling, SELECT * FROM table TABLESAMPLE SYSTEM (1 PERCENT) scans only 1% of data
  3. Explicit column lists: Always specify columns in production queries

In dbt models, avoid the lazy pattern of selecting everything from base models:

-- Expensive: scans all columns from base model
SELECT * FROM {{ ref('base__analytics__events') }}

-- Better: explicit columns, only what's needed
SELECT
  event_id,
  user_id,
  event_timestamp,
  event_type
FROM {{ ref('base__analytics__events') }}

Part 3: Storage Costs (The Changes Most Guides Miss)

Physical vs. Logical Billing

In July 2023, Google introduced physical bytes storage billing as an alternative to the default logical billing. This change remains underexplored in most optimization guides but can deliver 30-50% storage savings.

Logical billing (default): You pay for uncompressed data size at $0.02/GB per month. Time travel and fail-safe storage are included free.

Physical billing: You pay for compressed data at $0.04/GB per month. Time travel (7 days) and fail-safe (7 days) storage are billed separately.

Physical billing wins when your compression ratio exceeds 2:1. Since physical costs 2x per GB but data typically compresses 6-17x, most organizations save significantly by switching.

Query your actual compression ratio before deciding:

SELECT
  table_schema,
  table_name,
  total_logical_bytes / POW(1024, 3) AS logical_gb,
  total_physical_bytes / POW(1024, 3) AS physical_gb,
  ROUND(total_logical_bytes / total_physical_bytes, 2) AS compression_ratio,
  -- If ratio > 2, physical billing saves money
  CASE
    WHEN total_logical_bytes / total_physical_bytes > 2
      THEN 'Switch to physical'
    ELSE 'Keep logical'
  END AS recommendation
FROM `project.region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
WHERE total_logical_bytes > 0
ORDER BY total_logical_bytes DESC;
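If the ratio clears 2:1, the switch itself is a dataset-level option; a minimal sketch (the dataset name is a placeholder, and the setting applies per dataset, not per table):

-- Switch one dataset to physical (compressed) storage billing
ALTER SCHEMA `project.analytics`
SET OPTIONS (
  storage_billing_model = 'PHYSICAL'
);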

Trade-off to consider: Under physical billing, time travel and fail-safe storage appear as separate line items. For tables with frequent updates (which generate time travel overhead), this can offset compression savings.
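TABLE_STORAGE exposes those components separately, so you can check whether time travel overhead erodes the gain before switching; a sketch using the physical-bytes columns:

SELECT
  table_schema,
  table_name,
  ROUND(active_physical_bytes / POW(1024, 3), 2) AS active_physical_gb,
  ROUND(time_travel_physical_bytes / POW(1024, 3), 2) AS time_travel_gb,
  ROUND(fail_safe_physical_bytes / POW(1024, 3), 2) AS fail_safe_gb
FROM `project.region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
ORDER BY time_travel_physical_bytes DESC
LIMIT 20;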

Long-Term Storage: Free 50% Discount After 90 Days

BigQuery automatically transitions data to long-term storage after 90 consecutive days without modification. The price drops from $0.02/GB to $0.01/GB: a 50% discount requiring zero configuration.

What resets the 90-day timer:

  • Any DML operation (INSERT, UPDATE, DELETE, MERGE)
  • Streaming inserts
  • CREATE OR REPLACE TABLE

What does NOT reset the timer:

  • Querying the table
  • Creating views on the table
  • Exporting data
  • Copying the table

For partitioned tables, each partition evaluates independently. A table with 365 daily partitions will have ~275 partitions at long-term pricing and ~90 at active pricing, assuming only recent data receives writes.
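To see how much of your storage already sits at the discounted rate, TABLE_STORAGE splits logical bytes into active and long-term buckets; a quick check that requires no configuration:

SELECT
  table_schema,
  table_name,
  ROUND(active_logical_bytes / POW(1024, 3), 2) AS active_gb,
  ROUND(long_term_logical_bytes / POW(1024, 3), 2) AS long_term_gb,
  ROUND(SAFE_DIVIDE(long_term_logical_bytes, total_logical_bytes) * 100, 1) AS pct_long_term
FROM `project.region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
ORDER BY total_logical_bytes DESC
LIMIT 20;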

Table Expiration: Preventing Storage Creep

Temporary tables, ETL artifacts, and development experiments accumulate silently. One startup discovered they were paying $3,000 monthly for temporary tables nobody queried in months.

Set dataset-level defaults:

ALTER SCHEMA `project.staging`
SET OPTIONS (
  default_table_expiration_days = 7
);

Override at table level when needed:

CREATE TABLE `project.staging.temp_analysis`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
) AS
SELECT * FROM source_table WHERE condition;

In dbt, use the hours_to_expiration config for temporary models:

{{ config(
    materialized='table',
    hours_to_expiration=168  -- 7 days
) }}

Part 4: When to Switch from On-Demand to Slots

The Break-Even Math

BigQuery’s Editions pricing (introduced July 2023) replaced legacy flat-rate with three tiers:

Edition            Cost per Slot-Hour    Best For
Standard           $0.04                 Dev/test, ad-hoc
Enterprise         $0.06                 Production, ML workloads
Enterprise Plus    $0.10                 Regulated industries, DR

100 slots running continuously on Standard Edition PAYG costs $2,920 monthly. At $6.25 per TiB on-demand, that’s equivalent to processing 467 TB. If you process more than 400-500 TB monthly with consistent patterns, slots likely save money.

But the calculation shifts dramatically for burst workloads. Processing 20 TB in under an hour:

  • On-demand: $125 (20 × $6.25)
  • 100 slots for 1 hour (Standard): ~$4 (100 slot-hours × $0.04)

That’s roughly 97% savings for burst scenarios, making slots attractive even at lower monthly volumes if workloads concentrate into narrow windows.
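To see where you fall, total monthly bytes processed from the JOBS view (which retains about 180 days of history) and compare against the break-even range:

SELECT
  FORMAT_TIMESTAMP('%Y-%m', creation_time) AS month,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4), 1) AS tib_processed,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4) * 6.25, 0) AS on_demand_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
GROUP BY month
ORDER BY month;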

Autoscaling Strategy

Autoscaling became generally available in February 2025, enabling dynamic slot allocation that scales from zero to your configured maximum.

Key configuration decisions:

Baseline slots: Set to zero for workloads with idle periods. Slots scale up from zero instantly, with no warmup penalty, so you only pay for what you use.

Maximum slots: Your ceiling during peak demand. Start conservative and increase based on observed wait times.

Target utilization: Aim for 60-80% during peak hours. Consistently higher indicates under-provisioning (queries queuing); consistently lower suggests over-provisioning (paying for unused capacity).
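A rough utilization check from the same JOBS view: summing total_slot_ms per hour and dividing by the milliseconds in an hour approximates average concurrent slots (approximate, because a job's slot time is attributed to its creation hour):

SELECT
  TIMESTAMP_TRUNC(creation_time, HOUR) AS hour,
  ROUND(SUM(total_slot_ms) / (1000 * 60 * 60), 1) AS avg_slots
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
GROUP BY hour
ORDER BY hour;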

-- Create a reservation with autoscaling
CREATE RESERVATION `project.region-us.prod_reservation`
OPTIONS (
  slot_capacity = 0,           -- baseline
  edition = 'ENTERPRISE',
  autoscale_max_slots = 500    -- autoscaling ceiling
);

-- Create an assignment for your project
CREATE ASSIGNMENT `project.region-us.prod_reservation.prod_assignment`
OPTIONS (
  assignee = 'projects/my-project',
  job_type = 'QUERY'
);

Committed Use Discounts (New in 2025)

Google Cloud Next 2025 introduced spend-based committed use discounts for BigQuery:

  • 1-year commitment: 10% discount over PAYG
  • 3-year commitment: 20% discount over PAYG

Unlike slot-based commitments, these are dollar-denominated and flexible across BigQuery Editions, Cloud Composer 3, and Data Governance products. This simplifies capacity planning: commit to a monthly spend level rather than guessing slot counts.

For predictable workloads, combine commitment for baseline capacity with autoscaling for bursts. You capture savings on predictable usage while maintaining flexibility for spikes.
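A sketch of that layering, reusing the reservation DDL shape from above (option names and numbers are illustrative):

-- Paid baseline for steady load, autoscaling headroom for bursts
CREATE RESERVATION `project.region-us.baseline_reservation`
OPTIONS (
  slot_capacity = 100,         -- always-on baseline covered by the spend commitment
  edition = 'ENTERPRISE',
  autoscale_max_slots = 400    -- extra capacity that only bills when bursts use it
);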

Edition Feature Comparison

Beyond price, editions differ in capabilities:

Standard Edition:

  • Maximum 1,600 slots
  • No BigQuery ML
  • No materialized views with automatic refresh
  • Best for: Development, ad-hoc analysis, cost-sensitive workloads

Enterprise Edition:

  • Unlimited slots (with appropriate quota)
  • BigQuery ML included
  • BI Engine acceleration
  • Full-text search
  • 99.99% SLA
  • Best for: Production workloads, ML pipelines

Enterprise Plus:

  • Everything in Enterprise
  • Cross-region disaster recovery
  • Compliance certifications (FedRAMP, CJIS, IL4)
  • Best for: Regulated industries, critical workloads

Important note: On-demand pricing maintains feature parity with Enterprise Plus (except continuous queries). If you need compliance features but have unpredictable usage, on-demand may be more cost-effective than Enterprise Plus slots.


Part 5: dbt-Specific Optimizations

Most guides treat dbt and BigQuery cost optimization separately. Yet real customer results demonstrate the impact: Bilt Rewards cut $20,000 monthly, Enpal reduced data spend by 70%, and Symend decreased daily warehouse usage by 70% through proper dbt configurations.

Materialization Impact on Costs

Your materialization choice determines baseline costs for every model:

Table: One-time full scan during build, then storage costs. Downstream queries read the precomputed result rather than re-running the upstream transformation.

View: Zero build cost, zero storage, but full upstream scan on every query. This becomes a cost trap for complex transformations or models queried frequently.

Ephemeral: No storage; the model is interpolated as a CTE wherever it’s referenced. A CTE referenced multiple times re-executes at each reference, potentially doubling costs. Ephemeral models are also difficult to debug since they don’t exist as queryable objects.

The view chain anti-pattern: dbt_project_evaluator flags chains of 4+ views as problematic. Nothing computes until a query reaches a table, so every downstream query triggers cascading execution of all upstream view logic. Five chained views don’t incur a one-time 5x build cost; the whole chain re-executes on every single downstream query.

Given that storage costs are negligible compared to compute (Part 1), defaulting to tables makes sense:

# dbt_project.yml
models:
  my_project:
    base:
      +materialized: table
    intermediate:
      +materialized: table
    marts:
      +materialized: table
      +partition_by:
        field: created_date
        data_type: date

Views can work for truly simple transformations that are rarely queried, but tables eliminate the risk of cascading compute costs.

Incremental Models: The Biggest Win

Full table refreshes scan and rewrite entire tables on every run. For a 500 GB table running daily, that’s $3.13 per run, $94 monthly, just for the rebuild before any queries.

Incremental models process only new or changed data. The savings scale with table size and run frequency.

The default strategy trap: BigQuery’s default incremental strategy is merge, which scans the entire destination table on every run to identify matching rows. For large tables, this defeats the purpose.

Switch to insert_overwrite for partitioned tables:

{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={
      "field": "event_date",
      "data_type": "date"
    },
    cluster_by=["user_id", "event_type"]
) }}

SELECT
  event_date,
  user_id,
  event_type,
  COUNT(*) AS event_count
FROM {{ source('raw', 'events') }}
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
GROUP BY 1, 2, 3

When using insert_overwrite, set copy_partitions: true inside the partition_by config so dbt replaces partitions with BigQuery’s Copy Table API instead of INSERT statements:

{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={
      "field": "event_date",
      "data_type": "date",
      "copy_partitions": true
    },
    partitions=["date_sub(current_date, interval 1 day)", "current_date"]
) }}

The is_incremental() trap: The common dbt pattern for incremental models still causes full source table scans:

-- This pattern scans the entire source table!
{% if is_incremental() %}
WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}

The subquery SELECT MAX(event_timestamp) FROM {{ this }} doesn’t help BigQuery prune partitions in the source table. Use static partition values or the _dbt_max_partition variable instead:

{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={"field": "event_date", "data_type": "date"}
) }}

SELECT *
FROM {{ source('raw', 'events') }}
{% if is_incremental() %}
  -- Static partition reference enables pruning
  WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
{% endif %}

Or use _dbt_max_partition, the BigQuery scripting variable that dbt-bigquery declares for insert_overwrite incremental runs (it holds the destination table’s current maximum partition value, typed like the partition column):

{% if is_incremental() %}
  WHERE event_date >= _dbt_max_partition
{% endif %}

Cost Attribution with dbt

Tracking costs per model requires tagging queries with metadata. Configure dbt to append a query comment that identifies the model:

# dbt_project.yml
query-comment:
  comment: "dbt: {{ node.unique_id }}"
  append: true

# Or more detailed:
query-comment:
  comment: |
    {
      "dbt_model": "{{ node.unique_id }}",
      "dbt_invocation_id": "{{ invocation_id }}",
      "dbt_target": "{{ target.name }}"
    }
  append: true

Then query costs by model:

SELECT
  JSON_EXTRACT_SCALAR(query, '$.dbt_model') AS dbt_model,
  COUNT(*) AS runs,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4), 4) AS total_tib,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4) * 6.25, 2) AS total_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND query LIKE '%dbt_model%'
GROUP BY 1
ORDER BY total_cost_usd DESC;

For comprehensive monitoring, the dbt-bigquery-monitoring community package automates this pattern across projects and regions.


Part 6: The Expensive Mistakes

Community forums reveal patterns in expensive BigQuery mistakes that differ from what optimization guides typically cover. A Hacker News thread on BigQuery costs surfaced practitioners sharing stories of $8,000-$20,000 single-month surprises.

The LIMIT Clause Myth

This misconception causes the most unexpected bills. Testing confirms: SELECT * FROM table LIMIT 100 scans the entire table. LIMIT only affects output size, not bytes processed.

Google’s own example: if 10 users run SELECT * queries 10 times monthly on a 10 TB table, compute costs jump from $200 (storage only) to over $5,000 monthly.

Use the preview tab (completely free), enforce partition filters, or select only needed columns. Never use LIMIT as a cost control mechanism.

CTE Double-Counting

BigQuery does not materialize non-recursive CTEs. Each reference re-executes the underlying scan. A CTE referenced twice costs twice as much.

-- This scans the base table TWICE
WITH user_aggregates AS (
  SELECT user_id, complex_aggregation
  FROM base__analytics__user_events
  GROUP BY user_id
)
SELECT a.user_id, a.complex_aggregation, b.complex_aggregation
FROM user_aggregates a
JOIN user_aggregates b ON a.user_id = b.user_id;  -- Second scan!

Materialize expensive CTEs to temporary tables:

-- Materialize to temp table (scans once)
CREATE TEMP TABLE user_aggregates AS
SELECT user_id, complex_aggregation
FROM base__analytics__user_events
GROUP BY user_id;
-- Now both references read from temp table (cheap)
SELECT a.user_id, a.complex_aggregation, b.complex_aggregation
FROM user_aggregates a
JOIN user_aggregates b ON a.user_id = b.user_id;

Streaming vs. Batch Loading

Streaming inserts cost $0.01 per 200 MB with a 1 KB minimum per row. Batch loading using BigQuery’s shared slot pool is completely free.

For a table receiving 10 GB of data daily:

  • Streaming: $0.50/day, $15/month
  • Batch loading: $0/month

One practitioner documented cutting 95% of BigQuery costs by switching from streaming to file loads with micro-batching.

When streaming is justified:

  • True real-time requirements (sub-minute latency)
  • Data that loses value after minutes, not hours
  • Systems where batch orchestration adds unacceptable complexity

When to use batch instead:

  • Latency tolerance of 15+ minutes
  • Large volume ingestion (streaming costs scale linearly)
  • Data warehouse refresh patterns (hourly, daily)
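A minimal batch-load sketch using the LOAD DATA statement (table name, bucket path, and file format are placeholders):

-- Batch load from Cloud Storage: uses the free shared slot pool, no streaming charges
LOAD DATA INTO `project.dataset.events`
FROM FILES (
  format = 'PARQUET',
  uris = ['gs://my-bucket/events/dt=2024-01-15/*.parquet']
);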

Missing Partition Filters

The same query on the same table can cost 17x more depending on which column you filter:

-- Scans only relevant partitions: 884 MB
SELECT * FROM events
WHERE event_date = '2024-01-15';
-- Scans entire table: 15 GB (17x more expensive!)
SELECT * FROM events
WHERE DATE(event_timestamp) = '2024-01-15';

The second query applies a function to the partition column, preventing BigQuery from pruning partitions.

Common filter patterns that break pruning:

-- BAD: Function on partition column
WHERE DATE(created_timestamp) = '2024-01-15'
WHERE EXTRACT(YEAR FROM event_date) = 2024
-- GOOD: Direct comparison
WHERE created_timestamp BETWEEN '2024-01-15' AND '2024-01-16'
WHERE event_date = '2024-01-15'
-- BAD: OR with non-partition column
WHERE event_date = '2024-01-15' OR user_id = 'abc'
-- GOOD: Separate queries or restructure logic
WHERE event_date = '2024-01-15'

Self-Joins

Google’s documentation explicitly warns: self-joins “can square the number of output rows,” making costs explode unpredictably.

-- Self-join: potentially scans table twice, output rows can explode
SELECT
  a.user_id,
  a.event_timestamp AS first_event,
  b.event_timestamp AS next_event
FROM events a
JOIN events b
  ON a.user_id = b.user_id
  AND b.event_timestamp > a.event_timestamp;

Replace with window functions:

-- Window function: single scan, predictable output
SELECT
  user_id,
  event_timestamp AS first_event,
  LEAD(event_timestamp) OVER (
    PARTITION BY user_id
    ORDER BY event_timestamp
  ) AS next_event
FROM events;

Window functions (LEAD, LAG, ROW_NUMBER, RANK) accomplish most self-join use cases with a single table scan.


Part 7: Governance and Guardrails

For organizations with ad-hoc analytics users, governance controls may deliver more savings than technical optimizations. A single analyst running unfiltered SELECT * queries can generate bills exceeding months of engineering effort on partitioning.

Query-Level Controls

Maximum bytes billed sets a hard cap per query: a query that would exceed the limit fails before scanning (and billing) anything. It isn’t part of the SQL text; you set it per job via the bq CLI, the API’s maximumBytesBilled field, the console’s query settings, or dbt:

# Query fails if it would scan more than 10 GB (10737418240 bytes)
bq query \
  --use_legacy_sql=false \
  --maximum_bytes_billed=10737418240 \
  'SELECT event_type, COUNT(*) FROM `project.dataset.events` GROUP BY event_type'

In dbt, configure in your profile:

# profiles.yml
my_project:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: oauth
      project: my-project
      dataset: production
      maximum_bytes_billed: 107374182400  # 100 GB

For dbt runs, this prevents any single model from scanning more than the specified limit. Useful for catching misconfigured incremental models that accidentally run full refreshes.

Project-Level Quotas

As of September 2025, new BigQuery projects default to a 200 TiB daily query-usage quota. For existing projects, configure custom quotas through the Cloud Console quota settings or the Service Usage API.

Consider implementing:

  • Per-user daily limits: Prevent individual analysts from monopolizing budget
  • Per-project limits: Cap spend on development vs. production projects
  • Alert thresholds: Notify before hitting hard limits

Access Patterns That Prevent Expensive Mistakes

Service accounts per integration: Create separate service accounts for dbt, Looker, Fivetran, etc. Query INFORMATION_SCHEMA by service account email to attribute costs per system.
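For example, a sketch that isolates service-account spend (BigQuery reports service accounts in user_email, so a suffix filter is enough):

SELECT
  user_email,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4) * 6.25, 2) AS cost_usd_30d
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND user_email LIKE '%gserviceaccount.com'
GROUP BY user_email
ORDER BY cost_usd_30d DESC;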

Authorized views for SELECT * protection: Instead of granting direct table access, create views that expose only necessary columns:

CREATE VIEW `project.views.user_summary` AS
SELECT user_id, signup_date, user_type
FROM `project.raw.users`;

-- Grant view access, not table access
GRANT `roles/bigquery.dataViewer`
ON VIEW `project.views.user_summary`
TO 'user:analyst@company.com';

Reservation assignments for capacity isolation: Assign different teams to separate reservations. This prevents one team’s heavy workload from consuming another team’s slots.


Part 8: Monitoring and Continuous Optimization

The Queries You Should Run Weekly

Top 10 expensive queries this week:

SELECT
  user_email,
  SUBSTR(query, 1, 200) AS query_preview,
  ROUND(total_bytes_processed / POW(1024, 4), 4) AS tib_scanned,
  ROUND(total_bytes_processed / POW(1024, 4) * 6.25, 2) AS cost_usd,
  creation_time
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND total_bytes_processed > 0
ORDER BY total_bytes_processed DESC
LIMIT 10;

Cost trend by dataset:

SELECT
  DATE(creation_time) AS date,
  referenced_table.dataset_id,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4), 2) AS tib_scanned,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4) * 6.25, 2) AS cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
  UNNEST(referenced_tables) AS referenced_table
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
GROUP BY 1, 2
ORDER BY date DESC, cost_usd DESC;

Tables missing partitioning or clustering (optimization candidates):

SELECT
  s.table_schema,
  s.table_name,
  ROUND(s.total_logical_bytes / POW(1024, 3), 2) AS size_gb,
  CASE WHEN COUNTIF(c.is_partitioning_column = 'YES') > 0 THEN 'OK' ELSE 'Missing' END AS partitioning,
  CASE WHEN COUNTIF(c.clustering_ordinal_position IS NOT NULL) > 0 THEN 'OK' ELSE 'Missing' END AS clustering
FROM `project.region-us`.INFORMATION_SCHEMA.TABLE_STORAGE AS s
LEFT JOIN `project.region-us`.INFORMATION_SCHEMA.COLUMNS AS c
  ON s.table_schema = c.table_schema
  AND s.table_name = c.table_name
WHERE s.total_logical_bytes > 107374182400  -- > 100 GB
GROUP BY s.table_schema, s.table_name, s.total_logical_bytes
ORDER BY s.total_logical_bytes DESC;

Leveraging Google’s Built-In Recommendations

BigQuery’s recommender automatically suggests partitioning and clustering for tables that would benefit. Access recommendations through:

  • Cloud Console → BigQuery → Table details → Recommendations tab
  • INFORMATION_SCHEMA.TABLE_OPTIONS for programmatic access
  • Recommender API for automated processing

The recommender analyzes query patterns and suggests columns for partitioning/clustering based on actual filter usage, not guesses. Google’s cost optimization best practices documentation covers additional techniques.

Building a Cost Dashboard

Key metrics to track:

  1. Daily compute spend: Catch anomalies before they compound
  2. Bytes scanned trend: Leading indicator of cost changes
  3. Slot utilization (if using reservations): Identify under/over provisioning
  4. Cost per query average: Track optimization progress over time
  5. Top cost contributors: Users, service accounts, datasets, models

Set alert thresholds at:

  • 50% of daily budget: Warning notification
  • 80% of daily budget: Critical notification to team lead
  • 100%: Auto-triggered investigation process
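To drive those alerts, a scheduled-query sketch like this can feed a notification (the $500 daily budget is a placeholder):

-- Estimated on-demand spend so far today, flagged against a daily budget
SELECT
  ROUND(SUM(total_bytes_processed) / POW(1024, 4) * 6.25, 2) AS cost_usd_today,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4) * 6.25, 2) > 500 AS over_budget
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP(CURRENT_DATE())
  AND job_type = 'QUERY';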

Conclusion: The Optimization Hierarchy

A clear hierarchy should guide your optimization priorities:

Priority 1: Query Patterns (80%+ impact)

  • Column selection: Stop using SELECT *
  • Partition filters: Always filter on partition columns
  • Clustering: Add to tables over 64 MB that you filter frequently

These changes take minutes to implement and capture the majority of savings.

Priority 2: dbt Configurations (70% savings potential)

  • Switch to incremental models for large, append-heavy tables
  • Use insert_overwrite strategy with copy_partitions: true
  • Fix is_incremental() patterns that cause full scans

Priority 3: Governance (prevents surprise bills)

  • Set max_bytes_billed on all projects
  • Implement daily quotas for ad-hoc users
  • Create authorized views instead of direct table access

Priority 4: Pricing Model (20-40% savings)

  • Evaluate slots if processing > 400 TB monthly or burst-heavy workloads
  • Configure autoscaling with zero baseline for variable workloads
  • Layer committed use discounts for predictable baseline

Priority 5: Storage (10-30% savings)

  • Evaluate physical vs. logical billing using compression ratio query
  • Set expiration policies on temporary tables
  • Let long-term storage discounts work automatically

Start Here

Run the INFORMATION_SCHEMA query from Part 1 to find your top 5 most expensive queries. For each one, check:

  1. Does it use SELECT * when it doesn’t need all columns?
  2. Does it filter on the partition column?
  3. Is the table clustered on the columns used in WHERE clauses?

Fixing those three issues on your top 5 queries likely captures most of your available savings. Everything else is optimization at the margin.


Appendix: Quick Reference

Pricing Cheat Sheet (as of 2026)

Item                           Cost
On-demand query                $6.25/TiB
Active storage (logical)       $0.02/GB/month
Long-term storage (logical)    $0.01/GB/month
Physical storage               $0.04/GB/month
Streaming inserts              $0.01/200 MB
Standard Edition slots         $0.04/slot-hour
Enterprise Edition slots       $0.06/slot-hour
Enterprise Plus slots          $0.10/slot-hour
BI Engine                      $0.0416/GB/hour

Key Thresholds

Decision                         Threshold
Consider partitioning            Table > 100 GB
Consider clustering              Table > 64 MB
Consider slots over on-demand    > 400-500 TB/month or burst workloads
Physical billing wins            Compression ratio > 2:1
Long-term storage kicks in       90 days without modification

dbt Config Quick Reference

# Optimized incremental model for BigQuery
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={
      "field": "event_date",
      "data_type": "date",
      "granularity": "day",
      "copy_partitions": true
    },
    cluster_by=["user_id", "event_type"],
    require_partition_filter=true
) }}