
BigQuery Cost Attribution with INFORMATION_SCHEMA

Using INFORMATION_SCHEMA queries to find expensive queries, attribute costs by user and dataset, identify unoptimized tables, and build a weekly cost review practice.

Planted
bigquery · gcp · cost optimization · data engineering

BigQuery’s INFORMATION_SCHEMA views expose query costs, user spend, table partitioning status, and cost trends over time. This note covers the key queries for cost attribution and table optimization review.

Finding Your Cost Culprits

Before optimizing anything, identify where the money actually goes. This query surfaces your top 10 most expensive queries from the past 30 days (cost estimates assume the $6.25/TiB US on-demand rate; substitute your region's price):

SELECT
  user_email,
  query,
  total_bytes_processed,
  ROUND(total_bytes_processed / POW(1024, 4), 2) AS tib_processed,
  ROUND(total_bytes_processed / POW(1024, 4) * 6.25, 2) AS estimated_cost_usd,
  creation_time,
  total_slot_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
  AND total_bytes_processed > 0
ORDER BY total_bytes_processed DESC
LIMIT 10;

In most projects, 3-5 queries or users account for 70%+ of compute spend. Optimize these before addressing the long tail. (For exact billing, total_bytes_billed is the more precise column: it reflects the 10 MB per-query minimum and zero-cost cache hits. total_bytes_processed is close enough for estimation.)

Cost Attribution by User

For ongoing cost management, aggregate by user or service account:

SELECT
  user_email,
  COUNT(*) AS query_count,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4), 2) AS total_tib,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4) * 6.25, 2) AS total_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
GROUP BY user_email
ORDER BY total_cost_usd DESC
LIMIT 20;

This immediately tells you which class of principal drives costs:

  • dbt service accounts: pipeline costs, optimizable through materializations and incrementals
  • BI tool service accounts: dashboard costs, optimizable through partitioning and materialized views
  • Human users: ad-hoc costs, addressable through governance controls
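
To make the split explicit, here is a minimal sketch that buckets principals by email pattern. The 'dbt' and 'looker' substrings are hypothetical; substitute your own service account naming conventions:

SELECT
  CASE
    WHEN user_email LIKE '%dbt%' THEN 'dbt pipelines'        -- hypothetical naming
    WHEN user_email LIKE '%looker%' THEN 'BI tool'           -- hypothetical naming
    WHEN user_email LIKE '%gserviceaccount.com' THEN 'other service account'
    ELSE 'human'
  END AS principal_type,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4) * 6.25, 2) AS cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
GROUP BY principal_type
ORDER BY cost_usd DESC;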

Cost Attribution by dbt Model

If your dbt project is configured with query comments and job labels, you can attribute costs to individual models:

SELECT
  -- dbt's query-comment prepends a JSON blob inside a /* ... */ comment;
  -- extract that blob before parsing. Assumes a custom query-comment
  -- configured to include a dbt_model key.
  JSON_EXTRACT_SCALAR(
    REGEXP_EXTRACT(query, r'/\*\s*(\{.*?\})\s*\*/'),
    '$.dbt_model'
  ) AS dbt_model,
  COUNT(*) AS runs,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4), 4) AS total_tib,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4) * 6.25, 2) AS total_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
  AND query LIKE '%dbt_model%'
GROUP BY 1
ORDER BY total_cost_usd DESC;

This reveals which models consume the most resources.

For more comprehensive dbt cost monitoring, the dbt-bigquery-monitoring community package automates this pattern across projects and regions.
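
If dbt is also set to emit its query-comment keys as job labels (job-label: true under query-comment in dbt_project.yml), the labels array avoids string parsing entirely. A sketch, assuming the same custom dbt_model key:

SELECT
  dbt_model,
  COUNT(*) AS runs,
  ROUND(SUM(bytes) / POW(1024, 4) * 6.25, 2) AS total_cost_usd
FROM (
  SELECT
    -- Pull the dbt_model job label, if present.
    (SELECT l.value FROM UNNEST(labels) AS l WHERE l.key = 'dbt_model') AS dbt_model,
    total_bytes_processed AS bytes
  FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  WHERE
    creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
    AND job_type = 'QUERY'
    AND state = 'DONE'
)
WHERE dbt_model IS NOT NULL
GROUP BY dbt_model
ORDER BY total_cost_usd DESC;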

Cost Trend by Dataset

Track how costs evolve over time to catch regressions early:

SELECT
  DATE(creation_time) AS date,
  referenced_table.dataset_id,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4), 2) AS tib_scanned,
  ROUND(SUM(total_bytes_processed) / POW(1024, 4) * 6.25, 2) AS cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
  UNNEST(referenced_tables) AS referenced_table
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
GROUP BY 1, 2
ORDER BY date DESC, cost_usd DESC;

A sudden spike in one dataset’s cost often means someone deployed a new model without partitioning, changed a materialization from incremental to full-refresh, or added an expensive new query. Catching this within a day is the difference between a $50 mistake and a $1,500 month.
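
One caveat: bytes are attributed to every table a query references, so a query spanning datasets is counted once per dataset; read the totals as relative signal rather than an exact bill. To catch spikes without eyeballing the trend, here is a sketch that flags datasets whose spend today is more than double their trailing average (the 2x threshold is an arbitrary starting point):

WITH daily AS (
  SELECT
    DATE(creation_time) AS date,
    rt.dataset_id,
    SUM(total_bytes_processed) / POW(1024, 4) * 6.25 AS cost_usd
  FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
    UNNEST(referenced_tables) AS rt
  WHERE
    creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY)
    AND job_type = 'QUERY'
    AND state = 'DONE'
  GROUP BY 1, 2
)
SELECT
  dataset_id,
  ROUND(today.cost_usd, 2) AS today_usd,
  ROUND(baseline.avg_usd, 2) AS trailing_avg_usd
FROM daily AS today
JOIN (
  -- Trailing average over the preceding days.
  SELECT dataset_id, AVG(cost_usd) AS avg_usd
  FROM daily
  WHERE date < CURRENT_DATE()
  GROUP BY dataset_id
) AS baseline USING (dataset_id)
WHERE
  today.date = CURRENT_DATE()
  AND today.cost_usd > 2 * baseline.avg_usd;  -- arbitrary 2x threshold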

Finding Unoptimized Tables

Large tables without partitioning or clustering are optimization candidates hiding in plain sight:

-- PARTITIONS and COLUMNS are dataset-scoped views, so run this once per
-- dataset; `project.dataset` is a placeholder for the dataset under review.
SELECT
  c.table_schema,
  c.table_name,
  ROUND(s.total_logical_bytes / POW(1024, 3), 2) AS size_gb,
  IF(LOGICAL_OR(c.is_partitioning_column = 'YES'), 'OK', 'Missing') AS partitioning,
  IF(LOGICAL_OR(c.clustering_ordinal_position IS NOT NULL), 'OK', 'Missing') AS clustering
FROM `project.dataset`.INFORMATION_SCHEMA.COLUMNS AS c
JOIN `project.region-us`.INFORMATION_SCHEMA.TABLE_STORAGE AS s
  ON s.table_schema = c.table_schema AND s.table_name = c.table_name
WHERE s.total_logical_bytes > 107374182400 -- > 100 GB
GROUP BY c.table_schema, c.table_name, size_gb
ORDER BY size_gb DESC;

Any table over 100 GB showing “Missing” for partitioning is likely costing you significant money on every query. Cross-reference with the top expensive queries — if an expensive query hits an unpartitioned table, you’ve found a quick win.
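
Once a suspect table surfaces, referenced_tables makes that cross-reference mechanical. A sketch, with my_dataset.big_table as a placeholder for the unpartitioned table:

SELECT
  user_email,
  ROUND(total_bytes_processed / POW(1024, 4) * 6.25, 2) AS estimated_cost_usd,
  query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
  UNNEST(referenced_tables) AS rt
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
  AND rt.dataset_id = 'my_dataset' -- placeholder
  AND rt.table_id = 'big_table'    -- placeholder
ORDER BY total_bytes_processed DESC
LIMIT 10;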

Using Google’s Built-In Recommendations

BigQuery’s recommender automatically suggests partitioning and clustering for tables that would benefit. It analyzes query patterns and suggests columns based on actual filter usage, not guesses. Access recommendations through:

  • Cloud Console: BigQuery > Table details > Recommendations tab
  • INFORMATION_SCHEMA.RECOMMENDATIONS for programmatic access via SQL
  • Recommender API for automated processing

The recommender only flags tables above 100 GB for partitioning, which aligns with Google’s own threshold for when partitioning overhead is justified.
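
For the programmatic route, recommendations are queryable in SQL. A hedged sketch against the INFORMATION_SCHEMA.RECOMMENDATIONS view; verify the view's availability in your region and its column names against current docs:

-- Assumes the region-scoped RECOMMENDATIONS view and the
-- partition/cluster recommender ID; check both against current docs.
SELECT
  recommender,
  target_resources,
  last_updated_time
FROM `region-us`.INFORMATION_SCHEMA.RECOMMENDATIONS
WHERE recommender = 'google.bigquery.table.PartitionClusterRecommender'
ORDER BY last_updated_time DESC;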

Building a Weekly Review Practice

The queries above are most valuable when run regularly. A weekly cost review takes 15 minutes and catches problems before they compound into surprise bills.

Weekly checklist:

  1. Top 10 expensive queries: Any new additions? Any that could be optimized with better filters?
  2. Cost by user/service account: Any unexpected spikes? New service accounts appearing?
  3. Cost trend by dataset: Any datasets trending upward? Any regressions from recent deployments?
  4. Unoptimized tables: Any new large tables created without partitioning?

Key metrics for a cost dashboard (the first two are sketched after the list):

  1. Daily compute spend: Catch anomalies before they compound
  2. Bytes scanned trend: Leading indicator of cost changes
  3. Slot utilization (if using reservations): Identify under/over provisioning
  4. Cost per query average: Track optimization progress over time
  5. Top cost contributors: Users, service accounts, datasets, models
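
The first two metrics fall out of one query. A minimal sketch, using total_bytes_billed since that is what on-demand pricing actually charges:

SELECT
  DATE(creation_time) AS date,
  ROUND(SUM(total_bytes_billed) / POW(1024, 4), 2) AS tib_billed,
  ROUND(SUM(total_bytes_billed) / POW(1024, 4) * 6.25, 2) AS spend_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
GROUP BY date
ORDER BY date DESC;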

Alert thresholds (the underlying check is sketched after the list):

  • 50% of daily budget: Warning notification
  • 80% of daily budget: Critical notification to team lead
  • 100%: Auto-triggered investigation process
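
A minimal sketch of the spend-vs-budget check behind these alerts, assuming a hypothetical $200 daily on-demand budget; wire the output into whatever notification channel you use:

-- Hypothetical daily budget; replace with your own figure.
DECLARE daily_budget_usd FLOAT64 DEFAULT 200.0;

SELECT
  ROUND(SUM(total_bytes_billed) / POW(1024, 4) * 6.25, 2) AS spend_usd,
  ROUND(SAFE_DIVIDE(
    SUM(total_bytes_billed) / POW(1024, 4) * 6.25,
    daily_budget_usd) * 100, 1) AS pct_of_budget
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  DATE(creation_time) = CURRENT_DATE()
  AND job_type = 'QUERY'
  AND state = 'DONE';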

A weekly review catches cost regressions close to deployment, when the cause is easier to identify and fix.