
BigQuery Slot Usage Monitoring

How to monitor BigQuery slot usage with INFORMATION_SCHEMA, the Slot Estimator, and Cloud Monitoring: practical queries and tools for capacity planning.

Tags: bigquery, cost optimization, data engineering

Slot allocation decisions require understanding current usage. BigQuery provides observability through INFORMATION_SCHEMA views, a built-in Slot Estimator, and Cloud Monitoring integration.

Key INFORMATION_SCHEMA Tables

These are your primary tools for understanding slot consumption:

Table                            What It Shows
JOBS_BY_PROJECT                  Per-job metrics: slots, bytes, runtime
JOBS_BY_ORGANIZATION             Same, but across all projects (requires org permissions)
JOBS_TIMELINE                    Second-by-second slot consumption
RESERVATION_CHANGES_BY_PROJECT   Reservation modifications
RESERVATIONS_TIMELINE            Reservation capacity over time

The JOBS_TIMELINE view is particularly valuable for capacity planning because it shows your actual concurrency pattern — not just aggregate slot usage, but how it changes second by second throughout the day.

Finding Your Most Slot-Intensive Jobs

This query shows your most slot-intensive jobs over the past 7 days:

SELECT
  job_id,
  user_email,
  SAFE_DIVIDE(total_slot_ms,
    TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)) AS avg_slots,
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS runtime_seconds,
  total_bytes_processed / POW(10, 9) AS gb_processed,
  reservation_id
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
ORDER BY total_slot_ms DESC
LIMIT 25;

The avg_slots calculation (total slot-milliseconds divided by wall-clock time) gives you the average number of slots used by each job. This is the number you need for capacity planning. A job with an avg_slots of 500 running for 30 minutes needs a reservation that can provide at least 500 slots during that window.
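The arithmetic is easy to sanity-check offline. A minimal Python sketch of the same calculation, using hypothetical sample numbers (a job that held 500 slots for 30 minutes):

```python
# Reproduce the avg_slots calculation from the query above:
# average slots = slot-milliseconds / wall-clock milliseconds.
# The sample numbers below are hypothetical.

def avg_slots(total_slot_ms: int, runtime_ms: int) -> float:
    """Average number of slots a job held over its wall-clock runtime."""
    if runtime_ms == 0:  # mirrors SAFE_DIVIDE returning NULL on divide-by-zero
        return float("nan")
    return total_slot_ms / runtime_ms

runtime_ms = 30 * 60 * 1000        # 30 minutes of wall-clock time
total_slot_ms = 500 * runtime_ms   # 500 slots held for the whole run

print(avg_slots(total_slot_ms, runtime_ms))  # 500.0
```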

The reservation_id column tells you which reservation served the job. If it’s null, the job ran on on-demand pricing. This helps you audit whether jobs are landing in the correct reservations.

Hourly Slot Utilization

See how your slot usage varies throughout the day:

SELECT
  TIMESTAMP_TRUNC(period_start, HOUR) AS hour,
  SUM(period_slot_ms) / 1000 / 3600 AS slot_hours
FROM `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
WHERE period_start > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND (statement_type != 'SCRIPT' OR statement_type IS NULL)
GROUP BY hour
ORDER BY hour;

This query reveals your daily usage pattern. Most teams see clear peaks during business hours and troughs overnight. The difference between peak and trough determines how much baseline vs. autoscaling capacity you need.

If your usage is relatively flat (e.g., ETL running 24/7), baseline-heavy allocation with minimal autoscaling is cost-efficient. If usage spikes 5x during business hours, a lower baseline with generous autoscaling headroom makes more sense.

Quick Slot Usage Summary

A quick overview of your recent slot consumption:

SELECT
  DATE(creation_time) AS date,
  COUNT(*) AS job_count,
  SUM(total_slot_ms) / 1000 / 60 / 60 AS total_slot_hours,
  AVG(SAFE_DIVIDE(total_slot_ms,
    TIMESTAMP_DIFF(end_time, start_time, MILLISECOND))) AS avg_slots_per_job
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
GROUP BY date
ORDER BY date;

Look at total_slot_hours trending over days. If it’s growing steadily, your workload is expanding and you should plan capacity accordingly. The avg_slots_per_job metric tells you whether you’re running many small queries (low average) or fewer large queries (high average) — this shapes your autoscaling configuration.
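A quick growth check on the total_slot_hours series can be scripted. The half-window comparison and the 10% threshold below are arbitrary assumptions for illustration:

```python
# Sketch: flag a steadily growing workload from daily slot-hour totals
# (e.g. the total_slot_hours column of the daily summary query).
# The half-window comparison and 10% threshold are assumptions.

def is_growing(daily_slot_hours: list[float], threshold: float = 0.10) -> bool:
    """Compare the mean of the last half of the window to the first half."""
    half = len(daily_slot_hours) // 2
    first = sum(daily_slot_hours[:half]) / half
    last = sum(daily_slot_hours[-half:]) / half
    return last >= first * (1 + threshold)

print(is_growing([100, 102, 101, 99, 100, 101, 100]))   # False: flat
print(is_growing([100, 110, 120, 135, 150, 165, 180]))  # True: trending up
```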

The Slot Estimator Tool

The Slot Estimator is useful for capacity planning before committing to a purchase.

To access it:

  1. Go to BigQuery Console
  2. Click Capacity management in the left nav
  3. Select the Slot estimator tab

What it shows:

  • Slot utilization over the past 30 days
  • Peak usage periods
  • Job latency percentiles (P90, P95, etc.)
  • Performance impact modeling (“what if you added 200 slots?”)
  • Cost-optimal recommendations

The performance impact modeling allows simulation of adding or removing slots to project the effect on job latency. It is based on 30 days of historical usage. Use it before purchasing commitments or adjusting baselines.

Cloud Monitoring Integration

For ongoing, real-time observability, BigQuery exposes metrics to Cloud Monitoring:

  • bigquery.googleapis.com/slots/allocated — how many slots are currently in use
  • bigquery.googleapis.com/slots/available — how many slots are available in the reservation
  • bigquery.googleapis.com/job/num_in_flight — how many jobs are currently running

With these metrics, you can:

  • Build dashboards showing real-time slot usage alongside reservation capacity
  • Set up alerts for slot contention (e.g., when allocated approaches available for more than 5 minutes)
  • Track utilization trends over weeks and months for capacity planning

A practical alert setup: trigger a warning when slots/allocated exceeds 80% of slots/available for more than 10 minutes during business hours. This gives you early notice of contention before it impacts query performance.
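The threshold logic can be prototyped against sampled metric values before wiring it into an alerting policy. The one-sample-per-minute cadence below is an assumption for illustration:

```python
# Sketch: detect sustained slot contention from sampled
# (allocated, available) metric pairs, assuming one sample per minute.
# The 80% ratio and 10-minute window mirror the alert described above.

def sustained_contention(samples: list[tuple[float, float]],
                         ratio: float = 0.80,
                         window: int = 10) -> bool:
    """True if allocated/available exceeded `ratio` for `window`
    consecutive samples."""
    streak = 0
    for allocated, available in samples:
        if available > 0 and allocated / available > ratio:
            streak += 1
            if streak >= window:
                return True
        else:
            streak = 0  # contention must be sustained, not intermittent
    return False

healthy = [(300, 500)] * 30                        # 60% utilization throughout
contended = [(300, 500)] * 5 + [(450, 500)] * 12   # 90% for 12 minutes

print(sustained_contention(healthy))    # False
print(sustained_contention(contended))  # True
```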

Monitoring for dbt Specifically

When running dbt on BigQuery, combine slot monitoring with dbt’s job labels to understand which models drive slot consumption:

SELECT
  (SELECT value FROM UNNEST(labels) WHERE key = 'dbt_model') AS model,
  COUNT(*) AS runs,
  AVG(SAFE_DIVIDE(total_slot_ms,
    TIMESTAMP_DIFF(end_time, start_time, MILLISECOND))) AS avg_slots,
  MAX(SAFE_DIVIDE(total_slot_ms,
    TIMESTAMP_DIFF(end_time, start_time, MILLISECOND))) AS peak_slots,
  SUM(total_slot_ms) / 1000 / 60 AS total_slot_minutes
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND (SELECT value FROM UNNEST(labels) WHERE key = 'dbt_model') IS NOT NULL
GROUP BY model
ORDER BY total_slot_minutes DESC
LIMIT 20;

The peak_slots column reveals which models have the highest burst demand. If one model peaks at 800 slots while everything else stays under 200, that model is your autoscaling driver. Optimizing it (better partitioning, incremental instead of full refresh, more efficient SQL) might let you reduce your autoscaling maximum.

Run these queries before making capacity changes and again afterward to measure the impact.