Baseline vs. Autoscaling Slots in BigQuery

How baseline and autoscaling slots work in BigQuery Editions: guaranteed capacity vs. elastic scaling, the 60-second autoscale window, and slot usage priority.

Tags: bigquery, cost optimization

Within Enterprise and Enterprise Plus editions, reservations have two types of capacity: baseline and autoscaling. Baseline is always allocated; autoscaling is elastic and best-effort.

Baseline Slots

Baseline slots are always allocated to your reservation. They’re guaranteed and immediately available, and you pay for them whether you use them or not.

Think of baseline as your “always-on” capacity. If your production dbt run needs 200 slots every morning at 6 AM, baseline ensures those slots are ready and waiting. No startup delay, no risk of regional capacity constraints, no uncertainty.

The guarantee is real: baseline slots can’t be preempted, can’t be delayed by regional demand, and can’t fail to allocate. They’re yours. This is what differentiates capacity-based pricing from on-demand — you’re buying certainty.

The downside is equally real: you pay for baseline slots 24/7, even at 3 AM when nothing is running. This makes baseline expensive for workloads that run only a few hours per day. A reservation with 500 baseline slots costs the same whether those slots process queries for 20 hours or 2 hours.
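
To make the always-on cost concrete, here is a minimal Python sketch of what each slot-hour of real work costs when baseline is billed around the clock. The function name and the price of 0.06 per slot-hour are placeholders chosen for illustration, not BigQuery rates.

```python
HOURS_PER_DAY = 24

def effective_cost_per_used_slot_hour(price_per_slot_hour: float,
                                      busy_hours_per_day: float) -> float:
    """Cost of each slot-hour that does real work when baseline is billed 24/7."""
    if not 0 < busy_hours_per_day <= HOURS_PER_DAY:
        raise ValueError("busy_hours_per_day must be in (0, 24]")
    return price_per_slot_hour * HOURS_PER_DAY / busy_hours_per_day

PLACEHOLDER_PRICE = 0.06  # hypothetical price per slot-hour, not a real rate

for busy_hours in (20, 2):
    cost = effective_cost_per_used_slot_hour(PLACEHOLDER_PRICE, busy_hours)
    print(f"{busy_hours} busy hours/day -> {cost:.3f} per used slot-hour")
```

Under these assumptions, the same reservation is ten times more expensive per unit of useful work at 2 busy hours than at 20.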

Autoscaling Slots

Autoscaling slots are allocated as needed when your workload grows beyond baseline. You set a maximum, and BigQuery scales up (and down) automatically.

Key behaviors:

  • Scales in multiples of 50 slots. If you need 75 slots, you get 100.
  • Scales up nearly instantly. The delay is negligible for most workloads.
  • Subject to regional capacity availability. During rare regional constraints, autoscale requests might be delayed. This is the key difference from baseline — autoscaling is best-effort.
  • Billed per second while allocated.
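
The rounding rule in the list above can be sketched in a few lines of Python. The function name and the `max_autoscale_slots` parameter are illustrative stand-ins for the maximum you configure on the reservation.

```python
AUTOSCALE_INCREMENT = 50  # scaling step described in the list above

def autoscaled_slots(needed_slots: int, max_autoscale_slots: int) -> int:
    """Round demand up to the next multiple of 50, capped at the configured maximum."""
    if needed_slots <= 0:
        return 0
    # Integer ceiling division avoids floating-point rounding surprises.
    steps = (needed_slots + AUTOSCALE_INCREMENT - 1) // AUTOSCALE_INCREMENT
    return min(steps * AUTOSCALE_INCREMENT, max_autoscale_slots)

print(autoscaled_slots(75, 400))   # 100
print(autoscaled_slots(100, 400))  # 100
print(autoscaled_slots(950, 400))  # 400 (capped at the maximum)
```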

Autoscaling is ideal for burst workloads. Your dbt full refresh that runs weekly and needs 800 slots for 2 hours? Autoscaling handles that without paying for 800 slots the other 166 hours of the week.

The 60-Second Autoscale Window

An important detail that catches many teams off guard: when BigQuery scales up, it keeps those slots allocated for at least 60 seconds, even if your query finishes in 5 seconds.

Why? To prevent thrashing. Without this window, a burst of short queries would cause constant scale-up/scale-down cycles, hurting performance and complicating billing.

Example timeline:

12:00:00 - Query needs 100 slots → Scales to 100
12:00:05 - Query completes → Still at 100 (window active)
12:01:00 - Window expires, no new demand → Scales to 0
12:01:02 - New query needs 50 slots → Scales to 50
12:01:03 - Query completes → Still at 50 (new window)
12:02:02 - Window expires → Scales to 0

This has real implications for dbt, which runs many sequential queries. Each query that triggers autoscaling starts a new 60-second window. A dbt run with 200 models that each take 5 seconds still holds autoscaled slots for the full 60 seconds after each scaling event. The autoscaling 1.5x cost multiplier accounts for this overhead.
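
A rough way to see the effect is to compare slot-seconds used with slot-seconds held. The sketch below is a simplified model, not BigQuery's actual autoscaler: it assumes queries run one at a time, that each query holds its slots from its start until at least 60 seconds later, and that a later query reuses slots that are still held. The `Query` class and `slot_seconds` function are hypothetical names.

```python
from dataclasses import dataclass

MIN_RETENTION_SECONDS = 60  # minimum time scaled-up slots stay allocated

@dataclass
class Query:
    start: int     # seconds since the start of the run
    duration: int  # seconds the query actually runs
    slots: int     # autoscaled slots it needs (already a multiple of 50)

def slot_seconds(queries: list[Query]) -> tuple[int, int]:
    """Return (used, allocated) slot-seconds for sequential queries."""
    used = sum(q.slots * q.duration for q in queries)
    horizon = max(q.start + max(q.duration, MIN_RETENTION_SECONDS) for q in queries)
    allocated = 0
    for second in range(horizon):
        # Slots held at this second: every query whose hold window covers it.
        held = [
            q.slots for q in queries
            if q.start <= second < q.start + max(q.duration, MIN_RETENTION_SECONDS)
        ]
        allocated += max(held, default=0)
    return used, allocated

# The example timeline: 100 slots for 5 s, then 50 slots for 1 s at 12:01:02.
print(slot_seconds([Query(0, 5, 100), Query(62, 1, 50)]))  # (550, 9000)

# 200 back-to-back models, 5 s each, 100 slots each.
back_to_back = [Query(i * 5, 5, 100) for i in range(200)]
print(slot_seconds(back_to_back))  # (100000, 105500)
```

In this model, short queries separated by gaps pay mostly for idle windows, while back-to-back queries keep the slots allocated almost continuously.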

The window also means that tightly sequential workloads (one query immediately after another) effectively keep autoscaled slots allocated continuously, which can approximate the cost of baseline slots. If you’re consistently using autoscaled capacity for hours at a time, you’re often better off converting that to baseline.
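
As a back-of-the-envelope check on when that conversion pays off, the sketch below computes the break-even point. It assumes baseline is billed for all 24 hours, autoscaling is billed only for the hours it is allocated (including retained windows), and prices are expressed relative to each other. The function name and the relative prices are illustrative; the 0.8 and 0.6 values simply mirror a 20% and a 40% discount on baseline.

```python
def breakeven_hours_per_day(baseline_price: float, autoscale_price: float) -> float:
    """Daily hours of autoscaled use above which baseline is cheaper.

    Baseline costs baseline_price * 24 per slot per day; autoscaling costs
    autoscale_price * hours_allocated. The two are equal at
    24 * baseline_price / autoscale_price hours.
    """
    if baseline_price <= 0 or autoscale_price <= 0:
        raise ValueError("prices must be positive")
    return 24 * baseline_price / autoscale_price

# Relative prices only: autoscaling is normalized to 1.0.
print(f"{breakeven_hours_per_day(0.8, 1.0):.1f}")  # 19.2
print(f"{breakeven_hours_per_day(0.6, 1.0):.1f}")  # 14.4
print(f"{breakeven_hours_per_day(1.0, 1.0):.1f}")  # 24.0
```

With equal prices, baseline only breaks even at round-the-clock use, so in this simplified model the case for converting rests on the baseline slots being cheaper per hour.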

Slot Usage Priority

When a job runs, BigQuery allocates capacity in this order:

  1. Baseline slots (guaranteed, used first)
  2. Idle slots from other reservations (if sharing is enabled)
  3. Autoscaling slots (if baseline + idle are exhausted)
  4. Queue (if all capacity sources are exhausted)

The same order as Mermaid flowchart source:

flowchart TB
Q[Query Submitted] --> B{Baseline<br/>Available?}
B -->|Yes| UB[Use Baseline]
B -->|No| I{Idle Slots<br/>Available?}
I -->|Yes| UI[Borrow Idle]
I -->|No| A{Autoscale<br/>Room?}
A -->|Yes| UA[Scale Up]
A -->|No| W[Queue Work]
UB --> RUN[Execute]
UI --> RUN
UA --> RUN
W --> WAIT[Wait for Slots]
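
The priority order can also be sketched as a small Python function. This is a simplification for illustration: it splits one job's demand across the sources in order, ignores the 50-slot autoscale step, and uses hypothetical names (`Allocation`, `allocate`) that are not part of any BigQuery API.

```python
from dataclasses import dataclass

@dataclass
class Allocation:
    baseline: int
    idle: int
    autoscale: int
    queued: int

def allocate(demand: int, baseline_free: int, idle_free: int,
             autoscale_room: int) -> Allocation:
    """Fill demand from baseline, then idle, then autoscaling; queue the rest."""
    remaining = demand
    from_baseline = min(remaining, baseline_free)
    remaining -= from_baseline
    from_idle = min(remaining, idle_free)
    remaining -= from_idle
    from_autoscale = min(remaining, autoscale_room)
    remaining -= from_autoscale
    return Allocation(from_baseline, from_idle, from_autoscale, remaining)

print(allocate(demand=700, baseline_free=200, idle_free=100, autoscale_room=300))
# Allocation(baseline=200, idle=100, autoscale=300, queued=100)
```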

This priority order matters because it affects both cost and reliability. Baseline adds no marginal cost (you’re already paying for it). Idle slots are free to the borrower (the owning reservation already pays). Autoscaling is the only source that adds to your bill as you use it. Queuing costs nothing but time: your query waits.

The implication: size your baseline to cover your steady-state needs, let idle sharing absorb moderate spikes, and use autoscaling as a safety valve for genuine peaks.

When to Use Baseline

Set a baseline when:

  • You have predictable, steady workloads that run at consistent times
  • Jobs are SLA-critical and can’t wait for autoscale or risk regional capacity constraints
  • You’re combining with commitments for the 20-40% discount (commitments apply to baseline slots first)

A common pattern: analyze your slot usage over 30 days, identify your P50 (median) usage, and set that as baseline. Autoscaling handles everything above the median.

With this approach, baseline alone covers demand roughly half the time, on guaranteed capacity that commitments can discount, while autoscaling absorbs the burstier periods above the median. As your workloads stabilize and you gain confidence in the pattern, you can gradually increase baseline to capture more of the committed discount.
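
A minimal sketch of the P50 pattern, assuming you already have slot usage samples at a regular interval (for example, exported from the `INFORMATION_SCHEMA.JOBS_TIMELINE` views). The sample values are made up, and rounding the median down to a 50-slot step is a choice of this sketch rather than a rule from the text.

```python
import statistics

SLOT_STEP = 50  # assumed sizing step, matching the autoscale increment

def suggest_baseline(slot_usage_samples: list[float]) -> int:
    """Median of the samples, rounded down to a multiple of 50."""
    if not slot_usage_samples:
        raise ValueError("need at least one sample")
    median = statistics.median(slot_usage_samples)
    return int(median // SLOT_STEP) * SLOT_STEP

# Made-up hourly averages for illustration; real input would cover 30 days.
samples = [20, 35, 40, 180, 220, 260, 310, 450, 900]
print(suggest_baseline(samples))  # 200
```

Rounding down keeps the always-on portion conservative; rounding up would trade a slightly higher fixed cost for less reliance on autoscaling.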

When Autoscaling Alone Is Enough

Pure autoscaling (zero baseline) works when:

  • Workloads are highly variable with no consistent pattern
  • You’re in the exploration phase and don’t yet know your steady-state needs
  • The workload only runs during specific windows (a few hours per day or per week)
  • You’re using Standard Edition, which only supports autoscaling

Standard Edition’s autoscale-only model is well suited to development and light workloads precisely because you pay nothing when idle. The trade-off is the 1,600-slot cap and no commitment discounts.

The “Reservations Guarantee Capacity” Misconception

The claim that reservations guarantee capacity is only partially true, and it is commonly misunderstood. Baseline slots are genuinely guaranteed: they’re always allocated to your reservation. But autoscaling slots are best-effort, subject to regional capacity. During rare regional constraints, autoscale requests might be delayed.

If you need guaranteed capacity, put it in your baseline. Don’t assume autoscaling will always deliver the slots you request instantly. It usually does, but “usually” isn’t the same as “guaranteed” when your SLA depends on it.