Partitioning physically divides a BigQuery table into segments based on a single column’s values. When a query filters on the partition column, BigQuery performs partition pruning — skipping partitions that don’t match. Skipped bytes are not scanned and not billed.
Unlike clustering, partitioning creates hard physical boundaries: a partition is either scanned entirely or skipped entirely. This binary behavior makes cost predictable — the bytes scanned can be determined before the query runs.
The Three Partitioning Types
BigQuery supports three ways to partition a table. Each serves different access patterns.
Time-Unit Column Partitioning
The most common type. You pick a DATE, TIMESTAMP, or DATETIME column and a granularity: HOUR, DAY, MONTH, or YEAR.
```sql
CREATE TABLE `project.dataset.events`
PARTITION BY DATE(event_timestamp)
AS SELECT * FROM source_events;
```

Or in dbt:
```sql
{{ config(
    partition_by={
        'field': 'event_date',
        'data_type': 'date',
        'granularity': 'day'
    }
) }}
```

This is the standard choice for event data, time-series workloads, and anything where queries consistently filter by date range. Most analytics tables fall into this category.
The granularity decision is significant. Daily partitioning on a table receiving 100MB per day creates 365 small partitions per year, each too small for BigQuery to optimize well. Monthly partitioning combined with clustering on the date column provides lifecycle management benefits without the small-partition overhead. See the decision framework for the size thresholds.
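The monthly-partitioning-plus-clustering pattern described above can be sketched in dbt config like this (the field name `event_date` follows the earlier example; adjust to your model):

```sql
{{ config(
    partition_by={
        'field': 'event_date',
        'data_type': 'date',
        'granularity': 'month'
    },
    cluster_by=['event_date']
) }}
```

Monthly partitions keep the partition count low and the segments large enough to optimize, while clustering on the same date column preserves fine-grained block skipping within each month.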
Ingestion-Time Partitioning
Assigns rows to partitions based on when BigQuery ingests them, using the pseudocolumns _PARTITIONTIME and _PARTITIONDATE:
```sql
-- Ingestion-time partitioned tables can't be created with AS SELECT,
-- because _PARTITIONDATE is a pseudocolumn assigned at load time.
-- Define the schema (columns here are illustrative), then load into it.
CREATE TABLE `project.dataset.raw_events` (
  event_name STRING,
  payload STRING
)
PARTITION BY _PARTITIONDATE;
```

This is useful when your source data lacks reliable timestamps — think legacy systems with unreliable created_at fields, or streaming pipelines where you want partitions based on arrival time rather than event time. The ingestion timestamp is always accurate because BigQuery controls it.
The trade-off is semantic. Ingestion time doesn’t equal event time. A record about something that happened on March 1st might arrive on March 5th and land in the March 5th partition. If your queries care about when things happened rather than when they arrived, time-unit column partitioning on the event timestamp is a better fit.
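Queries against an ingestion-time partitioned table prune by filtering on the pseudocolumn directly. A minimal sketch, using the raw_events table from above (the date literal is illustrative):

```sql
-- Prunes to the single partition for rows ingested on this date,
-- regardless of when the underlying events actually occurred.
SELECT COUNT(*)
FROM `project.dataset.raw_events`
WHERE _PARTITIONDATE = '2025-03-05';
```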
In dbt, ingestion-time partitioning has a special interaction with the copy_partitions optimization for the insert_overwrite strategy. Enabling copy_partitions: true uses BigQuery's copy table API instead of MERGE statements, eliminating insert costs and providing roughly 40% faster performance on large partitions:
```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={
        'field': 'created_date',
        'data_type': 'timestamp',
        'granularity': 'day',
        'time_ingestion_partitioning': true,
        'copy_partitions': true
    }
) }}
```

Integer Range Partitioning
Divides data by INTEGER column ranges you define — start, end, and interval:
```sql
partition_by={
    'field': 'customer_id',
    'data_type': 'int64',
    'range': {
        'start': 0,
        'end': 1000000,
        'interval': 1000
    }
}
```

This works for ID-based access patterns or customer segmentation scenarios. It’s less common than time-based partitioning, but it solves specific problems well — multi-tenant systems where queries always filter by tenant ID, or data architectures that shard by customer segments.
The interval defines how many distinct values go into each partition. An interval of 1000 with a range of 0 to 1,000,000 creates 1,000 partitions. Values outside the range go into a special __UNPARTITIONED__ partition.
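In raw DDL, the same range spec is expressed with RANGE_BUCKET and GENERATE_ARRAY(start, end, interval). A sketch, with table and source names chosen for illustration:

```sql
-- 1,000 partitions: [0, 1000), [1000, 2000), ... up to 1,000,000.
CREATE TABLE `project.dataset.customers`
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 1000000, 1000))
AS SELECT * FROM source_customers;
```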
Key Constraints
Three constraints shape how you design partitioned tables:
One column only. You can partition by exactly one column. If you need filtering on multiple dimensions, partition on the most common filter and cluster on the rest.
10,000 partition limit. Each table can have at most 10,000 partitions. With daily granularity, that’s about 27 years of data — generally fine. But hourly granularity exhausts the limit in a little over a year (10,000 / 24 ≈ 416 days), and integer range partitioning can hit it quickly with large ranges and small intervals.
Constant expressions required for pruning. This is the constraint that catches people. Partition pruning only works when the filter value is known at query planning time. A filter like WHERE event_date = '2025-01-15' prunes because the value is a literal. But WHERE event_date = (SELECT MAX(date) FROM other_table) does not prune because the value depends on a subquery that hasn’t executed yet.
This constraint extends to functions. WHERE DATE(event_timestamp) = '2025-01-15' on a TIMESTAMP-partitioned table defeats pruning because BigQuery can’t evaluate the function at planning time. See BigQuery Partition Pruning Patterns for the full catalog of pruning anti-patterns and their fixes.
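The function anti-pattern above has a direct rewrite: compare the raw partition column against constant bounds, so the planner can prune. A sketch using the events table from the earlier example:

```sql
-- Defeats pruning: a function wraps the partition column.
--   WHERE DATE(event_timestamp) = '2025-01-15'

-- Prunes: constant bounds on the raw partition column.
SELECT COUNT(*)
FROM `project.dataset.events`
WHERE event_timestamp >= TIMESTAMP('2025-01-15')
  AND event_timestamp <  TIMESTAMP('2025-01-16');
```

The half-open range covers the same day without touching the column itself, so both bounds are constants at planning time.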
What Partitioning Provides
Beyond query cost reduction, partitioning enables several operational capabilities that clustering does not:
Predictable cost estimates before execution. Partition pruning determines exactly how many bytes will be scanned before the query runs. Clustering estimates are only finalized after execution.
Partition lifecycle management. Automatic expiration via partition_expiration_days, efficient deletion of entire date ranges, and writes to specific partitions without touching others. Deleting a month of user data is a fast partition drop rather than a full table scan and rewrite.
Targeted writes in dbt. The insert_overwrite strategy replaces entire partitions atomically — no row-level comparison, no full table scan. This requires partitioning.
Partition filter enforcement. require_partition_filter = true prevents queries without a partition filter from running, making accidental full table scans on large tables impossible.
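Both the lifecycle and enforcement settings above are ordinary table options, settable at creation or afterwards. A sketch, with an illustrative 90-day retention window:

```sql
ALTER TABLE `project.dataset.events`
SET OPTIONS (
  partition_expiration_days = 90,    -- partitions older than 90 days are dropped automatically
  require_partition_filter = true    -- queries without a partition filter are rejected
);
```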
The Relationship Between Partitioning and Clustering
Partitioning and clustering are complementary, not competitive. Partitioning creates hard boundaries that eliminate entire segments. Clustering organizes data within those segments for finer-grained block-level skipping. On large tables, using both delivers the best results — partition pruning gives you the first 10-100x cost reduction, and clustering adds another 2-10x within each partition.
The decision framework walks through exactly when to use each, and when to combine them.