
BigQuery Table Types

Native BigQuery tables, BigLake external tables, and BigLake Iceberg tables — what each optimizes for, when to use them, and a decision framework for choosing.

Tags: bigquery, gcp, data engineering, data modeling

BigQuery offers three distinct approaches to storing and querying data. Each optimizes for different constraints, and the right choice depends on your access patterns, governance requirements, and whether engines besides BigQuery need to touch the same data. Understanding the trade-offs is the starting point for any GCP data platform design.

Native BigQuery Tables

Native tables store data in BigQuery’s proprietary columnar format (Capacitor on Colossus, as covered in BigQuery Architecture for Analytics Engineers). They deliver the best query performance, support streaming inserts with sub-second latency, and handle high-concurrency workloads without tuning.

The trade-off is portability. Data lives only in BigQuery. If Spark, Trino, or another engine needs access, you’re exporting data or running federated queries. For teams that operate entirely within BigQuery, this isn’t a limitation. For teams building multi-engine architectures, it’s a hard constraint.

Native tables are the right default for:

  • Gold-layer analytics tables serving dashboards and BI tools
  • High-concurrency workloads with hundreds of simultaneous users
  • Real-time operational dashboards requiring sub-second streaming insert latency
  • Tables that only BigQuery will ever query
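As a sketch, a gold-layer native table for one of these serving workloads might be defined with date partitioning and clustering (table and column names here are illustrative, not from any real project):

```sql
-- Sketch: a native table for a gold-layer serving workload.
-- Table, dataset, and column names are illustrative.
CREATE TABLE `project.dataset.daily_revenue`
(
  order_date DATE,
  region STRING,
  revenue NUMERIC
)
PARTITION BY order_date
CLUSTER BY region
OPTIONS (
  description = 'Gold-layer revenue table served to BI dashboards'
);
```

Partitioning by the date column keeps dashboard queries scanning only the days they ask for, and clustering by region prunes further within each partition.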

BigLake External Tables

BigLake tables query data sitting in Cloud Storage while applying BigQuery’s governance layer. You get row-level security, column-level masking, and fine-grained access control on data stored in Parquet, ORC, Avro, or CSV files. The data stays in object storage — BigQuery reads it at query time through a Cloud Resource connection.

Query performance depends heavily on file layout and metadata caching configuration. Without metadata caching enabled, every query scans Cloud Storage to discover file statistics, which adds significant latency. With caching, performance typically lands within 20-30% of native tables for most analytical patterns.
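The cache settings live in the table DDL. A sketch, assuming a Parquet dataset already sitting in Cloud Storage (bucket path, connection ID, and staleness window are illustrative):

```sql
-- Sketch: a BigLake external table over Parquet files with
-- metadata caching enabled. Names and values are illustrative.
CREATE EXTERNAL TABLE `project.dataset.events_external`
WITH CONNECTION `project.region.connection_id`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://bucket/events/*.parquet'],
  max_staleness = INTERVAL 4 HOUR,    -- how stale cached metadata may be
  metadata_cache_mode = 'AUTOMATIC'   -- BigQuery refreshes the cache itself
);
```

The max_staleness interval is the trade you're making: a longer window means fewer Cloud Storage listing operations but a longer wait before newly landed files become visible to queries.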

BigLake tables fit when:

  • Multiple engines need to query the same data (Spark, Trino, Presto alongside BigQuery)
  • Compliance requirements restrict data movement — some regulatory frameworks require data to stay in specific storage systems
  • You need BigQuery governance on existing Cloud Storage data without copying it into managed storage
  • Cost optimization via storage tiering is a priority for large, infrequently queried datasets

BigLake tables support Hive-style partitioning. Set require_hive_partition_filter = true on tables partitioned by date or other high-cardinality columns to prevent analysts from accidentally triggering full table scans across terabytes of data. This is one of the most common mistakes in BigQuery data lake implementations — see BigQuery Partition Pruning Patterns for the full story.
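A sketch of a Hive-partitioned BigLake table with that guardrail in place (paths and names are illustrative; note that for Hive-partitioned external tables the DDL option is spelled require_hive_partition_filter):

```sql
-- Sketch: a Hive-partitioned BigLake table that rejects unfiltered scans.
-- Bucket paths, connection ID, and names are illustrative.
CREATE EXTERNAL TABLE `project.dataset.logs`
WITH PARTITION COLUMNS (
  dt DATE
)
WITH CONNECTION `project.region.connection_id`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://bucket/logs/*'],
  hive_partition_uri_prefix = 'gs://bucket/logs',
  require_hive_partition_filter = true   -- queries must filter on dt
);
```

With this in place, a query without a WHERE clause on dt fails immediately instead of scanning every partition.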

BigLake Iceberg Tables

BigLake Iceberg Tables represent Google’s strategic direction for BigQuery. These tables store data in Apache Iceberg format on Cloud Storage but support full BigQuery DML — INSERT, UPDATE, DELETE, and MERGE. They handle streaming ingestion, support time travel queries, and maintain metadata that any Iceberg-compatible engine can read.

-- Creating a BigLake Iceberg table
CREATE TABLE `project.dataset.events`
(
  event_id STRING,
  event_timestamp TIMESTAMP,
  user_id STRING,
  event_name STRING,
  event_params JSON
)
CLUSTER BY user_id
WITH CONNECTION `project.region.connection_id`
OPTIONS (
  file_format = 'PARQUET',
  table_format = 'ICEBERG',
  storage_uri = 'gs://bucket/path/to/table'
);

BigQuery manages table compaction and optimization automatically. You don’t need to run OPTIMIZE or manage small files the way you would with a self-managed Iceberg deployment on Spark.

Key characteristics:

  • Full DML support: Unlike standard BigLake tables, you can run INSERT, UPDATE, DELETE, and MERGE operations
  • Streaming ingestion: High-throughput streaming, though per-record latency is higher than for native tables
  • Time travel: Query data as of a previous point in time
  • Multi-engine access: Any Iceberg-compatible engine (Spark, Flink, Trino) can read the data through the BigLake Metastore
  • Automatic compaction: BigQuery handles file compaction and optimization in the background
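Time travel uses the same FOR SYSTEM_TIME AS OF syntax as native tables. A sketch against the events table defined above (the lookback window and filter values are illustrative, and must fall within the table's time travel retention):

```sql
-- Sketch: query the Iceberg table as it looked one hour ago.
SELECT event_id, event_name
FROM `project.dataset.events`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
WHERE user_id = 'u-123';
```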

For most new projects, BigLake Iceberg tables eliminate the need to choose between native and external tables. You get open-format flexibility with warehouse-like operational characteristics. The performance penalty is acceptable for most use cases, and you retain the ability to export data to any Iceberg-compatible system.
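The full-DML claim is worth making concrete: an upsert via MERGE, the kind of statement standard BigLake external tables cannot run, works directly against the Iceberg table. A sketch, assuming a hypothetical staging table with the same schema:

```sql
-- Sketch: upsert from a staging table into the Iceberg events table.
-- The staging table name is illustrative.
MERGE `project.dataset.events` AS t
USING `project.dataset.events_staging` AS s
ON t.event_id = s.event_id
WHEN MATCHED THEN
  UPDATE SET event_name = s.event_name, event_params = s.event_params
WHEN NOT MATCHED THEN
  INSERT (event_id, event_timestamp, user_id, event_name, event_params)
  VALUES (s.event_id, s.event_timestamp, s.user_id,
          s.event_name, s.event_params);
```

Behind the scenes, BigQuery writes the changes as new Iceberg snapshots, so other engines reading through the BigLake Metastore see a consistent view.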

Decision Framework

When evaluating which table type to use for a specific dataset, work through these questions in order:

1. Does any engine besides BigQuery need to query this data? If Spark, Trino, Presto, or another compute engine needs direct access, BigLake tables (either standard or Iceberg) are your only options. Native BigQuery tables require exporting data or using BigQuery’s connector ecosystem.

2. Do you need streaming inserts with sub-second latency? Native BigQuery tables handle this best. BigLake Iceberg supports streaming but with higher latency per record. For real-time operational dashboards or alerting, start with native tables.

3. Is this data subject to compliance requirements restricting movement? Some regulatory frameworks require data to remain in specific storage systems or regions. BigLake tables let you query data in place without copying it into BigQuery’s managed storage.

4. How important is query performance for this specific table? Gold-layer tables serving dashboards benefit from native table performance. Bronze-layer tables that only feed batch pipelines rarely need it. See BigLake Performance Characteristics for the specifics on where the gap matters.

5. Will this dataset exist in five years? Open formats like Iceberg provide better long-term portability. If you’re building core data assets that outlast your current architecture, the insurance value of open formats matters.

The default for new projects in 2026: BigLake Iceberg tables unless a specific requirement points elsewhere. Native BigQuery tables remain the right choice for high-performance serving layers. A hybrid architecture — Iceberg for flexibility, native for speed — is what the medallion lakehouse pattern implements.