The medallion architecture — bronze, silver, gold layers with increasing data quality — maps to BigQuery’s table type options. Different layers optimize for different things, and BigQuery allows mixing table types within a single project: flexibility where needed (early pipeline stages) and performance where it matters (analytics serving layers).
Bronze Layer: Raw Landing
Raw data lands in BigLake Iceberg tables. Spark or Flink jobs write directly to Iceberg format on Cloud Storage. BigLake Metastore tracks the table metadata, making these tables discoverable by both BigQuery and external compute engines.
This layer prioritizes write throughput and schema flexibility over query performance.
```sql
-- Bronze layer: BigLake Iceberg table for raw event ingestion
CREATE TABLE `project.bronze.raw_events` (
  event_id STRING,
  event_timestamp TIMESTAMP,
  payload JSON,
  source_system STRING,
  ingestion_time TIMESTAMP
)
CLUSTER BY source_system
WITH CONNECTION `project.us.biglake_connection`
OPTIONS (
  file_format = 'PARQUET',
  table_format = 'ICEBERG',
  storage_uri = 'gs://data-lake-bronze/raw_events'
);
```

Why Iceberg at the bronze layer:
- Multi-engine writes: Spark and Flink can write directly to the Iceberg format without going through BigQuery
- Schema evolution: Raw data schemas change frequently; Iceberg handles column additions, renames, and type promotions
- Time travel: Replay or audit raw data as of any previous state
- Portability: If you ever need to migrate off GCP, your raw data is in an open format
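Time travel can be exercised from the same engines that write the bronze layer. As a sketch, Spark SQL (3.3 or later) can read a prior state of the raw table directly; the timestamp below is illustrative, and `FOR VERSION AS OF` takes a snapshot ID instead:

```sql
-- Query the bronze Iceberg table as of an earlier point in time (Spark SQL).
SELECT event_id, source_system
FROM bronze.raw_events
FOR TIMESTAMP AS OF '2024-06-01 00:00:00'
WHERE source_system = 'web';
```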
Bronze tables are typically append-heavy. New data arrives continuously, and corrections are rare. Enable Autoclass on the underlying Cloud Storage bucket so older data automatically moves to cheaper storage tiers based on actual access patterns.
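Autoclass is a bucket-level setting. Assuming the bucket from the DDL above, enabling it is a one-line configuration change:

```shell
# Enable Autoclass so rarely accessed raw data migrates to cheaper storage classes
gcloud storage buckets update gs://data-lake-bronze --enable-autoclass
```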
Silver Layer: Cleaned and Conformed
dbt models transform bronze data into cleaned, conformed datasets. These transformations run in BigQuery using SQL, reading from Iceberg tables and writing to either more Iceberg tables or native BigQuery tables depending on downstream requirements.
The silver layer maps directly to the base and intermediate layers in a dbt project. Base models clean raw bronze data (rename columns, cast types, deduplicate). Intermediate models join and enrich across entities.
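A base model at this layer might look like the following sketch; the model name and specific cleanup rules are illustrative, not from a real project:

```sql
-- models/base/base_bronze__raw_events.sql (illustrative)
-- Light cleanup only: cast, normalize, filter out unusable rows
SELECT
  event_id,
  TIMESTAMP_TRUNC(event_timestamp, SECOND) AS event_timestamp,
  LOWER(TRIM(source_system)) AS source_system,
  payload,
  ingestion_time
FROM {{ source('bronze', 'raw_events') }}
WHERE event_id IS NOT NULL
```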
```sql
-- Silver layer dbt model reading from bronze Iceberg, writing to BigQuery
-- models/intermediate/int__events__enriched.sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    cluster_by=['user_id', 'event_name']
) }}

SELECT
  event_id,
  DATE(event_timestamp) AS event_date,
  event_timestamp,
  JSON_VALUE(payload, '$.user_id') AS user_id,
  JSON_VALUE(payload, '$.event_name') AS event_name,
  JSON_VALUE(payload, '$.page_url') AS page_url,
  source_system
FROM {{ source('bronze', 'raw_events') }}

{% if is_incremental() %}
-- Filter on the underlying column: BigQuery does not allow SELECT aliases in WHERE
WHERE DATE(event_timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
{% endif %}

QUALIFY ROW_NUMBER() OVER (
  PARTITION BY event_id
  ORDER BY ingestion_time DESC
) = 1
```

Whether silver tables should be Iceberg or native depends on who else needs to read them:
- Iceberg: When Spark-based ML pipelines or other engines read silver data directly
- Native BigQuery: When only BigQuery and dbt consume silver tables (better query performance, simpler management)
For most analytics-focused teams, native BigQuery tables at the silver layer work well. The multi-engine benefit of Iceberg matters most at bronze (where external systems write) and less at silver (where BigQuery does the work).
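The insert_overwrite strategy used in these models replaces whole partitions rather than merging individual rows, which is what makes the trailing reprocessing window idempotent. A minimal Python sketch of the semantics, with invented partition keys and rows:

```python
# insert_overwrite semantics: every partition touched by the incremental run
# replaces the stored partition wholesale; untouched partitions are left alone.
existing = {
    "2024-06-01": [{"event_id": "a"}],
    "2024-06-02": [{"event_id": "b"}],  # stale: missing a late-arriving event
}
incremental_run = {
    "2024-06-02": [{"event_id": "b"}, {"event_id": "c"}],  # recomputed window
    "2024-06-03": [{"event_id": "d"}],
}
existing.update(incremental_run)  # whole-partition replacement, not a row merge
```

Because the recomputed window overwrites its partitions completely, re-running the same window never produces duplicates.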
BigQuery automatically re-clusters native tables as new data arrives, so frequently queried silver tables maintain performance without manual maintenance; for Iceberg tables, BigQuery manages file compaction in the background.
Gold Layer: Business-Ready Analytics
Aggregated, business-ready datasets live in native BigQuery tables optimized for BI workloads. These tables serve dashboards, support ad-hoc analyst queries, and back embedded analytics. This is the mart layer in dbt terms.
```sql
-- Gold layer mart model: daily channel performance
-- models/marts/marketing/mrt__marketing__daily_channel_performance.sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    cluster_by=['channel', 'source']
) }}

SELECT
  event_date,
  channel,
  source,
  COUNT(DISTINCT user_id) AS unique_users,
  COUNT(DISTINCT session_id) AS sessions,
  SUM(conversions) AS total_conversions,
  SAFE_DIVIDE(SUM(conversions), COUNT(DISTINCT session_id)) AS conversion_rate
FROM {{ ref('int__sessions__attributed') }}

{% if is_incremental() %}
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
{% endif %}

GROUP BY 1, 2, 3
```

Native tables make sense at the gold layer because:
- Performance matters most here — these are the tables users actually touch
- Multi-engine access is rarely needed for aggregated metrics
- Streaming inserts work with sub-second latency for real-time dashboards
- Materialized views can further accelerate repeated dashboard queries (see BigQuery Materialized Views)
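As a sketch, a materialized view over the mart can pre-aggregate the hottest dashboard slice; the view and table names below are illustrative:

```sql
-- Hypothetical materialized view accelerating a channel-level dashboard
CREATE MATERIALIZED VIEW `project.gold.mv_channel_daily`
AS
SELECT
  event_date,
  channel,
  SUM(total_conversions) AS conversions,
  SUM(sessions) AS sessions
FROM `project.gold.daily_channel_performance`
GROUP BY event_date, channel;
```

BigQuery transparently rewrites matching dashboard queries to read from the view, refreshing it incrementally as the base table changes.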
The Hybrid Advantage
The real power of this pattern is optionality. You’re not locked into a single table format across your entire platform.
- Bronze (Iceberg): Any engine can write, any engine can read. You can add Spark processing later or migrate to another platform without re-exporting raw data.
- Silver (Iceberg or native): Choose based on whether non-BigQuery engines need access.
- Gold (native): Maximum query speed for the tables that drive business decisions.
This hybrid approach preserves flexibility in early pipeline stages while maximizing speed for the tables users interact with daily. The catalog layer ties it all together, providing a unified view across table types.
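Because both formats sit behind the same SQL surface, a single query can span an Iceberg bronze table and a native gold table, for example to reconcile raw event counts against mart aggregates. The gold table name here is illustrative:

```sql
-- One query spanning a BigLake Iceberg table and a native BigQuery table
SELECT
  g.event_date,
  SUM(g.total_conversions) AS conversions,
  COUNT(r.event_id) AS raw_events
FROM `project.gold.daily_channel_performance` AS g   -- native (gold)
LEFT JOIN `project.bronze.raw_events` AS r           -- BigLake Iceberg (bronze)
  ON g.event_date = DATE(r.event_timestamp)
GROUP BY g.event_date;
```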
When Not to Use the Full Medallion Pattern
Not every dataset needs the full medallion treatment. Simple reporting tables can go directly from source to native BigQuery tables without intermediate Iceberg layers. The lakehouse pattern adds value for data that needs:
- Multi-engine access at any stage
- Long-term portability guarantees
- Complex ingestion from streaming systems (Spark, Flink)
- Regulatory separation between raw and transformed data
For data that only BigQuery will ever touch, a straightforward dbt project with native tables at every layer is simpler and performs better. Over-complicating the architecture is one of the most common mistakes in BigQuery data lake implementations. Match the complexity of your infrastructure to the complexity of your actual requirements.