Note

BigQuery Data Lake Patterns

A reading guide for understanding BigQuery data lake architecture: table types, the medallion lakehouse pattern, catalog strategy, performance, cost optimization, and common mistakes.

Planted
bigquery, gcp, data engineering, data modeling, cost optimization

Notes on designing, implementing, and optimizing a BigQuery-based data lake. Covers table types, the medallion lakehouse pattern on GCP, catalog strategy, performance characteristics, and common mistakes. Derived from BigQuery and Cloud Storage data lake patterns.

Prerequisites

Reading Order

1. BigQuery Table Types — native BigQuery tables, BigLake external tables, and BigLake Iceberg tables: what each type does and a decision framework for choosing between them.

2. BigLake Performance Characteristics — metadata caching, where the remaining performance gap between external and native tables matters, and where it doesn’t.

3. Medallion Lakehouse on GCP — the bronze-silver-gold architecture on BigQuery: Iceberg at the bronze layer, dbt transformations at silver, native tables at gold. Includes code examples.

4. BigLake Metastore and Catalog Strategy — BigLake Metastore and Dataplex Universal Catalog as the governance layer across table formats.

5. Cloud Storage Tiering for BigQuery — cost optimization across storage tiers, physical billing, and pricing model selection. Reducing storage costs by 60-80% requires coordinating all three.

6. BigQuery Data Lake Common Mistakes — missing metadata caching, unguarded partition filters, and over-engineered architectures.