Note

dbt Materialization Cost Impact on BigQuery

How dbt materialization choices affect BigQuery costs -- table vs view vs ephemeral trade-offs, the view chain anti-pattern, and why defaulting to tables usually wins.

Planted
dbt · bigquery · cost optimization · data modeling

On BigQuery's on-demand pricing model, where you pay per byte scanned, the choice of dbt materialization determines the baseline cost profile of every model. The cost compounds across every downstream query and every dbt run.

The Three Materializations and Their Cost Profiles

Table: Full scan during build, then storage costs. Every downstream query reads from the materialized table — no recomputation, no upstream scanning. The build cost is a one-time expense per run. Downstream queries are as cheap as possible because they only scan the materialized result, not the original source data.

View: Zero build cost, zero storage cost. Sounds free — it’s not. Every query against a view triggers a full execution of the underlying SQL, which means scanning all upstream tables. For complex transformations or views queried frequently, the cumulative compute cost dwarfs what table materialization would have cost.

Ephemeral: No storage, no build. The model is interpolated as a CTE into every model that references it. Here’s the trap: BigQuery does not materialize non-recursive CTEs. Each reference re-executes the underlying scan. An ephemeral model referenced by three downstream models means three separate scans of the source data. Ephemeral models are also difficult to debug since they don’t exist as queryable objects in BigQuery.

The cost comparison for a 100 GB source transformation queried 10 times daily:

| Materialization | Build Cost | Query Cost (per query)              | Daily Total |
|-----------------|------------|--------------------------------------|-------------|
| Table           | $0.63      | ~$0.01-0.10 (reads smaller result)   | ~$1.63      |
| View            | $0         | $0.63 (full source scan each time)   | $6.30       |
| Ephemeral       | $0         | $0.63 (per referencing model)        | $6.30+      |

Views cost 4x more in this scenario. For models queried 50 times daily (dashboards, multiple downstream dependencies), the multiplier grows to 20x or more.
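The arithmetic behind the table above can be sketched in a few lines. This assumes the standard on-demand rate of $6.25 per TB scanned and a hypothetical 10 GB materialized result; the helper names are illustrative, not from any library.

```python
# Rough daily-cost model for the 100 GB example above.
# Assumes $6.25/TB on-demand scanning (check your region's pricing).
PRICE_PER_TB = 6.25

def scan_cost(gb: float) -> float:
    """On-demand cost of scanning `gb` gigabytes once."""
    return gb / 1000 * PRICE_PER_TB

def daily_cost_table(source_gb: float, result_gb: float, queries_per_day: int) -> float:
    # One build scan of the source, then cheap reads of the materialized result.
    return scan_cost(source_gb) + queries_per_day * scan_cost(result_gb)

def daily_cost_view(source_gb: float, queries_per_day: int) -> float:
    # Every query re-executes the view and re-scans the full source.
    return queries_per_day * scan_cost(source_gb)

table_daily = daily_cost_table(100, 10, 10)  # 0.625 + 10 * 0.0625 = $1.25
view_daily = daily_cost_view(100, 10)        # 10 * 0.625 = $6.25
```

The gap widens linearly with query frequency: doubling the daily queries roughly doubles the view's cost while adding only pennies to the table's.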

The View Chain Anti-Pattern

The most expensive mistake in dbt-on-BigQuery is chaining views together. Nothing computes until a query reaches a table materialization, at which point every upstream view in the chain executes.

Consider a lineage like this:

base__events (view) → int__events__enriched (view) → int__events__sessionized (view) → mrt__daily_sessions (table)

When dbt builds mrt__daily_sessions, BigQuery executes the SQL for all three upstream views in a single cascading query. The total bytes scanned equals the sum of what each view would scan independently.

But the cost doesn’t stop at dbt build time. If an analyst queries int__events__enriched directly (for exploration or debugging), BigQuery scans base__events under the hood. If another model also references int__events__sessionized, that’s another full cascade. Every consumer of any view in the chain triggers the full upstream computation.

dbt’s own dbt_project_evaluator package flags chains of 4 or more views as problematic. The warning exists because view chains don’t cost linearly — they cost multiplicatively across every downstream consumer and every ad-hoc query.
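A minimal sketch of that multiplicative effect, using the lineage above: every consumer query against any view in the chain pays the full upstream scan back to the raw table. The consumer counts here are hypothetical.

```python
# Each consumer of a view in the chain triggers the full cascade down
# to the 100 GB source. Consumer counts are illustrative.
SOURCE_GB = 100
PRICE_PER_TB = 6.25  # assumed on-demand rate

def scan_cost(gb: float) -> float:
    return gb / 1000 * PRICE_PER_TB

consumers_per_day = {
    "base__events": 1,               # one ad-hoc analyst query
    "int__events__enriched": 2,      # exploration / debugging
    "int__events__sessionized": 2,   # two downstream models
}

# View chain: every consumer pays a full source scan.
view_chain_daily = sum(consumers_per_day.values()) * scan_cost(SOURCE_GB)

# Table chain: the source is scanned once at build time; consumers
# read small materialized results (ignored here for simplicity).
table_chain_daily = scan_cost(SOURCE_GB)
```

With just five daily consumers spread across the chain, the view version scans the source five times (~$3.13/day) against the table version's single build scan (~$0.63/day), and each new dashboard or downstream model adds another full scan.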

Why Tables Win on BigQuery

Given that storage costs are negligible compared to compute (85-90% compute vs 10-15% storage), the calculus on BigQuery is clear: materializing to tables is almost always cheaper than using views.

A 10 GB table stored for a month costs $0.20. Scanning that same 10 GB once costs $0.0625 on-demand. If the view equivalent would be queried even a few times, the table pays for itself in storage many times over.
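The break-even point from that paragraph can be computed directly. This sketch assumes the rates cited above ($0.02/GB-month active storage, $6.25/TB on-demand scanning) and ignores the build scan, which a view's consumers end up paying anyway.

```python
import math

# Assumed rates from the paragraph above.
STORAGE_PER_GB_MONTH = 0.02
PRICE_PER_TB = 6.25

table_gb = 10
monthly_storage = table_gb * STORAGE_PER_GB_MONTH       # $0.20/month
scan_once = table_gb / 1000 * PRICE_PER_TB              # $0.0625 per query

# Number of queries per month at which the table's storage cost
# is repaid by avoided source scans:
break_even_queries = math.ceil(monthly_storage / scan_once)
```

Four queries a month is enough to justify the storage; anything queried daily is not a close call.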

The recommended default for most dbt projects on BigQuery:

dbt_project.yml
models:
  my_project:
    base:
      +materialized: table
    intermediate:
      +materialized: table
    marts:
      +materialized: table
      +partition_by:
        field: created_date
        data_type: date

This approach materializes everything as tables across all three layers of the dbt architecture. Mart-layer tables get partitioning by default, and you add clustering on specific models as needed.

Views can work for truly simple transformations that are rarely queried — a base model that renames columns and is only referenced by a single downstream table, for example. But the cost risk of views grows with query frequency and upstream complexity. When in doubt, use a table.

Column Selection in dbt Models

Beyond materialization type, the columns you select in your dbt models directly affect cost. BigQuery’s columnar storage means each column is stored and scanned separately. Selecting all columns when you only need a few is one of the most common cost mistakes in dbt projects.

The lazy pattern:

-- Expensive: scans all columns from the base model
SELECT * FROM {{ ref('base__analytics__events') }}

The cost-conscious pattern:

-- Better: explicit columns, only what downstream actually needs
SELECT
  event_id,
  user_id,
  event_timestamp,
  event_type
FROM {{ ref('base__analytics__events') }}

For a wide table with 50 columns where you only need 5, explicit column selection reduces scan cost by 90%. This applies at every layer of your dbt project: base models should select only the columns that downstream models actually use, intermediate models should carry forward only what marts need, and marts should expose only what consumers will query.

This discipline is harder to maintain than it sounds. Columns creep in during development (“let me add this just in case”), and nobody removes them. Periodic audits of which columns downstream models actually reference can identify dead columns that inflate costs silently.

Incremental: The Next Level

Once you’ve set table as your default materialization, the next cost lever is incremental models. A table materialization scans and rewrites the entire table on every dbt run. For a 500 GB table running daily, that’s $3.13 per run or about $94 monthly just for the rebuild.

Incremental models process only new or changed data, and the savings scale with table size and run frequency. The combination of table-default materialization for small-to-medium models and incremental materialization for large models is the optimal cost strategy for dbt on BigQuery.

The decision threshold: consider switching a model from table to incremental when:

  • The source table exceeds 100 GB
  • The model runs more than once daily
  • The transformation is append-heavy (new rows, not updates to existing rows)
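The rebuild-versus-incremental arithmetic for the 500 GB example above can be sketched as follows, again at an assumed $6.25/TB on-demand rate and with a hypothetical 5 GB daily delta.

```python
PRICE_PER_TB = 6.25  # assumed on-demand rate

def monthly_cost(gb_scanned_per_run: float, runs_per_day: int, days: int = 30) -> float:
    """Monthly scan cost for a model rebuilt `runs_per_day` times daily."""
    return gb_scanned_per_run / 1000 * PRICE_PER_TB * runs_per_day * days

# Full table rebuild: scan and rewrite all 500 GB every run.
full_rebuild = monthly_cost(500, runs_per_day=1)   # $93.75/month

# Incremental run scanning only a 5 GB daily delta (hypothetical size):
incremental = monthly_cost(5, runs_per_day=1)      # ~$0.94/month
```

The ratio is simply full table size over delta size, which is why the savings grow with both table size and run frequency.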

For the specifics of configuring incremental models on BigQuery — including the critical choice between merge and insert_overwrite strategies — see Incremental Models in dbt and the Incremental Strategy Decision Framework.

The Cost-Conscious dbt Project

A cost-optimized dbt project on BigQuery follows these principles:

  1. Default to table materialization across all layers
  2. Use explicit column selection instead of SELECT * at every layer
  3. Add partitioning to mart-layer tables and any table over 100 GB
  4. Add clustering to frequently filtered tables over 64 MB
  5. Switch to incremental for large, append-heavy tables
  6. Use insert_overwrite with static partitions as the incremental strategy
  7. Set maximum_bytes_billed in profiles.yml as a safety net
  8. Never chain views — if you must use views, keep them to a single layer with table materializations on both sides
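Principle 7 can be sketched as a profiles.yml fragment. maximum_bytes_billed is a setting supported by the dbt BigQuery adapter that fails any query exceeding the cap; the project name, GCP project, and cap value below are placeholders.

```yaml
# profiles.yml — illustrative target with a scan cap (values are placeholders)
my_project:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: analytics
      threads: 4
      # Fail any query that would bill more than ~1 TB:
      maximum_bytes_billed: 1000000000000
```

A cap like this turns a runaway full-table scan into a failed query instead of a surprise line item.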

These configurations prevent compounding cost problems that can multiply a $500/month BigQuery bill by an order of magnitude.