Adrienne Vermorel
Base, Intermediate, Marts: When to Use Each Layer
Most dbt projects I audit have the same problem. Not missing tests or poor documentation (though those exist too). The real issue is blurred responsibilities between layers. Business logic scattered in base models. Aggregations happening in intermediate. Marts that are just renamed intermediates.
The three-layer pattern exists for a reason. Each layer has a job. When you respect those boundaries, your project becomes predictable. When you don’t, you end up with a codebase where nobody knows where to look for anything.
The Quick Reference
Before the nuance, here’s the summary:
| Layer | Prefix | What It Does | Aggregation |
|---|---|---|---|
| Base | base__ | Rename, cast, light cleanup | Never |
| Intermediate | int__ | Joins, business logic, full grain | Never |
| Mart | mrt__ | Aggregate for consumption | Yes |
Base models clean raw data without changing its meaning. Intermediate models combine and enrich that data while preserving every row. Marts aggregate to the grain your consumers actually need.
Base Layer: Keep It Boring
Base models are the least interesting part of your project, and that’s the point. Their job is to make raw source data usable without adding interpretation.
What belongs here:
- Renaming columns to your naming standards (for example, `user_id` becomes `customer__id`)
- Casting data types (strings to timestamps, integers to booleans)
- Deduplicating when sources have duplicate records
- Unnesting nested or repeated fields (common with GA4, JSON payloads; see the sketch after the base model below)
- Filtering out test data or invalid records
What doesn’t belong here:
- Joins (with one exception I’ll explain)
- Business logic or calculations
- Aggregations of any kind
The exception on joins: some ETL tools split what’s logically one API endpoint into multiple tables. If your source has orders and order_metadata that should have always been one table, joining them in base is fine. You’re correcting an ETL artifact, not adding business logic.
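To make that concrete, here's a minimal sketch of that kind of base-layer join. The order_metadata table and its columns are hypothetical, standing in for whatever your ETL tool split off:

```sql
-- Hypothetical sketch: stitching back together what the ETL tool split.
-- Assumes shopify.orders and shopify.order_metadata share the same id
-- and are logically one API object. This is repair, not business logic.
WITH orders AS (
    SELECT id, user_id, created_at, total
    FROM {{ source('shopify', 'orders') }}
),

order_metadata AS (
    SELECT id, channel, device_type
    FROM {{ source('shopify', 'order_metadata') }}
)

SELECT
    orders.id,
    orders.user_id,
    orders.created_at,
    orders.total,
    order_metadata.channel,
    order_metadata.device_type
FROM orders
LEFT JOIN order_metadata USING (id)
```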
A typical base model looks like this:
```sql
{{
    config(
        materialized='table',
        tags=['base', 'shopify']
    )
}}

WITH source AS (
    SELECT
        id,
        user_id,
        created_at,
        total,
        status,
        is_deleted,
        _loaded_at
    FROM {{ source('shopify', 'orders') }}
),

deduplicated AS (
    SELECT
        id,
        user_id,
        created_at,
        total,
        status,
        is_deleted,
        _loaded_at
    FROM source
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY id
        ORDER BY _loaded_at DESC
    ) = 1
),

renamed AS (
    SELECT
        id AS order__id,
        user_id AS customer__id,
        created_at AS order__created_at,
        CAST(total AS FLOAT64) AS order__amount_usd,
        status AS order__status,
        CAST(is_deleted AS BOOL) AS order__is_deleted,
        _loaded_at
    FROM deduplicated
)

SELECT
    order__id,
    customer__id,
    order__created_at,
    order__amount_usd,
    order__status,
    order__is_deleted,
    _loaded_at
FROM renamed
WHERE order__is_deleted = FALSE
```

Notice there's no business logic here. I'm not calculating margins or categorizing orders. I'm just making the raw data clean and consistently named.
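Unnesting deserves a quick illustration too. A minimal sketch, assuming a BigQuery GA4-style export where event_params is a repeated field (the source name and columns are illustrative):

```sql
-- Illustrative GA4-style example: flattening a repeated field.
-- UNNEST produces one row per element of event_params. Still no
-- interpretation of the data, just making a nested source queryable.
SELECT
    event_timestamp,
    event_name,
    param.key AS param__key,
    param.value.string_value AS param__string_value
FROM {{ source('ga4', 'events') }},
    UNNEST(event_params) AS param
```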
Base models should be so boring that anyone can understand them in thirty seconds.
Intermediate Layer: Where Business Logic Lives
Intermediate models are where your project’s actual intelligence resides. This is where you join base models together, add calculated fields, and encode business rules. The critical constraint: you never reduce the grain.
What belongs here:
- Joining base models together
- Adding business calculations (margin rates, categorizations)
- Window functions (customer order number, running totals, rankings)
- Derived boolean flags (`order__is_first_order`, `customer__is_high_value`)
What doesn’t belong here:
- Aggregations that reduce row count
- `GROUP BY` as a final output (with one exception, explained below)
The exception on aggregations: sometimes you need to aggregate to create a lookup, then join that lookup back to your main grain. That's fine, because in the end you're not changing the grain of the table.
An example that shows the pattern:
```sql
{{
    config(
        materialized='table',
        tags=['intermediate', 'order']
    )
}}

WITH base__orders AS (
    SELECT
        order__id,
        customer__id,
        order__amount_usd,
        order__created_at,
        order__is_completed,
        channel__name
    FROM {{ ref('base__shopify__orders') }}
),

base__customers AS (
    SELECT
        customer__id,
        customer__email,
        customer__segment
    FROM {{ ref('base__shopify__customers') }}
),

joined AS (
    SELECT
        base__orders.order__id,
        base__orders.customer__id,
        base__orders.order__amount_usd,
        base__orders.order__created_at,
        base__orders.order__is_completed,
        base__orders.channel__name,
        base__customers.customer__email,
        base__customers.customer__segment
    FROM base__orders
    LEFT JOIN base__customers USING (customer__id)
),

enriched AS (
    SELECT
        *,
        order__amount_usd * 0.7 AS order__margin_usd,
        CASE
            WHEN order__amount_usd >= 500 THEN 'high'
            WHEN order__amount_usd >= 100 THEN 'medium'
            ELSE 'low'
        END AS order__value_tier,
        order__amount_usd >= 100 AS order__is_significant
    FROM joined
),

windowed AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY customer__id
            ORDER BY order__created_at
        ) AS customer__order_number,
        ROW_NUMBER() OVER (
            PARTITION BY customer__id
            ORDER BY order__created_at
        ) = 1 AS order__is_first_order
    FROM enriched
)

SELECT
    order__id,
    customer__id,
    order__created_at,
    order__amount_usd,
    order__margin_usd,
    order__value_tier,
    order__is_completed,
    order__is_first_order,
    order__is_significant,
    channel__name,
    customer__email,
    customer__segment,
    customer__order_number
FROM windowed
```

The output has the same number of rows as base__shopify__orders. Every order is still there, now enriched with customer data, calculated fields, and sequence information.
For customer-level intermediate models, you might aggregate order data to bring metrics back to the customer grain. That’s the lookup pattern I mentioned:
```sql
WITH base__customers AS (
    SELECT
        customer__id,
        customer__email
    FROM {{ ref('base__shopify__customers') }}
),

customer_order_summary AS (
    SELECT
        customer__id,
        COUNT(DISTINCT order__id) AS customer__orders,
        SUM(order__amount_usd) AS customer__total_spent_usd,
        MAX(order__created_at) AS customer__last_ordered_at
    FROM {{ ref('int__order__enriched') }}
    WHERE order__is_completed = TRUE
    GROUP BY 1
),

joined AS (
    SELECT
        base__customers.customer__id,
        base__customers.customer__email,
        COALESCE(customer_order_summary.customer__orders, 0) AS customer__orders,
        COALESCE(customer_order_summary.customer__total_spent_usd, 0) AS customer__total_spent_usd,
        customer_order_summary.customer__last_ordered_at
    FROM base__customers
    LEFT JOIN customer_order_summary USING (customer__id)
)

SELECT
    customer__id,
    customer__email,
    customer__orders,
    customer__total_spent_usd,
    customer__last_ordered_at
FROM joined
```

The grain is still one row per customer. The aggregation happened in a CTE to create a lookup, then got joined back to the customer base. This is the one place where GROUP BY appears in intermediate models.
Mart Layer: Built for Consumers
Marts exist for specific use cases. Unlike base and intermediate models, which serve the project internally, marts serve external consumers: dashboards, reverse ETL, ML pipelines.
Each mart has a specific consumer and a defined grain. You’re not building “general purpose” tables. You’re answering specific questions at specific aggregation levels.
The three main patterns:
Reporting marts aggregate to the grain your dashboards need. Daily performance by channel. Weekly cohort retention. Monthly revenue by segment.
Activation marts prepare data for reverse ETL destinations like Braze or Salesforce. Usually one row per entity, with all the attributes the destination system expects.
ML marts create feature tables for machine learning. Wide format, temporal features, one row per prediction target (sketched at the end of this section).
A reporting mart example:
```sql
{{
    config(
        materialized='table',
        partition_by={
            "field": "date_day",
            "data_type": "date",
            "granularity": "day"
        },
        cluster_by=['channel__name'],
        tags=['mart', 'reporting', 'marketing']
    )
}}

WITH int__orders AS (
    SELECT
        order__id,
        order__created_at,
        order__amount_usd,
        customer__id,
        channel__name,
        order__margin_usd,
        order__is_first_order,
        order__is_completed
    FROM {{ ref('int__order__enriched') }}
    WHERE order__is_completed = TRUE
),

aggregated AS (
    SELECT
        DATE(order__created_at) AS date_day,
        channel__name,
        COUNT(DISTINCT order__id) AS orders,
        COUNT(DISTINCT customer__id) AS customers,
        SUM(CASE WHEN order__is_first_order THEN 1 ELSE 0 END) AS new_customers,
        SUM(order__amount_usd) AS revenue_usd,
        SUM(order__margin_usd) AS margin_usd,
        SUM(CASE WHEN order__is_first_order THEN order__amount_usd ELSE 0 END) AS new_customer_revenue_usd
    FROM int__orders
    GROUP BY ALL
),

enriched AS (
    SELECT
        *,
        SAFE_DIVIDE(margin_usd, revenue_usd) AS margin_rate,
        SAFE_DIVIDE(revenue_usd, orders) AS avg_order_value_usd,
        SAFE_DIVIDE(new_customers, customers) AS new_customer_rate
    FROM aggregated
)

SELECT
    date_day,
    channel__name,
    orders,
    customers,
    new_customers,
    revenue_usd,
    margin_usd,
    margin_rate,
    avg_order_value_usd,
    new_customer_revenue_usd,
    new_customer_rate
FROM enriched
```

This is the first time we've seen GROUP BY as the final output. The mart aggregates to daily + channel grain because that's what the marketing dashboard needs. Not order-level, not hourly. Daily by channel.
An activation mart looks different. This one is built for a CRM tool:
```sql
{{
    config(
        materialized='table',
        tags=['mart', 'activation', 'braze']
    )
}}

WITH int__customers AS (
    SELECT
        customer__id,
        customer__email,
        customer__segment,
        customer__orders,
        customer__total_spent_usd,
        customer__last_ordered_at
    FROM {{ ref('int__customer__enriched') }}
),

enriched AS (
    SELECT
        customer__id,
        customer__email,
        customer__segment,
        customer__orders,
        customer__total_spent_usd,
        customer__last_ordered_at,
        CASE
            WHEN customer__orders = 0 THEN 'prospect'
            WHEN customer__orders = 1 THEN 'new'
            WHEN customer__last_ordered_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) THEN 'active'
            WHEN customer__last_ordered_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY) THEN 'lapsing'
            ELSE 'churned'
        END AS customer__lifecycle_stage,
        customer__orders >= 5 AS customer__is_vip,
        customer__total_spent_usd >= 1000 AS customer__is_high_ltv
    FROM int__customers
)

SELECT
    customer__email AS email,
    customer__id AS external_id,
    customer__segment,
    customer__lifecycle_stage AS lifecycle_stage,
    customer__orders AS orders,
    customer__total_spent_usd AS total_spent_usd,
    customer__is_vip AS is_vip,
    customer__is_high_ltv AS is_high_ltv
FROM enriched
```

The column names match what Braze expects. The table exposes only what the destination system needs, nothing more.
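The third pattern, ML marts, follows the same logic. Here's a hedged sketch of a feature table built on int__customer__enriched, one row per customer; the churn framing and the feature__ column names are illustrative assumptions, not a prescription:

```sql
{{
    config(
        materialized='table',
        tags=['mart', 'ml', 'churn']
    )
}}

-- Illustrative sketch: wide feature table, one row per prediction
-- target (the customer). The feature names and the churn use case
-- are assumptions for this example, not from a real project.
WITH int__customers AS (
    SELECT
        customer__id,
        customer__orders,
        customer__total_spent_usd,
        customer__last_ordered_at
    FROM {{ ref('int__customer__enriched') }}
)

SELECT
    customer__id,
    customer__orders AS feature__order_count,
    customer__total_spent_usd AS feature__total_spent_usd,
    SAFE_DIVIDE(customer__total_spent_usd, customer__orders) AS feature__avg_order_value_usd,
    DATE_DIFF(CURRENT_DATE(), CAST(customer__last_ordered_at AS DATE), DAY) AS feature__days_since_last_order
FROM int__customers
```

Same principle as the other marts: a specific consumer (the training pipeline) and a defined grain (one row per customer).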
The Decision Framework
When you’re not sure where something belongs, ask these questions:
Am I just cleaning source data? Base layer. Renaming, casting, deduplicating, unnesting. No interpretation.
Am I combining tables or adding calculations while keeping every row? Intermediate layer. Joins, business logic, window functions. Preserve the grain.
Am I reducing rows for a specific consumer? Mart layer. Aggregations, final formatting, consumer-specific column names.
A more detailed reference:
| Task | Layer |
|---|---|
| Rename columns from source | Base |
| Cast data types | Base |
| Unnest nested/repeated fields | Base |
| Deduplicate source data | Base |
| Join tables together | Intermediate |
| Add business logic/calculations | Intermediate |
| Window functions (rankings, running totals) | Intermediate |
| Aggregate to reporting grain | Mart |
| Build for specific consumer | Mart |
Common Mistakes I See
Business logic in base models. Calculating is_high_value or customer_tier in base. This belongs in intermediate. Base models should be simple.
Aggregations in intermediate. Building an int__daily_orders that groups by date. This belongs in a mart. Intermediate preserves grain.
Marts that don’t aggregate. A mart that’s just SELECT * FROM int__order__enriched with a different name. If you’re not changing the grain or formatting for a consumer, you probably don’t need a mart.
One giant intermediate model. Joining fifteen tables in a single intermediate model. Build intermediate models around business entities (order, customer, product), not around “all the data I might need.”
Unclear ownership. Two intermediate models that both calculate customer__lifetime_value differently. Pick one source of truth. Reference it everywhere.
Start Simple
You don’t need all three layers from day one. If you have a small project with straightforward sources, start with base and marts. Add intermediate models when you find yourself duplicating joins or calculations across multiple marts.
The pattern scales up as your project grows. A mature project might have:
- 30 base models (one per source table)
- 10 intermediate models (one per business entity)
- 20 marts (various reporting, activation, and ML use cases)
The ratios don’t matter. What matters is that each layer has clear responsibilities, and everyone on your team knows where to find things.
When someone asks “where does customer lifetime value get calculated?” the answer should be obvious: int__customer__enriched. Not “somewhere in the marts” or “I think there are a few different versions.”
That predictability is the whole point.