Adrienne Vermorel

Base, Intermediate, Marts: When to Use Each Layer

Most dbt projects I audit have the same problem. Not missing tests or poor documentation (though those exist too). The real issue is blurred responsibilities between layers. Business logic scattered in base models. Aggregations happening in intermediate. Marts that are just renamed intermediates.

The three-layer pattern exists for a reason. Each layer has a job. When you respect those boundaries, your project becomes predictable. When you don’t, you end up with a codebase where nobody knows where to look for anything.

The Quick Reference

Before the nuance, here’s the summary:

Layer        | Prefix | What It Does                       | Aggregation
Base         | base__ | Rename, cast, light cleanup        | Never
Intermediate | int__  | Joins, business logic, full grain  | Never
Mart         | mrt__  | Aggregate for consumption          | Yes

Base models clean raw data without changing its meaning. Intermediate models combine and enrich that data while preserving every row. Marts aggregate to the grain your consumers actually need.

Base Layer: Keep It Boring

Base models are the least interesting part of your project, and that’s the point. Their job is to make raw source data usable without adding interpretation.

What belongs here:

  • Renaming columns to your naming standards (for example, user_id becomes customer__id)
  • Casting data types (strings to timestamps, integers to booleans)
  • Deduplicating when sources have duplicate records
  • Unnesting nested or repeated fields (common with GA4, JSON payloads; see the sketch after this list)
  • Filtering out test data or invalid records
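
For the unnesting case, here’s a minimal sketch, assuming a GA4-style events table where event_params is a repeated field (source and column names are illustrative):

SELECT
    event_date,
    event_name,
    param.key AS param__key,
    param.value.string_value AS param__string_value
-- UNNEST produces one output row per (event, param) pair
FROM {{ source('ga4', 'events') }},
UNNEST(event_params) AS param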

What doesn’t belong here:

  • Joins (with one exception I’ll explain)
  • Business logic or calculations
  • Aggregations of any kind

The exception on joins: some ETL tools split what’s logically one API endpoint into multiple tables. If your source has orders and order_metadata that should always have been one table, joining them in base is fine. You’re correcting an ETL artifact, not adding business logic.
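
A minimal sketch of that exception (table and column names are illustrative):

WITH orders AS (

    SELECT
        id,
        user_id,
        created_at,
        total
    FROM {{ source('shopify', 'orders') }}

),

order_metadata AS (

    SELECT
        order_id,
        channel,
        device
    FROM {{ source('shopify', 'order_metadata') }}

)

-- left join on the primary key: the grain stays one row per order
SELECT
    orders.id,
    orders.user_id,
    orders.created_at,
    orders.total,
    order_metadata.channel,
    order_metadata.device
FROM orders
LEFT JOIN order_metadata
    ON orders.id = order_metadata.order_id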

A typical base model looks like this:

{{ config(
    materialized='table',
    tags=['base', 'shopify']
) }}

WITH source AS (

    SELECT
        id,
        user_id,
        created_at,
        total,
        status,
        is_deleted,
        _loaded_at
    FROM {{ source('shopify', 'orders') }}

),

deduplicated AS (

    SELECT
        id,
        user_id,
        created_at,
        total,
        status,
        is_deleted,
        _loaded_at
    FROM source
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY id
        ORDER BY _loaded_at DESC
    ) = 1

),

renamed AS (

    SELECT
        id AS order__id,
        user_id AS customer__id,
        created_at AS order__created_at,
        CAST(total AS FLOAT64) AS order__amount_usd,
        status AS order__status,
        CAST(is_deleted AS BOOL) AS order__is_deleted,
        _loaded_at
    FROM deduplicated

)

SELECT
    order__id,
    customer__id,
    order__created_at,
    order__amount_usd,
    order__status,
    order__is_deleted,
    _loaded_at
FROM renamed
WHERE order__is_deleted = FALSE

Notice there’s no business logic here. I’m not calculating margins or categorizing orders. I’m just making the raw data clean and consistently named.

Base models should be so boring that anyone can understand them in thirty seconds.

Intermediate Layer: Where Business Logic Lives

Intermediate models are where your project’s actual intelligence resides. This is where you join base models together, add calculated fields, and encode business rules. The critical constraint: you never reduce the grain.

What belongs here:

  • Joining base models together
  • Adding business calculations (margin rates, categorizations)
  • Window functions (customer order number, running totals, rankings)
  • Derived boolean flags (order__is_first_order, customer__is_high_value)

What doesn’t belong here:

  • Aggregations that reduce row count
  • GROUP BY as a final output (with one exception, explained below)

The exception on aggregations: sometimes you need to aggregate to create a lookup, then join that lookup back to your main grain. That’s fine: the aggregation happens along the way, but the final output keeps the grain of the table.

An example that shows the pattern:

{{ config(
    materialized='table',
    tags=['intermediate', 'order']
) }}

WITH base__orders AS (

    SELECT
        order__id,
        customer__id,
        order__amount_usd,
        order__created_at,
        -- assumes the base model also exposes these two columns
        -- (the abridged base example above doesn't show them)
        order__is_completed,
        channel__name
    FROM {{ ref('base__shopify__orders') }}

),

base__customers AS (

    SELECT
        customer__id,
        customer__email,
        customer__segment
    FROM {{ ref('base__shopify__customers') }}

),

joined AS (

    SELECT
        base__orders.order__id,
        base__orders.customer__id,
        base__orders.order__amount_usd,
        base__orders.order__created_at,
        base__orders.order__is_completed,
        base__orders.channel__name,
        base__customers.customer__email,
        base__customers.customer__segment
    FROM base__orders
    LEFT JOIN base__customers USING (customer__id)

),

enriched AS (

    SELECT
        *,
        order__amount_usd * 0.7 AS order__margin_usd,
        CASE
            WHEN order__amount_usd >= 500 THEN 'high'
            WHEN order__amount_usd >= 100 THEN 'medium'
            ELSE 'low'
        END AS order__value_tier,
        order__amount_usd >= 100 AS order__is_significant
    FROM joined

),

windowed AS (

    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY customer__id
            ORDER BY order__created_at
        ) AS customer__order_number,
        ROW_NUMBER() OVER (
            PARTITION BY customer__id
            ORDER BY order__created_at
        ) = 1 AS order__is_first_order
    FROM enriched

)

SELECT
    order__id,
    customer__id,
    order__created_at,
    order__amount_usd,
    order__margin_usd,
    order__value_tier,
    order__is_completed,
    order__is_first_order,
    order__is_significant,
    channel__name,
    customer__email,
    customer__segment,
    customer__order_number
FROM windowed

The output has the same number of rows as base__shopify__orders. Every order is still there, now enriched with customer data, calculated fields, and sequence information.
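
If you want to enforce that constraint, a grain check works well as a singular dbt test: a query that returns rows, and therefore fails, whenever order__id stops being unique. A minimal sketch, which would live in the tests/ directory:

-- fails if int__order__enriched ever has more than one row per order
SELECT
    order__id,
    COUNT(*) AS row_count
FROM {{ ref('int__order__enriched') }}
GROUP BY 1
HAVING COUNT(*) > 1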

For customer-level intermediate models, you might aggregate order data to bring metrics back to the customer grain. That’s the lookup pattern I mentioned:

WITH base__customers AS (

    SELECT
        customer__id,
        customer__email
    FROM {{ ref('base__shopify__customers') }}

),

customer_order_summary AS (

    SELECT
        customer__id,
        COUNT(DISTINCT order__id) AS customer__orders,
        SUM(order__amount_usd) AS customer__total_spent_usd,
        MAX(order__created_at) AS customer__last_ordered_at
    FROM {{ ref('int__order__enriched') }}
    WHERE order__is_completed = TRUE
    GROUP BY 1

),

joined AS (

    SELECT
        base__customers.customer__id,
        base__customers.customer__email,
        COALESCE(customer_order_summary.customer__orders, 0) AS customer__orders,
        COALESCE(customer_order_summary.customer__total_spent_usd, 0) AS customer__total_spent_usd,
        customer_order_summary.customer__last_ordered_at
    FROM base__customers
    LEFT JOIN customer_order_summary USING (customer__id)

)

SELECT
    customer__id,
    customer__email,
    customer__orders,
    customer__total_spent_usd,
    customer__last_ordered_at
FROM joined

The grain is still one row per customer. The aggregation happened in a CTE to create a lookup, then got joined back to the customer base. This is the one place where GROUP BY appears in intermediate models.

Mart Layer: Built for Consumers

Marts exist for specific use cases. Unlike base and intermediate models, which serve the project internally, marts serve external consumers: dashboards, reverse ETL, ML pipelines.

Each mart has a specific consumer and a defined grain. You’re not building “general purpose” tables. You’re answering specific questions at specific aggregation levels.

The three main patterns:

Reporting marts aggregate to the grain your dashboards need. Daily performance by channel. Weekly cohort retention. Monthly revenue by segment.

Activation marts prepare data for reverse ETL destinations like Braze or Salesforce. Usually one row per entity, with all the attributes the destination system expects.

ML marts create feature tables for machine learning. Wide format, temporal features, one row per prediction target.
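
A minimal sketch of that last pattern, assuming the int__customer__enriched model from above and a hypothetical repeat-purchase label:

SELECT
    customer__id,
    customer__orders,
    customer__total_spent_usd,
    -- temporal feature; for training you'd compute this as of the prediction date
    DATE_DIFF(CURRENT_DATE(), DATE(customer__last_ordered_at), DAY) AS customer__days_since_last_order,
    -- hypothetical label for a repeat-purchase model
    customer__orders >= 2 AS label__is_repeat_customer
FROM {{ ref('int__customer__enriched') }}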

A reporting mart example:

{{ config(
    materialized='table',
    partition_by={
        "field": "date_day",
        "data_type": "date",
        "granularity": "day"
    },
    cluster_by=['channel__name'],
    tags=['mart', 'reporting', 'marketing']
) }}

WITH int__orders AS (

    SELECT
        order__id,
        order__created_at,
        order__amount_usd,
        customer__id,
        channel__name,
        order__margin_usd,
        order__is_first_order,
        order__is_completed
    FROM {{ ref('int__order__enriched') }}
    WHERE order__is_completed = TRUE

),

aggregated AS (

    SELECT
        DATE(order__created_at) AS date_day,
        channel__name,
        COUNT(DISTINCT order__id) AS orders,
        COUNT(DISTINCT customer__id) AS customers,
        SUM(CASE WHEN order__is_first_order THEN 1 ELSE 0 END) AS new_customers,
        SUM(order__amount_usd) AS revenue_usd,
        SUM(order__margin_usd) AS margin_usd,
        SUM(CASE WHEN order__is_first_order THEN order__amount_usd ELSE 0 END) AS new_customer_revenue_usd
    FROM int__orders
    GROUP BY ALL

),

enriched AS (

    SELECT
        *,
        SAFE_DIVIDE(margin_usd, revenue_usd) AS margin_rate,
        SAFE_DIVIDE(revenue_usd, orders) AS avg_order_value_usd,
        SAFE_DIVIDE(new_customers, customers) AS new_customer_rate
    FROM aggregated

)

SELECT
    date_day,
    channel__name,
    orders,
    customers,
    new_customers,
    revenue_usd,
    margin_usd,
    margin_rate,
    avg_order_value_usd,
    new_customer_revenue_usd,
    new_customer_rate
FROM enriched

This is the first time we’ve seen GROUP BY as the final output. The mart aggregates to daily + channel grain because that’s what the marketing dashboard needs. Not order-level, not hourly. Daily by channel.

An activation mart looks different. This one is built for a CRM tool:

{{ config(
    materialized='table',
    tags=['mart', 'activation', 'braze']
) }}

WITH int__customers AS (

    SELECT
        customer__id,
        customer__email,
        customer__segment,
        customer__orders,
        customer__total_spent_usd,
        customer__last_ordered_at
    FROM {{ ref('int__customer__enriched') }}

),

enriched AS (

    SELECT
        customer__id,
        customer__email,
        customer__segment,
        customer__orders,
        customer__total_spent_usd,
        customer__last_ordered_at,
        CASE
            WHEN customer__orders = 0 THEN 'prospect'
            WHEN customer__orders = 1 THEN 'new'
            -- normalize to DATE so the comparison works whether
            -- customer__last_ordered_at is a DATE or a TIMESTAMP
            WHEN DATE(customer__last_ordered_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) THEN 'active'
            WHEN DATE(customer__last_ordered_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY) THEN 'lapsing'
            ELSE 'churned'
        END AS customer__lifecycle_stage,
        customer__orders >= 5 AS customer__is_vip,
        customer__total_spent_usd >= 1000 AS customer__is_high_ltv
    FROM int__customers

)

SELECT
    customer__email AS email,
    customer__id AS external_id,
    customer__segment,
    customer__lifecycle_stage AS lifecycle_stage,
    customer__orders AS orders,
    customer__total_spent_usd AS total_spent_usd,
    customer__is_vip AS is_vip,
    customer__is_high_ltv AS is_high_ltv
FROM enriched

The column names match what Braze expects. The table exposes only what the destination system needs, nothing more.

The Decision Framework

When you’re not sure where something belongs, ask these questions:

Am I just cleaning source data? Base layer. Renaming, casting, deduplicating, unnesting. No interpretation.

Am I combining tables or adding calculations while keeping every row? Intermediate layer. Joins, business logic, window functions. Preserve the grain.

Am I reducing rows for a specific consumer? Mart layer. Aggregations, final formatting, consumer-specific column names.

A more detailed reference:

Task                                        | Layer
Rename columns from source                  | Base
Cast data types                             | Base
Unnest nested/repeated fields               | Base
Deduplicate source data                     | Base
Join tables together                        | Intermediate
Add business logic/calculations             | Intermediate
Window functions (rankings, running totals) | Intermediate
Aggregate to reporting grain                | Mart
Build for specific consumer                 | Mart

Common Mistakes I See

Business logic in base models. Calculating is_high_value or customer_tier in base. This belongs in intermediate. Base models should be simple.

Aggregations in intermediate. Building an int__daily_orders that groups by date. This belongs in a mart. Intermediate preserves grain.

Marts that don’t aggregate. A mart that’s just SELECT * FROM int__order__enriched with a different name. If you’re not changing the grain or formatting for a consumer, you probably don’t need a mart.

One giant intermediate model. Joining fifteen tables in a single intermediate model. Build intermediate models around business entities (order, customer, product), not around “all the data I might need.”

Unclear ownership. Two intermediate models that both calculate customer__lifetime_value differently. Pick one source of truth. Reference it everywhere.

Start Simple

You don’t need all three layers from day one. If you have a small project with straightforward sources, start with base and marts. Add intermediate models when you find yourself duplicating joins or calculations across multiple marts.

The pattern scales up as your project grows. A mature project might have:

  • 30 base models (one per source table)
  • 10 intermediate models (one per business entity)
  • 20 marts (various reporting, activation, and ML use cases)
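
On disk, that could look like the following layout (directory and file names are hypothetical, matching the prefixes above):

models/
  base/
    base__shopify__orders.sql
    base__shopify__customers.sql
    ...
  intermediate/
    int__order__enriched.sql
    int__customer__enriched.sql
    ...
  marts/
    mrt__daily_channel_performance.sql
    mrt__braze_customers.sql
    ...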

The ratios don’t matter. What matters is that each layer has clear responsibilities, and everyone on your team knows where to find things.

When someone asks “where does customer lifetime value get calculated?” the answer should be obvious: int__customer__enriched. Not “somewhere in the marts” or “I think there are a few different versions.”

That predictability is the whole point.