Adrienne Vermorel
Base, Intermediate, Marts: When to Use Each Layer
Most dbt projects I audit have the same problem. Not missing tests or poor documentation (though those exist too). The real issue is blurred responsibilities between layers. Business logic scattered in base models. Aggregations happening in intermediate. Marts that are just renamed intermediates.
The three-layer pattern exists for a reason. Each layer has a job. When you respect those boundaries, your project becomes predictable. When you don’t, you end up with a codebase where nobody knows where to look for anything.
The Quick Reference
Before the nuance, here’s the summary:
| Layer | Prefix | What It Does | Aggregation |
|---|---|---|---|
| Base | base__ | Rename, cast, light cleanup | Never |
| Intermediate | int__ | Joins, business logic, full grain | Never |
| Mart | mrt__ | Aggregate for consumption | Yes |
Base models clean raw data without changing its meaning. Intermediate models combine and enrich that data while preserving every row. Marts aggregate to the grain your consumers actually need.
Base Layer: Keep It Boring
Base models are the least interesting part of your project, and that’s the point. Their job is to make raw source data usable without adding interpretation.
What belongs here:
- Renaming columns to your naming standards (for example, `user_id` becomes `customer__id`)
- Casting data types (strings to timestamps, integers to booleans)
- Deduplicating when sources have duplicate records
- Unnesting nested or repeated fields (common with GA4, JSON payloads; see the sketch after the base model below)
- Filtering out test data or invalid records
What doesn’t belong here:
- Joins (with one exception I’ll explain)
- Business logic or calculations
- Aggregations of any kind
The exception on joins: some ETL tools split what’s logically one API endpoint into multiple tables. If your source has orders and order_metadata that should have always been one table, joining them in base is fine. You’re correcting an ETL artifact, not adding business logic.
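To make that concrete, here's a minimal sketch of that kind of base-layer join. The order_metadata table and its columns are hypothetical, standing in for whatever your ETL tool split off:

```sql
-- Hypothetical sketch: stitching back together what the ETL tool split.
-- Assumes shopify.orders and shopify.order_metadata share the same id
-- and are logically one API object. This is repair, not business logic.
WITH orders AS (
    SELECT id, user_id, created_at, total
    FROM {{ source('shopify', 'orders') }}
),

order_metadata AS (
    SELECT id, channel, device_type
    FROM {{ source('shopify', 'order_metadata') }}
)

SELECT
    orders.id,
    orders.user_id,
    orders.created_at,
    orders.total,
    order_metadata.channel,
    order_metadata.device_type
FROM orders
LEFT JOIN order_metadata USING (id)
```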
A typical base model looks like this:
```sql
{{
    config(
        materialized='table',
        tags=['base', 'shopify']
    )
}}

WITH source AS (
    SELECT
        id,
        user_id,
        created_at,
        total,
        status,
        is_deleted,
        _loaded_at
    FROM {{ source('shopify', 'orders') }}
),

deduplicated AS (
    SELECT
        id,
        user_id,
        created_at,
        total,
        status,
        is_deleted,
        _loaded_at
    FROM source
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY id
        ORDER BY _loaded_at DESC
    ) = 1
),

renamed AS (
    SELECT
        id AS order__id,
        user_id AS customer__id,
        created_at AS order__created_at,
        CAST(total AS FLOAT64) AS order__amount_usd,
        status AS order__status,
        CAST(is_deleted AS BOOL) AS order__is_deleted,
        _loaded_at
    FROM deduplicated
)

SELECT
    order__id,
    customer__id,
    order__created_at,
    order__amount_usd,
    order__status,
    order__is_deleted,
    _loaded_at
FROM renamed
WHERE order__is_deleted = FALSE
```

Notice there's no business logic here. I'm not calculating margins or categorizing orders. I'm just making the raw data clean and consistently named.
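Unnesting deserves a quick illustration too. A minimal sketch, assuming a BigQuery GA4-style export where event_params is a repeated field (the source name and columns are illustrative):

```sql
-- Illustrative GA4-style example: flattening a repeated field.
-- UNNEST produces one row per element of event_params. Still no
-- interpretation of the data, just making a nested source queryable.
SELECT
    event_timestamp,
    event_name,
    param.key AS param__key,
    param.value.string_value AS param__string_value
FROM {{ source('ga4', 'events') }},
    UNNEST(event_params) AS param
```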
Base models should be so boring that anyone can understand them in thirty seconds.
Intermediate Layer: Where Business Logic Lives
Intermediate models are where your project’s actual intelligence resides. This is where you join base models together, add calculated fields, and encode business rules. The critical constraint: you never reduce the grain.
What belongs here:
- Joining base models together
- Adding business calculations (margin rates, categorizations)
- Window functions (customer order number, running totals, rankings)
- Derived boolean flags (`order__is_first_order`, `customer__is_high_value`)
What doesn’t belong here:
- Aggregations that reduce row count
- `GROUP BY` as a final output (with one exception, explained below)
The exception on aggregations: sometimes you need to aggregate to create a lookup, then join that lookup back to your main grain. That's fine, because in the end you're not changing the grain of the table.
An example that shows the pattern:
```sql
{{
    config(
        materialized='table',
        tags=['intermediate', 'order']
    )
}}

WITH base__orders AS (
    SELECT
        order__id,
        customer__id,
        order__amount_usd,
        order__created_at,
        order__is_completed,
        channel__name
    FROM {{ ref('base__shopify__orders') }}
),

base__customers AS (
    SELECT
        customer__id,
        customer__email,
        customer__segment
    FROM {{ ref('base__shopify__customers') }}
),

joined AS (
    SELECT
        base__orders.order__id,
        base__orders.customer__id,
        base__orders.order__amount_usd,
        base__orders.order__created_at,
        base__orders.order__is_completed,
        base__orders.channel__name,
        base__customers.customer__email,
        base__customers.customer__segment
    FROM base__orders
    LEFT JOIN base__customers USING (customer__id)
),

enriched AS (
    SELECT
        *,
        order__amount_usd * 0.7 AS order__margin_usd,
        CASE
            WHEN order__amount_usd >= 500 THEN 'high'
            WHEN order__amount_usd >= 100 THEN 'medium'
            ELSE 'low'
        END AS order__value_tier,
        order__amount_usd >= 100 AS order__is_significant
    FROM joined
),

windowed AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY customer__id
            ORDER BY order__created_at
        ) AS customer__order_number,
        ROW_NUMBER() OVER (
            PARTITION BY customer__id
            ORDER BY order__created_at
        ) = 1 AS order__is_first_order
    FROM enriched
)

SELECT
    order__id,
    customer__id,
    order__created_at,
    order__amount_usd,
    order__margin_usd,
    order__value_tier,
    order__is_completed,
    order__is_first_order,
    order__is_significant,
    channel__name,
    customer__email,
    customer__segment,
    customer__order_number
FROM windowed
```

The output has the same number of rows as base__shopify__orders. Every order is still there, now enriched with customer data, calculated fields, and sequence information.
For customer-level intermediate models, you might aggregate order data to bring metrics back to the customer grain. That’s the lookup pattern I mentioned:
```sql
WITH base__customers AS (
    SELECT
        customer__id,
        customer__email
    FROM {{ ref('base__shopify__customers') }}
),

customer_order_summary AS (
    SELECT
        customer__id,
        COUNT(DISTINCT order__id) AS customer__orders,
        SUM(order__amount_usd) AS customer__total_spent_usd,
        MAX(order__created_at) AS customer__last_ordered_at
    FROM {{ ref('int__order__enriched') }}
    WHERE order__is_completed = TRUE
    GROUP BY 1
),

joined AS (
    SELECT
        base__customers.customer__id,
        base__customers.customer__email,
        COALESCE(customer_order_summary.customer__orders, 0) AS customer__orders,
        COALESCE(customer_order_summary.customer__total_spent_usd, 0) AS customer__total_spent_usd,
        customer_order_summary.customer__last_ordered_at
    FROM base__customers
    LEFT JOIN customer_order_summary USING (customer__id)
)

SELECT
    customer__id,
    customer__email,
    customer__orders,
    customer__total_spent_usd,
    customer__last_ordered_at
FROM joined
```

The grain is still one row per customer. The aggregation happened in a CTE to create a lookup, then got joined back to the customer base. This is the one place where GROUP BY appears in intermediate models.
Mart Layer: Built for Consumers
Marts exist for specific use cases. Unlike base and intermediate models, which serve the project internally, marts serve external consumers: dashboards, reverse ETL, ML pipelines.
Each mart has a specific consumer and a defined grain. You’re not building “general purpose” tables. You’re answering specific questions at specific aggregation levels.
The three main patterns:
Reporting marts aggregate to the grain your dashboards need. Daily performance by channel. Weekly cohort retention. Monthly revenue by segment.
Activation marts prepare data for reverse ETL destinations like Braze or Salesforce. Usually one row per entity, with all the attributes the destination system expects.
ML marts create feature tables for machine learning. Wide format, temporal features, one row per prediction target (sketched at the end of this section).
A reporting mart example:
```sql
{{
    config(
        materialized='table',
        partition_by={
            "field": "date_day",
            "data_type": "date",
            "granularity": "day"
        },
        cluster_by=['channel__name'],
        tags=['mart', 'reporting', 'marketing']
    )
}}

WITH int__orders AS (
    SELECT
        order__id,
        order__created_at,
        order__amount_usd,
        customer__id,
        channel__name,
        order__margin_usd,
        order__is_first_order,
        order__is_completed
    FROM {{ ref('int__order__enriched') }}
    WHERE order__is_completed = TRUE
),

aggregated AS (
    SELECT
        DATE(order__created_at) AS date_day,
        channel__name,
        COUNT(DISTINCT order__id) AS orders,
        COUNT(DISTINCT customer__id) AS customers,
        SUM(CASE WHEN order__is_first_order THEN 1 ELSE 0 END) AS new_customers,
        SUM(order__amount_usd) AS revenue_usd,
        SUM(order__margin_usd) AS margin_usd,
        SUM(CASE WHEN order__is_first_order THEN order__amount_usd ELSE 0 END) AS new_customer_revenue_usd
    FROM int__orders
    GROUP BY ALL
),

enriched AS (
    SELECT
        *,
        SAFE_DIVIDE(margin_usd, revenue_usd) AS margin_rate,
        SAFE_DIVIDE(revenue_usd, orders) AS avg_order_value_usd,
        SAFE_DIVIDE(new_customers, customers) AS new_customer_rate
    FROM aggregated
)

SELECT
    date_day,
    channel__name,
    orders,
    customers,
    new_customers,
    revenue_usd,
    margin_usd,
    margin_rate,
    avg_order_value_usd,
    new_customer_revenue_usd,
    new_customer_rate
FROM enriched
```

This is the first time we've seen GROUP BY as the final output. The mart aggregates to daily + channel grain because that's what the marketing dashboard needs. Not order-level, not hourly. Daily by channel.
An activation mart looks different. This one is built for a CRM tool:
```sql
{{
    config(
        materialized='table',
        tags=['mart', 'activation', 'braze']
    )
}}

WITH int__customers AS (
    SELECT
        customer__id,
        customer__email,
        customer__segment,
        customer__orders,
        customer__total_spent_usd,
        customer__last_ordered_at
    FROM {{ ref('int__customer__enriched') }}
),

enriched AS (
    SELECT
        customer__id,
        customer__email,
        customer__segment,
        customer__orders,
        customer__total_spent_usd,
        customer__last_ordered_at,
        CASE
            WHEN customer__orders = 0 THEN 'prospect'
            WHEN customer__orders = 1 THEN 'new'
            WHEN customer__last_ordered_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) THEN 'active'
            WHEN customer__last_ordered_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY) THEN 'lapsing'
            ELSE 'churned'
        END AS customer__lifecycle_stage,
        customer__orders >= 5 AS customer__is_vip,
        customer__total_spent_usd >= 1000 AS customer__is_high_ltv
    FROM int__customers
)

SELECT
    customer__email AS email,
    customer__id AS external_id,
    customer__segment,
    customer__lifecycle_stage AS lifecycle_stage,
    customer__orders AS orders,
    customer__total_spent_usd AS total_spent_usd,
    customer__is_vip AS is_vip,
    customer__is_high_ltv AS is_high_ltv
FROM enriched
```

The column names match what Braze expects. The table exposes only what the destination system needs, nothing more.
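The third pattern, ML marts, follows the same logic. Here's a hedged sketch of a feature table built on int__customer__enriched, one row per customer; the churn framing and the feature__ column names are illustrative assumptions, not a prescription:

```sql
{{
    config(
        materialized='table',
        tags=['mart', 'ml', 'churn']
    )
}}

-- Illustrative sketch: wide feature table, one row per prediction
-- target (the customer). The feature names and the churn use case
-- are assumptions for this example, not from a real project.
WITH int__customers AS (
    SELECT
        customer__id,
        customer__orders,
        customer__total_spent_usd,
        customer__last_ordered_at
    FROM {{ ref('int__customer__enriched') }}
)

SELECT
    customer__id,
    customer__orders AS feature__order_count,
    customer__total_spent_usd AS feature__total_spent_usd,
    SAFE_DIVIDE(customer__total_spent_usd, customer__orders) AS feature__avg_order_value_usd,
    DATE_DIFF(CURRENT_DATE(), CAST(customer__last_ordered_at AS DATE), DAY) AS feature__days_since_last_order
FROM int__customers
```

Same principle as the other marts: a specific consumer (the training pipeline) and a defined grain (one row per customer).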
The Decision Framework
When you’re not sure where something belongs, ask these questions:
Am I just cleaning source data? Base layer. Renaming, casting, deduplicating, unnesting. No interpretation.
Am I combining tables or adding calculations while keeping every row? Intermediate layer. Joins, business logic, window functions. Preserve the grain.
Am I reducing rows for a specific consumer? Mart layer. Aggregations, final formatting, consumer-specific column names.
A more detailed reference:
| Task | Layer |
|---|---|
| Rename columns from source | Base |
| Cast data types | Base |
| Unnest nested/repeated fields | Base |
| Deduplicate source data | Base |
| Join tables together | Intermediate |
| Add business logic/calculations | Intermediate |
| Window functions (rankings, running totals) | Intermediate |
| Aggregate to reporting grain | Mart |
| Build for specific consumer | Mart |
Common Mistakes I See
Business logic in base models. Calculating is_high_value or customer_tier in base. This belongs in intermediate. Base models should be simple.
Aggregations in intermediate. Building an int__daily_orders that groups by date. This belongs in a mart. Intermediate preserves grain.
Marts that don’t aggregate. A mart that’s just SELECT * FROM int__order__enriched with a different name. If you’re not changing the grain or formatting for a consumer, you probably don’t need a mart.
One giant intermediate model. Joining fifteen tables in a single intermediate model. Build intermediate models around business entities (order, customer, product), not around “all the data I might need.”
Unclear ownership. Two intermediate models that both calculate customer__lifetime_value differently. Pick one source of truth. Reference it everywhere.
Start Simple
You don’t need all three layers from day one. If you have a small project with straightforward sources, start with base and marts. Add intermediate models when you find yourself duplicating joins or calculations across multiple marts.
The pattern scales up as your project grows. A mature project might have:
- 30 base models (one per source table)
- 10 intermediate models (one per business entity)
- 20 marts (various reporting, activation, and ML use cases)
The ratios don’t matter. What matters is that each layer has clear responsibilities, and everyone on your team knows where to find things.
When someone asks “where does customer lifetime value get calculated?” the answer should be obvious: int__customer__enriched. Not “somewhere in the marts” or “I think there are a few different versions.”
That predictability is the whole point.