The three-layer pattern—base, intermediate, marts—organizes a dbt project so each layer has a defined job with clear boundaries. Each layer has a specific job. Blurred boundaries lead to business logic in base models, aggregations in intermediate layers, and marts that duplicate each other.
The Three Layers at a Glance
Base: Clean raw data. Rename columns, cast types, handle source quirks. No business logic, no joins.
Intermediate: Combine and enrich. Join base models together, add calculations, apply business rules. Preserve every row—never aggregate here.
Marts: Aggregate for consumption. Build tables for specific use cases: dashboards, reverse ETL, ML models. This is where GROUP BY belongs.
Think of it as a pipeline: source data → cleaned foundation → enriched entities → ready-to-consume tables.
Base Layer: The Foundation
Base models make raw source data usable without interpreting it.
What belongs in base models:
- Renaming columns to consistent naming standards
- Casting data types (string timestamps to proper timestamps, integers to booleans)
- Handling source-specific quirks: unnesting nested fields, deduplicating on primary key
- Filtering out data you never want (test records, soft-deleted rows)
- Unit conversions (milliseconds to seconds, cents to dollars)
What doesn’t belong:
- Joins (with one narrow exception: if your source split what’s logically one table into multiple—like orders and order_metadata—correcting this in base is acceptable)
- Business logic or calculations
- Any form of aggregation
Base models should be purely mechanical: someone unfamiliar with the business domain should be able to read the SQL and understand the transformation quickly.
A typical base model for an e-commerce source might rename user_id to customer__id, cast created_at to a proper timestamp, deduplicate on order ID, and filter out test orders. It doesn’t calculate margins, categorize orders, or flag high-value customers. Those are intermediate concerns.
Why This Matters
Base models act as a contract between your dbt project and the raw data. When a source schema changes, only base models break—not everything downstream. When you need to debug a calculation, tracing it back to base is straightforward because base is trustworthy.
Intermediate Layer
Intermediate models join entities, encode business logic, and add calculated fields.
The critical constraint: you never reduce the grain. If you’re building an order-enriched model, every row remains one order. If you’re building a session-enriched model, every row remains one event or session. You add columns, never fewer rows.
What belongs here:
- Joining base models to create enriched entities
- Sessionization logic (grouping events into cohesive units)
- Business calculations (margin rates, customer tiers, recency flags)
- Window functions (sequence numbers, running totals, rankings)
- Complex deduplication logic (merging customer records from CRM and billing)
What doesn’t belong:
- Aggregations that reduce row count as a final output (with one exception: lookup tables built temporarily to enrich back to the original grain)
Consider an order example: you join base__orders with base__customers, add a margin_usd calculated field, assign each order a customer__order_number using window functions, and compute order__is_first_order. The result has the same number of rows as the orders base model—it’s just wider and smarter.
For aggregation-heavy intermediate models, the one exception applies. Imagine you need customer-level metrics (order count, total spent) to enrich an order-grain model. You’d build a CTE that aggregates orders to customer level, then LEFT JOIN it back to the orders table. The final output is still one row per order; the aggregation was just an intermediate step.
Entity-Centric Organization
Intermediate models should represent clean, well-defined business entities at a specific grain, not transformations. This is a critical difference. An intermediate model named int__orders__enriched is better than int__orders__with_customer_data_pivoted. The name should tell you what entity and grain you’re getting, not the SQL gymnastics used to build it.
When you have multiple intermediate models joining different entities, self-documenting names help. int__customer__customer_lj_order clearly states: this is a customer-grain model with a LEFT JOIN to orders. The name tells you the grain and join pattern without opening the SQL file.
Mart Layer: Built for Consumption
Marts are the last stop. Unlike base (organized by source system) and intermediate (organized by entity), marts are organized by use case and consumer. A mart aggregates to the grain a specific team actually needs: daily performance by channel, one row per customer with all attributes for CRM sync, feature columns for ML models.
What belongs in marts:
- Aggregations to a specific reporting grain
- Final business metrics and calculated rates (conversion rate, revenue per session)
- Consumer-specific formatting and column naming
- Denormalized wide tables ready for dashboards
The key difference: marts are built for someone. Not general-purpose. Each mart answers specific questions at specific aggregation levels. One mart might aggregate daily × channel. Another might aggregate yearly by region. Another might be one row per customer for reverse ETL.
A reporting mart example: you aggregate sessionized GA4 events to daily × source × medium grain, calculating daily session count, user count, conversions, and revenue per session. This is dashboard-ready.
An activation mart (for CRM sync) is completely different: one row per customer with all attributes the destination system expects, formatted to match their exact schema.
The aggregation happens here because this is where you’re reducing the data to what consumers actually need.
Data Flow and Why It Matters
Data flows source → base → intermediate → marts.
This lineage has implications:
-
Base models are stable. They’re close to source truth. Marts depend on them indirectly, but a break in base doesn’t immediately break all marts because intermediate acts as a buffer.
-
Intermediate models are reusable. Multiple marts can reference the same intermediate. If you need customer metrics in both a dashboard mart and an activation mart, both reference
int__customer__enriched. Change it once, both marts see it. This is why intermediate exists. -
Marts are specialized. Because they’re use-case-specific, they can be aggressive about formatting, aggregation, and column selection. They’re not trying to serve everyone; they’re trying to solve one problem really well.
This structure scales. A small project might skip intermediate entirely (base → marts). A large project might have 30 base models (one per source), 10 intermediate models (one per entity), and 20 marts (various consumers).
Common Mistakes
Aggregations in intermediate: Building int__daily_sales that groups by date. Stop. If it reduces rows, it’s a mart. Intermediate should preserve grain.
Business logic in base: Calculating customer segments or value tiers in a staging model. Move it to intermediate. Base should be mechanical.
Marts that don’t aggregate: SELECT * FROM int__order__enriched renamed and called a mart. If you’re not changing the grain or formatting for a specific consumer, you probably don’t need a mart.
Unclear ownership: Two intermediate models that each calculate lifetime value differently. Pick one. Make it the source of truth. Everything else references it.
Skipping intermediate when it’s needed: Marts with 10 JOINs and repeated logic across multiple models. This signals you need intermediate layers to centralize the joins and logic.
When to Add or Skip Layers
Not every project needs all three.
Simple projects (< 20 models): Base + marts may suffice. If your marts are straightforward aggregations of single base models, intermediate adds overhead.
Growing projects: Once you’re joining 3+ base models in multiple places or duplicating join logic across marts, extract that logic to intermediate. Once mart build times slow, consider incremental intermediate models.
Large projects (200+ models): All three layers become essential. You’ll likely have organized subfolders: base/stripe/, intermediate/customer/, marts/finance/.
The decision framework: If you’re copying and pasting JOIN logic between marts, you need an intermediate model.
Naming Clarity
Use prefixes to make layers obvious: base__, int__, mrt__. Include the source for base (base__stripe__payment), the entity for intermediate (int__session, int__customer__customer_lj_order), and the domain for marts (mrt__marketing__campaign_performance).
The intermediate enriched pattern int__[primary]__[entity]_[join]_[entity] is verbose but self-documenting. int__customer__customer_lj_order tells you the grain, what’s joined, and how.
For detailed implementation examples with full SQL code, see dbt Base Layer Patterns, dbt Intermediate Layer Patterns, and dbt Mart Layer Patterns. For project structure guidance, see dbt Project Structure and Naming.