Intermediate models join entities, add calculated fields, and encode business rules. The critical constraint: grain never reduces. If a base model has one row per order, the intermediate model also has one row per order — wider and enriched, but not aggregated. Aggregating rows crosses into mart territory.
What Belongs in Intermediate Models
Joining base models together. An order needs customer data. A session needs traffic source data. A payment needs subscription context. These joins happen in intermediate, creating enriched entity models that multiple downstream marts can reference.
Business calculations. Margin rates, value tiers, categorizations, derived flags. This is where you interpret the data. A CASE WHEN that classifies orders as high, medium, or low value based on amount thresholds is intermediate work.
Window functions. Customer order numbers, running totals, rankings, first/last flags. These are window function patterns that add context without changing the grain. ROW_NUMBER() OVER (PARTITION BY customer__id ORDER BY order__created_at) gives you customer__order_number — the order’s position in a customer’s history.
Derived boolean flags. order__is_first_order, customer__is_high_value, session__has_conversion. These pre-computed flags make downstream queries simple and consistent. Every mart that needs to know if an order is a first order references the same intermediate field rather than re-implementing the logic.
What Doesn’t Belong
Aggregations that reduce row count. No GROUP BY as the final output. If your model ends with fewer rows than it started with, it belongs in a mart.
Consumer-specific formatting. Renaming columns to match what Braze or Salesforce expects, selecting only the fields a dashboard needs — that’s mart work. Intermediate models serve the project, not external consumers.
The One Exception: Lookup Aggregations
Sometimes you need to aggregate data to create a lookup, then join that lookup back to your main grain. This is the one place where GROUP BY appears in intermediate models — and it’s fine, because the final output preserves the original grain.
Consider a customer-level intermediate model. You need order metrics (total orders, total spent, last order date) attached to each customer. The aggregation happens in a CTE, but the final output is still one row per customer:
WITH base__customers AS ( SELECT customer__id, customer__email FROM {{ ref('base__shopify__customers') }}),
customer_order_summary AS ( SELECT customer__id, COUNT(DISTINCT order__id) AS customer__orders, SUM(order__amount_usd) AS customer__total_spent_usd, MAX(order__created_at) AS customer__last_ordered_at FROM {{ ref('int__order__enriched') }} WHERE order__is_completed = TRUE GROUP BY 1),
joined AS ( SELECT base__customers.customer__id, base__customers.customer__email, COALESCE(customer_order_summary.customer__orders, 0) AS customer__orders, COALESCE(customer_order_summary.customer__total_spent_usd, 0) AS customer__total_spent_usd, customer_order_summary.customer__last_ordered_at FROM base__customers LEFT JOIN customer_order_summary USING (customer__id))
SELECT customer__id, customer__email, customer__orders, customer__total_spent_usd, customer__last_ordered_atFROM joinedThe grain is one row per customer. The GROUP BY in the CTE created a lookup that got joined back to the customer base. The final result doesn’t reduce rows relative to the base customer table. That’s what makes this acceptable.
Full Intermediate Model Example
This order-enriched model shows the standard patterns: join base models, add calculations, apply window functions, preserve grain.
{{ config( materialized='table', tags=['intermediate', 'order']) }}
WITH base__orders AS ( SELECT order__id, customer__id, order__amount_usd, order__created_at, order__is_completed, channel__name FROM {{ ref('base__shopify__orders') }}),
base__customers AS ( SELECT customer__id, customer__email, customer__segment FROM {{ ref('base__shopify__customers') }}),
joined AS ( SELECT base__orders.order__id, base__orders.customer__id, base__orders.order__amount_usd, base__orders.order__created_at, base__orders.order__is_completed, base__orders.channel__name, base__customers.customer__email, base__customers.customer__segment FROM base__orders LEFT JOIN base__customers USING (customer__id)),
enriched AS ( SELECT *, order__amount_usd * 0.7 AS order__margin_usd,
CASE WHEN order__amount_usd >= 500 THEN 'high' WHEN order__amount_usd >= 100 THEN 'medium' ELSE 'low' END AS order__value_tier,
order__amount_usd >= 100 AS order__is_significant FROM joined),
windowed AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY customer__id ORDER BY order__created_at ) AS customer__order_number,
ROW_NUMBER() OVER ( PARTITION BY customer__id ORDER BY order__created_at ) = 1 AS order__is_first_order FROM enriched)
SELECT order__id, customer__id, order__created_at, order__amount_usd, order__margin_usd, order__value_tier, order__is_completed, order__is_first_order, order__is_significant, channel__name, customer__email, customer__segment, customer__order_numberFROM windowedThe output has the same number of rows as base__shopify__orders. Every order is still there, now enriched with customer data, calculated fields, and sequence information. That’s the intermediate contract: same grain, more columns.
CTE Structure for Intermediate Models
The CTE flow follows a logical pattern:
- Import CTEs — One per base model referenced, selecting only the columns needed.
- Join CTE — Combine the base models. Use explicit
LEFT JOINrather than implicit joins. - Enrichment CTE — Add business logic, calculated fields, categorizations.
- Window CTE — Apply window functions for sequencing, rankings, flags.
- Final SELECT — List all columns explicitly in a clean order.
Simple intermediate models may combine enrichment and windowing; complex ones may have multiple join steps. The pattern is not rigid.
Entity-Centric Organization
Intermediate models organize around business entities — orders, customers, products, sessions — not around consumer-specific data bundles. A single intermediate model joining fifteen tables for one team’s purposes creates maintenance problems and reduces reusability. Entity-specific models instead:
int__order__enriched— Order grain, enriched with customer and channel dataint__customer__enriched— Customer grain, enriched with aggregated order metricsint__session__enriched— Session grain, enriched with conversion and attribution data
Each entity model is independently testable, independently debuggable, and reusable across multiple marts. When the marketing team needs order data and the finance team needs order data, both reference int__order__enriched. One source of truth.
Single definition per metric
Two intermediate models that calculate customer__lifetime_value differently (one including refunds, one not; one all-time, one 12-month) create ambiguity. One definition should be canonical in the intermediate layer; variants derive from it in the mart layer where consumer-specific adjustments belong. That predictability — “customer lifetime value is in int__customer__enriched” — is the point of the dbt Three-Layer Architecture.
Testing Intermediate Models
Intermediate models benefit from specific testing patterns:
- Primary key tests (
unique+not_null) on the entity’s grain column to confirm the grain wasn’t accidentally changed by a join - Relationship tests to verify joins didn’t produce orphaned records
- Row count comparisons (via
dbt_utils.equal_rowcount) against the base model to confirm the grain is preserved - Unit tests for complex business logic — value tier classifications, first-order flags, and similar
CASE WHENlogic benefit from mocked input verification
The row count test is especially important. If int__order__enriched has more rows than base__shopify__orders, a join created a fan-out. If it has fewer, a filter or inner join dropped records. Either way, the grain contract is broken.