ServicesAboutNotesContact Get in touch →
EN FR
Note

dbt Intermediate Layer Patterns

What belongs in dbt intermediate models — joins, business logic, window functions — and the critical rule of never reducing grain.

Planted
dbtdata modelingdata engineering

Intermediate models join entities, add calculated fields, and encode business rules. The critical constraint: grain never reduces. If a base model has one row per order, the intermediate model also has one row per order — wider and enriched, but not aggregated. Aggregating rows crosses into mart territory.

What Belongs in Intermediate Models

Joining base models together. An order needs customer data. A session needs traffic source data. A payment needs subscription context. These joins happen in intermediate, creating enriched entity models that multiple downstream marts can reference.

Business calculations. Margin rates, value tiers, categorizations, derived flags. This is where you interpret the data. A CASE WHEN that classifies orders as high, medium, or low value based on amount thresholds is intermediate work.

Window functions. Customer order numbers, running totals, rankings, first/last flags. These are window function patterns that add context without changing the grain. ROW_NUMBER() OVER (PARTITION BY customer__id ORDER BY order__created_at) gives you customer__order_number — the order’s position in a customer’s history.

Derived boolean flags. order__is_first_order, customer__is_high_value, session__has_conversion. These pre-computed flags make downstream queries simple and consistent. Every mart that needs to know if an order is a first order references the same intermediate field rather than re-implementing the logic.

What Doesn’t Belong

Aggregations that reduce row count. No GROUP BY as the final output. If your model ends with fewer rows than it started with, it belongs in a mart.

Consumer-specific formatting. Renaming columns to match what Braze or Salesforce expects, selecting only the fields a dashboard needs — that’s mart work. Intermediate models serve the project, not external consumers.

The One Exception: Lookup Aggregations

Sometimes you need to aggregate data to create a lookup, then join that lookup back to your main grain. This is the one place where GROUP BY appears in intermediate models — and it’s fine, because the final output preserves the original grain.

Consider a customer-level intermediate model. You need order metrics (total orders, total spent, last order date) attached to each customer. The aggregation happens in a CTE, but the final output is still one row per customer:

WITH base__customers AS (
SELECT
customer__id,
customer__email
FROM {{ ref('base__shopify__customers') }}
),
customer_order_summary AS (
SELECT
customer__id,
COUNT(DISTINCT order__id) AS customer__orders,
SUM(order__amount_usd) AS customer__total_spent_usd,
MAX(order__created_at) AS customer__last_ordered_at
FROM {{ ref('int__order__enriched') }}
WHERE order__is_completed = TRUE
GROUP BY 1
),
joined AS (
SELECT
base__customers.customer__id,
base__customers.customer__email,
COALESCE(customer_order_summary.customer__orders, 0) AS customer__orders,
COALESCE(customer_order_summary.customer__total_spent_usd, 0) AS customer__total_spent_usd,
customer_order_summary.customer__last_ordered_at
FROM base__customers
LEFT JOIN customer_order_summary USING (customer__id)
)
SELECT
customer__id,
customer__email,
customer__orders,
customer__total_spent_usd,
customer__last_ordered_at
FROM joined

The grain is one row per customer. The GROUP BY in the CTE created a lookup that got joined back to the customer base. The final result doesn’t reduce rows relative to the base customer table. That’s what makes this acceptable.

Full Intermediate Model Example

This order-enriched model shows the standard patterns: join base models, add calculations, apply window functions, preserve grain.

{{ config(
materialized='table',
tags=['intermediate', 'order']
) }}
WITH base__orders AS (
SELECT
order__id,
customer__id,
order__amount_usd,
order__created_at,
order__is_completed,
channel__name
FROM {{ ref('base__shopify__orders') }}
),
base__customers AS (
SELECT
customer__id,
customer__email,
customer__segment
FROM {{ ref('base__shopify__customers') }}
),
joined AS (
SELECT
base__orders.order__id,
base__orders.customer__id,
base__orders.order__amount_usd,
base__orders.order__created_at,
base__orders.order__is_completed,
base__orders.channel__name,
base__customers.customer__email,
base__customers.customer__segment
FROM base__orders
LEFT JOIN base__customers USING (customer__id)
),
enriched AS (
SELECT
*,
order__amount_usd * 0.7 AS order__margin_usd,
CASE
WHEN order__amount_usd >= 500 THEN 'high'
WHEN order__amount_usd >= 100 THEN 'medium'
ELSE 'low'
END AS order__value_tier,
order__amount_usd >= 100 AS order__is_significant
FROM joined
),
windowed AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY customer__id
ORDER BY order__created_at
) AS customer__order_number,
ROW_NUMBER() OVER (
PARTITION BY customer__id
ORDER BY order__created_at
) = 1 AS order__is_first_order
FROM enriched
)
SELECT
order__id,
customer__id,
order__created_at,
order__amount_usd,
order__margin_usd,
order__value_tier,
order__is_completed,
order__is_first_order,
order__is_significant,
channel__name,
customer__email,
customer__segment,
customer__order_number
FROM windowed

The output has the same number of rows as base__shopify__orders. Every order is still there, now enriched with customer data, calculated fields, and sequence information. That’s the intermediate contract: same grain, more columns.

CTE Structure for Intermediate Models

The CTE flow follows a logical pattern:

  1. Import CTEs — One per base model referenced, selecting only the columns needed.
  2. Join CTE — Combine the base models. Use explicit LEFT JOIN rather than implicit joins.
  3. Enrichment CTE — Add business logic, calculated fields, categorizations.
  4. Window CTE — Apply window functions for sequencing, rankings, flags.
  5. Final SELECT — List all columns explicitly in a clean order.

Simple intermediate models may combine enrichment and windowing; complex ones may have multiple join steps. The pattern is not rigid.

Entity-Centric Organization

Intermediate models organize around business entities — orders, customers, products, sessions — not around consumer-specific data bundles. A single intermediate model joining fifteen tables for one team’s purposes creates maintenance problems and reduces reusability. Entity-specific models instead:

  • int__order__enriched — Order grain, enriched with customer and channel data
  • int__customer__enriched — Customer grain, enriched with aggregated order metrics
  • int__session__enriched — Session grain, enriched with conversion and attribution data

Each entity model is independently testable, independently debuggable, and reusable across multiple marts. When the marketing team needs order data and the finance team needs order data, both reference int__order__enriched. One source of truth.

Single definition per metric

Two intermediate models that calculate customer__lifetime_value differently (one including refunds, one not; one all-time, one 12-month) create ambiguity. One definition should be canonical in the intermediate layer; variants derive from it in the mart layer where consumer-specific adjustments belong. That predictability — “customer lifetime value is in int__customer__enriched” — is the point of the dbt Three-Layer Architecture.

Testing Intermediate Models

Intermediate models benefit from specific testing patterns:

  • Primary key tests (unique + not_null) on the entity’s grain column to confirm the grain wasn’t accidentally changed by a join
  • Relationship tests to verify joins didn’t produce orphaned records
  • Row count comparisons (via dbt_utils.equal_rowcount) against the base model to confirm the grain is preserved
  • Unit tests for complex business logic — value tier classifications, first-order flags, and similar CASE WHEN logic benefit from mocked input verification

The row count test is especially important. If int__order__enriched has more rows than base__shopify__orders, a join created a fan-out. If it has fewer, a filter or inner join dropped records. Either way, the grain contract is broken.