ServicesAboutNotesContact Get in touch →
EN FR
Note

Entity-Centric Naming for dbt Intermediate Models

Why intermediate models should be named for the entity they represent, not the transformation they perform — and the self-documenting join notation that makes it work.

Planted
dbtdata modelingdata engineering

The official dbt documentation suggests naming intermediate models after the transformation they perform:

  • int_payments_pivoted_to_order
  • int_events_sessionized
  • int_customers_aggregated_by_account

This approach breaks down at scale, for reasons that also clarify broader intermediate-layer design principles.

Transformation-centric names

Transformation-centric names describe what the SQL does — “how was this built?” rather than “what does this represent?” With a small number of intermediate models this is workable, but with 30+ models finding the one that contains customer data joined with orders requires opening files:

  • Is it int_customers_enriched_with_orders?
  • int_orders_with_customer_data?
  • int_customers_aggregated_by_order_channel?

The names convey the SQL operation but not the grain, the entities present, or the join relationship.

Transformation-centric names also tend to encourage building models around convenience rather than concepts, leading to models like int_orders_for_the_finance_mart and int_orders_for_attribution — similar logic duplicated for different consumers, drifting over time.

Entity-Centric Naming

The alternative: name intermediate models after the entity they represent, at the grain they maintain.

There are two types of intermediate models, and each gets its own naming pattern:

Pure entity models — When you’re applying business logic to a single base model (sessionization, deduplication, complex calculations that don’t require another entity):

int__[entity]int__session, int__customer, int__conversion

Enriched entity models — When you’re joining entities together:

int__[primary_entity]__[entity1]_[join_type]_[entity2]

The join type abbreviations are:

  • lj = LEFT JOIN
  • ij = INNER JOIN
  • cj = CROSS JOIN

So: int__customer__customer_lj_order, int__session__session_lj_conversion, int__customer__customer_lj_order_lj_session.

Why the enriched naming pattern is verbose by design

The int__customer__customer_lj_order pattern is intentionally verbose.

Reading the model name, you immediately know:

  1. The grain: customer (the primary entity)
  2. What data is present: customer records enriched with order data
  3. The join relationship: left join (orders may not exist for all customers)

You don’t need to open the SQL file. You don’t need to query information_schema.columns. The name is self-documenting in exactly the dimensions that matter for downstream work.

For multiple joins, chain them:

int__customer__customer_lj_order_lj_session

This reads as: customer-grain model, left-joined to orders, left-joined to sessions. You know the grain hasn’t changed from the first word. You know you’re not missing any customers (left joins). You know what data is available.

Compare to the transformation-centric alternative: int_customers_with_orders_and_sessions_aggregated. What’s the grain? Is it customer-grain, or have sessions been aggregated to customer level? Is every customer present? You need the SQL to know.

When not to create an intermediate model

Do not create int__customer if it is identical to base__crm__customer. The intermediate layer should add value through joins, business logic, or transformations. An intermediate model that passes through the base model without changes adds a DAG node and a level of indirection without benefit.

Create int__customer when you’re:

  • Merging customer records from CRM and billing systems (identity resolution)
  • Adding derived fields (customer tier, lifetime value tiers, segment classification)
  • Deduplicating across sources
  • Applying business logic that transforms how you think about the entity

Don’t create it just because “base models should only have one source” or as a pass-through wrapper. The dbt Three-Layer Architecture isn’t about having exactly three hops for every model; it’s about having appropriate separation between raw-cleaning, entity-building, and consumption.

Organizing Intermediate Models by Entity

With entity-centric naming, folder organization follows naturally:

models/
└── intermediate/
├── _int__models.yml
├── session/
│ ├── int__session.sql
│ └── int__session__session_lj_conversion.sql
├── customer/
│ ├── int__customer.sql
│ └── int__customer__customer_lj_order.sql
├── int__unified_ad_spend.sql # Flat file for single models
└── int__attribution_touchpoint.sql

Subfolders by entity work when you have 3+ models for the same entity. Keep it flat for smaller projects. The key constraint: never organize intermediate by business domain (intermediate/marketing/, intermediate/finance/). An int__customer__customer_lj_order model serves both marketing and finance — forcing it into one domain’s folder creates artificial boundaries and makes it harder to find.

Intermediate models serve the project’s internal reuse needs, not the consumption patterns of specific teams. Teams find their models in the mart layer.

A Marketing Analytics Example

For a GA4 and ad platform setup, the intermediate layer looks like this:

intermediate/
├── session/
│ ├── int__session.sql # Sessionization of GA4 events
│ └── int__session__session_lj_conversion.sql # Sessions enriched with conversions
├── int__conversion.sql # Conversion events (entity model)
├── int__unified_ad_spend.sql # Ad spend UNIONed across platforms
└── int__attribution_touchpoint.sql # All touchpoints for attribution

Reading these names, you know:

  • int__session maintains session grain with business logic applied
  • int__session__session_lj_conversion maintains session grain, enriched with conversion data via left join (sessions without conversions are still present)
  • int__unified_ad_spend is a special case — “unified” signals a UNION across platforms, which is the entity concept

The alternative — int_events_sessionized_with_30_minute_timeout, int_sessions_with_conversion_data_left_joined — trades conciseness for noise without gaining meaningful clarity.

Naming as design discipline

Entity-centric naming requires answering “what entity does this represent?” before creating a model. If the answer is unclear, the model is likely a query extracted from a mart for convenience rather than a coherent entity. Intermediate layer entities should correspond to business concepts — sessions, customers, orders, campaigns, conversions — not to the data needs of a specific downstream report. That constraint is what makes the intermediate layer reusable.