
Metric Organization in dbt Projects

How to organize semantic models and metrics in dbt — co-located vs parallel subfolder structures, the one-primary-entity rule, and scaling patterns for large projects

Planted · dbt, data modeling, analytics

The key decisions for organizing semantic model and metric YAML files are where they live relative to your SQL models, and how semantic models map to your existing dbt layers. A co-located approach works for small projects; larger projects benefit from a parallel subfolder structure.

Co-located Structure

For projects with fewer than 20 metrics, keep semantic models and metrics together with the SQL model they describe:

```
models/
  marts/
    mrt__finance__orders.sql
    mrt__finance__orders.yml    # semantic model + metrics
    mrt__sales__customers.sql
    mrt__sales__customers.yml   # semantic model + metrics
```

The YAML file contains both the semantic model definition and any metrics built from it. This mirrors the standard dbt convention of co-locating model configs with SQL files.

The advantage is simplicity. When you open a model, you see its metrics right there. When you modify the SQL, you immediately see which metrics might be affected.

The limitation is that metrics often span multiple semantic models. A customer_lifetime_value metric might reference measures from both orders and customers. In a co-located structure, where does that metric live? You end up making arbitrary placement decisions that break the “find it in 10 seconds” rule.
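A cross-model metric like customer_lifetime_value is typically expressed as a derived metric that references other metrics rather than a single semantic model's measures. As a sketch (the input metric names total_revenue and customer_count are illustrative, not from this project):

```yaml
metrics:
  - name: customer_lifetime_value
    label: Customer Lifetime Value
    type: derived
    type_params:
      # Combines two existing metrics, which may come from
      # different semantic models (orders and customers)
      expr: total_revenue / customer_count
      metrics:
        - name: total_revenue
        - name: customer_count
```

Because the definition references metrics from two semantic models, neither co-located YAML file is its obvious home.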

Parallel Sub-folder Structure

For larger projects, separate semantic models from metrics and organize by domain:

```
models/
  marts/
    mrt__finance__orders.sql
    mrt__sales__customers.sql
  semantic_models/
    orders.yml
    customers.yml
  metrics/
    revenue_metrics.yml
    customer_metrics.yml
    conversion_metrics.yml
```

This structure scales because cross-model metrics have a natural home. customer_lifetime_value goes in customer_metrics.yml regardless of which semantic models it references. Revenue metrics that combine orders and refund data go in revenue_metrics.yml.

The domain-based metric file grouping mirrors how users think about metrics. “Where are the revenue metrics?” has an obvious answer: metrics/revenue_metrics.yml. This aligns with the prefix-based naming convention where all revenue metrics start with revenue_.
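Under that convention, a domain metric file might look like this sketch (the metric and measure names are illustrative):

```yaml
# metrics/revenue_metrics.yml
metrics:
  - name: revenue_total
    type: simple
    type_params:
      measure: order_total          # measure defined on the orders semantic model
  - name: revenue_per_order
    type: ratio
    type_params:
      numerator: revenue_total      # references the metric above
      denominator: order_count
```

Every metric in the file shares the revenue_ prefix, so the file name and the metric names reinforce each other.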

The trade-off is indirection. Modifying mrt__finance__orders.sql requires checking a separate directory for affected semantic models and metrics. In a co-located structure, you see the impact immediately.

The One Primary Entity Rule

Each semantic model should have exactly one primary entity, typically aligned with one of your mart-layer models:

```yaml
semantic_models:
  - name: orders
    defaults:
      agg_time_dimension: ordered_at
    model: ref('mrt__finance__orders')
    entities:
      - name: order
        type: primary
      - name: customer
        type: foreign
      - name: product
        type: foreign
```

The primary entity (order) identifies what each row represents. Foreign entities (customer, product) enable joins to other semantic models. MetricFlow uses these entity relationships to build the semantic graph — the map of how your data connects.

This constraint keeps the graph navigable. If a semantic model has two primary entities, MetricFlow cannot determine the grain of the model, and joins become ambiguous. One row, one primary entity, no exceptions.

The mapping is usually straightforward: your mart mrt__finance__orders maps to semantic model orders with primary entity order. Your mart mrt__sales__customers maps to semantic model customers with primary entity customer. The semantic model is a thin layer on top of the mart, adding entity, dimension, and measure annotations to the columns that already exist.
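As a sketch of that thin layer, the same semantic model with dimension and measure annotations added on top of the mart's existing columns (column and measure names are illustrative):

```yaml
semantic_models:
  - name: orders
    model: ref('mrt__finance__orders')
    defaults:
      agg_time_dimension: ordered_at
    entities:
      - name: order
        type: primary
        expr: order_id              # existing column in the mart
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_total
        agg: sum                    # aggregates an existing column
```

Nothing here transforms data; it only annotates columns the mart already exposes.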

Semantic Models Map to Marts

Semantic models should reference mart-layer models, not base or intermediate models. This is consistent with the three-layer architecture: marts are the consumption layer. Semantic models sit on top of that consumption layer, adding metadata for the semantic layer.

Pointing a semantic model at an intermediate model creates fragile coupling. Intermediate models are internal to your dbt project — they can be refactored, split, or merged without notice to downstream consumers. Marts, by contrast, are stable interfaces with defined consumers.

```yaml
# Good: semantic model on a mart
semantic_models:
  - name: orders
    model: ref('mrt__finance__orders')
```

```yaml
# Bad: semantic model on an intermediate
semantic_models:
  - name: orders
    model: ref('int__order__order_lj_customer')
```

The intermediate model might change its grain or get renamed during a refactor. The mart is a contract with its consumers.

Validation in CI

However you organize your files, validate the semantic layer in CI:

```yaml
# .github/workflows/dbt.yml
- name: Validate semantic layer
  run: dbt sl validate
```

```shell
# Or for dbt Core:
mf validate-configs
```

MetricFlow validates at three levels:

  1. Parsing validation — Does the YAML follow the schema?
  2. Semantic validation — Are names unique? Do references exist? Is there exactly one primary entity?
  3. Data platform validation — Do the referenced columns exist in physical tables?

Add --verbose-issues --show-all when debugging failures. Validation catches broken references, duplicate names, and missing entities before they reach production.

This is especially important with the parallel subfolder structure, where a change to a semantic model in one directory can break metrics defined in another. CI validation is the safety net that makes the separation of files safe.
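A minimal CI job for the dbt Core path might look like the following sketch (the workflow file name, setup steps, and package installation are assumptions; adapt them to your environment and warehouse credentials):

```yaml
# .github/workflows/semantic_layer.yml (illustrative)
name: validate-semantic-layer
on: pull_request
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Assumes the MetricFlow CLI is available via pip in your setup
      - run: pip install dbt-metricflow
      - run: mf validate-configs --verbose-issues --show-all
```

Running this on every pull request means a semantic model change in one directory cannot silently break a metric defined in another.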

When to Restructure

The signal to move from co-located to parallel structure is not a specific metric count. It is the moment you find yourself putting a metric in a file where it does not logically belong because “it has to go somewhere.” That is the co-located structure breaking down.

A project with 30 metrics that all map 1-to-1 to single semantic models works fine with co-location. A project with 15 metrics where half of them are derived or ratio metrics spanning multiple semantic models needs the parallel structure from day one.

Start co-located. Migrate when placement decisions become arbitrary. The restructuring is a file move, not a logic change — metric and semantic model definitions stay identical regardless of which directory they live in.