MetricFlow semantic model components

The three building blocks of a MetricFlow semantic model: entities (join keys), dimensions (group-by columns), and measures (numeric aggregations that feed metrics).

A semantic model is a YAML layer that describes a dbt model to MetricFlow. Each semantic model maps to exactly one dbt SQL or Python model (a single dbt model can back more than one semantic model, as long as each has a unique name) and annotates that model with three types of information: entities (how tables join), dimensions (how data can be sliced), and measures (what can be aggregated). Everything else in the dbt semantic layer builds on top of these three components.

The full structure looks like this:

semantic_models:
  - name: orders
    description: Order transactions with revenue and quantity
    model: ref('mrt__sales__orders')
    defaults:
      agg_time_dimension: order__created_at
    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign
      - name: product_id
        type: foreign
    dimensions:
      - name: order__created_at
        type: time
        type_params:
          time_granularity: day
      - name: order__status
        type: categorical
      - name: order__channel
        type: categorical
    measures:
      - name: order_total
        agg: sum
        expr: order__amount
        description: Sum of order amounts
      - name: order_count
        agg: count
        expr: order_id
      - name: distinct_customers
        agg: count_distinct
        expr: customer_id

The model field points to the mart layer model. Semantic models should reference marts, not base or intermediate models. Marts are stable interfaces; intermediate models get refactored without warning. More on this in Metric Organization in dbt Projects.

Entities

Entities are join keys. They tell MetricFlow how semantic models connect to each other, forming the semantic graph that makes automatic joins possible. You define an entity once, and MetricFlow uses it to navigate between tables when a query spans multiple semantic models.

Four entity types exist:

  • primary: one record per row, no nulls. This is the grain of the semantic model, equivalent to a primary key. A semantic model can declare at most one.
  • unique: one per row, nulls allowed. Use this when a column uniquely identifies rows but can be absent.
  • foreign: zero, one, or many rows can share the same value, and nulls are allowed. A customer_id in an orders table is foreign because many orders share the same customer.
  • natural: columns that uniquely identify records based on real-world data, like country codes or product SKUs.

The primary entity is the most consequential. MetricFlow uses it to determine the grain of the model and validate that joins make sense. A semantic model that declares two primary entities fails validation. One that has dimensions but no entity of type primary must name its primary entity through the top-level primary_entity key instead; with neither in place, MetricFlow will reject it.

When you query metrics from two different semantic models — say, revenue from orders and customer count from customers — MetricFlow looks at the entities to find a shared join path. If orders has customer_id as a foreign entity and customers has customer_id as a primary entity, MetricFlow knows how to join them. You do not write the JOIN. You just request the metrics.
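To make the join path concrete, here is a minimal sketch of the other side of that relationship. The customers model, its mart name mrt__sales__customers, and the customer__segment column are assumptions for illustration; only the customer_id entity name comes from the orders model above.

```yaml
semantic_models:
  - name: customers
    description: One row per customer (hypothetical model for illustration)
    model: ref('mrt__sales__customers')  # assumed mart name
    entities:
      - name: customer_id  # same name as the foreign entity on orders
        type: primary
    dimensions:
      - name: customer__segment  # assumed column
        type: categorical
```

Because both models declare an entity named customer_id, a query for an orders metric grouped by customer_id__customer__segment can be resolved by joining orders to customers on that key.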

Dimensions

Dimensions are the columns you group by, filter on, and slice your metrics with. Two types exist: time and categorical.

Time dimensions are required for any time-based querying. The agg_time_dimension in the defaults block specifies which time dimension MetricFlow uses when you query metric_time — the built-in time dimension that all time-series queries go through.

dimensions:
  - name: order__created_at
    type: time
    type_params:
      time_granularity: day

The time_granularity setting controls the finest level at which this dimension operates. A day granularity means queries can group by day, week, month, quarter, or year. MetricFlow handles the truncation automatically at query time.

Categorical dimensions are anything else you group by:

dimensions:
  - name: order__status
    type: categorical
  - name: order__channel
    type: categorical

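The double underscore inside order__status mirrors the column naming of the mart. MetricFlow also uses a double underscore as its own separator: when you reference a dimension in a filter or group-by, you join the name of the semantic model's primary entity and the dimension name, giving order_id__order__status. This makes explicit which semantic model the dimension comes from, which matters when multiple models share dimension names. Because that separator is reserved, name validation may reject element names that themselves contain a double underscore; if it does, give the dimension a plain name such as order_status and point expr at the order__status column.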

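Because dimensions are addressed through the primary entity of their semantic model, a model that has dimensions but no primary entity gives MetricFlow nothing to qualify them with, and they cannot be queried. Every dimension needs a primary entity to anchor it.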

Measures

Measures are numeric aggregations defined on columns of the underlying model. They are the building blocks that metrics reference. The distinction between measures and metrics matters: measures define the aggregation mechanics, metrics define the business meaning.

measures:
  - name: order_total
    agg: sum
    expr: order__amount
    description: Sum of order amounts
  - name: order_count
    agg: count
    expr: order_id
  - name: distinct_customers
    agg: count_distinct
    expr: customer_id

The expr field is the column name (or a SQL expression) from the underlying model. The agg field is the aggregation function MetricFlow applies.

Supported aggregations: sum, count, count_distinct, average, min, max, median, percentile, and sum_boolean. Note that the value is spelled average, not avg. Median and percentile are less commonly needed but useful for latency metrics and SLA tracking.
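A percentile measure needs one extra block, since the aggregation name alone does not say which percentile to compute. A sketch, assuming a 95th percentile over the same order__amount column used above; both measure names are hypothetical, and percentile support depends on the warehouse:

```yaml
measures:
  - name: p95_order_amount  # hypothetical measure name
    agg: percentile
    expr: order__amount
    agg_params:
      percentile: 0.95
      use_discrete_percentile: false  # false asks for a continuous (interpolated) percentile
  - name: median_order_amount  # hypothetical measure name
    agg: median
    expr: order__amount
```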

Measures should stay close to raw aggregations. Avoid encoding business logic at the measure level — that belongs in metric filters. A measure named order_total that aggregates order__amount is reusable for enterprise_revenue, smb_revenue, mtd_revenue, and any other metric that needs total order amounts. A measure named enterprise_order_total with an embedded filter serves only one purpose.
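As a sketch of what that looks like in practice, the two metrics below reuse the same order_total measure and differ only in their filter. The customer__segment dimension and its values are assumptions: they presume a customers semantic model keyed by a primary customer_id entity, which the orders model above does not define.

```yaml
metrics:
  - name: enterprise_revenue
    label: Enterprise revenue
    type: simple
    type_params:
      measure: order_total  # the generic measure from the orders model
    filter: |
      {{ Dimension('customer_id__customer__segment') }} = 'enterprise'
  - name: smb_revenue
    label: SMB revenue
    type: simple
    type_params:
      measure: order_total  # same measure, different business meaning
    filter: |
      {{ Dimension('customer_id__customer__segment') }} = 'smb'
```

Adding a third segment is then a new metric entry, not a new measure.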

The defaults.agg_time_dimension field at the semantic model level sets the default time dimension for all measures in that model. MetricFlow uses this when generating time-series queries and needs to know which date column to group by.
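A single measure can also override that default when it should be tracked against a different date. A minimal sketch, assuming the mart has an order__shipped_at column (it is not part of the model shown earlier) and using a hypothetical measure name:

```yaml
dimensions:
  - name: order__shipped_at  # assumed column, not in the model above
    type: time
    type_params:
      time_granularity: day
measures:
  - name: shipped_order_count  # hypothetical measure name
    agg: count
    expr: order_id
    agg_time_dimension: order__shipped_at  # overrides defaults.agg_time_dimension for this measure only
```

Every other measure in the model keeps using order__created_at when queried by metric_time.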

How they compose

The three components work together to form a navigable graph. Entities connect models. Dimensions enable filtering. Measures feed metrics. A query like “show me revenue by channel for the past 30 days” resolves as:

  1. Find the revenue metric
  2. Trace it to the order_total measure on the orders semantic model
  3. Find the order__channel dimension on orders
  4. Apply the time filter via order__created_at
  5. Generate SQL along the lines of: SELECT order__channel, SUM(order__amount) FROM mrt__sales__orders WHERE order__created_at >= ... GROUP BY order__channel
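The same request can be written down declaratively. The sketch below assumes a simple revenue metric built on order_total (the text refers to it without defining it) and captures the query as a saved query. The saved query name and label are hypothetical, dimensions are qualified by the primary entity order_id, and the date arithmetic in the where clause is warehouse-specific.

```yaml
metrics:
  - name: revenue
    label: Revenue
    type: simple
    type_params:
      measure: order_total

saved_queries:
  - name: revenue_by_channel_last_30_days  # hypothetical name
    label: Revenue by channel, last 30 days
    query_params:
      metrics:
        - revenue
      group_by:
        - "Dimension('order_id__order__channel')"
      where:
        # Adjust the date expression to your warehouse's SQL dialect
        - "{{ TimeDimension('metric_time', 'day') }} >= current_date - 30"
```

The metric_time reference resolves to order__created_at through the agg_time_dimension default on the orders model.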

You define the components once. MetricFlow handles the SQL generation. When your mart model changes column names, you update the semantic model YAML and every metric that depends on it updates automatically — no dashboard-hunting required.