A semantic model is a YAML layer that describes a dbt model to MetricFlow. It has a one-to-one relationship with a dbt SQL or Python model and annotates that model with three types of information: entities (how tables join), dimensions (how data can be sliced), and measures (what can be aggregated). Everything else in the dbt semantic layer builds on top of these three components.
The full structure looks like this:
```yaml
semantic_models:
  - name: orders
    description: Order transactions with revenue and quantity
    model: ref('mrt__sales__orders')
    defaults:
      agg_time_dimension: order__created_at

    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign
      - name: product_id
        type: foreign

    dimensions:
      - name: order__created_at
        type: time
        type_params:
          time_granularity: day
      - name: order__status
        type: categorical
      - name: order__channel
        type: categorical

    measures:
      - name: order_total
        agg: sum
        expr: order__amount
        description: Sum of order amounts
      - name: order_count
        agg: count
        expr: order_id
      - name: distinct_customers
        agg: count_distinct
        expr: customer_id
```

The model field points to the mart layer model. Semantic models should reference marts, not base or intermediate models. Marts are stable interfaces; intermediate models get refactored without warning. More on this in Metric Organization in dbt Projects.
Entities
Entities are join keys. They tell MetricFlow how semantic models connect to each other, forming the semantic graph that makes automatic joins possible. You define an entity once, and MetricFlow uses it to navigate between tables when a query spans multiple semantic models.
Four entity types exist:
- primary: one record per row, no nulls. This is the grain of the semantic model, equivalent to a primary key. Every semantic model must have exactly one.
- unique: one per row, nulls allowed. Use this when a column uniquely identifies rows but can be absent.
- foreign: zero to many instances. A customer_id in an orders table is foreign because many orders share the same customer.
- natural: columns that uniquely identify records based on real-world data, like country codes or product SKUs.
The primary entity is the most consequential. MetricFlow uses it to determine the grain of the model and validate that joins make sense. If a semantic model has no primary entity, or has two, MetricFlow will reject it.
When you query metrics from two different semantic models — say, revenue from orders and customer count from customers — MetricFlow looks at the entities to find a shared join path. If orders has customer_id as a foreign entity and customers has customer_id as a primary entity, MetricFlow knows how to join them. You do not write the JOIN. You just request the metrics.
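The other side of that join could be sketched as follows. This is a minimal illustration, not a definition from the project above: the mart name mrt__sales__customers, the customer__created_at and customer__segment columns, and the customer_count measure are all assumed for the example. What matters is that customer_id is declared as primary here and as foreign on orders.

```yaml
semantic_models:
  - name: customers
    description: One row per customer
    model: ref('mrt__sales__customers')   # hypothetical mart name
    defaults:
      agg_time_dimension: customer__created_at

    entities:
      # Same entity name as the foreign entity on orders,
      # which is what lets MetricFlow find the join path.
      - name: customer_id
        type: primary

    dimensions:
      - name: customer__created_at        # hypothetical column
        type: time
        type_params:
          time_granularity: day
      - name: customer__segment           # hypothetical column
        type: categorical

    measures:
      - name: customer_count              # hypothetical measure
        agg: count_distinct
        expr: customer_id
```

With both models in place, a query that asks for an orders metric grouped by a customers dimension can be resolved through customer_id without any hand-written join.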
Dimensions
Dimensions are the columns you group by, filter on, and slice your metrics with. Two types exist: time and categorical.
Time dimensions are required for any time-based querying. The agg_time_dimension in the defaults block specifies which time dimension MetricFlow uses when you query metric_time — the built-in time dimension that all time-series queries go through.
```yaml
dimensions:
  - name: order__created_at
    type: time
    type_params:
      time_granularity: day
```

The time_granularity setting controls the finest level at which this dimension operates. A day granularity means queries can group by day, week, month, quarter, or year. MetricFlow handles the truncation automatically at query time.
Categorical dimensions are anything else you group by:
```yaml
dimensions:
  - name: order__status
    type: categorical
  - name: order__channel
    type: categorical
```

The naming convention with double underscores (order__status) mirrors MetricFlow's approach to dimension resolution. When you reference a dimension in a filter or group-by, you qualify it with the entity name: entity name, double underscore, dimension name. With order_id as the primary entity of this model, that gives order_id__order__status. The qualifier makes explicit which entity, and therefore which semantic model, the dimension comes from, which matters when multiple models share dimension names.
A dimension with no primary entity in its semantic model will not appear in query results. Every dimension needs a primary entity to anchor it.
Measures
Measures are numeric aggregations defined on columns of the underlying model. They are the building blocks that metrics reference. The distinction between measures and metrics matters: measures define the aggregation mechanics, metrics define the business meaning.
```yaml
measures:
  - name: order_total
    agg: sum
    expr: order__amount
    description: Sum of order amounts
  - name: order_count
    agg: count
    expr: order_id
  - name: distinct_customers
    agg: count_distinct
    expr: customer_id
```

The expr field is the column name (or a SQL expression) from the underlying model. The agg field is the aggregation function MetricFlow applies.
Supported aggregations: sum, count, count_distinct, average, min, max, sum_boolean, median, and percentile. The last two are less commonly needed but useful for latency metrics and SLA tracking.
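Percentile measures need one extra block, agg_params, to say which percentile to compute. The sketch below reuses the order__amount column from the example above; the measure names are made up for illustration, and whether a given percentile variant runs depends on what your warehouse supports.

```yaml
measures:
  - name: order_amount_p99               # hypothetical measure name
    description: 99th percentile of order amounts
    agg: percentile
    agg_params:
      percentile: 0.99                   # a fraction between 0 and 1
      use_discrete_percentile: false     # false requests a continuous (interpolated) percentile
    expr: order__amount
  - name: order_amount_median            # hypothetical measure name
    agg: median
    expr: order__amount
```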
Measures should stay close to raw aggregations. Avoid encoding business logic at the measure level — that belongs in metric filters. A measure named order_total that aggregates order__amount is reusable for enterprise_revenue, smb_revenue, mtd_revenue, and any other metric that needs total order amounts. A measure named enterprise_order_total with an embedded filter serves only one purpose.
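As a sketch of what that looks like in practice, the two metrics below share the single order_total measure and differ only in their filter. The filter assumes a customers semantic model that is joinable through the customer_id entity and exposes a customer__segment dimension; neither the dimension nor the segment values appear in the example above, so treat them as placeholders.

```yaml
metrics:
  - name: enterprise_revenue
    label: Enterprise revenue
    type: simple
    type_params:
      measure: order_total
    # Business logic lives in the metric filter, not in the measure.
    filter: |
      {{ Dimension('customer_id__customer__segment') }} = 'enterprise'

  - name: smb_revenue
    label: SMB revenue
    type: simple
    type_params:
      measure: order_total
    filter: |
      {{ Dimension('customer_id__customer__segment') }} = 'smb'
```

Adding a third segment is then a new metric with a different filter value; the measure itself is left untouched.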
The defaults.agg_time_dimension field at the semantic model level sets the default time dimension for all measures in that model. MetricFlow uses this when generating time-series queries and needs to know which date column to group by.
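An individual measure can also name its own agg_time_dimension, which takes precedence over the model-level default for that measure only. A minimal sketch, assuming the mart also has an order__shipped_at column (not part of the example above) and using a hypothetical shipped_order_count measure:

```yaml
semantic_models:
  - name: orders
    model: ref('mrt__sales__orders')
    defaults:
      agg_time_dimension: order__created_at   # used by every measure unless overridden

    entities:
      - name: order_id
        type: primary

    dimensions:
      - name: order__created_at
        type: time
        type_params:
          time_granularity: day
      - name: order__shipped_at               # hypothetical column
        type: time
        type_params:
          time_granularity: day

    measures:
      - name: order_total                     # aggregated over order__created_at (the default)
        agg: sum
        expr: order__amount
      - name: shipped_order_count             # hypothetical measure
        agg: count
        expr: order_id
        agg_time_dimension: order__shipped_at # override for this measure only
```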
How they compose
The three components work together to form a navigable graph. Entities connect models. Dimensions enable filtering. Measures feed metrics. A query like “show me revenue by channel for the past 30 days” resolves as:
- Find the revenue metric
- Trace it to the order_total measure on the orders semantic model
- Find the order__channel dimension on orders
- Apply the time filter via order__created_at
- Generate SQL: SELECT order__channel, SUM(order__amount) FROM orders WHERE order__created_at >= ...
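The revenue metric in the first step is not defined anywhere above. A minimal sketch of what it might look like, assuming it is a simple metric over order_total (the label and description are illustrative):

```yaml
metrics:
  - name: revenue
    label: Revenue
    description: Total order amount
    type: simple
    type_params:
      measure: order_total   # resolves to SUM(order__amount) on the orders semantic model
```

Because the measure carries the aggregation and the semantic model carries the entities and dimensions, the metric itself stays this short.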
You define the components once. MetricFlow handles the SQL generation. When your mart model changes column names, you update the semantic model YAML and every metric that depends on it updates automatically — no dashboard-hunting required.