MetricFlow semantic model components

A semantic model is a YAML layer that describes a dbt model to MetricFlow. It has a one-to-one relationship with a dbt SQL or Python model and annotates that model with three types of information: entities (how tables join), dimensions (how data can be sliced), and measures (what can be aggregated). Everything else in the dbt Semantic Layer builds on top of these three components.

The full structure looks like this:

semantic_models:
  - name: orders
    description: Order transactions with revenue and quantity
    model: ref('mrt__sales__orders')
    defaults:
      agg_time_dimension: order__created_at

    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign
      - name: product_id
        type: foreign

    dimensions:
      - name: order__created_at
        type: time
        type_params:
          time_granularity: day
      - name: order__status
        type: categorical
      - name: order__channel
        type: categorical

    measures:
      - name: order_total
        agg: sum
        expr: order__amount
        description: Sum of order amounts
      - name: order_count
        agg: count
        expr: order_id
      - name: distinct_customers
        agg: count_distinct
        expr: customer_id

The model field points to the mart layer model. Semantic models should reference marts, not base or intermediate models. Marts are stable interfaces; intermediate models get refactored without warning. More on this in Metric Organization in dbt Projects.

Entities

Entities are join keys. They tell MetricFlow how semantic models connect to each other, forming the semantic graph that makes automatic joins possible. You define an entity once, and MetricFlow uses it to navigate between tables when a query spans multiple semantic models.

Four entity types exist:

primary: one record per row, no nulls. This is the grain of the semantic model, equivalent to a primary key. A semantic model can declare at most one; a model with no key column names its grain with the model-level primary_entity key instead.
unique: one record per row, nulls allowed. Use this when a column uniquely identifies rows but can be absent.
foreign: zero, one, or many rows per value. A customer_id in an orders table is foreign because many orders share the same customer.
natural: columns that uniquely identify records based on real-world data, like country codes or product SKUs.

The primary entity is the most consequential. MetricFlow uses it to determine the grain of the model and validate that joins make sense. If a semantic model declares two primary entities, or declares none and does not set the model-level primary_entity key, validation fails.

When you query metrics from two different semantic models — say, revenue from orders and customer count from customers — MetricFlow looks at the entities to find a shared join path. If orders has customer_id as a foreign entity and customers has customer_id as a primary entity, MetricFlow knows how to join them. You do not write the JOIN. You just request the metrics.
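To make the join path concrete, here is a minimal sketch of the customers side of that pair. The mart name mrt__sales__customers and the columns customer__created_at and customer__segment are assumptions made for illustration; only the customer_id entity is taken from the orders example above.

```yaml
semantic_models:
  - name: customers
    description: One row per customer
    model: ref('mrt__sales__customers')      # hypothetical mart name
    defaults:
      agg_time_dimension: customer__created_at

    entities:
      - name: customer_id
        type: primary      # pairs with the foreign customer_id entity on orders

    dimensions:
      - name: customer__created_at           # hypothetical column
        type: time
        type_params:
          time_granularity: day
      - name: customer__segment              # hypothetical column
        type: categorical

    measures:
      - name: customer_count
        agg: count_distinct
        expr: customer_id
```

Because the entity is named customer_id in both models, a metric built on orders can be grouped by a customers dimension without any join being written by hand.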

Dimensions

Dimensions are the columns you group by, filter on, and slice your metrics with. Two types exist: time and categorical.

Time dimensions are required for any time-based querying. The agg_time_dimension in the defaults block specifies which time dimension MetricFlow uses when you query metric_time — the built-in time dimension that all time-series queries go through.

dimensions:
  - name: order__created_at
    type: time
    type_params:
      time_granularity: day

The time_granularity setting controls the finest level at which this dimension operates. A day granularity means queries can group by day, week, month, quarter, or year. MetricFlow handles the truncation automatically at query time.

Categorical dimensions are anything else you group by:

dimensions:
  - name: order__status
    type: categorical
  - name: order__channel
    type: categorical

The naming convention with double underscores (order__status) mirrors the separator MetricFlow itself uses for dimension resolution. When you reference a dimension in a filter or group-by, you join the entity name and the dimension name with a double underscore: order_id__order__status, where order_id is the primary entity of the orders model. This makes explicit which semantic model the dimension comes from, which matters when multiple models share dimension names.

A dimension with no primary entity in its semantic model will not appear in query results. Every dimension needs a primary entity to anchor it.
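Some marts have no single key column, for example a pre-aggregated table. For those, the semantic model spec accepts a model-level primary_entity key in place of an entity with type: primary. The sketch below assumes a hypothetical mart mrt__sales__daily_channel_totals with columns total__date, order__channel, and total__amount; none of these names come from the project described here.

```yaml
semantic_models:
  - name: daily_channel_totals                        # hypothetical model
    model: ref('mrt__sales__daily_channel_totals')    # hypothetical mart
    primary_entity: daily_channel_total   # names the grain; no key column required
    defaults:
      agg_time_dimension: total__date

    dimensions:
      - name: total__date
        type: time
        type_params:
          time_granularity: day
      - name: order__channel
        type: categorical

    measures:
      - name: channel_amount
        agg: sum
        expr: total__amount
```

The dimensions here would be referenced with the primary_entity value as prefix, for instance daily_channel_total__order__channel.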

Measures

Measures are numeric aggregations defined on columns of the underlying model. They are the building blocks that metrics reference. The distinction between measures and metrics matters: measures define the aggregation mechanics, metrics define the business meaning.

measures:
  - name: order_total
    agg: sum
    expr: order__amount
    description: Sum of order amounts
  - name: order_count
    agg: count
    expr: order_id
  - name: distinct_customers
    agg: count_distinct
    expr: customer_id

The expr field is the column name (or a SQL expression) from the underlying model. The agg field is the aggregation function MetricFlow applies.

Supported aggregations: sum, count, count_distinct, average, min, max, median, percentile, and sum_boolean. Median and percentile are less commonly needed but useful for latency metrics and SLA tracking.

Measures should stay close to raw aggregations. Avoid encoding business logic at the measure level — that belongs in metric filters. A measure named order_total that aggregates order__amount is reusable for enterprise_revenue, smb_revenue, mtd_revenue, and any other metric that needs total order amounts. A measure named enterprise_order_total with an embedded filter serves only one purpose.
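A rough sketch of what that reuse could look like at the metric level. It assumes a categorical dimension named customer__segment is reachable through the customer_id entity and holds the values 'enterprise' and 'smb'; both the dimension and the values are hypothetical and not part of the orders model shown earlier.

```yaml
metrics:
  - name: revenue
    label: Revenue
    description: Total order amount across all customers
    type: simple
    type_params:
      measure: order_total

  - name: enterprise_revenue
    label: Enterprise revenue
    type: simple
    type_params:
      measure: order_total        # same measure, reused
    filter: |
      {{ Dimension('customer_id__customer__segment') }} = 'enterprise'

  - name: smb_revenue
    label: SMB revenue
    type: simple
    type_params:
      measure: order_total        # same measure, different filter
    filter: |
      {{ Dimension('customer_id__customer__segment') }} = 'smb'
```

All three metrics share one measure. The business rule lives in the filter, so adding a segment means adding a metric, not touching the semantic model.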

The defaults.agg_time_dimension field at the semantic model level sets the default time dimension for all measures in that model. MetricFlow uses this when generating time-series queries and needs to know which date column to group by.

How they compose

The three components work together to form a navigable graph. Entities connect models. Dimensions enable filtering. Measures feed metrics. A query like “show me revenue by channel for the past 30 days” resolves as:

Find the revenue metric
Trace it to the order_total measure on the orders semantic model
Find the order__channel dimension on orders
Apply the time filter via order__created_at
Generate SQL along the lines of: SELECT order__channel, SUM(order__amount) FROM mrt__sales__orders WHERE order__created_at >= ... GROUP BY order__channel

You define the components once. MetricFlow handles the SQL generation. When your mart model changes column names, you update the semantic model YAML and every metric that depends on it updates automatically — no dashboard-hunting required.
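The “revenue by channel for the past 30 days” request walked through above could also be stored as a saved query. This is a sketch under two assumptions: a simple metric named revenue is defined on the order_total measure, and the warehouse accepts current_date - 30 as date arithmetic (this varies by SQL dialect). The saved query name is made up for the example.

```yaml
saved_queries:
  - name: revenue_by_channel_last_30_days      # hypothetical name
    description: Revenue by order channel over the trailing 30 days
    query_params:
      metrics:
        - revenue                              # assumed metric on order_total
      group_by:
        - Dimension('order_id__order__channel')
      where:
        # Date arithmetic is dialect-specific; adjust for your warehouse.
        - "{{ TimeDimension('metric_time', 'day') }} >= current_date - 30"
```

The filter goes through metric_time, which resolves to order__created_at via defaults.agg_time_dimension, so the saved query never names the underlying date column.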