Adrienne Vermorel
Getting Started with the dbt Semantic Layer and MetricFlow
The dbt Semantic Layer lets you define metrics once and query them from any tool, so your organization gets consistent numbers everywhere. Setting it up is more accessible than you might expect.
This guide walks through installing MetricFlow, configuring your first semantic model, defining metrics, and running queries. By the end, you’ll have a working semantic layer you can extend as your needs grow.
Understanding the architecture
MetricFlow is the SQL query generation engine behind the dbt Semantic Layer. When you define metrics in YAML, MetricFlow builds a semantic graph (a relationship structure that maps how your tables connect through entities). Think of it as a navigation system that knows how to join tables without you spelling out every relationship.
When you query a metric, MetricFlow:
- Reads your YAML definitions
- Determines which joins are needed
- Generates optimized SQL for your data platform
- Executes the query and returns results
MetricFlow handles join logic automatically. You define entities (the join keys) once, and it figures out how to navigate between tables.
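To make that concrete, here is a minimal sketch (model, column, and dimension names are illustrative, not from a real project) of two semantic models that share a `customer_id` entity. MetricFlow can join them through that entity without any explicit join configuration:

```yaml
semantic_models:
  - name: orders
    model: ref('orders')
    defaults:
      agg_time_dimension: ordered_at
    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign          # shared entity: the join path runs through here
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_count
        agg: count
        expr: order_id

  - name: customers
    model: ref('customers')
    entities:
      - name: customer_id
        type: primary          # matches the foreign entity above
    dimensions:
      - name: region
        type: categorical
```

With this in place, a query such as `mf query --metrics order_count --group-by customer_id__region` groups an orders measure by a customers dimension, and MetricFlow infers the join from the shared entity.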
dbt Core vs dbt Cloud capabilities
Capabilities differ between the two:
| Feature | dbt Core | dbt Cloud |
|---|---|---|
| Define semantic models and metrics | Yes | Yes |
| Generate SQL from configs | Yes | Yes |
| Query via CLI | mf commands | dbt sl commands |
| Query via APIs (JDBC, GraphQL) | No | Yes |
| BI tool integrations | No | Yes |
| Exports to materialize metrics | No | Yes |
If you’re on dbt Core, you get full metric definition and CLI querying. The API layer and BI integrations require dbt Cloud.
Installation
For dbt Core, install the MetricFlow bundle matching your adapter:
```shell
pip install "dbt-metricflow[dbt-snowflake]"
```

Replace `dbt-snowflake` with your adapter: `dbt-bigquery`, `dbt-databricks`, `dbt-redshift`, or `dbt-postgres`.
For dbt Cloud, MetricFlow is built in. You’ll use dbt sl commands instead of mf.
Supported platforms: Snowflake, BigQuery, Databricks, Redshift, Postgres (Core only), and Trino.
Setting up the time spine
Cumulative metrics and time-based calculations require a time spine, a table with one row per day. MetricFlow uses this as a reliable date dimension to join against.
Create a model called `metricflow_time_spine.sql`:

```sql
{{ config(materialized='table') }}

{{ dbt.date_spine(
    datepart="day",
    start_date="CAST('2020-01-01' AS DATE)",
    end_date="CAST('2030-12-31' AS DATE)"
) }}
```

Then tell MetricFlow which column to use in the model's properties YAML:

```yaml
models:
  - name: metricflow_time_spine
    time_spine:
      standard_granularity_column: date_day
```

Run `dbt build -s metricflow_time_spine` to materialize it.
Defining a semantic model
Semantic models are YAML abstractions that describe your dbt models for MetricFlow. Each semantic model has a one-to-one relationship with a dbt SQL or Python model.
This semantic model describes an orders table:
```yaml
semantic_models:
  - name: orders
    description: Order transactions with revenue and quantity
    model: ref('mrt__sales__orders')
    defaults:
      agg_time_dimension: order__created_at

    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign
      - name: product_id
        type: foreign

    dimensions:
      - name: order__created_at
        type: time
        type_params:
          time_granularity: day
      - name: order__status
        type: categorical
      - name: order__channel
        type: categorical

    measures:
      - name: order_total
        agg: sum
        expr: order__amount
        description: Sum of order amounts
      - name: order_count
        agg: count
        expr: order_id
        description: Count of orders
      - name: distinct_customers
        agg: count_distinct
        expr: customer_id
```

Breaking down the components
Entities are join keys that connect semantic models. Four types exist:
- `primary`: One record per row, no nulls allowed (like `order_id`)
- `unique`: One per row, nulls allowed
- `foreign`: Zero to many instances (like `customer_id` in an orders table)
- `natural`: Columns uniquely identifying records based on real-world data
Dimensions enable grouping, slicing, and filtering. Use time for dates and categorical for string values.
Measures are the numeric aggregations that become building blocks for metrics. Supported aggregations: sum, count, count_distinct, avg, min, max, median, and percentile.
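Percentile aggregations take extra parameters under `agg_params`. A hedged sketch (the measure name is illustrative; the column comes from the orders model above):

```yaml
measures:
  - name: p95_order_value      # illustrative name
    agg: percentile
    expr: order__amount
    agg_params:
      percentile: 0.95
      use_discrete_percentile: false  # continuous percentile; set true for discrete
```

Note that not every data platform supports both discrete and continuous percentile functions, so check what your warehouse offers before relying on one.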
Defining metrics
Metrics reference the measures you defined. These simple metrics build on the orders semantic model:
```yaml
metrics:
  - name: revenue
    description: Total order revenue
    type: simple
    label: Revenue
    type_params:
      measure: order_total

  - name: orders
    description: Total number of orders
    type: simple
    label: Order Count
    type_params:
      measure: order_count

  - name: unique_customers
    description: Count of distinct customers
    type: simple
    label: Unique Customers
    type_params:
      measure: distinct_customers
```

Metric types for different needs
Simple metrics directly reference a measure, like the examples above.
Derived metrics calculate from other metrics. This example computes revenue per customer:
```yaml
metrics:
  - name: revenue_per_customer
    description: Average revenue per unique customer
    type: derived
    label: Revenue per Customer
    type_params:
      expr: revenue / unique_customers
      metrics:
        - name: revenue
        - name: unique_customers
```

Ratio metrics compute ratios correctly by aggregating numerator and denominator independently before dividing:

```yaml
metrics:
  - name: average_order_value
    description: Average value per order
    type: ratio
    label: AOV
    type_params:
      numerator: order_total
      denominator: order_count
```

Cumulative metrics aggregate over time windows:

```yaml
metrics:
  - name: revenue_mtd
    description: Month-to-date revenue
    type: cumulative
    label: Revenue MTD
    type_params:
      measure: order_total
      grain_to_date: month
```

Validating your configuration
Before querying, validate that your definitions parse correctly:
```shell
# dbt Core
mf validate-configs

# dbt Cloud
dbt sl validate
```

If validation fails, check for these common issues:

- An "ensure that you've ran an artifacts…" error: run `dbt parse` or `dbt build` first to generate the semantic manifest.
- Metafont conflict: the `mf` command can collide with the Metafont executable that ships with TeX distributions. Uninstall or rename it if you see unexpected behavior.
- Dimensions not appearing: every dimension needs a primary entity in its semantic model.
Querying metrics
Generate the semantic manifest first:
```shell
dbt parse
```

Then query your metrics:

```shell
# dbt Core
mf query --metrics revenue --group-by metric_time

# dbt Cloud
dbt sl query --metrics revenue --group-by metric_time
```

Add dimensions for slicing:

```shell
mf query --metrics revenue,orders --group-by metric_time,order__channel
```

Filter with the `--where` flag:

```shell
mf query --metrics revenue --group-by metric_time --where "{{ Dimension('order_id__order__channel') }} = 'web'"
```

The filter syntax uses Jinja templating. Reference dimensions as `{{ Dimension('primary_entity__dimension_name') }}`, where the prefix is the primary entity of the semantic model that owns the dimension.
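The MetricFlow CLI also supports ordering results, limiting row counts, and inspecting the SQL it generates. Flag support may vary by version, so confirm with `mf query --help` before relying on these:

```shell
# Sort by most recent period and cap the result set
mf query --metrics revenue --group-by metric_time --order -metric_time --limit 10

# Print the generated SQL instead of executing it
mf query --metrics revenue --group-by metric_time --explain
```

The `--explain` output is useful for debugging unexpected numbers: it shows exactly which joins and aggregations MetricFlow produced from your YAML.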
Organizing semantic models at scale
For smaller projects, co-locate semantic definitions with your dbt models:
```
models/
  marts/
    mrt__sales__orders.sql
    mrt__sales__orders.yml        # semantic model + metrics
    mrt__sales__customers.sql
    mrt__sales__customers.yml
```

For larger projects, separate into dedicated directories:

```
models/
  marts/
    mrt__sales__orders.sql
    mrt__sales__customers.sql
  semantic/
    semantic_models/
      sem_orders.yml
      sem_customers.yml
    metrics/
      revenue_metrics.yml
      customer_metrics.yml
```

A complete working example
Putting it all together with an orders model:
```sql
-- models/marts/mrt__sales__orders.sql
SELECT
    order_id,
    customer_id,
    created_at AS order__created_at,
    channel AS order__channel,
    status AS order__status,
    amount AS order__amount
FROM {{ ref('int__orders_enriched') }}
```

Define the semantic layer:
```yaml
semantic_models:
  - name: orders
    model: ref('mrt__sales__orders')
    defaults:
      agg_time_dimension: order__created_at

    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign

    dimensions:
      - name: order__created_at
        type: time
        type_params:
          time_granularity: day
      - name: order__status
        type: categorical
      - name: order__channel
        type: categorical

    measures:
      - name: order_total
        agg: sum
        expr: order__amount
      - name: order_count
        agg: count
        expr: order_id

metrics:
  - name: revenue
    type: simple
    type_params:
      measure: order_total

  - name: orders
    type: simple
    type_params:
      measure: order_count

  - name: aov
    type: ratio
    type_params:
      numerator: order_total
      denominator: order_count
```

Build and query:
```shell
dbt build
dbt parse
mf query --metrics revenue,aov --group-by metric_time,order__channel
```

What comes next
Once your semantic layer is running, you can:
- Add more semantic models for customers, products, and other entities. MetricFlow will automatically handle joins through shared entities.
- Define period-over-period metrics using derived metrics with
offset_windowfor comparisons like month-over-month growth. - Connect BI tools (dbt Cloud) through JDBC or the GraphQL API.
- Enable LLM access by exposing your governed metrics to AI tools that can query natural language against consistent definitions.
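As a sketch of the period-over-period idea, a month-over-month growth metric can be built as a derived metric with an `offset_window`. It reuses the `revenue` metric defined earlier; the growth metric's name and expression are illustrative:

```yaml
metrics:
  - name: revenue_mom_growth        # illustrative name
    description: Month-over-month revenue growth rate
    type: derived
    label: Revenue MoM Growth
    type_params:
      expr: (revenue - revenue_prev_month) / revenue_prev_month
      metrics:
        - name: revenue
        - name: revenue
          offset_window: 1 month    # same metric, shifted back one month
          alias: revenue_prev_month
```

Referencing the same metric twice with an offset and an alias keeps the comparison logic in one governed definition instead of duplicating it in each dashboard.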
The real value shows up when multiple teams query the same metrics and get the same answers, without maintaining duplicate logic across dashboards and reports.