Note

Semantic Layer Architecture

How semantic layers work in the modern data stack — competing implementations (MetricFlow, Snowflake Semantic Views, Databricks Metric Views), the OSI initiative, and why the semantic layer determines AI accuracy

Planted
Tags: dbt, bigquery, snowflake, analytics, data modeling

A semantic layer sits between your data warehouse and your consumers — BI tools, AI agents, custom applications, notebooks — and provides a single, governed place where metrics and dimensions are defined. Instead of every consumer writing their own SQL to calculate “revenue” or “active users,” the semantic layer holds the canonical definition and translates it into warehouse-native SQL at query time.

The concept has existed for decades (Cognos metadata layers, Business Objects universes). As of 2025-2026, three production-ready competing implementations coexist, and an interoperability standard is emerging.

Why the Semantic Layer Matters Now

Without a governed semantic layer, different analysts writing queries for the same metric produce different results: one excludes refunds, another uses billing date instead of transaction date, a third filters to a different currency set. This metric drift erodes trust in data.
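A concrete illustration of the drift (table and column names are invented for the example):

```sql
-- Analyst A's "revenue": completed orders only, keyed on transaction date
SELECT SUM(amount)
FROM orders
WHERE status = 'completed'
  AND transaction_date BETWEEN '2025-01-01' AND '2025-03-31';

-- Analyst B's "revenue": refunds included, keyed on billing date
SELECT SUM(amount)
FROM orders
WHERE billing_date BETWEEN '2025-01-01' AND '2025-03-31';
```

Both queries are defensible on their own; they simply encode different definitions, so the dashboards built on them will disagree.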

The infrastructure to solve this problem matured across multiple platforms simultaneously. The rise of AI-powered analytics added urgency: an AI copilot generating SQL without a governed semantic layer will produce inconsistent metric definitions at scale.

The Three Competing Implementations

dbt MetricFlow

MetricFlow is dbt Labs’ semantic layer, open-sourced under Apache 2.0 after the Fivetran-dbt Labs merger. It defines metrics in YAML alongside your dbt project, making metric definitions part of the same version-controlled, CI/CD-tested codebase as your transformations.

A MetricFlow metric definition looks like this:

```yaml
semantic_models:
  - name: orders
    defaults:
      agg_time_dimension: order_date
    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign
    measures:
      - name: order_total
        agg: sum
        expr: amount
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
      - name: status
        type: categorical

metrics:
  - name: revenue
    type: simple
    type_params:
      measure: order_total
    filter: |
      {{ Dimension('order_id__status') }} = 'completed'
  - name: revenue_per_customer
    type: derived
    type_params:
      expr: revenue / count_customers
      metrics:
        - name: revenue
        - name: count_customers
```

The key architectural decisions: metrics are built on top of “measures” (aggregations on columns) and “semantic models” (the mapping between your warehouse tables and business entities). Derived metrics compose other metrics with arithmetic. Filters use a Jinja-like syntax that references dimensions through their entity path.

As of early 2026, the dbt Fusion engine is modernizing the MetricFlow spec for dbt Core v1.12. The biggest change: removing the separate “measures” concept and embedding semantic annotations directly within model YAML entries. This simplifies the authoring experience considerably — you define a model and its semantic meaning in one place instead of maintaining parallel YAML blocks.

MetricFlow is multi-cloud. It generates SQL for BigQuery, Snowflake, Databricks, Redshift, and PostgreSQL. The dbt Cloud Semantic Layer API exposes metrics via JDBC, GraphQL, and REST, which any downstream tool can query.
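Over the JDBC interface, for instance, a downstream tool requests a metric through dbt's templating syntax instead of hand-writing the aggregation. A sketch based on the documented `semantic_layer.query` API, using the `revenue` metric from the example above:

```sql
select *
from {{
  semantic_layer.query(
    metrics=['revenue'],
    group_by=[Dimension('metric_time').grain('month')]
  )
}}
```

MetricFlow compiles this into warehouse-native SQL with the governed filter (completed orders only) applied automatically.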

Snowflake Semantic Views

Snowflake’s native semantic layer, built into the platform itself. Rather than defining metrics in external YAML files, you create semantic views as first-class database objects using SQL DDL:

```sql
CREATE SEMANTIC VIEW revenue_metrics
  TABLES (
    orders PRIMARY KEY (order_id),
    customers PRIMARY KEY (customer_id)
  )
  RELATIONSHIPS (
    orders_to_customers AS orders (customer_id) REFERENCES customers
  )
  DIMENSIONS (
    customers.region AS region,
    orders.order_date AS order_date
  )
  METRICS (
    orders.total_revenue AS SUM(orders.amount),
    orders.avg_order_value AS AVG(orders.amount)
  );
```

The advantage is tight integration with Snowflake’s query optimizer and governance model — permissions, row-level security, and data masking apply to semantic views the same way they apply to regular views. The disadvantage is vendor lock-in: these definitions only work on Snowflake.
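Consumers query these objects through the `SEMANTIC_VIEW` table function rather than a plain `SELECT` against the view. A sketch against the example above (metric and dimension names assumed from that definition):

```sql
SELECT *
FROM SEMANTIC_VIEW(
  revenue_metrics
  METRICS total_revenue, avg_order_value
  DIMENSIONS customers.region
);
```

Snowflake resolves the joins, aggregations, and grain itself, so the consumer never writes `GROUP BY` logic.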

Databricks Metric Views

Databricks’ approach is lakehouse-centric, integrating with Unity Catalog for governance. Metric Views define metrics that are discoverable through the catalog and queryable by any tool connected to Databricks SQL. Like Snowflake’s approach, this ties metric governance to the platform’s existing permission model.
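The shape of a Metric View, as a hedged sketch (catalog, schema, and column names are illustrative; the YAML body follows the metric view spec, assumed here at version 0.1):

```sql
CREATE VIEW main.analytics.order_metrics
  WITH METRICS
  LANGUAGE YAML
AS $$
version: 0.1
source: main.analytics.orders
dimensions:
  - name: region
    expr: region
measures:
  - name: total_revenue
    expr: SUM(amount)
$$;

-- Any tool connected to Databricks SQL then aggregates through MEASURE():
SELECT region, MEASURE(total_revenue) AS total_revenue
FROM main.analytics.order_metrics
GROUP BY region;
```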

The Interoperability Problem and OSI

Having three competing semantic layers creates an obvious problem: if your metrics are defined in MetricFlow but a Snowflake-native tool only reads Semantic Views, you either maintain parallel definitions (defeating the purpose) or pick a side.

The Open Semantic Interchange (OSI) initiative, launched by dbt Labs, Snowflake, Salesforce, Atlan, and ThoughtSpot, aims to create vendor-neutral standards for semantic data exchange. The goal is a common format that any tool can read and write, so metric definitions are portable across platforms.

Meaningful interoperability is expected in 2026-2027. Until then, the practical choice depends on your stack:

  • dbt-centric teams: MetricFlow. Your metrics live alongside your transformations, reviewed in the same PRs, tested in the same CI pipelines.
  • Snowflake-native teams without dbt: Snowflake Semantic Views. Zero external dependencies.
  • Databricks-native teams: Metric Views integrated with Unity Catalog.
  • Multi-cloud teams: MetricFlow is the only option that generates SQL across warehouses today.

The Semantic Layer Determines AI Accuracy

Every major BI tool now ships AI features: natural language to SQL, automated insights, conversational analytics. Power BI has Copilot. Tableau has Einstein AI. Looker has Gemini. ThoughtSpot has Spotter. Lightdash has AI Agents. The list goes on.

Practitioners report the same finding: the AI model is not the bottleneck — the semantic layer is. When a user asks “what was revenue last quarter?”, the AI needs to know:

  1. Which table contains revenue data
  2. Which column represents the amount
  3. What filters define “revenue” (completed orders only? excluding refunds?)
  4. What “last quarter” means (fiscal quarter? calendar quarter? which timezone?)

Without a governed semantic layer, the AI picks plausible-looking columns and applies reasonable-sounding filters, producing SQL that looks correct but calculates the wrong number.

A well-defined semantic layer constrains the AI’s vocabulary to known metric definitions, valid dimensions, and allowed filters. This reduces the problem from “infer what revenue means” to “translate a user question into a query against known definitions.”
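The difference between the two regimes, sketched with invented table and column names:

```sql
-- Ungoverned: the AI infers a plausible query for "revenue last quarter"
SELECT SUM(total)
FROM sales
WHERE created_at >= '2025-07-01';
-- Looks correct; silently ignores refunds, order status, and the fiscal calendar.

-- Governed: the AI translates the question into a request against known
-- definitions (illustrated with MetricFlow's query syntax)
select *
from {{
  semantic_layer.query(
    metrics=['revenue'],
    group_by=[Dimension('metric_time').grain('quarter')]
  )
}}
```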

Semantic Layer and Headless BI

The semantic layer’s power multiplies when exposed via APIs rather than locked inside a single BI tool. This is the headless BI pattern: your semantic layer becomes a metric service that any frontend can query. A React dashboard, a Slack bot, a scheduled email report, and an AI agent all consume the same governed metrics from the same API.

Cube.dev and dbt’s Semantic Layer API are the leading implementations. The key insight is that the semantic layer is infrastructure, not a feature of your BI tool. Decoupling it from visualization means you can swap BI frontends without redefining metrics, and you can serve metrics to applications that aren’t BI tools at all.

Properties of a mature implementation

  • Metric definitions in version control, reviewed via pull requests, with tests validating correctness
  • One canonical definition per metric — no parallel definitions across tools
  • Dimension hierarchies for drill-down without custom SQL
  • Access controls inherited from the warehouse permission model
  • API access for any downstream consumer, not just the BI tool
  • Documentation generated automatically from metric metadata

The hard work is organizational: reaching agreement on metric definitions, maintaining them as business logic evolves, and enforcing that all consumers query the semantic layer rather than writing ad-hoc SQL.