
dbt Mart Layer Patterns

What belongs in dbt mart models — reporting aggregations, activation exports, ML feature tables — and the principle that every mart serves a specific consumer.


Mart models are the consumer-facing layer of a dbt project. Unlike base models (the project’s foundation) and intermediate models (entity enrichment for internal reuse), marts exist for specific use cases. Each mart has a defined consumer and a defined grain — a specific question at a specific aggregation level.

This is the first and only layer in the dbt Three-Layer Architecture where GROUP BY appears in the final output. Marts reduce rows because that's what consumers actually need: not every order, but daily performance by channel. Not every customer event, but one row per customer with all their attributes.

The Three Main Mart Patterns

Reporting Marts

Reporting marts aggregate to the grain your dashboards need. Daily performance by channel. Weekly cohort retention. Monthly revenue by segment. The grain is determined by what the dashboard team queries most — building a mart at the wrong grain means the BI tool does the aggregation instead, which is slower and less consistent.

For BigQuery projects, consider how partitioning and clustering affect these aggregation patterns. A daily reporting mart partitioned by date and clustered by channel gives you efficient queries out of the box.

{{ config(
    materialized='table',
    partition_by={
        "field": "date_day",
        "data_type": "date",
        "granularity": "day"
    },
    cluster_by=['channel__name'],
    tags=['mart', 'reporting', 'marketing']
) }}

WITH int__orders AS (
    SELECT
        order__id,
        order__created_at,
        order__amount_usd,
        customer__id,
        channel__name,
        order__margin_usd,
        order__is_first_order,
        order__is_completed
    FROM {{ ref('int__order__enriched') }}
    WHERE order__is_completed = TRUE
),

aggregated AS (
    SELECT
        DATE(order__created_at) AS date_day,
        channel__name,
        COUNT(DISTINCT order__id) AS orders,
        COUNT(DISTINCT customer__id) AS customers,
        SUM(CASE WHEN order__is_first_order THEN 1 ELSE 0 END) AS new_customers,
        SUM(order__amount_usd) AS revenue_usd,
        SUM(order__margin_usd) AS margin_usd,
        SUM(CASE WHEN order__is_first_order THEN order__amount_usd ELSE 0 END) AS new_customer_revenue_usd
    FROM int__orders
    GROUP BY ALL
),

enriched AS (
    SELECT
        *,
        SAFE_DIVIDE(margin_usd, revenue_usd) AS margin_rate,
        SAFE_DIVIDE(revenue_usd, orders) AS avg_order_value_usd,
        SAFE_DIVIDE(new_customers, customers) AS new_customer_rate
    FROM aggregated
)

SELECT
    date_day,
    channel__name,
    orders,
    customers,
    new_customers,
    revenue_usd,
    margin_usd,
    margin_rate,
    avg_order_value_usd,
    new_customer_revenue_usd,
    new_customer_rate
FROM enriched

Notice the CTE structure: import from intermediate, aggregate to the target grain, then enrich with derived metrics (rates, averages). SAFE_DIVIDE returns NULL instead of raising a division-by-zero error when the denominator is zero, so a day with no orders or no revenue doesn't fail the model run or break your dashboard.

The GROUP BY ALL shorthand (available in BigQuery and some other warehouses) is convenient but controversial — some teams prefer explicit column lists for clarity. Either way, the key point is that this is the first time in the lineage where rows are being reduced.
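For comparison, a trimmed version of the aggregated CTE with an explicit column list would read like this (same output, just spelled out):

```sql
-- Same aggregation with the grain columns listed explicitly
SELECT
    DATE(order__created_at) AS date_day,
    channel__name,
    COUNT(DISTINCT order__id) AS orders,
    SUM(order__amount_usd) AS revenue_usd
FROM int__orders
GROUP BY
    date_day,        -- BigQuery allows grouping by SELECT aliases
    channel__name
```

The explicit list costs a few lines but makes the grain visible at a glance, which some reviewers prefer in consumer-facing models.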

Activation Marts

Activation marts prepare data for reverse ETL destinations: CRM tools like Braze or Salesforce, email platforms, ad audiences, customer data platforms. The grain is usually one row per entity, with all the attributes the destination system expects.

{{ config(
    materialized='table',
    tags=['mart', 'activation', 'braze']
) }}

WITH int__customers AS (
    SELECT
        customer__id,
        customer__email,
        customer__segment,
        customer__orders,
        customer__total_spent_usd,
        customer__last_ordered_at
    FROM {{ ref('int__customer__enriched') }}
),

enriched AS (
    SELECT
        customer__id,
        customer__email,
        customer__segment,
        customer__orders,
        customer__total_spent_usd,
        customer__last_ordered_at,
        CASE
            WHEN customer__orders = 0 THEN 'prospect'
            WHEN customer__orders = 1 THEN 'new'
            WHEN customer__last_ordered_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) THEN 'active'
            WHEN customer__last_ordered_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY) THEN 'lapsing'
            ELSE 'churned'
        END AS customer__lifecycle_stage,
        customer__orders >= 5 AS customer__is_vip,
        customer__total_spent_usd >= 1000 AS customer__is_high_ltv
    FROM int__customers
)

SELECT
    customer__email AS email,
    customer__id AS external_id,
    customer__segment AS segment,
    customer__lifecycle_stage AS lifecycle_stage,
    customer__orders AS orders,
    customer__total_spent_usd AS total_spent_usd,
    customer__is_vip AS is_vip,
    customer__is_high_ltv AS is_high_ltv
FROM enriched

Two things distinguish activation marts from reporting marts. First, the column names in the final SELECT match what the destination system expects (email, external_id, lifecycle_stage) rather than your internal naming convention. Second, the table exposes only what the destination needs — nothing more. Braze doesn’t need your internal customer ID format or raw order timestamps. Keeping the output tight reduces data transfer costs and prevents sensitive data from leaking into external systems.

ML Feature Marts

ML marts create feature tables for machine learning pipelines. They tend to be wide (many columns) with temporal features, one row per prediction target. The grain matches what the ML model expects: one row per customer for churn prediction, one row per product for demand forecasting, one row per session for conversion prediction.

ML marts often include lagged features (previous period metrics), rolling aggregations (30-day averages), and ratio features that data scientists have identified as predictive. The intermediate layer provides the building blocks; the mart assembles them into the feature matrix the model training pipeline consumes.
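As a sketch of that assembly (the int__customer__orders_by_month intermediate and its columns are hypothetical, used only to illustrate the window-function patterns):

```sql
-- Hypothetical feature assembly: one row per customer per month,
-- with lagged, rolling, and ratio features built from window functions.
WITH monthly AS (
    SELECT
        customer__id,
        month_start,
        orders,
        revenue_usd
    FROM {{ ref('int__customer__orders_by_month') }}  -- hypothetical intermediate
)

SELECT
    customer__id,
    month_start,
    orders,
    revenue_usd,
    -- Lagged feature: previous month's order count
    LAG(orders) OVER (
        PARTITION BY customer__id ORDER BY month_start
    ) AS orders_prev_month,
    -- Rolling aggregation: trailing 3-month revenue
    SUM(revenue_usd) OVER (
        PARTITION BY customer__id
        ORDER BY month_start
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS revenue_usd_rolling_3m,
    -- Ratio feature: current month's share of trailing 3-month revenue
    SAFE_DIVIDE(
        revenue_usd,
        SUM(revenue_usd) OVER (
            PARTITION BY customer__id
            ORDER BY month_start
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        )
    ) AS revenue_share_of_rolling_3m
FROM monthly
```

The window functions do the temporal work; the mart's job is just to pin each feature to the grain the training pipeline expects.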

Every Mart Serves a Specific Consumer

Before creating a mart, three questions should have clear answers:

  • Who consumes this? A specific dashboard, a reverse ETL pipeline, an ML training job.
  • What grain do they need? Daily by channel, one row per customer, one row per session.
  • What columns do they need? Only what they’ll actually query or ingest.

If these questions don’t have clear answers, the use case may be better served by a well-built intermediate model.

When You Don’t Need a Mart

A mart that’s just SELECT * FROM int__order__enriched with a different name adds a node to your DAG without adding value. If you’re not changing the grain, not formatting for a specific consumer, and not aggregating — you probably don’t need a mart. Consumers can query the intermediate model directly, or you can expose it as a public model with a contract.

The test: if removing the mart and pointing consumers at the intermediate model wouldn’t change their queries, the mart is unnecessary overhead.

Mart Organization

Marts are organized by business domain, not by source system or entity. The project structure places them in folders like marts/finance/, marts/marketing/, marts/activation/. This mirrors how consumers think: the marketing team looks in the marketing folder, not in a folder named after the source system their data came from.

Names follow the pattern mrt__[domain]__[entity_or_purpose]:

  • mrt__marketing__daily_channel_performance
  • mrt__finance__monthly_revenue
  • mrt__activation__braze_customers
  • mrt__ml__customer_churn_features

The domain prefix makes it obvious who owns and consumes each table.

Materialization Considerations

Most marts should be materialized as table for fast query performance. The exceptions:

  • High-volume reporting marts with long history benefit from incremental materialization. A daily performance mart that grows by one row per day per channel doesn’t need to recompute all of history on every run.
  • BigQuery reporting marts should use partition_by on the date column and cluster_by on the most common filter columns. This gives the BI tool efficient access patterns without full table scans.
  • Activation marts are typically small enough (one row per customer) that table materialization with full refresh is fine. The overhead of incremental logic isn’t worth it when the table fits in memory.
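As a sketch of the incremental pattern for the daily reporting mart (the unique_key and the three-day lookback are assumptions to adapt, not prescriptions):

```sql
{{ config(
    materialized='incremental',
    unique_key=['date_day', 'channel__name'],  -- one row per day per channel
    partition_by={
        "field": "date_day",
        "data_type": "date",
        "granularity": "day"
    },
    cluster_by=['channel__name']
) }}

WITH int__orders AS (
    SELECT
        order__id,
        order__created_at,
        channel__name,
        order__amount_usd
    FROM {{ ref('int__order__enriched') }}
    WHERE order__is_completed = TRUE
    {% if is_incremental() %}
    -- Reprocess a short lookback window to catch late-arriving orders
    AND DATE(order__created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
    {% endif %}
)

SELECT
    DATE(order__created_at) AS date_day,
    channel__name,
    COUNT(DISTINCT order__id) AS orders,
    SUM(order__amount_usd) AS revenue_usd
FROM int__orders
GROUP BY ALL
```

On incremental runs, dbt merges the recomputed lookback rows into the existing table on the unique_key, so only a few days of data are processed per run.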

Testing Marts

Marts deserve the heaviest testing investment in the testing taxonomy:

  • Primary key tests on the grain columns (date + channel for a daily mart, customer_id for a customer mart)
  • Not-null tests on all metric columns — a NULL in revenue_usd will silently break dashboard totals
  • Accepted values on categorical columns that consumers filter on
  • Model contracts for public-facing marts, enforcing schema stability so BI tools don’t break when you refactor
  • dbt-expectations range tests for business metrics — revenue should never be negative, rates should be between 0 and 1
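Those tests might be declared in a properties file along these lines (a sketch assuming the dbt_utils and dbt_expectations packages are installed; the accepted channel values are placeholders):

```yaml
version: 2

models:
  - name: mrt__marketing__daily_channel_performance
    tests:
      # Primary key on the grain: one row per day per channel
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - date_day
            - channel__name
    columns:
      - name: revenue_usd
        tests:
          - not_null
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
      - name: margin_rate
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 1
      - name: channel__name
        tests:
          - not_null
          - accepted_values:
              values: ['paid_search', 'paid_social', 'organic', 'email']
```

For public-facing marts you would additionally enable a model contract (`contract: enforced: true` with a `data_type` declared for every column) so schema changes fail at build time rather than in a consumer's dashboard.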

The stakes are higher here because marts are what consumers see. A bug in intermediate is invisible until it surfaces in a mart. A bug in a mart is immediately visible in a dashboard or a CRM sync.