The base-intermediate-mart pattern maps naturally to CRM data. Each layer has a clear responsibility, and the CRM domain adds specific patterns at each layer that go beyond what you’d see in e-commerce or event analytics modeling.
This note covers concrete implementation patterns for Salesforce and HubSpot data. For the general architectural principles, see the individual layer pattern notes. For the extraction challenges that inform these patterns, see CRM Data Extraction Challenges.
Base Layer: Clean and Normalize
One model per source table, materialized as views. Rename columns, cast types, filter out deleted records. No joins, no aggregations. The base layer is where you handle the extraction quirks — soft deletes, property prefixes, and custom field suffixes.
Salesforce Base Model
```sql
-- base__salesforce__opportunity.sql
SELECT
    id AS opportunity__id,
    account_id AS opportunity__account_id,
    owner_id AS opportunity__owner_id,
    name AS opportunity__name,
    stage_name AS opportunity__stage,
    amount AS opportunity__amount,
    close_date AS opportunity__close_at,
    is_won AS opportunity__is_won,
    is_closed AS opportunity__is_closed,
    created_date AS opportunity__created_at,
    system_modstamp AS opportunity__updated_at
FROM {{ source('salesforce', 'opportunity') }}
WHERE NOT _fivetran_deleted
```

Key patterns:

- The `WHERE NOT _fivetran_deleted` filter is mandatory on every CRM base model
- Rename to your project’s naming convention — the double-underscore pattern (`entity__field`) makes the grain explicit
- Keep `system_modstamp` as `opportunity__updated_at` — you’ll need it for snapshot strategies and incremental logic
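The snapshot use of `system_modstamp` can be made concrete with dbt's timestamp strategy. A minimal sketch — the snapshot name and `target_schema` are illustrative choices, not part of the patterns above:

```sql
{% snapshot salesforce_opportunity_snapshot %}

{{ config(
    target_schema='snapshots',
    unique_key='id',
    strategy='timestamp',
    updated_at='system_modstamp'
) }}

-- Snapshot the raw source so history is captured before base-layer renames
SELECT * FROM {{ source('salesforce', 'opportunity') }}

{% endsnapshot %}
```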
HubSpot Base Model
The same pattern applies, but with HubSpot’s `property_` column renaming:
```sql
-- base__hubspot__contact.sql
SELECT
    id AS contact__id,
    property_email AS contact__email,
    property_firstname AS contact__first_name,
    property_lastname AS contact__last_name,
    property_lifecyclestage AS contact__lifecycle_stage,
    property_hs_lead_status AS contact__lead_status,
    property_createdate AS contact__created_at
FROM {{ source('hubspot', 'contact') }}
WHERE NOT _fivetran_deleted
```

HubSpot’s `property_` prefix is consistent across all objects, making base model patterns predictable. The prefix exists because HubSpot’s API distinguishes between system fields (like `id`) and user-defined properties.
Bridge Table Base Models (HubSpot)
For HubSpot’s association tables, create base models that normalize the bridge data:
```sql
-- base__hubspot__deal_contact.sql
SELECT
    deal_id AS deal__id,
    contact_id AS contact__id,
    label AS association__label,
    _fivetran_synced AS synced_at
FROM {{ source('hubspot', 'deal_contact') }}
```

These bridge tables rarely have soft deletes, but check your specific extraction tool’s behavior.
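If the extraction tool can deliver duplicate association rows, a deduplicated intermediate model on top of the bridge is a common next step. A sketch — the model name is hypothetical, and it assumes BigQuery's `QUALIFY` clause:

```sql
-- int__deal_contact_deduped.sql (hypothetical name)
SELECT
    deal__id,
    contact__id,
    association__label
FROM {{ ref('base__hubspot__deal_contact') }}
-- Keep the most recently synced row per deal-contact pair
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY deal__id, contact__id
    ORDER BY synced_at DESC
) = 1
```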
Intermediate Layer: Enrich and Join
This is where business logic lives. Materialized as ephemeral models or views. Joins, deduplication, window functions, and more complex transformations go here. The critical constraint holds: never reduce the grain.
Salesforce Opportunity Enrichment
```sql
-- int__opportunity_enriched.sql
WITH

opportunities AS (
    SELECT
        opportunity__id,
        opportunity__account_id,
        opportunity__owner_id,
        opportunity__name,
        opportunity__stage,
        opportunity__amount,
        opportunity__close_at,
        opportunity__is_won,
        opportunity__is_closed,
        opportunity__created_at
    FROM {{ ref('base__salesforce__opportunity') }}
),

accounts AS (
    SELECT
        account__id,
        account__name,
        account__industry,
        account__annual_revenue
    FROM {{ ref('base__salesforce__account') }}
)

SELECT
    opportunity__id,
    opportunity__owner_id,
    opportunity__name,
    opportunity__stage,
    opportunity__amount,
    opportunity__close_at,
    opportunity__is_won,
    opportunity__is_closed,
    opportunity__created_at,
    account__name,
    account__industry,
    account__annual_revenue
FROM opportunities
LEFT JOIN accounts
    ON opportunities.opportunity__account_id = accounts.account__id
```

The LEFT JOIN to accounts is correct because Salesforce’s Account-Opportunity relationship is a Lookup (nullable FK), not Master-Detail. Some opportunities may have no associated account.
An intermediate model should ideally feed only one downstream mart model. If the same enrichment serves multiple marts, that’s fine — but if you find yourself building “catch-all” intermediate models with every possible join, split them into focused entity models.
Mart Layer: Business-Ready Entities
Business-ready entities at a specific grain. Materialized as tables or incremental models. These are what your BI tool queries.
```sql
-- mrt__sales__opportunity_enhanced.sql
WITH

enriched AS (
    SELECT
        opportunity__id,
        opportunity__owner_id,
        opportunity__name,
        opportunity__stage,
        opportunity__amount,
        opportunity__close_at,
        opportunity__is_won,
        opportunity__is_closed,
        opportunity__created_at,
        account__name,
        account__industry
    FROM {{ ref('int__opportunity_enriched') }}
),

owners AS (
    SELECT
        user__id,
        user__full_name
    FROM {{ ref('base__salesforce__user') }}
)

SELECT
    enriched.opportunity__id,
    enriched.opportunity__name,
    enriched.opportunity__stage,
    enriched.opportunity__amount,
    enriched.opportunity__close_at,
    enriched.opportunity__is_won,
    enriched.opportunity__is_closed,
    enriched.account__name,
    enriched.account__industry,
    owners.user__full_name AS opportunity__owner_name,
    DATE_DIFF(
        COALESCE(enriched.opportunity__close_at, CURRENT_DATE()),
        enriched.opportunity__created_at,
        DAY
    ) AS opportunity__days_open
FROM enriched
LEFT JOIN owners
    ON enriched.opportunity__owner_id = owners.user__id
```

The `opportunity__days_open` calculation is a good example of mart-level enrichment — it’s a metric for dashboards, not reusable business logic. If multiple marts needed this, it would belong in intermediate.
Materialization Progression
Start with views, promote to tables when queries get slow, then promote to incremental models when full table rebuilds become expensive. For CRM data, this progression often happens faster than with event data because CRM tables grow through mutability (more columns, more history) rather than pure volume.
The tipping point for CRM tables is typically the activity and history tables — opportunity history, activity timelines, engagement logs — which can reach millions of rows.
Incremental Strategies for CRM on BigQuery
CRM fact tables grow fast. The right incremental strategy makes a real difference in both cost and build time.
`merge` (dbt’s default incremental strategy) requires a `unique_key`. It scans both source and destination tables, which gets expensive as tables grow. Best for dimension tables with frequent updates — the core CRM entity tables (Account, Contact, Opportunity) where individual records change.
`insert_overwrite` (BigQuery-optimized) replaces entire partitions without a full table scan. With `copy_partitions: true`, BigQuery performs zero-cost partition replacement. Best for CRM activity and event tables where data arrives in time-bounded batches.
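As a sketch of the merge pattern — assuming the enriched model carries `opportunity__updated_at` through from the base layer (the model shown earlier would need that column added to its select list):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='opportunity__id'
) }}

SELECT *
FROM {{ ref('int__opportunity_enriched') }}
{% if is_incremental() %}
-- Only re-process rows modified since the last successful run
WHERE opportunity__updated_at > (SELECT MAX(opportunity__updated_at) FROM {{ this }})
{% endif %}
```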
For CRM data on BigQuery, partition by date fields (opportunity__created_at, opportunity__close_at) and cluster by frequently filtered columns (opportunity__account_id, opportunity__owner_id, opportunity__stage). Clustering can reduce scanned data by 88%+ for incremental models.
```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={
        'field': 'activity__occurred_at',
        'data_type': 'date',
        'granularity': 'day'
    },
    cluster_by=['activity__account_id', 'activity__owner_id']
) }}
```

The `insert_overwrite` strategy alone can reduce costs by 96% on large event tables compared to merge. This is especially relevant for CRM activity data where you’re dealing with engagement logs, email opens, call logs, and task completions that accumulate quickly.
The Schema Design Question
For structuring final CRM mart models — entity-separated star schema vs wide denormalized tables — see Star Schema vs One Big Table. The practical answer for CRM: entity-separated intermediate models for reusability, wide pre-joined mart models for BI performance. The Fivetran dbt packages follow this pattern with models like salesforce__opportunity_enhanced.