The base-intermediate-mart pattern maps naturally to CRM data. Each layer has a clear responsibility, and the CRM domain adds specific patterns at each layer that go beyond what you’d see in e-commerce or event analytics modeling.
This note covers concrete implementation patterns for Salesforce and HubSpot data. For the general architectural principles, see the individual layer pattern notes. For the extraction challenges that inform these patterns, see CRM Data Extraction Challenges.
Base Layer: Clean and Normalize
One model per source table, materialized as views. Rename columns, cast types, filter out deleted records. No joins, no aggregations. The base layer is where you handle the extraction quirks — soft deletes, property prefixes, and custom field suffixes.
Salesforce Base Model
```sql
-- base__salesforce__opportunity.sql
SELECT
    id AS opportunity__id,
    account_id AS opportunity__account_id,
    owner_id AS opportunity__owner_id,
    name AS opportunity__name,
    stage_name AS opportunity__stage,
    amount AS opportunity__amount,
    close_date AS opportunity__close_at,
    is_won AS opportunity__is_won,
    is_closed AS opportunity__is_closed,
    created_date AS opportunity__created_at,
    system_modstamp AS opportunity__updated_at
FROM {{ source('salesforce', 'opportunity') }}
WHERE NOT _fivetran_deleted
```

Key patterns:

- The `WHERE NOT _fivetran_deleted` filter is mandatory on every CRM base model
- Rename to your project’s naming convention — the double-underscore pattern (`entity__field`) makes the grain explicit
- Keep `system_modstamp` as `opportunity__updated_at` — you’ll need it for snapshot strategies and incremental logic
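The snapshot use of `system_modstamp` can be made concrete with dbt's timestamp strategy. A minimal sketch — the snapshot name and `target_schema` are illustrative choices, not part of the patterns above:

```sql
{% snapshot salesforce_opportunity_snapshot %}

{{ config(
    target_schema='snapshots',
    unique_key='id',
    strategy='timestamp',
    updated_at='system_modstamp'
) }}

-- Snapshot the raw source so history is captured before base-layer renames
SELECT * FROM {{ source('salesforce', 'opportunity') }}

{% endsnapshot %}
```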
HubSpot Base Model
The same pattern applies, but with HubSpot’s `property_` column renaming:
```sql
-- base__hubspot__contact.sql
SELECT
    id AS contact__id,
    property_email AS contact__email,
    property_firstname AS contact__first_name,
    property_lastname AS contact__last_name,
    property_lifecyclestage AS contact__lifecycle_stage,
    property_hs_lead_status AS contact__lead_status,
    property_createdate AS contact__created_at
FROM {{ source('hubspot', 'contact') }}
WHERE NOT _fivetran_deleted
```

HubSpot’s `property_` prefix is consistent across all objects, making base model patterns predictable. The prefix exists because HubSpot’s API distinguishes between system fields (like `id`) and user-defined properties.
Bridge Table Base Models (HubSpot)
For HubSpot’s association tables, create base models that normalize the bridge data:
```sql
-- base__hubspot__deal_contact.sql
SELECT
    deal_id AS deal__id,
    contact_id AS contact__id,
    label AS association__label,
    _fivetran_synced AS synced_at
FROM {{ source('hubspot', 'deal_contact') }}
```

These bridge tables rarely have soft deletes, but check your specific extraction tool’s behavior.
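If the extraction tool can deliver duplicate association rows, a deduplicated intermediate model on top of the bridge is a common next step. A sketch — the model name is hypothetical, and it assumes BigQuery's `QUALIFY` clause:

```sql
-- int__deal_contact_deduped.sql (hypothetical name)
SELECT
    deal__id,
    contact__id,
    association__label
FROM {{ ref('base__hubspot__deal_contact') }}
-- Keep the most recently synced row per deal-contact pair
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY deal__id, contact__id
    ORDER BY synced_at DESC
) = 1
```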
Intermediate Layer: Enrich and Join
This is where business logic lives. Materialized as ephemeral models or views. Joins, deduplication, window functions, and more complex transformations go here. The critical constraint holds: never reduce the grain.
Salesforce Opportunity Enrichment
```sql
-- int__opportunity_enriched.sql
WITH

opportunities AS (
    SELECT
        opportunity__id,
        opportunity__account_id,
        opportunity__owner_id,
        opportunity__name,
        opportunity__stage,
        opportunity__amount,
        opportunity__close_at,
        opportunity__is_won,
        opportunity__is_closed,
        opportunity__created_at
    FROM {{ ref('base__salesforce__opportunity') }}
),

accounts AS (
    SELECT
        account__id,
        account__name,
        account__industry,
        account__annual_revenue
    FROM {{ ref('base__salesforce__account') }}
)

SELECT
    opportunity__id,
    opportunity__owner_id,
    opportunity__name,
    opportunity__stage,
    opportunity__amount,
    opportunity__close_at,
    opportunity__is_won,
    opportunity__is_closed,
    opportunity__created_at,
    account__name,
    account__industry,
    account__annual_revenue
FROM opportunities
LEFT JOIN accounts
    ON opportunities.opportunity__account_id = accounts.account__id
```

The LEFT JOIN to accounts is correct because Salesforce’s Account-Opportunity relationship is a Lookup (nullable FK), not Master-Detail. Some opportunities may have no associated account.
An intermediate model should ideally feed only one downstream mart model. If the same enrichment serves multiple marts, that’s fine — but if you find yourself building “catch-all” intermediate models with every possible join, split them into focused entity models.
Mart Layer: Business-Ready Entities
Business-ready entities at a specific grain. Materialized as tables or incremental models. These are what your BI tool queries.
```sql
-- mrt__sales__opportunity_enhanced.sql
WITH

enriched AS (
    SELECT
        opportunity__id,
        opportunity__owner_id,
        opportunity__name,
        opportunity__stage,
        opportunity__amount,
        opportunity__close_at,
        opportunity__is_won,
        opportunity__is_closed,
        opportunity__created_at,
        account__name,
        account__industry
    FROM {{ ref('int__opportunity_enriched') }}
),

owners AS (
    SELECT
        user__id,
        user__full_name
    FROM {{ ref('base__salesforce__user') }}
)

SELECT
    enriched.opportunity__id,
    enriched.opportunity__name,
    enriched.opportunity__stage,
    enriched.opportunity__amount,
    enriched.opportunity__close_at,
    enriched.opportunity__is_won,
    enriched.opportunity__is_closed,
    enriched.account__name,
    enriched.account__industry,
    owners.user__full_name AS opportunity__owner_name,
    DATE_DIFF(
        COALESCE(enriched.opportunity__close_at, CURRENT_DATE()),
        enriched.opportunity__created_at,
        DAY
    ) AS opportunity__days_open
FROM enriched
LEFT JOIN owners
    ON enriched.opportunity__owner_id = owners.user__id
```

The `opportunity__days_open` calculation is a good example of mart-level enrichment — it’s a metric for dashboards, not reusable business logic. If multiple marts needed this, it would belong in intermediate.
Materialization Progression
Start with views, promote to tables when queries get slow, then promote to incremental models when full table rebuilds become expensive. For CRM data, this progression often happens faster than with event data because CRM tables grow through mutability (more columns, more history) rather than pure volume.
The tipping point for CRM tables is typically the activity and history tables — opportunity history, activity timelines, engagement logs — which can reach millions of rows.
Incremental Strategies for CRM on BigQuery
CRM fact tables grow fast. The right incremental strategy makes a real difference in both cost and build time.
`merge` (dbt’s default incremental strategy) requires a `unique_key`. It scans both source and destination tables, which gets expensive as tables grow. Best for dimension tables with frequent updates — the core CRM entity tables (Account, Contact, Opportunity) where individual records change.
`insert_overwrite` (BigQuery-optimized) replaces entire partitions without a full table scan. With `copy_partitions: true`, BigQuery performs zero-cost partition replacement. Best for CRM activity and event tables where data arrives in time-bounded batches.
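As a sketch of the merge pattern — assuming the enriched model carries `opportunity__updated_at` through from the base layer (the model shown earlier would need that column added to its select list):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='opportunity__id'
) }}

SELECT *
FROM {{ ref('int__opportunity_enriched') }}
{% if is_incremental() %}
-- Only re-process rows modified since the last successful run
WHERE opportunity__updated_at > (SELECT MAX(opportunity__updated_at) FROM {{ this }})
{% endif %}
```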
For CRM data on BigQuery, partition by date fields (opportunity__created_at, opportunity__close_at) and cluster by frequently filtered columns (opportunity__account_id, opportunity__owner_id, opportunity__stage). Clustering can reduce scanned data by 88%+ for incremental models.
```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={
        'field': 'activity__occurred_at',
        'data_type': 'date',
        'granularity': 'day'
    },
    cluster_by=['activity__account_id', 'activity__owner_id']
) }}
```

The `insert_overwrite` strategy alone can reduce costs by 96% on large event tables compared to merge. This is especially relevant for CRM activity data where you’re dealing with engagement logs, email opens, call logs, and task completions that accumulate quickly.
The Schema Design Question
For structuring final CRM mart models — entity-separated star schema vs wide denormalized tables — see Star Schema vs One Big Table. The practical answer for CRM: entity-separated intermediate models for reusability, wide pre-joined mart models for BI performance. The Fivetran dbt packages follow this pattern with models like salesforce__opportunity_enhanced.