ServicesAboutNotesContact Get in touch →
EN FR
Note

CRM Data Architecture Hub

Hub note connecting all garden notes on modeling Salesforce and HubSpot data in a modern warehouse with dbt and BigQuery.

Planted
dbtbigquerydata modelingdata engineering

CRM data is highly requested and poorly trusted in most warehouses. These notes cover the architecture patterns that turn Salesforce and HubSpot data into reliable warehouse models using dbt and BigQuery.

The Challenges

CRM Data Extraction Challenges — Why CRM data is harder than most sources: mutability, API-based extraction, soft deletes, formula field blind spots, and rate limits. Start here if you’re new to CRM data warehousing.

Understanding the Source Systems

Salesforce vs HubSpot Data Models — How the two CRMs structure data differently. Salesforce uses metadata-driven relational models with lookup and master-detail relationships. HubSpot uses many-to-many associations with bridge tables and optional labels. This structural difference shapes every modeling decision.

Modeling Patterns

CRM Modeling Patterns in dbt — How to apply the base-intermediate-mart pattern to CRM data. Concrete code examples for both Salesforce and HubSpot, including base model conventions, intermediate enrichment, mart design, and incremental strategies for BigQuery.

SCD Type 2 with dbt Snapshots — Tracking historical changes to CRM records. Timestamp vs check strategies, Fivetran History Mode as an alternative, querying point-in-time state, and stage duration analysis.

Schema Design

Star Schema vs One Big Table — When to use entity-separated models vs wide denormalized tables. BigQuery performance benchmarks, the practical answer of building both at different layers, and how the Fivetran packages implement this pattern.

HubSpot-Specific Patterns

HubSpot Bigquery Pipeline Hub — The full index of HubSpot-specific garden notes: associations, lifecycle stages, deal stages, and property history.

HubSpot Association Bridge Tables — How HubSpot’s many-to-many association model requires bridge tables at every layer, the fan-out problem, and primary company resolution.

HubSpot Lifecycle Stages in the Warehouse — The “Became a Stage Date” model, forward-only transitions, funnel mart patterns, and merged contact artifacts.

HubSpot Deal Stage Modeling — The DEAL_STAGE table, is_closed vs label columns, time-in-stage, and conversion rate analysis.

HubSpot Property History Mechanics — Retention limits, CALCULATED property inflation, and deduplication at the base layer.

Packages and Tooling

Fivetran dbt Packages for CRM — What dbt_salesforce and dbt_hubspot provide out of the box: model coverage, pass-through columns for custom fields, history mode support, multi-org configuration, and naming convention tradeoffs.

These existing notes provide the foundational concepts that CRM modeling builds on: