CRM data is highly requested and poorly trusted in most warehouses. These notes cover the architecture patterns that turn Salesforce and HubSpot data into reliable warehouse models using dbt and BigQuery.
The Challenges
CRM Data Extraction Challenges — Why CRM data is harder than most sources: mutability, API-based extraction, soft deletes, formula field blind spots, and rate limits. Start here if you’re new to CRM data warehousing.
Understanding the Source Systems
Salesforce vs HubSpot Data Models — How the two CRMs structure data differently. Salesforce uses metadata-driven relational models with lookup and master-detail relationships. HubSpot uses many-to-many associations with bridge tables and optional labels. This structural difference shapes every modeling decision.
Modeling Patterns
CRM Modeling Patterns in dbt — How to apply the base-intermediate-mart pattern to CRM data. Concrete code examples for both Salesforce and HubSpot, including base model conventions, intermediate enrichment, mart design, and incremental strategies for BigQuery.
SCD Type 2 with dbt Snapshots — Tracking historical changes to CRM records. Timestamp vs check strategies, Fivetran History Mode as an alternative, querying point-in-time state, and stage duration analysis.
Schema Design
Star Schema vs One Big Table — When to use entity-separated models vs wide denormalized tables. BigQuery performance benchmarks, the practical answer of building both at different layers, and how the Fivetran packages implement this pattern.
HubSpot-Specific Patterns
HubSpot Bigquery Pipeline Hub — The full index of HubSpot-specific garden notes: associations, lifecycle stages, deal stages, and property history.
HubSpot Association Bridge Tables — How HubSpot’s many-to-many association model requires bridge tables at every layer, the fan-out problem, and primary company resolution.
HubSpot Lifecycle Stages in the Warehouse — The “Became a Stage Date” model, forward-only transitions, funnel mart patterns, and merged contact artifacts.
HubSpot Deal Stage Modeling — The DEAL_STAGE table, is_closed vs label columns, time-in-stage, and conversion rate analysis.
HubSpot Property History Mechanics — Retention limits, CALCULATED property inflation, and deduplication at the base layer.
Packages and Tooling
Fivetran dbt Packages for CRM — What dbt_salesforce and dbt_hubspot provide out of the box: model coverage, pass-through columns for custom fields, history mode support, multi-org configuration, and naming convention tradeoffs.
Related Architecture Notes
These existing notes provide the foundational concepts that CRM modeling builds on:
- dbt Three-Layer Architecture — The base-intermediate-mart layering pattern
- dbt Base Layer Patterns — What belongs in base models
- dbt Intermediate Layer Patterns — Enrichment without reducing grain
- dbt Mart Layer Patterns — Consumer-specific aggregation
- Incremental Models in dbt — When and how to use incremental materialization
- Incremental Strategy Decision Framework — Choosing between merge, insert_overwrite, and others
- BigQuery Partition Pruning Patterns — Partitioning and clustering for cost optimization
- dbt Package Anatomy — How dbt packages are structured
- Hybrid ELT Strategy — When to buy managed extraction vs build custom