HubSpot uses many-to-many associations with optional labels rather than Salesforce’s lookup and master-detail relationships. Field history is distributed across separate per-object property history tables with retention limits rather than a single history object. These structural differences require distinct modeling patterns at every layer.
This hub covers the full HubSpot-to-BigQuery pipeline. The companion article is “HubSpot to BigQuery: a complete pipeline with dbt”; the notes below isolate individual concepts from that article.
Ingestion and Structure
Salesforce vs HubSpot Data Models — Structural differences between HubSpot’s association-based model and Salesforce’s foreign-key model, and how those differences affect downstream modeling.
Choosing Between Fivetran, Airbyte, and dlt — Ingestion tool decision framework. HubSpot-specific considerations: Fivetran covers 50+ tables, Airbyte has a hard 10K result cap on the CRM Search API, dlt provides core objects with a companion dbt package, and HubSpot’s native BigQuery connector covers fewer objects but costs nothing.
Core Data Model Challenges
HubSpot Association Bridge Tables — HubSpot has no foreign keys on core objects. Every relationship — contact to company, deal to contact, deal to company — flows through a bridge table. Covers modeling associations correctly, handling the fan-out problem, and resolving the primary company question without losing multi-company contacts.
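As a rough illustration of the primary-company question, the sketch below picks one company per contact from bridge rows while leaving the bridge itself untouched. The row shape `(contact_id, company_id, label)`, the `"Primary"` label value, and the lowest-ID fallback are all assumptions made for the example, not the schema or rule of any particular connector.

```python
from collections import defaultdict

# Hypothetical bridge rows: (contact_id, company_id, association label).
contact_company = [
    (1, 100, "Primary"),
    (1, 200, None),
    (2, 300, None),
    (2, 400, None),
]

def primary_company(rows):
    """Pick one company per contact: the labeled primary if present,
    otherwise the lowest company_id as a deterministic fallback (an assumption)."""
    by_contact = defaultdict(list)
    for contact_id, company_id, label in rows:
        by_contact[contact_id].append((company_id, label))

    result = {}
    for contact_id, companies in by_contact.items():
        labeled = [cid for cid, label in companies if label == "Primary"]
        result[contact_id] = min(labeled) if labeled else min(cid for cid, _ in companies)
    return result

primary = primary_company(contact_company)
```

Joining contacts to `primary` yields exactly one row per contact, which avoids fan-out in contact-grain models, while the full `contact_company` list remains available for analyses that need every associated company.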
HubSpot Lifecycle Stages in the Warehouse — HubSpot stamps a “Became a Stage Date” property on contacts for each lifecycle stage reached. Covers how forward-only lifecycle transitions work, how to extract stage timestamps in base models, how to build the lifecycle funnel mart, and how to detect merged contact artifacts that produce impossible date sequences.
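One way to flag the impossible date sequences mentioned above is to walk the stage timestamps in funnel order and report any stage reached before its predecessor. This is a minimal sketch: the stage order list and the dictionary shape of `stage_dates` are assumptions and should be adjusted to the portal's actual lifecycle configuration.

```python
from datetime import date

# Assumed funnel order; adjust to the portal's actual lifecycle stages.
STAGE_ORDER = [
    "subscriber",
    "lead",
    "marketingqualifiedlead",
    "salesqualifiedlead",
    "opportunity",
    "customer",
]

def out_of_order_stages(stage_dates):
    """Return (earlier_stage, later_stage) pairs where the later stage
    carries a date before the earlier stage's date."""
    reached = [
        (stage, stage_dates[stage])
        for stage in STAGE_ORDER
        if stage_dates.get(stage) is not None
    ]
    return [
        (a, b)
        for (a, a_date), (b, b_date) in zip(reached, reached[1:])
        if b_date < a_date
    ]

# Hypothetical contact whose MQL date precedes its lead date.
suspect = {
    "lead": date(2024, 3, 1),
    "marketingqualifiedlead": date(2024, 1, 15),
    "customer": date(2024, 4, 1),
}
clean = {"lead": date(2024, 1, 1), "customer": date(2024, 2, 1)}

suspect_pairs = out_of_order_stages(suspect)
clean_pairs = out_of_order_stages(clean)
```

A non-empty result does not prove a merge happened; it only marks the contact for review.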
HubSpot Deal Stage Modeling — Deal stage transitions live in DEAL_STAGE, not DEAL_PROPERTY_HISTORY. Covers how the is_closed and label columns work together, time-in-stage patterns, and pipeline conversion rate models.
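The time-in-stage pattern can be sketched as follows, assuming each transition row records when a deal entered a stage. The tuple shape `(deal_id, stage, entered_at)` and the stage names are hypothetical; the real table's columns will differ by connector.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter

# Hypothetical transition rows: (deal_id, stage, entered_at).
transitions = [
    ("d1", "contractsent", datetime(2024, 1, 11)),
    ("d1", "appointmentscheduled", datetime(2024, 1, 1)),
    ("d1", "closedwon", datetime(2024, 1, 15)),
]

def time_in_stage(rows):
    """Return (deal_id, stage, days_in_stage); the current stage gets None
    because it has no exit timestamp yet."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for deal_id, group in groupby(ordered, key=itemgetter(0)):
        stages = list(group)
        for i, (_, stage, entered) in enumerate(stages):
            # A stage's exit is the entry time of the next stage for the same deal.
            exited = stages[i + 1][2] if i + 1 < len(stages) else None
            days = (exited - entered).days if exited is not None else None
            out.append((deal_id, stage, days))
    return out

durations = time_in_stage(transitions)
```

In a warehouse the same logic is usually a window function that reads the next entry timestamp per deal; the Python version only makes the exit-equals-next-entry rule explicit.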
HubSpot Property History Mechanics — Per-object property history tables have retention caps (45 values for contacts, 20 for others). The CALCULATED property type inflates sync costs by always updating cursor timestamps even when values haven’t changed. Covers detection, deduplication at the base layer, and enabling history in the dbt_hubspot package.
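Base-layer deduplication of property history can be approximated by collapsing consecutive rows that repeat the same value and keeping the earliest timestamp of each run. The sketch below assumes rows for a single object and property in the form `(timestamp, value)`; the integer timestamps are placeholders for illustration.

```python
def dedupe_history(rows):
    """Collapse consecutive rows with an unchanged value, keeping the first
    timestamp of each run. rows: (timestamp, value) for one object and property."""
    kept = []
    for ts, value in sorted(rows, key=lambda r: r[0]):
        # Keep a row only when the value differs from the previously kept one.
        if not kept or kept[-1][1] != value:
            kept.append((ts, value))
    return kept

# Hypothetical history with repeated writes of an unchanged value.
history = [(2, "A"), (1, "A"), (3, "B"), (4, "B"), (5, "A")]
deduped = dedupe_history(history)
```

Note that a value returning to an earlier state (the final `"A"`) is kept as a real change; only adjacent repeats are dropped.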
The dbt_hubspot Package
Fivetran dbt Packages for CRM — dbt_hubspot (v1.6.1) generates 147 models when fully enabled. Covers configuring only the modules you need, pass-through columns for custom properties, and building on top of the package vs replacing it with custom models. The package uses insert_overwrite as its BigQuery incremental strategy.
Related CRM Notes
- CRM Data Architecture Hub — The broader CRM modeling landscape
- CRM Data Extraction Challenges — Mutability, soft deletes, rate limits, and API-based extraction
- CRM Modeling Patterns in dbt — The three-layer pattern applied to CRM data
- SCD Type 2 with dbt Snapshots — Tracking historical CRM record state
- Star Schema vs One Big Table — Mart design decisions for CRM analytics