ServicesAboutNotesContact Get in touch →
EN FR
Note

Salesforce vs HubSpot Data Models

How Salesforce and HubSpot structure CRM data differently — metadata-driven relational models vs many-to-many associations — and what that means for warehouse modeling.

Planted
dbtbigquerydata modelingdata engineering

Salesforce and HubSpot take fundamentally different approaches to data structure. These differences shape base models, join patterns, intermediate enrichment logic, and schema design.

Salesforce: Metadata-Driven, Relational

Salesforce uses a metadata-driven, multi-tenant architecture. Under the hood, all customer data lives in shared tables, with metadata defining the schema for each tenant (org). But from a data modeling perspective, what matters is that Salesforce data is inherently relational with well-defined foreign keys.

Standard Objects

Standard objects follow predictable schemas that are consistent across every Salesforce org:

  • Account — Companies or organizations
  • Contact — People associated with accounts
  • Lead — Prospects not yet associated with accounts
  • Opportunity — Deals in the pipeline
  • Case — Support tickets
  • Campaign — Marketing campaigns and their members

Each object maps cleanly to a raw table in your warehouse, then to a base model, then to one or more intermediate or mart models.

Custom Objects

Custom objects carry a __c suffix: Territory__c, Subscription__c, Product_Usage__c. Their fields also carry the suffix: annual_revenue__c, renewal_date__c. This suffix is preserved through extraction into your warehouse, so your base models need to handle renaming them to your project’s naming conventions.

-- base__salesforce__territory.sql
SELECT
id AS territory__id,
name AS territory__name,
region__c AS territory__region,
target_revenue__c AS territory__target_revenue,
owner_id AS territory__owner_id
FROM {{ source('salesforce', 'territory__c') }}
WHERE NOT _fivetran_deleted

Relationship Types

Salesforce relationships come in two flavors:

Lookup relationships create a loose coupling with a nullable foreign key. A Contact has a lookup to Account via account_id, but the Contact can exist without an Account. In your warehouse, this means LEFT JOINs — not every Contact will have matching Account data.

Master-Detail relationships create tight coupling with cascade deletes. If the master record is deleted, all detail records go with it. The foreign key is required (non-nullable). In your warehouse, this means INNER JOINs are safe — every detail record has a master.

The practical implication: when writing your intermediate models, check whether a Salesforce relationship is Lookup or Master-Detail. It determines whether you use LEFT JOIN or INNER JOIN, and whether you need to handle NULLs in the foreign key column.

Polymorphic Relationships

Salesforce has polymorphic fields like WhoId on Task and Event, which can reference either a Contact or a Lead. And WhatId, which can reference an Account, Opportunity, or custom object. These require CASE-based routing in your intermediate layer:

CASE
WHEN who_id LIKE '003%' THEN 'Contact'
WHEN who_id LIKE '00Q%' THEN 'Lead'
END AS who_type

Salesforce ID prefixes are deterministic — 003 is always Contact, 00Q is always Lead, 001 is always Account, 006 is always Opportunity. This prefix-based routing is reliable but non-obvious if you haven’t worked with Salesforce before.

HubSpot: Association-Based, Many-to-Many

HubSpot takes a fundamentally different approach. Instead of foreign keys on the objects themselves, HubSpot uses a separate associations system to connect objects.

Core Objects

HubSpot’s core objects are simpler than Salesforce’s standard objects:

  • Contacts — People
  • Companies — Organizations
  • Deals — Sales opportunities
  • Tickets — Support requests

Properties are prefixed with property_ in the extracted data: property_email, property_firstname, property_lifecyclestage. Your base models rename these:

-- base__hubspot__contact.sql
SELECT
id AS contact__id,
property_email AS contact__email,
property_firstname AS contact__first_name,
property_lastname AS contact__last_name,
property_lifecyclestage AS contact__lifecycle_stage,
property_hs_lead_status AS contact__lead_status,
property_createdate AS contact__created_at
FROM {{ source('hubspot', 'contact') }}
WHERE NOT _fivetran_deleted

The Association Model

This is where HubSpot diverges sharply from Salesforce. In Salesforce, a Contact belongs to one Account (via account_id). In HubSpot, a Contact can be associated with multiple Companies, and a Deal can involve multiple Contacts in different roles.

Associations live in separate bridge tables:

  • contact_company — Links contacts to companies (many-to-many)
  • deal_contact — Links deals to contacts (many-to-many)
  • deal_company — Links deals to companies (many-to-many)

Each association can carry an optional label like “Decision Maker”, “Billing Contact”, or “Primary Company”. These labels add semantic meaning to the relationship but complicate your join logic.

In practice, this means your intermediate models need explicit bridge table joins:

-- int__deal_contact_enriched.sql
WITH
deals AS (
SELECT deal__id, deal__name, deal__amount, deal__stage
FROM {{ ref('base__hubspot__deal') }}
),
contacts AS (
SELECT contact__id, contact__email, contact__first_name
FROM {{ ref('base__hubspot__contact') }}
),
associations AS (
SELECT
deal_id AS deal__id,
contact_id AS contact__id,
label AS association__label
FROM {{ ref('base__hubspot__deal_contact') }}
)
SELECT
deals.deal__id,
deals.deal__name,
deals.deal__amount,
deals.deal__stage,
contacts.contact__id,
contacts.contact__email,
contacts.contact__first_name,
associations.association__label
FROM deals
INNER JOIN associations ON deals.deal__id = associations.deal__id
INNER JOIN contacts ON associations.contact__id = contacts.contact__id

The output is one row per deal-contact pair, not one row per deal. This is a different grain than what you’d get from a Salesforce opportunity model, and downstream logic needs to account for it.

Primary Association Pattern

When you need one company per contact (to match a Salesforce-like model), you have to pick a “primary” association. Common approaches:

  • Filter on a specific label (e.g., WHERE association__label = 'Primary')
  • Take the earliest association by creation date
  • Take the company with the most associated contacts (a proxy for “main” company)

None of these are perfect. HubSpot’s data model genuinely supports many-to-many, so reducing it to one-to-one always involves a business decision about which association takes precedence.

Modeling Implications

The structural differences have direct consequences for your dbt project:

ConcernSalesforceHubSpot
Join patternDirect foreign key joinsBridge table joins
Grain managementStraightforward — FK preserves grainFan-out risk from many-to-many
Base model countOne per objectOne per object + one per bridge table
Primary relationshipBuilt-in (FK on child)Must be derived from associations
Custom fields__c suffix, rename in baseproperty_ prefix, rename in base

Salesforce modeling ends up being mostly about handling mutability, hierarchy (Account → Contact → Opportunity), and extraction quirks like formula fields. HubSpot modeling is about resolving many-to-many relationships correctly and deciding how to flatten associations for reporting.

Both systems benefit from the same three-layer dbt architecture, but the intermediate layer does very different work for each. For Salesforce, intermediate enriches entities with related data. For HubSpot, intermediate resolves association complexity into queryable joins.