Salesforce vs HubSpot Data Models

Salesforce and HubSpot take fundamentally different approaches to data structure. These differences shape base models, join patterns, intermediate enrichment logic, and schema design.

Salesforce: Metadata-Driven, Relational

Salesforce uses a metadata-driven, multi-tenant architecture. Under the hood, all customer data lives in shared tables, with metadata defining the schema for each tenant (org). But from a data modeling perspective, what matters is that Salesforce data is inherently relational with well-defined foreign keys.

Standard Objects

Standard objects follow predictable schemas that are consistent across every Salesforce org:

Account — Companies or organizations
Contact — People associated with accounts
Lead — Prospects not yet associated with accounts
Opportunity — Deals in the pipeline
Case — Support tickets
Campaign — Marketing campaigns and their members

Each object maps cleanly to a raw table in your warehouse, then to a base model, then to one or more intermediate or mart models.

Custom Objects

Custom objects carry a __c suffix: Territory__c, Subscription__c, Product_Usage__c. Their fields also carry the suffix: annual_revenue__c, renewal_date__c. This suffix is preserved through extraction into your warehouse, so your base models need to handle renaming them to your project’s naming conventions.

-- base__salesforce__territory.sql
SELECT
    id                      AS territory__id,
    name                    AS territory__name,
    region__c               AS territory__region,
    target_revenue__c       AS territory__target_revenue,
    owner_id                AS territory__owner_id
FROM {{ source('salesforce', 'territory__c') }}
WHERE NOT _fivetran_deleted

Relationship Types

Salesforce relationships come in two flavors:

Lookup relationships create a loose coupling with a nullable foreign key. A Contact has a lookup to Account via account_id, but the Contact can exist without an Account. In your warehouse, this means LEFT JOINs — not every Contact will have matching Account data.

Master-Detail relationships create tight coupling with cascade deletes. If the master record is deleted, all detail records go with it. The foreign key is required (non-nullable). In your warehouse, this means INNER JOINs are safe — every detail record has a master.

The practical implication: when writing your intermediate models, check whether a Salesforce relationship is Lookup or Master-Detail. It determines whether you use LEFT JOIN or INNER JOIN, and whether you need to handle NULLs in the foreign key column.

Polymorphic Relationships

Salesforce has polymorphic fields like WhoId on Task and Event, which can reference either a Contact or a Lead. And WhatId, which can reference an Account, Opportunity, or custom object. These require CASE-based routing in your intermediate layer:

CASE
    WHEN who_id LIKE '003%' THEN 'Contact'
    WHEN who_id LIKE '00Q%' THEN 'Lead'
END AS who_type

Salesforce ID prefixes are deterministic — 003 is always Contact, 00Q is always Lead, 001 is always Account, 006 is always Opportunity. This prefix-based routing is reliable but non-obvious if you haven’t worked with Salesforce before.

HubSpot: Association-Based, Many-to-Many

HubSpot takes a fundamentally different approach. Instead of foreign keys on the objects themselves, HubSpot uses a separate associations system to connect objects.

Core Objects

HubSpot’s core objects are simpler than Salesforce’s standard objects:

Contacts — People
Companies — Organizations
Deals — Sales opportunities
Tickets — Support requests

Properties are prefixed with property_ in the extracted data: property_email, property_firstname, property_lifecyclestage. Your base models rename these:

-- base__hubspot__contact.sql
SELECT
    id                              AS contact__id,
    property_email                  AS contact__email,
    property_firstname              AS contact__first_name,
    property_lastname               AS contact__last_name,
    property_lifecyclestage         AS contact__lifecycle_stage,
    property_hs_lead_status         AS contact__lead_status,
    property_createdate             AS contact__created_at
FROM {{ source('hubspot', 'contact') }}
WHERE NOT _fivetran_deleted

The Association Model

This is where HubSpot diverges sharply from Salesforce. In Salesforce, a Contact belongs to one Account (via account_id). In HubSpot, a Contact can be associated with multiple Companies, and a Deal can involve multiple Contacts in different roles.

Associations live in separate bridge tables:

contact_company — Links contacts to companies (many-to-many)
deal_contact — Links deals to contacts (many-to-many)
deal_company — Links deals to companies (many-to-many)

Each association can carry an optional label like “Decision Maker”, “Billing Contact”, or “Primary Company”. These labels add semantic meaning to the relationship but complicate your join logic.

In practice, this means your intermediate models need explicit bridge table joins:

-- int__deal_contact_enriched.sql
WITH

deals AS (
    SELECT deal__id, deal__name, deal__amount, deal__stage
    FROM {{ ref('base__hubspot__deal') }}
),

contacts AS (
    SELECT contact__id, contact__email, contact__first_name
    FROM {{ ref('base__hubspot__contact') }}
),

associations AS (
    SELECT
        deal_id     AS deal__id,
        contact_id  AS contact__id,
        label       AS association__label
    FROM {{ ref('base__hubspot__deal_contact') }}
)

SELECT
    deals.deal__id,
    deals.deal__name,
    deals.deal__amount,
    deals.deal__stage,
    contacts.contact__id,
    contacts.contact__email,
    contacts.contact__first_name,
    associations.association__label
FROM deals
INNER JOIN associations ON deals.deal__id = associations.deal__id
INNER JOIN contacts ON associations.contact__id = contacts.contact__id

The output is one row per deal-contact pair, not one row per deal. This is a different grain than what you’d get from a Salesforce opportunity model, and downstream logic needs to account for it.

Primary Association Pattern

When you need one company per contact (to match a Salesforce-like model), you have to pick a “primary” association. Common approaches:

Filter on a specific label (e.g., WHERE association__label = 'Primary')
Take the earliest association by creation date
Take the company with the most associated contacts (a proxy for “main” company)

None of these are perfect. HubSpot’s data model genuinely supports many-to-many, so reducing it to one-to-one always involves a business decision about which association takes precedence.

Modeling Implications

The structural differences have direct consequences for your dbt project:

Concern	Salesforce	HubSpot
Join pattern	Direct foreign key joins	Bridge table joins
Grain management	Straightforward — FK preserves grain	Fan-out risk from many-to-many
Base model count	One per object	One per object + one per bridge table
Primary relationship	Built-in (FK on child)	Must be derived from associations
Custom fields	`__c` suffix, rename in base	`property_` prefix, rename in base

Salesforce modeling ends up being mostly about handling mutability, hierarchy (Account → Contact → Opportunity), and extraction quirks like formula fields. HubSpot modeling is about resolving many-to-many relationships correctly and deciding how to flatten associations for reporting.

Both systems benefit from the same three-layer dbt architecture, but the intermediate layer does very different work for each. For Salesforce, intermediate enriches entities with related data. For HubSpot, intermediate resolves association complexity into queryable joins.