ServicesAboutNotesContact Get in touch →
EN FR
Note

HubSpot Associations as Bridge Tables

HubSpot's many-to-many association model requires bridge tables at every layer. How to model them correctly, handle fan-out, and resolve the primary company problem.

Planted
dbtbigquerydata modelingdata engineering

HubSpot’s data model does not place object relationships on the objects themselves. There is no company_id column on the contact table, no foreign key pointing from a deal to its associated contacts. Every relationship — contact to company, contact to deal, company to deal, deal to ticket — flows through a dedicated bridge table.

This differs from Salesforce, where a Contact has an AccountId field pointing to its Account. In HubSpot, joining contacts to companies requires passing through contact_company. Without this, multi-company contacts are missed entirely and join counts are wrong.

The Bridge Table Inventory

Fivetran’s HubSpot connector surfaces these association tables, among others:

  • contact_company — contacts to companies (many-to-many)
  • contact_deal — contacts to deals (many-to-many)
  • deal_company — deals to companies (many-to-many)
  • deal_ticket — deals to support tickets (many-to-many)
  • engagement_contact — engagement records to contacts

Each table has at minimum: the two object IDs, an optional label column, and metadata like _fivetran_synced. The label column is where things get interesting.

Association Labels

On HubSpot Pro and Enterprise, associations can carry a label: “Decision Maker,” “Billing Contact,” “Primary Company,” “Champion.” These labels are defined in HubSpot’s settings and attached at the relationship level, not the contact level. A contact can be a “Decision Maker” on one deal and a “Billing Contact” on another.

Labels are worth extracting. They encode buying committee structure and contact roles in a way that would otherwise require custom properties. A contact_deal association where label = 'Decision Maker' is analytically meaningful — these contacts matter more for deal velocity analysis than passive participants.

Your base model for association tables should preserve the label:

-- base__hubspot__contact_company.sql
SELECT
contact_id,
company_id,
type_id AS association__type_id,
label AS association__label,
_fivetran_synced AS association__synced_at
FROM {{ source('hubspot', 'contact_company') }}

No soft-delete filter needed here — these tables track relationships, not records. A deleted contact’s associations disappear when the contact record is removed.

The Fan-Out Problem

When you join through association tables, your row count multiplies. A contact associated with three companies appears three times in a naive join against contact_company. This is correct behavior for the association grain, but it’s almost never what a mart model wants.

The symptom: you join contacts to companies and run COUNT(DISTINCT contact_id) to get contact counts per company. The results look off. Companies with lots of shared contacts report inflated numbers because you’re counting at the wrong grain.

The solution depends on the question you’re answering.

If you need one row per contact with some company context, pick a primary association:

-- int__contact_primary_company.sql
SELECT
contact_id,
company_id,
association__label
FROM {{ ref('base__hubspot__contact_company') }}
QUALIFY ROW_NUMBER() OVER (
PARTITION BY contact_id
ORDER BY association__synced_at ASC
) = 1

This takes the earliest-synced company as the “primary” — a reasonable proxy for the original company relationship. If your team has established a label convention like “Primary Company,” filter on that instead:

QUALIFY ROW_NUMBER() OVER (
PARTITION BY contact_id
ORDER BY
CASE WHEN association__label = 'Primary Company' THEN 0 ELSE 1 END ASC,
association__synced_at ASC
) = 1

If you need aggregate metrics about each company, aggregate from the bridge rather than joining:

-- int__company_contact_metrics.sql
SELECT
company_id,
COUNT(DISTINCT contact_id) AS company__contacts,
COUNT(DISTINCT CASE
WHEN association__label = 'Decision Maker'
THEN contact_id
END) AS company__decision_makers
FROM {{ ref('base__hubspot__contact_company') }}
GROUP BY company_id

This stays at the company grain and handles multi-company contacts correctly. The intermediate model exposes these metrics to mart models that need them.

Building One Intermediate Model Per Association Type

Don’t try to handle all association types in one model. Build a focused intermediate model for each pairing:

  • int__contact_company_mapped.sql — contact-to-company with primary resolution
  • int__contact_deal_mapped.sql — contact-to-deal pairs with labels
  • int__company_deal_mapped.sql — company-to-deal relationships

These intermediate models become the canonical join path for their association type. When your mart models need to connect contacts to deals, they reference int__contact_deal_mapped, not the raw bridge table. This keeps the fan-out logic in one place.

-- int__contact_deal_mapped.sql
SELECT
contact_id,
deal_id,
association__label,
association__synced_at
FROM {{ ref('base__hubspot__contact_deal') }}

The mart then joins on this:

-- mrt__sales__deal_contacts.sql
SELECT
d.deal_id,
d.deal__name,
d.deal__amount,
c.contact_id,
c.contact__email,
cdm.association__label
FROM {{ ref('hubspot__deals') }} AS d
INNER JOIN {{ ref('int__contact_deal_mapped') }} AS cdm
ON d.deal_id = cdm.deal_id
LEFT JOIN {{ ref('hubspot__contacts') }} AS c
ON cdm.contact_id = c.contact_id

The grain here is one row per deal-contact pair, which is the right grain for a deal-contacts model. Any aggregation happens downstream or in a separate mart.

The Primary Company Question Has No Perfect Answer

HubSpot’s association model genuinely supports many-to-many relationships, so any strategy to reduce it to one-to-one involves a business decision. The three approaches you’ll see in practice:

Earliest association wins. The first company a contact was linked to is their “primary.” Simple, deterministic, but may not reflect reality if a contact switches companies.

Label-based selection. If your team uses a “Primary Company” label consistently, use that. This requires enforcement of the labeling convention in HubSpot itself — if sales reps don’t label, this breaks.

Most recent sync. Take the most recently synced association. This tends to reflect the “current” company relationship but is volatile if the sync timestamp isn’t meaningful.

Whatever you choose, document it as a business rule in your intermediate model. Other mart models will build on this assumption, and the choice should be explicit and traceable.

For deeper context on how this differs from Salesforce’s foreign-key model, see Salesforce vs HubSpot Data Models. For the full HubSpot pipeline, see the HubSpot to BigQuery guide.