ServicesAboutNotesContact Get in touch →
EN FR
Note

Salesforce Polymorphic Relationship Resolution

How to resolve Salesforce's WhoId and WhatId polymorphic foreign keys in the warehouse using ID prefix routing — the pattern, the SQL, and where it recurs.

Planted
dbtbigquerydata modelingdata engineering

Salesforce polymorphic relationships are foreign keys that can point to different object types depending on the record. There is no foreign key constraint indicating which table to join — a single column might reference a Contact in one row and a Lead in another. This is how Salesforce models activities that can relate to any entity.

Where Polymorphic Fields Appear

The two most common polymorphic fields are on Tasks and Events:

  • WhoId — references a person. Can point to a Contact (ID prefix 003) or a Lead (00Q).
  • WhatId — references a thing. Can point to an Account (001), an Opportunity (006), a Case (500), or a custom object.

These fields exist because a single Task — say, a phone call — might be logged against a Contact and linked to an Opportunity. The WhoId tells you who the call was with, and the WhatId tells you what it was about. Both fields can hold IDs from different tables, and the only way to know which table is to decode the ID prefix.

Polymorphic lookups also appear on custom objects when a Salesforce admin configures a field as a “Polymorphic Lookup” type. Any custom object with a WhoId-style field follows the same prefix-routing pattern.

The ID Prefix System

Salesforce IDs are 18-character strings (or 15 in legacy contexts). The first three characters identify the object type. These prefixes are deterministic and globally consistent across all Salesforce orgs:

PrefixObject
001Account
003Contact
006Opportunity
00QLead
500Case
701Campaign

Custom objects get their own prefixes, assigned when the object is created. You can find these in Salesforce’s Setup under the custom object’s definition, or by querying the EntityDefinition object via the API.

The Resolution Pattern

The standard approach is a CASE statement on LEFT(id, 3) that maps prefixes to object type names. This goes in an intermediate model, not a base model — it’s business logic (interpreting what the ID means), which belongs in the intermediate layer.

-- int__salesforce_activity_resolved.sql
WITH
activities AS (
SELECT
activity__id,
activity__who_id,
activity__what_id,
activity__subject,
activity__type,
activity__occurred_at
FROM {{ ref('base__salesforce__task') }}
)
SELECT
activity__id,
activity__subject,
activity__type,
activity__occurred_at,
CASE LEFT(activity__who_id, 3)
WHEN '003' THEN 'Contact'
WHEN '00Q' THEN 'Lead'
END AS activity__who_type,
CASE LEFT(activity__what_id, 3)
WHEN '001' THEN 'Account'
WHEN '006' THEN 'Opportunity'
WHEN '500' THEN 'Case'
END AS activity__what_type,
activity__who_id,
activity__what_id
FROM activities

The activity__who_type and activity__what_type columns now tell downstream models which table to join against. Without this resolution step, you’d need to attempt LEFT JOINs against every possible referenced table and pick whichever one matched — that works but is expensive and unclear.

Joining to Referenced Tables

Once you’ve resolved the type, join to each referenced table in separate CTEs based on the resolved type:

WITH
resolved AS (
SELECT * FROM {{ ref('int__salesforce_activity_resolved') }}
),
contacts AS (
SELECT contact__id, contact__full_name, contact__email
FROM {{ ref('base__salesforce__contact') }}
),
leads AS (
SELECT lead__id, lead__full_name, lead__email
FROM {{ ref('base__salesforce__lead') }}
)
SELECT
resolved.activity__id,
resolved.activity__subject,
resolved.activity__who_type,
COALESCE(contacts.contact__full_name, leads.lead__full_name) AS activity__who_name,
COALESCE(contacts.contact__email, leads.lead__email) AS activity__who_email
FROM resolved
LEFT JOIN contacts
ON resolved.activity__who_type = 'Contact'
AND resolved.activity__who_id = contacts.contact__id
LEFT JOIN leads
ON resolved.activity__who_type = 'Lead'
AND resolved.activity__who_id = leads.lead__id

The key detail: the JOIN condition includes the type check (resolved.activity__who_type = 'Contact'). This ensures you only attempt to join against the correct table for each row. Without the type filter in the join condition, you’d get spurious matches when IDs happen to collide across tables (rare but possible with custom objects).

Making It Reusable

If your Salesforce org has many objects with polymorphic fields, consider a dbt macro that generates the CASE statement from a configuration list:

{% macro resolve_salesforce_polymorphic(column_name, mappings) %}
CASE LEFT({{ column_name }}, 3)
{% for prefix, object_type in mappings.items() %}
WHEN '{{ prefix }}' THEN '{{ object_type }}'
{% endfor %}
END
{% endmacro %}

Called as:

{{ resolve_salesforce_polymorphic('activity__who_id', {'003': 'Contact', '00Q': 'Lead'}) }}
AS activity__who_type

This keeps the prefix-to-type mappings in one place rather than scattered across multiple intermediate models. When your Salesforce admin adds a new custom object that uses polymorphic lookups, you update the macro’s mapping dictionary once.

Common Mistakes

Forgetting custom objects. The standard prefixes cover standard Salesforce objects, but if your org has custom objects referenced by polymorphic fields (e.g., a custom Project__c object linked via WhatId), the CASE statement returns NULL for those rows. Always check what your org actually has in WhatId before assuming the standard prefix list is complete. A quick SELECT DISTINCT LEFT(what_id, 3) query on your raw data reveals the full set.

Hardcoding in multiple models. If the polymorphic resolution logic lives in three different intermediate models, they’ll inevitably drift. Centralize it — either in a single intermediate model that downstream models reference, or in a macro.

Ignoring NULL IDs. WhoId and WhatId are nullable. A Task can exist without a WhoId (it’s not associated with a person) or without a WhatId (it’s not linked to a specific object). Your resolution model should handle NULLs gracefully — the CASE statement naturally returns NULL for NULL inputs, but make sure downstream models don’t filter out records with NULL resolved types unless that’s intentionally desired.

The Broader Pattern

Polymorphic resolution is central to the unified activity timeline pattern, where Tasks and Events are combined into a single view with resolved entity references. It also applies to any reporting that needs to answer “what was this activity about?” — which is essentially every sales activity report.

The pattern doesn’t exist in HubSpot’s data model, which uses explicit association tables instead of polymorphic foreign keys. HubSpot’s approach is more verbose but avoids the prefix-routing complexity entirely.