ServicesAboutNotesContact Get in touch →
EN FR
Note

HubSpot Lifecycle Stages in the Warehouse

How HubSpot's lifecycle stage model maps to warehouse columns, why forward-only transitions make funnel analysis straightforward, and how to handle merged contact artifacts.

Planted
dbtbigquerydata modelinganalyticsdata quality

HubSpot’s lifecycle stage model is unusually friendly to warehouse analytics. Most CRM stage tracking requires either a dedicated history table or snapshot-based reconstruction to know when a contact transitioned between stages. HubSpot does something simpler: every contact gets a dedicated timestamp property for each stage they pass through.

The standard progression is: Subscriber → Lead → Marketing Qualified Lead (MQL) → Sales Qualified Lead (SQL) → Opportunity → Customer → Evangelist. When a contact enters each stage, HubSpot stamps a “Became a [Stage] Date” property on the contact record itself:

  • property_became_a_lead_date
  • property_became_a_marketing_qualified_lead_date
  • property_became_a_sales_qualified_lead_date
  • property_became_a_customer_date

These are first-touch timestamps — they record when the contact first reached that stage, and they don’t get overwritten if the contact is somehow reassigned. This makes them reliable anchors for funnel analysis.

The Forward-Only Constraint

By default, HubSpot prevents lifecycle stage regression. Once a contact is marked as a Customer, HubSpot won’t let you move them back to Lead through normal UI actions. This is a feature for data integrity, not a bug.

The practical result: if a contact has a became_customer_at timestamp, it means they genuinely reached customer status at that point. You don’t need to worry about a stage reversal invalidating the timestamp. The date means what it says.

The exception is custom lifecycle stage configurations where admins have disabled this constraint. If your HubSpot portal has non-standard stage settings, verify the forward-only rule is enforced before treating these timestamps as reliable.

Extracting Lifecycle Timestamps in the Base Model

Map the lifecycle stage properties explicitly in your base model:

-- base__hubspot__contact.sql
SELECT
id AS contact_id,
property_email AS contact__email,
property_lifecyclestage AS contact__lifecycle_stage,
property_hs_lead_status AS contact__lead_status,
CAST(property_createdate AS TIMESTAMP) AS contact__created_at,
CAST(property_became_a_lead_date AS TIMESTAMP) AS contact__became_lead_at,
CAST(property_became_a_marketing_qualified_lead_date
AS TIMESTAMP) AS contact__became_mql_at,
CAST(property_became_a_sales_qualified_lead_date
AS TIMESTAMP) AS contact__became_sql_at,
CAST(property_became_a_customer_date AS TIMESTAMP) AS contact__became_customer_at,
_fivetran_deleted AS contact__is_deleted
FROM {{ source('hubspot', 'contact') }}
WHERE NOT COALESCE(_fivetran_deleted, FALSE)

This is enough for most funnel analysis. The timestamps are on the contact record, not buried in a history table. No joins required for basic lifecycle metrics.

If you’re using the dbt_hubspot package, configure these as pass-through columns on the package’s contacts model to make them available in the package’s output:

vars:
hubspot__contact_pass_through_columns:
- name: property_became_a_lead_date
alias: contact__became_lead_at
- name: property_became_a_marketing_qualified_lead_date
alias: contact__became_mql_at
- name: property_became_a_sales_qualified_lead_date
alias: contact__became_sql_at
- name: property_became_a_customer_date
alias: contact__became_customer_at

Building the Lifecycle Funnel Mart

With lifecycle timestamps on the contact record, the funnel model is a single aggregation:

-- mrt__marketing__lifecycle_funnel.sql
SELECT
COUNT(contact_id) AS funnel__leads,
COUNTIF(contact__became_mql_at IS NOT NULL) AS funnel__mqls,
COUNTIF(contact__became_sql_at IS NOT NULL) AS funnel__sqls,
COUNTIF(contact__became_customer_at IS NOT NULL) AS funnel__customers,
SAFE_DIVIDE(
COUNTIF(contact__became_mql_at IS NOT NULL),
COUNT(contact_id)
) AS funnel__lead_to_mql_rate,
SAFE_DIVIDE(
COUNTIF(contact__became_sql_at IS NOT NULL),
COUNTIF(contact__became_mql_at IS NOT NULL)
) AS funnel__mql_to_sql_rate,
SAFE_DIVIDE(
COUNTIF(contact__became_customer_at IS NOT NULL),
COUNTIF(contact__became_sql_at IS NOT NULL)
) AS funnel__sql_to_customer_rate
FROM {{ ref('hubspot__contacts') }}
WHERE contact__became_lead_at IS NOT NULL

Add a time dimension to track how conversion rates shift month over month:

SELECT
DATE_TRUNC(contact__became_lead_at, MONTH) AS cohort_month,
COUNT(contact_id) AS funnel__leads,
COUNTIF(contact__became_mql_at IS NOT NULL) AS funnel__mqls,
SAFE_DIVIDE(
COUNTIF(contact__became_mql_at IS NOT NULL),
COUNT(contact_id)
) AS funnel__lead_to_mql_rate
FROM {{ ref('hubspot__contacts') }}
WHERE contact__became_lead_at IS NOT NULL
GROUP BY 1
ORDER BY 1

This groups contacts by the month they first became a lead and shows conversion rates for each cohort — useful for tracking whether lead quality is improving or degrading over time.

The Merged Contact Problem

Merged contacts are the main source of lifecycle data anomalies. When HubSpot merges two contact records, the surviving contact inherits properties from both. If the merged-away contact had a became_a_customer_date from a year ago and the surviving contact’s became_a_lead_date is more recent, you end up with a contact that “became a customer” before they “became a lead.”

These sequences are chronologically impossible in real lifecycle progression. They’re almost always merge artifacts.

How to detect them in your base model or a data quality check:

-- dbt test or quality model to flag merge artifacts
SELECT
contact_id,
contact__became_lead_at,
contact__became_mql_at,
contact__became_sql_at,
contact__became_customer_at
FROM {{ ref('base__hubspot__contact') }}
WHERE
-- Customer before lead
(contact__became_customer_at < contact__became_lead_at)
-- MQL before lead
OR (contact__became_mql_at < contact__became_lead_at)
-- SQL before MQL
OR (contact__became_sql_at < contact__became_mql_at)
-- Customer before SQL
OR (contact__became_customer_at < contact__became_sql_at)

What to do with flagged records depends on your analysis goals. For funnel volume counts, you may want to exclude them entirely. For revenue attribution, you may want to keep them but exclude them from stage duration calculations (days from lead to MQL is meaningless if the dates are inverted). Document whatever decision you make — these records will come up in QA conversations.

The merge artifact issue isn’t unique to HubSpot, but the lifecycle timestamp model surfaces it clearly. It’s actually useful signal: if you’re seeing a lot of inverted sequences, it suggests active contact merging activity in HubSpot that may be worth investigating from a data hygiene perspective.

Velocity Metrics

The lifecycle timestamps enable velocity calculations that are genuinely useful for sales and marketing:

SELECT
contact_id,
DATE_DIFF(contact__became_mql_at, contact__became_lead_at, DAY) AS contact__days_lead_to_mql,
DATE_DIFF(contact__became_sql_at, contact__became_mql_at, DAY) AS contact__days_mql_to_sql,
DATE_DIFF(contact__became_customer_at, contact__became_sql_at, DAY) AS contact__days_sql_to_customer
FROM {{ ref('hubspot__contacts') }}
WHERE
contact__became_customer_at IS NOT NULL
-- Exclude merge artifacts
AND contact__became_customer_at > contact__became_lead_at
AND contact__became_sql_at > contact__became_mql_at

These velocity metrics belong in a contact 360 mart or as standalone metrics. Segmenting by acquisition channel, lead source, or deal owner shows differences in stage velocity across segments.

For how lifecycle stage data connects to deal tracking, see HubSpot Deal Stage Modeling. For the full pipeline context, see the HubSpot to BigQuery guide.