Lead scoring assigns a numeric value to each lead reflecting conversion likelihood. CRM-native scoring is limited to what lives inside the CRM — job title, company size, form submissions, email opens. Warehouse-based scoring can incorporate GA4 web analytics, product usage telemetry, support ticket history, and billing signals.
Four signal categories cover the inputs to most lead scoring models.
Demographic Fit
Who is this person? Demographic signals capture individual-level identity and professional context.
The most predictive demographic signals are usually job seniority and department. A VP of Engineering at a SaaS company evaluating developer tooling is a fundamentally different lead than an intern from the same company. Seniority signals authority over the purchase decision. Department signals relevance to what you sell.
Job title as a raw string is noisy — thousands of variations for similar roles. If you’re pulling job titles from HubSpot or Salesforce, consider normalizing them in your intermediate layer before scoring:
```sql
-- Normalize raw job titles to seniority bands (BigQuery syntax).
-- Word boundaries (\b) matter: a plain '%cto%' pattern also matches "director",
-- and '%coo%' matches "coordinator".
CASE
    WHEN REGEXP_CONTAINS(LOWER(contact__job_title), r'\b(ceo|cto|cfo|coo|chief)\b') THEN 'C-Level'
    WHEN REGEXP_CONTAINS(LOWER(contact__job_title), r'\b(vp|svp|evp|vice president)\b') THEN 'VP'
    WHEN REGEXP_CONTAINS(LOWER(contact__job_title), r'\bdirector\b') THEN 'Director'
    WHEN REGEXP_CONTAINS(LOWER(contact__job_title), r'\b(manager|lead|head of)\b') THEN 'Manager'
    ELSE 'Individual Contributor'
END AS contact__seniority_band
```

Normalize first, score second. Raw titles fed directly into scoring rules create maintenance hell as soon as you encounter “VP-level” or “Dir.” or “Chief of Staff.”
Firmographic Fit
What kind of company are they from? Firmographic signals capture company-level fit against your ideal customer profile (ICP).
Key firmographic signals: employee count, industry, annual revenue, technology stack (if you can infer it). These determine whether this company is even in your addressable market, regardless of how engaged the lead is.
A 500-person SaaS company in your target vertical scores higher than a 5-person agency outside your ICP — even if the agency lead has filled out twice as many forms. Firmographic fit is a multiplier on engagement signals. High engagement from a bad-fit company is still a bad lead.
Firmographic data usually comes from the CRM (whatever your sales team manually enters or your enrichment tool fills in), but the warehouse is where you normalize it for scoring. This is also where you apply your ICP definition — scoring 0 for unqualified industries, for example, rather than leaving those leads in your pipeline to waste sales time.
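Here is a minimal sketch of the "fit as a multiplier" idea, with the ICP gate applied as a multiplier of zero. Everything in it is an assumption for illustration: the inline sample rows, the column names (`company__industry`, `company__employee_count`, `lead__engagement_points`), the list of qualified industries, and the size thresholds would all come from your own ICP definition.

```sql
-- Hypothetical sample data standing in for a modeled CRM table.
WITH leads AS (
    SELECT * FROM UNNEST([
        STRUCT('lead_a' AS lead__id, 'saas' AS company__industry,
               500 AS company__employee_count, 40 AS lead__engagement_points),
        STRUCT('lead_b', 'agency', 5, 80)
    ])
),

firmographic_fit AS (
    SELECT
        lead__id,
        lead__engagement_points,
        CASE
            -- Outside the (assumed) ICP: gate the score to zero
            WHEN company__industry NOT IN ('saas', 'fintech') THEN 0.0
            -- Illustrative size bands, not recommended values
            WHEN company__employee_count BETWEEN 200 AND 2000 THEN 1.0
            WHEN company__employee_count >= 50 THEN 0.6
            ELSE 0.3
        END AS company__fit_multiplier
    FROM leads
)

SELECT
    lead__id,
    company__fit_multiplier,
    lead__engagement_points * company__fit_multiplier AS lead__score
FROM firmographic_fit
ORDER BY lead__score DESC
```

With these sample rows, the agency lead has twice the engagement points but ends up with a score of 0, while the in-ICP lead keeps its 40 points.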
Behavioral Engagement
What has the lead actually done? Behavioral signals measure demonstrated intent and interest.
These signals separate a lead who found you once from one who has been actively evaluating you. High-signal behavioral events include pricing page views, demo requests, feature comparison page views, and form submissions. Medium-signal events include email opens, content downloads, and blog visits. Low-signal events include homepage visits. Unsubscribes do not belong in any of these tiers: they are negative signals and should subtract from the score.
The important pattern here is that the warehouse can pull behavioral signals that the CRM never sees. A lead who spent 45 minutes on your pricing page in a session that started from a Google Ads click — that’s a GA4 event. A lead who opened the app three times last week but hasn’t upgraded — that’s product telemetry. The CRM captures what the lead told you. The warehouse captures what the lead showed you.
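The tiers above can be turned into points with a simple mapping. The sketch below assumes a unified events table with hypothetical `lead__id` and `event__type` columns. The event names and the point values (10, 3, 1) are placeholders, not recommended weights.

```sql
-- Hypothetical events, standing in for a union of GA4, CRM, and product events.
WITH events AS (
    SELECT * FROM UNNEST([
        STRUCT('lead_a' AS lead__id, 'pricing_page_view' AS event__type),
        STRUCT('lead_a', 'demo_request'),
        STRUCT('lead_a', 'email_open'),
        STRUCT('lead_b', 'homepage_visit'),
        STRUCT('lead_b', 'blog_visit')
    ])
),

event_points AS (
    SELECT
        lead__id,
        event__type,
        CASE
            -- High-signal tier
            WHEN event__type IN ('pricing_page_view', 'demo_request',
                                 'feature_comparison_view', 'form_submission') THEN 10
            -- Medium-signal tier
            WHEN event__type IN ('email_open', 'content_download', 'blog_visit') THEN 3
            -- Low-signal tier
            WHEN event__type IN ('homepage_visit') THEN 1
            ELSE 0
        END AS event__points
    FROM events
)

SELECT
    lead__id,
    SUM(event__points) AS lead__behavioral_points
FROM event_points
GROUP BY lead__id
```

With this sample data, `lead_a` totals 23 points and `lead_b` totals 4.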
Recency
When did they do it? Recency weights recent activity above older activity.
A lead who requested a demo yesterday is a different situation than one who requested a demo six months ago and then went quiet. The score should reflect that difference.
The standard approach is time decay: instead of adding the raw event count to the score, apply a decay factor based on how long ago the event occurred.
```sql
-- Exponential decay: halves the score every ~7 days
lead__score * EXP(-0.1 * DATE_DIFF(CURRENT_DATE(), event__occurred_at, DAY))
```

The constant 0.1 controls the decay rate. At 0.1, the score halves roughly every 7 days (ln 2 / 0.1 ≈ 6.9). An event from yesterday counts at ~90% of face value. The same event from three weeks ago counts at ~12%.
Recency transforms behavioral scores from cumulative counters (someone who signed up two years ago accumulates more points over time regardless of current interest) into intent signals (only recent activity counts meaningfully toward the score).
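To see the difference between a cumulative counter and a decayed score, here is a small sketch that applies the same decay per event and then sums per lead. The sample events, the flat 10 points per event, and the `event__occurred_on` DATE column are assumptions for illustration.

```sql
-- Hypothetical events: each worth 10 points at face value.
WITH events AS (
    SELECT * FROM UNNEST([
        STRUCT('lead_a' AS lead__id, 10 AS event__points,
               DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AS event__occurred_on),
        STRUCT('lead_a', 10, DATE_SUB(CURRENT_DATE(), INTERVAL 21 DAY)),
        STRUCT('lead_b', 10, DATE_SUB(CURRENT_DATE(), INTERVAL 180 DAY))
    ])
)

SELECT
    lead__id,
    SUM(event__points) AS lead__raw_points,  -- cumulative counter, for comparison
    ROUND(
        SUM(event__points * EXP(-0.1 * DATE_DIFF(CURRENT_DATE(), event__occurred_on, DAY))),
        2
    ) AS lead__decayed_points
FROM events
GROUP BY lead__id
```

Here `lead_a` has 20 raw points but only about 10.3 decayed points, and the six-month-old event for `lead_b` decays to effectively zero. If you would rather set the half-life directly, one option is to replace the constant with `LN(2) / h`, where `h` is the half-life in days.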
Negative Signals
Scoring models focused only on positive signals allow dead leads to accumulate in the high-score tier over time. Negative signals should subtract from the score: for example, unsubscribes (-20), bounced emails (-15), and extended inactivity (-10 for 30+ days with no activity). Without them, a lead can keep a high score after months without engagement.
The negative signal logic belongs in the same scoring model as the positive signals — typically in a CASE WHEN block or a seed-driven scoring rules table alongside the positive signal weights.
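A seed-driven rules table might look something like the sketch below. The `scoring_rules` CTE stands in for a dbt seed. The negative weights follow the examples above, while the positive weights, the signal names, and the `lead_signals` input are hypothetical.

```sql
-- Stand-in for a seed file holding positive and negative weights side by side.
WITH scoring_rules AS (
    SELECT * FROM UNNEST([
        STRUCT('demo_request' AS signal__name, 10 AS signal__points),  -- assumed weight
        STRUCT('pricing_page_view', 10),                               -- assumed weight
        STRUCT('unsubscribe', -20),
        STRUCT('email_bounce', -15),
        STRUCT('inactive_30d', -10)
    ])
),

-- Hypothetical signals observed per lead.
lead_signals AS (
    SELECT * FROM UNNEST([
        STRUCT('lead_a' AS lead__id, 'demo_request' AS signal__name),
        STRUCT('lead_a', 'pricing_page_view'),
        STRUCT('lead_b', 'demo_request'),
        STRUCT('lead_b', 'unsubscribe'),
        STRUCT('lead_b', 'inactive_30d')
    ])
)

SELECT
    s.lead__id,
    SUM(r.signal__points) AS lead__score
FROM lead_signals AS s
INNER JOIN scoring_rules AS r USING (signal__name)
GROUP BY s.lead__id
```

With this sample data, `lead_a` scores 20 and `lead_b` scores -20: the old demo request no longer keeps an unsubscribed, inactive lead in the top tier. Keeping the weights in a table rather than in SQL means they can be changed without touching the model.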
Why This Matters for Warehouse-Based Scoring
CRM-native scoring tools handle demographic and firmographic signals reasonably well, since that data lives in the CRM. They handle some behavioral signals — email opens, form submissions, CRM-tracked activities. What they can’t touch: website analytics (GA4), product usage, support history, financial signals from your billing system.
Building scoring in the warehouse with dbt means you can incorporate all four signal categories from every data source you’ve already modeled. The intermediate feature tables feed from GA4 event data, CRM records, product telemetry — whatever you have. The scoring model joins them together.
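As a rough sketch of that final join, assume three intermediate feature tables keyed on a hypothetical `lead__id`. The CTEs below stand in for them, and the column names and values are illustrative only.

```sql
-- Stand-ins for intermediate feature models built from each source.
WITH crm_features AS (
    SELECT * FROM UNNEST([
        STRUCT('lead_a' AS lead__id, 1.0 AS company__fit_multiplier),
        STRUCT('lead_b', 0.6)
    ])
),

web_features AS (  -- e.g. decayed points from GA4 events
    SELECT * FROM UNNEST([
        STRUCT('lead_a' AS lead__id, 18.0 AS web__decayed_points)
    ])
),

product_features AS (  -- e.g. decayed points from product telemetry
    SELECT * FROM UNNEST([
        STRUCT('lead_b' AS lead__id, 12.0 AS product__decayed_points)
    ])
)

SELECT
    c.lead__id,
    c.company__fit_multiplier
        * (COALESCE(w.web__decayed_points, 0) + COALESCE(p.product__decayed_points, 0))
        AS lead__score
FROM crm_features AS c
-- LEFT JOINs keep leads that have no activity in a given source
LEFT JOIN web_features AS w ON w.lead__id = c.lead__id
LEFT JOIN product_features AS p ON p.lead__id = c.lead__id
```

The `COALESCE` calls matter: without them, a lead missing from any one source would get a NULL score instead of a score based on the sources where it does appear.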
This is the fundamental reason to move scoring out of the CRM and into the warehouse: not because warehouse tools are more powerful, but because the warehouse is where all the signal actually lives.
For the implementation of rule-based scoring using these signals, see Rule-Based Lead Scoring in dbt. For ML-based scoring that learns weights from historical conversion data, see BigQuery ML for Lead Scoring.