Rule-based lead scoring is a weighted sum: each signal gets a point value, and the total score determines sales priority. It sounds simplistic, but done well in dbt, it’s version-controlled, testable, and maintainable by non-engineers. That last part is the hard part.
## The Basic Pattern
The scoring model sits in your marts layer. It takes a feature table as input — a prepared set of per-lead signals — and outputs a score per lead.
```sql
-- models/marts/mrt__sales__lead_scores.sql
SELECT
    lead_id,
    (lead__website_visits * {{ var('weight_website_visits', 5) }})
        + (lead__pricing_page_views * {{ var('weight_pricing_views', 15) }})
        + (lead__email_opens * {{ var('weight_email_opens', 3) }})
        + (lead__form_submissions * {{ var('weight_form_submissions', 20) }})
        + (lead__content_downloads * {{ var('weight_content_downloads', 10) }})
        AS lead__engagement_score,
    CASE
        WHEN lead__job_seniority = 'C-Level' THEN 30
        WHEN lead__job_seniority = 'VP' THEN 25
        WHEN lead__job_seniority = 'Director' THEN 20
        WHEN lead__job_seniority = 'Manager' THEN 10
        ELSE 0
    END AS lead__demographic_score
FROM {{ ref('int__lead__scoring_features') }}
```

The `{{ var() }}` calls are the key detail. They pull weights from `dbt_project.yml` rather than hard-coding numbers in SQL. When marketing wants to increase the weight on demo requests, they ask you to update a YAML file — not debug a SQL model.
## Time Decay
Raw behavioral counts are a starting point, but they don’t capture intent velocity. Add exponential decay so recent activity counts more than old activity:
```sql
-- Apply before aggregating behavioral signals
lead__raw_score * EXP(-0.1 * DATE_DIFF(CURRENT_DATE(), event__occurred_at, DAY))
```

This decay halves the score value every ~7 days. A form submission from yesterday counts at roughly 90% of face value. The same submission from three weeks ago counts at ~12%. A lead who was active six months ago stops polluting your hot tier.
The decay constant 0.1 is a starting point. Faster sales cycles (SaaS trials, low-ACV) may warrant faster decay — try 0.15 or 0.2. Longer sales cycles (enterprise, multi-stakeholder deals) may warrant slower decay — try 0.05.
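One way to reason about these constants, assuming the exponential decay formula above, is to convert each candidate rate into a half-life (ln 2 divided by the rate). A quick Python sketch:

```python
import math

def half_life_days(decay_rate: float) -> float:
    """Days until a decayed signal is worth half its face value."""
    return math.log(2) / decay_rate

# Compare the candidate decay constants mentioned above
for rate in (0.05, 0.1, 0.2):
    print(f"decay_rate={rate}: half-life is about {half_life_days(rate):.1f} days")
```

At 0.2, a signal loses half its value in about 3.5 days; at 0.05, it takes roughly two weeks. Pick the rate whose half-life matches how quickly intent goes stale in your sales cycle.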
## Negative Signals
Every scoring model needs negative signals, and most first drafts forget them.
Without negative scoring, dead leads accumulate in the hot tier over time. Sales sees a “score: 85” lead who last engaged eight months ago, calls them, gets told “we decided to go with a competitor,” and loses trust in the model. Add:
- Unsubscribe: -20
- Email bounce: -15
- 30+ days inactivity: -10
- Competitor domain email: -50 (optional, but powerful if you know your competitor list)
These negatives belong in the same scoring model alongside the positive signals. A lead’s final score is their accumulated positive signals minus whatever negative signals have fired.
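To make the netting concrete, here is a toy Python sketch; the weights mirror the examples above, and the function and dictionary names are hypothetical, not part of any dbt artifact:

```python
# Hypothetical weights mirroring the examples above
WEIGHTS = {
    "form_submission": 20,
    "pricing_page_view": 15,
    "unsubscribe": -20,   # negative signals live in the same table
    "email_bounce": -15,
}

def net_score(signal_counts: dict) -> int:
    """Accumulated positive signals minus whatever negatives have fired."""
    return sum(WEIGHTS.get(name, 0) * count
               for name, count in signal_counts.items())

# A lead with two pricing-page views who then unsubscribed
print(net_score({"pricing_page_view": 2, "unsubscribe": 1}))
```

The lead nets 2 × 15 − 20 = 10, not 30 — which is exactly the point: without the negative signal they would look twice as hot as they are.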
## Making Weights Maintainable
Hard-coding weights in SQL works for a prototype but breaks down when marketing wants to experiment with different weightings. Three dbt features make this manageable.
### dbt vars
In `dbt_project.yml`:

```yaml
vars:
  weight_website_visits: 5
  weight_pricing_views: 15
  weight_email_opens: 3
  weight_form_submissions: 20
  weight_content_downloads: 10
```

Every model that references these weights picks up the new values on the next run when you change them here. You can also override at runtime for testing: `dbt run --vars 'weight_pricing_views: 25'` — no code change required, no commit needed, experiment and revert freely.
### Seed files for non-engineers
For teams where marketing needs direct control, a seed file is the better approach. Create `seeds/scoring_rules.csv`:

```csv
signal,weight,category
pricing_page_view,15,behavioral
demo_request,25,behavioral
whitepaper_download,10,behavioral
vp_or_above,25,demographic
target_industry,20,firmographic
unsubscribe,-20,behavioral
```

After editing, `dbt seed` loads the updated rules. Your scoring model joins against the seed to pick up weights dynamically:
```sql
-- In the scoring model
SELECT
    f.lead_id,
    SUM(f.signal_value * r.weight) AS lead__total_score
FROM {{ ref('int__lead__signals_long') }} f
JOIN {{ ref('scoring_rules') }} r
    ON f.signal_name = r.signal
GROUP BY f.lead_id
```

The join condition already filters each row to its matching rule, so the multiplication needs no extra CASE guard. The change goes through Git (since seeds are tracked), so you have a full history of every weighting decision and can revert if a new weighting sends the wrong leads to sales.
### Jinja macros for repeated logic
If you apply time decay in multiple models, a macro keeps you DRY. Rather than pasting the `EXP(-0.1 * ...)` formula everywhere, write it once:

```sql
-- macros/apply_time_decay.sql
{% macro apply_time_decay(score_col, date_col, decay_rate=0.1) %}
    {{ score_col }} * EXP(-{{ decay_rate }} * DATE_DIFF(CURRENT_DATE(), {{ date_col }}, DAY))
{% endmacro %}
```

Then call it in any model: `{{ apply_time_decay('lead__raw_score', 'event__occurred_at') }}`. For more on when macros earn their complexity, see dbt Macros.
## Setting Score Thresholds
Once you have a total score, set tiers. A simple three-tier system:
| Tier | Score range | Action |
|---|---|---|
| Hot | 80+ | Auto-assign to senior rep, immediate outreach |
| Warm | 40–79 | Nurture track with sales monitoring |
| Cold | < 40 | Stay in marketing automation |
These thresholds will need calibration. Start with round numbers, run the model for a few weeks, and ask sales for feedback. If the hot tier has 30% of leads, the threshold is too low. If it has 0.5% of leads and sales is starving for outreach candidates, the threshold is too high.
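A quick way to run that calibration check, sketched in Python with the thresholds from the table above (the function names are illustrative):

```python
from collections import Counter

def tier(score: float, hot: int = 80, warm: int = 40) -> str:
    """Assign the three-tier label from the table above."""
    if score >= hot:
        return "hot"
    if score >= warm:
        return "warm"
    return "cold"

def tier_shares(scores: list) -> dict:
    """Fraction of leads landing in each tier: the calibration signal."""
    counts = Counter(tier(s) for s in scores)
    return {t: counts[t] / len(scores) for t in ("hot", "warm", "cold")}

# Example run over a small batch of scores
print(tier_shares([95, 85, 70, 55, 42, 30, 20, 15, 10, 5]))
```

Run `tier_shares` over last quarter's scores: a hot share near 30% says raise the threshold, a share near 0.5% (with sales starving) says lower it.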
The thresholds should live in `dbt_project.yml` as vars, not hard-coded in the model, so you can adjust them without changing SQL.
## Accuracy Expectations
Well-calibrated rule-based scoring typically achieves 60–70% accuracy on conversion prediction. That's well above picking leads at random (random outreach converts at your base rate, often 2–5%), but leaves room for improvement.
If you have 1,000+ historical conversions, BigQuery ML can potentially push accuracy to 80–90% by finding patterns in the data that your rules missed. The practical path: build the rule-based model first (a day or two of work), track conversions for a few months, then train and compare.
The hybrid approach often works best in practice: ML generates the primary score (conversion probability), and rules handle the edge cases where you know better than the model. Always mark a lead as hot if they request a demo, regardless of what the model says. The model handles nuance; the rules handle certainty.
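That override logic is simple to express. A minimal Python sketch, with hypothetical names (in production this would live in the dbt model or the activation layer):

```python
HOT_THRESHOLD = 80  # matches the hot-tier cutoff used above

def final_score(ml_probability: float, requested_demo: bool) -> float:
    """Hybrid scoring: ML supplies the base score, rules override edge cases."""
    score = ml_probability * 100  # scale conversion probability to 0-100
    if requested_demo:
        # Rule handles certainty: a demo request always lands in the hot tier
        score = max(score, HOT_THRESHOLD)
    return score
```

A lead the model rates at 25% conversion probability still scores 80 (hot) the moment they request a demo; without the request, the ML score stands on its own.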
See Reverse ETL Patterns for CRM Activation for how to get these scores back into Salesforce or HubSpot where reps actually work.