Topic guide

Lead Scoring in the Warehouse

Hub note for warehouse-native lead scoring — from rule-based weighted models in dbt to BigQuery ML classification, feature engineering, and reverse ETL back to the CRM.

Tags: dbt, bigquery, analytics, data modeling, ai

Lead scoring assigns a numeric value to each lead based on identity and engagement signals. Building the scoring in the warehouse with dbt gives you access to every data source, version-controlled rules, and testable logic, all capabilities that CRM-native scoring tools typically lack.

This hub covers signal collection, rule-based scoring, ML-based scoring, and reverse ETL activation.

Source Article

Lead Scoring Models in dbt and BigQuery — the full guide. Part 5 of the CRM data engineering series.

Related series articles: CRM Architecture, Salesforce Modeling, HubSpot Pipelines, Customer 360 Models.

Concepts

1. Signal Dimensions

Lead Scoring Signal Dimensions — The four categories that drive every lead scoring model: demographic fit (who the person is), firmographic fit (what company they’re from), behavioral engagement (what they’ve done), and recency (when they did it). Covers why the warehouse sees signals the CRM misses.

2. Rule-Based Scoring

Rule-Based Lead Scoring in dbt — Building a configurable weighted scoring model with dbt vars, seed files, and Jinja macros. Includes time decay, negative signals, score thresholds, and the maintainability patterns that let marketing adjust weights without touching SQL. Achieves 60–70% accuracy on conversion prediction.
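To make the moving parts concrete, here is a minimal Python sketch of a weighted score with time decay, a negative signal, and tier thresholds. It is an illustration only: the guide builds this in dbt SQL and Jinja, and every event name, weight, half-life, and cutoff below is a hypothetical placeholder, not a value from the source.

```python
from datetime import date, timedelta

# Hypothetical weights. In the dbt setup described above, these would live
# in vars or a seed file so they can change without touching SQL.
WEIGHTS = {
    "demo_request": 30.0,
    "pricing_page_view": 10.0,
    "email_open": 2.0,
    "unsubscribe": -20.0,  # negative signal
}
HALF_LIFE_DAYS = 30.0  # assumed: an event loses half its weight every 30 days
THRESHOLDS = [(50.0, "hot"), (20.0, "warm")]  # assumed cutoffs, highest first


def decay(days_ago, half_life=HALF_LIFE_DAYS):
    """Exponential time decay: 1.0 today, 0.5 after one half-life."""
    return 0.5 ** (days_ago / half_life)


def score_lead(events, as_of):
    """Sum decayed weights over (event_type, event_date) pairs."""
    total = 0.0
    for event_type, event_date in events:
        weight = WEIGHTS.get(event_type, 0.0)  # unknown events score zero
        days_ago = max((as_of - event_date).days, 0)
        total += weight * decay(days_ago)
    return total


def tier(score):
    """Map a numeric score to the first threshold it meets."""
    for cutoff, label in THRESHOLDS:
        if score >= cutoff:
            return label
    return "cold"


today = date(2024, 6, 1)
events = [
    ("demo_request", today),
    ("pricing_page_view", today - timedelta(days=30)),
]
score = score_lead(events, today)  # 30 * 1.0 + 10 * 0.5
print(score, tier(score))  # prints: 35.0 warm
```

In this sketch negative signals decay at the same rate as positive ones. Whether an unsubscribe should fade over time is a modeling choice the sketch does not settle.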

3. ML-Based Scoring

BigQuery ML for Lead Scoring — Training a logistic regression (or boosted tree) model in BigQuery SQL to predict lead conversion. Covers the TRANSFORM clause (the most important detail most tutorials skip), class imbalance handling with auto_class_weights, and how to evaluate and explain the model to sales. Requires 1,000+ historical conversions; achieves 80–90% accuracy when data is sufficient.
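The idea behind class weighting can be sketched in a few lines of Python. This shows the common "balanced" inverse-frequency heuristic, in which each class is weighted so that both classes contribute equally to the loss. It illustrates the concept and makes no claim about BigQuery ML's exact normalization. The function name and the 5% conversion rate are assumptions made for the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical training set: 50 converted leads out of 1,000.
labels = [1] * 50 + [0] * 950
weights = balanced_class_weights(labels)

# The rare class gets a weight of 10.0, the common class about 0.53,
# so each class carries the same total weight (500) during training.
print(weights[1], round(weights[0], 3))
```

Without weighting of this kind, a model trained on this data could reach 95% accuracy by predicting "no conversion" for every lead, which is why raw accuracy is a weak metric on imbalanced data.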

Feature Engineering for ML in dbt — How to build the intermediate feature tables that feed the BigQuery ML model. Time-windowed aggregations (7-day, 30-day, 90-day), domain-separated feature sets, and joining them into a labeled training dataset. Includes the training vs scoring dataset distinction.
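A small Python sketch can show how windowed features work and how training rows differ from scoring rows. It rests on one assumption about what that distinction means: training rows are computed as of a past snapshot date and carry a label observed afterwards, while scoring rows are computed as of today and carry no label. The function and column names are hypothetical, and the guide itself builds these tables in dbt SQL.

```python
from datetime import date, timedelta

WINDOWS = (7, 30, 90)  # trailing windows in days, as listed above


def window_counts(event_dates, as_of):
    """Count events in each trailing window ending at as_of.

    Events after as_of are excluded, so a training row never sees
    activity from after its snapshot date.
    """
    features = {}
    for days in WINDOWS:
        start = as_of - timedelta(days=days)
        features[f"events_{days}d"] = sum(1 for d in event_dates if start < d <= as_of)
    return features


def training_row(event_dates, snapshot_date, converted_on):
    """Features as of a past snapshot, plus the outcome observed later."""
    row = window_counts(event_dates, snapshot_date)
    row["converted"] = int(converted_on is not None and converted_on > snapshot_date)
    return row


def scoring_row(event_dates, today):
    """Same features as of today, with no label because the outcome is unknown."""
    return window_counts(event_dates, today)


event_dates = [date(2024, 1, 1), date(2024, 2, 20), date(2024, 3, 1), date(2024, 3, 10)]
train = training_row(event_dates, date(2024, 3, 5), converted_on=date(2024, 3, 20))
score = scoring_row(event_dates, date(2024, 3, 12))
print(train)
print(score)
```

Both row types go through the same `window_counts` function, which keeps the feature definitions identical between training and scoring.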

4. Activation

Reverse ETL Patterns for CRM Activation — Getting scores out of BigQuery and into Salesforce or HubSpot. Tool landscape (Hightouch, Census/Fivetran Activations, Polytomic), sync frequency recommendations (1–4 hours for sales ops, 15–30 minutes for automation triggers), and the downstream CRM automations that make scoring actionable.