Four capabilities cover the majority of data quality issues regardless of tooling choice, and all are achievable at zero licensing cost with native dbt features and Elementary OSS. These should be in place before evaluating the tool landscape, comparing ML versus statistical detection, or calculating total cost of ownership.
Capability 1: Primary Key and Foreign Key Tests
unique and not_null on every primary key. relationships on every foreign key. These catch the most common breakages with almost no configuration effort.
```yaml
models:
  - name: mrt__finance__payments
    columns:
      - name: payment_id
        data_tests:
          - unique
          - not_null
      - name: customer_id
        data_tests:
          - not_null
          - relationships:
              to: ref('mrt__core__customers')
              field: customer_id
      - name: order_id
        data_tests:
          - relationships:
              to: ref('mrt__sales__orders')
              field: order_id
```

Duplicate primary keys cause join explosions downstream — a single duplicated customer_id in a dimension table multiplies every fact table join, inflating metrics. Orphaned foreign keys produce NULLs in reports that are difficult to trace. Apply primary key tests without exception; five minutes of YAML configuration per model prevents hours of downstream debugging.
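The join-explosion effect is easy to demonstrate outside the warehouse. A toy sketch in plain Python (hypothetical data, not a dbt feature) shows what one duplicated dimension row does to a fact join:

```python
# Toy illustration: one duplicated customer_id in a dimension table
# inflates every fact-table join against it (hypothetical data).
facts = [{"order_id": i, "customer_id": 1, "amount": 10} for i in range(3)]

dim_clean = [{"customer_id": 1, "segment": "smb"}]
dim_dupe = dim_clean + [{"customer_id": 1, "segment": "smb"}]  # duplicated PK

def inner_join(facts, dim):
    # Naive inner join on customer_id, like a SQL JOIN
    return [{**f, **d} for f in facts for d in dim
            if f["customer_id"] == d["customer_id"]]

print(len(inner_join(facts, dim_clean)))  # 3 rows: revenue correct
print(len(inner_join(facts, dim_dupe)))   # 6 rows: revenue doubled
```

Three fact rows joined against a dimension with one duplicated key yield six rows, and any SUM over them doubles — exactly the metric inflation a `unique` test on the primary key prevents.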
For composite keys, use dbt_utils.unique_combination_of_columns:
```yaml
models:
  - name: int__daily__user_sessions
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - user_id
            - session_date
```

Capability 2: Source Freshness Monitoring
Built into dbt at no cost, source freshness catches the single most common failure mode in data pipelines: data not arriving.
```yaml
sources:
  - name: raw_stripe
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    loaded_at_field: _loaded_at
    tables:
      - name: payments
      - name: customers
      - name: subscriptions

  - name: raw_hubspot
    freshness:
      warn_after: {count: 6, period: hour}
      error_after: {count: 12, period: hour}
    loaded_at_field: _fivetran_synced
    tables:
      - name: contacts
      - name: deals
      - name: companies
```

Run dbt source freshness on a schedule (or as part of your dbt build pipeline). When the time between the latest loaded_at_field value and the current time exceeds your threshold, dbt raises a warning or error.
Set thresholds based on your actual update cadence, not arbitrary defaults. If Stripe data syncs hourly, a warn_after of 2 hours and an error_after of 6 hours give buffer for transient delays without missing real outages. If HubSpot syncs every 15 minutes, tighter thresholds are appropriate.
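For the hourly Stripe sync described above, that tuning might look like this (illustrative values — adjust to your own cadence):

```yaml
sources:
  - name: raw_stripe
    freshness:
      warn_after: {count: 2, period: hour}   # one missed hourly sync plus slack
      error_after: {count: 6, period: hour}  # several consecutive misses = real outage
    loaded_at_field: _loaded_at
```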
The most common mistake is configuring source freshness and not running it. Freshness checks only execute when you explicitly call dbt source freshness — they don’t run automatically as part of dbt run or dbt test. Include them in your orchestration.
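In a scheduled job, including the check can be as simple as running the freshness command before the build (a sketch — adapt the sequence to your orchestrator):

```shell
# Fail fast on stale sources, then build and test models
dbt source freshness
dbt build
```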
Capability 3: At Least One Anomaly Detection Type
Volume anomalies — unexpected row counts — flag upstream failures with minimal configuration and provide the highest signal-to-noise ratio of any anomaly detection type. A table that normally receives 10,000 rows per day and suddenly receives 200 almost always indicates a real problem.
```yaml
models:
  - name: mrt__core__orders
    data_tests:
      - elementary.volume_anomalies:
          time_bucket:
            period: day
            count: 1
          anomaly_sensitivity: 3
          training_period:
            period: day
            count: 14
```

Why volume anomalies specifically? Because they catch the failure mode that every other test type misses: data that stopped arriving. A not_null test passes when there are zero rows to check. An accepted_values test passes when there are no values to validate. A relationships test passes when there are no foreign keys to look up. Volume anomaly detection is the only mechanism that notices “nothing happened” and flags it as a problem.
For teams on native dbt without Elementary, dbt_expectations.expect_table_row_count_to_be_between provides a simpler version:
```yaml
models:
  - name: mrt__core__orders
    data_tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000
          max_value: 50000
```

The trade-off is that static thresholds require manual updates as data volumes change. Elementary’s Z-score approach adapts automatically. But a static threshold that catches “the table is empty” is infinitely better than no volume check at all.
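The adaptive approach amounts to a Z-score over recent row counts. A simplified Python sketch of the idea (not Elementary’s actual implementation, and with hypothetical data):

```python
from statistics import mean, stdev

def volume_anomaly(history, latest, sensitivity=3):
    """Flag `latest` if it deviates more than `sensitivity`
    standard deviations from the training-period mean."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > sensitivity

# 14 days of ~10,000 rows/day, then a day with only 200 rows
history = [10_000, 10_200, 9_900, 10_100, 10_050, 9_950, 10_000,
           10_150, 9_850, 10_000, 10_100, 9_900, 10_050, 10_000]
print(volume_anomaly(history, 200))     # True: volume collapsed
print(volume_anomaly(history, 10_020))  # False: within normal range
```

Because the mean and standard deviation are recomputed from the training window, the threshold moves with the data — which is why it needs no manual updates as volumes grow.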
Capability 4: Alerting in a Channel Your Team Actually Watches
A test that fails silently helps no one. Connect failures to Slack or your existing incident workflow before adding more tests.
Test results that sit in dbt logs until a stakeholder reports a dashboard issue provide no operational value. Detection is only useful when it reaches someone who can act on it.
For Elementary OSS:
```shell
edr monitor --slack-token $SLACK_TOKEN --slack-channel-name data-alerts
```

For dbt Cloud, alerts are built into the platform. For dbt Core with orchestrators like Dagster or Airflow, configure notification on test failure through the orchestrator’s native alerting.
How alerts are designed matters as much as whether they exist:
**Route alerts to the right channel.** A single #data-alerts channel that receives every test failure across every domain becomes noise that everyone ignores. Route finance data alerts to the finance data team, marketing to marketing.
```yaml
models:
  your_project:
    marts:
      finance:
        +meta:
          channel: finance-data-alerts
      marketing:
        +meta:
          channel: marketing-data-alerts
```

**Suppress repeat alerts.** A persistent failure that fires every hour creates alert fatigue. Use Elementary’s alert_suppression_interval to consolidate:
```yaml
models:
  - name: mrt__finance__revenue
    meta:
      alert_suppression_interval: 24  # hours
```

**Include actionable context.** An alert that says “test failed on mrt__finance__revenue” requires investigation to understand impact. An alert that includes the model owner, the nature of the failure, and the downstream models affected enables faster response.
Putting It Together
A complete minimum viable stack for a dbt project:
```yaml
# sources.yml - Capability 2
sources:
  - name: raw_stripe
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    loaded_at_field: _loaded_at
    tables:
      - name: payments
      - name: customers
```
```yaml
# mrt__finance__payments.yml - Capabilities 1, 3
models:
  - name: mrt__finance__payments
    meta:
      owner: "@jessica.jones"
      channel: finance-data-alerts
    data_tests:
      - elementary.volume_anomalies:
          time_bucket:
            period: day
            count: 1
    columns:
      - name: payment_id
        data_tests:
          - unique
          - not_null
      - name: customer_id
        data_tests:
          - not_null
          - relationships:
              to: ref('mrt__core__customers')
              field: customer_id
```

Combined with edr monitor for Slack alerting (Capability 4), this setup catches:
- Missing data (freshness)
- Structural breakage (primary key violations, orphaned records)
- Silent upstream failures (volume anomalies)
- And surfaces all of it in a channel where someone will act on it
This setup has no licensing cost and requires no vendor evaluation. Elementary OSS plus native dbt features covers all four capabilities.
The scaling thresholds indicate when to invest beyond this baseline. The tool landscape covers what to invest in.