
Elementary for dbt

How Elementary extends dbt with data observability — anomaly detection, automated freshness monitoring, test result history, and Slack alerting

Tags: dbt, elementary, data quality, testing

Elementary is a dbt-native data observability tool. It extends dbt with anomaly detection, historical test tracking, freshness monitoring, and alerting — capabilities that sit in a gap between dbt’s built-in generic tests and full-blown commercial observability platforms like Monte Carlo or Bigeye. Everything Elementary produces lives in your warehouse as queryable tables, which means you own the data and can build on top of it with any BI tool.

Architecture: Two Components

Elementary has two parts that serve distinct purposes.

The dbt package installs via packages.yml like any other dbt dependency. It creates metadata tables in a dedicated schema and uses on-run-end hooks to capture artifacts after every dbt run or dbt test. Model execution times, test results, schema snapshots, and run metadata all flow into tables you control.

The CLI (edr) is a standalone Python tool that reads from those warehouse tables. It generates HTML observability reports, sends alerts to Slack or Teams, and executes anomaly detection logic. The CLI connects to your warehouse through its own profile, separate from your dbt profile.

The data flow is linear:

dbt run/test --> on-run-end hooks --> INSERT into Elementary tables --> edr reads tables --> reports/alerts

This separation matters. The dbt package has zero runtime cost beyond the hook inserts. The CLI runs independently, on whatever schedule you choose, and can be pointed at any environment where Elementary tables exist.
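
Because the CLI only reads from warehouse tables, scheduling it is an ordinary cron or orchestrator job. A minimal sketch, assuming edr is installed on the host and the Slack variables are set (the hourly cadence is illustrative, not a recommendation):

# crontab entry: scan for new test failures every hour, independent of dbt runs
0 * * * * edr monitor --slack-token $SLACK_TOKEN --slack-channel-name data-alerts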

Installation

Add the package and configure its schema:

packages.yml
packages:
  - package: elementary-data/elementary
    version: 0.21.0

dbt_project.yml
models:
  elementary:
    +schema: "elementary"

For dbt 1.8+, two flags are required because of changes to how package materializations work:

dbt_project.yml
flags:
  require_explicit_package_overrides_for_builtin_materializations: False
  source_freshness_run_project_hooks: True

You also need a materialization override macro. Without it, tests run but Elementary’s result tables stay empty — the most common silent failure during setup:

-- macros/elementary_materialization.sql
{% materialization test, default %}
  {{ return(elementary.materialization_test_default()) }}
{% endmaterialization %}

Run dbt deps, then dbt run --select elementary to create the metadata tables, then dbt test to populate them with initial results.
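
In terminal form:

dbt deps                      # install the package
dbt run --select elementary   # create the metadata tables
dbt test                      # populate them with initial results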

Anomaly Detection Tests

Where dbt’s built-in tests validate static rules you define, Elementary’s tests learn patterns from historical data and flag deviations. They use Z-score statistics: an anomaly_sensitivity of 3 means alerting when a metric falls more than 3 standard deviations from its historical mean.
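
To make the statistic concrete, here is the comparison sketched in SQL. This is illustrative only, not Elementary’s implementation, and daily_row_counts is a hypothetical table with one row per day:

-- z-score of each day's row count against the mean and stddev of the window
SELECT
  day,
  row_count,
  (row_count - AVG(row_count) OVER ()) / NULLIF(STDDEV(row_count) OVER (), 0) AS z_score
FROM daily_row_counts;
-- with anomaly_sensitivity = 3, days where ABS(z_score) > 3 are flagged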

Volume Anomalies

Detects unusual row counts. If a source table normally receives 10,000 rows per day and suddenly gets 2,000, a not_null test passes but volume_anomalies catches it.

tests:
  - elementary.volume_anomalies:
      where: "event_date = current_date()"
      time_bucket:
        period: day
        count: 1

Freshness Anomalies

Monitors time between table updates. Unlike dbt’s source freshness checks, which use a fixed threshold you must define, freshness_anomalies learns the normal update cadence and flags deviations. event_freshness_anomalies tracks the lag between when an event occurred and when it was loaded.
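
Configuration follows the same pattern as the volume test. A minimal sketch, where updated_at stands in for whatever timestamp column marks your table’s updates:

tests:
  - elementary.freshness_anomalies:
      timestamp_column: updated_at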

Column Anomalies

Tracks column-level metrics — null percentage, average, min, max, distinct count, zero count — and alerts when any metric deviates from its historical baseline. Useful for catching distribution shifts that pass every row-level constraint.

tests:
  - elementary.column_anomalies:
      column_name: order__amount
      column_anomalies:
        - average
        - max
      anomaly_sensitivity: 3
      training_period:
        period: day
        count: 14

The training_period controls how much history Elementary uses to establish baselines. Fourteen days is a reasonable default; increase it if your data has weekly cycles, decrease it if patterns shift frequently.

Schema Changes

Detects added or deleted columns, type changes, and deleted tables. The schema_changes_from_baseline variant validates against an explicitly defined schema, which is useful for models with downstream consumers who depend on a stable contract.
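
A minimal sketch of both variants; schema_changes needs no arguments, while schema_changes_from_baseline validates against the column names and types already declared in the model’s YAML (the model name is illustrative):

models:
  - name: mrt__finance__revenue
    tests:
      - elementary.schema_changes
      - elementary.schema_changes_from_baseline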

Alerting

The edr monitor command sends notifications for test failures. Slack is the most common destination:

edr monitor --slack-token $SLACK_TOKEN --slack-channel-name data-alerts

Alerts become more useful with metadata in your model YAML:

models:
  - name: mrt__finance__revenue
    meta:
      owner: "@jessica.jones"
      channel: finance-data-alerts
      alert_suppression_interval: 24

The channel field routes alerts to team-specific Slack channels. alert_suppression_interval (in hours) prevents repeated alerts for the same persistent failure — critical for avoiding alert fatigue.

For path-based routing across an entire directory of models:

dbt_project.yml
models:
  your_project:
    marts:
      finance:
        +meta:
          channel: finance-data-alerts

Alert grouping consolidates cascading failures into a single message instead of flooding a channel:

edr monitor --group-by table --group-alerts-threshold 5

Reports and Dashboards

edr report generates a self-contained HTML file showing test pass/fail history, model runtime trends, anomaly detection charts, and data lineage. For team access, host it on S3, GCS, or Azure Blob Storage using edr send-report.
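
For example, to generate the report and push it to S3 (the bucket name is a placeholder, and the exact send-report flags may vary by edr version):

edr report
edr send-report --s3-bucket-name your-reports-bucket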

Since Elementary stores everything in warehouse tables (elementary_test_results, dbt_run_results, dbt_models), you can also build custom dashboards in your existing BI tool. A useful starting query for tracking data quality over time:

SELECT
DATE(detected_at) AS date,
COUNT(CASE WHEN status = 'pass' THEN 1 END) AS passed,
COUNT(CASE WHEN status = 'fail' THEN 1 END) AS failed,
ROUND(
COUNT(CASE WHEN status = 'pass' THEN 1 END) * 100.0 / COUNT(*), 2
) AS pass_rate
FROM elementary_test_results
WHERE detected_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY 1
ORDER BY 1;

BigQuery-Specific Notes

The CLI profile requires an explicit location parameter (US, EU, or your specific region). dbt infers this, but Elementary does not — the most common source of connection errors on BigQuery.

~/.edr/profiles.yml
~/.edr/profiles.yml
elementary:
  outputs:
    default:
      type: bigquery
      method: oauth
      project: your-project-id
      dataset: your_schema_elementary
      location: US
      threads: 4

The service account running edr needs BigQuery Data Viewer on the Elementary dataset, Metadata Viewer and Resource Viewer on your dbt datasets, and Job User on the project. These are read-oriented roles; the CLI does not write to your production models.
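
As a sketch, the project-level grant looks like this; the service account name is a placeholder, and the dataset-level Data Viewer grants are applied on the datasets themselves:

# grant Job User so edr can execute queries in the project
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:edr-runner@your-project-id.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"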

Where Elementary Fits in a Testing Strategy

Elementary is best understood as a complement to explicit rule-based tests, not a replacement. The dbt Testing Taxonomy lays out five testing mechanisms in dbt; Elementary occupies the “unknown unknowns” space — anomalies you would not think to write explicit tests for.

A practical layering:

| Layer | Tool | What it catches |
| --- | --- | --- |
| Primary keys, nulls, referential integrity | dbt generic tests | Known structural violations |
| Value ranges, patterns, business rules | dbt-expectations | Known domain violations |
| Transformation logic correctness | dbt unit tests | Logic bugs in SQL |
| Volume drops, freshness drift, distribution shifts | Elementary | Unknown anomalies |
| Schema stability for consumers | dbt model contracts | Breaking schema changes |

For incremental models, Elementary’s volume and freshness anomaly tests are particularly valuable. Incremental runs process only new data, so a silent upstream failure that stops sending rows will not cause an error — the incremental model simply processes zero new rows and succeeds. Volume anomaly detection catches this.
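
A sketch of that guard on an incremental model; the model name and timestamp column are hypothetical, and fail_on_zero is an assumption worth checking against your Elementary version:

models:
  - name: int__orders__incremental
    tests:
      - elementary.volume_anomalies:
          timestamp_column: loaded_at   # bucket incoming rows by load time
          fail_on_zero: true            # assumed flag: fail when a bucket is empty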

OSS vs. Cloud

Elementary OSS provides anomaly detection, HTML reports, and Slack/Teams alerting at no licensing cost. The trade-off is maintenance: expect 2-5 days for initial setup, ongoing configuration tuning, and self-managed report hosting.

Elementary Cloud adds automated ML monitors (more sophisticated than the OSS Z-score approach), column-level lineage extending to BI tools, a data catalog, and incident management integrations with PagerDuty, Jira, and ServiceNow. For teams of fewer than four engineers, the OSS version is typically sufficient; beyond that, the operational overhead of self-hosting starts to compete with the cost of a managed solution.