Alerting on Data Quality Issues with Elementary

A dashboard showing yesterday’s failures is helpful. An alert that tells you about a failure before your stakeholders notice is better. Proactive data teams catch problems early because their alerting does the watching for them.

Elementary can send alerts to Slack, Microsoft Teams, PagerDuty, and other incident management tools. This article covers how to configure each channel, route alerts to the right people, and keep the signal-to-noise ratio manageable.

Prerequisites

This guide assumes you have Elementary installed and running. Your elementary_test_results table should contain test execution data from your data quality tests. If you’re starting fresh, check the Elementary setup guide first.

Quick verification:

# Confirm you have test results to alert on
edr report --select last_invocation

If you see test results in the generated report, you’re ready to configure alerting.

Basic Alerting with edr monitor

The edr monitor command runs Elementary tests and sends alerts for failures. The basic invocation:

edr monitor --slack-token $SLACK_TOKEN --slack-channel-name data-alerts

This sends alerts to a Slack channel for any test that failed or warned in the most recent run.

edr report generates an HTML file for human review, while edr monitor is designed for automation. Run monitor in your CI/CD pipeline or on a schedule after each dbt build.
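For example, a minimal cron sketch for the scheduled case, assuming edr is on the scheduler's PATH and the project path is a placeholder:

# Run alerting every morning at 07:00, after the nightly dbt build
# (cron doesn't inherit your shell env, so define SLACK_TOKEN in the crontab or load it from a file)
0 7 * * * cd /path/to/dbt_project && edr monitor --slack-token $SLACK_TOKEN --slack-channel-name data-alerts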

Configuring Alert Metadata

Control what appears in alerts through test metadata in your YAML files:

models:
  - name: mrt__finance__revenue
    meta:
      owner: "@jessica.jones"
      subscribers: ["@jessica.jones", "@joe.joseph"]
      description: "Daily revenue aggregation for finance reporting"
      tags: ["critical", "finance"]
      channel: finance-data-alerts
      alert_suppression_interval: 24

The owner and subscribers fields control who gets mentioned. The channel field routes alerts to a specific Slack channel. The alert_suppression_interval prevents repeated alerts for the same failing test within the specified number of hours.

Slack Integration

Slack is the most common alerting destination. Elementary supports two integration methods.

Token-Based

Token-based integration gives you full control: custom channels per model, user tagging, and file uploads for detailed failure information.

Create a Slack app and add these bot token scopes:

  • channels:join
  • channels:read
  • chat:write
  • files:write
  • users:read
  • users:read.email
  • groups:read

Install the app to your workspace and copy the bot token.
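With the token in hand, a quick smoke test (the token value is a placeholder):

# Export the bot token and send alerts for the latest run
export SLACK_TOKEN=xoxb-your-bot-token
edr monitor --slack-token $SLACK_TOKEN --slack-channel-name data-alerts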

Configure in your Elementary profile or pass directly to the CLI:

# In your Elementary config
slack:
  token: xoxb-your-slack-token
  channel_name: data-alerts
  group_alerts_by: "table"

Or via command line:

edr monitor \
  --slack-token $SLACK_TOKEN \
  --slack-channel-name data-alerts \
  --group-by table

Webhook-Based

Webhooks are simpler to set up but limited to a single channel with no user tagging. Create an incoming webhook in Slack, then:

edr monitor --slack-webhook $SLACK_WEBHOOK_URL

Use this for quick setups or when you don’t need per-model routing.

Channel Routing

Route different alerts to different channels based on model location or metadata.

Per-model routing in the model’s YAML:

models:
  - name: mrt__marketing__campaigns
    config:
      meta:
        channel: marketing-data-alerts

Path-based routing in dbt_project.yml:

models:
  your_project:
    marts:
      marketing:
        +meta:
          channel: marketing-data-alerts
      finance:
        +meta:
          channel: finance-data-alerts

Every model under marts/marketing/ now routes to the marketing channel, and marts/finance/ to the finance channel.

Microsoft Teams Integration

Teams integration uses webhooks with Adaptive Cards for formatting.

Note: Microsoft is retiring Teams Incoming Webhooks at the end of 2025. Migrate to Power Automate Workflows if you haven’t already.

Current webhook setup:

teams:
  notification_webhook: https://your-org.webhook.office.com/webhookb2/...
  group_alerts_by: "table"

Or via CLI:

edr monitor --teams-webhook $TEAMS_WEBHOOK_URL

Limitations to know:

  • User mentions are not fully supported
  • Rich formatting options are more limited than Slack
  • Webhook deprecation requires migration planning

For organizations standardizing on Teams, consider Power Automate Workflows that trigger on webhook events and provide more control over message formatting.

PagerDuty and Incident Management

Elementary Cloud extends alerting beyond Slack and Teams to incident management platforms: PagerDuty, Opsgenie, Jira, Linear, ServiceNow, and email.

Setting Up PagerDuty (Elementary Cloud)

  1. Navigate to the Environments page in Elementary Cloud
  2. Click “Connect incident management tool”
  3. Select PagerDuty
  4. Authorize Elementary (requires “User” role in PagerDuty)
  5. Configure alert rules

Alert rules map Elementary test failures to PagerDuty incidents based on:

  • Status: fail vs warn
  • Tags: critical, high, medium
  • Resource types: model, source, test

Example rule: “If status = fail AND tag = critical, create P1 incident in PagerDuty.”

Other Integrations

Elementary Cloud also supports:

  • Opsgenie: Similar to PagerDuty setup, good for teams already using Atlassian
  • Jira: Create tickets for failures that need tracking
  • Linear: Integrates with engineering workflows
  • ServiceNow: Enterprise ITSM integration
  • Email: Simple notifications without chat platform dependencies
  • Webhooks (beta): Custom integrations with any system

OSS users who need PagerDuty can bridge the gap by posting Elementary alerts to a Slack channel that triggers PagerDuty via Slack’s PagerDuty integration.
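A minimal sketch of that bridge, assuming a dedicated pagerduty-bridge channel (a placeholder name) that Slack’s PagerDuty integration watches:

# Send only critical failures to the channel PagerDuty monitors
edr monitor \
  --filters tags:critical \
  --filters statuses:fail \
  --slack-token $SLACK_TOKEN \
  --slack-channel-name pagerduty-bridge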

Alert Routing Strategies

Beyond basic channel routing, you can run multiple edr monitor commands with different filters to create sophisticated routing.

Filtering by Tag

# Critical alerts to the urgent channel
edr monitor --filters tags:critical --slack-channel-name critical-alerts
# Finance team alerts
edr monitor --filters tags:finance --slack-channel-name finance-data

Filtering by Owner

edr monitor --filters owners:@finance-team --slack-channel-name finance-data

Filtering by Status

# Only failures, no warnings
edr monitor --filters statuses:fail --slack-channel-name failures-only
# Only warnings for review
edr monitor --filters statuses:warn --slack-channel-name warnings-review

Combining Filters

Separate --filters flags combine as AND conditions, while comma-separated values within a single filter act as OR:

edr monitor \
  --filters resource_types:model \
  --filters tags:finance,marketing \
  --slack-channel-name business-critical

This alerts only on models (not sources or tests) that are tagged with either finance or marketing.

Automation Pattern

Run multiple monitor commands in your CI/CD pipeline:

# In GitHub Actions
- name: Alert on critical failures
  run: edr monitor --filters tags:critical --slack-channel-name critical-alerts
- name: Alert finance team
  run: edr monitor --filters tags:finance --slack-channel-name finance-data
- name: Alert marketing team
  run: edr monitor --filters tags:marketing --slack-channel-name marketing-data

Reducing Alert Fatigue

Nothing kills an alerting system faster than noise. When every alert feels like a false positive, teams stop paying attention.

Suppression Intervals

Prevent repeated alerts for the same failing test:

meta:
  alert_suppression_interval: 24 # Hours

If a test fails at 9am and stays failing, you won’t get another alert until 9am the next day. This is critical for tests that can’t be immediately fixed.

Configure at the test, model, or project level:

# Project-wide default in dbt_project.yml
models:
  your_project:
    +meta:
      alert_suppression_interval: 12

Alert Grouping

Consolidate multiple failures into single messages:

edr monitor --group-by table

Instead of 10 separate alerts for 10 failed tests on the same table, you get one alert listing all failures. This dramatically reduces noise during cascading failures.

Set a threshold for when grouping kicks in:

edr monitor --group-alerts-threshold 5

Below 5 failures, send individual alerts. Above 5, consolidate.
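The two options can be combined:

# One alert per table, but only once failures pile up
edr monitor --group-by table --group-alerts-threshold 5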

Customizing Alert Content

Control what fields appear in alerts:

meta:
  alert_fields: ["description", "owners", "tags", "subscribers"]

Remove fields that add noise without value.

Handling Sensitive Data

Disable sample data in alerts when tables contain PII:

edr monitor --disable-samples

Or configure per-model:

models:
  - name: mrt__customers__personal_info
    meta:
      disable_samples: true

Elementary Cloud’s Incident Management

Elementary Cloud adds automatic incident grouping: when new failures relate to open incidents, they’re grouped together rather than creating separate tickets. Successful runs automatically resolve incidents, reducing manual cleanup.

On-Call Strategies for Data Teams

Data team on-call differs from traditional software engineering on-call. Data teams often handle support, triage, and development simultaneously. The same person investigating a data quality issue might also be building new pipelines.

Triage Process

Establish a clear categorization:

Severity | Criteria                                      | Response          | Channel
Critical | SLA breach, production outage, revenue impact | Immediate page    | PagerDuty
Warning  | Quality degradation, potential issues         | Next business day | Slack
Info     | Logged for review, no action needed           | Weekly review     | None
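A sketch of wiring this tiering with the OSS CLI, assuming your tests carry matching tags and the pagerduty-bridge channel from earlier pages on-call via Slack’s PagerDuty integration (both names are placeholders):

# Critical: page immediately through the PagerDuty-bridged channel
edr monitor --filters tags:critical --filters statuses:fail --slack-channel-name pagerduty-bridge

# Warning: post to Slack for next-business-day triage
edr monitor --filters statuses:warn --slack-channel-name data-alerts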

Runbooks in Test Metadata

Link troubleshooting documentation directly in your test definitions:

data_tests:
  - unique:
      column_name: customer_id
      config:
        meta:
          description: |
            Duplicate customer IDs detected.
            Runbook: https://docs.company.com/data/customer-dedup
            Contact: @data-platform-team

When this test fails, the alert includes the runbook link. New team members can resolve issues without asking where to find documentation.

Metrics to Track

Measure your alerting system’s health:

Metric                     | What it tells you
Alert volume               | Is the system too noisy?
False positive rate        | Are alerts actionable?
Time to acknowledge (MTTA) | How quickly do people respond?
Time to resolution (MTTR)  | How long do issues stay open?

A high false positive rate erodes trust. A high MTTR might indicate missing runbooks or unclear ownership.

Rotation Considerations

Some patterns that work for data teams:

  • Pair on-call with development sprints: The on-call person handles incidents AND works on improvements that reduce future incidents
  • Weekly rotation with handoff document: Document open issues, recurring problems, and context for the next person
  • Tiered response: Junior engineers handle initial triage, escalate complex issues to senior engineers

What’s Next

You now have alerting configured to notify the right people about the right problems at the right time. The final piece is deciding how much to build yourself versus buying a dedicated observability platform. The next article in this series covers the build vs buy decision for data quality tooling.