
OpenClaw Skills for Monitoring

How to write OpenClaw skill files for data pipeline monitoring — structuring SKILL.md instructions, categorizing failure types, formatting output for Slack, and adding context that makes alerts actionable.


OpenClaw skills are Markdown files that give the agent persistent instructions for a specific task. Where a cron message (--message "...") gives the agent a one-line prompt, a skill file gives it a full operating manual: what to check, how to categorize what it finds, how to format the output, and what project-specific context it needs to do the job well.

For monitoring tasks, skills determine whether the agent runs dbt test and reports “some tests failed,” or whether it reports which models are affected, the type of failure, and a suggested action.

Anatomy of a Monitoring Skill

Skills live in ~/.openclaw/skills/[skill-name]/SKILL.md. Create the directory structure, write the Markdown file, then install:

mkdir -p ~/.openclaw/skills/dbt-monitor
# write SKILL.md (see below)
openclaw skill install ~/.openclaw/skills/dbt-monitor

The SKILL.md file is a plain Markdown document. The agent reads it as context before acting on any request that invokes the skill. Think of it as a briefing document: here’s what you are, here’s what you’re looking for, here’s how you should report it.

A Complete dbt Monitoring Skill

# dbt test monitor
You are a dbt test monitoring assistant. When asked to check dbt tests:
1. Navigate to the dbt project directory
2. Run `dbt test --target prod` and capture the output
3. Parse the results into three categories:
   - FAIL: tests that returned failures
   - ERROR: tests that couldn't execute (compilation errors, connection issues)
   - WARN: tests configured with warning severity
4. For each failure, identify:
   - The test name and type (unique, not_null, accepted_values, relationships, custom)
   - The model being tested
   - The number of failing rows
   - A suggested action based on the failure type
5. Format the output as a clear summary with failures first, errors second, warnings last
6. If all tests pass, say so briefly
Project directory: /path/to/your/dbt/project

The numbered list structure works well because it gives the agent an explicit sequence to follow. The agent isn’t guessing at what “check dbt tests” means — it has a protocol.

Distinguishing Failure Categories

The most important thing a monitoring skill teaches the agent is the difference between types of failures. dbt test output conflates distinct problems under similar-looking messages. Your skill needs to teach the agent to separate them.

FAIL — data has a problem. A test ran and found rows that violated the test condition. not_null_mrt__sales__orders_order__customer_id | FAIL 47 means 47 rows in your orders mart have a null customer ID. The pipeline ran successfully; the data is wrong.

ERROR — the pipeline has a problem. The test couldn’t execute. Compilation errors, missing models, connection failures, or undefined sources all produce ERRORs. The data may be fine; something in the infrastructure or model definition broke.

WARN — worth watching, not blocking. Tests configured with severity: warn report problems but don’t halt the run. These are conditions you’ve decided are tolerable in production but worth knowing about.
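
That warn severity is set in the test's own configuration. A minimal sketch of what it might look like in a schema.yml, using the order-status test that appears later in the Slack template (the accepted values are placeholders, not taken from a real project):

version: 2
models:
  - name: int__orders_enriched
    columns:
      - name: order__status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'delivered', 'returned']
              config:
                severity: warn   # reports WARN instead of FAIL when rows fall outside the list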

Without this distinction, every alert feels equally urgent. With it, you can triage immediately: ERRORs mean something is broken in the pipeline itself and needs investigation now; FAILs might mean upstream data arrived late; WARNs can wait until morning.

Add this explicitly to your skill:

## Failure Type Interpretation
When you see a FAIL:
- This means the data violated the test condition
- Report: test name, model, failing row count
- Suggested action: "Check upstream source freshness" for not_null on mart tables; "Review deduplication logic" for unique test failures
When you see an ERROR:
- This means the test couldn't execute
- Report: error message verbatim, model that couldn't compile
- Suggested action: "Model may not exist or has a compilation error — check the dbt project"
When you see a WARN:
- This means a soft threshold was exceeded
- Report: which test, which model, the count
- Suggested action: "Worth investigating but not blocking"

Slack Formatting

The agent sends its output as text. Slack supports a subset of Markdown, so you can guide the agent to produce readable summaries. Crucially: if you don’t specify a format, the agent will invent one — and it will be different each time, depending on how the output looked that day.

Add a formatting section to your skill:

## Formatting
Format the summary for Slack using this structure:
**dbt test results - [date]**
🔴 *X failures* | 🟡 *Y warnings* | 🟢 *Z passed*
**Failures:**
- `not_null_mrt__sales__orders_order__customer_id` on `mrt__sales__orders` (47 rows)
→ Likely cause: upstream source load incomplete. Check source freshness.
- `unique_mrt__sales__customers_customer__id` on `mrt__sales__customers` (3 rows)
→ Likely cause: duplicate records in source. Review deduplication logic.
**Warnings:**
- `accepted_values_int__orders_enriched_order__status` on `int__orders_enriched` (2 rows outside expected values)
All other tests passed (142/147).

The backticks around model and test names create monospace formatting in Slack, making them visually distinct from the prose. The → arrow for suggested actions creates a consistent visual pattern that’s easy to scan. The total passed count at the end gives context for how many tests are green.

The template format is intentional: you’re giving the agent a sample to pattern-match against, not a description of rules to follow. Showing it an example output is more reliable than describing the output in abstract terms.

Adding Downstream Impact Context

A bare failure message creates anxiety. A message that includes downstream impact lets you triage from your phone without opening a laptop.

Add to your skill:

## Downstream Impact
For each failed model, also run:
`dbt ls --select [model_name]+`
This lists the failing model and every downstream model that depends on it. Include this in the report:
- If 0 downstream models: "No downstream impact"
- If 1-3 downstream models: list them
- If 4+ downstream models: "Affects [count] downstream models including [most important one]"

This tells you whether a failure in a base model means five dashboards are showing stale data or just one unused intermediate model broke. The difference shapes how urgently you respond.

The trailing + selector in dbt ls ([model_name]+) means “this model and everything downstream of it.” Running it per failed model adds time to the monitoring job, so apply judgment: you might only want downstream impact for mart-layer failures, not for every intermediate model warning.
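
The direction of the + is worth sanity-checking in the terminal before baking the selector into the skill (the model name here is a placeholder):

dbt ls --select stg_orders+   # stg_orders plus everything downstream of it (descendants)
dbt ls --select +stg_orders   # stg_orders plus everything upstream of it (ancestors)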

Source Freshness as a Pre-Test Check

Some test failures aren’t data quality problems — they’re timing problems. A not_null failure on a mart model at 7 AM might mean the nightly source load hadn’t finished yet when dbt ran. Adding a source freshness check before the test run gives the agent context to interpret what it finds.

## Pre-Test Freshness Check
Before running `dbt test`, run `dbt source freshness` and capture the output.
If any sources are stale (beyond their configured warn or error thresholds), include this at the top of the report:
"⚠️ Source freshness warning: [source_name] last loaded [time_ago]. Test failures may be caused by incomplete source data."
This helps distinguish "data is wrong" from "data hasn't arrived yet."

This is especially valuable for morning monitoring jobs. If your Fivetran sync runs at 5 AM and sometimes takes until 7 AM on large syncs, the agent’s freshness check output explains a pattern that would otherwise look like recurring data quality failures.
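
The warn and error thresholds that dbt source freshness compares against live in the source definition itself. A minimal sketch, with the source name, schema, and thresholds as placeholders:

version: 2
sources:
  - name: fivetran_raw
    schema: raw
    loaded_at_field: _fivetran_synced   # timestamp column used to compute staleness
    freshness:
      warn_after: {count: 6, period: hour}
      error_after: {count: 12, period: hour}
    tables:
      - name: orders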

Calibrating the Skill Over Time

The first version of a monitoring skill will be imperfect. The agent’s categorizations will occasionally be wrong. The formatting won’t quite match what you wanted. Some suggested actions will be too generic.

Treat the skill file as a living document. When the agent misinterprets something, add a clarification. When the output format isn’t useful, update the template. When a new type of failure starts appearing that the skill doesn’t handle well, add a section for it.

This is the same iteration loop you’d apply to any prompt. The difference is that skill files are persistent — you edit them once and every future run benefits from the improvement. A cron message prompt is static; a skill file evolves with your operational knowledge.

The practical workflow: when a monitoring alert fires and you triage it, note whether the agent’s description was accurate and the suggested action was right. If either was wrong, update the skill immediately while the context is fresh. Over a month of weekly failures, the skill accumulates real operational knowledge about your specific pipeline’s failure modes.
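
In practice this tends to accumulate as an extra section at the end of SKILL.md. A hypothetical example of what it might look like after a few incidents (the causes are illustrative, not from a real project):

## Known Failure Patterns
- `not_null_mrt__sales__orders_order__customer_id` before 8 AM usually means the overnight source load is still running. Check source freshness before escalating.
- `unique_mrt__sales__customers_customer__id` has historically come from duplicate records in the CRM export. Suggest reviewing the deduplication step first.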

BigQuery and Snowflake Monitoring Skills

The same skill pattern applies to warehouse monitoring tasks. For BigQuery job failure monitoring, the skill instructs the agent to run a specific SQL query against INFORMATION_SCHEMA.JOBS and interpret the results — see BigQuery Job Failure Monitoring for the queries.

For Snowflake cost monitoring, the skill teaches the agent the credit-to-dollar conversion and how to frame results for non-technical stakeholders — see Snowflake Cost Monitoring with Warehouse History for the Snowflake-specific patterns.

The core structure is the same regardless of the target system: define what to check, how to categorize what it finds, and how to format the output. The specifics change based on what the underlying system returns.