Monitoring alerts that include business context — “duplicates mean double-counted revenue across all sales dashboards” rather than “unique test failed with 3 rows” — require the agent to have access to dbt project documentation. OpenClaw’s persistent memory stores that context as Markdown files that survive across sessions. Where a skill file gives an agent a standing operating procedure, persistent memory gives it a knowledge base.
What to Store in Memory
Three categories of information belong in persistent memory for dbt monitoring:
Model and column documentation — the descriptions from your schema.yml files, translated into the agent’s context. Not the raw YAML, but a readable format the agent can search and reference.
Downstream dependency information — a compact summary of the model dependency graph derived from manifest.json. Which models depend on which, specifically which mart models are downstream of each source or base model.
Investigation history — a running log of failure events you’ve triaged, including your conclusions. This is what lets the agent tell you “you investigated this on Tuesday and decided it was a vendor delay” instead of treating every recurring failure as a novel event.
Storing Model Documentation
Extract your schema.yml model and column descriptions and store them as a Markdown file in OpenClaw’s memory. The format matters — you want the agent to be able to retrieve relevant sections quickly, not parse raw YAML:
```markdown
# dbt project documentation

## mrt__sales__customers

Deduplicated customer dimension for all sales reporting.
Used by: revenue dashboards, cohort analysis, customer segmentation.
Owner: data team
Criticality: high

### Columns

- customer__id: Primary key. Deduplicated customer identifier from Shopify. Uniqueness is critical — duplicates mean double-counted revenue.
- customer__email: Contact email. Used for CRM matching.
- customer__first_ordered_at: Timestamp of first purchase. Drives cohort assignment.

## mrt__finance__orders

Order-level mart for all revenue reporting.
Used by: CFO dashboard, monthly revenue reports, Salesforce sync.
Owner: finance team
Criticality: high

### Columns

- order__id: Primary key. Source: Shopify order ID.
- order__customer_id: Foreign key to mrt__sales__customers. Not nullable.
- order__revenue_usd: Gross revenue in USD. Includes tax, excludes refunds.
```

With this loaded, the agent transforms raw failure output into business-context alerts. When unique_mrt__sales__customers_customer__id fails, the agent looks up mrt__sales__customers, reads that “uniqueness is critical — duplicates mean double-counted revenue,” and includes that in its report. You get the business implication without doing the lookup yourself.
This doesn’t require custom code. Load the file using OpenClaw’s memory command and instruct the skill to consult it when reporting failures:
```markdown
## Cross-referencing documentation

When a test fails on a model:

1. Look up the model in your memory document "dbt-project-docs.md"
2. Include the model description in the failure report
3. Include the relevant column description
4. Note which downstream reports or systems are affected
5. Include the model owner for escalation routing
```

The limitation is maintenance. There’s no automated sync between your schema.yml and the memory file. When models change, someone needs to update the memory document. This is the most significant operational overhead of this approach — plan for it as part of your model change process, not as an afterthought.
Downstream Impact from manifest.json
dbt’s manifest.json contains the complete project dependency graph. Every model knows its parents and children. The agent can parse this to answer “which models depend on the model that just failed?” — information that immediately shapes severity and response urgency.
The challenge with large projects is size. A manifest for a project with 300+ models can be several megabytes of JSON, which is impractical to feed into a language model’s context window. The solution is pre-processing: extract the dependency graph into a compact format and store that in persistent memory instead.
A compact dependency summary looks like this:
```markdown
# dbt dependency summary

## Mart-level models and their key sources

mrt__sales__customers
  Direct parents: int__customers_deduplicated
  Key ancestors: base__shopify__customers, base__shopify__orders
  Downstream of this mart: (none — this is a leaf node)

mrt__finance__orders
  Direct parents: int__orders_enriched
  Key ancestors: base__shopify__orders, base__shopify__refunds, stg__exchange_rates
  Downstream of this mart: (none)

int__orders_enriched
  Direct parents: base__shopify__orders, stg__exchange_rates, int__customers_deduplicated
  Downstream marts: mrt__finance__orders, mrt__marketing__attribution
  Downstream mart count: 2

base__shopify__customers
  Downstream intermediate count: 3
  Downstream mart count: 4
  High-criticality dependents: mrt__sales__customers, mrt__finance__orders
```

Generating this summary can be scripted — iterate through manifest.json nodes, extract the parent/child relationships, and write the formatted output. Once it’s in memory, the agent can look up any failing model and immediately know its downstream footprint without re-parsing the manifest at monitoring time.
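One way to sketch that extraction, assuming the manifest has already been loaded with json.load: the child_map key is part of dbt’s documented manifest schema, while the mrt__ prefix and the downstream_marts name are this example’s assumptions, not dbt conventions.

```python
from collections import deque

# Sketch: list the mart-layer descendants of every model using the
# manifest's child_map (node IDs look like "model.<project>.<name>").
# The mrt__ prefix is this project's naming convention, not dbt's.
def downstream_marts(manifest: dict, mart_prefix: str = "mrt__") -> dict[str, list[str]]:
    child_map = manifest["child_map"]
    summary = {}
    for node_id in child_map:
        if not node_id.startswith("model."):
            continue
        # Breadth-first walk over all transitive children of this node.
        seen, queue = set(), deque(child_map.get(node_id, []))
        while queue:
            child = queue.popleft()
            if child not in seen:
                seen.add(child)
                queue.extend(child_map.get(child, []))
        name = node_id.split(".")[-1]
        summary[name] = sorted(
            c.split(".")[-1]
            for c in seen
            if c.startswith("model.") and c.split(".")[-1].startswith(mart_prefix)
        )
    return summary
```

Turning the resulting dict into the Markdown summary above is then a few string joins, and the whole script can run whenever the DAG changes.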
Instruct the skill to use it:
```markdown
## Downstream impact lookup

When a test fails on a model:

1. Look up the model in "dbt-dependency-summary.md" in memory
2. Find how many downstream mart-layer models depend on it
3. Include in the report: "X downstream marts affected, including [names of critical ones]"
4. If 3 or more mart models are downstream, flag as high downstream impact
```

Historical Failure Tracking
The most powerful use of persistent memory for monitoring is tracking failure history. A plain cron job that runs dbt test has no memory. It sees each failure as a new event and reports it identically whether it’s happening for the first time or the fourteenth.
Persistent memory changes this. After each monitoring run, the agent updates a structured history file:
```markdown
# dbt failure history

## unique_mrt__sales__customers_customer__id

Last failed: 2026-03-25
First failed: 2026-03-25
Total occurrences last 30 days: 1
Status: active
Notes: First occurrence. Investigating deduplication logic in base__shopify__customers.

## not_null_int__orders_enriched_order__customer_id

Last failed: 2026-03-27
First failed: 2026-03-14
Total occurrences last 30 days: 6
Status: recurring — intermittent
Notes: 2026-03-18 investigated. Likely related to Shopify source freshness delays. Failures correlate with late source loads. Watch but don't escalate immediately.

## source_freshness_raw_ga4_events

Last failed: 2026-03-22
First failed: 2026-02-15
Total occurrences last 30 days: 4
Status: recurring — known
Notes: Expected on weekends (vendor batch timing). No action needed for Sat/Sun failures. Escalate only if fails on a weekday.
```

The skill instructs the agent to consult this file when reporting and update it after each run:
```markdown
## Historical context

Before reporting a failure:

1. Check "dbt-failure-history.md" in memory for prior occurrences of this test
2. If found: include occurrence count and any notes from previous investigations
3. If not found: mark as "First occurrence" — highest urgency

After reporting all failures:

1. Update "dbt-failure-history.md" for each failure encountered today
2. Increment occurrence count
3. Update "Last failed" date
4. Preserve existing notes; add new notes if investigation reveals new information
```

Each run adds to the institutional knowledge the agent carries forward. Over time, the morning summary distinguishes new failures from recurring patterns that have already been categorized.
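The update steps are mechanical enough to express as code. Below is a sketch of that bookkeeping, assuming the history file format shown earlier; record_failure is a hypothetical name, and in practice the agent would perform these edits through its memory tools rather than a standalone script.

```python
import re

# Sketch: apply one failure to the "dbt-failure-history.md" text.
# `today` is an ISO date string, e.g. date.today().isoformat().
def record_failure(history: str, test_name: str, today: str) -> str:
    header = f"## {test_name}"
    if header not in history:
        # First occurrence: append a fresh section with starting values.
        return history.rstrip() + (
            f"\n\n{header}\nLast failed: {today}\nFirst failed: {today}\n"
            "Total occurrences last 30 days: 1\nStatus: active\n"
            "Notes: First occurrence.\n"
        )
    # Recurring: bump the date and counter inside that section only,
    # preserving First failed, Status, and any existing notes.
    start = history.index(header)
    end = history.find("\n## ", start + 1)
    end = len(history) if end == -1 else end
    section = history[start:end]
    section = re.sub(r"Last failed: .*", f"Last failed: {today}", section)
    section = re.sub(
        r"Total occurrences last 30 days: (\d+)",
        lambda m: f"Total occurrences last 30 days: {int(m.group(1)) + 1}",
        section,
    )
    return history[:start] + section + history[end:]
```

A rolling 30-day window would additionally need per-occurrence dates rather than a single counter; the sketch keeps the simpler running count from the example file.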
The Maintenance Reality
Be realistic about what this setup requires to stay useful. Memory documents that go stale are worse than no memory at all — an agent that confidently reports wrong business context causes more confusion than one that reports none.
The minimum maintenance commitment:
- Model documentation: update when models are added, renamed, or significantly changed
- Dependency summary: regenerate when the project DAG changes significantly (monthly is usually sufficient for stable projects)
- Failure history: the agent maintains this itself after each run, but review it monthly to archive closed issues and keep the file scannable
The documentation and dependency summary are the higher-maintenance items. If your dbt project changes frequently, automate the regeneration rather than relying on manual updates. A simple script that reads schema.yml files and manifest.json and regenerates the memory documents can run as part of your CI/CD pipeline on every PR merge.
For the full picture of how persistent memory feeds into the morning summary pattern, see dbt Quality Morning Summary Pattern.