
Cascading Agent Pattern

The architecture where an always-on monitoring agent detects issues and triggers a coding agent to investigate and fix them — how OpenClaw and Claude Code hand off work


The cascading agent pattern connects two types of AI agents in a detect-then-fix workflow. An always-on monitoring agent (like OpenClaw) detects an issue through a scheduled check or webhook. If the issue requires code changes, it triggers a coding agent (like Claude Code) to investigate and fix it. The monitoring agent then summarizes the results and delivers them to the team.

This pattern is the primary mechanism through which layers in a layered AI stack talk to each other.

The Flow

1. OpenClaw cron job detects issue
(e.g., failing dbt test, cost spike, freshness alert)
2. OpenClaw triages
Simple issue? → Handle directly (alert, restart, log)
Needs code? → Trigger Claude Code session
3. Claude Code investigates
Reads project context (CLAUDE.md, model SQL, lineage)
Diagnoses the root cause
Proposes and implements a fix
Runs dbt build to verify
4. Claude Code reports results back
5. OpenClaw summarizes and posts to Slack
"Test X was failing because of Y. Fix applied in branch Z.
dbt build passes. Ready for review."
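The monitoring-agent side of this flow can be sketched in a few lines. Everything here is illustrative: `Issue`, `needs_code_change`, and the injected `trigger_coding_agent` / `post_to_slack` callables are hypothetical names, not OpenClaw or Claude Code APIs.

```python
from dataclasses import dataclass

@dataclass
class Issue:
    kind: str    # e.g. "test_failure", "freshness", "cost_spike"
    detail: str  # human-readable description of what was detected

def needs_code_change(issue: Issue) -> bool:
    # Illustrative triage rule: only test/build failures cascade to the coding agent.
    return issue.kind in {"test_failure", "build_failure"}

def run_cascade(issue, trigger_coding_agent, post_to_slack):
    """Monitoring-agent side of the flow: triage, hand off, summarize."""
    if not needs_code_change(issue):
        # Step 2, simple path: handle directly with an alert.
        post_to_slack(f"Alert: {issue.kind}: {issue.detail} (no code change needed)")
        return
    # Steps 3-4: the injected callable runs the coding agent and returns its report.
    report = trigger_coding_agent(issue)
    # Step 5: summarize for the team.
    post_to_slack(f"{issue.detail}: {report}")
```

The dependency injection is the point: the monitoring agent never writes SQL itself, it only decides whether to invoke the callable that does.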

The key design principle is clear boundaries. The monitoring agent handles detection and delivery. The coding agent handles investigation and repair. Neither tries to do the other’s job. OpenClaw doesn’t write SQL. Claude Code doesn’t run on a cron schedule.

Why Two Agents Instead of One

A single agent that both monitors and codes would be simpler architecturally. The separation exists for practical reasons:

Different runtime profiles. A monitoring agent needs to be always-on, lightweight, and cheap. It checks things periodically and sleeps between checks. A coding agent needs deep context, heavy compute, and expensive model calls. Running a coding agent 24/7 “just in case” wastes money. Running it on-demand when there’s actually a problem to solve is efficient.

Different security postures. The monitoring agent needs read access to production warehouses. The coding agent needs write access to development schemas and the ability to create branches. Scoping these separately limits the blast radius if either is compromised.

Different failure modes. If the monitoring agent crashes, you miss an alert but no code changes happen. If the coding agent crashes mid-fix, the monitoring agent can report the failure. Separation creates natural fault boundaries.

The Triage Decision

Not every detected issue should cascade to a coding agent. The triage logic in the monitoring agent determines the handoff:

| Issue Type | Cascade? | Why |
| --- | --- | --- |
| Freshness check: source table stale | No | Nothing to fix in code; wait for upstream or alert the team |
| Cost alert: BigQuery spend spike | No | Investigate manually; cost spikes often have non-code causes |
| dbt test failure: not_null violation | Maybe | Could be a data issue (no code fix) or a missing filter (code fix) |
| dbt test failure: schema change | Yes | Likely needs model updates to handle new/removed columns |
| Build failure: compilation error | Yes | Definitely needs code investigation |
| Anomaly: volume drop detected | No | Alert first; investigate before assuming code is the fix |

The default should be conservative. Most issues should alert first and cascade to code only when there’s high confidence that a code change is the right response. An agent that auto-fixes every test failure will create as many problems as it solves — not every failing test means the code is wrong. Sometimes the data is wrong, or the test expectations need updating, or the failure is transient.
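The triage table reduces to a small lookup with a conservative default. The issue-type keys and the three-way alert/maybe/cascade outcome are invented labels for illustration, not part of any tool's API:

```python
# Invented issue-type labels mirroring the triage table; "maybe" means
# a human decides whether to cascade.
TRIAGE_RULES = {
    "freshness_stale":     "alert",
    "cost_spike":          "alert",
    "test_not_null":       "maybe",
    "test_schema_change":  "cascade",
    "build_compile_error": "cascade",
    "volume_anomaly":      "alert",
}

def triage(issue_type: str) -> str:
    # Conservative default: anything unrecognized alerts rather than cascades.
    return TRIAGE_RULES.get(issue_type, "alert")
```

The default in the fallback is the policy: an issue type nobody anticipated should reach a human, not a coding agent.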

What Connects the Agents

Today, the connection between monitoring and coding agents is manual and somewhat brittle. There’s no standard protocol for agent-to-agent communication.

The Markdown file workaround. The most practical approach: the monitoring agent writes its findings to a Markdown file in the project repository. The coding agent’s CLAUDE.md references that file. When Claude Code starts a session, it reads the overnight findings as part of its project context.

```markdown
# .claude/overnight-findings.md (written by OpenClaw)

## 2026-03-26 07:00 UTC

### Failing tests
- `mrt__sales__orders`: not_null on `customer_id` — 3 NULL rows detected
- `int__ga4__sessions_attributed`: unique on `session_id` — 12 duplicates

### Build status
- All models compiled successfully
- No schema changes detected
```

```markdown
# CLAUDE.md (excerpt)

Before starting work, check `.claude/overnight-findings.md` for any
issues detected by the monitoring agent overnight.
```

This works for the most common scenario — “here are the tests that failed overnight, fix them” — but it’s fragile. The file format isn’t standardized. The coding agent has no way to acknowledge which findings it has addressed. There’s no feedback loop where the coding agent reports its resolution back through the same channel.
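The writer side of this workaround is simple enough to sketch. The function name and signature are hypothetical; only the file path and the ad-hoc Markdown layout come from the example above:

```python
from datetime import datetime, timezone
from pathlib import Path

def write_findings(repo_root, failing_tests, build_notes):
    """Append an overnight-findings entry in the ad-hoc format shown above."""
    path = Path(repo_root) / ".claude" / "overnight-findings.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    lines = [f"## {stamp}", "### Failing tests"]
    lines += [f"- {t}" for t in failing_tests] or ["- none"]
    lines += ["### Build status"] + [f"- {n}" for n in build_notes]
    # Append so earlier runs stay in the file; nothing here is standardized.
    with path.open("a") as f:
        f.write("\n".join(lines) + "\n")
    return path
```

Note what the sketch cannot do: it only ever appends. There is no field for the coding agent to mark a finding as addressed, which is exactly the missing feedback loop.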

What’s missing: a shared context protocol. The ideal solution would be something like MCP but for agent-to-agent communication — a standard protocol where one agent can share structured context with another, including issue descriptions, priority levels, relevant file paths, and resolution status. This doesn’t exist in any standard form today.
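A minimal sketch of what such a handoff record might carry, assuming JSON as the serialization. Every field name here is speculative, since no standard exists:

```python
from dataclasses import dataclass, asdict
import json

# Speculative handoff record for agent-to-agent context sharing;
# no such protocol exists in standard form today.
@dataclass
class Handoff:
    issue: str            # human-readable issue description
    priority: str         # e.g. "low" | "medium" | "high"
    files: list           # relevant file paths for the coding agent
    status: str = "open"  # "open" | "in_progress" | "resolved"

def serialize(handoff: Handoff) -> str:
    # JSON over a shared file or queue is one plausible transport.
    return json.dumps(asdict(handoff))
```

The `status` field is what the Markdown workaround lacks: a place for the coding agent to report resolution back through the same channel.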

The closest emerging concepts are multi-agent frameworks (CrewAI, AutoGen, LangGraph), but these are designed for agents within the same runtime, not for independent agents running on different machines with different security contexts. The monitoring-agent-to-coding-agent handoff is a distributed systems problem, and the tooling hasn’t caught up yet.

The Shared Context Gap

The biggest limitation of the cascading pattern — and of any multi-tool AI stack — is that there’s no shared memory between tools.

  • Claude Code doesn’t know what OpenClaw found overnight
  • OpenClaw doesn’t know what you built in Cursor during the day
  • Cursor doesn’t know which models Claude Code just refactored
  • Each tool starts every session with its own context

What holds things together today is shared project files: CLAUDE.md for conventions, dbt_project.yml for configuration, Git for code state, Slack for communication. Every tool can independently read the same files and follow the same conventions. It’s not a unified memory layer, but it provides enough coordination for practical work.

The gap becomes most painful when context is time-sensitive. Overnight monitoring results are stale by the time you start a Claude Code session in the morning. A model you refactored in Cursor at 3 PM isn’t reflected in OpenClaw’s test run at 7 AM if you didn’t push the changes. Git helps, but only if everyone (including the agents) commits and pulls at the right times.

A Unified Skill System

A related gap: each tool in a layered stack has its own format for reusable workflows.

  • Claude Code uses Skills stored as Markdown files in .claude/commands/
  • OpenClaw uses Skills published on ClawHub
  • Cursor uses rules stored in .cursorrules

There’s no standard. A skill written for Claude Code can’t be used by OpenClaw or Cursor. A ClawHub skill can’t be invoked by Claude Code. If you want the same “analyze failing tests and suggest fixes” workflow across tools, you write it three times in three formats.

A unified skill format — or at least a translation layer between formats — would make the cascading agent pattern more powerful. The monitoring agent could invoke the same skills the coding agent uses, ensuring consistency in how issues are analyzed and resolved. This doesn’t exist today and isn’t likely to emerge soon, given that each tool vendor has incentives to maintain their own ecosystem.
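A translation layer could be as small as rendering one skill definition into each vendor's file convention. This toy sketch captures only the file-placement conventions listed above; the function names are invented, and real skill formats carry more structure than a name and an instruction string:

```python
def to_claude_command(name: str, instructions: str):
    # Claude Code custom commands live as Markdown files in .claude/commands/
    return f".claude/commands/{name}.md", instructions

def to_cursor_rules(name: str, instructions: str):
    # Cursor reads plain-text rules from a .cursorrules file
    return ".cursorrules", f"# {name}\n{instructions}"
```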

Getting Started with the Pattern

With OpenClaw for monitoring and Claude Code for development:

  1. Configure OpenClaw to write test results to .claude/overnight-findings.md after each scheduled run
  2. Add a line to CLAUDE.md telling Claude Code to read that file at session start
  3. Each morning, start a Claude Code session and check overnight findings

The manual trigger (starting the session and giving the instruction) is intentional. Full automation — OpenClaw triggering Claude Code without human involvement — should come only after you have validated that the triage logic is reliable and that the coding agent produces appropriate fixes. The trust gradient applies: start with a human in the loop and escalate after building confidence in the system.