Test-driven development works well with AI agents because TDD gives the agent a clear, unambiguous target. Instead of “write a good model” (subjective, hard to verify), the target becomes “make these tests pass” (objective, instantly verifiable). Claude Code can iterate in a loop — run tests, see failures, fix code, run tests again — until everything is green.
This pattern takes five steps.
The Five-Step Workflow
Step 1: Write Tests Before the Model Exists
Start by telling Claude what the model should guarantee, without creating any implementation:
```
Write dbt tests for a new mrt__marketing__customer_lifetime_value model. Include:
- unique + not_null on customer_id
- accepted_values for customer_segment
- relationships test to base__stripe__customers
Don't create the model yet.
```

Claude creates a schema.yml entry with the test definitions. At this point, the model SQL file doesn’t exist. That’s intentional — you’re defining the contract before the implementation.
The tests you specify should include the standard test types: unique and not_null on primary keys, relationships on foreign keys, accepted_values on categorical columns, and any business rules you can express as generic tests.
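Under those conventions, a schema.yml entry covering all four test types might look like the following sketch. The column names match the example model above; the accepted values and the `field` on the relationships test are illustrative assumptions:

```yaml
models:
  - name: mrt__marketing__customer_lifetime_value
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
          - relationships:
              to: ref('base__stripe__customers')
              field: customer_id
      - name: customer_segment
        tests:
          - accepted_values:
              values: ['smb', 'mid_market', 'enterprise']  # placeholder segments
```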
Step 2: Confirm the Tests Fail
```
Run dbt test --select mrt__marketing__customer_lifetime_value and confirm all tests fail.
```

This step matters for two reasons. First, it verifies the tests are syntactically valid and correctly wired to the (not-yet-existing) model. Second, it establishes the red-green cycle — Claude sees the failures, which become its target for the next step.
Step 3: Implement to Pass
```
Implement mrt__marketing__customer_lifetime_value to make all tests pass.
Don't modify the tests.
```

The “don’t modify the tests” instruction is critical. Without it, Claude may take the path of least resistance: changing a test to match a simpler implementation rather than writing the correct implementation to match the test. The tests are the contract. The implementation adapts to the contract, not the other way around.
Claude will create the model SQL, run the tests, and iterate if something fails. This is where the AI agent pattern shines — Claude handles the entire build-test-fix loop within a single conversation turn.
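A first-pass implementation satisfying that contract might be shaped like this. The revenue source (`base__stripe__invoices`) and its columns are assumptions for the sketch; only the columns named in the tests come from the example above:

```sql
-- mrt__marketing__customer_lifetime_value.sql (sketch)
with customers as (
    select customer_id, customer_segment
    from {{ ref('base__stripe__customers') }}
),

revenue as (
    select customer_id, sum(amount) as lifetime_value
    from {{ ref('base__stripe__invoices') }}  -- assumed upstream model
    group by customer_id
)

select
    customers.customer_id,
    customers.customer_segment,
    coalesce(revenue.lifetime_value, 0) as lifetime_value
from customers
left join revenue using (customer_id)
```

Starting from the customers table and left-joining revenue keeps the relationships and unique tests satisfied by construction: every output row carries a customer_id that exists upstream, exactly once.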
Step 4: Iterate on Failures
```
The relationship test is failing. Fix the implementation without changing the test.
```

If Claude’s implementation doesn’t pass all tests on the first try, you don’t need to diagnose the issue yourself. Point Claude at the failing test and let it investigate. It will read the compiled SQL, check the test logic, examine the upstream data, and fix the implementation.
Step 5: Refactor with Confidence
```
Now that tests pass, refactor the CTEs for better readability while keeping all tests green.
```

Once all tests pass, you have a safety net. Refactoring becomes low-risk because any change that breaks behavior will fail a test. Claude can restructure CTEs, rename intermediate columns, extract common logic, and improve readability — then immediately verify nothing broke.
Why This Works with AI Agents
Traditional TDD has friction: writing tests first feels slow, running tests manually between iterations is tedious, and skipping the “red” step is tempting.
With Claude Code, Claude writes the test YAML, runs tests, sees results, and iterates without manual intervention. The entire red-green-refactor cycle can happen in a single conversation turn. Claude also tends to follow the “don’t modify the tests” instruction literally — human developers under time pressure sometimes modify failing tests instead of fixing failing implementations, and an explicit, standing instruction makes Claude far less likely to do the same.
Automating Test Generation
You can create a slash command that analyzes a model and generates appropriate tests. Save this as .claude/commands/generate-tests.md:
```markdown
---
description: Generate comprehensive dbt tests for a model
allowed-tools: Read, Write, Bash(dbt:*)
argument-hint: [model_name]
---

Analyze $ARGUMENTS and generate appropriate tests:

1. Read the model SQL and existing schema.yml
2. Identify primary key columns → add unique + not_null
3. Find foreign key relationships → add relationships tests
4. Detect enum-like columns → add accepted_values
5. Find numeric ranges → add appropriate range tests

Update the schema.yml with new tests.
```

Type /generate-tests mrt__sales__orders and Claude analyzes the model, identifies what should be tested, and updates your schema.yml. It’s not a substitute for thinking about what your model guarantees, but it catches the obvious stuff — the primary key tests, the referential integrity checks, the enum validations — and gives you a starting point to build on.
Enforcing the Discipline
Claude sometimes skips tests or writes implementation first, especially in longer conversations where earlier instructions get deprioritized. You can use Claude Code Hooks to enforce TDD discipline by intercepting file modifications:
- A PreToolUse hook that blocks model SQL creation when no corresponding YAML tests exist
- A hook that validates changes against common violations: implementing functionality without a failing test, or modifying tests to make them pass instead of modifying the implementation
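The first hook could be a small script along these lines — a sketch, assuming the Claude Code hook contract of tool-call JSON on stdin and exit code 2 to block the call, and assuming tests live in a schema.yml next to the model. The file-layout conventions are illustrative:

```python
#!/usr/bin/env python3
"""PreToolUse hook sketch: block creating a dbt model .sql file when no
schema.yml in the same directory declares tests for that model name."""
import json
import sys
from pathlib import Path


def is_untested_model(sql_path: str, schema_text: str) -> bool:
    """True if sql_path looks like a dbt model with no schema.yml entry."""
    path = Path(sql_path)
    if path.suffix != ".sql" or "models" not in path.parts:
        return False  # only guard files under a models/ directory
    return f"name: {path.stem}" not in schema_text


def main() -> int:
    event = json.load(sys.stdin)  # Claude Code passes the tool call as JSON
    if event.get("tool_name") not in ("Write", "Edit"):
        return 0
    file_path = event.get("tool_input", {}).get("file_path", "")
    schema = Path(file_path).parent / "schema.yml"
    schema_text = schema.read_text() if schema.exists() else ""
    if is_untested_model(file_path, schema_text):
        print(f"TDD violation: no tests declared for {Path(file_path).stem}; "
              "write the schema.yml tests first.", file=sys.stderr)
        return 2  # exit code 2 blocks the tool call and surfaces stderr
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Register it in .claude/settings.json under hooks → PreToolUse with a matcher for the Write and Edit tools. The name-matching heuristic is deliberately crude; a real hook might parse the YAML properly.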
The /test-prompt pattern adds another layer of safety: before using a prompt on production code, run it on a sandbox model first. The command checks the output for correct BigQuery syntax, proper use of ref() and source(), and documents any edge cases. Test on non-production models before running something across your whole project.
When TDD Makes Sense (and When It Doesn’t)
TDD with Claude Code is highest-value for:
- New mart models with complex business logic — the models where getting it wrong has real consequences
- Refactoring existing models — write tests capturing current behavior first, then restructure
- Models with tricky edge cases — date boundaries, null handling, multi-currency calculations
It’s lower-value for:
- Simple base models that just rename and cast columns — generic tests on primary keys are enough
- One-off exploratory queries that don’t persist
- Models where you’re still figuring out the schema — writing tests for something you’re going to redesign twice is wasted effort
The sweet spot is any model where you can articulate what “correct” means before writing the SQL. If you can’t describe the expected behavior in tests, you probably don’t understand the requirements well enough yet — and that’s valuable information too.