No single tool catches every class of SQL error. Linters catch formatting. AI catches semantic issues. dbt tests catch data violations. A layered pipeline stacks tools at different stages so each layer catches what the others miss, and review depth scales with the risk of the change.
This pattern applies specifically to dbt projects; the tooling and integration points assume a dbt workflow.
Layer 1: IDE Feedback (Real-Time)
Claude Code, Cursor, or another AI coding assistant provides real-time feedback while you write. This layer catches basic semantic issues: wrong table references, obvious JOIN mistakes, missing columns, patterns that don’t match your project conventions.
The key to making IDE feedback useful is context. A well-configured CLAUDE.md with your naming conventions, partitioning strategy, and common patterns reduces false positives by 50% or more, based on what teams report after adding project-specific rules.
What to include in your CLAUDE.md for review quality:
- Warehouse-specific gotchas: BigQuery partition columns and their required filters, Snowflake clustering key columns
- PII rules: Columns that contain PII and should never appear in mart models
- Business logic definitions: How your company calculates revenue, what “active customer” means, which orders should be excluded from metrics
- Common anti-patterns: The specific mistakes your team makes repeatedly
This layer runs constantly. There’s no CI cost, no pipeline delay. The tradeoff is that it only sees what’s in your current context — it can’t catch cross-model issues or problems that emerge from interactions between files.
Layer 2: Pre-Commit (Every Push)
Pre-commit hooks run automatically when you commit code. They catch two categories of issues: formatting violations (SQLFluff) and dbt-specific structural problems (DataPilot).
```yaml
repos:
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: 3.3.0
    hooks:
      - id: sqlfluff-lint
        args: [--dialect, bigquery]
  - repo: https://github.com/AltimateAI/datapilot-cli
    hooks:
      - id: datapilot-cli
```

SQLFluff handles formatting: 20+ SQL dialect support, a dedicated dbt plugin, auto-fix for most of what it flags. It’s the standard linter for dbt projects and should be non-negotiable in any CI setup.
DataPilot adds AI-based checks on top: high source/model fanouts, hard-coded references, unused sources, long chains of non-materialized models. These are the structural issues that compound into maintenance problems if left unchecked.
Pre-commit hooks are fast — seconds, not minutes. They give developers immediate feedback before code reaches the PR stage. The limitation is scope: they only check the files you’re committing, not the broader project context.
If you use Claude Code hooks, you can add a PostToolUse hook that runs SQLFluff automatically after every SQL file edit, catching formatting issues even earlier than the pre-commit stage.
Layer 3: PR Review (Every Pull Request)
PR-level review is where cross-model issues surface. CodeRabbit or Greptile reviews the full changeset with codebase context. A column rename that breaks downstream references, a filter change that affects metrics used by other teams — these only become visible when the review tool can see beyond the changed files.
Run dbt Project Evaluator alongside AI review at this stage. AI catches logic issues; dbt Project Evaluator catches structural ones:
- Models without tests
- Direct references to sources in mart models
- Excessively long DAG chains
- Models with too many upstream dependencies
You need both for full coverage. AI review understands query logic but doesn’t enforce project structure conventions. dbt Project Evaluator enforces structure but can’t evaluate whether a JOIN condition is correct.
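dbt Project Evaluator installs as a dbt package, so it rides along with the PR pipeline rather than requiring a separate tool. A minimal packages.yml sketch, with an illustrative version pin:

```yaml
packages:
  - package: dbt-labs/dbt_project_evaluator
    version: 0.16.0  # illustrative; pin to the current release
```

With the package installed, the checks typically run in the PR job as `dbt build --select package:dbt_project_evaluator`, and violations surface as failing tests on the package’s models.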
The PR stage is also where human review happens. The human review checklist applies here: check JOIN conditions, temporal filters, NULL handling, aggregation grain, and filter completeness. AI review flags candidates; humans make the judgment calls.
Layer 4: dbt Cloud CI (Changed Models + Downstream)
In dbt Cloud, Slim CI with `dbt build --select state:modified+` tests only changed models and their downstream dependencies. This catches issues that no amount of code review can: data-level violations that only manifest when the query actually runs against warehouse data.
The dbt Testing Taxonomy defines what gets tested at this stage: generic tests (unique, not_null, relationships), singular tests for complex business rules, unit tests for transformation logic, and contract enforcement for schema stability.
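As a concrete anchor for that taxonomy, here is a minimal schema.yml sketch covering generic tests and contract enforcement; the model, columns, and data types are hypothetical:

```yaml
models:
  - name: fct_orders              # hypothetical mart model
    config:
      contract:
        enforced: true            # schema drift now fails the build
    columns:
      - name: order_id
        data_type: string         # enforced contracts need an explicit data type per column
        tests:
          - unique
          - not_null
      - name: customer_id
        data_type: string
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```

Singular tests live as SQL files under tests/, and unit tests are defined in YAML next to the model they exercise; all of them run as part of the same `dbt build` invocation.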
On dbt Core without dbt Cloud, an equivalent GitHub Actions workflow achieves the same coverage:
```yaml
- name: dbt CI
  run: |
    dbt build --select state:modified+ --defer --state ./prod-manifest
```

The `--defer` flag tells dbt to use production artifacts for unchanged models, so you only build and test what changed. This keeps CI fast while maintaining full downstream coverage.
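If you want that step inside a self-contained job, a sketch might look like the following. The adapter, Python version, and the mechanism that places manifest.json in ./prod-manifest are assumptions; swap in whatever your production deployment actually publishes.

```yaml
name: dbt-ci
on: pull_request

jobs:
  dbt-ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dbt
        run: pip install dbt-bigquery   # swap in your warehouse adapter
      - name: Install dbt packages
        run: dbt deps
      # Warehouse credentials and profiles.yml are omitted; supply them via repo secrets.
      # Assumes a scheduled production run has already published manifest.json to ./prod-manifest.
      - name: dbt CI
        run: dbt build --select state:modified+ --defer --state ./prod-manifest
```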
Matching Review Depth to Risk
Not every layer needs to run on every change. The architecture is designed so review intensity scales with the risk of the change:
| Change Type | Layers That Run |
|---|---|
| Any code change | IDE feedback (constant) |
| Every commit | Pre-commit hooks (SQLFluff + DataPilot) |
| Every PR | AI PR review + dbt Project Evaluator + human review |
| Every merge to main | Slim CI with dbt tests |
| Monthly/quarterly | Deep Claude Code review of critical paths |
The last row is important. For mart models, financial data, and anything customer-facing, a periodic deep review using Claude Code with your full project context catches issues that accumulate slowly — convention drift, deprecated patterns, business logic that no longer matches how stakeholders interpret the metrics.
The Three-Layer Minimum
In practice, three layers cover most of the gap:
- SQLFluff for formatting (fast, deterministic, non-negotiable)
- One AI review tool for PRs (CodeRabbit for lowest friction, Greptile for deepest coverage)
- A context-aware assistant at the IDE level (Claude Code with a good CLAUDE.md)
The specific tools matter less than configuring each layer with your project’s schema, conventions, and business rules. An unconfigured Greptile gives you less than a well-configured CodeRabbit. The tradeoffs of adding more layers are real — CI latency, false positive volume, conflicting feedback — so start with three and add only when you have evidence of gaps.