This note is a reference for tools that apply AI to SQL and dbt code review. Traditional linters (SQLFluff, dbt Project Evaluator) check formatting and structural anti-patterns but do not evaluate whether SQL does what is intended. The tools below target semantic errors — wrong JOIN conditions, missing temporal filters, aggregation mismatches — that pass every existing check. For background, see AI-Generated SQL Failure Modes.
Altimate AI (DataPilot CLI + dbt Power User)
Altimate AI is the most dbt-specific option. It operates through two interfaces: the DataPilot CLI (which runs as a pre-commit hook or in CI) and the dbt Power User VS Code extension.
The DataPilot CLI runs dbt-specific AI checks at the pre-commit level. It detects high source/model fanouts, hard-coded references, unused sources, and long chains of non-materialized models. These are the kinds of structural issues that dbt Project Evaluator also catches, but DataPilot adds AI-based logic review on top.
The dbt Power User extension adds BigQuery cost estimation and compiled query preview directly in VS Code. You can see estimated bytes scanned before running anything — useful for catching the missing partition filter problem before it hits your bill.
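To make the partition-filter problem concrete, here is a minimal sketch using a hypothetical `analytics.events` table partitioned on `event_date` (names invented for illustration); the first query scans every partition, which is exactly what a pre-run cost estimate surfaces:

```sql
-- Hypothetical table partitioned by event_date.
-- No partition filter: BigQuery scans the full table and bills accordingly.
SELECT user_id, COUNT(*) AS sessions
FROM analytics.events
GROUP BY user_id;

-- With the filter, only the last seven days of partitions are scanned.
SELECT user_id, COUNT(*) AS sessions
FROM analytics.events
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY user_id;
```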
Benchmark numbers: on the TPC-H 1TB benchmark, Altimate’s optimization Skills for Claude Code produced queries that ran 22% faster while remaining logically equivalent. On ADE-bench (43 real-world dbt tasks across 5 projects), it reached 53% accuracy. That 53% number is worth contextualizing — the benchmark includes complex tasks like full model creation and refactoring, not just review. Their key finding was that the number one source of errors was mismatched conventions, not hallucination or wrong syntax. The fix was straightforward: add an instruction to read existing models first.
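As a hypothetical illustration of what a convention mismatch looks like (model and column names invented for this sketch): the generated SQL is valid, it just ignores how the rest of the project is written.

```sql
-- Valid SQL, but it bypasses the project's staging layer and invents a key name.
-- If existing models read from ref('stg_shop__orders') and suffix keys with _id,
-- this is the kind of drift that "read existing models first" is meant to prevent.
SELECT
    orders.id    AS order_key,   -- project convention would be order_id
    orders.total AS order_total
FROM {{ source('shop', 'orders') }} AS orders
```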
The pre-commit configuration is minimal:
```yaml
repos:
  - repo: https://github.com/AltimateAI/datapilot-cli
    hooks:
      - id: datapilot-cli
```

Greptile
Greptile takes a different approach: full codebase context via RAG. Instead of reviewing individual files or queries in isolation, Greptile indexes your entire repository and uses that context when reviewing PRs.
This matters for dbt projects specifically. When Greptile reviews a PR, it can see that the customer__id column you’re joining on was deprecated two weeks ago in a different model and replaced with account__id. A tool that only sees the changed files would miss this entirely.
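A sketch of the kind of PR change involved, with hypothetical model names; the join compiles and passes linting, and only repository-wide context reveals that `customer__id` is stale:

```sql
-- Changed model in the PR (hypothetical names). It compiles and lints cleanly,
-- but dim_accounts deprecated customer__id in favor of account__id, so the join
-- now runs against a stale column and produces silently wrong results.
SELECT
    orders.order_id,
    accounts.account_tier
FROM {{ ref('fct_orders') }} AS orders
LEFT JOIN {{ ref('dim_accounts') }} AS accounts
    ON orders.customer__id = accounts.customer__id   -- should be account__id
```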
In a July 2025 evaluation of 50 real bugs, Greptile caught 82% — the highest of any AI code review tool tested. GitHub Copilot’s PR review feature caught 55% of the same bugs. That 27-percentage-point gap matters when the bugs you miss are the ones that reach production as silent errors.
The codebase awareness is the differentiator. Without it, SQL review is just syntax checking with extra steps. Cross-model issues — column renames that break downstream references, filter changes that affect metrics used by other teams — only surface when the review tool understands the full project.
CodeRabbit
CodeRabbit provides AI PR review across GitHub, GitLab, and Bitbucket. Its standout feature is accessibility: adding it to a GitHub repository takes less than five minutes and starts catching issues immediately, even without custom configuration. It’s also free for open-source projects.
The defaults already flag missing tests, unused models, and SQL anti-patterns. Custom rules reduce false positives, but you get value from day one without configuration investment. For teams that want AI review without a significant setup commitment, CodeRabbit is the lowest-friction entry point.
CodeRabbit doesn’t have Greptile’s depth of codebase awareness, but it catches a meaningful subset of issues at a much lower adoption cost. For many teams, “catches something immediately” beats “catches more but requires setup.”
MotherDuck FixIt
MotherDuck’s FixIt takes a fundamentally different approach from the others: real-time error correction at query execution time. Instead of reviewing code before it runs, FixIt intercepts errors as they happen and suggests fixes with 1-3 second latency.
Their team discovered two implementation details worth knowing. First, prepending line numbers to queries significantly improved LLM accuracy — the model could reference specific lines in its fix suggestions. Second, generating only the fix line (instead of rewriting the full query) improved both speed and correctness. Both findings are useful if you’re building similar tooling internally.
FixIt is most relevant for interactive query development and DuckDB/MotherDuck-specific workflows. It doesn’t replace PR-level review, but it shortens the feedback loop for the write-run-fix cycle during development.
Context Determines Accuracy
Across all tools and benchmarks: context determines accuracy. Without schema information, Tiger Data found 42% of AI-generated SQL queries referenced non-existent objects. Thomson Reuters reduced incorrect filtering from 73% to under 10% by adding evaluation frameworks (TruLens + AgentBench). Tiger Data reported a 27% accuracy improvement from LLM-generated semantic catalogs.
The infrastructure that closes this gap — the dbt MCP Server for lineage and metadata, CLAUDE.md files for conventions, semantic catalogs for column-level meaning — requires setup and ongoing maintenance. All tools above perform better with this context in place.
Choosing Between Them
These tools are not mutually exclusive; teams often layer them (see Layered SQL Review Pipeline for dbt):
| Tool | Stage | Strength | Setup Effort |
|---|---|---|---|
| Altimate DataPilot | Pre-commit | dbt-specific structural checks | Low |
| Greptile | PR review | Full codebase context, highest accuracy | Medium |
| CodeRabbit | PR review | Fastest to adopt, broad coverage | Very low |
| MotherDuck FixIt | Runtime | Real-time fix suggestions | Low (DuckDB only) |
Cross-model awareness — at least one tool seeing more than the changed files in isolation — separates useful review from sophisticated linting.