Claude Code Strengths and Limitations for Data Work

Where Claude Code delivers real value in data engineering — boilerplate, multi-file changes, pattern replication — and where it struggles with novel logic, ambiguity, and over-engineering.

Claude Code is an agentic coding tool that reads files, understands project structure, writes code, runs commands, and iterates based on results. The mental model: treat Claude Code as a capable junior engineer (fast, productive, good at following patterns) whose work a human reviews line by line before it ships.

Where Claude Code Works Well

Boilerplate Generation

Base models, source YAML, schema test files — anything with a predictable structure is Claude Code’s strongest territory. These tasks share three properties: the pattern is well-established, the variation is in specifics (table names, column names, data types), and correctness is mechanically verifiable (does it compile? do the tests pass?).

A typical prompt:

Create a base model for raw_shopify.orders following my existing
base model patterns. Include source YAML, schema tests for primary
key uniqueness and not-null on required fields.

Claude Code reads your existing base models, picks up the naming convention, CTE structure, materialization strategy, and deduplication pattern, then generates a model that looks like it belongs in your project. It also generates the corresponding YAML with tests. The output typically compiles by the time it reaches you because Claude Code runs dbt compile as part of its work loop and fixes the errors it finds.

This is where the structural fit between data work and agentic AI is most visible. The high pattern density of dbt projects means Claude Code learns your conventions from a few examples and applies them consistently across dozens of new models.
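To make "predictable structure" concrete, here is a minimal sketch of the mechanical part of the task: turning a primary key and a list of required columns into schema YAML with tests. It is an illustration only, not what Claude Code does internally. The helper name render_schema_yaml and the column names order__id, customer__id, and order__created_at are assumptions made for the example.

```python
def render_schema_yaml(model_name, primary_key, required_columns):
    """Render a minimal dbt schema YAML string for one model.

    Hypothetical helper for illustration; the layout is an assumption,
    not a convention taken from any particular project.
    """
    lines = [
        "version: 2",
        "",
        "models:",
        f"  - name: {model_name}",
        "    columns:",
        # The primary key gets both uniqueness and not-null tests.
        f"      - name: {primary_key}",
        "        tests:",
        "          - unique",
        "          - not_null",
    ]
    # Other required fields only need a not-null test.
    for column in required_columns:
        if column == primary_key:
            continue
        lines += [
            f"      - name: {column}",
            "        tests:",
            "          - not_null",
        ]
    return "\n".join(lines) + "\n"


# Column names below are made up for the example.
schema_yaml = render_schema_yaml(
    "base__shopify__orders",
    primary_key="order__id",
    required_columns=["customer__id", "order__created_at"],
)
print(schema_yaml)
```

The only inputs that vary are names. That is why this category of work is easy to generate and easy to verify.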

Multi-File Changes

Renaming a column across downstream models, refactoring a macro and updating all its call sites, restructuring a set of models after a source schema change — these tasks require coordinated edits across multiple files. Claude Code handles them natively because it can read the full project, identify every reference, and update them in one pass.

Rename the column order_amount to order__amount_usd in
base__shopify__orders and update all downstream references.
Run dbt build on the affected models to verify.

This is the kind of work that’s tedious and error-prone for humans (did I find every reference? did I miss one in a macro?) and far more reliable for an agent that can search the entire project. References built dynamically in Jinja can still escape a text search, which is why the prompt ends with a dbt build on the affected models. See Advanced Claude Code Workflows for dbt for the systematic approach to multi-file operations.
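The "find every reference" step can be approximated with a whole-word search over the project files. The sketch below is one hedged way to do that by hand, for example to double-check an agent’s rename. The function name find_column_references and the sample file contents are assumptions, and the search only sees literal occurrences of the column name.

```python
import re
import tempfile
from pathlib import Path


def find_column_references(project_dir, column_name):
    """Return (path, line_number, line) for each whole-word match of column_name."""
    # Lookarounds keep order_amount from matching inside order_amount_usd.
    pattern = re.compile(
        rf"(?<![A-Za-z0-9_]){re.escape(column_name)}(?![A-Za-z0-9_])"
    )
    hits = []
    for path in sorted(Path(project_dir).rglob("*")):
        # Only look at model SQL and YAML files.
        if path.suffix not in {".sql", ".yml", ".yaml"} or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path, number, line.strip()))
    return hits


# Tiny throwaway project to demonstrate the search (file contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    models = Path(tmp) / "models"
    models.mkdir()
    (models / "int__orders__enriched.sql").write_text(
        "select\n    order_amount,\n    order_amount_usd\nfrom orders\n",
        encoding="utf-8",
    )
    (models / "notes.md").write_text("order_amount is documented here\n", encoding="utf-8")
    hits = [
        (path.name, number, line)
        for path, number, line in find_column_references(tmp, "order_amount")
    ]

print(hits)
```

The word-boundary check matters for renames like the one above, where the old name is a prefix of other column names.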

Codebase Exploration

When you drop into an unfamiliar dbt project — a new client engagement, a codebase you inherited, an open-source project you’re evaluating — Claude Code can explain what the project does, how it’s structured, and how data flows through it. “What does this project do?” produces a useful overview. “Trace the lineage from raw_stripe.payments to the final revenue mart” produces a specific walkthrough.

This works well because it’s a reading-and-summarization task. Claude Code reads the files, follows the ref() calls, and synthesizes a coherent explanation. No business context is needed — the structural understanding is in the code itself.
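Following ref() calls amounts to a graph walk. As a rough sketch of that idea, the code below extracts single-argument ref() calls with a regular expression and collects everything upstream of a model. The model names and SQL snippets are hypothetical. The pattern ignores source() calls and the two-argument form of ref(), so treat it as an illustration rather than a lineage tool.

```python
import re

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_parent_map(models):
    """Map each model name to the set of models it selects from via ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def upstream(model, parents):
    """Return every model reachable by following ref() edges upstream."""
    seen = set()
    stack = [model]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical three-model project, keyed by model name.
models = {
    "base__stripe__payments": "select * from {{ source('raw_stripe', 'payments') }}",
    "int__payments__enriched": "select * from {{ ref('base__stripe__payments') }}",
    "mart__revenue": 'select * from {{ ref("int__payments__enriched") }}',
}

parents = build_parent_map(models)
lineage = sorted(upstream("mart__revenue", parents))
print(lineage)
```

Everything the walk needs is in the SQL files themselves. That is the sense in which no business context is required.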

Pattern Replication

Once you have one good example of any pattern — a well-structured intermediate model, a tested macro, a documentation block — Claude Code applies that pattern elsewhere reliably. The “one good example” principle: invest time in getting one model right, then let the agent replicate it across your project.

Use int__orders__enriched as a template pattern. Create an
equivalent intermediate model for the sessions entity, joining
base__ga4__events with base__ga4__session_params.

The agent reads the reference model, extracts the structural pattern (CTE flow, join approach, naming convention, config block), and applies it to the new context. The result follows your conventions because it learned them from your code, not from a generic training set.

Where Claude Code Struggles

Novel Business Logic

If you’re designing a complex attribution model, building a revenue recognition calculation with edge cases specific to your business, or creating a customer segmentation that depends on organizational knowledge — Claude Code can’t think through the problem for you. It implements what you describe, but the hard thinking stays human.

This limitation is structural, not a bug. Novel business logic depends on context that doesn’t exist in your codebase: what “active customer” means at your company, how your finance team handles refunds, which edge cases matter for regulatory compliance. No amount of reading your project files gives Claude Code access to this knowledge.

The practical implication: use Claude Code to implement business logic after you’ve designed it, not to design it. Write the requirements yourself (or in your prompt), then let the agent translate them to SQL. Review the output against your requirements, not against generic SQL correctness.

Ambiguous Requirements

When prompts are vague, Claude Code makes confident-sounding choices that may not match your intent. “Build me a customer model” produces something — but the grain, the included attributes, the aggregation level, and the business logic are all assumptions the agent made without guidance.

The fix is specificity. Compare:

Vague: “Create a customer metrics model.”

Specific: “Create a mart model at one-row-per-customer grain. Include total_orders, total_revenue_usd, first_order_date, last_order_date, and days_since_last_order. Source from int__orders__enriched. Materialize as table. Add unique and not_null tests on customer__id.”

The second prompt leaves little to assumption. Each column, the grain, the source, the materialization, and the tests are specified. Claude Code’s output is far more likely to match your intent because your intent is explicit.

This is one reason CLAUDE.md as Project Memory matters so much. Conventions encoded in CLAUDE.md (naming patterns, materialization defaults, test requirements) act as implicit specificity for every prompt. You don’t have to repeat “use double-underscore naming” because the agent reads it from the file.
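One way to think about the vague and specific prompts above is as a checklist with five slots: grain, columns, source, materialization, and tests. The sketch below encodes that checklist as a small dataclass and reports which slots a prompt leaves open. The ModelSpec class is a hypothetical aid for writing prompts, not part of Claude Code or dbt.

```python
from dataclasses import dataclass, field


@dataclass
class ModelSpec:
    """Hypothetical checklist of what a model prompt should pin down."""

    grain: str = ""
    columns: list = field(default_factory=list)
    source_model: str = ""
    materialization: str = ""
    tests: list = field(default_factory=list)

    def missing(self):
        """Return the names of fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# "Create a customer metrics model." pins down nothing.
vague = ModelSpec()

# The specific prompt fills every slot.
specific = ModelSpec(
    grain="one row per customer",
    columns=[
        "total_orders",
        "total_revenue_usd",
        "first_order_date",
        "last_order_date",
        "days_since_last_order",
    ],
    source_model="int__orders__enriched",
    materialization="table",
    tests=["unique and not_null on customer__id"],
)

print(vague.missing())
print(specific.missing())
```

Every slot left empty is a decision the agent makes on your behalf.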

Over-Engineering

Left unchecked, Claude Code tends to add more than you asked for. Extra error handling where none is needed. Unnecessary abstractions (“let me create a macro for that”). Comments explaining obvious code. Additional CTEs that break a simple transformation into overly granular steps.

This is a known pattern with LLMs generally: they tend to err on the side of appearing thorough. In a dbt context, over-engineering manifests as:

  • Adding SAFE_CAST everywhere, even on columns that can’t possibly fail a cast
  • Creating helper macros for one-off transformations
  • Adding column descriptions to YAML that restate the column name (“customer_id: The ID of the customer”)
  • Wrapping simple logic in unnecessary CTEs

The mitigation is review. When you see something the agent added that you didn’t ask for, ask yourself: does this add value or just add lines? If the latter, tell Claude to remove it. Over time, add instructions to your CLAUDE.md: “Don’t add SAFE_CAST unless the data type conversion could genuinely fail.” “Don’t create macros for logic used in only one model.”
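Some of this review can be made mechanical. As a sketch, the check below flags column descriptions that add no words beyond the column name itself, such as the customer_id example in the list above. The function names and the filler-word list are assumptions; a real project would tune both.

```python
import re

# Words that carry no information beyond the column name itself (assumed list).
FILLER_WORDS = {"the", "a", "an", "of", "for", "this", "is", "that"}


def tokens(text):
    """Lowercase alphanumeric tokens; underscores and punctuation act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_redundant_description(column_name, description):
    """True when the description adds no words beyond the column name and filler."""
    extra = tokens(description) - tokens(column_name) - FILLER_WORDS
    return not extra


restated = is_redundant_description("customer_id", "The ID of the customer")
informative = is_redundant_description(
    "customer_id", "Surrogate key assigned at first order; stable across merges"
)
print(restated, informative)
```

An empty description also counts as redundant here, since it adds nothing either.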

Large Monorepos

On very large codebases (hundreds of models across multiple projects in a monorepo), Claude Code can lose track of relevant files or miss patterns buried deep in the project. The context window, while large, isn’t unlimited. When the project exceeds what fits in that window, Claude Code has to choose what to read and what to skip. Those choices aren’t always right.

For large projects, the mitigation is structured context. A well-organized CLAUDE.md with pointers to key directories, naming conventions, and architectural decisions helps Claude Code navigate the project without reading everything. Directory-level CLAUDE.md files in monorepo subdirectories scope the context to the relevant section.

The Mental Model in Practice

The “capable junior” framing isn’t dismissive — it’s operational. A capable junior engineer:

  • Follows established patterns reliably
  • Produces clean, conventional code quickly
  • Needs specific instructions for non-obvious requirements
  • Sometimes does more than asked (out of enthusiasm, not malice)
  • Requires code review on every piece of work
  • Can’t make business decisions independently
  • Gets better over time as you encode more conventions

Working effectively with Claude Code means adopting the same management style you’d use with a strong but inexperienced team member. Give clear instructions. Provide good examples. Review everything. Encode the recurring feedback as conventions (in CLAUDE.md) so you don’t repeat yourself.

The context and process around the tool — project configuration, reusable commands, guardrails, and thorough review — are what differentiate effective use from frustrating use.