Codebase Refactoring with Claude Code

How Claude Code enables project-wide dbt refactoring — column renames, naming convention migrations, and ref() updates across dozens of files without the manual search-and-miss problem.

Planted
claude code, dbt, data engineering, ai, automation

Renaming columns, migrating naming conventions, and updating model references across a dbt project requires finding every reference across SQL, YAML, docs blocks, singular tests, and macros. Missing one reference produces a broken production run. Claude Code reads the entire project, finds every reference, updates all affected files in one pass, and shows the complete diff for review.

Column Renames Across Downstream Models

The most common refactoring need: a column in an upstream model gets renamed, and everything downstream needs to follow.

Rename customer_id to customer__id across all models in models/marts/
that reference dim_customers. Update the YAML files too.

Claude’s process:

  1. Reads dim_customers to confirm the column name and its current references
  2. Searches all files in models/marts/ for every occurrence of customer_id
  3. Identifies which occurrences reference dim_customers rather than unrelated tables that happen to share the column name
  4. Makes targeted edits across all affected SQL files
  5. Updates corresponding YAML test and description files
  6. Shows the complete diff for review

What you get is the same set of changes you’d otherwise make manually, with one key advantage: Claude is far less likely to miss a file. Human search-and-replace in a dbt project is error-prone because column names appear in SQL, YAML, docs blocks, singular tests, and sometimes macros. Claude reads all of them.
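To make the search problem concrete, here is a minimal sketch that scans SQL, YAML, and Markdown files for whole-word occurrences of a column name. The function name `find_column_references` and the list of file suffixes are assumptions for illustration. It finds textual matches only and cannot tell which table a column belongs to, which is the judgment described in step 3 above.

```python
import re
from pathlib import Path

# File types where dbt column names commonly appear (an assumed list).
SEARCH_SUFFIXES = {".sql", ".yml", ".yaml", ".md"}


def find_column_references(root, column):
    """Return (path, line_number, line) for each line with a whole-word match."""
    # \b treats underscores as word characters, so customer_id does not
    # match inside customer_id_hash or old_customer_id.
    pattern = re.compile(rf"\b{re.escape(column)}\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SEARCH_SUFFIXES:
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, 1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_column_references("models/marts", "customer_id")` would list every candidate line. Deciding which of them actually refer to dim_customers still takes a read of each file.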

Naming Convention Migrations

Larger projects accumulate naming debt. You adopt double-underscore separators halfway through, and now half the project uses order_id and half uses order__id. Or you move from the stg_ prefix to base__ and need to update every ref() call.

We're moving from ref('stg_orders') to ref('base__shopify__orders') across
the codebase. Find all references to stg_* models and update them to use
the new base__* naming convention.

This is the kind of work where Claude Code’s codebase-wide awareness matters. Finding the ref('stg_orders') string is the easy part. The harder part is understanding that stg_orders was a staging model and base__shopify__orders is its replacement, then updating every downstream ref() call, every YAML entry that names the model, and every test or docs block that references it.

For a project with 50+ models, this migration might touch 30 files. Manually: a day of careful search-and-replace with high anxiety about what you missed. With Claude Code: a single prompt, a diff review, and a dbt parse to confirm no broken references.
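The mechanical part of this migration can be sketched as a regex rewrite over ref() calls. This is an illustration under assumptions, not a description of what Claude Code does internally: `RENAMES` and `rewrite_refs` are hypothetical names, and the mapping has to be written out by hand because stg_orders to base__shopify__orders adds a source-system segment that no prefix swap can infer.

```python
import re

# Hypothetical mapping from old staging names to new base names.
RENAMES = {"stg_orders": "base__shopify__orders"}

# Matches ref('name') or ref("name"), allowing whitespace inside the
# parentheses. Two-argument calls such as ref('package', 'model') are
# deliberately not matched.
REF_PATTERN = re.compile(r"""\bref\(\s*(['"])([A-Za-z0-9_]+)\1\s*\)""")


def rewrite_refs(sql, renames):
    """Return sql with single-argument ref() targets renamed per `renames`."""
    def replace(match):
        quote, name = match.group(1), match.group(2)
        if name not in renames:
            return match.group(0)  # leave unrelated refs untouched
        return f"ref({quote}{renames[name]}{quote})"

    return REF_PATTERN.sub(replace, sql)
```

A rewrite like this only covers ref() calls in SQL. YAML entries, docs blocks, and the model files themselves still need their own updates, which is why a parse check afterwards matters.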

When to Run dbt parse and dbt compile After Refactoring

For any refactoring that touches ref() calls or model names, include verification in the prompt:

Rename all stg_* models to base__* following our naming convention.
After making changes, run dbt parse to verify no broken references.

dbt parse catches broken ref() calls without materializing anything. It’s fast enough to run as a verification step within the same Claude session. Claude sees the output, fixes any missed references, and runs it again until clean.

dbt compile is more thorough: it renders the Jinja and writes out the SQL that would actually run. This is especially useful for column-level renames:

After renaming the columns, run dbt compile --select marts+ to verify
the compiled SQL looks correct before I run a full build.

These verification steps turn refactoring from “I think this is right” to “I can see it’s right.”
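The same two checks can also be scripted outside a Claude session. The sketch below is one way to do it, assuming the dbt CLI is on the PATH and relying on dbt exiting with a non-zero status when a command fails. The helper names `run_check`, `verify_refactor`, and `dbt_checks` are hypothetical.

```python
import subprocess


def run_check(command):
    """Run a command and return (succeeded, combined stdout and stderr)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def verify_refactor(checks, runner=run_check):
    """Run each check in order, stopping at the first failure.

    Returns (passed, failed_command, output_of_failed_command).
    """
    for command in checks:
        ok, output = runner(command)
        if not ok:
            return False, command, output
    return True, None, ""


def dbt_checks(selector):
    """Parse first because it is cheap, then compile the selected models."""
    return [
        ["dbt", "parse"],
        ["dbt", "compile", "--select", selector],
    ]
```

Running `verify_refactor(dbt_checks(selector))` from the project root, with whatever selector covers the refactored models, returns the failing command and its output so the broken reference can be located quickly.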

A Systematic Approach to YAML Updates

SQL files are the obvious target, but YAML files are where refactoring breaks silently. If a column is renamed in the SQL but not in the YAML, the model still builds under dbt run. The schema test then fails the next time tests run, or worse, the description stays attached to a column name that no longer exists.

Include YAML explicitly:

Rename order_amount to order__amount_usd in base__shopify__orders.
Update:
1. The SQL column definition and all its references downstream
2. The schema.yml column name and description
3. Any singular tests in the tests/ directory that reference this column
4. Any docs blocks in models/docs.md that document this column

The thoroughness of the instruction determines the thoroughness of the output. Vague refactoring prompts produce partial refactors that look complete but break subtly.
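A rough way to spot YAML entries left behind by a rename is to check that each column declared in schema.yml still appears somewhere in the model's SQL. The sketch below assumes the YAML has already been loaded into a dict (for example with a YAML parser such as PyYAML) and that the SQL text for each model is supplied by the caller. `stale_yaml_columns` is a hypothetical name, and the check is a heuristic: models that use `select *` will produce false positives.

```python
import re


def stale_yaml_columns(schema, sql_by_model):
    """Return (model, column) pairs declared in YAML but absent from the SQL."""
    stale = []
    for model in schema.get("models") or []:
        sql = sql_by_model.get(model["name"])
        if sql is None:
            continue  # no SQL text supplied for this model
        for column in model.get("columns") or []:
            name = column["name"]
            # Whole-word, case-insensitive match against the model's SQL.
            if not re.search(rf"\b{re.escape(name)}\b", sql, flags=re.IGNORECASE):
                stale.append((model["name"], name))
    return stale
```

With the rename from the prompt above, a schema.yml that still declares order_amount would be reported once the SQL only contains order__amount_usd.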

Multi-Layer Migrations

Some migrations affect the project’s architecture, not just names. Moving models between layers, changing materialization strategies across a set of models, and reorganizing folder structure all require coordinated changes, which Claude Code can make in one pass because it keeps the full project in view.

We're splitting models/marts/finance/ into two folders:
models/marts/finance/revenue/ and models/marts/finance/costs/.
Move models:
- mrt__finance__revenue, mrt__finance__arr, mrt__finance__mrr → revenue/
- mrt__finance__cogs, mrt__finance__opex → costs/
Update the dbt_project.yml config block and any documentation that
references the old paths. ref() calls resolve by model name, not path,
so confirm that none of them need to change.

Done manually, this requires careful coordination across multiple files. Claude Code reads the project structure, makes the moves, updates the dbt_project.yml to apply configurations to the new paths, and updates any model descriptions that mention folder paths.
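The file moves themselves are the simplest part and can be sketched in a few lines. The model-to-folder mapping below comes from the prompt above. The function names `plan_moves` and `apply_moves`, and the assumption of one `.sql` file per model sitting directly in the finance folder, are illustrative. Because ref() resolves models by name, moving a file does not by itself change any ref() call, but path-based configuration in dbt_project.yml does need to follow.

```python
from pathlib import Path

# Target subfolder for each model, as listed in the prompt.
MOVES = {
    "mrt__finance__revenue": "revenue",
    "mrt__finance__arr": "revenue",
    "mrt__finance__mrr": "revenue",
    "mrt__finance__cogs": "costs",
    "mrt__finance__opex": "costs",
}


def plan_moves(finance_dir, moves):
    """Return (source, destination) pairs for model files found in finance_dir."""
    finance_dir = Path(finance_dir)
    plan = []
    for model, subfolder in sorted(moves.items()):
        source = finance_dir / f"{model}.sql"
        if source.is_file():
            plan.append((source, finance_dir / subfolder / source.name))
    return plan


def apply_moves(plan):
    """Create target folders as needed and move each file."""
    for source, destination in plan:
        destination.parent.mkdir(parents=True, exist_ok=True)
        source.rename(destination)
```

Separating the plan from its execution means the list of moves can be reviewed before anything on disk changes, in the same spirit as reviewing a diff.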

Reviewing the Diff

For refactoring work, the diff review is the most important step. Claude shows every changed file — read each one.

Watch for:

  • Missed references. Search the diff for the old column name or model name to confirm nothing was skipped.
  • Unintended changes. Claude occasionally makes related improvements that you didn’t ask for. Decide whether to keep them.
  • YAML completeness. Confirm column names in schema.yml match the updated SQL column names.
  • Test validity. If a test referenced the old column name, confirm it was updated correctly.
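The first check in the list above can be partly automated. As a minimal sketch, the function below takes unified diff text (for example the output of `git diff`) and reports added or unchanged lines that still contain the old name. `leftover_in_diff` is a hypothetical helper, and it only sees files that appear in the diff, so it complements a project-wide search and does not replace one.

```python
import re


def leftover_in_diff(diff_text, old_name):
    """Return (file, line) pairs for post-change lines still using old_name."""
    pattern = re.compile(rf"\b{re.escape(old_name)}\b")
    current_file = None
    leftovers = []
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # Header naming the new version of the file, e.g. "+++ b/path".
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
        elif line.startswith("--- "):
            continue  # header for the old version of the file
        elif line.startswith(("+", " ")) and pattern.search(line[1:]):
            # "+" lines were added; " " lines are unchanged context.
            leftovers.append((current_file, line[1:].strip()))
    return leftovers
```

Removed lines are skipped on purpose, since the old name is expected there. An unchanged context line that still uses the old name is not always a bug (it may belong to an unrelated table), but each one deserves a look.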

The diff review for a 30-file refactor takes 10 to 15 minutes. That’s still dramatically faster than making the changes manually, and the review is the point: you want to understand every change before it ships.

Effect on Maintenance Work

Lower refactoring cost means deferred work gets done. When a column rename takes 20 minutes instead of half a day, documentation stays current, naming remains consistent, and tests get updated rather than deleted when schemas change.