Base Model Generation with Claude Code

Claude Code generates dbt base models via pattern replication: it reads existing base models in the project, extracts their conventions, and applies them to a new source. The workflow covers prompting, CLAUDE.md defaults, and common review issues.

The Core Pattern

The prompt that works:

Create a dbt base model for source_shopify.orders following the patterns
in models/base/

Claude Code will:

Read 2-3 existing base models to extract your conventions
Check the source YAML for the available columns
Generate the SQL applying your CTE structure, naming pattern, and materialization config
Generate the YAML with primary key tests

The output should match existing base model conventions, not generic boilerplate — the 50th base model in a project should be as consistent with project conventions as the first.

Why Vague Prompts Fail

“Create a base model for orders” produces generic output. Without more context, Claude does not know:

Which source system (Shopify, Stripe, a custom API?)
Which GCP project if you have multiple environments
Whether you deduplicate with ROW_NUMBER() and QUALIFY or a different pattern
What your column naming convention is (order_id, order__id, or ord_id?)
Which metadata columns you always include (_loaded_at, _source_table, etc.)

Vague prompts require extensive corrections.

Effective prompts specify constraints:

Create a base model for source_stripe.charges from the production-gcp-project
BigQuery project. Follow patterns in models/base/base_shopify__orders.sql.
Include _loaded_at and _source_table metadata columns. Cast amount to FLOAT64
and divide by 100 (Stripe stores cents). Deduplicate on charge_id using
ROW_NUMBER() QUALIFY.

That prompt leaves nothing to assumption. Claude generates a model that fits your project on the first attempt.

Encoding Defaults in CLAUDE.md

The better long-term solution is encoding your project’s base model conventions in CLAUDE.md as Project Memory. This means you don’t have to repeat constraints in every prompt — Claude reads them once and applies them throughout the session.

A CLAUDE.md section for base models:

## Base Models

- All base models include _loaded_at and _source_table columns
- Timestamps cast to TIMESTAMP, localized to Europe/Paris timezone
- Source freshness: warn_after 12 hours, error_after 24 hours
- Deduplicate using ROW_NUMBER() QUALIFY on the primary key, ordered by _loaded_at DESC
- Naming: base__[source]__[entity] (e.g., base__shopify__orders)
- Materialize as table, tagged with ['base', 'source_name']

## Column naming
- Double-underscore separator: order__id, customer__id, order__created_at
- Singular entity names: customer not customers

With this in place, the prompt can be minimal:

Create a base model for source_stripe.charges.
Primary key: charge_id. Amount stored as cents (divide by 100).

Claude reads CLAUDE.md, applies the conventions, and generates a complete model with the right metadata columns, deduplication pattern, materialization config, and naming.

A Typical Generated Base Model

Here’s what the output should look like for a Stripe charges source, with standard conventions applied:

{{ config(
    materialized='table',
    tags=['base', 'stripe']
) }}

WITH source AS (
    SELECT
        id,
        customer_id,
        amount,
        currency,
        status,
        created,
        _loaded_at
    FROM {{ source('stripe', 'charges') }}
),

deduplicated AS (
    SELECT *
    FROM source
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY id
        ORDER BY _loaded_at DESC
    ) = 1
),

renamed AS (
    SELECT
        id                                          AS charge__id,
        customer_id                                 AS customer__id,
        CAST(amount AS FLOAT64) / 100               AS charge__amount_usd,
        LOWER(currency)                             AS charge__currency,
        status                                      AS charge__status,
        TIMESTAMP_SECONDS(created)                  AS charge__created_at,
        _loaded_at,
        '{{ source("stripe", "charges") }}'         AS _source_table
    FROM deduplicated
)

SELECT * FROM renamed

Notice the structure: source → deduplicated → renamed → final SELECT. This follows the standard base CTE pattern. No business logic, no joins, just clean typed data with consistent naming.

The YAML Claude generates alongside this:

models:
  - name: base__stripe__charges
    description: "This model contains one row per Stripe charge, deduplicated and renamed to project conventions."
    columns:
      - name: charge__id
        description: "Primary key. Stripe charge identifier."
        tests:
          - unique
          - not_null
      - name: customer__id
        description: "Stripe customer identifier. Foreign key to base__stripe__customers."
        tests:
          - not_null
      - name: charge__amount_usd
        description: "Charge amount in USD. Stripe stores as integer cents; converted to decimal."
        tests:
          - not_null

Handling New Source Onboarding

Where this pattern pays off most is onboarding an entirely new source system. A Stripe connector might expose 15-20 tables: charges, customers, invoices, subscriptions, payment_methods, products, prices, events, and so on.

With Claude Code:

We're onboarding the Stripe source. Create base models for all tables in
source_stripe: charges, customers, invoices, subscriptions, payment_methods,
products, prices. Follow patterns in models/base/. One model per table,
one SQL file and one YAML file per model.

Claude reads one or two existing base models, then generates all 7 in sequence. Review each diff. The process takes 30-45 minutes, and the output is more consistent than models written by different people at different times.

Reviewing Claude’s Output

Review every generated model before committing. The most common issues:

Missing type casts. Claude sometimes leaves columns as-is when they need explicit casting. Check every timestamp, boolean, and monetary column.

Wrong deduplication key. Compound primary keys (partitioned on multiple columns) require explicit instruction. If your source has a composite key, specify it in the prompt.

Invented columns. If Claude doesn’t have access to the actual source schema, it may generate column names based on what “seems right” for that type of table. Verify every column exists in the actual source.

Inconsistent metadata columns. If your CLAUDE.md specifies _loaded_at, check that it appears correctly. Claude sometimes drops metadata columns when it doesn’t see them in the source schema definition.

The review takes 2-3 minutes per model. That’s still a substantial improvement over writing the model from scratch — but skipping review is how subtle inconsistencies compound across a project.

When to Use This Pattern vs. Manual Writing

Claude Code base model generation makes sense when:

Onboarding a new source with multiple tables
Adding individual tables from an existing connector
Standardizing legacy base models to a new naming convention (generate + compare)

Manual writing makes more sense when:

The source has unusual patterns requiring deep thought (complex nested JSON, GA4 event unnesting)
You’re learning the source system and want to understand it thoroughly
The model has business logic that shouldn’t be in base but you’re sorting out the right layer

The dbt Base Layer Patterns note covers what belongs in base models. If you find yourself deciding whether something is “base layer logic,” that’s a signal to write manually — the thinking is the point.