dbt Labs’ 2025 State of Analytics Engineering report found that 70% of analytics professionals use AI for code development. Most use general-purpose tools (ChatGPT, Copilot) that cannot see the codebase, do not know project conventions, and require constant copy-pasting. The structural properties of data work — repetition, context-switching, cross-layer debugging — make it a particularly strong fit for the agentic approach, where a tool reads the project, understands its patterns, and acts on them.
The Repetition Argument
Data work is pattern-heavy. Base models follow predictable structures: select from source, deduplicate, rename, cast, filter. Only the details change — different source tables, different column names, different deduplication keys. The same is true across the dbt Three-Layer Architecture: intermediate models join and enrich in consistent patterns, marts aggregate to specific grains.
This kind of work is uniquely suited to agents because the pattern is well-established and the variation is in the specifics. You don’t need to explain how to write a base model every time. You need to say “create a base model for raw_shopify.orders following my existing base model patterns.” An agent that can read your existing models, detect the pattern, and apply it to new source tables does in seconds what takes a human ten minutes of copy-paste-adapt.
Here’s a simplified example. Say you have an existing base model:
```sql
{{ config(materialized='table', tags=['base', 'stripe']) }}

WITH source AS (
    SELECT * FROM {{ source('stripe', 'payments') }}
),

deduplicated AS (
    SELECT *
    FROM source
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY id
        ORDER BY _loaded_at DESC
    ) = 1
),

renamed AS (
    SELECT
        id AS payment__id,
        customer_id AS customer__id,
        amount AS payment__amount_cents,
        CAST(created_at AS TIMESTAMP) AS payment__created_at,
        status AS payment__status
    FROM deduplicated
)

SELECT * FROM renamed
```

An agent reads this, picks up the naming convention (entity__field), the CTE structure (source → deduplicated → renamed), the materialization strategy, and the deduplication pattern. When you ask for a new base model for raw_shopify.orders, it doesn’t generate something generic. It generates something that looks like it belongs in your project.
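To make this concrete, here is a sketch of what a convention-following agent might produce for raw_shopify.orders. The Shopify column names below (total_price, financial_status) are assumptions for illustration; a real agent would read them from the source definition in your project.

```sql
-- Hypothetical output for raw_shopify.orders, mirroring the Stripe model:
-- same CTE chain, same entity__field renaming, same deduplication key.
{{ config(materialized='table', tags=['base', 'shopify']) }}

WITH source AS (
    SELECT * FROM {{ source('shopify', 'orders') }}
),

deduplicated AS (
    SELECT *
    FROM source
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY id
        ORDER BY _loaded_at DESC
    ) = 1
),

renamed AS (
    SELECT
        id AS order__id,
        customer_id AS customer__id,
        total_price AS order__total_price,
        CAST(created_at AS TIMESTAMP) AS order__created_at,
        financial_status AS order__status
    FROM deduplicated
)

SELECT * FROM renamed
```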
A chatbot can do this too, but you’d have to paste the example, describe the convention, specify the structure. With an agent, the instruction budget is zero because the agent reads the project itself.
The Context-Switching Argument
A single dbt model involves SQL, Jinja templating, YAML configuration, and occasionally Python. A typical task — say, building an intermediate model — might require:
- Writing SQL with CTEs and window functions
- Using {{ ref() }} and {{ source() }} Jinja macros
- Configuring materialization and tags in a {{ config() }} block
- Defining tests and column descriptions in a YAML schema file
- Possibly calling a custom macro that uses Jinja control flow
Each of these is a different “language” with different syntax rules. In a traditional workflow, you switch between them constantly — writing SQL, tabbing to the YAML file, checking the macro reference, going back to the SQL. The mental cost isn’t in any single language; it’s in the constant switching.
An agent that understands all of these together — not in isolated snippets but as parts of one project — removes the switching cost. You describe what you want in plain language. The agent generates the SQL, the YAML, and the config block in one pass, all consistent with each other and with your project’s existing patterns. The multi-language nature of dbt work that makes it tedious for humans is exactly what makes it productive for agents.
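As a sketch of what that one-pass output can look like: assume a hypothetical intermediate model named int__orders__enriched, built on equally hypothetical base models. The agent emits the SQL and its matching YAML schema file together, with model names, column names, and tests consistent across both files.

```sql
-- models/intermediate/int__orders__enriched.sql (names assumed for illustration)
{{ config(materialized='view', tags=['intermediate', 'orders']) }}

SELECT
    orders.order__id,
    orders.order__created_at,
    orders.order__total_price,
    customers.customer__segment
FROM {{ ref('base_shopify__orders') }} AS orders
LEFT JOIN {{ ref('base_shopify__customers') }} AS customers
    ON orders.customer__id = customers.customer__id
```

```yaml
# models/intermediate/int__orders__enriched.yml, generated in the same pass
version: 2
models:
  - name: int__orders__enriched
    description: Orders enriched with customer attributes.
    columns:
      - name: order__id
        description: Unique order identifier.
        tests:
          - not_null
          - unique
```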
The Cross-Layer Debugging Argument
When a mart model fails, the problem might be anywhere in the lineage chain. A wrong number in mrt__sales__daily_revenue could trace back to:
- A schema change in the source table that the base model didn’t handle
- A type cast in the base model that silently turns unparseable values into NULLs (see the sketch after this list)
- A join in the intermediate model that changes the grain
- A filter in the mart that excludes records it shouldn’t
- A macro that was recently updated with a subtle logic change
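The cast case deserves a concrete look because it fails without raising any error. A minimal illustration, assuming a warehouse with TRY_CAST semantics such as Snowflake:

```sql
-- TRY_CAST returns NULL instead of erroring on unparseable input,
-- so malformed source values vanish silently from downstream sums.
SELECT
    TRY_CAST('19.99' AS NUMERIC(10, 2)) AS parses_fine,    -- 19.99
    TRY_CAST('N/A'   AS NUMERIC(10, 2)) AS silently_null   -- NULL, no error
```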
In the old workflow, you’d trace this manually: open the mart, read the SQL, follow the ref() calls upstream, query each intermediate table, compare row counts, check the source. This is slow because each step requires you to hold the full context in your head while navigating between files.
An agent does this naturally. You say “the revenue numbers look wrong in the sales mart — trace through the upstream models and find where the calculation diverges.” The agent reads the mart SQL, identifies the upstream dependencies, queries each layer, and locates the divergence point. It can do this in one pass because it has simultaneous access to every file in your project. The lineage-tracing that’s tedious for humans is trivially cheap for agents. See Debugging dbt with Claude Code for the practical mechanics of how this works.
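Mechanically, each hop in that trace is a small comparison query. A sketch, with model and column names assumed for illustration: compute the same daily aggregate from two adjacent layers and keep only the dates where they disagree.

```sql
-- Hypothetical divergence check between the mart and one upstream layer.
WITH mart AS (
    SELECT revenue_date, SUM(revenue) AS total
    FROM {{ ref('mrt__sales__daily_revenue') }}
    GROUP BY revenue_date
),

upstream AS (
    SELECT
        CAST(order__created_at AS DATE) AS revenue_date,
        SUM(order__total_price) AS total
    FROM {{ ref('int__orders__enriched') }}  -- assumed upstream model
    GROUP BY 1
)

SELECT
    COALESCE(mart.revenue_date, upstream.revenue_date) AS revenue_date,
    mart.total AS mart_total,
    upstream.total AS upstream_total
FROM mart
FULL OUTER JOIN upstream
    ON mart.revenue_date = upstream.revenue_date
-- NULL-safe comparison: keep only days where the layers disagree.
WHERE mart.total IS DISTINCT FROM upstream.total
```

Repeating this check layer by layer, the first hop that returns rows is where the calculation diverges.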
What Makes This Different From Application Code
Application code also has patterns, context-switching, and debugging. But data engineering has properties that amplify the agent advantage:
Higher pattern density. A typical web application has dozens of different architectural patterns (routing, authentication, state management, API endpoints, database access). A typical dbt project has three primary patterns — base, intermediate, mart — applied repeatedly across hundreds of models. The ratio of “patterns the agent needs to learn” to “instances where it applies them” is much more favorable.
Lower ambiguity in success criteria. A base model either follows the naming convention or it doesn’t. Tests either pass or they don’t. The compiled SQL either runs or it doesn’t. Compared to application code, where “correct” often depends on UX judgment calls, data transformation success is more mechanically verifiable — at least for the structural aspects. (Business logic correctness is a different story; see The Context Gap in AI Data Engineering.)
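That mechanical verifiability also gives the agent a feedback loop it can run on its own. With the hypothetical model above, one command compiles the SQL, runs it, and executes the declared tests:

```sh
# Exit code 0 means the model built and all of its tests passed.
dbt build --select int__orders__enriched
```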
Stronger convention dependence. The value of a dbt model depends heavily on whether it follows the project’s conventions. A model that works correctly but uses different naming, different materialization, or different testing patterns is a maintenance burden. Agents that read your existing project and replicate its conventions solve the exact problem that makes ad-hoc AI assistance insufficient.
Boundary of the Fit
Data work is well-suited for agentic AI in its mechanical dimensions — repetitive patterns, multi-file coordination, template-following. The context gap remains: business logic, data quality judgment, and architectural decisions do not follow patterns an agent can read from a codebase. The agent handles translation from intent to implementation; knowing what to build remains a human responsibility. The tier of tool matters less than understanding which tasks are pattern-following and which require judgment.