Agentic tools shift the data engineering workflow from manually translating intent into code to describing intent and reviewing the result. The mechanical translation work — schema lookup, template adaptation, Jinja compilation, test scaffolding — is handled by the agent. The practitioner’s attention shifts to the parts that require judgment.
The Translation Layer
For data engineers, an enormous share of the work is translation. Business requirements to data models. Raw schemas to clean base layers. Stakeholder questions to SQL queries. Tribal knowledge to documentation. Each translation has the same structure: you understand the input (what’s needed), you know the output format (SQL, YAML, Markdown), and the work is in bridging the two.
Before agentic tools, that bridge was manual. You held both sides in your head and typed the translation character by character. Remembering syntax, looking up function signatures, adapting templates, fixing typos — this is what filled the time between “I know what to build” and “I’ve built it.”
Agentic tools compress that bridge. You describe what you need in natural language, the agent generates the code, and you review whether the output matches your intent. The translation still happens — but the agent handles the mechanics while you handle the intent.
This is most obvious for repetitive translations. A base model for a new source table is a translation from “raw schema” to “clean, consistently named, properly typed, tested model.” The translation rules are the same every time: your naming convention, your deduplication pattern, your test expectations. An agent that’s read your project (or been given a CLAUDE.md) knows these rules and applies them without being told.
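As a sketch, those translation rules applied to a single source table might produce something like this (source, column names, and conventions are hypothetical, not from the original article):

```sql
-- models/staging/stg_shop__orders.sql  (hypothetical source and names)
-- The same rules every time: rename to snake_case, cast types
-- explicitly, deduplicate on the primary key.
with source as (
    select * from {{ source('shop', 'ORDERS') }}
),

renamed as (
    select
        cast(ID as integer)              as order_id,
        cast(CUSTOMERID as integer)      as customer_id,
        cast(ORDERDATE as date)          as ordered_at,
        cast(TOTALAMT as numeric(18, 2)) as order_total
    from source
),

deduplicated as (
    select
        *,
        row_number() over (
            partition by order_id
            order by ordered_at desc
        ) as row_num
    from renamed
)

select
    order_id,
    customer_id,
    ordered_at,
    order_total
from deduplicated
where row_num = 1
```

None of this is hard; all of it is mechanical. That is exactly what makes it delegable once the conventions are written down.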
What Changes in Practice
The shift isn’t abstract. It changes daily rhythms:
Model creation. Instead of spending 20 minutes building a model from scratch (schema lookup, template copy, column mapping, config, tests, compile, debug), you spend 2 minutes writing a prompt and 5 minutes reviewing the output. The net time savings is real, but the more important change is cognitive: you spent those 20 minutes on mechanical work before. Now you spend 5 minutes on judgment work (is this model correct? does it capture the right business logic?).
Multi-file changes. Renaming a column that flows through four downstream models used to mean opening each file, finding the reference, updating it, hoping you didn’t miss one, running the full build to check. Now: “Rename customer_id to customer__id across all models in the orders lineage chain.” The agent reads the DAG, finds every reference, updates them all, and runs the build. See Advanced Claude Code Workflows for dbt for the systematic approach.
Documentation. Writing column descriptions and model documentation is translation work — you know what the model does, and you need to express that in YAML. This is exactly the kind of work an agent handles well: it reads the SQL, understands the transformation, and generates descriptions that a human then reviews and refines.
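The output of that translation is ordinary schema YAML, something like this sketch (model name, descriptions, and tests are hypothetical; `data_tests` assumes dbt 1.8+, older projects use `tests`):

```yaml
# models/staging/_shop__models.yml  (hypothetical)
models:
  - name: stg_shop__orders
    description: >
      One row per order, deduplicated on order_id.
      Amounts are in the shop's local currency.
    columns:
      - name: order_id
        description: Primary key. Unique identifier for the order.
        data_tests:
          - unique
          - not_null
      - name: order_total
        description: Order total after discounts, before tax.
```

The agent drafts the descriptions from the SQL; the human's job is catching the ones that are plausible but wrong.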
Debugging. When a model produces wrong results, the old workflow is: hypothesize, query an intermediate table, analyze, hypothesize again, query another table, repeat. The new workflow: describe the symptom (“revenue is wrong for customer X”), let the agent trace through the lineage, and review the diagnosis. The investigation still happens — you just aren’t the one driving each step.
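Each step of that old loop amounts to a query run by hand against an intermediate model, something like this sketch (model and column names hypothetical):

```sql
-- Hypothetical intermediate check: does revenue for customer X
-- already diverge at the order-line grain, or only after
-- aggregation further downstream?
select
    order_id,
    sum(line_amount) as computed_revenue
from int_order_lines  -- hypothetical intermediate model
where customer_id = 'X'
group by order_id
```

The agent runs the equivalent of this at each node in the lineage and reports where the numbers stop matching.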
The “How” vs. “What” Shift
Before agentic tools, a significant portion of a data engineer’s time went to implementation questions: how to write this query, handle this Jinja syntax, structure this YAML. With agentic tools, more time goes to modeling decisions: what should this model contain, what’s the right grain, what business logic applies, what tests are needed. The easy work (syntax, templates, boilerplate) is automated; the modeling work stays human.
The risk — explored in depth in AI Developer Skill Atrophy — is that delegating the “how” entirely erodes the understanding that doing the mechanical work used to build. The workflow shift is sustainable only when reviewing AI-generated code builds the same comprehension that writing it used to.
The Review Skill
The shift creates a new core skill: reviewing AI-generated code. This is different from traditional code review.
In traditional code review, you’re looking for human mistakes — typos, logical errors, style inconsistencies, missing edge cases. In AI-generated code review, syntax is almost always correct. What you’re looking for is contextual mistakes:
- Did the agent use the right join type? (A LEFT JOIN can be syntactically valid but semantically wrong: it keeps unmatched rows with NULLs where the business logic calls for an INNER JOIN that drops them.)
- Did it handle NULLs correctly for your business context?
- Did it apply the right temporal filters? (The context gap shows this is where AI fails most silently.)
- Did it follow conventions consistently, or did it introduce a slightly different pattern?
- Did it add anything unnecessary? (Agents tend to over-engineer — extra error handling, unnecessary abstractions, comments explaining obvious code.)
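The first kind of contextual mistake is easy to illustrate (table names hypothetical). Both queries compile and run; they answer different business questions:

```sql
-- LEFT JOIN: keeps orders with no matching customer row,
-- so customer columns come back NULL and flow downstream.
select o.order_id, c.customer_region
from orders o
left join customers c on o.customer_id = c.customer_id;

-- INNER JOIN: silently drops those same orders instead.
-- Only business context says which behavior is correct.
select o.order_id, c.customer_region
from orders o
inner join customers c on o.customer_id = c.customer_id;
```

No linter flags either version. Catching the wrong one is a judgment call, which is the point of the review skill.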
A useful mental model: treat the agent like a capable junior — fast, productive, good at following patterns — with the practitioner reading every line before it ships.
What Doesn’t Change
The shift is real but bounded. Agentic tools don’t change the fundamental nature of data work. You still need to:
- Understand the business well enough to know what models are needed
- Design the data model (layer boundaries, entity definitions, grain decisions)
- Define what “correct” means for each transformation
- Make quality judgments that depend on organizational knowledge
- Communicate with stakeholders about what the data can and can’t answer
These are the parts that make data engineering a skilled profession rather than a typing exercise. The workflow shift makes the typing part faster. The thinking part remains yours.