Data Architecture as Human Judgment

Why data architecture — DAG design, ownership models, temporal logic, team boundaries — resists AI automation and remains a fundamentally human discipline.

dbt, data engineering, data modeling

Data Engineering Weekly polled practitioners on what stays core as AI writes more code: 53% said architecture and trade-offs, 25% said product and problem discovery, and 20% said quality and ownership. AI generates SQL, boilerplate, column descriptions, and incremental model configurations: the mechanical execution layer. Architecture, ownership, and temporal modeling decisions remain beyond what AI can decide from the context available to it.

What Architecture Decisions Actually Are

Tristan Handy specifically called out “designing the overall architecture of the DAG, including modularization, team boundaries and ownership” as a fundamentally human task. This is worth unpacking because “architecture” is an overloaded term.

In a dbt project, architecture means:

DAG structure. How models connect. Which transformations are upstream of which. Whether to centralize shared logic in intermediate models or let marts duplicate it. The three-layer pattern (staging or base, intermediate, marts) is an architectural decision, and it determines whether a project scales or collapses under its own weight.

Ownership models. Which team owns which models. Where the boundaries are between domains. Whether the marketing team can modify models that feed the finance team’s reports. These aren’t technical questions — they’re organizational ones that require understanding how the company works.

How to split responsibilities across domains. Should customer lifetime value live in the marketing domain or the finance domain? Both teams need it. Both teams calculate it differently. The answer depends on your organization’s decision-making structure, not on any technical constraint.

Trade-off evaluation. Every architectural choice is a trade-off. Incremental models reduce cost but add complexity. Strict naming conventions improve discoverability but slow down initial development. Contracts on marts guarantee schema stability but require more maintenance. AI can list trade-offs. It can’t evaluate them in your specific context.
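The DAG-structure point can be made concrete with a small check. This is a minimal Python sketch, assuming a hypothetical project where each model's layer is inferred from a naming prefix (`stg_`, `int_`, `fct_`, `dim_`) and where the chosen rule is that a model may depend only on its own layer or layers upstream of it. The model names, prefixes, and rule are illustrative assumptions; dbt builds the real graph from `ref()` and `source()` calls, and deciding which rule to enforce is the architectural judgment itself.

```python
from graphlib import TopologicalSorter

# Hypothetical layering rule: a lower rank means further upstream.
# staging (also called base) models sit at the bottom of the DAG.
LAYER_RANK = {"staging": 0, "intermediate": 1, "marts": 2}

# Assumed naming convention mapping a model-name prefix to its layer.
PREFIX_TO_LAYER = {"stg": "staging", "int": "intermediate", "fct": "marts", "dim": "marts"}


def layer_of(model: str) -> str:
    """Infer a model's layer from its name prefix (raises KeyError if unknown)."""
    return PREFIX_TO_LAYER[model.split("_", 1)[0]]


def layering_violations(deps: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (model, parent) pairs where a model reads from a layer downstream of it."""
    violations = []
    for model, parents in deps.items():
        for parent in parents:
            if LAYER_RANK[layer_of(parent)] > LAYER_RANK[layer_of(model)]:
                violations.append((model, parent))
    return sorted(violations)


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """Parents before children; graphlib raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(deps).static_order())


# Each key maps a model to the models it depends on (its parents).
deps = {
    "stg_orders": set(),
    "stg_customers": set(),
    "int_customer_orders": {"stg_orders", "stg_customers"},
    "fct_revenue": {"int_customer_orders"},
    "dim_customers": {"stg_customers"},
    "int_churn_flags": {"fct_revenue"},  # an intermediate model reading from a mart
}

print(layering_violations(deps))  # [('int_churn_flags', 'fct_revenue')]
print(build_order(deps))
```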

AI can generate individual models. It cannot make these structural calls. The information needed — how the organization works, which teams collaborate, where the political boundaries are, what failed in the past — isn’t encoded anywhere the AI can read.
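Once people have made the ownership call, tooling can enforce it. The sketch below is hypothetical: each model carries an owning domain and a `public` flag, and any reference that reaches into another domain's unpublished model is flagged. dbt's groups and `access` levels (`private`, `protected`, `public`) express a related but not identical idea; the `Model` class, team names, and model names here are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    domain: str                 # owning team (hypothetical), e.g. "finance"
    public: bool = False        # True if the owner publishes it for other domains
    refs: tuple[str, ...] = ()  # names of models this model depends on


def boundary_violations(models: list[Model]) -> list[str]:
    """Flag references that reach into another domain's unpublished model."""
    by_name = {m.name: m for m in models}
    problems = []
    for m in models:
        for ref in m.refs:
            target = by_name[ref]
            if target.domain != m.domain and not target.public:
                problems.append(f"{m.name} ({m.domain}) -> {target.name} ({target.domain})")
    return problems


models = [
    Model("fct_revenue", "finance", public=True),
    Model("int_revenue_adjustments", "finance"),
    Model("fct_campaign_roi", "marketing", refs=("fct_revenue",)),
    Model("int_attribution", "marketing", refs=("int_revenue_adjustments",)),
]

print(boundary_violations(models))
# ['int_attribution (marketing) -> int_revenue_adjustments (finance)']
```

The check itself is mechanical. Which models deserve `public=True`, and whether the domain line falls between marketing and finance at all, is the organizational knowledge the paragraph above says AI cannot read.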

Temporal Logic as Architecture

The Thomson Reuters finding — that 73% of time-based analyses had inconsistent temporal filters — is an architecture story, not a query-writing story.

Temporal inconsistencies reflect design decisions about how time is modeled across the warehouse. Which tables use event time versus processing time? How do slowly changing dimensions track history? When late-arriving data gets incorporated, which downstream queries need to account for the lag? How does the incremental strategy interact with the temporal model?
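A small example shows why the event-time versus processing-time choice matters. This Python sketch uses invented rows, each with an event timestamp and a load timestamp, and counts one day's orders both ways. It then shows, under a stated assumption about how an incremental run works (a hypothetical `lookback_days` window of event-date partitions rebuilt on each run), which late-arriving rows would never reach their correct day.

```python
from collections import Counter
from datetime import date, datetime

# Invented rows: (order_id, event_time, loaded_at).
ROWS = [
    ("A", datetime(2024, 3, 1, 10, 0), datetime(2024, 3, 1, 10, 5)),
    ("B", datetime(2024, 3, 1, 23, 50), datetime(2024, 3, 2, 0, 10)),  # loaded after midnight
    ("C", datetime(2024, 3, 2, 9, 0), datetime(2024, 3, 2, 9, 2)),
    ("D", datetime(2024, 3, 1, 18, 0), datetime(2024, 3, 3, 8, 0)),    # arrives two days late
]


def daily_counts(rows, use_event_time: bool) -> Counter:
    """Count rows per calendar day, keyed by event time or by processing (load) time."""
    idx = 1 if use_event_time else 2
    return Counter(row[idx].date() for row in rows)


def missed_by_lookback(rows, lookback_days: int) -> list[str]:
    """Assumes each row is picked up by a run on its load date that rebuilds
    event-date partitions from (load date - lookback_days) through the load date.
    Returns the rows whose event date falls outside that window."""
    return [
        order_id
        for order_id, event_time, loaded_at in rows
        if (loaded_at.date() - event_time.date()).days > lookback_days
    ]


by_event = daily_counts(ROWS, use_event_time=True)
by_load = daily_counts(ROWS, use_event_time=False)
print(by_event[date(2024, 3, 1)], by_load[date(2024, 3, 1)])  # 3 1
print(missed_by_lookback(ROWS, lookback_days=1))  # ['D']
```

Both counts are "correct" for their definition of a day; the architectural decision is which definition each table commits to, and how far back the incremental strategy reaches to absorb late data.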

These choices propagate through every downstream query. An AI generating SQL against a well-designed temporal architecture produces better results than an AI working with a confused one, because the architecture constrains what’s possible. Getting time modeling right at the foundation means less room for AI-generated queries to go wrong.

This is why architecture matters more as AI writes more code. When a human writes every query, temporal inconsistencies are localized — one analyst makes a mistake, it affects their work. When AI generates hundreds of queries against the same warehouse, a bad architectural choice about how time works multiplies through every generated query.
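A toy example shows how one temporal convention, repeated across many queries, turns into a systematic error. The Python sketch below (invented timestamps and function names) compares an ad-hoc inclusive day filter, which treats both midnights as part of the day, with a shared half-open `[start, next midnight)` window. Summed over consecutive days, the inclusive version counts the midnight event twice.

```python
from datetime import datetime, timedelta


def in_day_inclusive(ts: datetime, day_start: datetime) -> bool:
    """Ad-hoc filter: includes both this midnight and the next one."""
    return day_start <= ts <= day_start + timedelta(days=1)


def in_day_half_open(ts: datetime, day_start: datetime) -> bool:
    """Shared convention: [day_start, next midnight), so adjacent days never overlap."""
    return day_start <= ts < day_start + timedelta(days=1)


events = [
    datetime(2024, 3, 1, 12, 0),
    datetime(2024, 3, 2, 0, 0),  # exactly midnight: the boundary case
    datetime(2024, 3, 2, 15, 30),
]
days = [datetime(2024, 3, 1), datetime(2024, 3, 2)]


def total_across_days(predicate) -> int:
    """Sum per-day counts over consecutive days using the given day filter."""
    return sum(predicate(ts, d) for d in days for ts in events)


print(total_across_days(in_day_inclusive), total_across_days(in_day_half_open), len(events))
# 4 3 3
```

When every query, human-written or generated, reuses one window definition (in dbt, for example, through a shared macro), the boundary case is decided once rather than once per query.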

The Engineering Differentiator

Kestra’s engineering blog captured the shift precisely: “AI is commoditizing the ‘data’ part. Anyone can write SQL with assistance. The ‘engineering’ part becomes the differentiator: reliability, incident response, cost.”

This is a useful framing because it separates two things that used to be bundled together. “Data engineering” was both “working with data” (understanding schemas, writing transformations, building pipelines) and “engineering systems” (ensuring reliability, managing costs, handling failures gracefully). AI handles the first part increasingly well. The second part requires judgment about:

  • Reliability. What happens when this model fails at 3 AM? Who gets paged? What’s the recovery procedure? Is the data fresh enough for the 7 AM dashboard refresh, or do we need to build a retry mechanism?

  • Incident response. The revenue report is showing numbers 30% below yesterday. Is it a data issue, a pipeline failure, a schema change, or an actual business event? The diagnosis requires knowing what “normal” looks like, which is accumulated knowledge, not something you can prompt for.

  • Cost. A query that scans 2TB of data every hour costs real money. The choice between an incremental strategy that replaces only the affected partitions and one that scans the full table on every run has meaningful cost implications. These decisions require understanding both the data patterns and the warehouse’s cost model.

  • Ownership. When a model breaks, who fixes it? When a stakeholder questions a number, who investigates? AI generates code. Humans take responsibility for it.
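The cost bullet is easy to quantify under stated assumptions. This sketch compares monthly scan volume and scan-based cost for an hourly full-table rebuild against an hourly run that reprocesses only the last three daily partitions, assuming the source is partitioned by day so the filter prunes to those partitions. The table layout, the three-partition window, and the `PRICE` per TB are hypothetical placeholders; real pricing depends on the warehouse and contract.

```python
HOURS_PER_MONTH = 24 * 30  # simplifying assumption: a 30-day month of hourly runs


def monthly_scan_tb(tb_per_run: float, runs_per_month: int = HOURS_PER_MONTH) -> float:
    """Total terabytes scanned per month for a job with a fixed scan size per run."""
    return tb_per_run * runs_per_month


def monthly_cost(tb_per_run: float, price_per_tb: float) -> float:
    """Scan-based cost only; ignores storage, reservations, and result caching."""
    return monthly_scan_tb(tb_per_run) * price_per_tb


full_rebuild_tb = 2.0                    # the 2TB-per-hour query from the text
daily_partition_tb = 2.0 / 365           # assume ~a year of data in equal daily partitions
incremental_tb = 3 * daily_partition_tb  # hypothetical: reprocess the last 3 days only
PRICE = 5.0                              # hypothetical USD per TB scanned

print(f"full rebuild: ${monthly_cost(full_rebuild_tb, PRICE):,.0f}/month")  # full rebuild: $7,200/month
print(f"incremental: ${monthly_cost(incremental_tb, PRICE):,.0f}/month")    # incremental: $59/month
```

The arithmetic is trivial; the judgment lies in knowing whether three days of lookback is enough for the data's late-arrival pattern, and whether the warehouse really bills by bytes scanned.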

Zach Wilson summarized the shift: “AI didn’t kill data engineering. It killed pretending data engineering was about typing code.” If your value was syntax mastery — knowing the right incantation of window functions or the correct Jinja syntax for an incremental model — that value is commoditized. What remains is the judgment that determines whether the code should exist in the first place, where it fits in the system, and who’s accountable when it produces wrong results.

Implications

For practitioners: architectural thinking — understanding why the three-layer pattern exists, choosing insert_overwrite versus merge for the right reasons — is the less-commoditized skill.
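The difference between the two strategies is easiest to see on toy data. This Python sketch models their semantics only: `merge` upserts by a unique key and leaves other rows alone, while `insert_overwrite` replaces every partition present in the new batch wholesale. The row layout and function bodies are hypothetical; how each strategy is actually executed, and which strategies are available, depends on the dbt adapter and warehouse.

```python
# Rows are dicts with a unique key "id" and a partition column "day" (assumed layout).
# Both functions return new lists sorted by "id" and leave their inputs unmodified.


def merge(target: list[dict], batch: list[dict], key: str = "id") -> list[dict]:
    """Upsert: batch rows replace target rows with the same key; all other rows are kept."""
    merged = {row[key]: row for row in target}
    merged.update({row[key]: row for row in batch})
    return sorted(merged.values(), key=lambda row: row[key])


def insert_overwrite(target: list[dict], batch: list[dict], partition: str = "day") -> list[dict]:
    """Replace every partition present in the batch; partitions not in the batch are untouched."""
    touched = {row[partition] for row in batch}
    kept = [row for row in target if row[partition] not in touched]
    return sorted(kept + batch, key=lambda row: row["id"])


target = [
    {"id": 1, "day": "2024-03-01", "amount": 10},
    {"id": 2, "day": "2024-03-01", "amount": 20},
    {"id": 3, "day": "2024-03-02", "amount": 30},
]
# A reprocessed batch for 2024-03-01: order 1 was corrected, order 2 no longer exists upstream.
batch = [{"id": 1, "day": "2024-03-01", "amount": 15}]

print([row["id"] for row in merge(target, batch)])             # [1, 2, 3]
print([row["id"] for row in insert_overwrite(target, batch)])  # [1, 3]
```

Neither result is universally correct. `merge` keeps row 2, which is right if the batch was a partial update and wrong if the order was deleted upstream; `insert_overwrite` assumes each batch is a complete picture of its partitions. Knowing which assumption holds for a given source is what choosing "for the right reasons" means.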

For organizations: the value has shifted from “people who can write transformations” to “people who decide which transformations should exist and who is responsible when they break.” Architectural judgment develops through years of building, breaking, and fixing systems. It does not appear automatically in practitioners hired only for SQL skill, and it does not transfer from AI tooling.