Ben Lorica at Gradient Flow named the central problem: “A recurring failure mode in agentic workflows is where an agent has access to data but lacks the business logic to interpret it.” This is the “context gap.” Application code has tests that verify correctness — if a function should return 42 and returns 41, the test fails. Data transformations, by contrast, depend on business meaning: “Revenue” is calculated differently by Finance and Sales; NULLs mean different things in different tables; slowly changing dimensions require understanding historical versus current state. Correctness criteria live in people’s heads, not in any system the AI can query.
What the AI Can’t Know
Tristan Handy, founder of dbt Labs, identified the tasks that resist automation in a March 2025 post: “Collaboratively design changes to existing data assets to accommodate new requirements,” and questions like “what are the edge cases I need to know about when calculating cost of goods sold?” These require organizational knowledge that no schema or lineage graph captures.
Cédric Charlier framed it more concretely. Most AI tooling has no way of answering these questions reliably:
- What does “Status” mean? Is it draft, intermediate, final, canceled?
- Is “Amount” net or gross?
- Does “CustomerID” point to the current version or a slowly changing dimension?
These questions have definitive answers in every organization. But the answers aren’t written down anywhere the AI can access. They exist as accumulated knowledge from years of working with the data, passed along in onboarding conversations, Slack threads, and “oh, actually, that field doesn’t mean what you think” code review comments.
A commenter on Data Engineering Weekly gave the example that makes this visceral for anyone who’s worked with enterprise systems: “In the SAP world, a field like PRCTR behaves differently across company codes. Certain document types in ACDOCA need to be excluded for specific reporting scenarios. That knowledge is tacit.” You can’t encode it in a CLAUDE.md file because nobody has written it all down. It exists as judgment.
Why Data Engineering Is Harder Than Application Engineering for AI
This is worth understanding structurally, not just anecdotally. Application code and data transformation code look similar — both are code, both can be generated by AI — but they have fundamentally different verification properties.
Application code is self-verifying. You write a test that asserts expected behavior. If the behavior changes, the test fails. The correctness criteria are encoded in the test suite.
Data transformation code depends on external meaning. A CASE WHEN status = 'active' statement is only correct if you know what “active” means in the context of this particular system. Does it include trial accounts? Does it include accounts that are paid but haven’t logged in for a year? The SQL is syntactically identical regardless of the answer. Only the business context makes one interpretation correct and the other wrong.
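The ambiguity can be made concrete with a toy example (the table, columns, and data are invented for illustration): both queries below are valid SQL and execute cleanly, and only business context determines which definition of “active” is the right one.

```python
import sqlite3

# Hypothetical accounts table. Both queries below run without error;
# nothing in the SQL itself says which interpretation is correct.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER, status TEXT, plan TEXT);
    INSERT INTO accounts VALUES
        (1, 'active', 'paid'),
        (2, 'active', 'trial'),
        (3, 'churned', 'paid');
""")

# Interpretation A: "active" means any account with status = 'active'.
a = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE status = 'active'"
).fetchone()[0]

# Interpretation B: "active" excludes trial accounts.
b = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE status = 'active' AND plan != 'trial'"
).fetchone()[0]

print(a, b)  # 2 1: both queries succeed; only business context says which is right
```

No test suite can choose between A and B, because the assertion it would need lives in the business definition, not in the code.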
This asymmetry means AI can write data transformation code that is syntactically perfect, logically consistent, and completely wrong from a business perspective. Research on failure modes confirms it: 97% of incorrect AI-generated SQL runs without warnings.
Cross-system dependencies compound the problem. A modern data stack involves dbt transformations, Airbyte or Fivetran connectors, cloud storage, API rate limits, orchestration tools, and BI platforms. Each system has its own conventions, limitations, and failure modes. AI can’t reason about the end-to-end flow — what happens when the API rate limit triggers a partial load, which causes an incremental model to process only half the expected rows, which causes a downstream metric to drop by 50%.
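That failure chain can be sketched in a few lines (the connector, table, and numbers are all hypothetical): every step “succeeds,” no exception is raised, and the metric is still wrong.

```python
# Hypothetical sketch of the cascade: a rate-limited extract silently
# delivers half the rows, the incremental load succeeds, and the daily
# metric drops by 50%; no step in the chain raises an error.
def extract(rows, rate_limit):
    # The connector stops at the rate limit instead of failing loudly.
    return rows[:rate_limit]

orders = [{"id": i, "amount": 100} for i in range(1000)]
loaded = extract(orders, rate_limit=500)   # partial load: 500 of 1000 rows

daily_revenue = sum(r["amount"] for r in loaded)
expected = sum(r["amount"] for r in orders)

print(daily_revenue, expected)  # 50000 100000: the metric is silently halved
```

Reasoning about this requires knowing the conventions of every system in the chain at once, which is exactly what today's AI tooling cannot see.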
Databricks reported that over 80% of new databases are now launched by AI agents. But as they noted, “close enough” SQL “stops being acceptable once software is making decisions.” The margin for error shrinks as AI-generated code moves from analyst notebooks to production pipelines that feed automated decisions.
The Tacit Knowledge Problem
The deepest version of the context gap is tacit knowledge — expertise that the holder doesn’t even know they have. An experienced data engineer doesn’t consciously think “I should check whether this join changes the grain.” They just… check. It’s automatic. It’s the result of having been burned by grain explosions enough times that the check becomes instinctive.
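The grain check itself is easy to illustrate with a toy schema (names and data invented): joining orders to a payments table that has multiple rows per order silently inflates any downstream SUM.

```python
import sqlite3

# A minimal sketch of a "grain explosion": order 1 has two payments,
# so the join duplicates its amount and the revenue total is inflated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, amount INTEGER);
    CREATE TABLE payments (order_id INTEGER, method TEXT);
    INSERT INTO orders VALUES (1, 100), (2, 200);
    INSERT INTO payments VALUES (1, 'card'), (1, 'credit'), (2, 'card');
""")

# The instinctive check: compare row counts before and after the join.
before = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
after, revenue = conn.execute("""
    SELECT COUNT(*), SUM(o.amount)
    FROM orders o JOIN payments p ON o.order_id = p.order_id
""").fetchone()

print(before, after, revenue)  # 2 3 400: true revenue is 300, the join inflated it
```

The experienced engineer runs this comparison reflexively; nothing in the schema or the SQL flags it as necessary.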
This tacit knowledge is what makes the SAP example so powerful. Nobody sat down and wrote “PRCTR behaves differently across company codes” in a documentation system. The people who know this learned it by encountering wrong results, tracing the issue, and building mental models of how the system actually works (as opposed to how the schema suggests it works).
AI doesn’t accumulate tacit knowledge. Each conversation starts fresh. Even with structured instruction files, you’re encoding explicit knowledge — the things someone thought to write down. The tacit layer, the “oh, also, don’t trust the order_date field for records before 2019 because we migrated from the old system and the dates got mangled” layer, remains inaccessible unless someone converts it to explicit documentation.
This is a real bottleneck, not a theoretical one. Every organization has dozens of these tacit rules. Most have hundreds. The ones that get written down are usually the ones that caused the most painful production incidents. The rest live in the heads of the people who’ve been around long enough to know.
Narrowing the Gap
The context gap isn’t a problem you solve once. It’s an ongoing discipline.
Documentation as AI enablement. Tiger Data found that adding semantic catalogs — LLM-generated descriptions of what tables and columns mean — improved AI accuracy by 27%. Systematic documentation isn’t just for humans anymore. It’s the primary input that makes AI useful.
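A minimal sketch of what such a catalog can look like, assuming a simple column-to-description mapping injected into the AI's prompt (the format, names, and descriptions here are invented for illustration, not Tiger Data's actual system):

```python
# Hypothetical semantic catalog: plain-language descriptions of what
# columns actually mean, keyed by "table.column". All names are invented.
catalog = {
    "orders.status": "Order lifecycle state: draft, submitted, fulfilled, "
                     "or canceled. 'submitted' orders are not yet revenue.",
    "orders.amount": "Gross amount in cents, including tax. Use net_amount "
                     "for revenue reporting.",
    "orders.customer_id": "Points to the current customer record, not the "
                          "version at order time (dim_customer is SCD type 2).",
}

def context_for(columns):
    """Build the prompt fragment for the columns a query touches."""
    return "\n".join(f"{col}: {catalog[col]}" for col in columns if col in catalog)

print(context_for(["orders.amount", "orders.status"]))
```

The point is not the data structure, which is trivial, but the content: each entry is a piece of business context that previously lived only in someone's head.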
Structured context files. Claude Code’s biggest error source was mismatched conventions. The fix was CLAUDE.md — a file that tells the AI how this specific project works. The pattern generalizes: every piece of business context you can encode in a file the AI reads automatically is one less failure mode.
Context as a team practice. The organizations that will get the most from AI are the ones that treat context documentation as core engineering work, not a side project. Code review should ask “did you update the column descriptions?” the same way it asks “did you add tests?”
None of this eliminates the context gap. It narrows it. The residual gap — the truly tacit knowledge, the edge cases nobody has documented, the cross-system interactions that are too complex to describe — remains the province of experienced humans. That’s why the emerging discipline of context engineering matters: it’s the systematic practice of structuring what AI needs to know.