
Data Contract Adoption Challenges

Why data contract initiatives fail — the execution gap between contract-as-documentation and contract-as-enforcement, and the cultural change that matters more than the YAML.

Tags: dbt, data quality, data engineering

Data contracts address a socio-technical problem. The specification, tooling, and YAML format are mature. The primary barrier is organizational: getting software engineering teams to define and maintain contracts for data they’ve never treated as a product.

The Execution Gap

Data Engineering Weekly identified a pattern they call the “execution gap.” Software engineering has long treated specifications as executable constraints — an OpenAPI spec isn’t just documentation, it generates client libraries, validates requests, and powers integration tests. The data industry often treats contracts as descriptive artifacts instead.

Writing a YAML file that describes your expected schema is easy. Making sure that file is validated at every step of your pipeline — from source extraction through transformation to consumption — requires intentional tooling and process. That’s where the difference between contract-as-documentation and contract-as-enforcement becomes clear.

Contract-as-documentation looks like this: you write an ODCS contract, store it in a Git repository, maybe generate some nice HTML documentation from it. When someone asks “what fields does the payments data have?”, you point them to the contract. But when a producer changes the schema, nothing prevents the change from breaking your pipeline. The contract is a reference document, not an enforcement mechanism.
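In concrete terms, the artifact in question is often just a file like the following. This is an illustrative ODCS-style sketch, not a verbatim excerpt of the spec — field names and values are indicative and should be checked against the Open Data Contract Standard version you actually use:

```yaml
# Illustrative ODCS-style contract for a hypothetical payments dataset.
# Field names follow the spirit of ODCS; the exact structure depends
# on the spec version your tooling supports.
apiVersion: v3.0.0
kind: DataContract
id: payments-events
version: 1.2.0
status: active
schema:
  - name: payments
    properties:
      - name: payment_id
        logicalType: string
        required: true
        unique: true
      - name: amount_cents
        logicalType: integer
        required: true
      - name: currency
        logicalType: string
        required: true
```

Stored in Git and rendered to HTML, this answers "what fields does the payments data have?" — but nothing about it runs.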

Contract-as-enforcement looks like this: the contract is validated in CI/CD before any deployment. A schema change that violates the contract blocks the pull request. Quality rules embedded in the contract are executed against live data after every pipeline run. SLA violations trigger automated alerts routed to the owning team. The contract is code that runs, not prose that describes.
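The CI/CD half of that enforcement can be a small pipeline step. The sketch below assumes the open-source datacontract CLI and a hypothetical contracts/payments.yaml path — substitute whatever validation tool and layout your team uses:

```yaml
# Illustrative GitHub Actions workflow: a contract violation blocks the PR.
# Assumes the `datacontract-cli` Python package; adapt to your own tooling.
name: contract-check
on: pull_request

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install datacontract CLI
        run: pip install datacontract-cli
      - name: Lint the contract definition
        run: datacontract lint contracts/payments.yaml
      - name: Test the contract against live data
        run: datacontract test contracts/payments.yaml
```

The same test command, run on a schedule after pipeline runs, covers the post-deployment half: quality rules executed against live data, with failures routed to the owning team.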

The gap between these two states is where most contract initiatives stall. Teams write contracts (the easy part) and then don’t build the enforcement infrastructure (the hard part). The contracts decay as the underlying data drifts, and within a few months they’re worse than no contracts at all — they actively mislead anyone who trusts them.

The Cultural Problem

Benn Stancil’s initial skepticism of data contracts captured a real tension. He compared the concept to data mesh as “a kind of Rorschach proposition” and argued “the most foolish thing you could do is turn a technology problem into a people problem.”

He eventually conceded he was wrong — “there is something useful here” — but his framing identifies the core difficulty. Data contracts are a people problem dressed in YAML syntax. The technical layer (specifications, tools, enforcement) exists to serve an organizational goal: making producers accountable for the data they emit and giving consumers reliable expectations about the data they receive.

Getting a software engineering team to accept accountability for their data’s downstream impact requires changing how they think about their own system. The payments team sees themselves as building a payments processing service. The idea that they’re also operating a data product — that their database tables have consumers with expectations and SLAs — is a cultural shift, not a tooling change.

Several factors make this shift difficult:

Asymmetric incentives. The payments team’s OKRs are about payment processing latency, error rates, and throughput. Nobody’s measuring them on analytics pipeline uptime. Without explicit organizational accountability for data quality, contracts are a favor the engineering team does for the data team — and favors get deprioritized when deadlines tighten.

Invisible dependencies. Before contracts, most engineering teams genuinely don’t know who uses their data downstream. The Convoy experience is instructive: engineers’ number-one feedback was “I had no idea anyone was using my data like this.” You can’t maintain a contract for a dependency you don’t know exists.

Competing priorities. Every contract a team maintains is overhead — schema reviews, version management, breaking-change coordination. In organizations where engineering capacity is already stretched, adding “maintain data contracts” to a team’s responsibilities meets resistance unless leadership explicitly prioritizes it.

No enforcement culture. In organizations where data quality has historically been “the data team’s problem,” shifting to shared accountability is a governance change, not a tooling change. It requires executive sponsorship and clear policies about what happens when a contract is violated.

What Yali Sassoon Got Right

Snowplow’s Yali Sassoon offered a useful reconciliation that’s often overlooked: contracts work well for deliberately created data (events you define and emit) but aren’t realistic for SaaS exports, where you don’t control the schema.

This distinction matters because it bounds where contracts can actually work. You can write a contract for your payments event stream because your team defines the event schema, controls the emitter, and can enforce changes through CI/CD. You cannot write a meaningful contract for Salesforce data because Salesforce controls the schema, and you’re extracting whatever they provide.

The practical implication: don’t try to put contracts on everything. Focus on data that your organization produces and controls. For external data sources, schema tests and anomaly detection are more appropriate — they validate what you received rather than trying to constrain what you’ll receive.
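For externally controlled sources, that "validate what you received" posture maps naturally onto dbt source tests and freshness checks. A minimal sketch, with hypothetical source and column names:

```yaml
# models/staging/salesforce/_sources.yml
# Hypothetical dbt source definition: you can't constrain Salesforce's
# schema, but you can test and monitor what actually landed.
version: 2

sources:
  - name: salesforce
    schema: raw_salesforce
    freshness:
      warn_after: {count: 24, period: hour}
    tables:
      - name: opportunity
        loaded_at_field: _loaded_at
        columns:
          - name: id
            tests:
              - unique
              - not_null
          - name: stage_name
            tests:
              - not_null
```

A failed test here tells you the extract drifted; it makes no claim that the producer violated an agreement, because no agreement exists.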

The Convoy Case Study

Chad Sanderson’s experience at Convoy is the most detailed public case study of a data contract initiative. The results are instructive in both what worked and what was required to make it work.

What worked: Contracts created visibility between teams. Software engineers discovered downstream dependencies they didn’t know existed. Data quality conversations happened before changes, not after breakages. Sanderson measured success not by the number of contracts deployed but by “the number of conversations created between data and dev teams.”

What was required: Convoy had a dedicated team driving adoption. This wasn’t a side project — it was a staffed initiative with organizational support. For most organizations, that level of investment isn’t available upfront.

The Practical Path

For most organizations, the path is more gradual than Convoy’s. A realistic adoption sequence:

1. Pick 2-3 high-impact datasets. Choose sources that break your pipelines most often or that feed your most critical dashboards. These are the datasets where the cost of not having a contract is already tangible.

2. Start with dbt model contracts. Enable contract.enforced on the mart models that consume these sources. This adds schema enforcement within your own DAG with zero cross-team coordination required. It’s not a full data contract, but it demonstrates value quickly.
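As a sketch, enabling a model contract looks like the following (model and column names are hypothetical; requires dbt 1.5+). dbt compares the model's compiled columns and data types against this spec at build time and fails the run on a mismatch:

```yaml
# models/marts/_payments.yml
# Hypothetical dbt model contract: schema enforcement within your own DAG.
version: 2

models:
  - name: fct_payments
    config:
      contract:
        enforced: true
    columns:
      - name: payment_id
        data_type: varchar
        constraints:
          - type: not_null
      - name: amount_cents
        data_type: integer
      - name: currency
        data_type: varchar(3)
```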

3. Document the cost of breakages. Track how much time your team spends firefighting data issues from these sources. “We spent 40 hours last quarter fixing breakages from upstream schema changes in the payments data” is a concrete argument for cross-team contracts.

4. Propose a contract to one upstream team. Start with the team you have the best relationship with. Frame it as a benefit to them: “We’ll stop bugging you about schema changes if we can agree on a contract.” The collaborative model works best here — articulate what you need, let them tell you what they can commit to.

5. Demonstrate value, then expand. Once you have one working cross-team contract, the conversation with other teams is easier. You have a concrete example of what contracts look like, how much effort they require, and what value they provide.

The teams that succeed with data contracts share one trait: they treat the organizational change as the primary work and the tooling as secondary. The YAML is the easy part. Getting two teams to have a structured conversation about data expectations, and then maintaining that agreement over time, is where the actual value is created.

Several forces are increasing adoption pressure in 2026: AI and ML systems require trustworthy data with structural guarantees; data mesh adoption creates demand for contracts as the mechanism for decentralized teams to define their data products; GDPR and CCPA compliance benefits from formal data-sharing agreements; and the platform engineering movement treats contracts-as-code as standard infrastructure.