Note

Data Contract Ownership Models

Producer-defined vs consumer-defined data contracts — why who writes the contract determines whether the initiative succeeds.

Planted
dbt · data quality · data engineering

The first question when implementing data contracts is ownership: who defines the contract? The answer determines whether the contract creates meaningful accountability or becomes shelf-ware documentation that nobody maintains.

Three models exist. Each carries different organizational assumptions.

Producer-Defined Contracts

In this model, the data owner — typically the software engineering team that builds and maintains the source system — sets the terms. The producer decides what fields exist, what types they use, what guarantees they’re willing to make, and what their SLA is.

This was Andrew Jones’s original framing at GoCardless. The producer commits to a schema and set of guarantees, and consumers build against that commitment. Changes to the contract go through a versioning process, and breaking changes require explicit coordination.

Strengths:

  • The producer understands the data’s provenance and limitations better than anyone
  • Guarantees are realistic because the team making them controls the system
  • Aligns with how API contracts work in software engineering (the server defines the interface)

Weaknesses:

  • Producers don’t know how their data is used downstream — they may commit to the wrong things
  • Without consumer input, contracts may guarantee fields nobody cares about while omitting critical ones
  • Producers have weak incentive to maintain contracts for data they see as a byproduct of their service

The incentive problem is real. A software engineer building a payments service cares about payments processing correctly. The fact that their database tables are extracted nightly for analytics dashboards is, from their perspective, someone else’s problem. Without organizational pressure, producer-defined contracts tend to be minimal and generic.
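A producer-defined contract tends to read like a service API definition: the producer states what it emits and what it guarantees, nothing more. A minimal sketch, loosely modeled on the ODCS-style YAML used later in this note (field names and SLA values are illustrative, not exact ODCS syntax):

```yaml
# Producer-defined: the payments team publishes what it guarantees
owner: payments-team
schema:
  - name: payment_id
    type: string
    isNullable: false
  - name: amount
    type: decimal
    isNullable: false
slaProperties:
  - property: latency
    value: "24h"  # the producer commits only to what it controls
```

Note what's absent: nothing about which fields downstream teams actually depend on, because the producer doesn't know.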

Consumer-Defined Contracts

Chad Sanderson’s evolved position, developed from his experience at Convoy, flips the model: consumers articulate their requirements, and producers commit to meeting them. Sanderson’s reasoning: “an application developer will never comprehensively understand how data downstream is being used.”

In this model, the analytics engineering team (or any downstream consumer) registers a contract against a producer’s data. The contract specifies: “I depend on these fields, with these types, at this freshness, and I need to be notified before any breaking changes.” The producer reviews the contract and either accepts the commitment or negotiates adjustments.
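Registered as a contract, that statement might look like the following sketch (a hypothetical structure for illustration, not a specific tool's syntax):

```yaml
# Consumer-defined: the analytics team declares its dependency
consumer: analytics-engineering
dependsOn: payments.events
fields:
  - name: amount
    type: decimal
  - name: currency
    type: string
freshness: "4h"
notifyBeforeBreakingChange: true
```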

Strengths:

  • Contracts reflect actual downstream needs rather than the producer’s assumptions
  • Creates visibility — producers learn who depends on their data and how
  • Consumer requirements are concrete and testable
  • Natural alignment with the data-product model in data mesh

Weaknesses:

  • Consumers may request guarantees the producer can’t realistically provide
  • Multiple consumers may have conflicting requirements for the same dataset
  • Requires a process for negotiation and conflict resolution

Sanderson’s most telling insight from Convoy: the number-one feedback from software engineers was “I had no idea anyone was using my data like this.” The visibility created by consumer-defined contracts was itself valuable, independent of any enforcement mechanism. When a consumer registers a contract against your data, you learn something about your own system that you didn’t know before.

Collaborative Contracts

Most practical implementations land somewhere between pure producer-defined and pure consumer-defined. Consumers articulate what they need. Producers evaluate what they can deliver. The contract emerges from negotiation.

This model looks like:

  1. Consumer registers interest. The analytics team says: “We need the payments event to include amount, currency, customer_id, and payment_method, updated within 4 hours, with less than 1% null rate on required fields.”

  2. Producer reviews. The payments team responds: “We can guarantee amount and currency are always present. customer_id is populated for 98% of events — the remaining 2% are guest checkouts. payment_method is being deprecated in v3 of our API in favor of a more detailed payment_details object. We can commit to a 6-hour SLA, not 4.”

  3. Negotiation. The contract is adjusted. Maybe the analytics team accepts 98% completeness on customer_id and adds a quality check for the remaining 2%. Maybe the payments team prioritizes the SLA improvement. The point is that the conversation happens before the pipeline breaks, not after.

  4. Codification. The agreed terms become an ODCS contract in version control, with CI/CD enforcement.

```yaml
# The contract that emerges from collaboration
schema:
  - name: customer_id
    isNullable: true  # 2% guest checkout rate acknowledged
    description: "Customer ID. Null for guest checkouts (~2% of events)."
    quality:
      - name: customer_id_completeness
        rule: "COUNT(CASE WHEN customer_id IS NULL THEN 1 END) / COUNT(*) < 0.05"
        severity: warning
slaProperties:
  - property: latency
    value: "6h"
```

Organizational Architecture Patterns

How contracts are stored and enforced varies by stack. Three patterns from real implementations:

GoCardless stores contracts as Jsonnet files in Git, with automated deployment to BigQuery and PubSub. This is a producer-centric architecture where the team that owns the service also owns the contract definition. Contract changes go through the same code review process as service changes.

Convoy used Protobuf with Kafka Schema Registry and Debezium. This architecture embeds contracts in the event streaming layer. Schema changes are validated at publish time — if a producer tries to emit an event that doesn’t match the registered schema, the publish fails. This is consumer-protective: it prevents breaking changes from reaching downstream systems.

PayPal used YAML-based contracts integrated with their Data Mesh implementation. Contracts were part of the data product definition, alongside ownership metadata and quality SLAs. This is the most comprehensive model: each data product has a full contract that covers structure, quality, delivery, and governance.

There’s no single correct architecture. The pattern depends on whether your data flows through event streams (Kafka + Schema Registry is natural), batch extracts (Git-based contracts with CI validation), or both (you’ll likely need contracts at multiple enforcement points).
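For the Git-based, batch-extract pattern, the CI validation can be as simple as checking a sample extract against the contract's terms before a change merges. A minimal sketch, assuming the contract has already been parsed from YAML into a dict (field names and thresholds are illustrative):

```python
# Minimal CI-style contract check: validate a batch extract sample
# against the required-field and null-rate terms of a contract.

# In practice this dict would be loaded from the contract YAML in Git.
contract = {
    "required_fields": ["amount", "currency"],
    "null_rate_limits": {"customer_id": 0.05},  # flag above 5% nulls
}

def validate(rows, contract):
    """Return a list of violation messages (empty list = pass)."""
    violations = []
    for field in contract["required_fields"]:
        if any(row.get(field) is None for row in rows):
            violations.append(f"required field '{field}' has nulls/missing")
    for field, limit in contract["null_rate_limits"].items():
        nulls = sum(1 for row in rows if row.get(field) is None)
        rate = nulls / len(rows)
        if rate > limit:
            violations.append(f"'{field}' null rate {rate:.1%} exceeds {limit:.0%}")
    return violations

sample = [
    {"amount": 10.0, "currency": "USD", "customer_id": "c1"},
    {"amount": 5.0, "currency": "EUR", "customer_id": None},  # guest checkout
    {"amount": 7.5, "currency": "USD", "customer_id": "c2"},
]
print(validate(sample, contract))  # one violation: customer_id null rate too high
```

The same check runs identically in a pull-request pipeline or as a scheduled job, which is what makes the Git-based pattern attractive for batch stacks.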

The Visibility Argument

Regardless of which ownership model you choose, the most immediate value of data contracts is often not enforcement but visibility.

Before contracts, the dependency graph between producers and consumers is invisible. The payments team doesn’t know that the analytics team depends on a specific column. The analytics team doesn’t know that a deprecation is planned. Nobody has a complete picture.

Contracts make these dependencies explicit and machine-readable. Even before you add automated enforcement, the act of writing down “I depend on X from team Y” creates organizational value. It enables impact analysis before changes. It routes data quality alerts to the right people. It turns implicit assumptions into explicit agreements.
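Even a trivial registry of declared dependencies supports that kind of impact analysis. A sketch, with team and field names invented for illustration:

```python
# Toy dependency registry: which consumers are impacted by a change
# to a given producer field? All names are hypothetical.
contracts = [
    {"consumer": "analytics", "source": "payments.events",
     "fields": ["amount", "currency", "customer_id"]},
    {"consumer": "finance", "source": "payments.events",
     "fields": ["amount"]},
    {"consumer": "growth", "source": "users.signups",
     "fields": ["signup_channel"]},
]

def impacted_consumers(source, field):
    """Consumers whose registered contracts depend on source.field."""
    return sorted(c["consumer"] for c in contracts
                  if c["source"] == source and field in c["fields"])

# Who breaks if the payments team deprecates customer_id?
print(impacted_consumers("payments.events", "customer_id"))  # ['analytics']
```

Nothing here enforces anything; it only answers "who do I need to talk to before I change this?" — which is exactly the question that goes unanswered without contracts.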

The cultural challenge is getting teams to invest in this visibility. The Convoy case study suggests the payoff is fast: once engineers see who depends on their data, their interest in maintaining data quality increases. Contracts don't just protect consumers; they make dependencies visible to producers.