The first question when implementing data contracts is ownership: who defines the contract? The answer determines whether the contract creates meaningful accountability or becomes shelf-ware documentation that nobody maintains.
Three models exist. Each carries different organizational assumptions.
Producer-Defined Contracts
In this model, the data owner — typically the software engineering team that builds and maintains the source system — sets the terms. The producer decides what fields exist, what types they use, what guarantees they’re willing to make, and what their SLA is.
This was Andrew Jones’s original framing at GoCardless. The producer commits to a schema and set of guarantees, and consumers build against that commitment. Changes to the contract go through a versioning process, and breaking changes require explicit coordination.
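To make the model concrete, here is a minimal sketch of what a producer-defined contract might look like, reusing the same keys as the ODCS-style example later in this section; the service name, fields, and values are hypothetical:

```yaml
# Hypothetical producer-defined contract, owned and versioned by the payments team.
# The producer decides which fields are exposed and which guarantees it will make.
id: payments.payment_events        # illustrative identifier
owner: payments-service-team
schema:
  - name: payment_id
    type: string
    isNullable: false
    description: "Unique identifier assigned by the payments service."
  - name: amount
    type: decimal
    isNullable: false
    description: "Transaction amount in minor currency units."
slaProperties:
  - property: latency
    value: "24h"                   # the producer commits only to what it controls
```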
Strengths:
- The producer understands the data’s provenance and limitations better than anyone
- Guarantees are realistic because the team making them controls the system
- Aligns with how API contracts work in software engineering (the server defines the interface)
Weaknesses:
- Producers don’t know how their data is used downstream — they may commit to the wrong things
- Without consumer input, contracts may guarantee fields nobody cares about while omitting critical ones
- Producers have weak incentive to maintain contracts for data they see as a byproduct of their service
The incentive problem is real. A software engineer building a payments service cares about payments processing correctly. The fact that their database tables are extracted nightly for analytics dashboards is, from their perspective, someone else’s problem. Without organizational pressure, producer-defined contracts tend to be minimal and generic.
Consumer-Defined Contracts
Chad Sanderson’s evolved position, developed from his experience at Convoy, flips the model: consumers articulate their requirements, and producers commit to meeting them. Sanderson’s reasoning: “an application developer will never comprehensively understand how data downstream is being used.”
In this model, the analytics engineering team (or any downstream consumer) registers a contract against a producer’s data. The contract specifies: “I depend on these fields, with these types, at this freshness, and I need to be notified before any breaking changes.” The producer reviews the contract and either accepts the commitment or negotiates adjustments.
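A sketch of what such a registration might contain is below; the keys are illustrative rather than part of ODCS or any particular tool:

```yaml
# Hypothetical consumer-side registration: the analytics team declares
# what it depends on and the guarantees it needs from the producer.
consumer: analytics-engineering
dependsOn: payments.payment_events
requiredFields:
  - name: amount
    type: decimal
  - name: customer_id
    type: string
expectations:
  freshness: "4h"                    # data must arrive within 4 hours
  notifyBeforeBreakingChange: true   # producer must coordinate breaking changes
```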
Strengths:
- Contracts reflect actual downstream needs rather than the producer’s assumptions
- Creates visibility — producers learn who depends on their data and how
- Consumer requirements are concrete and testable
- Natural alignment with how data mesh data products work
Weaknesses:
- Consumers may request guarantees the producer can’t realistically provide
- Multiple consumers may have conflicting requirements for the same dataset
- Requires a process for negotiation and conflict resolution
Sanderson’s most telling insight from Convoy: the number-one feedback from software engineers was “I had no idea anyone was using my data like this.” The visibility created by consumer-defined contracts was itself valuable, independent of any enforcement mechanism. When a consumer registers a contract against your data, you learn something about your own system that you didn’t know before.
Collaborative Contracts
Most practical implementations land somewhere between pure producer-defined and pure consumer-defined. Consumers articulate what they need. Producers evaluate what they can deliver. The contract emerges from negotiation.
This model looks like:
- Consumer registers interest. The analytics team says: "We need the `payments` event to include `amount`, `currency`, `customer_id`, and `payment_method`, updated within 4 hours, with less than 1% null rate on required fields."
- Producer reviews. The payments team responds: "We can guarantee `amount` and `currency` are always present. `customer_id` is populated for 98% of events; the remaining 2% are guest checkouts. `payment_method` is being deprecated in v3 of our API in favor of a more detailed `payment_details` object. We can commit to a 6-hour SLA, not 4."
- Negotiation. The contract is adjusted. Maybe the analytics team accepts 98% completeness on `customer_id` and adds a quality check for the remaining 2%. Maybe the payments team prioritizes the SLA improvement. The point is that the conversation happens before the pipeline breaks, not after.
- Codification. The agreed terms become an ODCS contract in version control, with CI/CD enforcement.
```yaml
# The contract that emerges from collaboration
schema:
  - name: customer_id
    isNullable: true  # 2% guest checkout rate acknowledged
    description: "Customer ID. Null for guest checkouts (~2% of events)."
quality:
  - name: customer_id_completeness
    rule: "COUNT(CASE WHEN customer_id IS NULL THEN 1 END) / COUNT(*) < 0.05"
    severity: warning
slaProperties:
  - property: latency
    value: "6h"
```

Organizational Architecture Patterns
How contracts are stored and enforced varies by stack. Three patterns from real implementations:
GoCardless stores contracts as Jsonnet files in Git, with automated deployment to BigQuery and PubSub. This is a producer-centric architecture where the team that owns the service also owns the contract definition. Contract changes go through the same code review process as service changes.
Convoy used Protobuf with Kafka Schema Registry and Debezium. This architecture embeds contracts in the event streaming layer. Schema changes are validated at publish time — if a producer tries to emit an event that doesn’t match the registered schema, the publish fails. This is consumer-protective: it prevents breaking changes from reaching downstream systems.
PayPal used YAML-based contracts integrated with their Data Mesh implementation. Contracts were part of the data product definition, alongside ownership metadata and quality SLAs. This is the most comprehensive model: each data product has a full contract that covers structure, quality, delivery, and governance.
There’s no single correct architecture. The pattern depends on whether your data flows through event streams (Kafka + Schema Registry is natural), batch extracts (Git-based contracts with CI validation), or both (you’ll likely need contracts at multiple enforcement points).
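For the batch path, the CI enforcement point can be as simple as linting every contract file on pull requests. The sketch below assumes GitHub Actions, a contracts/ directory, and the open-source datacontract-cli; all three are assumptions, not details from the GoCardless, Convoy, or PayPal implementations:

```yaml
# .github/workflows/contract-check.yml (hypothetical)
# Validates contract files on every pull request so breaking edits
# are caught in review rather than in production.
name: validate-data-contracts
on:
  pull_request:
    paths:
      - "contracts/**"
jobs:
  lint-contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install contract CLI   # assumes datacontract-cli; swap in your tool
        run: pip install datacontract-cli
      - name: Lint all contracts
        run: |
          for f in contracts/*.yaml; do
            datacontract lint "$f"
          done
```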
The Visibility Argument
Regardless of which ownership model you choose, the most immediate value of data contracts is often not enforcement but visibility.
Before contracts, the dependency graph between producers and consumers is invisible. The payments team doesn’t know that the analytics team depends on a specific column. The analytics team doesn’t know that a deprecation is planned. Nobody has a complete picture.
Contracts make these dependencies explicit and machine-readable. Even before you add automated enforcement, the act of writing down “I depend on X from team Y” creates organizational value. It enables impact analysis before changes. It routes data quality alerts to the right people. It turns implicit assumptions into explicit agreements.
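One lightweight way to make that dependency graph machine-readable is to record consumers and their alert channels in the contract file itself. The keys below are hypothetical, not part of ODCS:

```yaml
# Illustrative consumer registry inside a contract file (keys are hypothetical)
consumers:
  - team: analytics-engineering
    uses: [customer_id, amount]       # enables impact analysis on these fields
    notify: "#analytics-alerts"       # where quality alerts for this consumer go
  - team: finance-reporting
    uses: [amount, currency]
    notify: "finance-data@example.com"
```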
The cultural challenge is getting teams to invest in this visibility. The Convoy case study suggests the payoff is fast: once engineers see who depends on their data, their interest in maintaining data quality increases. Contracts don't just protect consumers; they also show producers who they serve.