Open Data Contract Standard

ODCS v3.1.0 under the Linux Foundation's Bitol project — what it covers, how it compares to the Data Contract Specification, and where harmonization stands.

Tags: dbt, data quality, data engineering

Two specifications dominate the data contracts space: the Open Data Contract Standard (ODCS) and the Data Contract Specification (datacontract.com). ODCS, now at v3.1.0 under the Linux Foundation’s Bitol project, has emerged as the de facto standard. Both are open, both use YAML, and the teams behind them are actively harmonizing the two formats.

ODCS Origins

ODCS originated as PayPal’s internal Data Contract Template. PayPal used YAML-based contracts integrated with their Data Mesh implementation to manage data sharing agreements between teams. The specification was open-sourced and donated to the Linux Foundation’s Bitol project, where it continues to evolve under community governance.

The progression from a single company’s internal tool to a foundation-governed standard mirrors patterns seen in other infrastructure projects (Apache Spark, which moved from UC Berkeley’s AMPLab to the Apache Software Foundation, and Kubernetes, which moved from Google to the Cloud Native Computing Foundation). It signals maturity: the specification is no longer dependent on one organization’s priorities.

What ODCS Covers

A full ODCS contract defines significantly more than column names and types. The specification spans several dimensions:

Schema Definition

The foundational layer — what fields exist, their types, and structural constraints:

schema:
  - name: payment_id
    logicalType: string
    physicalType: VARCHAR(36)
    isNullable: false
    isPrimaryKey: true
    description: "Unique payment identifier (UUID format)"
  - name: amount
    logicalType: decimal
    physicalType: NUMERIC(12,2)
    isNullable: false
    description: "Payment amount in the currency specified"
  - name: currency
    logicalType: string
    physicalType: VARCHAR(3)
    isNullable: false
    description: "ISO 4217 currency code"

The dual logicalType / physicalType distinction matters. Logical types are platform-agnostic (string, decimal, timestamp). Physical types are warehouse-specific (VARCHAR, NUMERIC, INT64). This separation means one contract can describe data that lands in BigQuery, Snowflake, or Redshift with appropriate type mapping.
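To make the type-mapping idea concrete, here is a minimal Python sketch of a lookup from logical types to warehouse-specific physical types. The table below is an assumption for illustration: ODCS does not define this mapping, and the helper name `physical_type` is hypothetical. Real projects would choose lengths and precision per column.

```python
# Illustrative defaults only: this mapping is an assumption, not part of ODCS.
LOGICAL_TO_PHYSICAL = {
    "bigquery": {"string": "STRING", "decimal": "NUMERIC", "timestamp": "TIMESTAMP"},
    "snowflake": {"string": "VARCHAR", "decimal": "NUMBER(12,2)", "timestamp": "TIMESTAMP_NTZ"},
    "redshift": {"string": "VARCHAR(256)", "decimal": "DECIMAL(12,2)", "timestamp": "TIMESTAMP"},
}


def physical_type(logical_type: str, warehouse: str) -> str:
    """Return the physical type for a logical type on the given warehouse."""
    try:
        return LOGICAL_TO_PHYSICAL[warehouse][logical_type]
    except KeyError:
        # Fail loudly rather than guess a type the contract never declared.
        raise ValueError(f"No mapping for {logical_type!r} on {warehouse!r}") from None


# One logical type, three warehouse-specific renderings.
for warehouse in LOGICAL_TO_PHYSICAL:
    print(warehouse, physical_type("decimal", warehouse))
```

In practice the contract's own physicalType would take precedence, with a lookup like this serving only as a fallback when a contract is deployed to a second platform.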

Data Quality Rules

Contracts can embed quality expectations directly:

quality:
  - name: amount_is_positive
    type: custom
    dimension: accuracy
    rule: "amount > 0"
    severity: error
  - name: currency_is_valid
    type: custom
    dimension: validity
    rule: "currency IN ('USD', 'EUR', 'GBP', 'CAD', 'AUD')"
    severity: warning
  - name: null_rate_below_threshold
    type: custom
    dimension: completeness
    # Multiply by 1.0 so engines with integer division (e.g. PostgreSQL) do not truncate the ratio to 0
    rule: "COUNT(CASE WHEN email IS NULL THEN 1 END) * 1.0 / COUNT(*) < 0.05"
    severity: warning

These rules are declarative — they describe what should be true, not how to check it. Tooling like Soda or Data Contract CLI translates them into executable checks against your actual data.
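As a rough illustration of what "translating into executable checks" can mean, the following Python sketch turns the two row-level rules above into SQL queries that count violating rows. It is not how Soda or Data Contract CLI work internally. The `payments` table, the sample rows, and the function `run_row_level_rules` are all hypothetical, and SQLite stands in for a real warehouse.

```python
import sqlite3

# The two row-level rules from the contract example, as plain dictionaries.
RULES = [
    {"name": "amount_is_positive", "rule": "amount > 0", "severity": "error"},
    {
        "name": "currency_is_valid",
        "rule": "currency IN ('USD', 'EUR', 'GBP', 'CAD', 'AUD')",
        "severity": "warning",
    },
]


def run_row_level_rules(conn, table, rules):
    """Count rows that violate each rule; a rule passes when no row violates it."""
    results = []
    for rule in rules:
        # Table and rule text are interpolated directly, so both must be trusted input.
        # Rows where the rule evaluates to NULL are not counted as violations.
        sql = f"SELECT COUNT(*) FROM {table} WHERE NOT ({rule['rule']})"
        failing = conn.execute(sql).fetchone()[0]
        results.append(
            {
                "name": rule["name"],
                "severity": rule["severity"],
                "failing_rows": failing,
                "passed": failing == 0,
            }
        )
    return results


# Hypothetical sample data: one negative amount and one unsupported currency.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (payment_id TEXT, amount NUMERIC, currency TEXT)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [("p1", 10.0, "USD"), ("p2", -5.0, "EUR"), ("p3", 20.0, "JPY")],
)
results = run_row_level_rules(conn, "payments", RULES)
for result in results:
    print(result)
```

Aggregate rules such as the null-rate threshold need a different query shape (a single boolean over the whole table), which is one reason real tooling parses rule types rather than wrapping every rule in the same template.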

SLAs and Delivery Guarantees

slaProperties:
  - property: latency
    value: "4h"
    description: "Data available within 4 hours of event occurrence"
  - property: availability
    value: "99.9%"
    description: "Target uptime for the data product"
  - property: freshness
    value: "1h"
    description: "Maximum age of most recent record"

SLAs are where contracts move beyond what dbt handles. A dbt contract can enforce that a model has the right columns and types. It cannot promise that the data will be available within four hours or that the pipeline has 99.9% uptime. Those commitments require organizational backing, monitoring infrastructure, and incident response processes.

Ownership and Governance

team:
  - name: seller-platform
    role: producer
    contact: seller-platform@company.com
  - name: analytics-engineering
    role: consumer
    contact: analytics@company.com
contractCreatedTs: "2025-09-15T10:00:00Z"
contractStatus: active

Ownership metadata answers the question that’s otherwise resolved by Slack archaeology: “who do I talk to when this data breaks?” Making ownership explicit in a machine-readable format enables automated routing of data quality alerts to the right team.
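A minimal Python sketch of such routing follows, using the team entries from the example above. The routing policy (errors notify producers and consumers, warnings notify producers only) is an assumption made for illustration, and `recipients` is a hypothetical helper, not part of any ODCS tooling.

```python
# Team entries copied from the contract example above.
TEAM = [
    {"name": "seller-platform", "role": "producer", "contact": "seller-platform@company.com"},
    {"name": "analytics-engineering", "role": "consumer", "contact": "analytics@company.com"},
]


def recipients(team, severity):
    """Pick alert recipients from the contract's team section.

    Assumed policy: errors reach producers and consumers, anything else
    reaches producers only.
    """
    roles = {"producer", "consumer"} if severity == "error" else {"producer"}
    return [member["contact"] for member in team if member["role"] in roles]


print(recipients(TEAM, "error"))
print(recipients(TEAM, "warning"))
```

Because the contacts live in the contract rather than in the alerting tool, a team change is a one-line contract update instead of a hunt through notification configs.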

Custom Properties

ODCS supports arbitrary custom properties for organization-specific needs:

customProperties:
  - property: costCenter
    value: "SELLER-001"
  - property: gdprClassification
    value: "contains_pii"
  - property: retentionPeriod
    value: "7 years"

This extensibility prevents the “we need a field the spec doesn’t support” problem that kills adoption of rigid standards.

The Data Contract Specification (datacontract.com)

The alternative specification, developed by the team behind Data Contract CLI (from Entropy Data), follows similar conventions. It also uses YAML, covers schema, quality, and metadata, and targets the same use cases.

The key difference is provenance and governance: ODCS sits under the Linux Foundation with formal governance, while the Data Contract Specification is maintained by the Entropy Data team with community contributions.

In practical terms, the differences are shrinking. The Data Contract CLI has switched to ODCS as its default format. The Entropy Data team is explicit about the direction: “There are no fundamental or conceptual differences between these two major formats. Both are open standards, use YAML, and specify data sets in a similar way. We are striving for harmonization.”

This isn’t the XKCD competing-standards problem. The ecosystem is converging on ODCS, with the Data Contract Specification acting more as an alternative syntax than a competing philosophy.

ODCS vs. dbt Model Contracts

dbt’s native model contracts (available since Core v1.5) and ODCS operate at different scopes:

| Dimension | dbt Model Contracts | ODCS |
| --- | --- | --- |
| Scope | Schema of a single dbt model | Full data product lifecycle |
| Schema enforcement | Column names, types, constraints | Column names, types, logical/physical typing |
| Quality rules | Not included (use dbt tests separately) | Embedded in the contract |
| SLAs | Not included | Latency, availability, freshness |
| Ownership | Not included (use meta as workaround) | First-class team section |
| Custom metadata | Via meta block | Via customProperties |
| Enforcement point | dbt build time | Varies by tooling |
| Governance | Per-model config in YAML | Standalone versioned document |

dbt contracts are schema contracts for the transformation layer. ODCS is a comprehensive data product agreement. They complement each other: use dbt contracts to enforce schema within your DAG, and ODCS to define the broader agreement (SLAs, ownership, quality rules) that spans the full pipeline.
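One way to picture the two working together is to derive the dbt side from the contract. The Python sketch below converts schema fields written as in this note's example (with isNullable and isPrimaryKey keys) into the structure of a dbt model entry with an enforced contract. The function `to_dbt_model` and the model name `fct_payments` are hypothetical. The output mirrors dbt's `contract: {enforced: true}`, `data_type`, and `constraints` keys but is returned as a plain dictionary rather than written to a YAML file.

```python
import json

# Two fields in the same shape as the schema example earlier in this note.
ODCS_FIELDS = [
    {"name": "payment_id", "physicalType": "VARCHAR(36)", "isNullable": False, "isPrimaryKey": True},
    {"name": "amount", "physicalType": "NUMERIC(12,2)", "isNullable": False},
]


def to_dbt_model(model_name, odcs_fields):
    """Build a dbt model entry with an enforced contract from contract fields."""
    columns = []
    for field in odcs_fields:
        constraints = []
        if not field.get("isNullable", True):
            constraints.append({"type": "not_null"})
        if field.get("isPrimaryKey", False):
            constraints.append({"type": "primary_key"})
        column = {"name": field["name"], "data_type": field["physicalType"]}
        if constraints:
            column["constraints"] = constraints
        columns.append(column)
    return {
        "name": model_name,
        "config": {"contract": {"enforced": True}},
        "columns": columns,
    }


model = to_dbt_model("fct_payments", ODCS_FIELDS)
print(json.dumps(model, indent=2))
```

Only the schema portion of the contract crosses over. SLAs, ownership, and quality rules have no counterpart in a dbt model contract, which matches the scope split in the table above.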

For teams already using dbt, the practical path is: start with dbt’s native model contracts on your mart models, then adopt ODCS when you need to formalize agreements with teams outside your dbt project — software engineering teams producing source data, ML teams consuming your outputs, or external partners receiving data products.