Most teams adopt dbt model contracts by retrofitting them onto existing models: the model exists, and someone writes YAML to describe it. An alternative approach for new models intended to serve downstream consumers is to define the contract before writing any SQL.
## The API Design Analogy
Software engineers don’t build an API endpoint and then figure out what the response schema should be. They define the interface — what fields the response will have, what types they’ll be, what constraints apply — agree on it with the clients who’ll consume it, and then implement it. The interface definition comes first; the implementation follows.
Data models can work the same way. The question isn’t “what does my SQL produce?” — it’s “what do consumers need?” Defining the contract first means starting with that question and making the SQL satisfy the answer rather than the other way around.
## The Contract-First Workflow
**Step 1: Agree on the interface.** This is a conversation between the team building the model and the teams consuming it. What columns does the consumer need? What types? What constraints? This happens before any code is written. The outcome is agreement, not YAML.
**Step 2: Write the YAML contract.** Translate the agreement into a dbt model definition with `contract: {enforced: true}`, column declarations with types, and any constraints:
```yaml
models:
  - name: mrt__analytics__customers
    access: public
    config:
      materialized: table
      contract:
        enforced: true
    columns:
      - name: customer__id
        data_type: integer
        constraints:
          - type: not_null
          - type: primary_key
      - name: customer__name
        data_type: text
      - name: customer__lifetime_value
        data_type: numeric(38,2)
      - name: customer__segment
        data_type: text
      - name: customer__is_active
        data_type: boolean
```

**Step 3: Implement the model.** Write SQL that satisfies the contract. If your query returns a column with the wrong type, or returns columns the YAML doesn’t declare, `dbt compile` tells you immediately:
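As an illustration, a model body satisfying this contract might look like the following sketch. The staging model name and source column names are assumptions, not part of the contract; the point is that every cast is explicit so the declared types are met deliberately:

```sql
-- models/marts/mrt__analytics__customers.sql
-- Hypothetical upstream: a staging model named stg_customers.
select
    customer_id::integer           as customer__id,
    customer_name::text            as customer__name,
    lifetime_value::numeric(38, 2) as customer__lifetime_value,  -- explicit cast so the contract's type check passes
    segment::text                  as customer__segment,
    is_active::boolean             as customer__is_active
from {{ ref('stg_customers') }}
```

Any column the query returns that the YAML doesn’t declare, or any type drift in the casts above, surfaces at compile time rather than after a consumer notices.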
```
Compilation Error in model mrt__analytics__customers
  This model has an enforced contract that failed.

  | column_name              | definition_type | contract_type | mismatch_reason    |
  | ------------------------ | --------------- | ------------- | ------------------ |
  | customer__lifetime_value | TEXT            | NUMERIC(38,2) | data type mismatch |
```

The compile step is the feedback loop. You’re not running the full build and checking test output; you’re compiling, seeing exactly what doesn’t match, and fixing it. CI validates the rest.
**Step 4: Mark the contract as the canonical interface.** Once the model builds cleanly, set `access: public` and communicate the version to consumers. They can start referencing `ref('your_project', 'mrt__analytics__customers')` knowing the contract guarantees the shape.
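On the consumer side, a downstream project can then build against the interface rather than the implementation. A sketch (the project and model names follow the example above; in dbt Mesh the upstream project must also be declared in the consumer’s `dependencies.yml`):

```sql
-- A model in the consuming project, pinned to the public contract.
select
    customer__id,
    customer__lifetime_value
from {{ ref('your_project', 'mrt__analytics__customers') }}
where customer__is_active
```

Because the contract guarantees column names and types, this query can be written and reviewed before the upstream model is even implemented.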
## Why This Order Matters
The retrofit approach — write the model, add the contract to describe it — treats the contract as documentation of what already exists. The contract reflects implementation decisions made without consumer input.
Contract-first inverts the dependency. Consumers express their needs. The contract captures those needs. The SQL exists to satisfy the contract. This is a meaningful difference in practice:
- Columns that consumers don’t need don’t end up in the contract, which keeps marts focused
- Type decisions are made consciously rather than defaulting to whatever SQL returns
- Breaking changes require explicit renegotiation, because the contract was the original agreement
- Multiple downstream consumers can review the proposed YAML and flag problems before implementation begins
The friction cost is upfront coordination, which feels slower. The payoff is that the model’s interface is deliberately designed rather than accidentally documented.
## ODCS + Data Contract CLI Integration
The Open Data Contract Standard (ODCS) takes this further. You can define a contract using the ODCS YAML specification — which captures SLAs, ownership metadata, and quality rules beyond what dbt handles — and then generate dbt model YAML from it using the Data Contract CLI:
```shell
datacontract export --format dbt
```

The generated dbt YAML is a subset of the broader contract, focused on the structural guarantees dbt can enforce at compile time. The broader ODCS contract lives as a standalone document covering the full data product agreement: who owns it, what SLAs apply, what quality rules are embedded.
This bridges two levels of contract:
- Organizational level: the ODCS contract, agreed to by producers and consumers, covering the full data product lifecycle
- Technical level: the dbt model contract, enforced at compile time, covering structural guarantees
The workflow: define the ODCS contract (with or without tooling; a YAML file is enough), run `datacontract export --format dbt` to generate the initial column declarations, then add the implementation SQL. Changes to the organizational contract can be re-exported to keep the dbt YAML in sync.
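To show the shape of such a contract, here is a minimal ODCS-style sketch. The field names follow the ODCS v3 layout as I understand it and the values are purely illustrative; verify the exact schema against the published standard before adopting it:

```yaml
# Illustrative ODCS-style contract; not a complete or validated document.
apiVersion: v3.0.0
kind: DataContract
id: analytics-customers
version: 1.0.0
status: active
schema:
  - name: customers
    physicalName: mrt__analytics__customers
    properties:
      - name: customer__id
        logicalType: integer
        required: true
        primaryKey: true
      - name: customer__lifetime_value
        logicalType: number
slaProperties:          # organizational guarantees dbt cannot enforce
  - property: frequency
    value: daily
```

The `schema` section is what an export to dbt YAML would draw from; the SLA and ownership sections stay in the standalone document.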
For teams that don’t need ODCS’s organizational metadata yet, the contract-first approach doesn’t require it. You just write the YAML yourself, agree on it with consumers before writing SQL, and use `dbt compile` as the implementation feedback loop.
## Practical Starting Point
The contract-first approach is most valuable for:
- New models that will be `access: public`
- Models in a dbt Mesh setup where cross-project consumers exist from day one
- High-stakes tables (executive dashboards, partner data feeds) where interface stability matters immediately
For internal models that nobody else depends on, the overhead of formal contract-first design isn’t justified. The governance investment scales with how many consumers depend on the interface.
The rollout strategy for existing models describes how to retrofit contracts on already-built models. Contract-first is the forward-looking version of that work — the approach that makes retrofitting unnecessary.