EL Tool Schema Contract Modes

How dlt, Fivetran, and Airbyte handle schema changes during extraction and loading — from dlt's granular freeze/evolve/discard modes to Fivetran's blunt blocking settings.

EL tools are the first enforcement point for schema contracts. They sit before any transformation layer and can reject non-conforming data before it reaches the warehouse. Their contract capabilities vary significantly.

dlt: Granular Schema Contracts

dlt (data load tool) has the most capable native contract support of the three tools covered here. You can set behavior per entity type (tables, columns, and data types), choosing one of four modes for each:

  • evolve: accepts all changes. New tables, new columns, and type changes all flow through without intervention.
  • freeze: rejects changes and raises an exception. The pipeline fails if the source schema deviates from what is expected.
  • discard_row: drops any row that does not conform. The pipeline succeeds but loads only rows that match the contract.
  • discard_value: drops the non-conforming values and loads the rest of the row. Applied to columns, data for unexpected new columns is ignored while existing columns load as usual.
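
To make the difference between the modes concrete, here is a minimal pure-Python sketch of how each one might treat a row that carries an unexpected column. It uses the mode names from dlt's documentation (`discard_row`, `discard_value`). The function `apply_column_contract` and the exception `ContractViolation` are hypothetical names invented for this illustration; this is a toy model of the semantics, not dlt's implementation.

```python
from typing import Any, Dict, Optional, Set


class ContractViolation(Exception):
    """Raised by this toy model when a frozen contract sees a new column."""


def apply_column_contract(
    row: Dict[str, Any], known_columns: Set[str], mode: str
) -> Optional[Dict[str, Any]]:
    """Return the row to load under the given mode, or None to skip the row."""
    extra = set(row) - known_columns
    if not extra:
        return row  # the row already conforms to the known schema
    if mode == "evolve":
        known_columns.update(extra)  # the schema grows to fit the data
        return row
    if mode == "freeze":
        raise ContractViolation(f"unexpected columns: {sorted(extra)}")
    if mode == "discard_row":
        return None  # the whole row is skipped
    if mode == "discard_value":
        # keep the row, but only the values for known columns
        return {k: v for k, v in row.items() if k in known_columns}
    raise ValueError(f"unknown mode: {mode}")
```

The same row therefore has four possible outcomes: loaded with a new column, a failed run, silently skipped, or loaded without the extra value.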

The power is in combining these per entity type. A common production configuration:

import dlt

# Per-entity contract: new tables are allowed, existing tables are locked.
SCHEMA_CONTRACT = {
    "tables": "evolve",
    "columns": "freeze",
    "data_type": "freeze",
}


@dlt.resource(
    write_disposition="merge",
    table_name="orders",
    primary_key="order_id",
    schema_contract=SCHEMA_CONTRACT,
)
def orders_resource():
    yield [{"order_id": 1, "customer_id": 42, "amount": 99.50}]

This configuration lets new tables appear (useful during development or when an API adds new endpoints) but rejects any column or type changes to existing tables. If an API starts returning a new field or changes a field’s type, the pipeline fails before anything reaches the warehouse.

The evolve + freeze combination for tables vs. columns is particularly practical. During development, you want to discover what the source provides. In production, you want stability. By allowing table evolution but freezing column schemas, you get both: new data categories are captured, but existing schemas are protected.
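
One way to express the development-versus-production split is to pick the contract dictionary from the environment. The sketch below is an assumption about how a team might organize this: the function `contract_for` and the environment name `"production"` are hypothetical, and only the dictionary keys and mode names come from the configuration shown earlier.

```python
from typing import Dict


def contract_for(environment: str) -> Dict[str, str]:
    """Return a schema contract dict for a hypothetical environment name."""
    if environment == "production":
        # Stable: new tables may appear, existing columns and types are locked.
        return {"tables": "evolve", "columns": "freeze", "data_type": "freeze"}
    # Anywhere else: let everything through so new fields can be discovered.
    return {"tables": "evolve", "columns": "evolve", "data_type": "evolve"}
```

The returned dictionary can then be passed as the `schema_contract` argument of a resource, in place of a hard-coded constant.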

dlt also supports Pydantic model validation for type-safe schema enforcement, which is even stricter:

from pydantic import BaseModel
import dlt


class Order(BaseModel):
    order_id: int
    customer_id: int
    amount: float
    status: str


@dlt.resource(columns=Order)
def orders():
    yield [{"order_id": 1, "customer_id": 42, "amount": 99.50, "status": "pending"}]

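With Pydantic models, type validation happens in Python before the data even reaches the loading stage. A value that cannot be interpreted as the declared type, such as a non-numeric string where the model expects an integer, raises a validation error immediately. Note that Pydantic's default mode coerces compatible values, so the string "42" is accepted for an integer field unless strict mode is enabled.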
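
The fail-fast ordering can be illustrated without Pydantic. The sketch below swaps in a plain `isinstance` check from the standard library, so it is stricter than Pydantic's default behavior and does no coercion. `ORDER_TYPES` and `validated` are hypothetical names that mirror the fields of the `Order` model above; they are not part of dlt or Pydantic.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical expected types, mirroring the fields of the Order model.
ORDER_TYPES: Dict[str, type] = {
    "order_id": int,
    "customer_id": int,
    "amount": float,
    "status": str,
}


def validated(
    rows: Iterable[Dict[str, Any]], expected: Dict[str, type]
) -> Iterator[Dict[str, Any]]:
    """Yield rows unchanged, raising TypeError on the first bad field."""
    for index, row in enumerate(rows):
        for field, expected_type in expected.items():
            if field not in row:
                raise TypeError(f"row {index}: missing field {field!r}")
            if not isinstance(row[field], expected_type):
                raise TypeError(
                    f"row {index}: {field!r} should be {expected_type.__name__}, "
                    f"got {type(row[field]).__name__}"
                )
        yield row
```

Because the generator raises while rows are still being produced, nothing after the bad row is handed to the loader.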

Fivetran: Blunt Settings

Fivetran takes a fundamentally different approach. It auto-detects schema changes — new columns, removed columns, type changes, new tables — and offers three settings:

  • “Allow all new data” — everything flows through, with email notifications about changes.
  • “Allow new columns” — permits column additions but blocks other changes.
  • “Block all new data” — stops syncs entirely on any schema change.

There’s no per-column or per-type granularity. You can’t say “freeze columns but evolve tables” the way you can with dlt. Changes propagate with email notifications, which means you find out after the fact unless you’re on the strictest blocking setting.

The problem with “Block all new data” is that it’s too aggressive for most teams. It stops syncs entirely on any change, including benign ones like a new column being added to the source. If your sync runs at 3 AM and a schema change blocks it, nobody knows until morning dashboards are empty. The option exists, but it’s not practical for teams that need continuous data flow.

For Fivetran users, the realistic posture is “Allow all new data” with email notifications, combined with downstream enforcement in dbt or Soda. You accept that schema changes will land in your warehouse and catch them during transformation.
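
Downstream enforcement usually starts with detecting drift. As a rough sketch, the function below compares an expected column-to-type mapping against what is observed in the warehouse. In practice the observed mapping would come from the warehouse's information schema; here it is passed in as a plain dictionary. The name `diff_schema` and the type strings are assumptions made for illustration, not part of Fivetran, dbt, or Soda.

```python
from typing import Dict, List


def diff_schema(
    expected: Dict[str, str], observed: Dict[str, str]
) -> Dict[str, List[str]]:
    """Report columns that were added, removed, or changed type."""
    expected_cols, observed_cols = set(expected), set(observed)
    return {
        "added": sorted(observed_cols - expected_cols),
        "removed": sorted(expected_cols - observed_cols),
        "retyped": sorted(
            col
            for col in expected_cols & observed_cols
            if expected[col] != observed[col]
        ),
    }
```

A scheduled job could run this after each sync and alert, or fail the transformation run, when any of the three lists is non-empty.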

Airbyte: Manual Approval Mode

Airbyte sits between dlt and Fivetran in terms of granularity. Its “Detect and manually approve” mode pauses connections for review on breaking changes. When a source schema changes in a way Airbyte considers breaking, the sync stops and waits for someone to approve the change in the UI before data flows again.

This is closer to what a contract should do — blocking breaking changes until they’re reviewed. But the manual step creates its own problems:

  • Time-sensitive syncs get delayed. If a schema change happens during business hours and someone approves it quickly, no harm done. If it happens on a Friday evening, the sync is blocked until Monday.
  • Approval fatigue. If schema changes are frequent, the team starts approving without careful review just to keep data flowing. The mechanism erodes when it fires too often.
  • No programmatic override. You can’t build logic that says “approve automatically if the only change is a new column” — every flagged change requires human intervention.

Airbyte’s non-breaking change handling is automatic. New columns are added to the destination, and type promotions (widening a type) proceed without intervention. Only changes Airbyte considers breaking — column removals, type narrowing — trigger the approval workflow.
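
The breaking versus non-breaking rule described here can be written down as a small classifier. This is a sketch of the rule as stated in this note, not Airbyte's actual detection logic. The function `is_breaking` and the `WIDENING_ORDER` list of type names are hypothetical.

```python
from typing import Dict, List

# Hypothetical widening order: a type may be promoted to any later type.
WIDENING_ORDER: List[str] = ["integer", "number", "string"]


def is_breaking(old: Dict[str, str], new: Dict[str, str]) -> bool:
    """Return True if the change removes a column or narrows a type."""
    for column, old_type in old.items():
        if column not in new:
            return True  # column removal is breaking
        new_type = new[column]
        if new_type == old_type:
            continue
        if old_type not in WIDENING_ORDER or new_type not in WIDENING_ORDER:
            return True  # unknown type change: treat as breaking
        if WIDENING_ORDER.index(new_type) < WIDENING_ORDER.index(old_type):
            return True  # type narrowing is breaking
    # Columns that exist only in `new` are additions, which are non-breaking.
    return False
```

Under this rule, adding a column or widening `integer` to `number` proceeds automatically, while dropping a column or narrowing `number` to `integer` would trigger the approval workflow.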

Practical Implications

Most analytics teams use Fivetran or Airbyte and live with limited upstream control. Schema enforcement then happens at the next layer: in dbt via source tests and contracted base models, or in a post-load tool like Soda.

If you’re building new pipelines or have sources with frequent schema changes, dlt’s contract modes are a genuine differentiator. The ability to say “freeze columns, evolve tables, freeze data types” and have the pipeline enforce that automatically — before data reaches the warehouse — is something neither Fivetran nor Airbyte can match.

For existing Fivetran or Airbyte pipelines, the enforcement has to happen downstream. That’s not a failure — it’s the layered defense model. Your EL tool is one enforcement point, and if it can’t handle granular contracts, you compensate at the next layer.

The decision of which EL tool to use involves many factors beyond schema contracts — see Choosing Between Fivetran, Airbyte, and dlt for the full picture. But for teams where upstream schema stability is a genuine pain point, dlt’s contract support is worth factoring into the decision alongside cost, connector coverage, and operational burden.