No single tool covers the entire pipeline from source to consumption. Data quality validation layers describe what kinds of checks you need (contracts, tests, anomaly detection). Pipeline enforcement layers describe where those checks run. The practical approach is layered defense, where each layer catches what the previous one missed.
## The Four Layers

### Layer 1: Pre-Warehouse
This is the earliest enforcement point — before data reaches your warehouse at all.
Schema registries enforce compatibility on event streams. When a Kafka producer tries to publish a message that violates the registered schema, the message is rejected. The invalid data never reaches any consumer.
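The reject-at-produce behavior can be sketched as a validation gate in front of the producer. This is an illustration of the principle, not the Confluent or Kafka API; the schema, field names, and publish helper are all hypothetical.

```python
# Minimal sketch of registry-style enforcement: messages that violate the
# registered schema are rejected before they reach any topic.
# The schema and message shapes here are hypothetical.

REGISTERED_SCHEMA = {
    "order_id": int,
    "amount": float,
    "currency": str,
}

def validate_against_schema(message: dict, schema: dict) -> list[str]:
    """Return a list of violations; empty means the message conforms."""
    errors = []
    for field, expected_type in schema.items():
        if field not in message:
            errors.append(f"missing field: {field}")
        elif not isinstance(message[field], expected_type):
            errors.append(f"type mismatch on {field}")
    for field in message:
        if field not in schema:
            errors.append(f"unregistered field: {field}")
    return errors

def publish(message: dict, topic: list) -> bool:
    """Append to the topic only if the message passes validation."""
    if validate_against_schema(message, REGISTERED_SCHEMA):
        return False  # rejected — the invalid message never reaches a consumer
    topic.append(message)
    return True
```

The point of the sketch is the placement of the check: it runs in the produce path, so consumers never have to defend against malformed messages.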
EL tools with contract support reject bad data during extraction and loading. dlt’s schema contract modes (evolve, freeze, discard_row, discard_value) provide granular control per entity type — tables, columns, and data types can each carry their own mode. Under freeze, if an API starts returning a new field or changes a field’s type, the pipeline fails before anything reaches the warehouse.
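The behavioral difference between the modes can be shown with a toy simulation of how each one handles a record carrying a column the contract doesn’t know about. This is an illustration of the behaviors, not dlt’s actual implementation, and the mode names are simplified.

```python
# Toy simulation of schema-contract modes applied to an incoming record
# with an unexpected column. Illustrative only — not dlt's implementation.

KNOWN_COLUMNS = {"id", "name"}

class ContractViolation(Exception):
    pass

def apply_contract(record: dict, mode: str):
    extra = set(record) - KNOWN_COLUMNS
    if not extra:
        return record
    if mode == "freeze":
        # fail the pipeline before anything is loaded
        raise ContractViolation(f"unexpected columns: {sorted(extra)}")
    if mode == "evolve":
        return record  # accept the record and widen the schema
    if mode == "discard_row":
        return None    # drop the whole offending record
    if mode == "discard_value":
        # keep the record but strip the unexpected columns
        return {k: v for k, v in record.items() if k in KNOWN_COLUMNS}
    raise ValueError(f"unknown mode: {mode}")
```

Freeze is the only mode that stops the pipeline outright; the others trade strictness for availability in different ways.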
Fivetran and Airbyte offer limited control at this layer. Fivetran provides schema change notifications and a blunt blocking setting. Airbyte’s “Detect and manually approve” mode pauses connections for review on breaking changes but requires human intervention.
The pre-warehouse layer is the most powerful because it prevents bad data from existing in your warehouse at all. It’s also the hardest to adopt, because it requires either controlling the source system (schema registry) or using an EL tool with native contract support (dlt).
### Layer 2: Post-Load, Pre-Transformation
Between “data lands in the warehouse” and “dbt builds models on it,” there’s a window for contract verification.
Soda Data Contracts provide a dedicated YAML-based contract engine. You run Soda checks after your EL tool finishes loading and before dbt run starts. If a source table’s schema has drifted or quality rules fail, the pipeline stops before transformation begins.
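The gate’s logic can be sketched as a comparison between the columns the warehouse actually has and what the contract expects, refusing to start transformation on drift. A minimal sketch — the table names, column types, and contract shape are hypothetical, and real Soda contracts also carry quality rules beyond schema.

```python
# Sketch of a post-load, pre-transformation gate: compare the loaded
# schema against the contract and block transformation on drift.
# Table and column names are hypothetical.

EXPECTED = {"orders": {"order_id": "integer", "amount": "numeric"}}

def verify_sources(warehouse_schema: dict) -> list[str]:
    """Return drift findings; transformation should run only if empty."""
    findings = []
    for table, columns in EXPECTED.items():
        actual = warehouse_schema.get(table)
        if actual is None:
            findings.append(f"{table}: missing table")
            continue
        for col, dtype in columns.items():
            if col not in actual:
                findings.append(f"{table}.{col}: missing column")
            elif actual[col] != dtype:
                findings.append(f"{table}.{col}: type drifted to {actual[col]}")
    return findings

def should_transform(warehouse_schema: dict) -> bool:
    return not verify_sources(warehouse_schema)
```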
This layer is particularly valuable when multiple dbt projects consume the same source tables. Instead of each project running redundant source tests, a single Soda verification step validates the data once, at the boundary.
### Layer 3: Transformation
This is where most analytics teams already have enforcement, because it’s where dbt operates.
dbt native contracts on base models catch structural mismatches at compile time. If a source column is missing or its type has changed, the model fails to build. Jeremy Cohen recommended putting contracts on base models sitting just above sources as the primary catch point.
dbt-expectations tests on sources validate column sets and value ranges. expect_table_columns_to_match_set catches added or removed columns. expect_column_values_to_be_in_set validates content constraints. These run during dbt build, after data has loaded.
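What these two tests assert can be restated as plain checks — an illustration of the logic, not the dbt-expectations implementation, which runs these as SQL against the warehouse.

```python
# The logic of two dbt-expectations tests, restated as plain checks.
# Illustrative only — the package implements these as SQL tests.

def columns_match_set(actual_columns: list[str], expected: set[str]) -> bool:
    """expect_table_columns_to_match_set: fails on added or removed columns."""
    return set(actual_columns) == expected

def values_not_in_set(values: list, allowed: set) -> list:
    """expect_column_values_to_be_in_set: returns the offending values."""
    return [v for v in values if v not in allowed]
```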
Elementary detects schema changes and anomalies during the dbt run. Schema change detection alerts on deleted tables, added or removed columns, and type changes. Volume and freshness anomaly tests catch statistical deviations from historical baselines.
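A volume anomaly test of this kind reduces to comparing today’s row count against a historical baseline. The sketch below uses a z-score with an illustrative threshold and training window — these are assumptions, not Elementary’s defaults.

```python
# Sketch of a volume anomaly test: flag today's row count when it deviates
# more than `threshold` standard deviations from the historical baseline.
# The threshold is illustrative, not Elementary's default.
from statistics import mean, stdev

def is_volume_anomaly(history: list[int], today: int, threshold: float = 3.0) -> bool:
    if len(history) < 2:
        return False  # not enough baseline to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu  # perfectly flat history: any change is anomalous
    return abs(today - mu) / sigma > threshold
```

Freshness anomaly tests follow the same pattern, with inter-arrival times in place of row counts.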
The transformation layer is reactive — it validates data that’s already in the warehouse. But it’s where most teams start, because it builds on tools you already have (dbt) without requiring additional infrastructure.
### Layer 4: Continuous Observability
After models are built and published, continuous monitoring catches drift and degradation in production tables.
Tools like Elementary, Monte Carlo, or Soda Cloud monitor for volume anomalies, freshness issues, distribution shifts, and schema drift on an ongoing basis. This layer operates independently of pipeline runs — it monitors the state of your data warehouse continuously and alerts when something changes.
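The pipeline-independent character of this layer is the key difference: the monitor inspects warehouse state on its own schedule. A freshness check, for instance, reduces to comparing each table’s last load time against an SLA — sketched below with hypothetical table names and windows.

```python
# Sketch of a standalone freshness monitor: alert when a table hasn't been
# updated within its expected window, regardless of whether any pipeline ran.
# SLAs and table names are hypothetical.
from datetime import datetime, timedelta

FRESHNESS_SLA = {"orders": timedelta(hours=6), "customers": timedelta(days=1)}

def stale_tables(last_loaded_at: dict, now: datetime) -> list[str]:
    """Return tables whose last load is older than their SLA."""
    return sorted(
        table
        for table, sla in FRESHNESS_SLA.items()
        if now - last_loaded_at[table] > sla
    )
```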
The fourth layer catches problems that slip through the first three: gradual quality degradation, slow schema drift, seasonality-driven anomalies. It’s the safety net for everything the explicit checks don’t cover.
## Where to Start
You don’t need all four layers on day one. Start with the layer closest to where you’re feeling pain.
For most analytics teams, start at Layer 3. You already have dbt. Adding dbt-expectations source tests to your five most critical sources takes an afternoon and catches the most common breakage — column drift, unexpected values, structural changes. Combining that with contracts on base models gives you compile-time protection for free.
If source schema changes are your primary pain point, invest in Layer 1. If you’re evaluating EL tools, factor in schema contract support — dlt’s native contracts are a genuine differentiator compared to Fivetran and Airbyte’s limited options. If you’re locked into Fivetran or Airbyte, skip to Layer 2 with Soda verification.
If multiple projects consume the same sources, add Layer 2. Soda contract verification gives you a centralized gate between loading and transformation, rather than redundant source tests in every project.
Add Layer 4 once the foundation is solid. Continuous observability is most valuable when you have enough coverage in the first three layers that the anomaly detection catches genuine surprises rather than problems you should have anticipated with explicit tests.
## CI/CD Integration
The layers tie together through CI/CD. A well-structured pipeline runs stages in order, with each stage gating the next:
- Source freshness checks — verify that sources have updated within expected windows
- Soda verification (if you use it) — validate source schemas and quality rules
- dbt build with contracts and tests — compile-time contract enforcement, then model builds with tests
- Elementary anomaly detection — statistical checks against historical baselines
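The gating behavior can be sketched as a runner that executes stages in order and stops at the first failure. The runner and the stubbed stage callables below are a hypothetical orchestration sketch; real stages would shell out to the respective tools.

```python
# Run pipeline stages in order; each stage gates the next, so a failure
# stops everything downstream. Stage callables return True on success.
from typing import Callable

def run_pipeline(stages: list[tuple[str, Callable[[], bool]]]) -> tuple[bool, list[str]]:
    """Return (overall success, names of stages that actually ran)."""
    ran = []
    for name, stage in stages:
        ran.append(name)
        if not stage():
            return False, ran  # downstream stages never execute
    return True, ran

# Example wiring with stubbed stages; real ones would invoke the tools
# (e.g. a Soda scan, then dbt, then Elementary).
stages = [
    ("source_freshness", lambda: True),
    ("soda_verification", lambda: False),  # simulate a contract failure
    ("dbt_build", lambda: True),
    ("elementary_anomalies", lambda: True),
]
```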
The --empty flag in dbt 1.8+ is particularly useful for CI: dbt build --select state:modified+ --empty --defer --state ./ performs schema-only dry runs without processing data. This keeps CI fast while still validating that contracts are satisfied and models compile correctly. The full data build runs in production; CI validates the contracts.
## Relationship to Validation Layers
This note describes where enforcement happens (pre-warehouse, post-load, transformation, observability). The Data Quality Validation Layers note describes what kind of enforcement it is (proactive contracts, reactive tests, anomaly detection). The two frameworks overlay:
| Validation Type | Pre-Warehouse | Post-Load | Transformation | Observability |
|---|---|---|---|---|
| Proactive contracts | Schema registry, dlt freeze | Soda contract checks | dbt model contracts | — |
| Reactive tests | — | Soda quality rules | dbt-expectations, dbt tests | — |
| Anomaly detection | — | — | Elementary (during build) | Elementary, Monte Carlo |
The insight is that proactive prevention is possible at every pipeline stage, not just at the source. dbt model contracts are proactive within the transformation layer (they prevent bad models from materializing). Soda contracts are proactive at the warehouse boundary (they prevent transformation from starting on bad data). Each layer adds a defense point.
## Where This Is Heading
The enforcement landscape is shifting toward write-time validation — rejecting bad data before it’s persisted rather than checking it after it lands. Schema registries already work this way for event streams. dlt does it for batch pipelines. The gap remains in managed EL tools, where Fivetran and Airbyte are slowly adding more granular controls but aren’t there yet.
The organizational dimension applies equally to upstream enforcement. Adding Soda checks or dlt schema contracts is a technical decision. Getting the team that owns the API to participate in contract definitions is an organizational one. The layered approach helps because you can start enforcing where you have control (your dbt project, your EL configuration) and expand outward as contract culture takes hold.