
Orchestrator Architectural Philosophies

The three competing mental models in data orchestration — process-oriented (Airflow), data-oriented (Dagster), and function-oriented (Prefect) — and why the abstraction matters more than the feature list.

Planted
dbt · data engineering · automation

Every orchestrator comparison eventually turns into a feature matrix. Airflow has 90+ provider packages. Dagster has asset lineage. Prefect has dynamic workflows. All three can schedule dbt, retry on failure, and send Slack alerts when something breaks. Feature parity matters less than the underlying mental model — the core abstraction that shapes how you define pipelines, debug failures, and think about your data platform.

A useful heuristic from Branch Boston’s comparison: “Choose the noun that matches your organization’s vocabulary. DAGs for Airflow, flows for Prefect, assets for Dagster.”

Process-Oriented: Airflow

Airflow is built around the process. You write Python scripts that define directed acyclic graphs (DAGs) of tasks. The scheduler picks them up, workers execute them, and the webserver shows you what ran and when. The core model is: describe what tasks to run and in what order. The system tracks task execution history. It doesn’t know or care what data those tasks produce.

# Airflow: you define tasks and their order
from airflow.decorators import dag, task
from datetime import datetime

@dag(schedule="@daily", start_date=datetime(2025, 1, 1))
def my_pipeline():
    @task
    def extract():
        raw_data = [...]  # pull data from API (placeholder)
        return raw_data

    @task
    def transform(data):
        clean_data = data  # clean and reshape (placeholder)
        return clean_data

    @task
    def load(data):
        # write to BigQuery
        pass

    raw = extract()
    clean = transform(raw)
    load(clean)

my_pipeline()  # instantiate so the scheduler registers the DAG

The mental model is imperative: first do this, then do that. When something fails, you see a red task in a DAG. You know which step failed, but piecing together the data impact — which tables are stale, which dashboards are wrong — requires you to maintain that mapping in your head or in documentation.

Airflow 3.0 (2025) added a FastAPI-based API Server and a React UI, but the core model hasn’t changed. The scheduler still thinks in tasks and execution order, not in data products and freshness.

The Airflow Ecosystem Advantage

Where the process-oriented model genuinely wins is ecosystem breadth. With 80,000+ organizations, 3,600+ contributors, and 30M+ monthly PyPI downloads, Airflow has operators and providers for practically every system in existence. If you need to orchestrate across heterogeneous infrastructure — Snowflake, Databricks, Kubernetes, Spark, custom APIs — Airflow’s provider ecosystem has you covered.

The ecosystem isn’t just about convenience. It represents accumulated operational knowledge: retry strategies, connection pooling, error handling patterns that have been battle-tested across thousands of production deployments. A new BigQueryOperator user benefits from years of edge-case handling they didn’t have to discover themselves.
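One slice of that accumulated knowledge can be sketched in miniature. The pattern below, bounded retry with exponential backoff, is something a hand-rolled integration would have to reimplement; it is a simplified pure-Python illustration of the idea, not Airflow's actual operator code:

```python
import time

def run_with_retries(task_fn, max_retries=3, base_delay=1.0):
    """Bounded retry with exponential backoff -- an illustrative
    sketch of one pattern mature operators encapsulate."""
    for attempt in range(max_retries + 1):
        try:
            return task_fn()
        except Exception:
            if attempt == max_retries:
                raise
            # wait base_delay * 2^attempt before the next try
            time.sleep(base_delay * (2 ** attempt))

# A flaky callable that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient API error")
    return "ok"

result = run_with_retries(flaky, base_delay=0)  # zero delay for the demo
```

Real providers layer connection pooling, credential handling, and service-specific error classification on top of this skeleton, which is exactly the work you avoid redoing.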

Data-Oriented: Dagster

Dagster flips the model. Instead of defining tasks, you define assets: persistent data objects like BigQuery tables, GCS files, or ML models. Each asset has upstream dependencies, a compute function that produces it, and metadata the system tracks automatically — last materialization time, freshness, health status.

# Dagster: you define data products and their dependencies
from dagster import asset, AssetExecutionContext

@asset
def raw_events(context: AssetExecutionContext):
    """Pull events from API and write to BigQuery."""
    data = fetch_from_api()
    write_to_bigquery(data, "raw_events")

@asset(deps=[raw_events])
def clean_events(context: AssetExecutionContext):
    """Clean and deduplicate events."""
    data = read_from_bigquery("raw_events")
    cleaned = deduplicate(data)
    write_to_bigquery(cleaned, "clean_events")

@asset(deps=[clean_events])
def daily_metrics(context: AssetExecutionContext):
    """Aggregate to daily metrics for dashboards."""
    data = read_from_bigquery("clean_events")
    metrics = aggregate_daily(data)
    write_to_bigquery(metrics, "daily_metrics")

When you open the Dagster UI, you see your data products and their states, not a list of task executions. You see that daily_metrics was last materialized 2 hours ago, that its upstream clean_events is fresh, and that all asset checks passed. When something fails, you see it in the context of your data lineage — which downstream tables are now stale, which freshness SLAs are at risk.

For analytics engineers who think in tables, models, and freshness, this is Dagster’s core appeal. The vocabulary matches. A dbt model that produces a BigQuery table maps naturally to a Dagster asset. The dagster-dbt integration makes this mapping explicit: one Dagster asset per dbt model, with dependencies from ref() becoming edges in the asset graph.
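The ref()-to-edge mapping can be shown with a toy manifest walk. The dict below is a simplified stand-in for dbt's manifest.json (the real schema is much richer), and the model names are illustrative:

```python
# Toy illustration: each dbt model becomes a node, and each ref()
# dependency becomes an edge in the asset graph.
manifest = {
    "model.proj.raw_events": {"depends_on": []},
    "model.proj.clean_events": {"depends_on": ["model.proj.raw_events"]},
    "model.proj.daily_metrics": {"depends_on": ["model.proj.clean_events"]},
}

def asset_edges(manifest):
    """One (upstream, downstream) edge per ref() dependency."""
    return [
        (upstream, node)
        for node, meta in manifest.items()
        for upstream in meta["depends_on"]
    ]

edges = asset_edges(manifest)
```

The integration does the equivalent translation for you, which is why the dbt DAG and the Dagster asset graph stay in lockstep without manual wiring.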

Function-Oriented: Prefect

Prefect takes a third approach. Python functions decorated with @flow and @task become workflows. There’s no separate DAG definition file, no YAML, no special project structure. Flows can build dependencies dynamically at runtime using plain Python control flow.

# Prefect: plain Python functions become workflows
from prefect import flow, task

@task
def extract_data(source: str) -> dict:
    return fetch_from_api(source)

@task
def transform_data(raw: dict) -> dict:
    return clean_and_reshape(raw)

@task
def load_data(data: dict, target: str):
    write_to_bigquery(data, target)

@flow
def my_pipeline(sources: list[str]):
    for source in sources:
        raw = extract_data(source)
        clean = transform_data(raw)
        load_data(clean, f"clean_{source}")

The flow can iterate over a dynamic list of sources, branch conditionally, call sub-flows — anything Python can express. Dependencies build at runtime, not at parse time. Prefect 3.0 (September 2024) rebuilt the engine with transactional semantics and cut overhead by over 90%.

The appeal is simplicity and flexibility. If your pipeline’s shape changes based on input data, Prefect handles this naturally. The cost is that Prefect doesn’t know about your data products the way Dagster does — it tracks flow and task execution, not asset freshness and lineage.

The Convergence Signal

Airflow 3.0 introduced the @asset decorator, a direct nod toward Dagster’s data-oriented model. This is significant because it acknowledges a gap in the process-oriented model: the people running pipelines need to know about the data their tasks produce, not just whether the tasks succeeded.

But the asset concept feels bolted on rather than foundational. Airflow’s scheduler, UI, and operational model were designed around task execution. Adding asset awareness as a layer on top is architecturally different from building the entire system around assets from the start. Dagster’s asset graph is the primary navigation model in the UI. Airflow’s asset graph is a secondary view alongside the task-execution-first DAG view.

This matters in practice when debugging. In Dagster, you start from “which asset is stale?” and trace backward. In Airflow, you start from “which task failed?” and trace forward to data impact. For analytics engineers whose primary question is “is my data fresh and correct?”, the data-oriented starting point is more natural.
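The two debugging directions are just opposite traversals of the same dependency graph. A minimal sketch, with illustrative table names and a hand-built graph standing in for what either orchestrator tracks internally:

```python
# node -> its upstream dependencies
deps = {
    "raw_events": [],
    "clean_events": ["raw_events"],
    "daily_metrics": ["clean_events"],
    "dashboard": ["daily_metrics"],
}

def downstream_of(node):
    """Forward (Airflow-style): what does a failure at `node` make stale?"""
    hit, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for child, ups in deps.items():
            if current in ups and child not in hit:
                hit.add(child)
                frontier.append(child)
    return hit

def upstream_of(node):
    """Backward (Dagster-style): where could staleness at `node` come from?"""
    hit, frontier = set(), list(deps[node])
    while frontier:
        current = frontier.pop()
        if current not in hit:
            hit.add(current)
            frontier.extend(deps[current])
    return hit

stale = downstream_of("clean_events")   # failure impact, traced forward
causes = upstream_of("daily_metrics")   # staleness causes, traced backward
```

Same graph, opposite starting points: the orchestrator's choice of primary abstraction determines which of these two questions its UI answers first.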

Choosing the Right Abstraction

The right abstraction depends on how your team thinks about data work:

  • If your team thinks in tables, models, and freshness — if the primary question is “is this data current and correct?” — Dagster’s asset-oriented model matches your vocabulary. Analytics engineers on dbt fall into this category almost by definition.
  • If your team manages diverse infrastructure — if the primary challenge is coordinating tasks across many different systems — Airflow’s process-oriented model and operator ecosystem serve you better.
  • If your team writes Python-first and values flexibility — if your workflows are dynamic, event-driven, and you want minimal framework overhead — Prefect’s function-oriented model gets out of your way.

The worst outcome is choosing an abstraction that fights your team’s natural thinking. An analytics team forced into Airflow’s task model will spend mental energy translating between “which task ran” and “is my data fresh.” A platform team forced into Dagster’s asset model may find it constraining when their primary work is coordinating heterogeneous infrastructure, not tracking data product freshness.