
Dagster Freshness Policies and Scheduling

How Dagster tracks asset freshness rather than just execution timestamps, and how to schedule dbt runs using cron schedules, sensors, and automation conditions.

Planted
Tags: dbt, data engineering, automation

Dagster tracks asset freshness, not just execution timestamps. An execution timestamp records when a pipeline ran; freshness tracks whether the data is current enough for its consumers. A pipeline that ran 30 minutes ago but processed stale upstream data is recent by execution time but potentially hours behind on data freshness.

Cloud Run Jobs and Cloud Workflows trigger jobs on time or events but don’t track whether the resulting data meets freshness requirements. Dagster tracks this per asset.
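The distinction can be sketched in plain Python. This is illustrative, not Dagster's implementation: freshness is judged against the data's own timestamp, while a scheduler only sees when the run happened.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def is_fresh(data_timestamp: datetime, maximum_lag: timedelta,
             now: Optional[datetime] = None) -> bool:
    """Freshness compares 'now' to the data's timestamp, not the run's."""
    now = now or datetime.now(timezone.utc)
    return now - data_timestamp <= maximum_lag

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_run = now - timedelta(minutes=30)       # pipeline ran recently...
upstream_data = now - timedelta(hours=8)     # ...but processed stale inputs

print(is_fresh(last_run, timedelta(hours=6), now))       # run looks recent
print(is_fresh(upstream_data, timedelta(hours=6), now))  # data is stale
```

By execution time the pipeline looks healthy; by data timestamp it violates a 6-hour freshness requirement. That second check is what Dagster surfaces per asset.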

Freshness Policies via dbt Meta

You define freshness expectations directly in your dbt project’s meta section. The Dagster integration reads these from the manifest, so your freshness requirements live alongside your model definitions:

models:
  - name: mrt__marketing__daily_spend
    meta:
      dagster:
        freshness_policy:
          maximum_lag_minutes: 360  # Must be less than 6 hours old

This tells Dagster that mrt__marketing__daily_spend should never be more than 6 hours stale. If it is, the UI shows a freshness alert, and you can configure notifications through Dagster+ alerts.

The freshness policy is asset-level, not pipeline-level. You can set different freshness requirements for different models based on their business importance:

  • Real-time dashboards (maximum_lag_minutes: 60): Mart models feeding executive dashboards that stakeholders check frequently.
  • Daily reporting (maximum_lag_minutes: 1440): Standard analytics tables where overnight refresh is sufficient.
  • Weekly aggregations (maximum_lag_minutes: 10080): Historical rollups where weekly freshness is acceptable.
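As a sketch, the three tiers above could be expressed in a single dbt YAML file (model names here are hypothetical):

```yaml
models:
  - name: mrt__exec_dashboard
    meta:
      dagster:
        freshness_policy:
          maximum_lag_minutes: 60     # real-time dashboard tier
  - name: mrt__daily_reporting
    meta:
      dagster:
        freshness_policy:
          maximum_lag_minutes: 1440   # daily reporting tier
  - name: mrt__weekly_rollup
    meta:
      dagster:
        freshness_policy:
          maximum_lag_minutes: 10080  # weekly aggregation tier
```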

Freshness policies also propagate through the asset graph. If a mart model depends on an intermediate model that’s stale, the mart’s freshness status reflects the upstream staleness. You don’t need to set policies on every model — set them on the models that have explicit freshness requirements, and Dagster infers the implications upstream.
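The propagation idea can be sketched as a toy graph walk (this is a simplification, not Dagster's actual staleness algorithm): an asset's data is effectively no fresher than its stalest transitive upstream.

```python
def effective_data_time(asset, data_times, upstreams):
    """An asset's effective data time is the minimum (oldest) timestamp
    along its upstream chain. Timestamps here are toy integers where
    larger means more recent."""
    times = [data_times[asset]]
    for up in upstreams.get(asset, []):
        times.append(effective_data_time(up, data_times, upstreams))
    return min(times)

# mart depends on an intermediate model, which depends on a staging model
upstreams = {"mart_spend": ["int_spend"], "int_spend": ["stg_spend"]}
data_times = {"mart_spend": 100, "int_spend": 40, "stg_spend": 90}

print(effective_data_time("mart_spend", data_times, upstreams))  # 40
```

Even though the mart itself materialized recently (100), its effective data time is dragged down to 40 by the stale intermediate model, which is why only the mart needs an explicit policy.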

Schedule-Based Orchestration

For cron-based scheduling, build_schedule_from_dbt_selection creates a Dagster schedule directly from dbt selection syntax:

from dagster_dbt import build_schedule_from_dbt_selection

daily_dbt_schedule = build_schedule_from_dbt_selection(
    [my_dbt_assets],
    job_name="daily_dbt_job",
    cron_schedule="0 6 * * *",
    dbt_select="tag:daily",
)

This runs all dbt models tagged daily at 6 AM. The selection syntax is the same as you’d use with dbt run --select tag:daily, so your existing model tagging strategy translates directly.

You can define multiple schedules for different cadences:

hourly_schedule = build_schedule_from_dbt_selection(
    [my_dbt_assets],
    job_name="hourly_dbt_job",
    cron_schedule="0 * * * *",
    dbt_select="tag:hourly",
)

daily_schedule = build_schedule_from_dbt_selection(
    [my_dbt_assets],
    job_name="daily_dbt_job",
    cron_schedule="0 6 * * *",
    dbt_select="tag:daily",
)

This gives you the same scheduling flexibility as dbt Cloud’s job scheduler, but integrated with the broader Dagster orchestration layer.

Event-Driven Execution with Sensors

Schedules run on fixed cadences. Sensors run in response to events. For data pipelines, event-driven execution is often more appropriate: run dbt when data is ready, not when the clock says to.

A sensor can watch for an upstream event — a Fivetran sync completing, a file landing in GCS, a dlt pipeline finishing — and trigger dbt only when fresh data is available:

from dagster import sensor, RunRequest

@sensor(job=my_dbt_job)
def fivetran_complete_sensor(context):
    # Check if Fivetran sync completed since last check.
    # fivetran_sync_completed() is a placeholder for your own completion check.
    if fivetran_sync_completed():
        yield RunRequest(run_key=f"fivetran-{context.cursor}")

The practical benefit is eliminating the common failure mode of schedule-based orchestration: running dbt at 6 AM and hoping the upstream load finished in time. With a sensor, dbt runs only when upstream data is confirmed present. No more empty incremental runs or partial data in downstream tables.

Common Sensor Patterns

Cloud Storage trigger. Watch for files landing in a GCS or S3 bucket. When a CSV or Parquet file arrives, trigger the dbt models that transform that source.

API completion trigger. Poll an ingestion tool’s API (Fivetran, Airbyte, dlt) for sync completion. Trigger dbt when the sync succeeds; alert and skip when it fails.

Cross-asset trigger. Materialize downstream dbt models only after upstream Python assets complete. This is the sensor pattern that makes Dagster’s unified asset graph practical — Python extraction assets and dbt transformation assets coordinate without shared state or external glue.
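All three patterns rely on the same cursor mechanics: the sensor remembers the last event it acted on, so repeated polls and sensor restarts don't re-trigger runs. A minimal sketch of that logic without Dagster (the function and sync IDs are hypothetical):

```python
def poll_for_runs(latest_sync_id, cursor):
    """Return (run_request, new_cursor). A run is requested only when a
    sync newer than the stored cursor has completed; otherwise nothing
    fires and the cursor is unchanged."""
    if latest_sync_id is not None and latest_sync_id != cursor:
        return {"run_key": f"fivetran-{latest_sync_id}"}, latest_sync_id
    return None, cursor  # nothing new: no duplicate run

request, cursor = poll_for_runs("sync-42", cursor=None)
print(request)   # fires once for sync-42
request, cursor = poll_for_runs("sync-42", cursor)
print(request)   # None: same sync seen again, no duplicate run
```

Dagster persists the cursor for you (context.cursor in the sensor above); the run_key gives the same idempotency guarantee on the run-launching side.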

Automation Conditions

Beyond fixed schedules and explicit sensors, Dagster supports automation conditions (also called auto-materialize policies) that declaratively express when an asset should be refreshed:

  • On upstream change. Rematerialize when any upstream asset has new data. This is the asset-graph-aware version of event-driven execution.
  • On freshness violation. Rematerialize when the freshness policy would be violated if the asset isn’t refreshed soon.
  • On cron schedule. Periodic refresh, similar to a schedule but defined at the asset level rather than the job level.

Automation conditions are more composable than schedules or sensors because they attach to individual assets rather than groups of assets. A mart model can declare “rematerialize when upstream changes” while a base model declares “rematerialize on a cron schedule.” Dagster resolves the dependencies automatically.
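The resolution step can be sketched as a graph traversal (a toy model of the idea, not Dagster's actual resolver; asset names and the "eager"/"cron" labels are illustrative):

```python
def assets_to_refresh(changed, conditions, downstreams):
    """Collect every downstream asset whose condition is 'eager'
    (rematerialize on upstream change), transitively."""
    to_refresh = set()
    frontier = list(changed)
    while frontier:
        asset = frontier.pop()
        for child in downstreams.get(asset, []):
            if conditions.get(child) == "eager" and child not in to_refresh:
                to_refresh.add(child)
                frontier.append(child)
    return to_refresh

downstreams = {"stg_spend": ["int_spend"], "int_spend": ["mart_spend"]}
conditions = {"stg_spend": "cron", "int_spend": "eager", "mart_spend": "eager"}

print(sorted(assets_to_refresh({"stg_spend"}, conditions, downstreams)))
```

The base model refreshes on its own cron cadence; everything marked eager downstream of it follows automatically, which is the composability the per-asset attachment buys you.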

Comparison with Other Orchestration Approaches

The freshness and scheduling capabilities in Dagster address a gap that simpler orchestration tools leave open:

Capability             Cloud Run Jobs    Cloud Workflows    Cloud Composer       Dagster
Cron scheduling        Cloud Scheduler   Cloud Scheduler    Airflow scheduler    Built-in
Event-driven triggers  Eventarc          Eventarc           Sensors (Airflow)    Sensors
Freshness tracking     None              None               Limited (SLAs)       Native per-asset
Cross-system deps      Manual            Sequential steps   DAG-level            Asset-level
Cost (monthly)         < $5              < $10              $300-400+            $10+ (Dagster+)

The GCP orchestration decision framework covers the Cloud Run/Workflows/Composer tradeoffs in detail. Dagster fits differently in the landscape: it’s not a GCP-native service, but it provides asset-level freshness tracking and cross-system dependency management that GCP-native tools don’t.

For teams whose orchestration needs are purely “run dbt on a cron schedule,” Cloud Run Jobs is simpler and cheaper. Dagster’s scheduling and freshness capabilities apply when event-driven coordination across multiple systems is needed, or when data freshness must be tracked and alerted on per asset.