Note

Dagster Software-Defined Assets

The core building block of Dagster — how @dg.asset works, automatic dependency inference, the Definitions object, and how SDAs differ from traditional orchestrator primitives.

Planted
Tags: data engineering · automation

A Software-Defined Asset (SDA) is the central building block of Dagster. It’s a Python function decorated with @dg.asset that produces a persistent data object — a BigQuery table, a GCS file, a Pandas DataFrame, a dbt model. Each asset has a unique key, upstream dependencies inferred from function arguments, and metadata like owners, tags, and freshness policies.

The “software-defined” part means the asset’s definition (what it is, what it depends on, who owns it, how fresh it should be) lives in code alongside the logic that produces it. This is the same principle behind dbt models: the SQL that creates a table and the YAML that describes it live together in version control.

The Basic Pattern

import dagster as dg
import pandas as pd


@dg.asset(
    group_name="finance",
    owners=["team:analytics"],
)
def mrt__finance__daily_revenue(
    context: dg.AssetExecutionContext,
    base__stripe__payments: pd.DataFrame,
) -> pd.DataFrame:
    """Daily revenue aggregated from Stripe payments."""
    return base__stripe__payments.groupby("payment_date").agg(
        total_revenue=("amount_usd", "sum")
    )

Three things happen in this definition:

  1. The function name becomes the asset key. mrt__finance__daily_revenue is how this asset is identified everywhere in Dagster — the UI, the lineage graph, schedules, sensors.

  2. Function arguments declare dependencies. The argument base__stripe__payments tells Dagster this asset depends on another asset with that key. Dagster builds the dependency graph automatically — no explicit DAG definition, no depends_on configuration. If you add a new argument, the graph updates.

  3. Metadata is declarative. group_name organizes assets in the UI. owners enables filtering by team. The docstring becomes the asset’s description in the asset catalog.

This automatic dependency inference is the mechanism that makes Dagster’s asset-centric model practical. You declare what each asset needs (its function arguments) and what it produces (its return value), and the orchestrator resolves the rest.
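Dagster's actual implementation is more involved, but the core mechanism — reading upstream keys off the function signature — can be sketched with the standard library's inspect module (infer_deps is a hypothetical helper for illustration, not a Dagster API):

```python
import inspect


def infer_deps(fn, non_asset_params=("context",)):
    """Treat every parameter name that isn't a framework argument
    as the key of an upstream asset, the way Dagster infers
    dependencies from function signatures."""
    params = inspect.signature(fn).parameters
    return [name for name in params if name not in non_asset_params]


def mrt__finance__daily_revenue(context, base__stripe__payments):
    ...


print(infer_deps(mrt__finance__daily_revenue))
# ['base__stripe__payments']
```

Adding a second data argument to the function would add a second entry to this list — which is exactly why the graph updates when you change a signature.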

The Definitions Object

The Definitions object is the top-level registry that tells Dagster what exists: all your assets, resources, schedules, and sensors. Think of it as the dbt_project.yml equivalent — one Definitions object per code location, and Dagster reads it to build the asset graph.

import dagster as dg
from dagster_dbt import DbtCliResource
from dagster_gcp import BigQueryResource

defs = dg.Definitions(
    assets=[mrt__finance__daily_revenue, base__stripe__payments],
    resources={
        "bigquery": BigQueryResource(project="my-gcp-project"),
        "dbt": DbtCliResource(project_dir="./transform"),
    },
    schedules=[daily_schedule],
    sensors=[fivetran_complete_sensor],
)

Everything that Dagster knows about your project comes from what you register in Definitions. Assets not listed here don’t appear in the UI. Resources not listed here can’t be injected into assets. This is the single source of truth for your Dagster code location.

For dbt projects, the dagster-dbt integration generates assets from your manifest automatically, and those generated assets get registered in Definitions alongside any Python assets you define.

How SDAs Differ from Airflow Tasks

The comparison clarifies what SDAs provide that traditional orchestrator primitives don’t:

|                | Airflow Task                              | Dagster SDA                                            |
|----------------|-------------------------------------------|--------------------------------------------------------|
| Defines        | An operation to execute                   | A data object to produce                               |
| Dependencies   | Explicit >> or set_downstream()           | Inferred from function arguments                       |
| Identity       | Task ID within a DAG                      | Global asset key across all code                       |
| State tracking | Execution status (success/fail/running)   | Materialization history, freshness, health             |
| Metadata       | Limited (custom XCom)                     | Owners, tags, descriptions, freshness policies         |
| Re-execution   | Re-run the task                           | Re-materialize the asset (and optionally downstreams)  |

The key difference is that an Airflow task is ephemeral — it runs and produces a side effect. A Dagster SDA is persistent — it represents a data object that exists in the world, and Dagster tracks that object’s state over time.

Asset Metadata and Configuration

SDAs support rich metadata that drives behavior in the UI and orchestration layer:

@dg.asset(
    group_name="marketing",
    owners=["team:growth", "adrienne@example.com"],
    tags={"priority": "high", "domain": "marketing"},
    description="Campaign performance aggregated daily from Google Ads and Meta Ads.",
    automation_condition=dg.AutomationCondition.eager(),
)
def mrt__marketing__campaign_performance(
    context: dg.AssetExecutionContext,
    int__google_ads__daily_spend: pd.DataFrame,
    int__meta_ads__daily_spend: pd.DataFrame,
) -> pd.DataFrame:
    ...
  • owners shows up in the asset catalog and lineage views. Filter by owner to see “which assets does the growth team own?” Useful for on-call routing and accountability.
  • tags enable filtering and grouping. Tag assets by domain, priority, or any dimension your team finds useful.
  • automation_condition declares when the asset should be rematerialized — on upstream change, on a cron schedule, or on freshness violation. This replaces explicit schedule or sensor definitions for many use cases.
  • group_name organizes assets visually. Groups collapse in the lineage graph, keeping the view manageable for large projects.

Multi-Asset Definitions

When multiple assets share the same computation (common in dbt, where dbt build produces many tables at once), Dagster supports multi-asset definitions:

@dg.multi_asset(
    outs={
        "raw__events": dg.AssetOut(),
        "raw__users": dg.AssetOut(),
    }
)
def extract_from_api(context: dg.AssetExecutionContext):
    """Extract events and users from the API in one call."""
    data = call_api()
    # Returned tuple maps positionally onto the declared outs
    return data["events"], data["users"]

The [[Dagster-dbt Asset Mapping|@dbt_assets decorator]] is a multi-asset definition under the hood. It reads your manifest and produces one asset per dbt model, all from a single function that runs dbt build.

SDAs and dbt Models

For analytics engineers, the most important thing about SDAs is that your dbt models are already SDAs conceptually. Each model produces a table (the asset), declares its dependencies via ref(), and has metadata in schema.yml. The dagster-dbt integration just makes this explicit to the orchestrator.

Where Python SDAs go beyond dbt is in the steps that SQL can’t handle: API calls, ML feature engineering, file processing, external system triggers. A full-stack pipeline might have Python SDAs for ingestion, dbt SDAs for transformation, and more Python SDAs for downstream processing — all in the same dependency graph.

Learning Considerations

SDAs require comfort with Python decorators, type hints, and function signatures. The patterns are consistent: @dg.asset for individual assets, function arguments for dependencies, Definitions for registration. For teams that write only SQL and Jinja, getting fluent with Python project structure is the primary friction. See Dagster Learning Curve for Analytics Engineers.

The Components abstraction moves asset definition to YAML, reducing the Python barrier at the cost of customization flexibility.