A Software-Defined Asset (SDA) is the central building block of Dagster. It’s a Python function decorated with @dg.asset that produces a persistent data object — a BigQuery table, a GCS file, a Pandas DataFrame, a dbt model. Each asset has a unique key, upstream dependencies inferred from function arguments, and metadata like owners, tags, and freshness policies.
The “software-defined” part means the asset’s definition (what it is, what it depends on, who owns it, how fresh it should be) lives in code alongside the logic that produces it. This is the same principle behind dbt models: the SQL that creates a table and the YAML that describes it live together in version control.
## The Basic Pattern
```python
import dagster as dg
import pandas as pd

@dg.asset(
    group_name="finance",
    owners=["team:analytics"],
)
def mrt__finance__daily_revenue(
    context: dg.AssetExecutionContext,
    base__stripe__payments: pd.DataFrame,
) -> pd.DataFrame:
    """Daily revenue aggregated from Stripe payments."""
    return base__stripe__payments.groupby("payment_date").agg(
        total_revenue=("amount_usd", "sum")
    )
```

Three things happen in this definition:
- **The function name becomes the asset key.** `mrt__finance__daily_revenue` is how this asset is identified everywhere in Dagster — the UI, the lineage graph, schedules, sensors.
- **Function arguments declare dependencies.** The argument `base__stripe__payments` tells Dagster this asset depends on another asset with that key. Dagster builds the dependency graph automatically — no explicit DAG definition, no `depends_on` configuration. If you add a new argument, the graph updates.
- **Metadata is declarative.** `group_name` organizes assets in the UI. `owners` enables filtering by team. The docstring becomes the asset's description in the asset catalog.
This automatic dependency inference is the mechanism that makes Dagster’s asset-centric model practical. You declare what each asset needs (its function arguments) and what it produces (its return value), and the orchestrator resolves the rest.
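To make the mechanism concrete, here is a toy illustration of argument-based dependency inference using only the standard library. This is not Dagster's actual implementation — just a sketch of the idea that a function's parameter names can double as upstream asset keys:

```python
import inspect

def infer_deps(fn):
    """Toy dependency inference: every parameter except `context`
    is treated as the key of an upstream asset."""
    return [name for name in inspect.signature(fn).parameters if name != "context"]

# A stand-in for a decorated asset function.
def mrt__finance__daily_revenue(context, base__stripe__payments):
    ...

print(infer_deps(mrt__finance__daily_revenue))  # ['base__stripe__payments']
```

Adding a new parameter to the function would change the inferred list, which is exactly how the asset graph stays in sync with the code.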
## The Definitions Object
The Definitions object is the top-level registry that tells Dagster what exists: all your assets, resources, schedules, and sensors. Think of it as the dbt_project.yml equivalent — one Definitions object per code location, and Dagster reads it to build the asset graph.
```python
import dagster as dg

defs = dg.Definitions(
    assets=[mrt__finance__daily_revenue, base__stripe__payments],
    resources={
        "bigquery": BigQueryResource(project="my-gcp-project"),
        "dbt": DbtCliResource(project_dir="./transform"),
    },
    schedules=[daily_schedule],
    sensors=[fivetran_complete_sensor],
)
```

Everything that Dagster knows about your project comes from what you register in `Definitions`. Assets not listed here don't appear in the UI. Resources not listed here can't be injected into assets. This is the single source of truth for your Dagster code location.
For dbt projects, the dagster-dbt integration generates assets from your manifest automatically, and those generated assets get registered in Definitions alongside any Python assets you define.
## How SDAs Differ from Airflow Tasks
The comparison clarifies what SDAs provide that traditional orchestrator primitives don’t:
| | Airflow Task | Dagster SDA |
|---|---|---|
| Defines | An operation to execute | A data object to produce |
| Dependencies | Explicit >> or set_downstream() | Inferred from function arguments |
| Identity | Task ID within a DAG | Global asset key across all code |
| State tracking | Execution status (success/fail/running) | Materialization history, freshness, health |
| Metadata | Limited (custom XCom) | Owners, tags, descriptions, freshness policies |
| Re-execution | Re-run the task | Re-materialize the asset (and optionally downstreams) |
The key difference is that an Airflow task is ephemeral — it runs and produces a side effect. A Dagster SDA is persistent — it represents a data object that exists in the world, and Dagster tracks that object’s state over time.
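A toy sketch makes the distinction tangible: instead of recording only a run's pass/fail status, per-asset state accumulates over time. The class and method names here are hypothetical, not Dagster APIs:

```python
from datetime import datetime, timezone

class AssetLedger:
    """Toy sketch of asset-centric state tracking: each asset key
    accumulates a materialization history rather than a single run status."""
    def __init__(self):
        self.history = {}

    def record_materialization(self, asset_key):
        self.history.setdefault(asset_key, []).append(datetime.now(timezone.utc))

    def num_materializations(self, asset_key):
        return len(self.history.get(asset_key, []))

ledger = AssetLedger()
ledger.record_materialization("mrt__finance__daily_revenue")
ledger.record_materialization("mrt__finance__daily_revenue")
print(ledger.num_materializations("mrt__finance__daily_revenue"))  # 2
```

Freshness and health checks follow naturally from this shape of state: they are questions about an asset's history, not about any single run.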
## Asset Metadata and Configuration
SDAs support rich metadata that drives behavior in the UI and orchestration layer:
```python
@dg.asset(
    group_name="marketing",
    owners=["team:growth", "adrienne@example.com"],
    tags={"priority": "high", "domain": "marketing"},
    description="Campaign performance aggregated daily from Google Ads and Meta Ads.",
    automation_condition=dg.AutomationCondition.eager(),
)
def mrt__marketing__campaign_performance(
    context: dg.AssetExecutionContext,
    int__google_ads__daily_spend: pd.DataFrame,
    int__meta_ads__daily_spend: pd.DataFrame,
) -> pd.DataFrame:
    ...
```

- `owners` shows up in the asset catalog and lineage views. Filter by owner to see "which assets does the growth team own?" — useful for on-call routing and accountability.
- `tags` enable filtering and grouping. Tag assets by domain, priority, or any dimension your team finds useful.
- `automation_condition` declares when the asset should be rematerialized — on upstream change, on a cron schedule, or on freshness violation. This replaces explicit schedule or sensor definitions for many use cases.
- `group_name` organizes assets visually. Groups collapse in the lineage graph, keeping the view manageable for large projects.
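The owner filter described above amounts to a simple query over asset metadata. A minimal sketch, using a hypothetical in-memory catalog (the keys and metadata are illustrative, not a Dagster API):

```python
# Hypothetical catalog of asset metadata records.
catalog = [
    {"key": "mrt__marketing__campaign_performance",
     "owners": ["team:growth"], "tags": {"priority": "high"}},
    {"key": "mrt__finance__daily_revenue",
     "owners": ["team:analytics"], "tags": {"priority": "medium"}},
]

def assets_owned_by(catalog, owner):
    """Toy version of the catalog's owner filter."""
    return [a["key"] for a in catalog if owner in a["owners"]]

print(assets_owned_by(catalog, "team:growth"))  # ['mrt__marketing__campaign_performance']
```

The same pattern extends to tags: any metadata you declare on the asset becomes a dimension you can filter on.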
## Multi-Asset Definitions
When multiple assets share the same computation (common in dbt, where dbt build produces many tables at once), Dagster supports multi-asset definitions:
```python
@dg.multi_asset(
    outs={
        "raw__events": dg.AssetOut(),
        "raw__users": dg.AssetOut(),
    }
)
def extract_from_api(context):
    """Extract events and users from the API in one call."""
    data = call_api()
    return data["events"], data["users"]
```

The [[Dagster-dbt Asset Mapping|@dbt_assets decorator]] is a multi-asset definition under the hood. It reads your manifest and produces one asset per dbt model, all from a single function that runs `dbt build`.
## SDAs and dbt Models
For analytics engineers, the most important thing about SDAs is that your dbt models are already SDAs conceptually. Each model produces a table (the asset), declares its dependencies via ref(), and has metadata in schema.yml. The dagster-dbt integration just makes this explicit to the orchestrator.
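As a concrete parallel, consider a hypothetical dbt model (the model and column names are illustrative). It carries the same ingredients as a Python SDA: the asset is the table it builds, and its dependency is declared through `ref()` just as a Python asset declares one through a function argument:

```sql
-- models/marts/mrt__finance__daily_revenue.sql (hypothetical model)
-- ref() plays the same role as a Python asset's function argument.
select
    payment_date,
    sum(amount_usd) as total_revenue
from {{ ref('base__stripe__payments') }}
group by payment_date
```

The model's description, owner, and tests in `schema.yml` correspond to the `description`, `owners`, and checks on a Python asset, which is why the dagster-dbt mapping feels natural rather than bolted on.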
Where Python SDAs go beyond dbt is in the steps that SQL can’t handle: API calls, ML feature engineering, file processing, external system triggers. A full-stack pipeline might have Python SDAs for ingestion, dbt SDAs for transformation, and more Python SDAs for downstream processing — all in the same dependency graph.
## Learning Considerations
SDAs require comfort with Python decorators, type hints, and function signatures. The patterns are consistent: `@dg.asset` for individual assets, function arguments for dependencies, `Definitions` for registration. For teams that write only SQL and Jinja, getting fluent with Python project structure is the primary friction. See [[Dagster Learning Curve for Analytics Engineers]].
The Components abstraction moves asset definition to YAML, reducing the Python barrier at the cost of customization flexibility.