dlt Dependent Resources

Many API pipelines require multi-step traversal: fetch a list of entities, then for each entity fetch its related records. dlt has first-class support for this pattern via dependent resources.

The Pattern

A dependent resource uses data returned by a parent resource to construct its own endpoint URL. In the REST API Source config, this is expressed with a path template:

"resources": [
    {
        "name": "pokeapi_repos",
        "endpoint": {
            "path": "orgs/PokeAPI/repos",
        },
    },
    {
        "name": "pokeapi_repos_commits",
        "endpoint": {
            "path": "repos/PokeAPI/{resources.pokeapi_repos.name}/commits",
        },
    },
],

The {resources.pokeapi_repos.name} template reference in the second resource’s path pulls the name field from each row yielded by the pokeapi_repos resource. For every repository returned by the first endpoint, dlt makes a separate API call to the commits endpoint with that repository’s name substituted in.

If pokeapi_repos returns 30 repositories, dlt makes 30 calls to the commits endpoint — one per repo. The resulting rows from all 30 calls land in the pokeapi_repos_commits table.
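For context, the fragment above might sit inside a complete source config roughly as follows. This is a minimal sketch, not a definitive setup: the base_url, the pipeline name, the duckdb destination, and the dataset name are assumptions, and it presumes a dlt version whose REST API Source accepts the {resources...} placeholder in path.

```python
# Sketch only: base_url, pipeline name, destination, and dataset name are assumptions.
config = {
    "client": {"base_url": "https://api.github.com/"},
    "resources": [
        {
            "name": "pokeapi_repos",
            "endpoint": {"path": "orgs/PokeAPI/repos"},
        },
        {
            "name": "pokeapi_repos_commits",
            "endpoint": {
                # Resolved once per row yielded by pokeapi_repos.
                "path": "repos/PokeAPI/{resources.pokeapi_repos.name}/commits",
            },
        },
    ],
}


def run_pipeline():
    # Imported inside the function so the config above can be inspected
    # even where dlt is not installed.
    import dlt
    from dlt.sources.rest_api import rest_api_source

    pipeline = dlt.pipeline(
        pipeline_name="github_dependent_demo",  # hypothetical name
        destination="duckdb",                   # assumed destination
        dataset_name="github_data",             # hypothetical dataset
    )
    return pipeline.run(rest_api_source(config))
```

Calling run_pipeline() would issue the parent request first and then one commits request sequence per repository, so it is worth trying against a small organization before pointing it at a large one.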

The Template Syntax

The pattern is {resources.<parent_resource_name>.<field_name>}.

<parent_resource_name> must match the name field of the parent resource exactly.
<field_name> is any field returned in the parent resource’s response.

For nested fields, use dot notation: {resources.parent.owner.login} would access the login field inside the owner object of the parent response.
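To make the substitution concrete, here is a plain-Python sketch of how such a template could be resolved against one parent row. It illustrates the behavior described above and is not dlt's actual implementation; render_path, lookup, and the sample row are hypothetical names invented for this example.

```python
import re
from typing import Any

# Matches {resources.<parent_resource_name>.<field_name>}, where the field
# name may contain dots for nested access.
PLACEHOLDER = re.compile(r"\{resources\.([A-Za-z_]\w*)\.([\w.]+)\}")


def lookup(row: dict, dotted: str) -> Any:
    """Follow a dotted field name such as 'owner.login' through nested dicts."""
    value: Any = row
    for key in dotted.split("."):
        value = value[key]
    return value


def render_path(template: str, parent_name: str, row: dict) -> str:
    """Replace every placeholder that refers to parent_name with values from row."""
    def substitute(match: "re.Match[str]") -> str:
        if match.group(1) != parent_name:
            raise KeyError(f"unknown parent resource: {match.group(1)}")
        return str(lookup(row, match.group(2)))

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical parent row, shaped like a GitHub repository object.
row = {"name": "pokeapi", "owner": {"login": "PokeAPI"}}
path = render_path(
    "repos/{resources.pokeapi_repos.owner.login}/{resources.pokeapi_repos.name}/commits",
    "pokeapi_repos",
    row,
)
print(path)  # repos/PokeAPI/pokeapi/commits
```

The parent name check mirrors the rule that <parent_resource_name> must match exactly: a misspelled name fails loudly instead of producing a half-rendered URL.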

Why This Matters for Data Quality

Without dependent resources, you’d have two options: fetch only the parent list (losing the detail data), or write a Python loop that manually calls the child endpoint for each parent and accumulates the results. The manual loop approach works, but it gives up dlt’s automatic pagination, schema inference, and incremental loading for the child resource.

With dependent resources, the child endpoint gets the same treatment as the parent: pagination is handled automatically, schema is inferred, and incremental loading can be configured independently.

Combining with Incremental Loading

Dependent resources and incremental loading compose cleanly. You can configure the child resource to track its own cursor independently of the parent:

{
    "name": "pokeapi_repos_commits",
    "endpoint": {
        "path": "repos/PokeAPI/{resources.pokeapi_repos.name}/commits",
        "params": {
            "since": {
                "type": "incremental",
                "cursor_path": "commit.author.date",
                "initial_value": "2024-01-01T00:00:00Z"
            }
        }
    },
},

Here the GitHub API accepts a since parameter to filter commits by date. dlt stores the maximum commit.author.date seen across all repositories in the previous run and passes it as the since parameter on the next run. The pipeline then requests only commits newer than that stored value, for every repository, not just the first one.

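One thing to be aware of: the cursor state is shared across all invocations of the child resource. If you have 30 repositories and each has commits from different date ranges, dlt tracks a single maximum date across all of them, so the cursor reflects the most recent commit seen globally, not per repository. This is safe as long as new commits always carry dates later than that global maximum. A commit that appears in a quieter repository with an earlier date (for example, one authored before the cursor value but pushed after the last run) falls before the since value and is not fetched on the next run.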
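The effect of a single shared cursor can be sketched in plain Python. The repository names and timestamps below are invented for illustration, and the since filter is modeled as a simple "later than" comparison, which is an assumption about how the API filters rather than a statement of its exact semantics.

```python
from datetime import datetime


def parse(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


# Hypothetical commit dates seen during the previous run, grouped by repository.
seen_last_run = {
    "repo_a": ["2024-03-01T10:00:00Z", "2024-03-05T09:00:00Z"],
    "repo_b": ["2024-02-10T08:00:00Z"],
}

# One cursor for the whole child resource: the global maximum.
shared_cursor = max(parse(ts) for dates in seen_last_run.values() for ts in dates)

# What a per-repository cursor would look like, for comparison only.
per_repo_cursor = {
    repo: max(parse(ts) for ts in dates) for repo, dates in seen_last_run.items()
}

# A commit in repo_b dated after repo_b's last commit but before the global maximum.
late_commit = parse("2024-02-20T12:00:00Z")

fetched_with_shared_cursor = late_commit > shared_cursor                 # False
fetched_with_per_repo_cursor = late_commit > per_repo_cursor["repo_b"]   # True
print(shared_cursor.isoformat(), fetched_with_shared_cursor, fetched_with_per_repo_cursor)
```

If commits with out-of-order dates are a realistic case for your repositories, an earlier initial value or a periodic full reload of the child resource is one way to pick them up.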

Scaling Considerations

Dependent resources multiply your API calls. One parent resource with 100 records means 100 calls to the child endpoint, each potentially paginated across multiple pages. This is usually fine, but it has implications:

Rate limits: APIs commonly rate-limit per time window. 100 parent records × 10 pages per child = 1,000 API calls per run, on top of the calls that page through the parent list. Budget for this, and consider incremental loading on the parent to reduce the fan-out.

Pipeline duration: The calls happen sequentially by default, so slow child requests increase total pipeline time in proportion to the parent record count.

API tokens: GitHub’s REST API allows 5,000 requests per hour for authenticated requests. A pipeline with 100 repositories and 10 pages of commits each makes about 1,000 calls per run, so about five runs within the same hour exhaust that limit.

Setting up authentication (see dlt Authentication Patterns) is particularly important for dependent resource pipelines because of this call volume. Unauthenticated rate limits are typically much lower; GitHub, for example, allows only 60 requests per hour without a token.
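The fan-out arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch: estimate_calls is a hypothetical helper, and it assumes the parent list fits in a single page unless told otherwise.

```python
def estimate_calls(parent_records: int, pages_per_child: int, parent_pages: int = 1) -> int:
    """Requests per run: parent list pages plus one paginated fetch per parent record."""
    return parent_pages + parent_records * pages_per_child


HOURLY_LIMIT = 5000  # authenticated GitHub limit quoted in the text

calls_per_run = estimate_calls(parent_records=100, pages_per_child=10)
full_runs_per_hour = HOURLY_LIMIT // calls_per_run

print(calls_per_run)       # 1001
print(full_runs_per_hour)  # 4
```

Running the same estimate with your own record counts before scheduling a pipeline gives a quick sense of whether the schedule fits inside the API's rate window.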

When to Use Dependent Resources vs. RESTClient

The path template syntax works in the REST API Source declarative config. If you’re using RESTClient directly (the imperative path), you implement the same pattern as an explicit Python loop:

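import dlt
from dlt.sources.helpers.rest_client import RESTClient

# Base URL assumed from the GitHub paths used throughout this article.
client = RESTClient(base_url="https://api.github.com")

@dlt.resource
def repos():
    for page in client.paginate("orgs/PokeAPI/repos"):
        # Yield repositories one at a time so the transformer receives a single dict.
        yield from page

@dlt.transformer(data_from=repos)
def commits(repo):
    # One paginated request sequence per parent repository.
    for page in client.paginate(f"repos/PokeAPI/{repo['name']}/commits"):
        yield page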

The @dlt.transformer decorator is the RESTClient equivalent of dependent resources — it consumes a parent resource and yields child records. The config-based path template in REST API Source is syntactic sugar for this same pattern, and it’s simpler to write when the API follows standard conventions.

For most multi-step API traversals, REST API Source’s path template syntax is the right choice. Switch to RESTClient’s transformer pattern when you need conditional logic in the child resource, dynamic endpoint construction that goes beyond simple field interpolation, or error handling per parent record.
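As an illustration of the kind of logic that pushes you toward the transformer pattern, the sketch below shows conditional skipping and per-parent error handling in plain Python. It is deliberately free of dlt so it can run anywhere: commits_for and fake_fetch_pages are hypothetical names, the archived flag is an assumed field on the parent row, and in a real pipeline the loop body would live inside a @dlt.transformer function with client.paginate in place of the fake fetcher.

```python
from typing import Callable, Iterable, Iterator


def commits_for(
    repos: Iterable[dict],
    fetch_pages: Callable[[str], Iterable[list]],
) -> Iterator[dict]:
    """Yield commits for each repo, skipping archived repos and failed parents."""
    for repo in repos:
        if repo.get("archived"):  # conditional logic per parent record
            continue
        path = f"repos/PokeAPI/{repo['name']}/commits"
        try:
            for page in fetch_pages(path):
                for commit in page:
                    # Tag each child row with the parent it came from.
                    yield {**commit, "repo_name": repo["name"]}
        except RuntimeError as exc:  # error handling per parent record
            print(f"skipping {repo['name']}: {exc}")


def fake_fetch_pages(path: str) -> Iterator[list]:
    """Stand-in for a paginated API call; fails for one repository on purpose."""
    if "broken" in path:
        raise RuntimeError("simulated API failure")
    yield [{"sha": "abc"}, {"sha": "def"}]


repos = [{"name": "pokeapi"}, {"name": "old", "archived": True}, {"name": "broken"}]
rows = list(commits_for(repos, fake_fetch_pages))
print(rows)
```

One failing repository no longer aborts the whole run here, which is the sort of control a single path template cannot express.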

See dlt REST API Source Configuration for the full picture of how dependent resources fit into the broader REST API Source config.