dlt REST API Source Configuration

How to configure dlt's declarative REST API Source — the client block, resources block, endpoint paths, pagination wiring, and what dlt does automatically with the data.

The REST API Source is dlt’s declarative path to building API pipelines. Instead of writing Python code that explicitly manages requests and pagination, you pass a configuration dictionary that describes how the API works, and dlt handles execution. The entire pipeline — auth, pagination, schema inference, normalization — runs from that config.

This note covers what goes in that config and what dlt does with it.

The Config Structure

A REST API Source configuration has two top-level keys: client and resources.

from dlt.sources.rest_api import rest_api_source

source = rest_api_source({
    "client": { ... },      # shared config: base URL, auth, pagination
    "resources": [ ... ],   # list of endpoints to extract
})

The client block defines everything that’s shared across all endpoints. The resources block defines each individual endpoint. This separation means you configure authentication and base URL once — not per endpoint.

The Client Block

"client": {
    "base_url": "https://api.github.com/",
    "paginator": HeaderLinkPaginator(links_next_key="next"),
    "auth": {
        "type": "bearer",
        "token": dlt.secrets["sources.github.api_key"],
    },
},

base_url — The root URL. Endpoint paths in the resources block are appended to this.

paginator — The pagination strategy that applies to all endpoints unless overridden at the resource level. Here, HeaderLinkPaginator is used because GitHub’s API returns pagination links in the Link response header. It’s worth specifying this explicitly: if you omit it, dlt tries to auto-detect the paginator, and when only one page of results is returned, that detection may warn about a missing paginator.

auth — Authentication configuration. The type field maps to dlt’s built-in auth classes: "bearer", "api_key", "http_basic", "oauth2_client_credentials". For the full auth option set, see dlt Authentication Patterns. Credentials should always come from dlt.secrets — never hardcoded. See dlt Secrets Management.

Auth, like the paginator above, also accepts a class instance in place of a config dictionary:

from dlt.sources.helpers.rest_client.paginators import HeaderLinkPaginator
from dlt.sources.helpers.rest_client.auth import BearerTokenAuth

"client": {
    "base_url": "https://api.github.com/",
    "paginator": HeaderLinkPaginator(links_next_key="next"),
    "auth": BearerTokenAuth(token=dlt.secrets["sources.github.api_key"]),
},

Both forms work; the dictionary form is plain data, which makes it easier to serialize and to generate with an LLM.
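To make the portability point concrete, here is a minimal sketch of the same config written entirely in dictionary form, so that it survives a JSON round trip. The "header_link" type name and its links_next_key field are assumptions meant to mirror HeaderLinkPaginator's constructor, so check them against the dlt docs for your version. The with_token helper is hypothetical and exists only to show the secret being injected after loading rather than stored in the file.

```python
import copy
import json

# Dictionary-only config: no class instances, so it is plain, serializable data.
# ASSUMPTION: "header_link" / "links_next_key" mirror HeaderLinkPaginator.
config = {
    "client": {
        "base_url": "https://api.github.com/",
        "paginator": {"type": "header_link", "links_next_key": "next"},
        "auth": {"type": "bearer"},  # token deliberately left out of the file
    },
    "resources": [
        {"name": "orgs-pokeapi-repos", "endpoint": {"path": "orgs/PokeAPI/repos"}},
    ],
}


def with_token(cfg, token):
    """Hypothetical helper: return a copy of cfg with the bearer token filled in."""
    result = copy.deepcopy(cfg)
    result["client"]["auth"]["token"] = token
    return result


# Round trip through JSON, as you would when storing or generating the config.
restored = json.loads(json.dumps(config, indent=2))

# In a real pipeline the token would come from dlt.secrets, not a literal.
runnable = with_token(restored, "<token from dlt.secrets>")
```

The resulting runnable dictionary is what you would pass to rest_api_source().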

The Resources Block

Each entry in the resources list defines one endpoint:

"resources": [
    {
        "name": "orgs-pokeapi-repos",
        "endpoint": {
            "path": "orgs/PokeAPI/repos",
        },
    },
],

name — Sets the resource name, which becomes the destination table name. dlt normalizes it: orgs-pokeapi-repos becomes orgs_pokeapi_repos in the database.

endpoint.path — Appended to base_url. The full URL for the above example becomes https://api.github.com/orgs/PokeAPI/repos.

Resources can override the client-level paginator and auth on a per-endpoint basis. An endpoint that requires a different pagination style or scoped credentials can specify its own config without affecting the others.
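As a sketch of what such an override might look like, the config below adds a second resource that sets its own paginator inside its endpoint block. The rate_limit resource, the "single_page" and "header_link" type names, and the effective_paginator helper are all assumptions made for illustration; the helper only models the precedence rule (resource setting wins, client setting is the fallback) and is not part of dlt.

```python
# Client-level default, shared by every resource that does not override it.
client = {
    "base_url": "https://api.github.com/",
    "paginator": {"type": "header_link"},  # ASSUMPTION: dictionary type name
}

resources = [
    # Inherits the client-level paginator.
    {"name": "orgs-pokeapi-repos", "endpoint": {"path": "orgs/PokeAPI/repos"}},
    # Overrides it: this endpoint is assumed to return a single, unpaginated response.
    {
        "name": "rate_limit",
        "endpoint": {"path": "rate_limit", "paginator": "single_page"},
    },
]


def effective_paginator(client_cfg, resource_cfg):
    """Hypothetical helper: the resource-level paginator wins, else the client's."""
    return resource_cfg["endpoint"].get("paginator", client_cfg.get("paginator"))


chosen = {r["name"]: effective_paginator(client, r) for r in resources}
```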

Creating the Pipeline

The source object feeds into a standard dlt pipeline:

import dlt

pipeline = dlt.pipeline(
    pipeline_name="github_pipeline",
    destination="duckdb",
    dataset_name="github_data",
)
# `source` is the object returned by rest_api_source() earlier in this note
load_info = pipeline.run(source.with_resources("orgs-pokeapi-repos"))

with_resources() lets you run a subset of the resources in the source — useful when you want to test one endpoint before running the full extraction.

What Happens Automatically

When the pipeline runs, several things happen that you did not write:

Pagination: dlt follows every page until the API signals there are no more. With HeaderLinkPaginator, it reads the Link header after each response and continues until no rel="next" link appears.

Schema inference: dlt analyzes the response JSON, detects field types, and creates the destination table schema from what it finds. You don’t write CREATE TABLE statements.

Nested JSON normalization: If the API returns nested objects or arrays, dlt flattens them into relational tables. A repositories endpoint that includes a topics array produces two tables: the main orgs_pokeapi_repos table and a child orgs_pokeapi_repos__topics table. Each child row links back to its parent row through dlt-generated key columns (_dlt_parent_id in the child pointing at _dlt_id in the parent). This happens without any configuration.

Metadata tables: dlt creates three system tables in your dataset:

  • _dlt_loads: one row per completed load, with its load ID, status, and insertion timestamp
  • _dlt_pipeline_state: pipeline state, including incremental state tracking (see dlt Incremental Loading)
  • _dlt_version: schema version history, one row per stored version of the schema

Schema evolution: If the API adds a new field after your first load, dlt detects the new column and runs the necessary ALTER TABLE automatically on the next run.
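The pagination behaviour described above can be sketched in a few lines of plain Python. This is an illustration of the Link-header technique, not dlt's implementation: the next_url and fetch_all functions, the api.example.com URLs, and the in-memory FAKE_PAGES stand-in for an HTTP client are all hypothetical.

```python
import re

# Matches entries such as: <https://host/path?page=2>; rel="next"
LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_url(link_header):
    """Return the URL tagged rel="next" in a Link header, or None if absent."""
    if not link_header:
        return None
    for url, rel in LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


def fetch_all(first_url, get):
    """Follow rel="next" links; get(url) must return (items, link_header)."""
    items, url = [], first_url
    while url:
        page, link_header = get(url)
        items.extend(page)
        url = next_url(link_header)  # None on the last page ends the loop
    return items


# Simulated two-page API so the sketch runs without network access.
PAGE_1 = "https://api.example.com/repos?page=1"
PAGE_2 = "https://api.example.com/repos?page=2"
FAKE_PAGES = {
    PAGE_1: ([{"id": 1}, {"id": 2}], f'<{PAGE_2}>; rel="next", <{PAGE_2}>; rel="last"'),
    PAGE_2: ([{"id": 3}], f'<{PAGE_1}>; rel="prev", <{PAGE_1}>; rel="first"'),
}

all_items = fetch_all(PAGE_1, FAKE_PAGES.__getitem__)
```

The loop stops on the second page because its header carries only prev and first links.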

Examining Results with DuckDB

When the destination is DuckDB, the pipeline creates a .duckdb file in the project directory. To inspect it:

duckdb github_pipeline.duckdb

Inside the DuckDB CLI:

-- See all tables in the dataset
SHOW ALL TABLES;
-- Inspect the schema of a specific table
DESCRIBE github_data.orgs_pokeapi_repos;
-- Query the data
SELECT id, name, owner__url, stargazers_count
FROM github_data.orgs_pokeapi_repos
LIMIT 5;

Notice the owner__url column name: nested object fields are flattened with double-underscore separators. A JSON field owner.url becomes owner__url in the flattened table. This convention is consistent across all dlt destinations.
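The flattening convention can be illustrated with a simplified sketch. It is not dlt's normalizer: the normalize function is hypothetical, and the _row_id and _parent_id columns are stand-ins for the keys dlt generates itself. Only the naming rules from this note are modelled, namely double underscores for nested fields and a parent__field child table for arrays.

```python
from itertools import count


def normalize(records, table):
    """Flatten nested records into {table_name: [rows]} (simplified sketch)."""
    tables = {}
    ids = count(1)  # stand-in for dlt's generated row keys

    def add_row(item, table_name, parent_id=None):
        row_id = next(ids)
        row = {"_row_id": row_id}
        if parent_id is not None:
            row["_parent_id"] = parent_id
        tables.setdefault(table_name, []).append(row)
        fill(row, item, table_name, row_id, prefix="")

    def fill(row, item, table_name, row_id, prefix):
        for key, value in item.items():
            column = f"{prefix}{key}"
            if isinstance(value, dict):
                # Nested object: same row, column names joined with "__".
                fill(row, value, table_name, row_id, prefix=f"{column}__")
            elif isinstance(value, list):
                # Array: one child-table row per element, linked to the parent.
                for element in value:
                    child = element if isinstance(element, dict) else {"value": element}
                    add_row(child, f"{table_name}__{column}", parent_id=row_id)
            else:
                row[column] = value

    for record in records:
        add_row(record, table)
    return tables


repo = {
    "id": 1,
    "name": "pokeapi",
    "owner": {"url": "https://api.github.com/users/PokeAPI"},
    "topics": ["api", "pokemon"],
}
tables = normalize([repo], "orgs_pokeapi_repos")
```

Here tables holds one row in orgs_pokeapi_repos with an owner__url column, plus two rows in orgs_pokeapi_repos__topics that point back to it.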

Adding Incremental Loading

To configure incremental loading on an endpoint, add a params block with an incremental type:

{
    "name": "pokeapi_repos_commits",
    "endpoint": {
        "path": "/repos/PokeAPI/{resources.pokeapi_repos.name}/commits",
        "params": {
            "since": {
                "type": "incremental",
                "cursor_path": "commit.author.date",
                "initial_value": "2024-01-01T00:00:00Z"
            }
        }
    }
}

The since key is the API parameter name that GitHub uses to filter commits by date. cursor_path is a dot-notation path to the timestamp field in the response JSON. initial_value sets the starting point for the first run. On subsequent runs, dlt replaces the initial value with the maximum cursor value from the previous run. See dlt Incremental Loading for the full mechanics of state tracking.

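The {resources.pokeapi_repos.name} placeholder in the path above is a dependent-resource reference: it uses one endpoint’s output to configure another. It resolves only if a resource named pokeapi_repos is defined in the same resources list (the earlier example in this note named its resource orgs-pokeapi-repos). See dlt Dependent Resources for how that works.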
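The cursor mechanics behind the since parameter can be sketched as follows. This is a simplified model rather than dlt's code: get_path and next_cursor are hypothetical helpers, the plain state dictionary stands in for dlt's persisted pipeline state, the sample commits are invented, and the dot-splitting lookup handles only simple paths like the one in this note.

```python
def get_path(record, dotted_path):
    """Look up a value by a simple dot-separated path such as 'commit.author.date'."""
    value = record
    for key in dotted_path.split("."):
        value = value[key]
    return value


def next_cursor(records, cursor_path, last_value):
    """Return the largest cursor value seen so far.

    ISO-8601 UTC timestamps in one fixed format compare correctly as strings.
    """
    values = [get_path(record, cursor_path) for record in records]
    return max([last_value, *values])


# First run starts from initial_value.
state = {"since": "2024-01-01T00:00:00Z"}

first_run = [  # invented sample responses
    {"sha": "a1", "commit": {"author": {"date": "2024-02-03T10:00:00Z"}}},
    {"sha": "b2", "commit": {"author": {"date": "2024-03-15T08:30:00Z"}}},
]
state["since"] = next_cursor(first_run, "commit.author.date", state["since"])

# A later run that returns nothing new keeps the stored cursor unchanged.
second_since = next_cursor([], "commit.author.date", state["since"])
```

After the first run, the stored value is the newest commit date, and that is what the next request would send as since.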

The Declarative Advantage

The config dictionary is predictable, serializable, and easy to generate programmatically. This is why REST API Source pairs well with AI-assisted development: an LLM reading API documentation can produce the config dict reliably, whereas writing bespoke RESTClient code requires more reasoning about control flow.

For APIs with non-standard pagination or complex auth flows, REST API Source hits its limits and you switch to RESTClient. But for the majority of REST APIs — standard pagination, bearer token or API key auth, JSON responses — the declarative approach gets you to production faster.