dlt RESTClient Mechanics

How dlt's RESTClient works — instantiation, the paginate() method, key parameters, and built-in error handling with retry and backoff.

RESTClient is dlt’s lower-level building block for API extraction. It wraps Python’s requests library and adds pagination, authentication, and retry handling on top. You instantiate a client with configuration, then use paginate() to iterate through API responses within a dlt resource function.

Instantiation

from dlt.sources.helpers.rest_client import RESTClient
from dlt.sources.helpers.rest_client.paginators import OffsetPaginator

client = RESTClient(
    base_url="https://api.example.com/v1",
    headers={"X-API-Version": "2024-01"},
    paginator=OffsetPaginator(limit=100),
)

The key parameters:

  • base_url: The API root URL, shared across all endpoints the client talks to
  • headers: Default headers sent with every request — API version pins, content type declarations, anything that belongs on all requests
  • auth: An authentication strategy object (see dlt Authentication Patterns)
  • paginator: How to handle multi-page responses (see dlt Pagination Patterns)
  • data_selector: JSONPath to the actual data in the response — useful when the payload is nested like {"result": {"items": [...]}}
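
To make the data_selector bullet concrete, here is a minimal plain-Python sketch of what a dotted selector such as result.items picks out of a nested payload. The helper name select_data is hypothetical and only handles simple dotted keys; dlt evaluates JSONPath expressions, so treat this as an illustration of the idea rather than the library's implementation.

```python
from typing import Any


def select_data(payload: dict, selector: str) -> Any:
    """Walk a nested dict along a dotted path like "result.items".

    Hypothetical helper for illustration; it supports plain keys only,
    not the full JSONPath syntax.
    """
    current: Any = payload
    for key in selector.split("."):
        current = current[key]
    return current


# Sample response shaped like the nested payload described above.
response_json = {"result": {"items": [{"id": 1}, {"id": 2}], "count": 2}}
records = select_data(response_json, "result.items")
```

With the real client, the corresponding setting for this payload would be data_selector="result.items".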

The paginate() Method

paginate() does the heavy lifting. Pass it an endpoint path, and it yields pages of data until the API signals there is no more data:

import dlt

@dlt.resource(write_disposition="merge", primary_key="id")
def customers():
    # Each page is a list of records extracted from one API response
    for page in client.paginate("/customers"):
        yield page

Each iteration of the loop gives you one page of records: a list-like object holding the items extracted from that page’s JSON response (using data_selector when one is set). The paginator handles the mechanics of following next-page links, incrementing offsets, or advancing cursors; you just consume the pages.
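
As a rough mental model of what an offset-based paginator does on your behalf, the sketch below pages through an in-memory stand-in for an API. Everything here (fake_api, paginate_by_offset, the total and items keys) is hypothetical and written in plain Python; it is not dlt's code, only the loop you would otherwise write by hand.

```python
from typing import Iterator

# In-memory stand-in for a remote collection (hypothetical data).
ALL_CUSTOMERS = [{"id": i} for i in range(1, 8)]


def fake_api(offset: int, limit: int) -> dict:
    """Pretend endpoint that returns one slice plus the total count."""
    return {
        "total": len(ALL_CUSTOMERS),
        "items": ALL_CUSTOMERS[offset:offset + limit],
    }


def paginate_by_offset(limit: int) -> Iterator[list]:
    """Yield one list of records per page until the offset reaches the total."""
    offset = 0
    while True:
        body = fake_api(offset, limit)
        yield body["items"]
        offset += limit
        if offset >= body["total"]:
            break


pages = list(paginate_by_offset(limit=3))
```

Seven records with a limit of three produce three pages; the last one holds the single remaining record.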

You can pass per-request parameters directly to paginate():

@dlt.resource(write_disposition="merge", primary_key="id")
def orders(updated_since=dlt.sources.incremental("updated_at", initial_value="2024-01-01")):
    # Ask the API only for records changed since the last stored cursor value
    params = {"updated_after": updated_since.last_value}
    for page in client.paginate("/orders", params=params):
        yield page

This makes RESTClient composable with dlt’s incremental loading — you pass the cursor value as a query parameter and let the paginator handle everything else.
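
The cursor hand-off can be pictured with a small plain-Python sketch. It assumes an API that returns records changed strictly after the updated_after value; fetch_orders, next_cursor, and the sample data are hypothetical stand-ins, and in a real pipeline dlt tracks the cursor state for you.

```python
ORDERS = [
    {"id": 1, "updated_at": "2023-12-30"},
    {"id": 2, "updated_at": "2024-01-05"},
    {"id": 3, "updated_at": "2024-02-10"},
]


def fetch_orders(updated_after: str) -> list:
    """Pretend endpoint: return orders changed strictly after the cursor."""
    # ISO-8601 date strings compare correctly as plain strings.
    return [o for o in ORDERS if o["updated_at"] > updated_after]


def next_cursor(records: list, current: str) -> str:
    """Advance the cursor to the newest updated_at seen, or keep it if nothing is new."""
    return max([r["updated_at"] for r in records] + [current])


first_run = fetch_orders("2024-01-01")
cursor = next_cursor(first_run, "2024-01-01")
second_run = fetch_orders(cursor)
```

The first run picks up the two orders changed after the initial value, and a second run with the advanced cursor returns nothing new.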

Error Handling and Resilience

APIs fail. Rate limits hit at 3am. Networks hiccup mid-sync. dlt handles common failure modes automatically so you don’t have to write retry logic by hand.

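For HTTP 429 (rate limit) responses, dlt respects Retry-After headers and otherwise applies exponential backoff. By default, a request is attempted up to 5 times. The backoff settings live on dlt’s requests helper rather than on RESTClient itself, so you tune them by building a configured session and passing it to the client:

from dlt.sources.helpers import requests

# Retry settings belong to dlt's requests helper; pass its session to the client.
retrying = requests.Client(
    request_backoff_factor=2,  # Exponential backoff multiplier
    request_max_retry_delay=300,  # Maximum seconds between retries
    raise_for_status=False,  # Let RESTClient check response status itself
)
client = RESTClient(base_url="https://api.example.com", session=retrying.session)

With request_backoff_factor=2, the wait doubles after each failed attempt: roughly 2s, 4s, 8s, 16s, and so on, with every individual wait capped at request_max_retry_delay. For APIs with aggressive rate limiting, request_max_retry_delay=300 lets a single wait grow to as much as 5 minutes; the number of attempts is limited separately by request_max_attempts.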
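
The doubling schedule described above can be worked out in a few lines of plain Python. This is a sketch under the assumption that the first wait equals the backoff factor and each later wait doubles, capped at the maximum delay; backoff_delays is a hypothetical helper, not a dlt function, and it ignores any Retry-After header the server may send.

```python
def backoff_delays(factor: float, max_delay: float, retries: int) -> list:
    """Wait (in seconds) before each retry: factor * 2**n, capped at max_delay.

    Assumption: the first wait equals `factor` and each later wait doubles,
    matching the schedule described in the text.
    """
    return [min(factor * 2 ** n, max_delay) for n in range(retries)]


uncapped = backoff_delays(factor=2, max_delay=300, retries=5)
capped = backoff_delays(factor=2, max_delay=10, retries=5)
```

With a 300-second cap the first five waits never reach the ceiling; with a 10-second cap the schedule flattens out from the fourth wait onward.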

This automatic retry behavior means your RESTClient pipelines are resilient to transient failures without any additional code. Permanent failures — 401 Unauthorized, 404 Not Found on a bad endpoint — surface immediately without wasting retry attempts.
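
The split between transient and permanent failures can be sketched as a simple decision rule. The rule below assumes that 429 and 5xx responses count as transient while other 4xx responses do not; is_transient and handle are hypothetical names used for illustration, not part of dlt.

```python
def is_transient(status_code: int) -> bool:
    """Assumed rule: rate limits (429) and server errors (5xx) are worth retrying."""
    return status_code == 429 or 500 <= status_code < 600


def handle(status_code: int, attempt: int, max_attempts: int) -> str:
    """Decide what a client would do with a response under the assumed rule."""
    if status_code < 400:
        return "ok"
    if is_transient(status_code) and attempt < max_attempts:
        return "retry"
    return "raise"


decisions = [handle(code, attempt=1, max_attempts=5) for code in (200, 429, 503, 401, 404)]
exhausted = handle(429, attempt=5, max_attempts=5)
```

A 401 or 404 raises on the first attempt, while a 429 is retried until the attempt budget runs out and then raises as well.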

Building dlt Resources

RESTClient instances are designed to be shared across multiple resources in the same source:

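import dlt
from dlt.sources.helpers.rest_client import RESTClient
from dlt.sources.helpers.rest_client.auth import BearerTokenAuth
from dlt.sources.helpers.rest_client.paginators import JSONLinkPaginator

@dlt.source
def my_api_source():
    # One client holds the shared base URL, auth, and pagination settings
    client = RESTClient(
        base_url="https://api.example.com/v1",
        auth=BearerTokenAuth(token=dlt.secrets["api_token"]),
        paginator=JSONLinkPaginator(next_url_path="next"),
    )

    @dlt.resource(write_disposition="merge", primary_key="id")
    def customers():
        for page in client.paginate("/customers"):
            yield page

    @dlt.resource(write_disposition="merge", primary_key="id")
    def orders():
        for page in client.paginate("/orders"):
            yield page

    return customers, orders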

The client carries the shared configuration — base URL, auth, default headers — while each resource function defines what endpoint to hit and how to yield the data. This pattern keeps credential handling centralized and avoids repeating configuration across every resource.

When RESTClient Shines

RESTClient’s flexibility pays off when the API does something non-standard: a custom auth flow, a pagination scheme that needs stateful logic, or response handling that requires conditional yielding based on response content. For cases where REST API Source runs out of configuration options, RESTClient is the escape hatch that keeps you within the dlt ecosystem rather than forcing a full custom implementation.