RESTClient is dlt’s lower-level building block for API extraction. It wraps Python’s requests library and adds pagination, authentication, and retry handling on top. You instantiate a client with configuration, then use paginate() to iterate through API responses within a dlt resource function.
Instantiation
from dlt.sources.helpers.rest_client import RESTClient
from dlt.sources.helpers.rest_client.paginators import OffsetPaginator

client = RESTClient(
    base_url="https://api.example.com/v1",
    headers={"X-API-Version": "2024-01"},
    paginator=OffsetPaginator(limit=100),
)

The key parameters:

base_url: The API root URL, shared across all endpoints the client talks to
headers: Default headers sent with every request: API version pins, content type declarations, anything that belongs on all requests
auth: An authentication strategy object (see dlt Authentication Patterns)
paginator: How to handle multi-page responses (see dlt Pagination Patterns)
data_selector: JSONPath to the actual data in the response, useful when the payload is nested like {"result": {"items": [...]}}
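To make the data_selector idea concrete, here is a minimal pure-Python sketch of what selecting a nested path amounts to. The helper select_path and the sample payload are hypothetical and only handle plain dotted keys; dlt's own selector accepts JSONPath expressions and is not implemented this way.

```python
from typing import Any


def select_path(payload: dict, dotted_path: str) -> Any:
    """Walk a dict along a dotted path such as 'result.items'.

    Hypothetical helper for illustration only: it handles plain keys,
    not full JSONPath syntax.
    """
    current: Any = payload
    for key in dotted_path.split("."):
        current = current[key]  # raises KeyError if the path is wrong
    return current


# Made-up response body shaped like the nested example in the text
response_body = {"result": {"items": [{"id": 1}, {"id": 2}]}, "total": 2}
records = select_path(response_body, "result.items")
print(records)
```

With the real client, the corresponding setting for this payload would presumably be data_selector="result.items", so that each page contains the items rather than the wrapper object.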
The paginate() Method
paginate() does the heavy lifting. Pass it an endpoint path, and it yields pages of data until the API signals there are no more:
import dlt

@dlt.resource(write_disposition="merge", primary_key="id")
def customers():
    for page in client.paginate("/customers"):
        yield page

Each iteration through the loop gives you one page of records: the list of items extracted from that page's parsed JSON response. The paginator handles the mechanics of following next-page links, incrementing offsets, or advancing cursors; you just consume the pages.
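For intuition about what "incrementing offsets" means, the loop below is a simplified model of offset pagination in plain Python. It is a sketch, not dlt's implementation: fetch_page, paginate_by_offset, and the in-memory FAKE_CUSTOMERS list are all invented for illustration, and the stop condition (an empty page) is an assumption about how the API signals the end.

```python
from typing import Callable, Iterator, List

FAKE_CUSTOMERS = [{"id": i} for i in range(1, 8)]  # 7 made-up records


def fetch_page(offset: int, limit: int) -> List[dict]:
    """Stand-in for an HTTP call; slices the in-memory list."""
    return FAKE_CUSTOMERS[offset : offset + limit]


def paginate_by_offset(
    fetch: Callable[[int, int], List[dict]], limit: int
) -> Iterator[List[dict]]:
    """Yield pages, advancing the offset until an empty page comes back."""
    offset = 0
    while True:
        page = fetch(offset, limit)
        if not page:
            break  # assumed end-of-data signal: an empty page
        yield page
        offset += limit


pages = list(paginate_by_offset(fetch_page, limit=3))
print([len(p) for p in pages])  # page sizes: 3, 3, 1
```

The consumer only sees a stream of pages, which is the same shape the resource function above relies on.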
You can pass per-request parameters directly to paginate():
@dlt.resource(write_disposition="merge", primary_key="id")
def orders(updated_since=dlt.sources.incremental("updated_at", initial_value="2024-01-01")):
    params = {"updated_after": updated_since.last_value}
    for page in client.paginate("/orders", params=params):
        yield page

This makes RESTClient composable with dlt’s incremental loading: you pass the cursor value as a query parameter and let the paginator handle everything else.
Error Handling and Resilience
APIs fail. Rate limits hit at 3am. Networks hiccup mid-sync. dlt handles common failure modes automatically so you don’t have to write retry logic by hand.
For HTTP 429 (rate limit) responses, dlt respects Retry-After headers and implements exponential backoff. The default retries up to 5 times. You can tune the backoff behavior:
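from dlt.sources.helpers.requests import Client

client = RESTClient(
    base_url="https://api.example.com",
    # Retry settings live on dlt's requests Client; pass its session to RESTClient
    session=Client(
        raise_for_status=False,       # RESTClient performs its own status checks
        request_backoff_factor=2,     # Exponential backoff multiplier
        request_max_retry_delay=300,  # Maximum seconds between retries
    ).session,
)

With request_backoff_factor=2, the waits between retries grow as 2s, 4s, 8s, 16s, and so on, each capped at request_max_retry_delay. For APIs with aggressive rate limiting, request_max_retry_delay=300 means no single wait exceeds 5 minutes; the number of attempts is controlled separately (request_max_attempts).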
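The backoff arithmetic can be checked with a few lines of plain Python. This is a sketch under the assumption that each wait is the backoff factor times a power of two, capped at the maximum delay; it ignores Retry-After headers and any jitter, and backoff_waits is a hypothetical helper, not a dlt function.

```python
from typing import List


def backoff_waits(factor: float, max_delay: float, retries: int) -> List[float]:
    """Wait before each retry: factor * 2**n, capped at max_delay."""
    return [min(factor * 2 ** n, max_delay) for n in range(retries)]


# Doubling schedule with a generous cap
uncapped = backoff_waits(factor=2, max_delay=300, retries=5)
# Same schedule with a low cap, to show the ceiling taking effect
capped = backoff_waits(factor=2, max_delay=10, retries=5)
print(uncapped)
print(capped)
```

The second call shows why the maximum delay matters: once the doubling passes the cap, every later wait stays at the cap instead of growing further.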
This automatic retry behavior means your RESTClient pipelines are resilient to transient failures without any additional code. Permanent failures — 401 Unauthorized, 404 Not Found on a bad endpoint — surface immediately without wasting retry attempts.
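The split between transient and permanent failures can be pictured as a simple predicate on the status code. The sketch below is an assumption for illustration: should_retry is a hypothetical function, and treating all 5xx responses as retryable goes beyond what the text states, so check which status codes your dlt version actually retries.

```python
def should_retry(status_code: int) -> bool:
    """Assumed policy: retry rate limits (429) and server errors (5xx) only."""
    return status_code == 429 or 500 <= status_code < 600


for code in (429, 503, 401, 404):
    action = "retry with backoff" if should_retry(code) else "fail immediately"
    print(code, action)
```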
Building dlt Resources
RESTClient instances are designed to be shared across multiple resources in the same source:
import dlt
from dlt.sources.helpers.rest_client import RESTClient
from dlt.sources.helpers.rest_client.auth import BearerTokenAuth
from dlt.sources.helpers.rest_client.paginators import JSONLinkPaginator

@dlt.source
def my_api_source():
    # One client holds the shared base URL, auth, and paginator
    client = RESTClient(
        base_url="https://api.example.com/v1",
        auth=BearerTokenAuth(token=dlt.secrets["api_token"]),
        paginator=JSONLinkPaginator(next_url_path="next"),
    )

    @dlt.resource(write_disposition="merge", primary_key="id")
    def customers():
        for page in client.paginate("/customers"):
            yield page

    @dlt.resource(write_disposition="merge", primary_key="id")
    def orders():
        for page in client.paginate("/orders"):
            yield page

    return customers, orders

The client carries the shared configuration (base URL, auth, default headers) while each resource function defines which endpoint to hit and how to yield the data. This pattern keeps credential handling centralized and avoids repeating configuration across every resource.
When RESTClient Shines
RESTClient’s flexibility pays off when the API does something non-standard: a custom auth flow, a pagination scheme that needs stateful logic, or response handling that requires conditional yielding based on response content. For cases where REST API Source runs out of configuration options, RESTClient is the escape hatch that keeps you within the dlt ecosystem rather than forcing a full custom implementation.