
dlt Pagination Patterns

The built-in paginators dlt provides for common API patterns, and how to extend BasePaginator for APIs that don't follow standard conventions.

Planted
Tags: dlt, data engineering, etl

dlt provides a paginator system for handling multi-page API responses. You select a paginator matching your API’s pagination style, configure it, and attach it to a RESTClient. The paginate() method handles iteration from there.

Built-In Paginators

dlt ships with paginators for the patterns you’ll encounter in real APIs:

JSONLinkPaginator — For APIs that return the next-page URL inside the JSON body. Common in modern REST APIs.

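```python
from dlt.sources.helpers.rest_client import RESTClient
from dlt.sources.helpers.rest_client.paginators import JSONLinkPaginator

# API returns: {"data": [...], "next": "https://api.example.com/v1/items?cursor=abc123"}
client = RESTClient(
    base_url="https://api.example.com/v1",
    paginator=JSONLinkPaginator(next_url_path="next"),
)
for page in client.paginate("items"):
    print(len(page))  # one iteration per page of records
```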

The next_url_path is a JSONPath expression pointing to the next-page URL in the response body. It defaults to "next", and nested locations such as "pagination.next" work too.
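The stop rule for link-following pagination is simple: request the URL, yield the records, read the next URL, and stop when it is missing or null. The library-free sketch below illustrates that loop. It is not dlt's implementation: `resolve_path` only understands dotted keys rather than full JSONPath, and `FAKE_API`, `follow_json_links`, and the `data` key are hypothetical stand-ins for a real endpoint.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def resolve_path(body: Dict[str, Any], path: str) -> Optional[Any]:
    """Walk a dotted path such as "pagination.next" (a simplified stand-in for JSONPath)."""
    node: Any = body
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def follow_json_links(
    fetch: Callable[[str], Dict[str, Any]],
    first_url: str,
    next_url_path: str = "next",
    data_key: str = "data",
) -> Iterator[List[Any]]:
    """Yield one page of records per request until the body has no next URL."""
    url: Optional[str] = first_url
    while url:
        body = fetch(url)
        yield body.get(data_key, [])
        url = resolve_path(body, next_url_path)  # None or "" ends the loop


# Hypothetical in-memory "API": URL -> JSON body.
FAKE_API = {
    "https://api.example.com/v1/items": {
        "data": [1, 2],
        "next": "https://api.example.com/v1/items?cursor=abc123",
    },
    "https://api.example.com/v1/items?cursor=abc123": {"data": [3], "next": None},
}

pages = list(follow_json_links(lambda url: FAKE_API[url], "https://api.example.com/v1/items"))
print(pages)  # [[1, 2], [3]]
```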

HeaderLinkPaginator — For APIs that return the next-page URL in the Link response header. GitHub’s API uses this pattern (Link: <https://api.github.com/...>; rel="next"). No configuration needed beyond instantiation — the paginator parses the standard RFC 5988 link header format.
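To show what parsing the Link header involves, here is a simplified parser. In practice you rarely write this yourself, because requests already exposes the parsed header as `response.links`. The regex below is a hypothetical, deliberately small version that assumes no commas inside parameter values and ignores every parameter except rel.

```python
import re
from typing import Dict

# One `<url>; ...rel="name"` entry. Simplified: assumes no commas inside
# parameter values and ignores every parameter except rel.
LINK_ENTRY = re.compile(r'<([^>]*)>[^,]*?\brel="?([^";,]+)"?')


def parse_link_header(value: str) -> Dict[str, str]:
    """Map each rel name in a Link header to its URL."""
    links: Dict[str, str] = {}
    for url, rels in LINK_ENTRY.findall(value):
        for rel in rels.split():  # rel="next last" names two relations
            links[rel] = url
    return links


header = (
    '<https://api.example.com/items?page=2>; rel="next", '
    '<https://api.example.com/items?page=5>; rel="last"'
)
print(parse_link_header(header)["next"])  # https://api.example.com/items?page=2

# On the last page the server omits rel="next", which is the stop signal.
last_page = '<https://api.example.com/items?page=4>; rel="prev"'
print("next" in parse_link_header(last_page))  # False
```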

OffsetPaginator — Classic offset/limit pagination. Increments the offset parameter by the page size after each response.

```python
from dlt.sources.helpers.rest_client.paginators import OffsetPaginator

paginator = OffsetPaginator(limit=100)
# Requests: ?limit=100&offset=0, ?limit=100&offset=100, ?limit=100&offset=200, ...
```

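By default it stops when the offset reaches the total count read from the response (the total_path argument, "total" unless overridden), when maximum_offset is reached if you set one, or when a page comes back empty. For APIs that don't report a total, pass total_path=None so the empty page becomes the stop signal.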
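The sketch below simulates the stop conditions suggested by OffsetPaginator's total_path, maximum_offset, and empty-page arguments against an in-memory list. It is a stdlib-only model under those assumptions, not dlt's code. `offset_pages` and its `report_total` flag are hypothetical names.

```python
from typing import Iterator, List, Optional, Tuple


def offset_pages(
    records: List[int],
    limit: int,
    report_total: bool = True,
    maximum_offset: Optional[int] = None,
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (offset, page) pairs, simulating ?limit=<limit>&offset=<offset> calls."""
    offset = 0
    while True:
        page = records[offset : offset + limit]  # the simulated API response
        total = len(records) if report_total else None
        if not page:  # empty page: nothing left to read
            return
        yield offset, page
        offset += limit
        if total is not None and offset >= total:  # reported total reached
            return
        if maximum_offset is not None and offset >= maximum_offset:  # hard cap reached
            return


records = list(range(250))
print([o for o, _ in offset_pages(records, 100)])  # [0, 100, 200]
print([o for o, _ in offset_pages(records, 100, report_total=False)])  # [0, 100, 200]
print([o for o, _ in offset_pages(records, 100, maximum_offset=200)])  # [0, 100]
```

Without a reported total, the sketch makes a fourth call at offset=300 and stops only when that page comes back empty. The short 50-record page at offset 200 is not treated as a stop signal here.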

PageNumberPaginator — Simple page number increments. For APIs with ?page=1, ?page=2, etc.
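One detail to check is where numbering starts. PageNumberPaginator takes a base_page argument, and as far as I know it defaults to 0, so an API that numbers pages from 1 needs base_page=1. Verify this against the signature in your installed dlt version. The stdlib sketch below, with a hypothetical 1-based API and a total page count, shows what a mismatched starting page does in this simplified model: the last page is silently skipped.

```python
from typing import Dict, Iterator, List

# Hypothetical 1-based API: page number -> records, reporting 3 pages in total.
PAGES: Dict[int, List[str]] = {1: ["a", "b"], 2: ["c", "d"], 3: ["e"]}
TOTAL_PAGES = 3


def fetch_page(page: int) -> List[str]:
    """Simulated GET ?page=<page>; out-of-range pages come back empty."""
    return PAGES.get(page, [])


def walk_pages(base_page: int) -> Iterator[List[str]]:
    """Request base_page, base_page + 1, ... until TOTAL_PAGES pages have been asked for."""
    page = base_page
    while page < base_page + TOTAL_PAGES:
        yield fetch_page(page)
        page += 1


print(list(walk_pages(base_page=1)))  # [['a', 'b'], ['c', 'd'], ['e']]
print(list(walk_pages(base_page=0)))  # [[], ['a', 'b'], ['c', 'd']]
```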

JSONResponseCursorPaginator — For APIs that return an opaque cursor token in the response body. The paginator extracts the cursor, passes it as a query parameter on the next request, and stops when no cursor is returned.
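The essential property of cursor pagination is that the client never interprets the token: it reads it from the body and echoes it back unchanged. In dlt this corresponds to JSONResponseCursorPaginator's cursor_path and cursor_param arguments. The sketch mirrors what I understand their defaults to be ("cursors.next" and "cursor"), but check them against your dlt version. The responses and token values are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Optional

CURSOR_PARAM = "cursor"  # query parameter the hypothetical API expects

# Hypothetical API: cursor received -> response body (None = first request).
RESPONSES: Dict[Optional[str], Dict[str, Any]] = {
    None: {"items": [1, 2], "cursors": {"next": "opaque-7f3a"}},
    "opaque-7f3a": {"items": [3, 4], "cursors": {"next": "opaque-91bc"}},
    "opaque-91bc": {"items": [5], "cursors": {"next": None}},
}


def fetch(params: Dict[str, str]) -> Dict[str, Any]:
    """Simulated GET with the given query parameters."""
    return RESPONSES[params.get(CURSOR_PARAM)]


def walk_cursor() -> Iterator[List[int]]:
    params: Dict[str, str] = {}
    while True:
        body = fetch(params)
        yield body["items"]
        cursor = body.get("cursors", {}).get("next")  # read from "cursors.next"
        if not cursor:  # no cursor returned: this was the last page
            return
        params[CURSOR_PARAM] = cursor  # echo the opaque token back unchanged


print(list(walk_cursor()))  # [[1, 2], [3, 4], [5]]
```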

Custom Paginators

When none of the built-in paginators fit, extend BasePaginator. You implement two methods:

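  • update_state(): Called after each response with the response and the records extracted from it. Parse whatever pagination state you need to carry forward, and set self._has_next_page = False when there are no more pages.
  • update_request(): Called before each follow-up request. Modify the outgoing request to include the pagination state. The very first request goes through the optional init_request() hook instead.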
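```python
from typing import Any, List, Optional

from requests import Request, Response
from dlt.sources.helpers.rest_client.paginators import BasePaginator


class ProprietaryHeaderPaginator(BasePaginator):
    def __init__(self) -> None:
        super().__init__()
        self._next_token: Optional[str] = None

    def update_state(self, response: Response, data: Optional[List[Any]] = None) -> None:
        # Extract the pagination token from a custom header; no token means last page
        self._next_token = response.headers.get("X-Next-Page-Token")
        if not self._next_token:
            self._has_next_page = False

    def update_request(self, request: Request) -> None:
        # Send the token back as a query parameter on the next request
        if self._next_token:
            request.params = request.params or {}
            request.params["page_token"] = self._next_token
```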

The framework calls these methods in sequence, manages the loop, and handles the stop condition when _has_next_page is False. Your code focuses on the API-specific parsing logic, not the pagination loop itself.
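To see why a subclass like ProprietaryHeaderPaginator only needs the two hooks, here is a stdlib-only model of the loop. This is not dlt's code. `MiniPaginator`, `paginate`, `SERVER`, and the dict-shaped request and response are hypothetical simplifications of BasePaginator, RESTClient.paginate(), and the requests objects. The ordering follows the description above: update_state() runs after each response, update_request() runs before the next one, and the loop exits once the stop flag is cleared.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Request = Dict[str, Any]   # hypothetical stand-in for requests.Request
Response = Dict[str, Any]  # hypothetical stand-in for requests.Response


class MiniPaginator:
    """Hypothetical stand-in for BasePaginator: pagination state plus a stop flag."""

    def __init__(self) -> None:
        self._has_next_page = True

    @property
    def has_next_page(self) -> bool:
        return self._has_next_page

    def init_request(self, request: Request) -> None:
        pass  # optional hook before the first request

    def update_state(self, response: Response, data: Optional[List[Any]] = None) -> None:
        raise NotImplementedError

    def update_request(self, request: Request) -> None:
        raise NotImplementedError


class HeaderTokenPaginator(MiniPaginator):
    def __init__(self) -> None:
        super().__init__()
        self._next_token: Optional[str] = None

    def update_state(self, response: Response, data: Optional[List[Any]] = None) -> None:
        self._next_token = response["headers"].get("X-Next-Page-Token")
        if not self._next_token:
            self._has_next_page = False

    def update_request(self, request: Request) -> None:
        request["params"]["page_token"] = self._next_token


def paginate(send: Callable[[Request], Response], paginator: MiniPaginator) -> Iterator[List[Any]]:
    """Own the loop so paginator subclasses only hold API-specific parsing."""
    request: Request = {"params": {}}
    paginator.init_request(request)
    while True:
        response = send(request)
        data = response["body"]
        paginator.update_state(response, data)  # parse the pagination signal
        yield data
        if not paginator.has_next_page:  # the stop condition lives here, not in the subclass
            return
        paginator.update_request(request)  # prepare the next request


# Hypothetical server keyed by the page_token it receives.
SERVER: Dict[Optional[str], Response] = {
    None: {"headers": {"X-Next-Page-Token": "t2"}, "body": [1, 2]},
    "t2": {"headers": {"X-Next-Page-Token": "t3"}, "body": [3, 4]},
    "t3": {"headers": {}, "body": [5]},
}

pages = list(paginate(lambda req: SERVER[req["params"].get("page_token")], HeaderTokenPaginator()))
print(pages)  # [[1, 2], [3, 4], [5]]
```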

This approach accommodates non-standard patterns such as proprietary formats in response headers, multi-field pagination state, or APIs that signal the last page through a status field in the body rather than an empty results array.
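As an example of the last two cases, consider an API that flags its final page with a status field and needs two values carried into every follow-up request. In dlt this logic would live in a BasePaginator subclass, reading the body via response.json() inside update_state() and writing request.params inside update_request(). The version below strips that down to plain dicts so it runs without dlt. The `status`, `next_page`, and `snapshot_id` fields, `StatusFieldPaginator`, and `fake_api` are all hypothetical.

```python
from typing import Any, Dict, List, Optional


class StatusFieldPaginator:
    """Hypothetical paginator: a "status" field, not an empty list, marks the end.

    It also carries two pieces of state (next page and snapshot id) forward.
    """

    def __init__(self) -> None:
        self.has_next_page = True
        self._page: Optional[int] = None
        self._snapshot: Optional[str] = None

    def update_state(self, body: Dict[str, Any]) -> None:
        if body.get("status") == "complete":
            self.has_next_page = False
            return
        self._page = body["next_page"]
        self._snapshot = body["snapshot_id"]

    def update_request(self, params: Dict[str, Any]) -> None:
        params["page"] = self._page
        params["snapshot_id"] = self._snapshot


def fake_api(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical endpoint whose last page is non-empty but flagged complete."""
    if params.get("page", 1) == 1:
        return {"results": ["a", "b"], "status": "more", "next_page": 2, "snapshot_id": "snap-1"}
    if params.get("snapshot_id") != "snap-1":
        raise ValueError("snapshot id was not carried forward")
    return {"results": ["c"], "status": "complete"}


paginator = StatusFieldPaginator()
params: Dict[str, Any] = {}
pages: List[List[str]] = []
while True:
    body = fake_api(params)
    pages.append(body["results"])
    paginator.update_state(body)
    if not paginator.has_next_page:
        break
    paginator.update_request(params)

print(pages)  # [['a', 'b'], ['c']]
```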

Configuring Paginators in REST API Source

When using REST API Source instead of RESTClient directly, paginators are configured declaratively per endpoint:

```json
{
  "name": "orders",
  "endpoint": {
    "path": "orders",
    "paginator": {
      "type": "offset",
      "limit": 100
    }
  }
}
```

The type field maps to the built-in paginators: "json_link", "header_link", "offset", "page_number", "cursor". You can also pass custom paginator instances for endpoints that need non-standard behavior.
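For a source with several endpoints, each resource can declare its own paginator. The sketch below is a config dict in the shape the REST API source expects (a "client" section plus a list of "resources"). As I understand it, keys next to "type" are passed through to the matching paginator's constructor. The base URL, endpoint names, and response paths are hypothetical. In recent dlt versions you would hand this dict to rest_api_source from dlt.sources.rest_api.

```python
# Hypothetical API and endpoint names; paginator keys mirror the constructor
# arguments of the corresponding dlt paginator classes.
config = {
    "client": {"base_url": "https://api.example.com/v1/"},
    "resources": [
        {
            "name": "orders",
            "endpoint": {
                "path": "orders",
                # No total in the response: stop on the first empty page instead
                "paginator": {"type": "offset", "limit": 100, "total_path": None},
            },
        },
        {
            "name": "customers",
            "endpoint": {
                "path": "customers",
                "paginator": {"type": "json_link", "next_url_path": "pagination.next"},
            },
        },
        {
            "name": "events",
            "endpoint": {
                "path": "events",
                "paginator": {"type": "cursor", "cursor_path": "meta.cursor", "cursor_param": "after"},
            },
        },
        {
            "name": "invoices",
            "endpoint": {
                "path": "invoices",
                # 1-based page numbers with a page count reported in the body
                "paginator": {"type": "page_number", "base_page": 1, "total_path": "total_pages"},
            },
        },
    ],
}
```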

Debugging Pagination Issues

The most common pagination problem: you get exactly one page of results when you expect many. This almost always means the paginator isn’t correctly identifying the “more pages” signal.

Debug steps:

  1. Test with a small limit first. Set limit=2 or limit=5 so you can verify pagination kicks in even with a small dataset.
  2. Inspect a raw response. Use requests directly to fetch one page and examine the full response body and headers — find where the next-page signal actually lives.
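  3. Check the paginator stop condition against what the API actually returns. OffsetPaginator, for example, stops on a total read from total_path, on maximum_offset, or on an empty page. If your API signals its last page some other way (a has_more flag or a status field), you need a different paginator type or a custom one.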
  4. Use unique pipeline names when testing variations. dlt stores state per pipeline name, so if you're testing incremental loading alongside pagination, reusing a name lets one run pick up state (such as an incremental cursor) saved by another, which can make pagination look broken.
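For step 2, once you have fetched one small page (for example with requests.get(url, params={"limit": 2})), you can pass dict(response.headers) and response.json() to a quick scanner like the one below to list likely pagination signals. This is a hypothetical heuristic: the key names and header hints are guesses at common conventions, and it does not look inside arrays.

```python
import json
from typing import Any, Dict, List

# Hypothetical heuristics: body keys and header fragments that often carry pagination state.
SIGNAL_KEYS = {"next", "next_url", "cursor", "next_cursor", "page_token",
               "has_more", "total", "total_pages", "offset", "page"}
HEADER_HINTS = ("link", "page", "cursor", "token", "total")


def header_signals(headers: Dict[str, str]) -> List[str]:
    """Headers whose names suggest pagination (Link, X-Next-Page-Token, X-Total-Count, ...)."""
    return [f"header {k}: {v}" for k, v in headers.items()
            if any(hint in k.lower() for hint in HEADER_HINTS)]


def body_signals(body: Any, prefix: str = "") -> List[str]:
    """Dotted paths in a JSON body whose keys look like pagination state."""
    found: List[str] = []
    if isinstance(body, dict):
        for key, value in body.items():
            path = f"{prefix}.{key}" if prefix else key
            if key.lower() in SIGNAL_KEYS:
                found.append(f"body {path} = {value!r}")
            elif isinstance(value, dict):
                found.extend(body_signals(value, path))  # descend into nested objects
    return found


raw_headers = {"Content-Type": "application/json", "X-Next-Page-Token": "t2"}
raw_body = json.loads('{"data": [1, 2], "meta": {"cursors": {"next": "abc"}, "total": 42}}')
for line in header_signals(raw_headers) + body_signals(raw_body):
    print(line)
```

The paths it reports (here meta.cursors.next and meta.total) are candidates for arguments such as cursor_path, next_url_path, or total_path.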

When pagination works correctly, the paginate() loop runs silently to completion. When it doesn’t, the loop exits after one iteration and you’ve loaded only the first page. That failure mode is easy to detect and usually easy to fix once you’ve read the actual API response.