Note

Building Custom API Pipelines with dlt

A map of the concepts and patterns involved in building production API pipelines with dlt — from choosing an approach through deployment.

Planted
dlt · bigquery · data engineering · etl

dlt (data load tool) is a Python library for building custom API pipelines. It is pip-installable, requires no containers or orchestration to start, and turns API data into warehouse tables.

This hub maps the concepts needed to build production-quality API pipelines with dlt.

Approach Selection

dlt RESTClient vs REST API Source — The imperative-vs-declarative decision. REST API Source covers standard API patterns; RESTClient handles non-standard auth, pagination, or control flow.

Building Blocks

dlt RESTClient Mechanics — How RESTClient works: instantiation, the paginate() method, key parameters, and built-in retry and backoff handling.

dlt Pagination Patterns — The built-in paginators for common API patterns (JSON link, header link, offset, page number, cursor), and how to extend BasePaginator for non-standard schemes.

dlt Authentication Patterns — Bearer token, API key, HTTP basic, and OAuth2 client credentials, plus how to extend for non-standard auth flows. Pairs with secrets management.

dlt for AI-Assisted Pipeline Development — The declarative REST API Source in practice, BigQuery-specific features, and how dlt’s design makes AI-assisted pipeline development effective.

State and Data Quality

dlt Incremental Loading — How cursor-based incremental loading works in dlt, where state is stored, and how to configure it for both RESTClient and REST API Source approaches.

dlt Secrets Management — The configuration hierarchy that keeps credentials out of code: secrets.toml for local development, environment variables for CI/CD, vault integrations for production.
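A sketch of the local layer, assuming a source that reads sources.api_token; in CI/CD the same values would come from environment variables (SOURCES__API_TOKEN, with double underscores separating levels):

```toml
# .dlt/secrets.toml — local development only; keep out of version control
[sources]
api_token = "YOUR_TOKEN"

[destination.bigquery.credentials]
project_id = "my-gcp-project"
client_email = "loader@my-gcp-project.iam.gserviceaccount.com"
private_key = "-----BEGIN PRIVATE KEY-----..."
```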

Testing and Deployment

dlt Pipeline Testing — Testing with DuckDB locally before touching production, schema validation, incremental state testing, and the common failure modes to check for.

dlt Deployment Options — Where to run pipelines: GitHub Actions, Airflow, Cloud Run Jobs, Modal, Dagster, and how the dlt deploy command generates platform-specific scaffolding.

Context

Build vs. Buy Data Pipeline Economics covers the economics behind the managed-vs-custom pipeline decision. Hybrid ELT Strategy describes the portfolio split: managed tools for stable sources, dlt for high-MAR (monthly active rows) or unsupported APIs.

Once data lands in BigQuery, dbt handles the transformation layer. The Incremental Models in dbt note covers how that works.