dlt: Python-Native Data Loading

A reading path through dlt's core mechanics — from building blocks through BigQuery-specific loading to incremental state tracking.

Tags: dlt, bigquery, data engineering, etl, incremental processing

dlt (data load tool) is a Python library for building ELT pipelines. It installs with pip, and pipelines are ordinary Python scripts, so no containers or orchestration server are required. The library handles pagination, schema inference, incremental state, and destination-specific loading.

These notes cover dlt’s core mechanics, BigQuery integration, and incremental loading behavior.

Reading Order

  1. dlt Core Concepts — The four building blocks: sources, resources, pipelines, and schemas. Plus the three write dispositions (replace, append, merge) that control how data lands. Start here if you’re new to dlt.

  2. dlt and BigQuery Integration — The BigQuery-specific layer: streaming inserts vs. GCS staging (and why staging almost always wins on cost), bigquery_adapter() for partitioning and clustering, nested JSON normalization into parent-child tables, and the _dlt_ metadata tables dlt creates.

  3. dlt Incremental Loading — How dlt tracks state between runs using dlt.sources.incremental(). Cursor-based tracking, state stored in the destination, declarative REST API config, and how this relates to dbt incremental models downstream.

  4. dlt for AI-Assisted Pipeline Development — Why dlt’s Python-native, declarative design maps well to AI-assisted development. The REST API builder in practice, the AI + dlt workflow, and production results from teams who’ve made the switch.
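
To make the write dispositions and cursor-based tracking from items 1 and 3 concrete, here is a minimal plain-Python model of their semantics. It does not use dlt's API: the function names `load` and `extract_incremental`, the `updated_at` cursor field, and the in-memory `state` dict are all assumptions made for illustration. dlt itself persists state in the destination and handles cursor boundaries and deduplication more carefully than the strict greater-than comparison used here.

```python
from typing import Any

Row = dict[str, Any]


def load(table: list[Row], rows: list[Row], disposition: str,
         primary_key: str = "id") -> list[Row]:
    """Return the table contents after loading `rows` with a disposition."""
    if disposition == "replace":
        # Discard the existing contents and keep only the new batch.
        return list(rows)
    if disposition == "append":
        # Keep existing rows and add the new batch, duplicates included.
        return table + list(rows)
    if disposition == "merge":
        # Upsert: an incoming row overwrites the row with the same key.
        merged = {row[primary_key]: row for row in table}
        for row in rows:
            merged[row[primary_key]] = row
        return list(merged.values())
    raise ValueError(f"unknown write disposition: {disposition}")


def extract_incremental(source_rows: list[Row], state: dict[str, Any],
                        cursor_field: str = "updated_at") -> list[Row]:
    """Return rows newer than the stored cursor, then advance the cursor."""
    last_value = state.get("last_value")
    new_rows = [
        row for row in source_rows
        if last_value is None or row[cursor_field] > last_value
    ]
    if new_rows:
        state["last_value"] = max(row[cursor_field] for row in new_rows)
    return new_rows


# Hypothetical data: two runs against a source whose rows change over time.
state: dict[str, Any] = {}
table: list[Row] = []

run_1 = [
    {"id": 1, "updated_at": "2025-01-01", "status": "new"},
    {"id": 2, "updated_at": "2025-01-02", "status": "new"},
]
table = load(table, extract_incremental(run_1, state), "merge")

run_2 = [
    {"id": 1, "updated_at": "2025-01-03", "status": "paid"},  # changed
    {"id": 2, "updated_at": "2025-01-02", "status": "new"},   # unchanged
    {"id": 3, "updated_at": "2025-01-04", "status": "new"},   # added
]
second_batch = extract_incremental(run_2, state)
table = load(table, second_batch, "merge")
```

In the second run only the changed and added rows pass the cursor filter, and merge updates row 1 in place instead of duplicating it. Swapping `"merge"` for `"append"` would leave two versions of row 1 in the table, which is why the disposition and the cursor are usually chosen together.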

Decision Context

The build-vs-buy decision framework covers when dlt is the right choice. dlt fits Python-proficient teams who want control, have budget constraints, or need sources without pre-built connectors. It is not suited to non-technical teams, organizations that need 700+ connectors, or teams without capacity to own pipeline infrastructure. See also Fivetran MAR Pricing Shift for the managed ELT pricing context.

Adjacent Reading

  • Build vs. Buy Data Pipelines — The full economics argument for why the managed-vs-custom calculation shifted in 2025.
  • BigQuery Cost Model — Understanding BigQuery’s cost model helps optimize the pipelines you build, especially around streaming vs. batch loading.
  • Incremental Models in dbt — How incremental processing works in the transformation layer, complementing dlt’s extraction-layer incrementality.