dlt Environment Setup

Setting up a dlt project from scratch — Python virtual environment, installation, dlt init, and the project scaffold it creates.


Setting up a dlt project involves three steps: Python environment, dlt installation, and project initialization via dlt init. The initialization step creates the directory structure dlt uses for credentials and configuration.

Python Environment

dlt requires Python 3.9 or newer. Before installing anything, it’s worth verifying your version:

Terminal window
python --version
# or
python3 --version

If you need to install Python: the python.org installer works on all platforms. On macOS, Homebrew is a cleaner approach:

Terminal window
brew install python

Virtual Environment

Always create a project-specific virtual environment. dlt has several transitive dependencies, and isolating them prevents version conflicts with other Python projects on your machine:

Terminal window
mkdir dlt-project
cd dlt-project
# Create the virtual environment
python -m venv venv
# Activate it
source venv/bin/activate # macOS / Linux
# or
venv\Scripts\activate # Windows

Activation switches your shell’s python and pip to the project-local versions. You’ll see (venv) prepended to your prompt as confirmation.
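To double-check, which python (or where python on Windows) should now resolve to the venv’s interpreter:

Terminal window
which python     # should print .../dlt-project/venv/bin/python
pip --version    # should point at the venv's site-packages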

Installing dlt

With the virtual environment active, install dlt:

Terminal window
pip install dlt
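A quick sanity check that the install worked:

Terminal window
dlt --version
# or from Python
python -c "import dlt; print(dlt.__version__)"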

dlt separates destination dependencies — the libraries needed to connect to each warehouse — from the base package. This keeps the default install lightweight. Install only what you need:

Terminal window
pip install "dlt[duckdb]" # local development / testing
pip install "dlt[bigquery]" # Google BigQuery
pip install "dlt[snowflake]" # Snowflake
pip install "dlt[redshift]" # Amazon Redshift

For most development work, start with DuckDB. It’s a local file-based database that requires no server and no credentials — you can verify a pipeline works correctly before touching any cloud warehouse.

You can install multiple destinations together:

Terminal window
pip install "dlt[duckdb,bigquery]"

Project Initialization: dlt init

dlt init is the part that most documentation covers too quickly. It’s worth understanding what it creates.

Terminal window
dlt init rest_api duckdb

The two arguments are the source type and the destination. For REST API-based pipelines you’ll almost always use rest_api. The destination can be changed later without reinitializing.

Running dlt init creates the following:

dlt-project/
├── .dlt/
│   ├── config.toml            # non-sensitive configuration (committed)
│   └── secrets.toml           # credentials (gitignored)
├── rest_api_pipeline.py       # generated pipeline starter file
└── .gitignore                 # pre-configured to exclude secrets.toml

.dlt/ directory — The configuration home for this project. dlt looks here first when resolving secrets and config values.

secrets.toml — Where credentials go for local development. It’s pre-added to .gitignore automatically. Open it immediately after dlt init and add your API credentials:

[sources.github]
api_key = "your-token-here"

The key path — sources.github.api_key — becomes how you reference this value in code: dlt.secrets["sources.github.api_key"]. See dlt Secrets Management for the full hierarchy and how credentials move from local development to production.
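In code, that lookup is plain dictionary-style access. A minimal sketch, assuming the [sources.github] section above:

import dlt

# Resolves to the api_key value under [sources.github] in .dlt/secrets.toml
api_key = dlt.secrets["sources.github.api_key"]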

config.toml — Non-sensitive configuration that can be committed. Things like page sizes, base URLs, log levels. Leave it empty until you have settings that genuinely belong here.
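As an illustration (these keys are hypothetical, not ones dlt expects), a config.toml entry mirrors the secrets mechanism:

[sources.github]
per_page = 100

and is read the same way in code:

per_page = dlt.config["sources.github.per_page"]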

rest_api_pipeline.py — A generated starter file with example code showing how to configure and run a REST API source pipeline. Delete the example code and replace it with your own. The file exists as a starting point, not as something you extend in place.

.gitignore — Automatically configured to exclude secrets.toml and a few dlt-specific files. Don’t override this — leaving credentials out of version control is essential.

Verifying the Setup

After dlt init, verify the project is working before writing any pipeline code:

Terminal window
python rest_api_pipeline.py

The generated example code makes a real API call. If it runs without errors, your Python environment, dlt installation, and basic configuration are all working. If it fails, the error message will tell you exactly what’s missing.
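To inspect what was loaded, dlt’s CLI can report on any pipeline by name. The name below is a placeholder; use whatever pipeline_name the generated file passes to dlt.pipeline():

Terminal window
dlt pipeline <pipeline_name> info
# or, if streamlit is installed, browse the loaded tables
dlt pipeline <pipeline_name> show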

What You’ll Build On Top

With the environment in place, the next steps are:

  1. Replace the example code in rest_api_pipeline.py with your actual API configuration — see dlt REST API Source Configuration for how that config works.
  2. Add your API credentials to secrets.toml — see dlt Secrets Management for the naming conventions dlt expects.
  3. Run the pipeline against DuckDB to verify results before pointing at your production warehouse.

The DuckDB-first development pattern is worth emphasizing: you can iterate on pipeline logic, pagination, and schema without touching cloud resources. Once the pipeline produces correct results locally, swapping destination="duckdb" for destination="bigquery" is a one-line change.
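A sketch of what that looks like, assuming the built-in rest_api source and illustrative names (the generated file’s actual contents differ):

import dlt
from dlt.sources.rest_api import rest_api_source

# Illustrative source config pointing at a public API
source = rest_api_source({
    "client": {"base_url": "https://pokeapi.co/api/v2/"},
    "resources": ["pokemon", "berry"],
})

pipeline = dlt.pipeline(
    pipeline_name="poke_pipeline",
    destination="duckdb",  # later: destination="bigquery", the only line that changes
    dataset_name="poke_data",
)

pipeline.run(source)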

Notes on the Generated Pipeline File

The rest_api_pipeline.py file generated by dlt init rest_api duckdb contains a working example using a public API (often the GitHub API or a similar open API). The example is useful as a reference but isn’t structured the way you’d organize a production pipeline. Don’t feel bound by its organization.

A production pipeline typically separates source configuration from execution into different files or modules, especially when you have multiple sources or destinations. But for a single-source pipeline, one file is perfectly reasonable. The key constraint is that credentials never appear in Python files — they live in secrets.toml or environment variables, accessed via dlt.secrets.
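One way dlt supports that constraint is argument injection: declare the credential as a dlt.secrets.value default, and dlt resolves it from secrets.toml or environment variables at call time. A minimal sketch, not the generated file’s code:

import dlt
from dlt.sources.helpers import requests  # dlt ships a wrapped requests client

@dlt.source
def github_source(api_key: str = dlt.secrets.value):
    # api_key is injected from secrets.toml (or an environment variable);
    # it never appears in the Python file itself
    @dlt.resource
    def issues():
        response = requests.get(
            "https://api.github.com/repos/dlt-hub/dlt/issues",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        yield response.json()

    return issues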