dbt Core is the open-source foundation of the dbt ecosystem. It is a command-line tool that compiles SQL models, runs them against your data warehouse, and provides testing, documentation, and dependency management. Everything in dbt Cloud is built on top of Core. Understanding Core means understanding what dbt actually does at its most fundamental level.
What dbt Core Does
dbt (data build tool) sits in the transformation layer of the modern data stack. It does not extract data. It does not load data. It transforms data that is already in your warehouse by executing SQL models in dependency order.
The core workflow:
- You write SQL SELECT statements as .sql files (models)
- dbt resolves dependencies between models using {{ ref() }} and {{ source() }}
- dbt compiles Jinja-templated SQL into raw SQL
- dbt executes the compiled SQL against your warehouse
- dbt runs tests to validate the output
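To make the workflow concrete, here is a minimal sketch of two models; the source name (jaffle_shop), model names, and columns are hypothetical:

```sql
-- models/staging/stg_orders.sql
-- source() points at a raw table declared in a sources YAML entry
select
    id as order_id,
    customer_id,
    order_date,
    status
from {{ source('jaffle_shop', 'orders') }}

-- models/marts/fct_orders.sql (a separate file)
-- ref() declares the dependency edge: dbt runs stg_orders first
select
    order_id,
    customer_id,
    order_date
from {{ ref('stg_orders') }}
where status != 'cancelled'
```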
```bash
# The fundamental dbt commands
dbt run            # Execute all models
dbt test           # Run all tests
dbt build          # Run models + tests in dependency order
dbt compile        # Generate compiled SQL without executing
dbt docs generate  # Build documentation site
```

The dbt build command is what most teams run in production. It executes models and their associated tests in topological order: if model B depends on model A, A runs first, its tests pass, then B runs. A test failure on A prevents B from running, which stops bad data from propagating downstream.
CLI-Driven Development
dbt Core operates entirely through the command line. There is no graphical interface. You write models in your text editor, run commands in your terminal, and review results in your terminal output or compiled artifacts.
This is a feature, not a limitation. CLI-driven development integrates naturally with the tools that software engineers already use:
```bash
# Develop locally, test a single model
dbt run --select my_model

# Run a model and everything downstream
dbt run --select my_model+

# Run only models that changed since last commit
dbt run --select state:modified --state ./target

# Full build with fresh sources
dbt source freshness && dbt build
```

The --select syntax is one of Core’s most powerful features. Node selection lets you target specific models, tags, directories, or graph relationships without modifying configuration files. Combined with state:modified, it enables efficient development loops where you only rebuild what changed.
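Selection also composes with model-level configuration such as tags. A minimal sketch, assuming a hypothetical finance tag: a model configured like this can then be targeted with dbt run --select tag:finance.

```sql
-- models/marts/mrt_sales_orders.sql
-- The config() block attaches metadata that --select can target
{{ config(
    materialized='table',
    tags=['finance']
) }}

select
    order_id,
    customer_id,
    order_total
from {{ ref('int_orders_enriched') }}
```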
Version Control Integration
dbt Core projects are plain files on disk: SQL files, YAML configuration, Jinja macros. This means they work natively with Git. Every model change, every test addition, every configuration update flows through the same pull request workflow that software teams use for application code.
```
my_dbt_project/
├── models/
│   ├── staging/
│   │   └── stg_orders.sql
│   ├── intermediate/
│   │   └── int_orders_enriched.sql
│   └── marts/
│       └── mrt_sales_orders.sql
├── tests/
├── macros/
├── dbt_project.yml
└── profiles.yml
```

This Git-native structure enables code review for data transformations. A data analyst proposes a model change in a pull request. A colleague reviews the SQL diff, checks the test coverage, and approves. The change merges to main and deploys. The same workflow that prevents bugs in application code prevents data quality regressions in your warehouse.
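The tests/ directory above holds singular tests: plain SELECT statements that fail when they return any rows. A minimal sketch, with a hypothetical test name and column:

```sql
-- tests/assert_no_negative_order_totals.sql
-- dbt fails this test if the query returns one or more rows
select
    order_id,
    order_total
from {{ ref('mrt_sales_orders') }}
where order_total < 0
```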
Tools like Claude Code can accelerate this local development workflow significantly — writing models, generating tests, debugging compilation errors — precisely because dbt Core is just files and a CLI.
The Open-Source Ecosystem
dbt Core’s open-source nature created an ecosystem that is arguably its most durable competitive advantage:
Community packages. Over 200 packages are available through dbt Hub. dbt-utils provides utility macros and tests. dbt-expectations ports Great Expectations-style validation. dbt-audit-helper streamlines migration validation. Macros from these packages extend Core’s capabilities without writing custom code (a usage sketch follows after this list).
Warehouse adapters. Community-maintained adapters for BigQuery, Snowflake, Databricks, Redshift, Postgres, DuckDB, and dozens more. The adapter architecture means Core’s transformation logic is warehouse-agnostic — the same model can run on different platforms with adapter-specific compilation.
100,000+ community members. The dbt Slack is one of the most active data communities. Problems get answered quickly. Patterns get shared. The collective knowledge base — blog posts, conference talks, package source code — means you rarely encounter a problem nobody has solved before.
Career portability. dbt appears in nearly every analytics engineering job posting. Skills developed with Core transfer directly because the SQL, Jinja, YAML, and CLI patterns are universal across dbt deployments. Whether a company uses Core or Cloud, the modeling skills are the same.
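To illustrate the packages mentioned above: after adding dbt-utils to packages.yml and running dbt deps, its macros are callable from any model. A sketch using dbt_utils.generate_surrogate_key (the macro name in recent dbt-utils versions; the columns are hypothetical):

```sql
-- models/intermediate/int_orders_enriched.sql
-- generate_surrogate_key hashes the listed columns into a deterministic
-- key, replacing hand-written md5(concat(...)) boilerplate
select
    {{ dbt_utils.generate_surrogate_key(['order_id', 'customer_id']) }} as order_sk,
    order_id,
    customer_id,
    order_date
from {{ ref('stg_orders') }}
```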
Local Development Environment
Setting up dbt Core locally requires Python, pip, and a warehouse connection:
```bash
# Install dbt with your warehouse adapter
pip install dbt-core dbt-bigquery

# Initialize a new project
dbt init my_project

# Configure warehouse connection
# Edit ~/.dbt/profiles.yml with your credentials

# Verify connection
dbt debug
```

The profiles.yml file stores warehouse connection details. For local development, this typically uses personal credentials. For production, it uses service accounts or application default credentials.
The local setup means full control over your Python environment. You pin exact versions of dbt-core and adapters. You can run multiple dbt versions for different projects. You can install any Python package alongside dbt. This flexibility is essential for teams that need to integrate dbt with custom Python scripts, data quality tools, or CI/CD systems.
The trade-off is clear: you manage your own environment. Python version conflicts, virtual environment setup, adapter compatibility — these are your responsibility. For engineers comfortable with Python tooling, this is trivial. For analysts whose primary skill is SQL, it can be a barrier.
What dbt Core Does Not Include
Understanding Core means understanding what it deliberately excludes:
- No scheduling. Core is a CLI tool. It runs when you invoke it. For production scheduling, you need an external tool: cron, Cloud Scheduler, Airflow, Dagster, or any other orchestrator. A lightweight option is to deploy dbt Core in a cloud function or a Cloud Run Job and trigger it on a schedule.
- No web IDE. Development happens in your local editor. VS Code with the dbt Power User extension is the most common setup.
- No built-in access control. Multi-user permissions are managed through Git (branch protection, code owners) and your warehouse’s IAM, not through dbt itself.
- No managed infrastructure. You containerize, deploy, monitor, and maintain the runtime yourself.
These exclusions are not accidental. They keep Core focused on what it does well — SQL compilation, dependency resolution, testing, documentation — while letting you choose the best external tools for scheduling, hosting, and collaboration.
Team profile
dbt Core suits teams with at least one member comfortable with Python, Git, and CLI tooling, who prefer a code-first approach where every change is version-controlled and code-reviewed. The team either already has an orchestration solution (Airflow, Dagster, cron) or is willing to set one up.
Cost is a common deciding factor. Core is free. For larger teams where dbt Cloud’s per-user pricing becomes significant (10+ users at $100–300/month each), self-hosting Core with an open-source orchestrator reduces cost substantially. The Dagster vs dbt Cloud comparison covers this cost dynamic.
The choice is not permanent. Teams can start with Core and move to Cloud when scheduling or collaboration becomes a pain point, or start with Cloud’s free tier and migrate to self-hosted Core as the team grows. Modeling skills (SQL, Jinja, YAML, project structure) transfer completely either way.