This note covers patterns for running dbt unit tests in CI on BigQuery. A naive CI setup wastes money and creates contention from concurrent PR runs; the patterns here address both.
The broader CI/CD strategy covers the full picture (Slim CI, data diffing, linting). This note focuses specifically on the unit test workflow.
The Core Pattern
A production-ready GitHub Actions workflow for BigQuery unit tests:
```yaml
name: dbt CI

on:
  pull_request:
    branches: [main]

env:
  DBT_PROFILES_DIR: ./
  GOOGLE_APPLICATION_CREDENTIALS: /tmp/gcp-key.json

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dbt
        run: pip install dbt-bigquery

      - name: Set up GCP credentials
        run: echo "$GCP_SA_KEY" > /tmp/gcp-key.json
        env:
          GCP_SA_KEY: ${{ secrets.GCP_SA_KEY }}

      - name: Create CI dataset name
        run: echo "CI_DATASET=ci_$(date +'%Y%m%d_%H%M%S')_${GITHUB_SHA::7}" >> "$GITHUB_ENV"

      - name: Build upstream models (empty)
        run: dbt run --select +test_type:unit --empty --target ci

      - name: Run unit tests
        run: dbt test --select test_type:unit --target ci

      - name: Cleanup CI dataset
        if: always()
        run: bq rm -r -f ${{ env.CI_DATASET }}
```

Four design decisions make this workflow production-ready.
Unique Dataset Per CI Run
```bash
CI_DATASET=ci_$(date +'%Y%m%d_%H%M%S')_${GITHUB_SHA::7}
```

Each CI run creates its own BigQuery dataset, named with a timestamp and the short commit SHA. This prevents conflicts when multiple PRs run CI concurrently, a common issue when the team is active and PRs stack up.
Without unique datasets, concurrent CI runs write to the same tables and interfere with each other: run A builds a model, run B overwrites it with different data, and run A’s unit test fails because the schema changed. Unique datasets eliminate this class of failure entirely.
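Unique datasets solve correctness; if you also want to stop paying for superseded runs, GitHub Actions’ top-level concurrency setting (not part of the workflow above, added here as an optional extra) cancels the in-flight run for a PR when a newer commit arrives:

```yaml
# Optional top-level addition to the workflow above: cancel the in-flight
# run for a PR when a newer commit is pushed, so superseded runs don't
# keep consuming BigQuery slots.
concurrency:
  group: dbt-ci-${{ github.ref }}
  cancel-in-progress: true
```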
The --target ci flag in dbt points to a profiles.yml target that uses the CI_DATASET environment variable as the schema name. Your profiles.yml needs a corresponding entry:
```yaml
my_project:
  target: dev
  outputs:
    ci:
      type: bigquery
      method: service-account
      project: my-gcp-project
      dataset: "{{ env_var('CI_DATASET') }}"
      threads: 4
      keyfile: /tmp/gcp-key.json
```

The --empty Flag
```bash
dbt run --select +test_type:unit --empty --target ci
```

This is the single biggest cost optimization. The --empty flag creates tables with the correct schemas but zero rows. Unit tests don’t need real upstream data; they use their own mocked inputs. They just need the upstream tables to exist so the SQL compiles.
Without --empty, you’d need to either:
- Build all upstream models with real data (expensive, slow)
- Maintain a separate CI dataset with pre-built tables (maintenance burden)
With --empty, the build step completes in seconds and consumes minimal BigQuery slots.
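Under the hood, --empty applies a zero-row filter to each ref and source when the model builds. A rough sketch of what a staging model compiles to on BigQuery (project, dataset, table, and column names here are hypothetical, and the exact SQL varies by adapter and dbt version):

```sql
-- Roughly what dbt compiles for a model built with --empty: each
-- ref/source is wrapped in a zero-row subquery, so the table is created
-- with the full schema but no data.
create or replace table `my-gcp-project`.`ci_20260315_142233_a1b2c3d`.`stg_orders` as
select
  order_id,
  customer_id,
  order_date
from (
  select * from `my-gcp-project`.`raw`.`orders` limit 0
) as orders
```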
Always-Run Cleanup
```yaml
- name: Cleanup CI dataset
  if: always()
  run: bq rm -r -f ${{ env.CI_DATASET }}
```

The if: always() ensures cleanup runs even when tests fail. Without it, failed CI runs leave orphaned datasets in BigQuery, and you end up with dozens of ci_20260315_* datasets cluttering your project.
The -r flag removes the dataset recursively (including all tables), and -f forces deletion without confirmation.
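If orphans have already accumulated (from runs before the cleanup step existed, or from jobs killed before it could fire), a one-off sweep can clear them. A hedged sketch, assuming the ci_YYYYMMDD_* naming convention from this note and that jq is installed; review the matched list before piping it into bq rm:

```bash
# One-off sweep of leftover CI datasets matching ci_YYYYMMDD_*.
# Assumes the naming convention above; adjust the regex to match yours.
bq ls --max_results=1000 --format=json \
  | jq -r '.[].datasetReference.datasetId' \
  | grep -E '^ci_[0-9]{8}_' \
  | xargs -r -I{} bq rm -r -f {}
```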
Separating Unit Tests from Data Tests
The workflow runs dbt test --select test_type:unit — not dbt test. This is deliberate. Unit tests and data tests serve different purposes and run in different contexts:
- Unit tests run in CI on every PR. They use mocked data. They verify logic.
- Data tests run in production after models build. They use real data. They verify data health.
Running data tests in CI against an empty dataset is meaningless — there’s no data to validate. Keep the two separate.
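For reference, this is the shape of a dbt unit test (dbt 1.8+); the model, input, and column names below are hypothetical:

```yaml
# A minimal dbt unit test. The mocked rows under `given` stand in for
# real upstream data, which is why empty upstream tables (via --empty)
# are all CI needs.
unit_tests:
  - name: test_order_totals_sums_line_items
    model: order_totals
    given:
      - input: ref('stg_line_items')
        rows:
          - {order_id: 1, amount: 10.0}
          - {order_id: 1, amount: 5.0}
    expect:
      rows:
        - {order_id: 1, total_amount: 15.0}
```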
Excluding Unit Tests from Production
The flip side: unit tests should never run in production. They use mocked data and add no value there. Exclude them from production builds:
```bash
# In your production deployment script
dbt build --exclude-resource-type unit_test
```

Or set it as an environment variable in your production environment:

```bash
export DBT_EXCLUDE_RESOURCE_TYPES=unit_test
dbt build
```

This creates a clean separation: unit tests gate deployments in CI, data tests monitor health in production. Neither runs where it doesn’t belong.
Cost Considerations for BigQuery
Even with --empty, BigQuery CI runs aren’t free. Each unit test executes a real query. For teams with large test suites, a few additional optimizations help:
- Use a dedicated CI reservation with minimal slots. Unit test queries are lightweight; they don’t need the same slot capacity as production workloads. A small reservation (50-100 slots) handles CI runs without competing with production.
- Cache the `--empty` build. If your upstream schemas don’t change often, you can skip the build step on PRs that don’t modify upstream models. Use `state:modified+` to selectively rebuild only what changed.
- Tag tests by priority. Run `tag:critical` unit tests on every PR and the full suite on merges to main. This keeps PR feedback fast while still catching issues before release.
- Monitor CI dataset costs. Query `INFORMATION_SCHEMA.JOBS` filtered by the CI service account to track how much your unit test CI runs cost; a sample query follows this list. If it’s growing faster than your test suite, something is inefficient.
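A sketch of that cost query, assuming on-demand pricing; the region qualifier, service account email, and per-TiB price are placeholders to adjust for your project:

```sql
-- Approximate spend on CI unit-test queries over the last 30 days.
-- Assumes on-demand pricing (~$6.25 per TiB billed); the region
-- qualifier and service account email are placeholders.
SELECT
  DATE(creation_time) AS run_date,
  COUNT(*) AS query_count,
  ROUND(SUM(total_bytes_billed) / POW(1024, 4) * 6.25, 2) AS approx_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE user_email = 'dbt-ci@my-gcp-project.iam.gserviceaccount.com'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY run_date
ORDER BY run_date;
```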
Beyond GitHub Actions
The patterns here — unique datasets, --empty builds, separate unit/data test runs, always-clean-up — apply to any CI system. GitLab CI, CircleCI, and Cloud Build all support the same workflow structure. The BigQuery-specific pieces (bq rm, service account auth, dataset naming) stay the same regardless of the CI platform.
For teams using dbt Cloud, the built-in CI job handles some of this automatically. But for fine-grained control over unit test execution (separate jobs for unit vs. data tests, custom dataset naming, priority-based test selection), you still need a custom workflow like the one above.