Note

dbt Package Integration Testing

The integration_tests sub-project pattern for testing dbt packages — using seeds as mock data, comparing outputs to expected results, and running the full suite.

Tags: dbt, testing, data engineering

You can’t test a dbt package in isolation because it’s designed to be installed inside another project. The package’s models reference source() definitions that need a target schema, and dbt deps needs a parent project to install into. Running dbt run at the package root doesn’t work the way it does in a regular project.

The solution is an integration_tests/ sub-project inside your package repo that installs the parent package as a local dependency. This pattern is used by every Fivetran package, dbt-utils, and virtually every serious community package.

The Sub-Project Structure

dbt-my_package/
├── dbt_project.yml                     # The package itself
├── macros/
├── models/
└── integration_tests/                  # Separate dbt project
    ├── dbt_project.yml
    ├── packages.yml                    # References parent via local: ../
    ├── profiles.yml                    # Optional: CI-specific profiles
    ├── seeds/
    │   ├── mock_events.csv             # Input data
    │   └── expected_daily_summary.csv  # Expected output
    ├── models/
    │   └── _schema.yml                 # Equality tests
    └── tests/
        └── assert_custom_logic.sql     # Singular tests

The critical piece is packages.yml:

integration_tests/packages.yml
packages:
  - local: ../

The local: ../ path tells dbt to install the parent directory as a package dependency. When you run dbt deps inside integration_tests/, dbt symlinks to the parent project. Changes to your package macros and models are immediately available without re-running dbt deps.

Configuring the Test Project

The dbt_project.yml for the integration test project configures seeds to land in the right schema — matching the package’s default source schema so models can find their inputs:

integration_tests/dbt_project.yml
name: 'my_package_integration_tests'

seeds:
  my_package_integration_tests:
    +schema: my_data  # Must match the package's default source schema

This is the linchpin. Your package’s source definitions reference var('my_package_schema', 'my_data'). The seeds configuration puts mock data into that same my_data schema. When the package’s models run, source('my_package', 'events') resolves to the seeded CSV data.
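For reference, the package's own source definition that makes this resolution work might look like the sketch below. The file name and exact layout are assumptions; the key detail is the var('my_package_schema', 'my_data') default described above:

```yaml
# models/src_my_package.yml inside the package (illustrative sketch)
version: 2

sources:
  - name: my_package
    schema: "{{ var('my_package_schema', 'my_data') }}"
    tables:
      - name: events
```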

If your package has source tables with custom identifiers, you may need seed-level overrides:

seeds:
  my_package_integration_tests:
    +schema: my_data
    mock_events:
      +alias: events  # Match the source identifier

Seeds as Mock Data

Create CSV files representing the source data your package expects. These are minimal datasets — just enough rows to exercise the transformations and edge cases.

integration_tests/seeds/mock_events.csv
event_id,event_name,event_timestamp,user_id
1,page_view,2024-01-15 10:00:00,user_001
2,page_view,2024-01-15 10:05:00,user_001
3,purchase,2024-01-15 10:10:00,user_001
4,page_view,2024-01-15 11:00:00,user_002
5,page_view,2024-01-16 09:00:00,user_001

Then create expected output CSVs that define what your models should produce:

integration_tests/seeds/expected_daily_summary.csv
event_date,unique_users,total_events
2024-01-15,2,4
2024-01-16,1,1

Keep seeds focused. You don’t need hundreds of rows to validate a transformation. Five to ten rows that cover the happy path plus two or three edge cases (nulls, duplicates, boundary dates) are usually enough.

Comparing Output to Expected Results

The dbt_utils.equality test compares a model’s actual output against a seed with expected values:

integration_tests/models/_schema.yml
version: 2

models:
  - name: my_package__daily_summary
    data_tests:
      - dbt_utils.equality:
          compare_model: ref('expected_daily_summary')

This test fails if there’s any difference between the model output and the expected seed — missing rows, extra rows, or different values. It’s a full diff, not just a row count comparison.
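Conceptually, the test computes a symmetric difference between the two relations. This simplified sketch (not the exact SQL that dbt_utils compiles) shows the idea: any row returned in either direction means the outputs differ.

```sql
-- Any returned row is a discrepancy between actual and expected output.
(
    SELECT * FROM {{ ref('my_package__daily_summary') }}
    EXCEPT
    SELECT * FROM {{ ref('expected_daily_summary') }}
)
UNION ALL
(
    SELECT * FROM {{ ref('expected_daily_summary') }}
    EXCEPT
    SELECT * FROM {{ ref('my_package__daily_summary') }}
)
```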

For models with floating-point calculations where exact equality is unrealistic, use dbt_utils.equality with a precision parameter, or write a singular test with a tolerance:

-- integration_tests/tests/assert_revenue_within_tolerance.sql
SELECT *
FROM {{ ref('my_package__revenue_summary') }} actual
JOIN {{ ref('expected_revenue_summary') }} expected
ON actual.date = expected.date
WHERE ABS(actual.total_revenue - expected.total_revenue) > 0.01
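Using the precision argument instead keeps the comparison declarative; it rounds numeric columns before comparing. The exact YAML shape below is a sketch against recent dbt_utils versions:

```yaml
models:
  - name: my_package__revenue_summary
    data_tests:
      - dbt_utils.equality:
          compare_model: ref('expected_revenue_summary')
          precision: 2  # round numeric columns to 2 decimal places first
```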

Running the Suite

The standard test workflow runs four commands in sequence:

Terminal window
cd integration_tests/
dbt deps # Install the parent package + any dependencies
dbt seed # Load mock CSV data into the warehouse
dbt run # Execute all package models
dbt test # Verify outputs against expectations

Or as a single command:

Terminal window
cd integration_tests/ && dbt deps && dbt seed && dbt run && dbt test

Each step depends on the previous one. dbt seed creates the mock source data. dbt run builds the models using that data. dbt test compares the results to expected outputs.
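On dbt 1.0 and later, dbt build can collapse the seed/run/test sequence into one command, executing seeds, models, and tests in dependency order:

```shell
cd integration_tests/
dbt deps
dbt build   # seeds, models, and tests, in DAG order
```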

During development, you’ll iterate frequently on the run + test steps:

Terminal window
cd integration_tests/ && dbt run -s my_package__daily_summary && dbt test -s my_package__daily_summary

Testing Multiple Adapter Configurations

If your package supports multiple warehouses, the integration test project should work against each one. Define profiles for each adapter in profiles.yml or use environment-specific targets:

integration_tests/profiles.yml
integration_tests:
  target: postgres
  outputs:
    postgres:
      type: postgres
      # ...
    snowflake:
      type: snowflake
      # ...
    bigquery:
      type: bigquery
      # ...

Run against each adapter:

Terminal window
cd integration_tests/
dbt deps && dbt seed --target snowflake && dbt run --target snowflake && dbt test --target snowflake

This is where CI/CD matrix testing becomes valuable — automating runs across every supported adapter and dbt version combination.
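As an illustration, a GitHub Actions matrix for this could look like the sketch below. The workflow name, adapter list, and setup steps are assumptions; real warehouse targets also need credentials wired in as secrets and environment variables:

```yaml
# .github/workflows/integration_tests.yml (illustrative sketch)
name: integration-tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        adapter: [postgres, snowflake, bigquery]
    steps:
      - uses: actions/checkout@v4
      - run: pip install "dbt-${{ matrix.adapter }}"
      - name: Run integration tests
        working-directory: integration_tests
        run: |
          dbt deps
          dbt seed --target ${{ matrix.adapter }}
          dbt run --target ${{ matrix.adapter }}
          dbt test --target ${{ matrix.adapter }}
```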

Beyond Equality Tests

Equality tests are the foundation, but packages often need additional validation:

Row count tests verify that transformations don’t unexpectedly drop or duplicate rows:

models:
  - name: my_package__daily_summary
    data_tests:
      - dbt_utils.equal_rowcount:
          compare_model: ref('expected_daily_summary')

Generic tests on package models add another layer. Apply the standard testing taxonomy (unique, not_null, accepted_values) to your package's models within the integration test project:

models:
  - name: my_package__daily_summary
    columns:
      - name: event_date
        data_tests:
          - unique
          - not_null

Singular tests handle complex validations that can’t be expressed as equality comparisons or generic tests:

-- integration_tests/tests/assert_no_negative_user_counts.sql
SELECT event_date
FROM {{ ref('my_package__daily_summary') }}
WHERE unique_users < 0

Development Workflow

The standard iteration loop:

  1. Write or modify a macro/model in the package
  2. Add or update mock data in integration_tests/seeds/
  3. Add or update expected results
  4. Run dbt seed && dbt run && dbt test
  5. Iterate until tests pass

Each new feature and each bug fix should have a corresponding test. Mock data grows over time but stays manageable because each CSV only needs to cover the transformations it validates.