dbt-audit-helper is maintained by dbt Labs, currently at version 0.13.0. It’s Fusion-compatible (require-dbt-version: >=1.2.0, <3.0.0), depends on dbt-utils, and supports Snowflake, BigQuery, Postgres, and Redshift. The hash-based quick macros are limited to Snowflake and BigQuery.
All macros follow a consistent pattern: they accept two relations (or queries) and return a comparison result. The progressive validation workflow explains the order in which to use them.
Schema and Count Macros
compare_relation_columns
Compares schema metadata: column names, ordinal positions, and data types. No data scanning — metadata only.
{{ audit_helper.compare_relation_columns( a_relation=ref('mrt__finance__revenue'), b_relation=api.Relation.create( database='analytics_db', schema='production', identifier='mrt__finance__revenue' )) }}Use case: Run first in any comparison workflow. Catches missing columns, type changes, and column reordering before you spend cycles on data comparison.
compare_row_counts
Simple row count comparison between two relations. Added in v0.11.0 (March 2024).
{{ audit_helper.compare_row_counts( a_relation=ref('mrt__finance__revenue'), b_relation=api.Relation.create( database='analytics_db', schema='production', identifier='mrt__finance__revenue' )) }}Use case: Quick sanity check after schema validation passes. Mismatched counts point to filtering or join differences.
Quick Hash Macros
quick_are_relations_identical
Uses hash comparison for a fast pass/fail. Snowflake and BigQuery only.
{{ audit_helper.quick_are_relations_identical( a_relation=ref('mrt__finance__revenue'), b_relation=api.Relation.create( database='analytics_db', schema='production', identifier='mrt__finance__revenue' )) }}Use case: If it passes, you’re done — no deeper comparison needed. If it fails, move to compare_relations.
There’s also quick_are_queries_identical for comparing raw SQL SELECT statements.
Row-Level Comparison Macros
compare_relations
The core macro. Performs row-by-row validation using a UNION-based approach.
{{ audit_helper.compare_relations( a_relation=ref('mrt__finance__revenue'), b_relation=api.Relation.create( database='analytics_db', schema='production', identifier='mrt__finance__revenue' ), exclude_columns=["_fivetran_synced"], primary_key="revenue__id", summarize=true) }}Parameters:
exclude_columns— list of columns to skip (useful for ETL metadata columns)primary_key— the join key for row matchingsummarize—true(default) for counts/percentages;falsefor raw row data within_a/in_bflags
Output with summarize=true:
- True/True: rows in both relations (matches)
- True/False: rows only in A
- False/True: rows only in B
compare_queries
Identical to compare_relations but accepts raw SQL SELECT statements. Use it when you need to filter, rename, or recast columns before comparison. Particularly useful for handling floating-point precision issues:
{{ audit_helper.compare_queries( a_query="SELECT revenue__id, ROUND(revenue__amount, 2) AS revenue__amount FROM " ~ ref('mrt__finance__revenue'), b_query="SELECT revenue__id, ROUND(revenue__amount, 2) AS revenue__amount FROM analytics_db.production.mrt__finance__revenue", primary_key="revenue__id") }}Column-Level Macros
compare_which_relation_columns_differ
Identifies which columns are causing row-level mismatches. Added in v0.12.0 (June 2024).
{{ audit_helper.compare_which_relation_columns_differ( a_relation=ref('mrt__finance__revenue'), b_relation=api.Relation.create( database='analytics_db', schema='production', identifier='mrt__finance__revenue' ), primary_key_columns=['revenue__id']) }}Use case: Run after compare_relations shows mismatches but before manually checking columns one by one.
compare_column_values
Deep-dives into a single column’s values, joined on a primary key.
{{ audit_helper.compare_column_values( a_query="SELECT revenue__id, revenue__amount FROM " ~ ref('mrt__finance__revenue'), b_query="SELECT revenue__id, revenue__amount FROM analytics_db.production.mrt__finance__revenue", primary_key="revenue__id", column_to_compare="revenue__amount") }}Output categories: perfect match, both null, missing from A, missing from B, null in A only, null in B only, values do not match.
Gotcha: The .print_table() method is not compatible with dbt Cloud. Use {{ log(result, info=true) }} as a workaround.
compare_all_columns
Compares all columns across two relations with a per-column summary: perfect_match, null_in_a, null_in_b, missing_from_a, missing_from_b, and conflicting_values.
{{ audit_helper.compare_all_columns( a_relation=ref('mrt__finance__revenue'), b_relation=api.Relation.create( database='analytics_db', schema='production', identifier='mrt__finance__revenue' ), primary_key_columns=['revenue__id']) }}This is the macro best suited for CI integration. Create a test that filters on conflicts:
models: - name: mrt__finance__revenue data_tests: - dbt_utils.expression_is_true: expression: "conflicting_values = 0" where: "1=1"compare_and_classify_relation_rows
Added in v0.12.0, marked “v0, subject to change.” Classifies each row as identical, modified, added, or removed.
{{ audit_helper.compare_and_classify_relation_rows( a_relation=ref('mrt__finance__revenue'), b_relation=api.Relation.create( database='analytics_db', schema='production', identifier='mrt__finance__revenue' ), primary_key_columns=['revenue__id'], columns=['revenue__amount', 'revenue__currency', 'revenue__booked_at'], sample_limit=20) }}The sample_limit parameter (default 20) caps sample rows per classification. The query variant is compare_and_classify_query_results.
Limitations
Primary key requirement. Most macros need a unique, non-null primary key. Make sure models pass unique and not_null tests before running comparisons. Without a reliable key, join-based macros produce misleading results.
Performance on large tables. The macros use FULL OUTER JOIN and UNION ALL under the hood — full table scans. Use quick_are_relations_identical first, filter with WHERE clauses, or compare date-bounded subsets before running full comparisons.
One model at a time. Unlike tools like Datafold that automatically discover affected models, audit-helper requires manual setup per model. Fine for targeted validation, tedious for migrations spanning dozens of models.
dbt Cloud compatibility. The .print_table() method doesn’t work in dbt Cloud’s IDE. Use {{ log(result, info=true) }} instead.