dbt-audit-helper Macro Reference

dbt-audit-helper is maintained by dbt Labs, currently at version 0.13.0. It’s Fusion-compatible (require-dbt-version: >=1.2.0, <3.0.0), depends on dbt-utils, and supports Snowflake, BigQuery, Postgres, and Redshift. The hash-based quick macros are limited to Snowflake and BigQuery.

All macros follow a consistent pattern: they accept two relations (or queries) and return a comparison result. The progressive validation workflow explains the order in which to use them.

Schema and Count Macros

compare_relation_columns

Compares schema metadata: column names, ordinal positions, and data types. No data scanning — metadata only.

{{ audit_helper.compare_relation_columns(
    a_relation=ref('mrt__finance__revenue'),
    b_relation=api.Relation.create(
      database='analytics_db',
      schema='production',
      identifier='mrt__finance__revenue'
    )
) }}

Use case: Run first in any comparison workflow. Catches missing columns, type changes, and column reordering before you spend cycles on data comparison.

compare_row_counts

Simple row count comparison between two relations. Added in v0.11.0 (March 2024).

{{ audit_helper.compare_row_counts(
    a_relation=ref('mrt__finance__revenue'),
    b_relation=api.Relation.create(
      database='analytics_db',
      schema='production',
      identifier='mrt__finance__revenue'
    )
) }}

Use case: Quick sanity check after schema validation passes. Mismatched counts point to filtering or join differences.

Quick Hash Macros

quick_are_relations_identical

Uses hash comparison for a fast pass/fail. Snowflake and BigQuery only.

{{ audit_helper.quick_are_relations_identical(
    a_relation=ref('mrt__finance__revenue'),
    b_relation=api.Relation.create(
      database='analytics_db',
      schema='production',
      identifier='mrt__finance__revenue'
    )
) }}

Use case: If it passes, you’re done — no deeper comparison needed. If it fails, move to compare_relations.

There’s also quick_are_queries_identical for comparing raw SQL SELECT statements.

Row-Level Comparison Macros

compare_relations

The core macro. Performs row-by-row validation using a UNION-based approach.

{{ audit_helper.compare_relations(
    a_relation=ref('mrt__finance__revenue'),
    b_relation=api.Relation.create(
      database='analytics_db',
      schema='production',
      identifier='mrt__finance__revenue'
    ),
    exclude_columns=["_fivetran_synced"],
    primary_key="revenue__id",
    summarize=true
) }}

Parameters:

exclude_columns — list of columns to skip (useful for ETL metadata columns)
primary_key — the join key for row matching
summarize — true (default) for counts/percentages; false for raw row data with in_a/in_b flags

Output with summarize=true:

True/True: rows in both relations (matches)
True/False: rows only in A
False/True: rows only in B

compare_queries

Identical to compare_relations but accepts raw SQL SELECT statements. Use it when you need to filter, rename, or recast columns before comparison. Particularly useful for handling floating-point precision issues:

{{ audit_helper.compare_queries(
    a_query="SELECT revenue__id, ROUND(revenue__amount, 2) AS revenue__amount FROM " ~ ref('mrt__finance__revenue'),
    b_query="SELECT revenue__id, ROUND(revenue__amount, 2) AS revenue__amount FROM analytics_db.production.mrt__finance__revenue",
    primary_key="revenue__id"
) }}

Column-Level Macros

compare_which_relation_columns_differ

Identifies which columns are causing row-level mismatches. Added in v0.12.0 (June 2024).

{{ audit_helper.compare_which_relation_columns_differ(
    a_relation=ref('mrt__finance__revenue'),
    b_relation=api.Relation.create(
      database='analytics_db',
      schema='production',
      identifier='mrt__finance__revenue'
    ),
    primary_key_columns=['revenue__id']
) }}

Use case: Run after compare_relations shows mismatches but before manually checking columns one by one.

compare_column_values

Deep-dives into a single column’s values, joined on a primary key.

{{ audit_helper.compare_column_values(
    a_query="SELECT revenue__id, revenue__amount FROM " ~ ref('mrt__finance__revenue'),
    b_query="SELECT revenue__id, revenue__amount FROM analytics_db.production.mrt__finance__revenue",
    primary_key="revenue__id",
    column_to_compare="revenue__amount"
) }}

Output categories: perfect match, both null, missing from A, missing from B, null in A only, null in B only, values do not match.

Gotcha: The .print_table() method is not compatible with dbt Cloud. Use {{ log(result, info=true) }} as a workaround.

compare_all_columns

Compares all columns across two relations with a per-column summary: perfect_match, null_in_a, null_in_b, missing_from_a, missing_from_b, and conflicting_values.

{{ audit_helper.compare_all_columns(
    a_relation=ref('mrt__finance__revenue'),
    b_relation=api.Relation.create(
      database='analytics_db',
      schema='production',
      identifier='mrt__finance__revenue'
    ),
    primary_key_columns=['revenue__id']
) }}

This is the macro best suited for CI integration. Create a test that filters on conflicts:

models:
  - name: mrt__finance__revenue
    data_tests:
      - dbt_utils.expression_is_true:
          expression: "conflicting_values = 0"
          where: "1=1"

compare_and_classify_relation_rows

Added in v0.12.0, marked “v0, subject to change.” Classifies each row as identical, modified, added, or removed.

{{ audit_helper.compare_and_classify_relation_rows(
    a_relation=ref('mrt__finance__revenue'),
    b_relation=api.Relation.create(
      database='analytics_db',
      schema='production',
      identifier='mrt__finance__revenue'
    ),
    primary_key_columns=['revenue__id'],
    columns=['revenue__amount', 'revenue__currency', 'revenue__booked_at'],
    sample_limit=20
) }}

The sample_limit parameter (default 20) caps sample rows per classification. The query variant is compare_and_classify_query_results.

Limitations

Primary key requirement. Most macros need a unique, non-null primary key. Make sure models pass unique and not_null tests before running comparisons. Without a reliable key, join-based macros produce misleading results.

Performance on large tables. The macros use FULL OUTER JOIN and UNION ALL under the hood — full table scans. Use quick_are_relations_identical first, filter with WHERE clauses, or compare date-bounded subsets before running full comparisons.

One model at a time. Unlike tools like Datafold that automatically discover affected models, audit-helper requires manual setup per model. Fine for targeted validation, tedious for migrations spanning dozens of models.

dbt Cloud compatibility. The .print_table() method doesn’t work in dbt Cloud’s IDE. Use {{ log(result, info=true) }} instead.