
Floating-Point Precision in Data Comparison

Why exact equality fails for floating-point values in data comparison, and practical strategies for handling precision mismatches.

dbt · data quality · testing

Every data comparison tool that uses exact equality will produce false mismatches on floating-point values. This is a fundamental property of how computers represent decimal numbers, not a bug in any particular tool. The workarounds below apply during refactoring, migration validation, and CI testing.

The Problem

dbt-audit-helper, EXCEPT queries, and most comparison approaches use exact equality (=) to determine whether two values match. There’s no built-in epsilon tolerance for floating-point numbers. Values of 99.9999999999 and 100.0000000001 are treated as a mismatch even though they’re functionally identical for any business purpose.
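
The behavior is easy to see outside SQL. A quick illustrative sketch in Python (not part of audit-helper) contrasting exact equality with an epsilon tolerance:

```python
import math

a = 99.9999999999
b = 100.0000000001

# Exact equality -- what a SQL `=` comparison effectively does -- reports a mismatch.
print(a == b)  # False

# An epsilon tolerance treats the two values as functionally identical.
print(math.isclose(a, b, abs_tol=1e-6))  # True
```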

This surfaces in predictable scenarios:

  • Calculated fields where the order of arithmetic operations differs between two systems (multiplication before division vs. division before multiplication can produce different floating-point results)
  • Aggregations where SUM or AVG operates on rows in a different order (floating-point addition is not associative, so the precision lost at each step depends on the order of operations)
  • Cross-platform comparisons where one system uses FLOAT64 and another uses NUMERIC/DECIMAL with different internal representations
  • ETL migrations where the old system rounds at step 3 and the new system rounds at step 5, producing values that differ by a fraction of a cent
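
The aggregation-order scenario can be demonstrated in a few lines of Python: grouping the same three values differently produces different sums, which is exactly why SUM over rows in a different order can mismatch across systems.

```python
# Float addition is not associative: same values, different grouping.
x = (0.1 + 0.2) + 0.3
y = 0.1 + (0.2 + 0.3)
print(x == y)  # False
print(x, y)    # 0.6000000000000001 0.6

# Rounding both sides to a business precision removes the artifact.
print(round(x, 2) == round(y, 2))  # True
```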

The Workaround: Cast Before Comparing

The most reliable approach is to round or cast to a specific precision before comparison. With dbt-audit-helper, use compare_queries instead of compare_relations so you can control the SQL:

{{ audit_helper.compare_queries(
    a_query="SELECT revenue__id, ROUND(revenue__amount, 2) AS revenue__amount FROM " ~ ref('mrt__finance__revenue'),
    b_query="SELECT revenue__id, ROUND(revenue__amount, 2) AS revenue__amount FROM analytics_db.production.mrt__finance__revenue",
    primary_key="revenue__id"
) }}

The ROUND(..., 2) ensures both sides are compared at the same precision. Choose the precision that matches your business requirements — two decimal places for currency, four for rates, zero for counts.
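
The same normalize-then-compare principle works anywhere. A minimal Python sketch (row structure and column names are hypothetical, not part of audit-helper):

```python
# Precision per column, matching business requirements.
PRECISIONS = {"revenue__amount": 2}

def normalize(row):
    """Round configured columns so both sides compare at the same precision."""
    return {k: round(v, PRECISIONS[k]) if k in PRECISIONS else v
            for k, v in row.items()}

old = {"revenue__id": 1, "revenue__amount": 99.9999999999}
new = {"revenue__id": 1, "revenue__amount": 100.0000000001}

print(normalize(old) == normalize(new))  # True
```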

For manual EXCEPT queries, the same principle applies:

SELECT
    id,
    ROUND(amount, 2) AS amount,
    ROUND(tax_rate, 4) AS tax_rate
FROM old_system.orders
EXCEPT DISTINCT
SELECT
    id,
    ROUND(amount, 2) AS amount,
    ROUND(tax_rate, 4) AS tax_rate
FROM new_system.orders

When Rounding Isn’t Enough

Rounding handles most cases, but some situations require more thought.

Accumulated precision drift. When a value is the result of many sequential calculations (running totals, compounding interest, multi-step allocations), precision drift can exceed simple rounding thresholds. In these cases, compare the intermediate calculation steps, not just the final result. If the intermediate values match at each stage, the final difference is purely a floating-point artifact.
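
The scale of accumulated drift is easy to demonstrate. In this illustrative Python example, a naive running total of a million additions drifts measurably from the correctly tracked sum (math.fsum compensates for intermediate rounding):

```python
import math

values = [0.1] * 1_000_000

# Naive sequential accumulation -- loses a little precision at each step.
naive = 0.0
for v in values:
    naive += v

# fsum tracks the exact sum of the underlying doubles.
exact = math.fsum(values)

print(naive != exact)       # True -- the running total has drifted
print(abs(naive - exact))   # small but nonzero drift
```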

Different data types. If one system stores values as FLOAT64 and the other as NUMERIC(38,9), the representations differ fundamentally. NUMERIC types use fixed-point decimal arithmetic and don’t suffer from the same precision issues as floating-point types. When migrating between type systems, explicitly CAST both sides to the same type before comparing:

CAST(revenue__amount AS NUMERIC) AS revenue__amount
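
Python's decimal module illustrates the difference between the two type families, and also why the cast must be applied identically on both sides:

```python
from decimal import Decimal

# Binary floating point (like FLOAT64) accumulates representation error.
print(0.1 + 0.2 == 0.3)  # False

# Fixed-point decimal (like NUMERIC/DECIMAL) is exact for decimal fractions.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# Converting FROM a float carries the float's error along -- cast both
# sides the same way, or the comparison inherits the mismatch.
print(Decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625
```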

NaN and Infinity. Some warehouses (BigQuery, Snowflake) support NaN and Infinity as special float values. NaN never compares equal to itself in standard SQL (NaN = NaN is false), so any row containing NaN surfaces as a mismatch, and Infinity can’t be meaningfully rounded to a tolerance. If your data might contain these values, handle them explicitly:

CASE
    WHEN IS_NAN(amount) THEN NULL
    WHEN IS_INF(amount) THEN NULL
    ELSE ROUND(amount, 2)
END AS amount
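
An illustrative Python mirror of that CASE expression, mapping the special values to None (NULL) before rounding:

```python
import math

def clean_amount(x):
    """Map NaN/Infinity to None (NULL); round everything else to 2 places."""
    if math.isnan(x) or math.isinf(x):
        return None
    return round(x, 2)

print(float("nan") == float("nan"))  # False -- NaN never equals itself
print(clean_amount(float("nan")))    # None
print(clean_amount(float("inf")))    # None
print(clean_amount(99.999))          # 100.0
```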

Documenting Acceptable Differences

When migrating systems, perfect numerical equivalence is sometimes impossible. The old system and the new system apply operations in different orders, use different intermediate types, or round at different stages. The resulting differences may be technically correct in both systems.

The right response is not to suppress the comparison. It’s to document the known differences explicitly. Create a migration validation log listing each model, its match percentage, the nature of any differences, and whether those differences are accepted. A 99.9997% match where the remaining differences are all sub-penny floating-point artifacts is a pass. An undocumented 99.9997% match is a risk — you don’t know if those differences are precision artifacts or real bugs until you investigate.
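
A hedged sketch of what that classification step might look like: diffs below a documented tolerance are logged as accepted precision artifacts, everything else is flagged for investigation. The tolerance, record structure, and key names here are illustrative assumptions, not a prescribed format.

```python
TOLERANCE = 0.01  # one cent -- the documented acceptance threshold

old = {"order_a": 10.00, "order_b": 250.333333333, "order_c": 7.50}
new = {"order_a": 10.00, "order_b": 250.333333334, "order_c": 8.50}

artifacts, real = [], []
for key in old:
    diff = abs(old[key] - new[key])
    if diff == 0:
        continue  # exact match, nothing to log
    (artifacts if diff < TOLERANCE else real).append(key)

print(artifacts)  # ['order_b'] -- sub-penny, accepted and documented
print(real)       # ['order_c'] -- a real difference, needs investigation
```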

This documentation also protects you post-migration. When someone asks “why did this number change by $0.01 compared to the old system,” you have a documented answer instead of a debugging session.

Platform-Specific Notes

BigQuery: Uses FLOAT64 (IEEE 754 double precision) and NUMERIC (38 digits, 9 decimal places). Prefer NUMERIC for financial calculations. BIGNUMERIC extends to 76 digits with 38 decimal places for extreme precision needs.

Snowflake: NUMBER type is exact (up to 38 digits). FLOAT is IEEE 754 double precision. Snowflake’s NUMBER with scale 0 is an integer; with scale > 0 it’s fixed-point decimal.

Redshift: DECIMAL/NUMERIC is exact. FLOAT/REAL/DOUBLE PRECISION are IEEE 754. The APPROXIMATE aggregate functions use different algorithms than exact aggregates and will produce different results.

For cross-platform migrations, always check whether you’re comparing exact types to exact types or floating-point to floating-point. Comparing NUMERIC to FLOAT without explicit casting is the most common source of false mismatches.