
GA4 BigQuery Number Discrepancies

Why your BigQuery session and user counts won't match the GA4 interface, and the practical approach to handling the 1-5% variance.

Tags: ga4, bigquery, analytics, data quality

Expect 1-5% variance between BigQuery queries and GA4 interface reports, and considerably more on properties where many visitors reject consent. This is normal, architecturally inevitable, and shouldn’t trigger extensive reconciliation efforts. Understanding the specific causes helps you set appropriate expectations with stakeholders and avoid wasting time chasing perfect agreement between two systems that are designed to produce different numbers.

Probabilistic Counting (HyperLogLog++)

The GA4 interface uses HyperLogLog++ for counting unique users and sessions. This is a probabilistic algorithm that trades perfect accuracy for speed and memory efficiency. The error rate is typically under 1-2% for large cardinalities.

BigQuery, when you write COUNT(DISTINCT user_pseudo_id), returns exact counts. It has to track every distinct value, so the result is precise but costs more memory and compute at scale.

The practical impact: on a metric of 1 million unique users, HLL++ might report 995,000 or 1,005,000. BigQuery reports exactly 1,000,000 (or whatever the true count is). Both are correct within their respective systems’ definitions.
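A back-of-the-envelope check on those figures, assuming the standard HLL++ error bound for large cardinalities (relative standard error of about 1.04/√m, with m = 2^p registers) and the precision values Google has documented for GA4 (p = 14 for users, p = 12 for sessions). Treat the precision values as assumptions to verify against Google’s current documentation:

```latex
% Relative standard error of HLL++ at precision p, large cardinalities
\[
\sigma_{\mathrm{rel}} \approx \frac{1.04}{\sqrt{2^{p}}}, \qquad
p = 14:\ \frac{1.04}{128} \approx 0.81\%, \qquad
p = 12:\ \frac{1.04}{64} \approx 1.63\%
\]
% One standard error on 1 million users at p = 14
\[
1{,}000{,}000 \times 0.81\% \approx 8{,}100 \text{ users}
\]
```

A deviation of 5,000 users (0.5%) is therefore well inside one standard error for user counts, and session counts carry roughly twice the relative error.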

If you want to mimic GA4’s counting behavior in BigQuery for comparison purposes, use APPROX_COUNT_DISTINCT():

```sql
-- Approximate vs exact distinct users for one day
SELECT
  APPROX_COUNT_DISTINCT(user_pseudo_id) AS approx_users,
  COUNT(DISTINCT user_pseudo_id) AS exact_users
FROM sessionized_events
WHERE event_date = '2026-01-15'
```

The approximate count reproduces the kind of estimation error GA4 introduces, but it won’t match the GA4 number exactly: the random error differs from sketch to sketch, and GA4 uses its own precision parameters internally.
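BigQuery’s HLL_COUNT functions let you set the HLL++ precision explicitly, which gets one step closer to GA4’s behavior. A sketch, assuming the standard GA4 export schema, a placeholder project and dataset (`my-project.analytics_123456789`), and the precision values Google has documented for GA4 (14 for users, 12 for sessions), which you should verify before relying on them:

```sql
-- Approximate users and sessions with explicit HLL++ precision, next to exact
-- counts, for one day of the raw GA4 export.
-- `my-project.analytics_123456789` is a placeholder: substitute your own dataset.
WITH events AS (
  SELECT
    user_pseudo_id,
    -- A session is identified by user_pseudo_id + ga_session_id (from event_params).
    -- CONCAT returns NULL when ga_session_id is missing, so those rows are not counted.
    CONCAT(
      user_pseudo_id, '.',
      CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
    ) AS session_key
  FROM `my-project.analytics_123456789.events_20260115`
)
SELECT
  HLL_COUNT.EXTRACT(HLL_COUNT.INIT(user_pseudo_id, 14)) AS hll_users_p14,
  COUNT(DISTINCT user_pseudo_id) AS exact_users,
  HLL_COUNT.EXTRACT(HLL_COUNT.INIT(session_key, 12)) AS hll_sessions_p12,
  COUNT(DISTINCT session_key) AS exact_sessions
FROM events
```

Even with matching precision, expect small differences: the estimate depends on exactly which events each system counts.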

Consent Mode Behavioral Modeling

Consent Mode modeling is typically the largest source of discrepancy. GA4’s Consent Mode adjusts tracking according to each visitor’s consent choice. For users who reject consent, GA4 models their likely behavior based on patterns observed from consenting users. It estimates conversions, engagement, and session counts for the non-consenting population and includes these modeled numbers in the interface.

This modeled data never reaches BigQuery exports. You only get events from users who consented to tracking.

The gap depends on your consent rejection rate. A European site with a strict cookie banner might see 30-40% consent rejection. If GA4 models that population in full, 30-40% of the sessions and users in the interface have no counterpart in BigQuery; measured against the BigQuery count, the interface shows roughly 43-67% more. That’s not an error but the intended design: GA4 shows a model of total traffic, BigQuery shows observed traffic.
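The size of this gap is easy to misstate, because the percentage depends on which number you divide by. A worked version, assuming (as a simplification) that GA4 models the non-consenting population in full and that it behaves like the consenting one:

```latex
% B = BigQuery (observed) count, G = GA4 interface count, r = consent rejection rate
\[
B = (1 - r)\,G
\quad\Longrightarrow\quad
\frac{G - B}{G} = r,
\qquad
\frac{G - B}{B} = \frac{r}{1 - r}
\]
% Example with r = 0.35
\[
\frac{G - B}{G} = 35\%,
\qquad
\frac{G - B}{B} = \frac{0.35}{0.65} \approx 53.8\%
\]
```

So a 35% rejection rate means 35% of the interface figure is missing from BigQuery, or equivalently the interface sits about 54% above BigQuery.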

There’s no way to reconcile this gap from the BigQuery side. You don’t have the data that GA4 modeled. Document this for stakeholders: “BigQuery reports consented traffic only. GA4 includes modeled estimates for non-consenting users.”

Google Signals Cross-Device Deduplication

Google Signals enables cross-device user deduplication in the GA4 interface. When a user is signed into their Google account on both their phone and laptop, GA4 can recognize them as one person and count them as a single user, even though the sessions on each device remain separate.

BigQuery sees each device as a separate user_pseudo_id. The same person on phone and desktop counts as two users in your BigQuery data. There’s no connection between their device identifiers unless they authenticate in your application and you capture a user_id.

This means GA4 may report fewer unique users than BigQuery for the same time period. How large the difference is depends on how many of your users are signed into Google and use multiple devices, which varies significantly by audience.
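If your site authenticates users and sends a user_id, you can approximate part of that deduplication yourself. A minimal sketch against the standard export schema (the dataset name is a placeholder): each device is mapped to any user_id it sent during the period, and devices that never sent one are counted on their own. It only merges devices on which the same person logged in, so it will still count more users than Google Signals would.

```sql
-- Users counted two ways for January 2026:
--   device_users:   one per user_pseudo_id (the export's default unit)
--   resolved_users: devices that sent a user_id are merged under that user_id
-- `my-project.analytics_123456789` is a placeholder dataset name.
WITH device_map AS (
  SELECT
    user_pseudo_id,
    -- Any user_id seen on this device; NULL if the device never logged in.
    -- On shared devices with several user_ids, MAX picks one arbitrarily.
    MAX(user_id) AS resolved_user_id
  FROM `my-project.analytics_123456789.events_*`
  -- Date-only suffixes; events_intraday_* suffixes sort above this range
  WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
    AND user_pseudo_id IS NOT NULL
  GROUP BY user_pseudo_id
)
SELECT
  COUNT(*) AS device_users,
  -- Prefixes keep user_id and user_pseudo_id values from colliding
  COUNT(DISTINCT COALESCE(CONCAT('uid:', resolved_user_id),
                          CONCAT('dev:', user_pseudo_id))) AS resolved_users
FROM device_map
```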

Data Processing Delays

GA4 updates export tables for up to 72 hours after the event date. Events arrive late due to network delays, batched mobile SDK uploads, and server-side processing queues. The GA4 interface processes some of these late events faster than they appear in BigQuery.

Comparing data less than 72 hours old will show discrepancies that resolve themselves over time. This is the easiest gap to handle: wait 72 hours before comparing.
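One way to build the wait into comparison queries is to exclude any daily table that might still be updated. A sketch, assuming the daily export tables are queried through a wildcard (the dataset name is a placeholder) and treating a day as settled once four full days have passed, a conservative reading of the 72-hour window:

```sql
-- Daily users for roughly the last month, skipping days still inside the
-- 72-hour late-arrival window. Dataset name is a placeholder.
-- CURRENT_DATE() is UTC; pass your property's time zone if it differs,
-- e.g. CURRENT_DATE('Europe/Paris').
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS day,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM `my-project.analytics_123456789.events_*`
-- Digit-only bounds also keep events_intraday_* tables out of the range
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
                        AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 4 DAY))
GROUP BY day
ORDER BY day
```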

For your incremental dbt models, this means configuring a lookback window that reprocesses recent partitions:

```sql
{% if is_incremental() %}
-- Reprocess the last 7 days on each run to pick up late-arriving events
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
{% endif %}
```

A 7-day lookback captures any reasonable late arrival while limiting reprocessing cost.
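The lookback only helps if each run replaces the reprocessed days instead of appending them; with a plain append, late-arriving days would be inserted a second time. A sketch of one way to set that up with dbt-bigquery’s insert_overwrite strategy, assuming a hypothetical source `ga4.events` whose identifier points at the `events_*` wildcard; the model name and selected columns are illustrative:

```sql
-- models/ga4_events_daily.sql (hypothetical model name)
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'}
) }}

SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_date,  -- export stores it as 'YYYYMMDD'
  event_name,
  event_timestamp,
  user_pseudo_id
FROM {{ source('ga4', 'events') }}
-- Daily tables only: intraday shard suffixes start with 'intraday_'
WHERE _TABLE_SUFFIX NOT LIKE 'intraday_%'
{% if is_incremental() %}
  -- Rebuild the last 7 days; insert_overwrite replaces those date partitions
  AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
{% endif %}
```

A merge strategy with a unique_key is the usual alternative when a model isn’t partitioned by date.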

Thresholding

GA4 applies thresholding when Google Signals is enabled and the user count for a dimension value falls below a privacy threshold. When this kicks in, GA4 withholds rows from reports to prevent identification of individual users. The data isn’t lost; it’s just hidden in the interface.

BigQuery doesn’t apply thresholding. You see every event that was collected and exported. This can cause BigQuery to show more detailed breakdowns than the GA4 interface for low-traffic dimensions.
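To see which rows are most exposed to thresholding, you can list dimension values with small user counts. A sketch using country as the dimension; Google doesn’t publish its exact threshold, so `@min_users` is a query parameter you choose yourself, not GA4’s value, and the dataset name is a placeholder:

```sql
-- Countries whose January 2026 user count falls below a chosen cutoff.
-- These rows exist in BigQuery but are the kind GA4 may withhold from
-- reports when thresholding applies.
SELECT
  geo.country AS country,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM `my-project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY country
HAVING COUNT(DISTINCT user_pseudo_id) < @min_users
ORDER BY users
```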

Handling the Variance

  1. Document the expected variance. BigQuery and GA4 will differ by 1-5%, sometimes more on European properties with high consent rejection. This is by design.

  2. Compare data older than 72 hours. Eliminate the processing delay variable before investigating other discrepancies.

  3. Use BigQuery as the source of truth for dashboards. BigQuery gives exact, reproducible, auditable numbers. GA4’s interface gives modeled estimates. Mixing them causes confusion.

  4. Monitor the gap over time. A sudden change in the variance (from 3% to 15%, for example) usually signals a configuration change: a consent banner update, a Google Signals toggle, or a tracking implementation issue.

  5. Consent Mode behavioral modeling cannot be reconstructed from BigQuery data. The modeled data is not in the export. The gap between BigQuery and GA4 interface numbers on European properties reflects consent rejection rate, not a data quality issue.
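Step 4 can be automated with a query along these lines. It assumes you keep the GA4 interface totals in a table yourself; `ga4_ui_daily` and `bq_daily` and their columns are hypothetical, filled however suits you. It computes the daily gap and flags days where it moves more than five percentage points from its trailing 28-day average. Both numbers are arbitrary starting points:

```sql
-- Daily gap between GA4 interface users and BigQuery users, flagged when it
-- departs from its recent baseline.
-- ga4_ui_daily(day DATE, users INT64) and bq_daily(day DATE, users INT64)
-- are hypothetical tables.
WITH gap AS (
  SELECT
    ui.day,
    ui.users AS ui_users,
    bq.users AS bq_users,
    -- Gap relative to the GA4 interface figure, in percent
    100 * SAFE_DIVIDE(ui.users - bq.users, ui.users) AS gap_pct
  FROM ga4_ui_daily AS ui
  JOIN bq_daily AS bq ON ui.day = bq.day
)
SELECT
  day,
  ui_users,
  bq_users,
  gap_pct,
  -- Baseline: average gap over the previous 28 rows (28 days if none are missing)
  AVG(gap_pct) OVER w AS baseline_pct,
  ABS(gap_pct - AVG(gap_pct) OVER w) > 5 AS gap_shifted
FROM gap
WINDOW w AS (ORDER BY day ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING)
ORDER BY day
```

The first row has no baseline, so its flag is NULL rather than FALSE.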

Perfect reconciliation is not achievable given the architectural differences between the two systems.