
GA4 Schema Evolution Monitoring

GA4's BigQuery export schema changes without announcement, and new fields are never retroactive. How to detect additions before they break production queries.

Tags: ga4, bigquery, analytics, data engineering, data quality

GA4’s BigQuery export schema is a moving target. Google adds new fields without announcements, publishes no official changelog, and never retroactively populates historical tables with fields added after their export date. If you treat the schema as stable, you’ll miss new capabilities — and potentially break models when unexpected fields appear.

The Historical Pattern

The schema has grown significantly since the GA4 BigQuery export launched around 2019 under the “App+Web” name, inheriting its structure from Firebase’s export. Major additions by approximate timeframe:

Timeframe       Addition
March 2020      ecommerce RECORD
June 2021       privacy_info for Consent Mode
May 2023        collected_traffic_source for event-level attribution
July 2023       is_active_user field
October 2023    items.item_params nested RECORD
July 2024       Batch sequencing fields, session_traffic_source_last_click

The July 2024 additions — batch sequencing fields (batch_event_index, batch_ordering_id, batch_page_id) and session_traffic_source_last_click — represent the most significant structural changes to the export schema in GA4’s history. Teams who had built sessionization and attribution models before this date needed to update their logic to take advantage of the new fields. Teams who weren’t monitoring the schema missed them entirely.

The Non-Retroactive Rule

Every new field is absent from historical tables. session_traffic_source_last_click is null for every row exported before July 2024. collected_traffic_source is null before May 2023. This is not a data quality issue — it's a fundamental property of the export: each date-sharded table is written once with the schema in effect on its export date, and is never rewritten.

The practical implications:

Attribution models need conditional logic when they span the addition date of a field. A query over the last 2 years using session_traffic_source_last_click for session attribution will produce correct results for data from July 2024 forward and nulls for everything before. The traffic source fields guide covers the conditional logic pattern for handling this gap.

Backfilling is impossible for fields that didn’t exist at export time. There’s no retroactive GA4 export that adds session_traffic_source_last_click to January 2024 tables. Your historical data will always have this gap.

Schema tests that check for field presence need to account for the introduction date. A test asserting session_traffic_source_last_click IS NOT NULL will fail on all historical data before July 2024.
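
The conditional logic for the first implication can be sketched as follows. The nested field paths come from the export schema, but the cutover date and the fallback choice are illustrative assumptions, not the exact pattern from the traffic source fields guide:

```sql
-- Sketch: use session_traffic_source_last_click where it exists, and fall
-- back to collected_traffic_source for rows before the (assumed) July 2024
-- cutover. event_date is a YYYYMMDD string in the raw export.
SELECT
  user_pseudo_id,
  event_date,
  CASE
    WHEN PARSE_DATE('%Y%m%d', event_date) >= DATE '2024-07-01'
      THEN session_traffic_source_last_click.manual_campaign.source
    ELSE collected_traffic_source.manual_source
  END AS session_source
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20250101'
```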

Monitoring for Schema Changes

Since there’s no official changelog, practitioners must monitor INFORMATION_SCHEMA to catch additions.

Querying INFORMATION_SCHEMA for Column Changes

-- List the current nested field paths in the events tables.
-- COLUMN_FIELD_PATHS (unlike COLUMNS) includes nested RECORD fields,
-- which is where most GA4 additions appear.
SELECT
  table_name,
  field_path,
  data_type
FROM `project.analytics_123456789.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
WHERE table_name LIKE 'events_%'
  AND table_name NOT LIKE 'events_intraday_%'
ORDER BY table_name DESC, field_path
LIMIT 500

Run this against your most recent table and compare against a snapshot from 30 days ago. Any new rows in the output indicate schema additions.
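
One way to express that comparison directly in SQL, assuming you have saved an earlier run's output into a snapshot table (the `analytics_meta.schema_snapshot` table and its `field_path` column are hypothetical names for this sketch):

```sql
-- Field paths present in the most recent daily table but absent from the
-- 30-day-old snapshot. Any rows returned indicate schema additions.
SELECT field_path, data_type
FROM `project.analytics_123456789.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
WHERE table_name = (
    SELECT MAX(table_name)
    FROM `project.analytics_123456789.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
    WHERE table_name LIKE 'events_%'
      AND table_name NOT LIKE 'events_intraday_%'
  )
  AND field_path NOT IN (
    SELECT field_path FROM `project.analytics_meta.schema_snapshot`
  )
```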

A Snapshot-Based Monitoring Pattern

In your data platform, maintain a small reference table of known schema columns and their introduction dates:

CREATE TABLE `project.analytics_meta.ga4_schema_log` (
  column_name STRING,
  first_seen_date DATE,
  data_type STRING,
  notes STRING
);

A scheduled query or dbt model can check for columns present in recent tables but absent from this log — signaling a new field to evaluate.
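
A minimal version of that scheduled check, assuming the log table above stores full field paths and using the same dataset names as earlier:

```sql
-- Field paths in the live schema that have no entry in ga4_schema_log.
-- Any rows returned are new fields awaiting evaluation and documentation.
SELECT DISTINCT f.field_path, f.data_type
FROM `project.analytics_123456789.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS` AS f
LEFT JOIN `project.analytics_meta.ga4_schema_log` AS known
  ON known.column_name = f.field_path
WHERE f.table_name LIKE 'events_%'
  AND f.table_name NOT LIKE 'events_intraday_%'
  AND known.column_name IS NULL
```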

What to Do When You Find a New Field

When INFORMATION_SCHEMA reveals a new column:

  1. Document it: When was it first present? What table suffix? Update your internal schema reference.
  2. Evaluate it: Is this field useful for your use cases? Does it replace or improve on existing logic?
  3. Test historical availability: Run the INFORMATION_SCHEMA query against older tables to find the introduction date, so your queries can handle the null-before-introduction case.
  4. Update models if valuable: If the field improves an existing model (like session_traffic_source_last_click improving attribution), plan a model update with appropriate conditional logic for the historical transition period.
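
Step 3 can be automated: COLUMN_FIELD_PATHS covers every table in the dataset, so the earliest daily shard containing a field approximates its introduction date. A sketch, using the same dataset name as above:

```sql
-- Earliest events_ shard containing the new field; the date suffix of
-- that table name approximates the field's introduction date.
SELECT MIN(table_name) AS first_table_with_field
FROM `project.analytics_123456789.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
WHERE table_name LIKE 'events_%'
  AND table_name NOT LIKE 'events_intraday_%'
  AND field_path = 'session_traffic_source_last_click'
```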

The dbt Source Validation Angle

dbt Source Schema Validation can catch when GA4 adds a column that your base model doesn’t expose, but only if you’ve configured your source with columns in the YAML. The opposite direction — a field you expect being absent from a historical table — requires a different check.

A custom test that validates field availability for a given date range:

-- null_count should be 0 (or near 0) for post-July-2024 data
SELECT COUNT(*) AS null_count
FROM {{ ref('base__ga4__events') }}
WHERE event_date >= '2024-07-01'
AND session_traffic_source_last_click IS NULL
AND event_name = 'session_start'

Pairing this with a reasonable threshold (some nulls are expected even in post-July-2024 data due to edge cases) and alerting when it exceeds expectations gives you signal if the field stops populating — a potential sign of a GA4 linking configuration change.
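
A thresholded variant of the test above can encode that tolerance directly; the 5% cutoff here is an assumed threshold, and the model name follows the earlier example:

```sql
-- Returns a row only when the null rate exceeds the assumed 5% threshold,
-- which a dbt singular test reports as a failure.
SELECT
  COUNTIF(session_traffic_source_last_click IS NULL) / COUNT(*) AS null_rate
FROM {{ ref('base__ga4__events') }}
WHERE event_date >= '2024-07-01'
  AND event_name = 'session_start'
HAVING null_rate > 0.05
```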

Treating the Schema as Infrastructure

GA4’s schema is versioned infrastructure that changes on Google’s timeline without announcements. A quarterly review of INFORMATION_SCHEMA changes, combined with monitoring GA4 release notes, surfaces new fields before they are discovered through broken models or missed in production pipelines.