GA4’s BigQuery export schema is a moving target. Google adds new fields without announcements, publishes no official changelog, and never retroactively populates historical tables with fields added after their export date. If you treat the schema as stable, you’ll miss new capabilities — and potentially break models when unexpected fields appear.
## The Historical Pattern
The schema has grown significantly since the GA4 BigQuery export launched around 2019 under the “App+Web” name, inheriting its structure from Firebase’s export. Major additions by approximate timeframe:
| Timeframe | Addition |
|---|---|
| March 2020 | ecommerce RECORD |
| June 2021 | privacy_info for Consent Mode |
| May 2023 | collected_traffic_source for event-level attribution |
| July 2023 | is_active_user field |
| October 2023 | items.item_params nested RECORD |
| July 2024 | Batch sequencing fields, session_traffic_source_last_click |
The July 2024 additions — batch sequencing fields (batch_event_index, batch_ordering_id, batch_page_id) and session_traffic_source_last_click — represent the most significant structural changes to the export schema in GA4’s history. Teams who had built sessionization and attribution models before this date needed to update their logic to take advantage of the new fields. Teams who weren’t monitoring the schema missed them entirely.
## The Non-Retroactive Rule
Every new field is absent from historical tables. session_traffic_source_last_click is null for every row exported before July 2024. collected_traffic_source is null before May 2023. This is not a data quality issue — it’s a fundamental property of date-sharded tables.
The practical implications:
Attribution models need conditional logic when they span the addition date of a field. A query over the last 2 years using session_traffic_source_last_click for session attribution will produce correct results for data from July 2024 forward and nulls for everything before. The traffic source fields guide covers the conditional logic pattern for handling this gap.
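That conditional pattern can be sketched as a COALESCE across field generations. This assumes a base model (here called `base__ga4__events`, a hypothetical name) that preserves the export's nested RECORDs with their raw field paths:

```sql
SELECT
  user_pseudo_id,
  event_timestamp,
  -- session-scoped last-click source: populated from July 2024 onward
  -- event-level collected source: populated from May 2023 onward
  COALESCE(
    session_traffic_source_last_click.manual_campaign.source,
    collected_traffic_source.manual_source
  ) AS attributed_source
FROM {{ ref('base__ga4__events') }}
WHERE event_name = 'session_start'
```

Rows before July 2024 fall back to `collected_traffic_source`; rows before May 2023 are still null under this logic and need a separate session-level fallback built from event parameters.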
Backfilling is impossible for fields that didn’t exist at export time. There’s no retroactive GA4 export that adds session_traffic_source_last_click to January 2024 tables. Your historical data will always have this gap.
Schema tests that check for field presence need to account for the introduction date. A test that session_traffic_source_last_click IS NOT NULL will fail on all historical data before July 2024.
## Monitoring for Schema Changes
Since there’s no official changelog, practitioners must monitor INFORMATION_SCHEMA to catch additions.
### Querying INFORMATION_SCHEMA for Column Changes
```sql
-- Check current columns, including nested field paths, in the events tables
SELECT table_name, field_path, data_type
FROM `project.analytics_123456789.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
WHERE table_name LIKE 'events_%'
  AND table_name NOT LIKE 'events_intraday_%'
ORDER BY table_name DESC, field_path
LIMIT 500
```

(Note that `COLUMN_FIELD_PATHS` is the right view here rather than `COLUMNS`, because it exposes nested fields like `items.item_params`; it does not carry `is_nullable` or `ordinal_position`.) Run this against your most recent table and compare the output against a snapshot from 30 days ago. Any new rows indicate schema additions.
### A Snapshot-Based Monitoring Pattern
In your data platform, maintain a small reference table of known schema columns and their introduction dates:
```sql
CREATE TABLE `project.analytics_meta.ga4_schema_log` (
  column_name STRING,
  first_seen_date DATE,
  data_type STRING,
  notes STRING
);
```

A scheduled query or dbt model can check for columns present in recent tables but absent from this log, signaling a new field to evaluate.
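A sketch of that check, reusing the dataset and log-table names above: it lists fields present in the latest daily export table but missing from the log.

```sql
-- Fields in the newest events_ table that are not yet in the schema log
SELECT c.field_path, c.data_type
FROM `project.analytics_123456789.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS` AS c
LEFT JOIN `project.analytics_meta.ga4_schema_log` AS sl
  ON sl.column_name = c.field_path
WHERE c.table_name = (
    -- most recent daily shard
    SELECT MAX(table_name)
    FROM `project.analytics_123456789.INFORMATION_SCHEMA.TABLES`
    WHERE table_name LIKE 'events_%'
      AND table_name NOT LIKE 'events_intraday_%'
  )
  AND sl.column_name IS NULL
```

Scheduled daily, any non-empty result is a new field to document and evaluate.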
## What to Do When You Find a New Field
When INFORMATION_SCHEMA reveals a new column:
- Document it: When was it first present? What table suffix? Update your internal schema reference.
- Evaluate it: Is this field useful for your use cases? Does it replace or improve on existing logic?
- Test historical availability: Run the INFORMATION_SCHEMA query against older tables to find the introduction date, so your queries can handle the null-before-introduction case.
- Update models if valuable: If the field improves an existing model (as `session_traffic_source_last_click` improves attribution), plan a model update with appropriate conditional logic for the historical transition period.
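Step 3 can be automated with the same INFORMATION_SCHEMA view, scanning across table suffixes for the earliest daily table that contains the field (dataset name assumed, as above):

```sql
-- Earliest daily export table in which the field appears
SELECT MIN(table_name) AS first_table_with_field
FROM `project.analytics_123456789.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
WHERE table_name LIKE 'events_2%'
  AND field_path = 'session_traffic_source_last_click'
```

The returned suffix is the introduction date your conditional logic should key on.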
## The dbt Source Validation Angle
dbt source schema validation can catch when GA4 adds a column that your base model doesn't expose, but only if you've declared `columns` for the source in YAML. The opposite direction, a field you expect being absent from a historical table, requires a different check.
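A minimal source definition with columns declared, as that validation requires (source name, project, and file path are illustrative):

```yaml
# models/staging/ga4/_ga4__sources.yml
version: 2

sources:
  - name: ga4
    database: my-gcp-project
    schema: analytics_123456789
    tables:
      - name: events
        identifier: events_*   # daily sharded export tables
        columns:
          - name: event_date
          - name: event_name
          - name: collected_traffic_source
          - name: session_traffic_source_last_click
```

Any field absent from this list is invisible to column-level validation, so keep it in sync with the schema log described above.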
A custom test that validates field availability for a given date range:
```sql
-- Should return no rows: no unexpected nulls in post-July-2024 data
SELECT COUNT(*) AS null_count
FROM {{ ref('base__ga4__events') }}
WHERE event_date >= '2024-07-01'
  AND session_traffic_source_last_click IS NULL
  AND event_name = 'session_start'
HAVING COUNT(*) > 0
```

Pairing this with a reasonable threshold (some nulls are expected even in post-July-2024 data due to edge cases) and alerting when the count exceeds it gives you a signal if the field stops populating, a potential sign of a GA4 linking configuration change.
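One way to encode that threshold is dbt's built-in test severity config, which compares the number of rows a singular test returns against `warn_if`/`error_if` expressions (the file name and numbers here are illustrative and should be tuned to your traffic volume):

```sql
-- tests/assert_last_click_populated.sql (singular dbt test)
{{ config(severity='error', warn_if='>100', error_if='>1000') }}

SELECT event_date, user_pseudo_id
FROM {{ ref('base__ga4__events') }}
WHERE event_date >= '2024-07-01'
  AND session_traffic_source_last_click IS NULL
  AND event_name = 'session_start'
```

Returning the failing rows themselves, rather than a count, lets dbt apply the thresholds: a trickle of nulls warns, a sustained outage errors.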
## Treating the Schema as Infrastructure
GA4’s schema is versioned infrastructure that changes on Google’s timeline without announcement. A quarterly review of INFORMATION_SCHEMA, combined with monitoring the GA4 release notes, surfaces new fields deliberately, rather than leaving them to be discovered through broken models or missed entirely in production pipelines.