Adrienne Vermorel

GA4 BigQuery Export: The Complete Schema Reference

Part 1 of the GA4 + BigQuery series for analytics engineers


The GA4 BigQuery export delivers raw, unsampled, event-level data piped directly into your warehouse. No more sampling warnings. No more “(other)” dimension bucketing. Full SQL access to every click, scroll, and purchase.

But the reality is messier. You’ll encounter nested arrays requiring UNNEST gymnastics, three different traffic source structures that populate under different conditions, undocumented schema changes that break production queries, and subtle gotchas that Google’s documentation glosses over entirely.

This reference is the practical companion to the official docs. It’s the field guide I wish existed when I first opened an events_ table and wondered why my session counts were wrong. Whether you’re setting up your first export or debugging a pipeline that’s been running for years, this guide covers what you actually need to know.


What Gets Exported

When you link a GA4 property to BigQuery, a dataset named analytics_<property_id> appears in your project. Inside, you’ll find date-sharded tables (not partitioned tables) that capture every event flowing through your property.

The Four Table Types

| Table | Purpose | Key Limitation |
| --- | --- | --- |
| events_YYYYMMDD | Daily export, your source of truth | 10+ hour latency |
| events_intraday_YYYYMMDD | Streaming near-real-time data | Missing attribution fields |
| pseudonymous_users_YYYYMMDD | User export keyed by device ID | Optional, must enable separately |
| users_YYYYMMDD | User export keyed by custom user_id | Optional, must enable separately |

The daily tables (events_YYYYMMDD) contain fully processed data with complete user attribution. They typically arrive mid-afternoon in your property’s timezone, though there’s no guaranteed SLA for standard properties. These tables continue updating for up to 72 hours to capture late-arriving events (mobile app data and Measurement Protocol hits often arrive delayed).

The streaming tables (events_intraday_YYYYMMDD) provide data within minutes of event occurrence. They’re deleted automatically once the corresponding daily table completes. Critical limitation: the traffic_source, user_ltv, and is_active_user fields are never populated in intraday tables.

The user tables export one row per user identifier with audience memberships, lifetime value metrics, and predictive scores. The pseudonymous_users_ table keys on user_pseudo_id (device ID), while users_ keys on your custom user_id. Both require explicit enablement in the BigQuery linking settings.
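Audience membership is one of the most immediately useful pieces of the user export. A minimal sketch, assuming the documented audiences REPEATED RECORD (with its name field) and placeholder project, dataset, and shard date:

```sql
-- Count members per audience in a user export shard
-- (audiences is a REPEATED RECORD in the pseudonymous_users_ schema)
SELECT
  audience.name AS audience_name,
  COUNT(DISTINCT pseudo_user_id) AS members
FROM `project.analytics_123456789.pseudonymous_users_*`,
  UNNEST(audiences) AS audience
WHERE _TABLE_SUFFIX = '20260112'
GROUP BY audience_name
ORDER BY members DESC
```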

Daily vs Streaming: When to Use Which

```mermaid
timeline
    title GA4 Export Timeline (Day N)
    section Day N
        Events occur : User activity throughout the day
        Intraday available : Within minutes of each event
    section Day N+1
        Daily export completes : ~10-16 hours after midnight
        Intraday table deleted : After daily export finalizes
    section Day N+3
        Late data window closes : 72 hours for stragglers
```

The choice isn’t either/or. Enable both exports and use them for different purposes.

Daily exports are your source of truth for reporting. They include complete user attribution and guaranteed data completeness (after the 72-hour window). The tradeoff is latency: you won’t see yesterday’s data until mid-afternoon today.

Streaming exports enable same-day monitoring and real-time dashboards. Expect 0.5-2% discrepancy between intraday and final daily data, with some edge cases showing up to 20% differences. Don’t use intraday tables for final reporting.
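A common pattern stitches the two together: finalized days from the daily tables, plus today from the intraday table. A sketch with placeholder names and dates:

```sql
-- Finalized history from the daily export...
SELECT event_date, COUNT(*) AS events
FROM `project.analytics_123456789.events_*`
-- note: the events_* wildcard also matches events_intraday_ tables;
-- the date-range filter on _TABLE_SUFFIX excludes them here
WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260111'
GROUP BY event_date

UNION ALL

-- ...plus today's still-accumulating data from the intraday table
SELECT event_date, COUNT(*) AS events
FROM `project.analytics_123456789.events_intraday_*`
WHERE _TABLE_SUFFIX = '20260112'
GROUP BY event_date
```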

Cost differs significantly. Daily batch loading is free; you pay only for storage and queries. Streaming ingestion costs $0.05 per GB, roughly 600,000 events. A site with 1 million daily events streams about 1.7 GB per day, or approximately $2.50/month.

Standard properties face a 1 million events per day limit on daily exports (no limit on streaming). GA4 360 properties can export up to 20 billion events daily.


The Events Table Schema

Every row in the events table represents a single event. This is the fundamental shift from Universal Analytics, where one row represented one session with events nested inside. Understanding this event-centric model is essential.

```
events_YYYYMMDD (one row = one event)
├── Top-level fields
│   ├── event_name, event_timestamp, event_date
│   ├── user_pseudo_id, user_id
│   └── platform, stream_id
├── Nested RECORDs (single values)
│   ├── device.category, device.browser, device.os...
│   ├── geo.country, geo.city, geo.region...
│   ├── traffic_source.source, .medium, .name
│   ├── collected_traffic_source.manual_source...
│   └── privacy_info.analytics_storage...
└── REPEATED RECORDs (arrays, require UNNEST)
    ├── event_params[] ─── key + value.string_value/int_value/double_value
    ├── user_properties[] ─── key + value + set_timestamp_micros
    └── items[] ─── item_id, item_name, price, quantity...
        └── items.item_params[] (nested array)
```

Core Event Fields

The foundational fields appear at the top level of every row:

| Field | Type | Description |
| --- | --- | --- |
| event_name | STRING | Event name (page_view, purchase, custom events) |
| event_timestamp | INTEGER | UTC time in microseconds since Unix epoch |
| event_date | STRING | Date in property timezone (YYYYMMDD format) |
| event_value_in_usd | FLOAT | Currency-converted value when event includes value param |
| event_bundle_sequence_id | INTEGER | Sequence within the upload bundle |

Added in July 2024, three batch-ordering fields solve a longstanding problem: events batched together previously shared identical timestamps with no way to determine their original sequence.

| Field | Purpose |
| --- | --- |
| batch_event_index | Order within the batch |
| batch_page_id | Groups events from the same page |
| batch_ordering_id | Unique ordering identifier |
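Put together, the three fields give you a stable sort order even when batched events share a timestamp. A sketch (project, dataset, shard date, and the user ID are placeholders):

```sql
-- Reconstruct the original firing order of one user's events
SELECT
  event_name,
  event_timestamp,
  batch_page_id,
  batch_ordering_id,
  batch_event_index
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20260112'
  AND user_pseudo_id = '1234567890.1234567890'  -- placeholder user
ORDER BY event_timestamp, batch_page_id, batch_ordering_id, batch_event_index
```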

The Nested event_params Structure

Most of your data lives here. The event_params field is a REPEATED RECORD, meaning it’s an array of key-value pairs attached to each event.

Each element contains a key (STRING) and a value RECORD with four possible fields:

| Value Field | Type | Used For |
| --- | --- | --- |
| string_value | STRING | Text values, URLs, categories |
| int_value | INTEGER | Counts, IDs, boolean flags (0/1) |
| float_value | FLOAT | Currently unused by GA4 |
| double_value | FLOAT | Decimal values, prices |

GA4 auto-detects which field to use. The ga_session_id lands in int_value. The page_location lands in string_value. Custom parameters may land anywhere depending on the values you send.

Common parameters you’ll extract frequently:

| Parameter | Value Field | Description |
| --- | --- | --- |
| ga_session_id | int_value | Session identifier (Unix timestamp of session start) |
| ga_session_number | int_value | Session count for this user |
| page_location | string_value | Full page URL |
| page_title | string_value | Page title |
| page_referrer | string_value | Previous page URL |
| source | string_value | Traffic source |
| medium | string_value | Traffic medium |
| campaign | string_value | Campaign name |
| engaged_session_event | int_value | 1 if session is engaged, 0 otherwise |

To extract a parameter, you’ll use UNNEST with a subquery:

```sql
SELECT
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20260112'
```

We’ll cover UNNEST patterns in depth in Part 2 of this series.

User Identification

Three fields handle user identity at the top level:

| Field | Description |
| --- | --- |
| user_pseudo_id | Device/browser identifier (the value from the _ga cookie on web, Firebase Instance ID on apps) |
| user_id | Your custom identifier, set via gtag('set', 'user_id', '...') or equivalent |
| user_first_touch_timestamp | Microsecond timestamp of user’s first interaction with your property |

The user_properties REPEATED RECORD mirrors the event_params structure, storing user-scoped properties set via setUserProperty. Unlike event parameters, user properties persist across events and include a set_timestamp_micros field tracking when each was last updated.
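Extraction works exactly like event_params, with the set timestamp available inside the value record. A sketch using a hypothetical plan_tier user property and placeholder table names:

```sql
-- Pull a user-scoped property and when it was last set
SELECT
  user_pseudo_id,
  (SELECT value.string_value FROM UNNEST(user_properties) WHERE key = 'plan_tier') AS plan_tier,
  (SELECT value.set_timestamp_micros FROM UNNEST(user_properties) WHERE key = 'plan_tier') AS plan_tier_set_at
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20260112'
```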

Device, Geography, and Platform

The device RECORD contains hardware and software details:

| Field | Example Values |
| --- | --- |
| device.category | mobile, tablet, desktop |
| device.operating_system | Android, iOS, Windows, Macintosh |
| device.operating_system_version | 14.0, 10 |
| device.browser | Chrome, Safari, Firefox |
| device.browser_version | 120.0.6099.109 |
| device.mobile_brand_name | Apple, Samsung, Google |
| device.mobile_model_name | iPhone, Pixel 7 |

The geo RECORD provides IP-based location:

| Field | Description |
| --- | --- |
| geo.continent | Americas, Europe, Asia |
| geo.country | United States, France, Japan |
| geo.region | California, Île-de-France |
| geo.city | San Francisco, Paris |
| geo.metro | DMA region (US only) |

The platform field (STRING) indicates web, iOS, or Android. The stream_id identifies which data stream generated the event.

Traffic Source Fields: Three Structures, Three Purposes

GA4 exports three different traffic source structures, each serving a distinct attribution purpose:

```mermaid
flowchart LR
    subgraph ts["traffic_source"]
        ts_when["When: Once per user lifetime"]
        ts_what["Attribution: First touch"]
        ts_note["⚠️ Never in intraday tables"]
    end
    subgraph cts["collected_traffic_source"]
        cts_when["When: Every event with UTMs"]
        cts_what["Attribution: Event-level"]
        cts_note["Contains: gclid, dclid, manual_*"]
    end
    subgraph stslc["session_traffic_source_last_click"]
        stslc_when["When: Per session"]
        stslc_what["Attribution: Last click"]
        stslc_note["Added: July 2024"]
    end
    user((User arrives)) --> ts
    event((Event fires)) --> cts
    session((Session starts)) --> stslc
```

traffic_source (RECORD): First-touch attribution showing how the user was originally acquired. Values never change after first set. Contains source, medium, name (campaign). Not populated in intraday tables.

collected_traffic_source (RECORD): Event-level UTM parameters and click IDs captured at the moment of the event. Contains manual_source, manual_medium, manual_campaign_name, plus click IDs like gclid, dclid, and srsltid.

session_traffic_source_last_click (RECORD): Added July 2024. Comprehensive last-click attribution with nested records for platform-specific campaign data: google_ads_campaign, sa360_campaign, dv360_campaign, cm360_campaign, and manual_campaign.

When building attribution models, understand which field answers your question. “How did we acquire this user?” → traffic_source. “What campaign was active when this specific event fired?” → collected_traffic_source. “What drove this session?” → session_traffic_source_last_click.
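Selecting the three side by side makes the differences concrete. A sketch with placeholder names:

```sql
SELECT
  traffic_source.source AS first_touch_source,              -- how the user was acquired
  collected_traffic_source.manual_source AS event_source,   -- UTMs on this specific event
  session_traffic_source_last_click.manual_campaign.source AS session_source  -- last click for the session
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20260112'
```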

E-commerce Structures

The ecommerce RECORD stores transaction-level data:

| Field | Description |
| --- | --- |
| ecommerce.transaction_id | Order ID |
| ecommerce.purchase_revenue | Revenue in local currency |
| ecommerce.purchase_revenue_in_usd | Revenue converted to USD |
| ecommerce.shipping_value | Shipping cost |
| ecommerce.tax_value | Tax amount |
| ecommerce.total_item_quantity | Total items in transaction |

The items REPEATED RECORD contains one entry per product:

| Field | Description |
| --- | --- |
| items.item_id | Product SKU |
| items.item_name | Product name |
| items.item_brand | Brand |
| items.item_category through items.item_category5 | Category hierarchy |
| items.price | Unit price |
| items.quantity | Quantity |
| items.coupon | Applied coupon |
| items.item_params | Custom product parameters (REPEATED RECORD) |

The items.item_params field (added October 2023) is itself a nested REPEATED RECORD within items, enabling custom product dimensions.
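Reaching item_params takes two UNNEST levels: one for the items array, one for each item’s parameters. A sketch with placeholder names (the COALESCE handles parameters of either type):

```sql
SELECT
  item.item_id,
  item.item_name,
  ip.key AS item_param_key,
  COALESCE(ip.value.string_value, CAST(ip.value.int_value AS STRING)) AS item_param_value
FROM `project.analytics_123456789.events_*`,
  UNNEST(items) AS item,
  UNNEST(item.item_params) AS ip
WHERE _TABLE_SUFFIX = '20260112'
  AND event_name = 'purchase'
```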

The privacy_info RECORD captures consent state:

| Field | Values | Description |
| --- | --- | --- |
| privacy_info.ads_storage | Yes, No, Unset | Ad cookie consent |
| privacy_info.analytics_storage | Yes, No, Unset | Analytics cookie consent |
| privacy_info.uses_transient_token | TRUE/FALSE | Using cookieless measurement |

When analytics_storage is denied under Consent Mode, events still collect but user_pseudo_id and ga_session_id are stripped. This creates orphaned rows that can’t be attributed to users or sessions.
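You can measure how much of your data is affected. A minimal sketch with placeholder names:

```sql
-- Share of events with no user identifier (consent-stripped rows)
SELECT
  COUNTIF(user_pseudo_id IS NULL) AS orphaned_events,
  ROUND(100 * COUNTIF(user_pseudo_id IS NULL) / COUNT(*), 2) AS orphaned_pct
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20260112'
```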


Critical Gotchas

The event_params Type Problem

GA4 auto-detects parameter data types, storing values in whichever of the four value fields matches. Custom parameters may land in different fields depending on the values sent. Send “123” as a string once and it goes to string_value; send 123 as a number and it goes to int_value.

When uncertain about a parameter’s type, use COALESCE:

```sql
SELECT
  event_name,
  (SELECT COALESCE(
      value.string_value,
      CAST(value.int_value AS STRING),
      CAST(value.double_value AS STRING)
    )
   FROM UNNEST(event_params) WHERE key = 'custom_param') AS param_value
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20260112'
```

session_start Events Are Unreliable

⚠️ Warning: Do not count session_start events to calculate sessions.

The session_start event is evaluated client-side, triggered when an event includes the _ss parameter. This causes duplicate session_start events within single sessions and missing events in others. The problem is particularly acute for sub-properties where the session’s first event may occur before filters apply.

GA4’s own interface no longer uses session_start for its sessions metric. It estimates from unique session IDs instead. Do the same:

```sql
-- Correct: count distinct session IDs
-- (the '-' separator prevents collisions between different
-- user_pseudo_id/session_id combinations that concatenate identically)
SELECT
  COUNT(DISTINCT CONCAT(
    user_pseudo_id, '-',
    CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
  )) AS sessions
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260112'
```

ga_session_id Is Not Unique Across Users

The ga_session_id is simply the Unix timestamp of when the session started. Multiple users starting sessions at the same second share identical values. Always concatenate with user_pseudo_id to create truly unique session identifiers.

Timezone Traps

Three different timezone contexts coexist:

| Field | Timezone |
| --- | --- |
| event_timestamp | UTC (microseconds) |
| event_date | Property timezone (YYYYMMDD string) |
| _TABLE_SUFFIX | Pacific Time |

Converting to your local timezone:

```sql
SELECT
  DATETIME(TIMESTAMP_MICROS(event_timestamp), 'Europe/Paris') AS local_time
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20260112'
```

Consent Mode Creates Orphaned Rows

When users deny analytics_storage consent, Advanced Consent Mode still collects events as “cookieless pings” but strips user_pseudo_id and ga_session_id. These orphaned rows inflate event counts while remaining impossible to attribute.

If consent is granted later on the same page, GA4 “backstitches” consent to previously denied hits. You’ll see this as user_pseudo_id becoming populated on events that previously lacked it.

BigQuery Data Differs from GA4 UI

Expect discrepancies. BigQuery exports exclude:

  • Modeled conversions from Consent Mode
  • Google Signals user deduplication
  • Behavioral modeling for missing data

The UI uses HyperLogLog++ approximation for large cardinalities while BigQuery provides exact counts. High-cardinality dimensions showing “(other)” in the UI appear with full detail in BigQuery.

The User-Provided Data Trap

⚠️ Critical Warning: Enabling the “User-provided data” feature in GA4 admin permanently disables user_id export to BigQuery with no reversal option.

If your pipelines depend on user_id in BigQuery, verify this setting before anyone enables it.


Schema Evolution and UA Comparison

How the Schema Has Changed

GA4’s BigQuery export launched around 2019 under the “App+Web” name, inheriting its structure from Firebase’s export schema. Google provides no official changelog, so practitioners must watch for additions themselves, for example by querying INFORMATION_SCHEMA.
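A minimal monitoring sketch (project, dataset, and shard date are placeholders): snapshot the column list and diff it against the previous run.

```sql
-- Current columns of a daily shard; save the output and diff over time
SELECT column_name, data_type
FROM `project.analytics_123456789.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'events_20260112'
ORDER BY ordinal_position
```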

Major schema additions by approximate date:

| Timeframe | Addition |
| --- | --- |
| March 2020 | ecommerce RECORD |
| June 2021 | privacy_info for Consent Mode |
| May 2023 | collected_traffic_source for event-level attribution |
| July 2023 | is_active_user field |
| October 2023 | items.item_params nested RECORD |
| July 2024 | Batch sequencing fields, session_traffic_source_last_click |

New fields are never retroactive. Historical tables won’t contain fields added after their export date.

Key Differences from Universal Analytics

If you’re migrating from UA, the mental model shift is significant:

| Aspect | Universal Analytics | GA4 |
| --- | --- | --- |
| Core unit | Session | Event |
| Table name | ga_sessions_YYYYMMDD | events_YYYYMMDD |
| Row represents | One session | One event |
| Session ID | visitId (direct field) | ga_session_id (requires UNNEST) |
| User ID | fullVisitorId | user_pseudo_id |
| Page data | hits.page.pagePath | page_location in event_params |
| Custom dimensions | Indexed array (dimension1, dimension2) | Key-value pairs |
| Availability | GA360 only | All properties |

What’s gone entirely: hits.isExit, pre-calculated session metrics (totals.bounces, totals.pageviews), session-scoped custom dimensions (GA4 supports only user and event scope).
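The pre-calculated metrics can be rebuilt from raw events. A sketch reconstructing per-session pageviews (the old totals.pageviews), with placeholder names:

```sql
SELECT
  CONCAT(
    user_pseudo_id, '-',
    CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
  ) AS session_key,
  COUNTIF(event_name = 'page_view') AS pageviews
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20260112'
GROUP BY session_key
```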

Migration reality: UA and GA4 data cannot be directly combined. The fundamental model difference means historical UA data must remain in separate tables. Any unified analysis requires custom transformation layers that map concepts between the two schemas.


Querying Efficiently

Always Filter _TABLE_SUFFIX

GA4 uses date-sharded tables, not partitioned tables. Without a _TABLE_SUFFIX filter, you’ll scan your entire dataset, potentially years of data.

```sql
-- Query last 30 days
SELECT
  event_name,
  COUNT(*) AS events
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
  AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY event_name
```

Inline UNNEST vs CROSS JOIN UNNEST

Two patterns for working with nested data:

Inline UNNEST extracts specific parameters while maintaining one row per event. Use this for most queries:

```sql
SELECT
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20260112'
```

CROSS JOIN UNNEST expands rows, creating one row per parameter. Use for exploration:

```sql
SELECT
  ep.key,
  COUNT(*) AS occurrences
FROM `project.analytics_123456789.events_*`,
  UNNEST(event_params) AS ep
WHERE _TABLE_SUFFIX = '20260112'
GROUP BY ep.key
ORDER BY occurrences DESC
```

Reusable Helper Macros

Reduce repetition with dbt macros:

```sql
-- macros/ga4_param_string.sql
{% macro ga4_param_string(column, param_name) %}
  (SELECT value.string_value FROM UNNEST({{ column }}) WHERE key = '{{ param_name }}')
{% endmacro %}

-- macros/ga4_param_int.sql
{% macro ga4_param_int(column, param_name) %}
  (SELECT value.int_value FROM UNNEST({{ column }}) WHERE key = '{{ param_name }}')
{% endmacro %}
```

Usage in a dbt model:

```sql
SELECT
  {{ ga4_param_int('event_params', 'ga_session_id') }} AS session_id,
  {{ ga4_param_string('event_params', 'page_location') }} AS page_location
FROM {{ source('ga4', 'events') }}
WHERE _TABLE_SUFFIX = '20260112'
```

Cost Control Checklist

  • Select only needed columns: avoid SELECT * on tables with nested structures
  • Run dry runs first: estimate cost before executing large queries
  • Materialize frequent aggregations: scheduled queries creating summary tables (see the sketch after this list)
  • Use APPROX_COUNT_DISTINCT: when exact precision isn’t required for user counts
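A sketch of the materialization pattern, assuming a hypothetical project.reporting dataset; a scheduled query would run it daily:

```sql
-- Summarize yesterday's shard into a small reporting table so
-- dashboards never scan the raw export
-- (a production version would append or MERGE to keep history)
CREATE OR REPLACE TABLE `project.reporting.daily_event_summary` AS
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS export_date,
  event_name,
  COUNT(*) AS events,
  APPROX_COUNT_DISTINCT(user_pseudo_id) AS approx_users
FROM `project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY export_date, event_name
```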

Key Takeaways

The GA4 BigQuery export provides raw analytics data access, but its event-centric model and nested structures require a deliberate approach:

  1. Enable both daily and streaming exports. Use daily as your source of truth, streaming for same-day monitoring.
  2. Master the inline UNNEST pattern. You’ll use it constantly for parameter extraction.
  3. Always combine user_pseudo_id + ga_session_id for session analysis. Neither field is unique on its own.
  4. Never count session_start events for session metrics. Count distinct session IDs instead.
  5. Treat the schema as a moving target. Monitor for field additions, test queries after major GA4 releases.

Next in This Series

Now that you understand the schema, the real work begins: flattening those nested structures efficiently. Part 2: Unnesting GA4 Events covers patterns for every use case, from simple parameter extraction to complex multi-level unnesting for e-commerce items with custom dimensions.