ServicesAboutNotesContact Get in touch →
EN FR
Note

GA4 event_params Type Detection

How GA4 auto-detects parameter types across string_value, int_value, and double_value fields — and the defensive COALESCE pattern when the type isn't guaranteed.

Planted
ga4bigqueryanalyticsdata engineering

The event_params array in GA4’s BigQuery export stores each parameter as a key-value pair where the value is a RECORD with four possible fields: string_value, int_value, float_value, and double_value. GA4 auto-detects which field to populate based on the data type of the value sent.

This design keeps the schema flexible — you can add new parameters without altering table structure — but it introduces a class of subtle bugs that are easy to miss because queries don’t error out. They just return nulls.

How Type Assignment Works

GA4 assigns values to type fields as follows:

Value FieldTypeUsed For
string_valueSTRINGText values, URLs, categories, identifiers
int_valueINTEGERCounts, IDs, Unix timestamps, boolean flags (0/1)
float_valueFLOATCurrently unused by GA4
double_valueFLOATDecimal values, prices

The auto-detection is consistent for built-in GA4 parameters. You can rely on:

  • ga_session_id → always int_value (Unix timestamp)
  • ga_session_number → always int_value
  • page_location → always string_value
  • page_title → always string_value
  • page_referrer → always string_value
  • engaged_session_event → always int_value (0 or 1)

Extracting these with the wrong type field returns null silently:

-- WRONG: Returns NULL for session ID
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
-- CORRECT: Returns the integer session timestamp
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')

The Custom Parameter Problem

Custom parameters — dimensions you instrument yourself — can land in different type fields depending on how the values were collected. The problem compounds with implementation inconsistencies over time:

  • A parameter sent as a string "123" goes to string_value
  • The same parameter sent as the integer 123 goes to int_value
  • If both happen across different events or over time, you have values split across two type fields

This happens in practice. A developer changes a GA4 tag implementation to send a previously-string parameter as a number. Historical events have the value in string_value. New events have it in int_value. A query that only reads from one field gets half the data.

The COALESCE Defensive Pattern

When you’re uncertain about a parameter’s type — or when type assignment has been inconsistent — use COALESCE across all relevant type fields:

SELECT
COALESCE(
value.string_value,
CAST(value.int_value AS STRING),
CAST(value.double_value AS STRING)
) AS param_value
FROM UNNEST(event_params)
WHERE key = 'custom_param'

This returns the first non-null value across the three fields, casting numeric types to string for a consistent output type. It’s defensive but not free — you’re evaluating three expressions instead of one. For high-volume queries over large date ranges, this overhead accumulates.

A more targeted approach, once you’ve audited which types are actually in play:

-- If you know the param is either string or int, never double:
COALESCE(
value.string_value,
CAST(value.int_value AS STRING)
) AS param_value

Auditing Type Distribution

Before building production models that extract a custom parameter, audit how values are actually distributed across type fields:

SELECT
COUNTIF(value.string_value IS NOT NULL) AS string_count,
COUNTIF(value.int_value IS NOT NULL) AS int_count,
COUNTIF(value.double_value IS NOT NULL) AS double_count,
COUNTIF(value.float_value IS NOT NULL) AS float_count,
COUNT(*) AS total
FROM `project.analytics_123456789.events_*`,
UNNEST(event_params)
WHERE key = 'your_custom_param'
AND _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'

If string_count is 95,000 and int_count is 5,000, you know you need COALESCE. If only int_count is non-zero, you can extract directly from int_value with confidence.

Run this audit across a representative date range — not just recent data — especially if your GA4 implementation has been active for years and may have evolved.

Encoding in dbt Macros

When you’re building reusable extraction macros, encode the type assumption explicitly so future maintainers understand the intent:

-- macros/ga4_param_string.sql
{% macro ga4_param_string(column, param_name) %}
(SELECT value.string_value FROM UNNEST({{ column }}) WHERE key = '{{ param_name }}')
{% endmacro %}
-- macros/ga4_param_int.sql
{% macro ga4_param_int(column, param_name) %}
(SELECT value.int_value FROM UNNEST({{ column }}) WHERE key = '{{ param_name }}')
{% endmacro %}
-- macros/ga4_param_coerce.sql
-- Use when type is uncertain or historically inconsistent
{% macro ga4_param_coerce(column, param_name) %}
COALESCE(
(SELECT value.string_value FROM UNNEST({{ column }}) WHERE key = '{{ param_name }}'),
CAST((SELECT value.int_value FROM UNNEST({{ column }}) WHERE key = '{{ param_name }}') AS STRING),
CAST((SELECT value.double_value FROM UNNEST({{ column }}) WHERE key = '{{ param_name }}') AS STRING)
)
{% endmacro %}

The macro name signals the assumption. ga4_param_string tells you string is expected. ga4_param_coerce tells you type is uncertain. Documentation in the macro name is better than a comment that might drift from reality.

Relationship to user_properties

The same type detection logic applies to the user_properties array. User properties are set via setUserProperty and stored in an identical nested structure: each entry has a key and a value RECORD with string_value, int_value, float_value, and double_value. The same auditing and defensive extraction patterns apply.

The one difference with user_properties is the additional set_timestamp_micros field, which records when the property was last updated — useful for tracking property drift over time.