ServicesAboutNotesContact Get in touch →
EN FR
Note

GA4 Engagement Event Query Recipes

Production-ready BigQuery SQL for GA4 engagement events — page views, scroll depth, outbound clicks, file downloads, and video engagement funnels.

Planted
ga4bigqueryanalytics

Beyond e-commerce, GA4 tracks dozens of engagement events through its enhanced measurement features. Each event type stores its data in different event_params keys, and extracting them follows the same correlated subquery pattern — but knowing which parameters to extract for each event type saves significant discovery time.

These are copy-paste recipes. Adjust the _TABLE_SUFFIX range and project references for your setup.

Page Views with Clean URL Parsing

The page_view event is the most commonly queried GA4 event. The useful parameters live in event_params, not as top-level fields:

SELECT
PARSE_DATE('%Y%m%d', event_date) AS event_date,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_location') AS page_url,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_title') AS page_title,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_referrer') AS referrer,
-- Extract clean path without protocol or query params
REGEXP_REPLACE(
REGEXP_REPLACE(
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
r'^https?://[^/]+', ''
),
r'[\?#].*$', ''
) AS page_path,
-- Identify landing pages
IFNULL(
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'entrances'),
0
) = 1 AS is_landing_page
FROM `project.dataset.events_*`
WHERE event_name = 'page_view'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'

The double REGEXP_REPLACE for page_path is a pattern you’ll reuse across every page-level query. First pass strips protocol and domain (https://example.com/products becomes /products). Second pass strips query parameters and fragments. If you’re building this in dbt, put the path extraction in a reusable macro.

The entrances parameter identifies landing pages — the first page view in a session. It’s an int_value, not a string, so extract it accordingly.

Scroll Depth Analysis

GA4’s enhanced measurement fires a scroll event when users reach 90% scroll depth on a page (this threshold is not configurable). The event itself is simple — the interesting analysis comes from calculating scroll rate against page views:

WITH page_views AS (
SELECT
REGEXP_REPLACE(
REGEXP_REPLACE(
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
r'^https?://[^/]+', ''
),
r'[\?#].*$', ''
) AS page_path,
COUNT(*) AS pageviews
FROM `project.dataset.events_*`
WHERE event_name = 'page_view'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY page_path
),
scrolls AS (
SELECT
REGEXP_REPLACE(
REGEXP_REPLACE(
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
r'^https?://[^/]+', ''
),
r'[\?#].*$', ''
) AS page_path,
COUNT(*) AS scroll_events
FROM `project.dataset.events_*`
WHERE event_name = 'scroll'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY page_path
)
SELECT
pv.page_path,
pv.pageviews,
IFNULL(s.scroll_events, 0) AS scroll_events,
SAFE_DIVIDE(s.scroll_events, pv.pageviews) AS scroll_rate
FROM page_views pv
LEFT JOIN scrolls s ON pv.page_path = s.page_path
WHERE pv.pageviews > 100
ORDER BY scroll_rate DESC

The HAVING pageviews > 100 filter eliminates low-traffic pages where scroll rate is noisy. Adjust the threshold based on your traffic volume.

A low scroll rate on a long-form article may indicate low engagement past the fold. A high scroll rate on a short page is expected. Scroll rate is most meaningful when compared across similar page types.

The click event fires for outbound links (when enhanced measurement is enabled). The key parameters:

SELECT
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'link_url') AS link_url,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'link_domain') AS link_domain,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'outbound') AS is_outbound,
COUNT(*) AS clicks
FROM `project.dataset.events_*`
WHERE event_name = 'click'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY link_url, link_domain, is_outbound
ORDER BY clicks DESC

The outbound parameter is a string 'true' or 'false', not a boolean — type detection stores it as string_value. Filter with = 'true' if you only want external clicks.

File Download Tracking

GA4 automatically tracks downloads of common file types (pdf, xlsx, docx, csv, etc.) when enhanced measurement is enabled:

SELECT
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'file_name') AS file_name,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'file_extension') AS file_extension,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'link_url') AS download_url,
COUNT(*) AS downloads,
COUNT(DISTINCT user_pseudo_id) AS unique_users
FROM `project.dataset.events_*`
WHERE event_name = 'file_download'
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY file_name, file_extension, download_url
ORDER BY downloads DESC

file_name and file_extension are extracted from the URL by GA4’s JavaScript — they can be wrong if the URL doesn’t contain a traditional filename pattern. Always verify against link_url for accuracy.

Video Engagement Funnel

GA4 tracks YouTube embeds automatically (when enhanced measurement is on) through three events: video_start, video_progress, and video_complete. The video_progress event fires at 10%, 25%, 50%, and 75% milestones.

SELECT
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'video_title') AS video_title,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'video_provider') AS video_provider,
COUNTIF(event_name = 'video_start') AS starts,
COUNTIF(event_name = 'video_progress'
AND (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'video_percent') = 25
) AS reached_25_pct,
COUNTIF(event_name = 'video_progress'
AND (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'video_percent') = 50
) AS reached_50_pct,
COUNTIF(event_name = 'video_progress'
AND (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'video_percent') = 75
) AS reached_75_pct,
COUNTIF(event_name = 'video_complete') AS completions,
SAFE_DIVIDE(
COUNTIF(event_name = 'video_complete'),
COUNTIF(event_name = 'video_start')
) AS completion_rate
FROM `project.dataset.events_*`
WHERE event_name IN ('video_start', 'video_progress', 'video_complete')
AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY video_title, video_provider
HAVING starts > 10
ORDER BY starts DESC

The nested COUNTIF with a correlated subquery inside is worth studying. It combines event-level filtering (event_name = 'video_progress') with parameter-level filtering (video_percent = 50) in a single aggregation pass. This pattern works for any funnel analysis where you need to count events that match both an event type and a specific parameter value.

The HAVING starts > 10 filter removes low-traffic videos where the funnel percentages are statistically unreliable. Video completion rates vary by content type — 30-40% is typical for marketing videos; instructional content often sees 60%+.

Session Metrics

Session-level aggregation requires the composite session keyuser_pseudo_id combined with ga_session_id:

WITH session_events AS (
SELECT
CONCAT(
user_pseudo_id, '-',
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
) AS session_id,
user_pseudo_id,
event_timestamp,
event_name,
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'engagement_time_msec') AS engagement_time_msec,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'session_engaged') AS session_engaged,
traffic_source.source,
traffic_source.medium,
device.category AS device_category,
geo.country
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
)
SELECT
session_id,
user_pseudo_id,
MIN(TIMESTAMP_MICROS(event_timestamp)) AS session_start,
MAX(TIMESTAMP_MICROS(event_timestamp)) AS session_end,
TIMESTAMP_DIFF(
MAX(TIMESTAMP_MICROS(event_timestamp)),
MIN(TIMESTAMP_MICROS(event_timestamp)),
SECOND
) AS session_duration_seconds,
COUNT(*) AS session_events,
COUNTIF(event_name = 'page_view') AS session_pageviews,
SUM(engagement_time_msec) / 1000 AS session_engagement_seconds,
MAX(session_engaged) = '1' AS session_is_engaged,
ANY_VALUE(source) AS source,
ANY_VALUE(medium) AS medium,
ANY_VALUE(device_category) AS device_category,
ANY_VALUE(country) AS country
FROM session_events
GROUP BY session_id, user_pseudo_id

Note that session_engaged is a string '1', not an integer — another type detection gotcha. And engagement_time_msec is cumulative across events within a session, so SUM gives total engagement time.

For production use, this pattern belongs in a dbt model rather than ad-hoc SQL. See the three-layer architecture for how this fits into a full pipeline.