A GA4 dbt project needs configuration that makes it both reusable across properties and explicit about its behavioral defaults. The dbt_project.yml carries four categories of configuration: project identity, variable-driven behavior, folder-level materializations, and test defaults.
The Complete dbt_project.yml
```yaml
name: 'ga4_analytics'
version: '1.0.0'
profile: 'ga4_analytics'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_packages"

vars:
  # GCP Configuration
  ga4_project_id: "{{ env_var('GA4_PROJECT_ID') }}"
  ga4_dataset: "{{ env_var('GA4_DATASET', 'analytics_123456789') }}"

  # Processing Configuration
  ga4_start_date: "20230101"
  ga4_static_incremental_days: 3

  # Business Configuration
  ga4_conversion_events:
    - 'purchase'
    - 'sign_up'
    - 'generate_lead'
    - 'contact_form_submit'

  # URL Cleaning
  ga4_query_params_to_remove:
    - 'gclid'
    - 'fbclid'
    - 'utm_id'
    - '_ga'

models:
  ga4_analytics:
    base:
      +schema: base
      +materialized: view
      +tags: ['ga4', 'base']
      ga4:
        +materialized: incremental
    intermediate:
      +schema: intermediate
      +materialized: incremental
      +tags: ['ga4', 'intermediate']
    marts:
      +schema: marts
      +materialized: table
      +tags: ['ga4', 'marts']

tests:
  +severity: warn
  +store_failures: true
```

The Variable System
Every property-specific value should be a variable rather than hardcoded in models. This makes the project reusable: clone it for a new GA4 property, update dbt_project.yml, and you’re done.
ga4_project_id and ga4_dataset: The GCP project and BigQuery dataset where GA4’s export lives. Using env_var() keeps credentials out of version control. The ga4_dataset has a fallback default for local development.
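For local development, these environment variables can be exported before invoking dbt (the values below are placeholders, not real identifiers):

```shell
export GA4_PROJECT_ID="my-gcp-project"
export GA4_DATASET="analytics_123456789"   # optional: overrides the fallback default
dbt debug
```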
ga4_start_date: The earliest date to process from. On initial full refresh, this determines how far back history goes. Setting it too early wastes compute; setting it too late loses history. Use the property’s first meaningful export date.
ga4_static_incremental_days: Controls the lookback window size. Default 3 days handles most late-arriving data scenarios. Increase to 5 for properties with heavy Google Ads conversion tracking. See GA4 Sharded-to-Partitioned Base Model for the reasoning.
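As a hedged sketch of how these two variables might be consumed together — assuming the base model scans the sharded `events_*` export via `_table_suffix`, which the linked article covers in detail:

```sql
{% if is_incremental() %}
  -- Incremental run: reprocess only the trailing lookback window of daily shards.
  where _table_suffix >= format_date(
      '%Y%m%d',
      date_sub(current_date(), interval {{ var('ga4_static_incremental_days', 3) }} day)
  )
{% else %}
  -- Full refresh: go back to the configured start date.
  where _table_suffix >= '{{ var("ga4_start_date") }}'
{% endif %}
```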
ga4_conversion_events: The list of events that count as conversions. Used in the sessionized model for the session__has_* flags. Making this a variable means you add a new conversion event type in one place rather than hunting through model SQL.
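A sketch of how the sessionized model could consume this variable — the exact column names here are assumptions, not the project's actual SQL:

```sql
-- Generate one session__has_* flag per configured conversion event.
select
    session__key,
    {% for event in var('ga4_conversion_events') %}
    max(case when event_name = '{{ event }}' then 1 else 0 end)
        as session__has_{{ event }}{{ "," if not loop.last }}
    {% endfor %}
from {{ ref('base__ga4__events') }}
group by session__key
```

Adding a conversion event to the variable list automatically adds a new flag column on the next run.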
ga4_query_params_to_remove: Parameters to strip from page URLs. gclid, fbclid, utm_id, and _ga are tracking parameters that make the same page look like different pages in landing page analysis. This list varies by organization.
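A hypothetical macro (not one of the macros listed later in this project) showing how the variable could drive URL cleaning in BigQuery SQL:

```sql
{% macro clean_page_url(url_column) %}
{#- Simplified sketch: strips configured query params from a URL column.
    Does not normalize a leading '&' left behind when the first param
    is removed; a production version would handle that. -#}
    {%- set params = var('ga4_query_params_to_remove') -%}
    regexp_replace(
        {{ url_column }},
        r'[?&](?:{{ params | join('|') }})=[^&#]*',
        ''
    )
{% endmacro %}
```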
Folder-Level Materializations
The models: block sets materialization defaults by folder, overriding them only where needed:
```yaml
models:
  ga4_analytics:
    base:
      +materialized: view          # Default: view
      ga4:
        +materialized: incremental # GA4 subfolder: incremental
    intermediate:
      +materialized: incremental
    marts:
      +materialized: table
```

The base folder defaults to view — non-GA4 base models (if you add other sources) get views by default. The ga4 subfolder overrides this to incremental because GA4 event volume makes anything else prohibitively expensive. The override is at the subfolder level, not per-model.
This structure follows the pattern: folder-level configuration for defaults, model-level configuration only for exceptions.
Schema Separation
The +schema config at each folder level creates separate schemas in BigQuery:
```
base__ga4__events             → {project}.base.base__ga4__events
int__ga4__events_sessionized  → {project}.intermediate.int__ga4__events_sessionized
mrt__analytics__sessions      → {project}.marts.mrt__analytics__sessions
```
This separation matters for access control (analysts might have read access to marts but not intermediate) and for cost attribution (BigQuery INFORMATION_SCHEMA can show costs by schema).
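One caveat worth knowing: out of the box, dbt concatenates the target schema with the custom schema (producing names like `dbt_prod_base`). Getting the clean dataset names shown above typically requires overriding `generate_schema_name`, along these lines:

```sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {#- Use the custom schema name as-is; fall back to the
        target schema when no custom schema is configured. -#}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```

Whether you want the as-is behavior in every target (including developers' sandboxes) is a design choice; many teams branch on `target.name` inside this macro.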
Test Defaults
```yaml
tests:
  +severity: warn
  +store_failures: true
```

severity: warn globally means no test failure blocks the pipeline. For a GA4 project where the data arrives from external systems you don’t control, hard failures would require constant firefighting. Warnings surface issues for investigation without stopping reporting.
store_failures: true writes failing rows to the dbt_test__audit schema. When the channel grouping test warns about unexpected values, you can query the stored failures to see which source/medium combinations triggered it — essential for diagnosing UTM naming problems.
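Stored failures land as tables named after the test, so a diagnosis query might look like this (the project, audit dataset, and table name below are illustrative — check your own target for the generated names):

```sql
-- Inspect which source/medium combinations tripped the channel grouping test.
select source, medium, count(*) as failing_rows
from `my-gcp-project.dbt_test__audit.accepted_values_channel_grouping`
group by source, medium
order by failing_rows desc
```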
Override to error severity for tests where pipeline blocking is appropriate:
```yaml
- name: event__key
  tests:
    - unique:
        severity: error
```

Running the Project
Initial setup:
```shell
# Verify connection and configuration
dbt debug

# Full refresh to build from ga4_start_date
dbt build --full-refresh

# Generate docs
dbt docs generate && dbt docs serve
```

Daily incremental:

```shell
dbt build
```

Backfill a specific date range:

```shell
dbt build --vars '{"ga4_start_date": "20240101"}'
```

Recommended schedule: Run 4-6 hours after midnight in your primary timezone. GA4’s daily export typically completes within a few hours of the day ending. Running too early means some of the day’s data hasn’t landed yet; running too late delays reporting unnecessarily.
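As a crontab sketch of that schedule — the 06:00 run time, project path, and target name are assumptions to adapt:

```shell
# Daily incremental build at 06:00 in the server's local timezone
0 6 * * * cd /opt/ga4_analytics && dbt build --target prod
```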
Project Folder Structure
```
models/
├── base/
│   └── ga4/
│       ├── _ga4__sources.yml
│       ├── _ga4__models.yml
│       └── base__ga4__events.sql
├── intermediate/
│   └── ga4/
│       ├── _int_ga4__models.yml
│       ├── int__ga4__event_items.sql
│       └── int__ga4__events_sessionized.sql
└── marts/
    └── ga4/
        ├── _mrt_ga4__models.yml
        ├── mrt__analytics__sessions.sql
        └── mrt__analytics__users.sql

macros/
└── ga4/
    ├── extract_event_param.sql
    ├── extract_event_param_numeric.sql
    └── default_channel_grouping.sql

tests/
└── singular/
    ├── test_sessions_missing_session_start.sql
    └── test_purchase_without_session.sql
```

The ga4 subfolder under each layer keeps the project clean if you add other source systems. The macro subfolder prevents naming collisions. Singular tests in their own folder are easily discoverable.