A GA4 dbt project needs configuration that makes it both reusable across properties and explicit about its behavioral defaults. The dbt_project.yml carries four categories of configuration: project identity, variable-driven behavior, folder-level materializations, and test defaults.
The Complete dbt_project.yml
```yaml
name: 'ga4_analytics'
version: '1.0.0'
profile: 'ga4_analytics'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_packages"

vars:
  # GCP Configuration
  ga4_project_id: "{{ env_var('GA4_PROJECT_ID') }}"
  ga4_dataset: "{{ env_var('GA4_DATASET', 'analytics_123456789') }}"

  # Processing Configuration
  ga4_start_date: "20230101"
  ga4_static_incremental_days: 3

  # Business Configuration
  ga4_conversion_events:
    - 'purchase'
    - 'sign_up'
    - 'generate_lead'
    - 'contact_form_submit'

  # URL Cleaning
  ga4_query_params_to_remove:
    - 'gclid'
    - 'fbclid'
    - 'utm_id'
    - '_ga'

models:
  ga4_analytics:
    base:
      +schema: base
      +materialized: view
      +tags: ['ga4', 'base']
      ga4:
        +materialized: incremental
    intermediate:
      +schema: intermediate
      +materialized: incremental
      +tags: ['ga4', 'intermediate']
    marts:
      +schema: marts
      +materialized: table
      +tags: ['ga4', 'marts']

tests:
  +severity: warn
  +store_failures: true
```

The Variable System
Every property-specific value should be a variable rather than hardcoded in models. This makes the project reusable: clone it for a new GA4 property, update dbt_project.yml, and you’re done.
ga4_project_id and ga4_dataset: The GCP project and BigQuery dataset where GA4’s export lives. Using env_var() keeps credentials out of version control. The ga4_dataset has a fallback default for local development.
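For local development, these environment variables can be exported before invoking dbt (the values below are placeholders, not real identifiers):

```shell
export GA4_PROJECT_ID="my-gcp-project"
export GA4_DATASET="analytics_123456789"   # optional: overrides the fallback default
dbt debug
```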
ga4_start_date: The earliest date to process from. On initial full refresh, this determines how far back history goes. Setting it too early wastes compute; setting it too late loses history. Use the property’s first meaningful export date.
ga4_static_incremental_days: Controls the lookback window size. Default 3 days handles most late-arriving data scenarios. Increase to 5 for properties with heavy Google Ads conversion tracking. See GA4 Sharded-to-Partitioned Base Model for the reasoning.
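As a hedged sketch of how these two variables might be consumed together — assuming the base model scans the sharded `events_*` export via `_table_suffix`, which the linked article covers in detail:

```sql
{% if is_incremental() %}
  -- Incremental run: reprocess only the trailing lookback window of daily shards.
  where _table_suffix >= format_date(
      '%Y%m%d',
      date_sub(current_date(), interval {{ var('ga4_static_incremental_days', 3) }} day)
  )
{% else %}
  -- Full refresh: go back to the configured start date.
  where _table_suffix >= '{{ var("ga4_start_date") }}'
{% endif %}
```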
ga4_conversion_events: The list of events that count as conversions. Used in the sessionized model for the session__has_* flags. Making this a variable means you add a new conversion event type in one place rather than hunting through model SQL.
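A sketch of how the sessionized model could consume this variable — the exact column names here are assumptions, not the project's actual SQL:

```sql
-- Generate one session__has_* flag per configured conversion event.
select
    session__key,
    {% for event in var('ga4_conversion_events') %}
    max(case when event_name = '{{ event }}' then 1 else 0 end)
        as session__has_{{ event }}{{ "," if not loop.last }}
    {% endfor %}
from {{ ref('base__ga4__events') }}
group by session__key
```

Adding a conversion event to the variable list automatically adds a new flag column on the next run.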
ga4_query_params_to_remove: Parameters to strip from page URLs. gclid, fbclid, utm_id, and _ga are tracking parameters that make the same page look like different pages in landing page analysis. This list varies by organization.
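A hypothetical macro (not one of the macros listed later in this project) showing how the variable could drive URL cleaning in BigQuery SQL:

```sql
{% macro clean_page_url(url_column) %}
{#- Simplified sketch: strips configured query params from a URL column.
    Does not normalize a leading '&' left behind when the first param
    is removed; a production version would handle that. -#}
    {%- set params = var('ga4_query_params_to_remove') -%}
    regexp_replace(
        {{ url_column }},
        r'[?&](?:{{ params | join('|') }})=[^&#]*',
        ''
    )
{% endmacro %}
```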
Folder-Level Materializations
The models: block sets materialization defaults by folder, overriding them only where needed:
```yaml
models:
  ga4_analytics:
    base:
      +materialized: view          # Default: view
      ga4:
        +materialized: incremental # GA4 subfolder: incremental
    intermediate:
      +materialized: incremental
    marts:
      +materialized: table
```

The base folder defaults to view — non-GA4 base models (if you add other sources) get views by default. The ga4 subfolder overrides this to incremental because GA4 event volume makes anything else prohibitively expensive. The override is at the subfolder level, not per-model.
This structure follows the pattern: folder-level configuration for defaults, model-level configuration only for exceptions.
Schema Separation
The +schema config at each folder level creates separate schemas in BigQuery:
```
base__ga4__events             → {project}.base.base__ga4__events
int__ga4__events_sessionized  → {project}.intermediate.int__ga4__events_sessionized
mrt__analytics__sessions      → {project}.marts.mrt__analytics__sessions
```
This separation matters for access control (analysts might have read access to marts but not intermediate) and for cost attribution (BigQuery INFORMATION_SCHEMA can show costs by schema).
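One caveat worth knowing: out of the box, dbt concatenates the target schema with the custom schema (producing names like `dbt_prod_base`). Getting the clean dataset names shown above typically requires overriding `generate_schema_name`, along these lines:

```sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {#- Use the custom schema name as-is; fall back to the
        target schema when no custom schema is configured. -#}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```

Whether you want the as-is behavior in every target (including developers' sandboxes) is a design choice; many teams branch on `target.name` inside this macro.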
Test Defaults
```yaml
tests:
  +severity: warn
  +store_failures: true
```

severity: warn globally means no test failure blocks the pipeline. For a GA4 project where the data arrives from external systems you don’t control, hard failures would require constant firefighting. Warnings surface issues for investigation without stopping reporting.
store_failures: true writes failing rows to the dbt_test__audit schema. When the channel grouping test warns about unexpected values, you can query the stored failures to see which source/medium combinations triggered it — essential for diagnosing UTM naming problems.
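Stored failures land as tables named after the test, so a diagnosis query might look like this (the project, audit dataset, and table name below are illustrative — check your own target for the generated names):

```sql
-- Inspect which source/medium combinations tripped the channel grouping test.
select source, medium, count(*) as failing_rows
from `my-gcp-project.dbt_test__audit.accepted_values_channel_grouping`
group by source, medium
order by failing_rows desc
```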
Override to error severity for tests where pipeline blocking is appropriate:
```yaml
- name: event__key
  tests:
    - unique:
        severity: error
```

Running the Project
Initial setup:
```shell
# Verify connection and configuration
dbt debug

# Full refresh to build from ga4_start_date
dbt build --full-refresh

# Generate docs
dbt docs generate && dbt docs serve
```

Daily incremental:

```shell
dbt build
```

Backfill a specific date range:

```shell
dbt build --vars '{"ga4_start_date": "20240101"}'
```

Recommended schedule: Run 4-6 hours after midnight in your primary timezone. GA4’s daily export typically completes within a few hours of the day ending. Running too early means some of the day’s data hasn’t landed yet; running too late delays reporting unnecessarily.
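As a crontab sketch of that schedule — the 06:00 run time, project path, and target name are assumptions to adapt:

```shell
# Daily incremental build at 06:00 in the server's local timezone
0 6 * * * cd /opt/ga4_analytics && dbt build --target prod
```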
Project Folder Structure
```
models/
├── base/
│   └── ga4/
│       ├── _ga4__sources.yml
│       ├── _ga4__models.yml
│       └── base__ga4__events.sql
├── intermediate/
│   └── ga4/
│       ├── _int_ga4__models.yml
│       ├── int__ga4__event_items.sql
│       └── int__ga4__events_sessionized.sql
└── marts/
    └── ga4/
        ├── _mrt_ga4__models.yml
        ├── mrt__analytics__sessions.sql
        └── mrt__analytics__users.sql

macros/
└── ga4/
    ├── extract_event_param.sql
    ├── extract_event_param_numeric.sql
    └── default_channel_grouping.sql

tests/
└── singular/
    ├── test_sessions_missing_session_start.sql
    └── test_purchase_without_session.sql
```

The ga4 subfolder under each layer keeps the project clean if you add other source systems. The macro subfolder prevents naming collisions. Singular tests in their own folder are easily discoverable.