Eventarc Event-Driven dbt Triggers

Using Eventarc to trigger dbt runs when upstream data arrives — Cloud Storage object creation, BigQuery audit log events, and combining event-driven with scheduled execution.

dbt · gcp · data engineering · automation

Scheduled dbt runs execute at a fixed interval regardless of whether new data is available. Run too frequently and most invocations process zero new rows; run too infrequently and data that arrives shortly after a run sits stale until the next one.

Eventarc triggers dbt when upstream data actually arrives — “run dbt when there’s something new to transform” rather than on a fixed schedule. This is event-driven orchestration without Airflow DAGs or polling loops: Eventarc routes events from GCP services directly to a Cloud Run Job.

Cloud Storage Triggers

The most common pattern: trigger dbt when a file lands in a GCS bucket. This suits pipelines where upstream data is delivered as files — CSV exports, Parquet dumps from external systems, or output from extraction tools.

Terminal window
gcloud eventarc triggers create dbt-on-upload \
--location=us-central1 \
--destination-run-job=dbt-daily \
--destination-run-region=us-central1 \
--event-filters="type=google.cloud.storage.object.v1.finalized" \
--event-filters="bucket=your-data-bucket" \
--service-account=dbt-runner@PROJECT_ID.iam.gserviceaccount.com

The object.v1.finalized event fires when an object is created or overwritten in the bucket. Every file upload triggers your dbt job. For buckets with frequent writes, this can mean many dbt invocations — which may or may not be what you want.

To narrow the scope, Eventarc supports path pattern matching. If only files in a specific prefix should trigger dbt:

Terminal window
gcloud eventarc triggers create dbt-on-raw-data \
--location=us-central1 \
--destination-run-job=dbt-daily \
--destination-run-region=us-central1 \
--event-filters="type=google.cloud.storage.object.v1.finalized" \
--event-filters="bucket=your-data-bucket" \
--event-filters-path-pattern="objectId=raw/daily/*" \
--service-account=dbt-runner@PROJECT_ID.iam.gserviceaccount.com

This fires only when objects land directly under the raw/daily/ prefix (a single * matches within one path segment; use ** to match across nested segments). Files in other prefixes — backups, archives, staging — are ignored.

BigQuery Audit Log Triggers

For pipelines where upstream data arrives via BigQuery load jobs (not file uploads), Eventarc can watch BigQuery’s audit logs:

Terminal window
gcloud eventarc triggers create dbt-on-bq-load \
--location=us-central1 \
--destination-run-job=dbt-daily \
--destination-run-region=us-central1 \
--event-filters="type=google.cloud.audit.log.v1.written" \
--event-filters="serviceName=bigquery.googleapis.com" \
--event-filters="methodName=google.cloud.bigquery.v2.JobService.InsertJob" \
--service-account=dbt-runner@PROJECT_ID.iam.gserviceaccount.com

The intent is to trigger dbt whenever a BigQuery load job completes. The pattern fits when an ingestion tool (Fivetran, Airbyte, a custom pipeline) loads data directly into BigQuery tables that dbt sources reference.

The audit log filter methodName=google.cloud.bigquery.v2.JobService.InsertJob is broad — it captures all BigQuery job inserts, including queries, not just load jobs. For more precision, process the event payload in a lightweight Cloud Function that filters by job type and source table before invoking the Cloud Run Job. This adds a layer of indirection but prevents dbt from running on every query execution in the project.
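A sketch of that filter, assuming a 2nd-gen Cloud Function using functions-framework and the google-cloud-run client. The BigQueryAuditMetadata field paths, the PROJECT_ID and dbt-daily names, and the watched table list are all assumptions to verify against your own log entries:

main.py
import functions_framework
from google.cloud import run_v2

# Assumed names: replace with your project, region, job, and source tables.
JOB_NAME = "projects/PROJECT_ID/locations/us-central1/jobs/dbt-daily"
WATCHED_TABLES = {"raw_orders", "raw_customers"}

@functions_framework.cloud_event
def on_bq_audit_log(event):
    # Eventarc delivers the audit LogEntry as the CloudEvent payload.
    payload = event.data.get("protoPayload", {})
    job_change = payload.get("metadata", {}).get("jobChange", {})

    # React only once the job has reached a terminal state.
    if job_change.get("after") != "DONE":
        return

    config = job_change.get("job", {}).get("jobConfig", {})
    # Load jobs have type "IMPORT" in BigQueryAuditMetadata; skip queries etc.
    if config.get("type") != "IMPORT":
        return

    # Trigger only for loads into tables dbt actually sources.
    destination = config.get("loadConfig", {}).get("destinationTable", "")
    if destination.split("/")[-1] not in WATCHED_TABLES:
        return

    run_v2.JobsClient().run_job(request=run_v2.RunJobRequest(name=JOB_NAME))

Point the Eventarc trigger at this function instead of the job; the audit-log event filters stay the same.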

Combining Event-Driven with Scheduled Runs

Pure event-driven execution has a fragile edge: if the event source fails silently (an upstream system stops sending files, an API call starts returning empty results), dbt never runs. Nobody notices until someone asks “why is the dashboard showing yesterday’s data?”

The robust pattern combines both:

  1. Eventarc triggers handle the happy path — data arrives, dbt runs promptly
  2. Cloud Scheduler provides a fallback — a scheduled run that catches anything events missed
Terminal window
# Event-driven: run when data lands
gcloud eventarc triggers create dbt-on-upload \
--location=us-central1 \
--destination-run-job=dbt-daily \
--destination-run-region=us-central1 \
--event-filters="type=google.cloud.storage.object.v1.finalized" \
--event-filters="bucket=your-data-bucket" \
--service-account=dbt-runner@PROJECT_ID.iam.gserviceaccount.com
# Scheduled fallback: ensure models refresh at least daily
gcloud scheduler jobs create http dbt-daily-fallback \
--location=us-central1 \
--schedule="0 6 * * *" \
--uri="https://us-central1-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/PROJECT_ID/jobs/dbt-daily:run" \
--http-method=POST \
--oauth-service-account-email=dbt-scheduler@PROJECT_ID.iam.gserviceaccount.com

The scheduled run is a safety net. If Eventarc already triggered a successful run for today’s data, the scheduled run processes zero new rows (assuming your models are incremental) and completes quickly. The cost of a redundant run is negligible. The cost of missing a day of data is not.

Deduplication and Throttling

Event-driven triggers can fire more often than you want dbt to run. Ten files landing in quick succession trigger ten dbt executions. For most dbt projects, only the last execution matters — the previous nine process incomplete data.

Cloud Run Jobs offers no protection here: if a new trigger fires while an execution is already running, Cloud Run starts a second execution in parallel rather than queuing behind the first. Ten redundant executions waste compute, and overlapping dbt runs can contend for the same target tables.

Strategies to handle this:

Debouncing via Cloud Workflows. Instead of triggering the Cloud Run Job directly, route Eventarc to a Cloud Workflow that checks whether a run is already in progress. If so, skip or delay. This adds complexity but prevents redundant executions.
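If a full Workflow feels heavy, the same check fits in a small function in front of the job. A sketch with the google-cloud-run client; the job name is a placeholder, and the check is best-effort rather than a lock:

from google.cloud import run_v2

JOB_NAME = "projects/PROJECT_ID/locations/us-central1/jobs/dbt-daily"

def trigger_if_idle() -> None:
    # Skip triggering if any execution of the job still has running tasks.
    for execution in run_v2.ExecutionsClient().list_executions(parent=JOB_NAME):
        # A just-started execution may briefly report zero running tasks,
        # so two near-simultaneous events can still slip through.
        if execution.running_count > 0:
            return
    run_v2.JobsClient().run_job(request=run_v2.RunJobRequest(name=JOB_NAME))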

Batch window approach. Rather than triggering on individual file uploads, trigger on a “manifest” or “completion marker” file that your upstream process writes after all files for a batch are uploaded. This converts many events into one.
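The upstream half of that pattern is a single extra write once the batch is done. A sketch with google-cloud-storage; the bucket, prefix, and _SUCCESS name are conventions to choose yourself, with the trigger’s path pattern narrowed to match only the marker (for example objectId=raw/daily/*/_SUCCESS):

from google.cloud import storage

def write_completion_marker(batch_date: str) -> None:
    # After all files for the batch are uploaded, write an empty marker
    # object. One marker means one event means one dbt run per batch.
    bucket = storage.Client().bucket("your-data-bucket")
    bucket.blob(f"raw/daily/{batch_date}/_SUCCESS").upload_from_string("")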

Idempotent dbt runs. Design your dbt models to be idempotent — running dbt twice on the same data produces the same result. Incremental models with proper deduplication achieve this naturally. If redundant runs are cheap and harmless, throttling becomes unnecessary.

When Event-Driven Triggering Makes Sense

Event-driven dbt works well for:

  • Near-real-time dashboards where users expect data freshness within minutes of source updates
  • Multi-source pipelines where different sources arrive at unpredictable times
  • Cost-sensitive environments where running dbt only when needed saves compute (fewer bytes billed under on-demand BigQuery pricing, freed capacity under slot-based reservations)

It adds complexity that isn’t justified when:

  • A daily schedule is sufficient. If stakeholders check dashboards once per morning, hourly or event-driven runs waste engineering effort on freshness nobody uses.
  • Source data arrives on a predictable schedule. If Fivetran syncs at 5 AM and takes 30 minutes, scheduling dbt at 6 AM is simpler and equally effective.
  • The dbt project isn’t incremental. Full-refresh models process all data on every run, so an event-driven trigger changes when dbt runs, not how much work each run does.

For most teams starting with dbt on GCP, a Cloud Scheduler cron job is the simpler starting point. Eventarc earns its complexity when data freshness requirements genuinely demand event-driven triggering.