Note

Dataform as a GCP Service

What Dataform is in 2026 — a fully managed BigQuery transformation service with deep GCP integration, zero licensing cost, and SQLX/JavaScript templating

Tags: dataform, bigquery, gcp, data engineering, data modeling

Dataform is a fully managed SQL transformation service embedded in the BigQuery console. Google acquired the company, a 7-person London startup founded by ex-Googlers, in December 2020. The product has since evolved from a standalone tool into a native GCP service that competes with dbt for BigQuery transformation workloads.

How It Works

Dataform uses SQLX files with JavaScript templating instead of dbt’s SQL/Jinja combination. You define transformations, dependencies, and assertions in SQLX, then Dataform compiles and executes them against BigQuery. Compilation runs in a JavaScript engine, which has historically been faster than dbt’s Python-based compilation.

A basic SQLX file looks like standard SQL with a config block:

config {
  type: "table",
  schema: "marts",
  description: "Customer lifetime metrics",
  dependencies: ["stg_customers", "stg_orders"]
}

-- One row per customer with order count and total spend.
SELECT
  c.customer_id,
  c.email,
  COUNT(o.order_id) AS total_orders,
  SUM(o.amount) AS lifetime_value
FROM ${ref("stg_customers")} c
LEFT JOIN ${ref("stg_orders")} o
  ON c.customer_id = o.customer_id
GROUP BY 1, 2

The ${ref()} syntax creates dependencies, analogous to dbt’s {{ ref() }}. The config block defines materialization type and metadata. The overall pattern is familiar to anyone who has worked with dbt, but the templating language underneath is JavaScript rather than Jinja.
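
How ref() calls turn into an execution order can be sketched in plain JavaScript. The snippet below is a toy illustration, not Dataform's actual compiler: the regular expression, the extractRefs and executionOrder helpers, and the abbreviated model sources are all assumptions made for the example.

```javascript
// Toy illustration only: this is NOT how Dataform's compiler is implemented.
// Hypothetical model sources keyed by model name. Plain (non-template) strings
// are used so that ${ref("...")} stays literal text, as it would in a .sqlx file.
const models = {
  customer_metrics:
    'SELECT c.customer_id FROM ${ref("stg_customers")} c ' +
    'LEFT JOIN ${ref("stg_orders")} o ON c.customer_id = o.customer_id',
  stg_customers: "SELECT * FROM raw.customers",
  stg_orders: "SELECT * FROM raw.orders",
};

// Collect every model name mentioned in a ${ref("name")} call.
function extractRefs(sqlx) {
  const pattern = /\$\{ref\("([^"]+)"\)\}/g;
  return [...sqlx.matchAll(pattern)].map((match) => match[1]);
}

// Depth-first topological sort: each model is emitted after its dependencies.
function executionOrder(sources) {
  const done = new Set();
  const inProgress = new Set();
  const order = [];

  function visit(name) {
    if (done.has(name)) return;
    if (!(name in sources)) throw new Error(`Unknown model: ${name}`);
    if (inProgress.has(name)) throw new Error(`Circular dependency at: ${name}`);
    inProgress.add(name);
    extractRefs(sources[name]).forEach(visit);
    inProgress.delete(name);
    done.add(name);
    order.push(name);
  }

  Object.keys(sources).forEach(visit);
  return order;
}

console.log(executionOrder(models));
```

Even though customer_metrics is listed first, the two staging models come out ahead of it, because every ref() target has to be built before the model that reads from it.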

For dynamic model generation, Dataform’s JavaScript templating has genuine advantages. Creating identical models for multiple countries or clients uses standard JavaScript:

// definitions/country_tables.js
const countries = ["US", "GB", "FR", "DE"];

// Publish one reporting table per country from the same source.
countries.forEach(country => {
  publish(`reporting_${country}`)
    .dependencies(["source_table"])
    .query(ctx => `SELECT * FROM ${ctx.ref("source_table")} WHERE country = '${country}'`);
});

This creates four models with a simple loop. In dbt, achieving the same result requires the dbt_codegen package, external preprocessing, or increasingly convoluted Jinja. The difference matters most when your project has repetitive structural patterns across many entities.
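
To see what the loop produces without running Dataform, the same logic can be mimicked in plain Node. This is only a sketch: buildCountryModels and fakeRef are hypothetical helpers (fakeRef stands in for ctx.ref()), and the project and dataset names are made up.

```javascript
// Plain-Node sketch: mimics the shape of the loop above without Dataform's globals.
// fakeRef is a hypothetical stand-in for ctx.ref(); "my-project.analytics" is made up.
const fakeRef = (name) => "`my-project.analytics." + name + "`";

// Build one { name, dependencies, query } record per country.
function buildCountryModels(countries) {
  return countries.map((country) => ({
    name: `reporting_${country}`,
    dependencies: ["source_table"],
    query: `SELECT * FROM ${fakeRef("source_table")} WHERE country = '${country}'`,
  }));
}

const compiled = buildCountryModels(["US", "GB", "FR", "DE"]);
compiled.forEach((model) => console.log(`${model.name}: ${model.query}`));
```

Adding a fifth country is a one-element change to the array, which is the practical appeal of generating models from data rather than copying files.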

GCP Integration Depth

The integration with Google Cloud goes well beyond hosting. Dataform connects to the GCP ecosystem at multiple levels:

IAM and access control. Dataform uses standard GCP service accounts for BigQuery access. No separate credential management system, no additional authentication layer. Teams already using GCP IAM get Dataform permissions as part of their existing policy structure.

Dataplex metadata integration. Models defined in Dataform automatically appear in Dataplex’s data catalog. Column descriptions, table documentation, and lineage flow through without manual catalog maintenance.

Scheduling and orchestration. Built-in workflow configurations handle basic scheduling. For more complex orchestration, Cloud Composer (managed Airflow) and Cloud Scheduler provide GCP-native options. Everything stays within the Google ecosystem.

VPC Service Controls. Since the 2024 migration to the GCP-hosted version, Dataform supports VPC-SC perimeters. Data stays within your security boundary without custom network engineering.

Compliance. SOC 1/2/3, HIPAA, and ISO 27001 certifications cover Dataform as part of the broader GCP compliance umbrella.

The Built-In IDE

The Cloud Console IDE provides a browser-based development experience with real-time compilation feedback and BigQuery cost estimates as you write. You see immediately whether your SQLX syntax is valid and what the compiled SQL will cost to run. No waiting for a build step. For teams that don’t need deep IDE integration with local development tools, this workflow is fast and frictionless.
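
The arithmetic behind such an estimate is simple to sketch. BigQuery can report how many bytes a query would scan before it runs, and on-demand pricing is charged per TiB scanned. The function below is an illustration, not Dataform's implementation: estimateCostUsd is a hypothetical helper, and the price of 6.25 USD per TiB is an assumption that should be checked against the current price list for your region and billing model.

```javascript
const BYTES_PER_TIB = 2 ** 40;

// Assumed on-demand price; verify against current BigQuery pricing for your region.
const ASSUMED_USD_PER_TIB = 6.25;

// Convert a byte estimate for a query into an approximate on-demand cost.
function estimateCostUsd(bytesProcessed, usdPerTib = ASSUMED_USD_PER_TIB) {
  if (!Number.isFinite(bytesProcessed) || bytesProcessed < 0) {
    throw new RangeError("bytesProcessed must be a non-negative number");
  }
  return (bytesProcessed / BYTES_PER_TIB) * usdPerTib;
}

// Example: a query that would scan 512 GiB, i.e. half a TiB.
const scannedBytes = 512 * 2 ** 30;
console.log(estimateCostUsd(scannedBytes));
```

Teams on capacity-based pricing pay for slots rather than bytes scanned, so a per-byte figure like this is only meaningful under on-demand billing.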

The Cloud Console IDE is also the only first-class development environment. dbt benefits from the dbt Power User extension for VS Code and Cursor (1M+ installs), which offers lineage visualization, column auto-complete, and AI-powered documentation; Dataform has no comparable local IDE tooling. The practical development options are the browser IDE or a plain text editor with no transformation-aware features.

What Dataform Is Not

Dataform is exclusively a BigQuery transformation tool. It does not connect to Snowflake, Databricks, Redshift, or any other data platform. This is not a gap that might be filled later — it is a fundamental architectural choice. Google built Dataform to serve BigQuery users, and the deep GCP integration is a direct consequence of that focus.

It is also not an orchestrator in the way that Airflow or Dagster are. Dataform handles the transformation layer — compiling, dependency resolution, and execution of SQL models — but it does not manage extraction, loading, or cross-system workflows. For end-to-end pipeline orchestration, you still need something external.

The service has matured significantly since 2024 and is no longer an experimental beta feature. However, maturity as a GCP service and maturity as a transformation ecosystem are different things: the ecosystem gaps relative to dbt remain substantial even as the core transformation engine has become production-ready.