Google Sheets as Analytics Data Source

Google Sheets is one of the most common “shadow data sources” in analytics engineering. Business stakeholders maintain mapping tables, seed data, budget figures, ad-hoc inputs, and lookup tables in spreadsheets. These Sheets are often authoritative — the sales team lives in their pipeline tracker, finance owns the budget allocation Sheet — but they exist outside the data warehouse’s control plane.

The gap this creates: your dbt models need the data, but the data isn’t in BigQuery. Your pipeline needs the data to be current, but the Sheet updates unpredictably. Your data quality checks need to validate the Sheet structure, but Sheets don’t have schema enforcement.

How Sheets Enter BigQuery

BigQuery supports three main patterns for consuming Google Sheets data.

External tables allow BigQuery to query a Sheet directly without copying data. The Sheet remains the source of truth; BigQuery reads it at query time. Setup requires sharing the Sheet with the BigQuery service account and creating an external table definition that points to the Sheet URL:

CREATE EXTERNAL TABLE my_dataset.budget_allocation
OPTIONS (
  format = 'GOOGLE_SHEETS',
  uris = ['https://docs.google.com/spreadsheets/d/SHEET_ID/edit'],
  skip_leading_rows = 1
);

External tables have limitations: no partitioning, no clustering, no caching. Every query hits the Sheet API, so performance degrades for large Sheets or high query frequency. They’re appropriate for small, infrequently queried reference data.

Snapshot import copies Sheet data into a native BigQuery table on a schedule. This is more appropriate for data that feeds pipelines — you get the performance of native storage while capturing a point-in-time snapshot. Previously required Apps Script or manual exports; gws automates this.

dbt seeds are the simplest path for truly static data: download the Sheet as CSV, commit it to the dbt project, and reference it as ref('seed_name'). Seeds work well for lookup tables that change rarely and whose changes should be tracked in version control.

What gws Changes

Before [[Google Workspace CLI (gws)|gws]], automating Sheets interactions required either Google Apps Script (limited, timeout-prone, no CI/CD integration) or the raw REST API (substantial auth and pagination boilerplate). The gws CLI wraps this behind a consistent interface that AI agents can use directly.

An agent with gws available can:

# Discover which Sheets exist in Drive that match a naming convention
gws drive files list --params '{
  "q": "name contains \"Budget\" and mimeType = \"application/vnd.google-apps.spreadsheet\"",
  "fields": "files(id,name,modifiedTime)"
}'

# Read a specific Sheet's data
gws sheets spreadsheets.values.get --params '{
  "spreadsheetId": "SHEET_ID",
  "range": "Sheet1!A1:Z100"
}'

# Write results back to a Sheet dashboard
gws sheets spreadsheets.values.update --params '{
  "spreadsheetId": "DASHBOARD_ID",
  "range": "Summary!A2",
  "valueInputOption": "USER_ENTERED"
}' --json '{"values": [["2026-03-27", "1234", "99.2%"]]}'

The repo also ships curated recipes for common patterns, including reading from a Sheet to create Calendar events (recipe-create-events-from-sheet) and composing email drafts from Doc content (recipe-draft-email-from-doc).

The Convergence Pattern

The broader trend gws represents: data infrastructure tools (BigQuery, dbt, Airflow) and productivity tools (Gmail, Sheets, Docs, Calendar) are converging through AI agent layers.

A concrete workflow that’s now achievable without custom code:

Agent discovers which Sheets exist in Drive for a client project
Agent inspects Sheet structure and validates columns against a BigQuery table definition
Agent creates or refreshes a BigQuery external table pointing to the Sheet
dbt run incorporates the Sheet data as a source
Agent composes a summary email via gws gmail with key metrics from the dbt run
Agent updates a Google Doc runbook with any schema changes detected
Agent creates a Calendar event for the next data quality review

Steps 5-7 were previously manual. gws makes them automatable from the same agent session that runs steps 1-4.

Google’s own BigQuery Data Engineering Agent (experimental as of early 2026) automates pipeline creation from natural language. The dbt Semantic Layer integrates directly with Sheets and Excel for business user access to metrics. These aren’t coincidental — they’re expressions of the same convergence.

Managing Sheets as Data Sources

Eliminating Sheets from the data chain is typically not feasible. Business stakeholders use Sheets for flexible data entry, collaborative editing, formula-driven calculations, and a UI that requires no training. Teams that have attempted to replace Sheets with database-native tools have generally found that stakeholders create new Sheets alongside them.

The practical approach is to treat Sheets as part of the data infrastructure for business-user-owned inputs, with tooling that makes them reliable sources. That means:

Schema validation at ingestion time — check column names and types before loading to BigQuery
Change detection — alert when a Sheet’s structure changes in a way that would break downstream models
Automated snapshot imports — don’t query Sheets at dbt run time; import them to native tables first
Audit trail — log which version of the Sheet fed which dbt run

gws makes the automation side of this tractable. The BigQuery external table pattern handles the query side. Together, they’re enough to make Sheets a first-class (if awkward) data source.