BigQuery’s resource hierarchy follows Google Cloud’s structure with BigQuery-specific additions. Each level controls distinct concerns: billing, quotas, access, and data location.
```
Organization
└── Folders
    └── Projects
        └── Datasets
            └── Tables / Views / Materialized Views / Models
```

Each level serves a distinct purpose, and the boundaries between them have real consequences for cost, performance, and security.
Projects as the Primary Boundary
Projects form the fundamental organizational unit in Google Cloud. Each project carries its own billing account, its own quotas, and its own API enablement. For BigQuery specifically, projects determine where query costs are billed.
Costs bill to the project running the query, not the project storing the data. If data lives in project-warehouse but queries run from project-analytics, compute costs appear on project-analytics’s bill. Storage costs stay with the project containing the data.
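As a sketch of this split, the query below reads a table in one project while running from another. The dataset and table names (`sales.orders`, `customer_id`, `amount`) are hypothetical; the two project names follow the example above.

```sql
-- Run with project-analytics as the active project: compute is billed
-- to project-analytics, while storage costs stay with project-warehouse.
SELECT
  customer_id,
  SUM(amount) AS lifetime_value
FROM `project-warehouse.sales.orders`   -- fully qualified cross-project reference
GROUP BY customer_id;
```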
This separation enables powerful patterns. A central data team can manage raw data ingestion and storage in one project. Department teams can query that data from their own projects, with costs flowing to their budgets. Agencies can query client datasets while billing compute to their own accounts. See BigQuery Multi-Environment Patterns for how this plays out in practice.
Projects also scope quotas. On-demand pricing gives each project up to 2,000 concurrent slots, with 20,000 slots shared across an organization. Sharing a project between dev and prod environments means a runaway development query can affect production dashboard performance.
Datasets as Access Control Boundaries
Datasets are BigQuery’s logical containers for tables and views. They sit outside the standard GCP resource hierarchy (you won’t find them in Cloud Resource Manager) but they’re the primary unit for access control and data organization in BigQuery.
Every table inherits permissions from its parent dataset. Grant bigquery.dataViewer on a dataset, and that principal can read all current and future tables within it. This inheritance makes datasets the natural boundary for access control: group tables by who should access them, not just by what they contain. The IAM patterns note covers the specific roles and grant strategies.
Several dataset properties deserve special attention because they’re hard or impossible to change later:
Location is set at creation and cannot be changed afterward. This is the most consequential and least reversible decision you’ll make about a dataset. A dataset created in US will remain in US forever. Moving it requires exporting to Cloud Storage, creating a new dataset in the correct region, reloading all data, and updating every downstream reference. See BigQuery Regional Architecture for how to think about location choices.
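Because location is fixed at creation time, it is worth setting explicitly rather than relying on defaults. A minimal example using BigQuery's DDL (the dataset name is hypothetical):

```sql
-- Location can only be set here; there is no ALTER that changes it later.
CREATE SCHEMA analytics_eu
OPTIONS (
  location = 'EU'
);
```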
Default table expiration automatically deletes tables after a specified period. This is genuinely useful for development datasets where abandoned experiments would otherwise accumulate indefinitely. Set a 7-day expiration on dev datasets and you’ll never have to manually clean up test_orders_v3_final_FINAL.
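A dev dataset with the 7-day expiration described above might be created like this (dataset name is illustrative):

```sql
-- Tables created in this dataset are deleted 7 days after creation
-- unless an explicit expiration overrides the default.
CREATE SCHEMA dev_sandbox
OPTIONS (
  default_table_expiration_days = 7
);
```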
Labels enable cost tracking and resource organization. Apply labels consistently across datasets to enable meaningful cost analysis by team, project, or environment. Labels flow through to billing exports and INFORMATION_SCHEMA queries, making them essential for answering “how much does team X spend?”
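Labels can be applied at creation or added later with DDL; the key/value pairs here are examples of the team/environment labeling scheme described above:

```sql
-- Attach labels that will surface in billing exports and
-- INFORMATION_SCHEMA for per-team cost attribution.
ALTER SCHEMA analytics
SET OPTIONS (
  labels = [('team', 'marketing'), ('env', 'prod')]
);
```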
Storage billing model lets you choose between logical bytes (what your data represents) and physical bytes (what’s actually stored after compression). Physical billing often reduces costs by 60-80% for highly compressible data, but requires opting in per dataset. Query INFORMATION_SCHEMA.TABLE_STORAGE to check your compression ratios before switching. The BigQuery Cost Model covers this tradeoff in detail.
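A sketch of checking compression ratios and then opting in, assuming a dataset in the US multi-region; the dataset name is hypothetical:

```sql
-- 1. Inspect logical vs. physical bytes to estimate savings.
SELECT
  table_name,
  total_logical_bytes,
  total_physical_bytes,
  SAFE_DIVIDE(total_logical_bytes, total_physical_bytes) AS compression_ratio
FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
WHERE table_schema = 'analytics'
ORDER BY compression_ratio DESC;

-- 2. If ratios are high, switch the dataset to physical billing.
ALTER SCHEMA analytics
SET OPTIONS (storage_billing_model = 'PHYSICAL');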
Tables, Views, and Materialized Views
Standard tables store data. Views store queries that execute on read. Materialized views store precomputed query results that BigQuery maintains automatically.
The distinction between standard views and materialized views matters more than it might seem. A standard view runs its defining query every time someone reads from it. If that view aggregates a billion rows, every dashboard refresh pays for that aggregation. A materialized view precomputes results and refreshes incrementally, often reducing both latency and cost by orders of magnitude.
BigQuery’s smart tuning feature automatically routes queries against base tables to use materialized views when beneficial. If you create a materialized view aggregating daily sales, queries that request compatible aggregations will use the materialized view even if they reference the base table directly. Zero configuration required.
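A minimal materialized view matching the daily-sales example above; the base table and column names are assumptions:

```sql
-- BigQuery maintains this incrementally, and smart tuning can route
-- compatible queries against sales.orders to it automatically.
CREATE MATERIALIZED VIEW sales.daily_sales AS
SELECT
  DATE(order_ts) AS order_date,
  SUM(amount)    AS total_sales
FROM sales.orders
GROUP BY order_date;
```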
Naming Conventions That Scale
Naming conventions become important when managing many tables and models across many datasets.
Use snake_case universally. BigQuery is case-sensitive, and mixing conventions creates confusion and bugs. Use semantic prefixes that communicate a table’s role in the transformation pipeline (these align with the base, intermediate, and mart layers in dbt):
| Prefix | Purpose | Example |
|---|---|---|
| `raw_` or `src_` | Untransformed source data | `raw_shopify_orders` |
| `base__` | Cleaned, renamed, typed | `base__shopify__orders` |
| `int__` | Intermediate transformations | `int__orders__daily_aggregates` |
| `mrt__` | Business-specific data marts | `mrt__marketing__attribution` |
The double-underscore convention (base__shopify__orders) separates the source system from the entity name and scales well. When you have 30 source systems, seeing at a glance that base__shopify__orders and base__netsuite__orders come from different sources saves real time.
Anti-Pattern: Dataset Sprawl
Dataset sprawl emerges gradually: test_data for an experiment, backup_orders_v2 for a one-off, and so on. The result is many datasets with no clear record of which tables are authoritative.
Symptoms include:
- Datasets named `test`, `temp`, `backup`, `old`, `new`, `v2`, `final`, `final_v2`
- The same table name appearing in multiple datasets
- No documentation of which datasets are active
- IAM grants scattered across dozens of datasets with no clear pattern
Start with an audit: query INFORMATION_SCHEMA.SCHEMATA to list all datasets, check access logs to see which are queried, and archive unused datasets to a separate project. Establish naming conventions and enforce them through code review.
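The audit's first step can be as simple as listing every dataset with its location and creation time; the project name is a placeholder:

```sql
-- Enumerate all datasets in a project as the starting point for an audit.
SELECT
  schema_name,
  location,
  creation_time
FROM `my-project`.INFORMATION_SCHEMA.SCHEMATA
ORDER BY creation_time;
```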
For teams using dbt, the dataset naming should mirror your dbt schema configuration: base, intermediate, marts in production, and dbt_<username>_base, dbt_<username>_intermediate in development. The generate_schema_name macro controls this mapping.
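A common override of generate_schema_name that produces the mapping described above: custom schema names (base, intermediate, marts) used as-is in prod, prefixed with the developer's target schema (e.g. dbt_jsmith) elsewhere. This is a sketch of the widely used pattern, assuming a dbt target named prod:

```sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- elif target.name == 'prod' -%}
        {{ custom_schema_name | trim }}
    {%- else -%}
        {{ target.schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```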