Notes

A garden of working notes.

Short, atomic notes on analytics engineering, dbt, BigQuery, marketing data, and AI agents. Topic guides stitch them into starting points — pick one and follow the threads. Filter by domain or topic, or just browse.

Note

Terminal Fundamentals

What the terminal actually is, how it differs from a shell, and the working directory mental model that makes navigation intuitive.

claude code, ai
Note

Data Contract Anti-Patterns

Where data contract initiatives go wrong: misplaced enforcement, paper-only contracts, one-size-fits-all implementations, and unfunded ownership.

dbt, data quality, data engineering
Topic guide

HubSpot to BigQuery Pipeline Hub

All the moving parts for a HubSpot-to-BigQuery pipeline with dbt: associations, lifecycle stages, deal stages, property history, ingestion tools, and the dbt_hubspot package.

dbt, bigquery, data modeling, data engineering
Note

How Lightdash Connects to Your dbt Project

The three mechanisms for connecting Lightdash to a dbt project — Git repository integration, CLI deployment, and CI/CD automation — and how Lightdash generates a BI layer from dbt YAML.

dbt, analytics, data modeling
Note

Data Contract Definition

What a data contract is, how it differs from schema tests and data quality checks, and why the 'non-consensual API' framing matters.

dbt, data quality, data engineering
Note

MCP Data Quality Server Pattern

A practical MCP server pattern for data quality — running validation checks, retrieving quality scores, and surfacing tables that need attention.

mcp, data engineering, data quality
Note

BigQuery Row Access Policies

Dynamic row-level filtering using CREATE ROW ACCESS POLICY — replace per-segment views with policies that apply automatically based on querying user identity.

bigquery, gcp, data engineering
Note

Attribution Model Disagreement as Signal

Why running multiple attribution models in parallel reveals more than any single model, and how to use the disagreement between them to communicate uncertainty and drive better decisions.

ga4, analytics, data modeling
Note

GA4 BigQuery Schema Hub

Hub note connecting all concepts needed to understand and query the GA4 BigQuery export schema — table types, nested structures, gotchas, and query patterns.

ga4, bigquery, analytics, data engineering
Note

Contract-First Development in dbt

Defining the contract before writing the SQL — the API design analogy, the workflow, and how ODCS + Data Contract CLI can generate dbt model YAML.

dbt, data engineering, data quality, data modeling
Note

IAM Drift Monitoring for GCP

Catch IAM debt before it accumulates — IAM Recommender, INFORMATION_SCHEMA job monitoring, and audit log queries to detect permission drift quarterly.

gcp, bigquery, data engineering
Note

BI Tool Self-Hosting and Licensing

How MIT, AGPL, and proprietary licensing affect what you can do with self-hosted BI tools — feature gates, copyleft obligations, and what 'free' actually means for Lightdash, Metabase, and Looker.

dbt, analytics, cost optimization
Note

BigQuery Cost Optimization

A structured guide to BigQuery cost optimization covering the cost model, query patterns, dbt configurations, pricing models, storage billing, and governance.

bigquery, dbt, cost optimization, data engineering
Note

Lightdash in Production: Kubernetes Deployment

Moving Lightdash from Docker Compose to Kubernetes with the community Helm chart — production checklist, external dependencies, authentication options, and upgrade strategy.

dbt, analytics, data modeling
Note

MCP SDK Selection for Data Engineering

Choosing between the Python and TypeScript MCP SDKs — installation, capabilities, and which one fits your data engineering team.

mcp, data engineering
Note

Analytics Engineer Skills in the Agent Era

Seven skills worth investing in now that agents handle execution — AI orchestration, specification engineering, critical code review, domain expertise, governance, systems thinking, and tool fluency.

claude code, dbt, ai, automation
Note

Microbatch Backfill and Full Refresh Protection

How to use dbt's built-in microbatch backfill commands, retry failed batches, and protect large incremental tables from accidental full refreshes.

dbt, incremental processing, data engineering
Note

dbt Documentation CI Enforcement

Tools and patterns for enforcing dbt documentation completeness in CI — dbt-coverage, dbt-checkpoint, dbt-score, and dbt-bouncer

dbt, data quality, automation
Note

Google Ads DTS dbt Integration

How to model Google Ads BigQuery DTS tables in dbt — source configuration, incremental strategy for partition replacement, and conversion lookback windows.

google ads, bigquery, dbt, data engineering
Note

dlt and BigQuery Integration

How dlt loads data into BigQuery — the two loading strategies (streaming vs. GCS staging), the bigquery_adapter for partitioning and clustering, nested JSON normalization, and the metadata tables dlt creates.

dlt, bigquery, data engineering, etl
Note

dbt observe-fix remediation pattern

How to embed self-healing logic directly in the dbt DAG by detecting problems in base models and applying fixes in downstream layers.

dbt, data quality, data modeling
Note

BigQuery Cost Governance Guardrails

Query-level limits, project-level quotas, authorized views, and access patterns that prevent expensive BigQuery mistakes before they happen.

bigquery, gcp, cost optimization, data engineering
Note

Elementary edr monitor alerting

How edr monitor works, how it differs from edr report, and how to configure alert metadata in model YAML to control who gets notified and when.

dbt, elementary, data quality, automation
Note

Baseline vs. Autoscaling Slots in BigQuery

How baseline and autoscaling slots work in BigQuery Editions: guaranteed capacity vs. elastic scaling, the 60-second autoscale window, and slot usage priority.

bigquery, cost optimization
Note

dbt-project-evaluator for documentation enforcement

How dbt-project-evaluator and dbt_meta_testing enforce documentation completeness in CI — materializing coverage as models and setting folder-level requirements

dbt, data quality, automation
Note

Attribution Dashboard Design

How to design attribution dashboards for multiple audiences — essential metrics, audience-tiered hierarchy, Looker Studio implementation patterns, and working around BI tool limitations

bigquery, ga4, analytics
Note

Analytics Engineer as Director of AI

The role identity shift as agents take over execution — from producing analytical work to directing it. What stays human, what moves to agents, and how to think about your own value in the transition.

claude code, dbt, ai, automation
Topic guide

Lightdash + dbt YAML: Metrics Reference Hub

Hub note for Lightdash metric configuration in dbt YAML — dimensions, metric types, joins, and scaling organization.

dbt, analytics, data modeling
Note

Elementary alert fatigue reduction

How to configure suppression intervals, alert grouping, and sampling controls in Elementary to keep signal-to-noise ratio high as test suites grow.

dbt, elementary, data quality, automation
Note

Modern BI Landscape

Hub note for understanding BI in 2026 — the semantic layer, metrics-as-code, headless BI, dbt centrality, and how to choose a tool

dbt, bigquery, snowflake, analytics
Note

dbt Package CI/CD

How to set up CI/CD for dbt packages — matrix testing across warehouses and dbt versions with GitHub Actions, credential management, and the integration test workflow.

dbt, testing, automation
Note

dbt Doc Block File Organization

How to organize doc block files in a dbt project — per-directory, per-model, centralized, and hybrid approaches with practical tradeoffs

dbt, data modeling, data engineering
Note

Medallion Lakehouse on GCP

How the bronze-silver-gold medallion architecture maps to BigQuery table types, with BigLake Iceberg for flexibility and native tables for performance.

bigquery, gcp, data engineering, data modeling
Note

Fivetran dbt Packages for CRM

What dbt_salesforce and dbt_hubspot provide out of the box — model coverage, configuration, pass-through columns, history mode support, and naming convention tradeoffs.

dbt, bigquery, data modeling, data engineering
Note

BigQuery Editions and Slot-Based Pricing

When to switch from on-demand to slot-based pricing, how autoscaling works, committed use discounts, and a feature comparison across BigQuery editions.

bigquery, gcp, cost optimization
Note

The Rule of Three for dbt Macros

Why you should wait until the third occurrence of a pattern before extracting a dbt macro — and what goes wrong when you don't.

dbt, data modeling, data engineering
Note

GA4 event_params Type Detection

How GA4 auto-detects parameter types across string_value, int_value, and double_value fields — and the defensive COALESCE pattern when the type isn't guaranteed.

ga4, bigquery, analytics, data engineering
Note

GA4 Reporting Identity Modes

How GA4's three reporting identity modes (Blended, Observed, Device-based) apply user resolution in the interface — and why none of that logic reaches BigQuery.

ga4, bigquery, analytics, data quality
Note

dbt Source Schema Validation

How to validate source schema in dbt when contracts can't reach — using dbt-expectations on sources to catch column drift before transformation runs.

dbt, data quality, testing
Note

dlt RESTClient Mechanics

How dlt's RESTClient works — instantiation, the paginate() method, key parameters, and built-in error handling with retry and backoff.

dlt, data engineering, etl
Topic guide

OpenClaw Reporting Assistant

A reading map for the OpenClaw client KPI reporting guide — GA4 skill integration, dashboard scraping tradeoffs, direct warehouse queries, multi-client architecture, and Slack summary formatting.

ga4, bigquery, snowflake, automation
Note

Entity-Centric Naming for dbt Intermediate Models

Why intermediate models should be named for the entity they represent, not the transformation they perform — and the self-documenting join notation that makes it work.

dbt, data modeling, data engineering
Note

Elementary alerting hub

A reading path through Elementary's alerting system, from the edr monitor command through Slack/Teams setup, filter-based routing, alert fatigue reduction, and on-call strategy.

dbt, elementary, data quality, automation
Note

Elementary HTML report generation

How the edr report command works, which flags matter in practice, and patterns for generating targeted reports for different audiences.

elementary, dbt, data quality, testing
Note

Claude Code Slash Commands for dbt

How to create custom slash commands in Claude Code that automate repeatable dbt workflows — test generation, model documentation, and prompt validation

claude code, dbt, automation, ai
Note

MCP Server Project Setup

Step-by-step project initialization for a custom MCP server — directory structure, dependencies, client installation, and the typical project layout.

mcp, data engineering
Note

EL Tool Schema Contract Modes

How dlt, Fivetran, and Airbyte handle schema changes during extraction and loading — from dlt's granular freeze/evolve/discard modes to Fivetran's blunt blocking settings.

dlt, data quality, data engineering, etl
Note

Data Observability Tool Landscape

A reference comparison of data observability tools in 2026 — Elementary, Monte Carlo, Soda, Bigeye, Datafold, and Atlan — covering capabilities, pricing, and positioning.

dbt, elementary, data quality, data engineering
Note

dbt Cross-Database Array Operations

How array syntax diverges across BigQuery, Snowflake, and Databricks — UNNEST vs LATERAL FLATTEN vs EXPLODE — and dispatch macros to handle it.

dbt, bigquery, snowflake, databricks
Note

Dataform-to-dbt Migration Hub

Hub note connecting all garden notes related to migrating from Dataform to dbt — decision criteria, concept mapping, templating differences, and validation.

dbt, dataform, bigquery, data engineering
Note

Floating-Point Precision in Data Comparison

Why exact equality fails for floating-point values in data comparison, and practical strategies for handling precision mismatches.

dbt, data quality, testing
Note

MCP Context Window Overhead

The concrete token cost of MCP tool definitions in an LLM's context window — measurements from Anthropic and practitioners, and why it matters for long sessions.

mcp, claude code, ai, cost optimization
Note

BI Tool Migration and Portability

Switching costs between BI tools depend on where your metric definitions live. LookML is proprietary and expensive to migrate away from. dbt YAML and Metabase's per-question definitions are more portable.

dbt, analytics, data modeling
Note

GA4 Event Data Structure

How GA4 structures event data in BigQuery — the event model, nested parameters, and the patterns you need to query it effectively.

ga4, bigquery, analytics, data modeling
Note

Data Contract Adoption Challenges

Why data contract initiatives fail — the execution gap between contract-as-documentation and contract-as-enforcement, and the cultural change that matters more than the YAML.

dbt, data quality, data engineering
Note

Zero-Downtime Table Materialization in dbt

A custom dbt materialization that builds to a temp name, validates row counts, then swaps via rename — keeping the old table queryable until the new one is confirmed ready.

dbt, bigquery, data engineering, data quality
Note

Airbyte Pricing and Self-Hosting Costs

Airbyte's February 2025 capacity-based pricing model and the hidden infrastructure costs of self-hosting — NAT Gateway, Kubernetes overhead, and what 'free' actually costs.

data engineering, etl, cost optimization
Note

GitHub Actions for dbt Scheduling

Using GitHub Actions scheduled workflows as a zero-infrastructure dbt runner — what it covers well, where it falls short, and when to use it over Cloud Run.

dbt, gcp, data engineering, automation
Note

GTM Server-Side Hosting on AWS

How to host the GTM Server-Side tagging container on AWS using ECS Fargate, why App Runner costs more, and why Lambda is architecturally incompatible.

gtm, analytics, cost optimization
Note

Fivetran MAR Pricing Shift

How Fivetran's March 2025 shift to per-connector MAR pricing broke the economics of managed ELT — bulk discount elimination, 4-8x cost increases, and the marketing data problem

dlt, bigquery, cost optimization, etl
Note

Browser Cookie Restrictions in 2026

How Safari ITP, Firefox Total Cookie Protection, and Chrome handle tracking cookies differently in 2026 — and why the combined effect means client-side tracking misses 20-40% of visitors.

ga4, google ads, analytics, data quality
Note

dbt Unit Test CI/CD Workflow

A production-ready GitHub Actions workflow for running dbt unit tests on BigQuery — unique CI datasets, the --empty flag, cost optimization, and production exclusion.

dbt, bigquery, testing, automation
Note

Stale documentation is worse than missing documentation

Why outdated documentation that looks complete causes more damage than obvious gaps — the false confidence problem in data teams

dbt, data quality
Note

n8n RSS-to-Notion Workflow

How to build an automated RSS reader that fetches, cleans, and stores articles in Notion using n8n, Jina AI, and ChatGPT.

automation, ai
Note

GA4 Identity Graph in BigQuery

How to build a production identity graph from GA4 BigQuery data — mapping user_id to all associated devices, detecting shared devices and anomalies, and structuring forward and reverse lookups.

ga4, bigquery, dbt, analytics
Note

When to Write Custom dbt Materializations

Decision framework for when custom dbt materializations are worth the maintenance burden versus post-hooks, macros, or built-in incremental strategies.

dbt, data engineering, data modeling
Note

OpenClaw Persistent Memory for dbt Context

How to load dbt project documentation, schema descriptions, and failure history into OpenClaw's persistent memory so that monitoring reports include business context rather than just technical output.

dbt, data quality, automation, ai
Note

BigQuery Data Lake Patterns

A reading guide for understanding BigQuery data lake architecture: table types, the medallion lakehouse pattern, catalog strategy, performance, cost optimization, and common mistakes.

bigquery, gcp, data engineering, data modeling
Note

dbt Project Structure and Naming

How to organize a dbt project — folder structure, model naming conventions, layer responsibilities, and dbt_project.yml configuration patterns

dbt, data modeling, data engineering
Note

BigQuery ML for Lead Scoring

Train a logistic regression or boosted tree model to predict lead conversion directly in BigQuery SQL — including the TRANSFORM clause, class imbalance, and how to evaluate model quality.

bigquery, dbt, analytics, ai
Note

Salesforce Ingestion Tool Selection

Choosing between Fivetran, Airbyte, dlt, Hevo, and custom Python for Salesforce extraction — connector mechanics, cost realities, and the AppExchange dispute.

dbt, bigquery, data engineering, etl
Note

Headless BI Pattern

The architectural pattern of decoupling the semantic layer from visualization — exposing metrics via APIs so any frontend, AI agent, or application can consume governed data

dbt, analytics, data modeling
Note

GA4 dbt Project Configuration

The dbt_project.yml setup for a GA4 project — variable-driven configuration, folder-level materializations, and the project variables that make the template reusable.

ga4, dbt, bigquery, data modeling
Note

dbt Hub Publishing

How to publish a dbt package to the dbt Hub — requirements, the registration process, hubcap automation, and best practices for version management.

dbt, data engineering
Note

dbt Test Severity and Performance Tuning

How to configure dbt test severity levels, optimize expensive tests on BigQuery, and structure test execution for cost-effective data quality.

dbt, bigquery, data quality, testing
Note

BigQuery MCP Toolbox Setup

Installing and configuring Google's open-source MCP Toolbox for Databases — the self-hosted option for connecting BigQuery to AI assistants with ADC authentication.

mcp, bigquery, gcp, claude code
Note

Salesforce vs HubSpot Data Models

How Salesforce and HubSpot structure CRM data differently — metadata-driven relational models vs many-to-many associations — and what that means for warehouse modeling.

dbt, bigquery, data modeling, data engineering
Note

The full_refresh: false Guard in dbt

When and why to set full_refresh: false on dbt incremental models — preventing accidental multi-hour rebuilds while keeping intentional full refreshes possible.

dbt, incremental processing, data engineering
Note

dlt Authentication Patterns

The authentication strategies dlt provides for API pipelines — bearer tokens, API keys, OAuth2 client credentials — and how to extend them for non-standard flows.

dlt, data engineering, etl
Note

MCP Transport Configuration

Practical configuration for MCP's two transport modes — stdio for local development and streamable HTTP for production deployment.

mcp, data engineering
Note

BigQuery Data Lake Common Mistakes

Three anti-patterns that cause the most problems in BigQuery data lake implementations: missing metadata caching, skipped partition filters, and over-engineered architectures.

bigquery, gcp, data engineering, cost optimization
Note

Google Workspace CLI for AI Agents (Hub)

Hub note for the gws CLI ecosystem — the tool itself, agent-first design principles, OAuth setup, CLI vs MCP tradeoffs, and Sheets as a data source.

gcp, mcp, claude code, automation
Note

Dataform Dynamic Model Generation

How Dataform's JavaScript enables programmatic DAG construction — generating dozens of models from a single loop — and what dbt teams do instead.

dataform, dbt, bigquery, data engineering
Note

Context Window Compaction and Agent Safety

How LLM context window compaction causes AI agents to lose or deprioritize stop commands during long-running tasks — and why bulk data operations are the highest-risk scenario.

ai, automation, data engineering
Note

dlt Secrets Management

How dlt's configuration hierarchy keeps credentials out of code — the priority order, secrets.toml for local development, environment variables for CI/CD, and vault integrations.

dlt, data engineering, etl
Note

dbt Package Anatomy

What makes a dbt package different from a regular project — the three design principles, standard directory structure, and dbt_project.yml configuration for reusable packages.

dbt, data engineering
Note

Unified Ad Model Downstream Patterns

What becomes practical once you have a unified cross-platform ad model — blended ROAS, budget pacing, and Marketing Mix Modeling data preparation.

dbt, google ads, analytics, data modeling
Note

CRM Modeling Patterns in dbt

How to apply the three-layer dbt architecture to Salesforce and HubSpot data — base model conventions, intermediate enrichment, mart design, and incremental strategies.

dbt, bigquery, data modeling, data engineering
Note

dbt Testing Decision Framework

A three-question framework and decision tree for choosing the right dbt testing approach — unit tests, generic tests, singular tests, dbt-expectations, Elementary, or dbt-audit-helper.

dbt, testing, data quality
Note

Cloud Functions as a dbt Execution Environment

When and why to use Google Cloud Functions to run dbt Core — how it compares to Cloud Run Jobs, what it's good at, and where it falls short.

dbt, gcp, data engineering, automation
Note

Fivetran-dbt Merger and Orchestration Independence

Why the October 2025 Fivetran-dbt merger makes external orchestration more strategically important — vendor optionality, platform lock-in risk, and the case for controlling your orchestration layer.

dbt, data engineering, automation
Note

dbt Docs Markdown Capabilities

What Markdown works in dbt docs and what does not — supported syntax, YAML scalar styles, image embedding, cross-referencing models, and known limitations

dbt, data modeling
Note

GA4 User Backstitching

How to retroactively apply GA4 user_id to anonymous sessions in the warehouse — the SQL pattern, shared device handling, and when backstitching is worth the complexity.

ga4, bigquery, dbt, data modeling
Note

Build vs. Buy Data Pipeline Economics

The three converging shifts that flipped the build-vs-buy calculation for data pipelines — pricing changes, AI-assisted development velocity, and open-source maturity

dlt, bigquery, data engineering, etl
Note

OpenClaw vs Claude Code vs Cursor for Data Work

A clear-eyed comparison of three AI tools data people actually use — what each is for, where each falls short, and why the best practitioners run all three as a layered stack.

claude code, ai, automation, data engineering
Note

GA4 dbt Unnesting Layer Architecture

How to structure a dbt project for GA4 unnesting — base layer for parameter extraction, intermediate for event-specific models, mart for analytics-ready aggregations.

ga4, dbt, bigquery, data modeling
Note

Pipeline retry and catch-up patterns

How to configure retries, exponential backoff, and catch-up mechanisms in data pipelines so that transient failures resolve themselves without human intervention.

gcp, data engineering, automation
Note

BigQuery HyperLogLog Sketches

How HyperLogLog++ sketches in BigQuery enable composable, approximate distinct counts at a fraction of the cost of exact counting.

bigquery, analytics, cost optimization
Note

Attribution Touchpoint Table Design

How to design and build the touchpoint table that all attribution models consume: field requirements, identity considerations, and the intermediate dbt model that maps raw events to attribution-ready rows.

bigquery, dbt, data modeling, analytics
Topic guide

LinkedIn Ads Pipeline — Hub

data engineering, etl, data modeling
Note

dbt-audit-helper Hub

Hub note for dbt-audit-helper — the progressive validation workflow, macro reference, CI/CD integration, and related comparison topics.

dbt, data quality, testing
Note

Privacy Constraints for Linked Analytics Data

GDPR and CNIL implications when linking GA4 cookie identifiers to CRM contact records — consent exemption loss, right to deletion cascades, and the architectural requirements for compliant Customer 360 models.

ga4, bigquery, dbt, analytics
Note

Elementary for dbt

How Elementary extends dbt with data observability — anomaly detection, automated freshness monitoring, test result history, and Slack alerting

dbt, elementary, data quality, testing
Note

AI Tool Tiers for Data Engineering

The four capability tiers of AI tools for data engineering — autonomous agents, copilots, chat assistants, and platform-embedded AI — and why context determines which tier delivers value

claude code, dbt, bigquery, snowflake
Note

First-Party Data and Compliance Hub

Hub connecting the browser restrictions, server-side infrastructure, EU/US legal frameworks, and identity resolution approaches that together determine how much advertising and analytics signal you can legally collect in 2026.

ga4, google ads, analytics, data quality
Note

dbt-utils generate_surrogate_key

How generate_surrogate_key works, why null handling matters, and why migrating from the old surrogate_key() macro can silently break incremental models and snapshots.

dbt, data modeling, data engineering
Note

dbt Service Account Setup for Multi-Project GCP Architectures

How to create and configure a dbt service account when your source data, transformation output, and compute infrastructure live in separate GCP projects.

dbt, gcp, bigquery, data engineering
Note

Claude Code for dbt Development

A reading path through the core workflows for using Claude Code in a dbt project — base models, tests, documentation, debugging, refactoring, and prompting.

claude code, dbt, data engineering, ai
Note

Looker Studio Limits and Upgrade Path

The hard technical limits of Looker Studio that optimization can't fix, what Looker Studio Pro actually adds, and when to evaluate enterprise Looker or alternative BI tools.

bigquery, analytics
Note

LinkedIn Marketing API Access

How to get approved for LinkedIn's Marketing API — the developer app setup, super admin verification, manual review process, rejection handling, and what to include in your application.

data engineering, etl
Note

Elementary custom BI dashboards

How to build custom data quality dashboards in any BI tool by querying Elementary's warehouse tables directly, with example SQL for the most useful metrics.

elementary, dbt, data quality, analytics
Note

dbt Dispatch Configuration

How to configure dbt's dispatch search order in dbt_project.yml — overriding package macros, adding Databricks support via spark_utils, and namespace resolution.

dbt, data engineering
Note

Essential Terminal Commands

The core terminal commands for navigation, file operations, viewing content, and finding things — the foundation of terminal literacy

claude code, ai
Note

dbt Testing Taxonomy

A taxonomy of dbt test types — generic tests, singular tests, unit tests, contract tests, and data quality packages like dbt_expectations

dbt, testing, data quality
Note

Ad Pipeline Engineering Challenges

The operational challenges of maintaining advertising data pipelines — API rate limits, schema changes, attribution window normalization, currency handling, and privacy compliance

google ads, data engineering, etl
Note

BigQuery Partitioning Mechanics

How BigQuery partitioning physically divides tables, the three partitioning types, key constraints, and when partition pruning does and doesn't work.

bigquery, data engineering, data modeling
Note

Incremental Models in dbt

How dbt incremental models work, when to use them, the available strategies, and the trade-offs you need to understand.

dbt, incremental processing, data modeling, cost optimization
Note

MCP Client Landscape

The major MCP clients — desktop apps, code editors, and CLI tools — and how to choose between them based on your workflow.

mcp, data engineering, ai
Note

Building MCP Apps Visualization Server

How to build a custom MCP Apps visualization server in TypeScript — registering app tools with UI metadata, serving HTML resources, and implementing the client SDK for bidirectional communication.

mcp, claude code, ai, data engineering
Note

LLM Accuracy With Semantic Layers

Research benchmarks showing how semantic layers improve LLM accuracy on enterprise data questions from ~17% to 54-92% — the data.world study, Spider 2.0, and dbt Labs replication.

dbt, snowflake, databricks, ai
Note

dbt Packageable Model Patterns

Three patterns that make dbt models installable by anyone — configurable sources with var(), enable/disable flags, and namespaced model names.

dbt, data modeling, data engineering
Note

dbt Unit Test CLI Commands

How to run, filter, debug, and exclude dbt unit tests from the command line — including output interpretation and production exclusion patterns.

dbt, testing
Note

Metric Naming Conventions in dbt

How to name MetricFlow metrics so they stay discoverable and consistent as your project scales — patterns by metric type, grouping families, and the name vs label distinction

dbt, data modeling, analytics
Note

dbt documentation coverage tracking

Measuring and trending dbt documentation coverage over time with dbt-coverage, dbt-score, and dbt Cloud — moving beyond pass/fail CI checks to spot erosion early

dbt, data quality, automation
Note

GA4-Specific dbt Testing Patterns

Data quality tests for GA4 dbt projects that catch tracking failures standard schema tests miss — missing session_start events, orphaned transactions, suspicious session metrics.

ga4, dbt, bigquery, data quality
Note

Dagster Fundamentals Hub

Hub note connecting all Dagster core concept notes — the asset-centric model, SDAs, resources, components, UI, pricing, GCP deployment, learning curve, and the dbt integration.

dbt, bigquery, gcp, data engineering
Note

BigLake Performance Characteristics

How BigLake external and Iceberg tables perform relative to native BigQuery tables, the role of metadata caching, and where the remaining gaps matter.

bigquery, gcp, data engineering, cost optimization
Note

dbt Model Description Writing Patterns

Practical patterns for writing dbt model, column, and source descriptions that serve both business users and engineers — the three-question framework and when to use meta instead of description

dbt, data engineering, data quality
Note

Base Model Generation with Claude Code

How to use Claude Code to generate dbt base models — the pattern-replication workflow, prompting constraints, and CLAUDE.md defaults that eliminate inconsistency.

claude code, dbt, data engineering, ai
Note

Data Contract Rollout Change Management

The organizational change management strategy for data contracts: start with two datasets, create urgency through visible cost, and measure conversations rather than coverage.

dbt, data quality, data engineering
Note

dbt-utils Hub

Navigation hub for dbt-utils v1.3 — the full scope of the package, what moved to dbt-core, and pointers to each section of the reference.

dbt, data engineering, data modeling
Note

Dataform Ecosystem and Tooling Gaps

Where Dataform falls short beyond testing — CI/CD automation, IDE tooling, package ecosystem, and platform lock-in compared to dbt

dataform, dbt, bigquery, gcp
Note

GA4 Consent Mode Orphaned Events

How Consent Mode creates rows in GA4 BigQuery exports with null user_pseudo_id and session identifiers — what they are, how they affect counts, and same-page backstitching behavior.

ga4, bigquery, analytics, data quality
Note

SCD Type 2 with dbt Snapshots

How dbt snapshots implement slowly changing dimension Type 2 — tracking every version of a record over time with timestamp and check strategies, plus Fivetran History Mode as an alternative.

dbt, bigquery, data modeling, data engineering
Note

dbt Base Layer Patterns

What belongs in dbt base models — renaming, casting, deduplication, unnesting — and the one exception to the no-joins rule.

dbt, data modeling, data engineering
Note

Dagster Learning Curve for Analytics Engineers

Where the friction shows up when analytics engineers adopt Dagster — Python proficiency, conceptual overhead, manifest management, pricing surprises, and the best onboarding path.

dbt, data engineering
Note

dbt documentation automation strategy

A graduated approach to automating dbt documentation freshness — from a single pre-commit hook to comprehensive drift detection, coverage tracking, and AI remediation

dbtautomationdata quality
Note

Event-Grain Sessionization

Why enriching events with session context beats building session-grain tables, and how the pattern enables flexible downstream analysis.

ga4bigquerydata modelinganalytics
Note

dbt Model Contract Mechanics

How dbt's native model contracts work — the preflight check, DDL generation, fail-fast behavior, configuration options, and what contracts do and don't validate.

dbtdata qualitydata modeling
Note

dlt Dependent Resources

How dlt lets one resource use another's output to configure its endpoint — the path template syntax for multi-step API traversal.

dltdata engineeringetl
Note

AI Query Cost Control for BigQuery MCP

Managing the cost and safety risks of AI assistants running BigQuery queries through MCP — cost mitigation, write protection, and practical guardrails.

mcpbigquerygcpai
Note

OpenClaw Morning Briefing Pattern

How to configure an OpenClaw cron job to deliver a daily personal briefing — covering calendar, email priority, pipeline status, and time tracking — to Telegram before your first coffee.

automationai
Note

dbt Unit Test Mocking Dependencies

How to mock refs, sources, macros, variables, and the 'this' keyword in dbt unit tests — with patterns for multi-join models and incremental overrides.

dbttesting
Note

Consent Mode Debugging Network Parameters

How to decode the gcs and gcd parameters in Google Analytics network requests to verify Consent Mode implementation without relying on CMP interfaces.

ga4google adsanalyticsdata quality
Note

GA4 BigQuery Number Discrepancies

Why your BigQuery session and user counts won't match the GA4 interface, and the practical approach to handling the 1-5% variance.

ga4bigqueryanalyticsdata quality
Note

Preparing for the dbt Analytics Engineering Certification

What the dbt Analytics Engineering certification actually tests, where people get tripped up, and how hands-on project experience matters more than studying.

dbtdata engineeringdata modeling
Note

Agentic AI Fit for Data Work

Why data engineering is structurally well-suited for agentic AI tools — repetitive patterns, multi-language context-switching, and cross-layer debugging make the case.

claude codedbtdata engineeringai
Note

dbt Unit Test File Organization

Where to put dbt unit test files, how to name tests consistently, and the co-location pattern with _unit_tests.yml.

dbttesting
Note

direnv for Multi-Client GCP Credential Management

Automate per-project GCP credential loading with direnv — .envrc configuration, the four-variable pattern, and a five-minute setup for each new client.

gcpdbtdata engineeringautomation
Note

OpenClaw Cron Scheduler Mechanics

How OpenClaw's built-in cron scheduler works — session modes, job persistence, exponential backoff, and the configuration patterns that make scheduled monitoring reliable.

automationdata engineering
Note

BI Tool Selection Framework

A decision framework for choosing a BI tool in 2026 — four key questions, a comparison of Lightdash vs Looker vs Metabase, and the market landscape from dbt-native to enterprise tools.

dbtbigquerysnowflakeanalytics
Note

dbt-utils Web Macros for URL Parsing

dbt-utils URL extraction macros for marketing analytics: get_url_host, get_url_path, and get_url_parameter. What they do, where they're useful, and what they don't handle.

dbtanalyticsdata engineering
Note

Dataform Decision Framework

When Dataform is the right choice and when dbt wins — a decision framework based on platform commitment, budget, team preferences, and use case complexity.

dataformdbtbigquerydata engineering
Note

dbt Built-In Cross-Database Macros

Reference for dbt's built-in cross-database macros in the dbt namespace — dateadd, datediff, safe_cast, concat, type helpers, and the migration path from dbt_utils.

dbtdata engineering
Note

Templating Language and Team Skills

How a team's existing skill mix — SQL practitioner, Python engineer, JavaScript developer — should shape the choice between Jinja and JavaScript templating in analytics engineering.

dbtdataformdata engineeringanalytics
Note

GA4 Channel Grouping Macro

A dbt macro that encapsulates Google's default channel grouping logic as reusable SQL, with the regex patterns and edge cases you need to know.

ga4dbtbigqueryanalytics
Note

GCP Authentication for Multi-Client Consulting Work (Hub)

Hub note for GCP credential isolation across multiple client projects — the problem, the four-variable solution, tool-specific agent constraints, and the service account vs impersonation tradeoff.

gcpclaude codedbtdata engineering
Note

Google Sheets as Analytics Data Source

How Google Sheets functions as a shadow data source in GCP analytics stacks — the integration patterns, the automation gap gws fills, and the convergence of data and productivity tooling.

gcpbigquerydata engineeringanalytics
Note

Dagster Full-Stack Pipeline Architecture

How Dagster unifies ingestion, transformation, Python processing, and downstream triggers in a single asset graph — the pattern that justifies Dagster over simpler orchestration approaches.

dbtbigquerydata engineeringautomation
Note

dbt Contract Rollout Strategy

How to adopt dbt model contracts in an existing project — identifying candidates, scaffolding YAML, phased enablement, and CI/CD integration for governance-only checks.

dbtdata qualitydata modeling
Note

GA4 Schema Evolution Monitoring

GA4's BigQuery schema changes without announcements and new fields are never retroactive. How to detect additions before they break production queries.

ga4bigqueryanalyticsdata engineering
Note

Per-Workload Service Account Naming Conventions

One service account per workload with a compute-platform prefix — so logs, cost attribution, and incident response all point to the right place immediately.

gcpbigquerydata engineering
Note

dbt Slot Management on BigQuery

How dbt's execution model interacts with BigQuery slots: why dbt is compute-heavy, the multi-project workaround, and best practices for sizing slots for dbt workflows.

bigquerydbtcost optimizationdata engineering
Note

Privacy Sandbox Collapse

How Google's Privacy Sandbox went from the industry's best hope for a cookie replacement to a quiet retirement — the timeline, what survived, and why it sealed the case for server-side infrastructure.

ga4google adsanalyticsdata quality
Note

dbt Core vs Cloud Decision Framework

A structured comparison of dbt Core and dbt Cloud across deployment, interface, features, pricing, and team profile, with decision heuristics for choosing between them.

dbtdata engineeringcost optimization
Note

Attribution Lookback Windows

How to set attribution lookback windows by industry and purchase cycle: benchmarks, consequences of wrong windows, and implementation in SQL.

bigqueryanalyticsdata modeling
Note

CLI vs MCP for AI Agents

The practical tradeoffs between CLI commands and MCP tool calls for AI agent workflows — benchmark data, token efficiency, and when each approach wins.

mcpclaude codeaiautomation
Note

AI Tooling Cost for Solo Consultants

What a four-layer AI stack actually costs per month for an independent analytics engineering consultant — tool-by-tool breakdown, ROI assessment, and cost visibility gaps.

claude codedbtaicost optimization
Note

dbt-audit-helper Progressive Validation

The broad-to-narrow validation workflow for dbt-audit-helper — start with schema checks, escalate to row-level diffs only when needed.

dbtdata qualitytesting
Note

Claude Code Hooks

How hooks give Claude Code deterministic guardrails — shell commands that execute at specific lifecycle points to enforce rules, auto-format code, and block dangerous operations.

claude codeautomationdata engineering
Note

Cross-Platform Ad Testing Patterns

How to test unified ad reporting models in dbt — source freshness, spend reconciliation, grain testing, and the manual checks that automated tests can't replace.

dbtgoogle adsdata qualitytesting
Note

Metric Anti-Patterns in dbt

Common mistakes when defining MetricFlow metrics — one-off models for metrics, sum-of-ratios errors, hardcoded measure filters, and missing descriptions.

dbtdata modelinganalytics
Note

Unit Testing Window Functions in dbt

How to design test data that validates window function partitioning, ordering, and framing — with patterns for ROW_NUMBER, FIRST_VALUE, cumulative sums, and deliberate out-of-order inputs.

dbtbigquerytesting
Note

BigQuery CLI Capabilities Beyond MCP

What the bq command-line tool can do that BigQuery MCP servers cannot — data loading, exports, table management, and the full feature gap with examples.

bigquerygcpclaude codedata engineering
Note

SQL Dialect Divergences Across Warehouses

Where SQL syntax breaks across BigQuery, Snowflake, and Databricks — date functions, type casting, and argument ordering differences that matter for portable dbt code.

dbtbigquerysnowflakedatabricks
Note

Orchestrator Learning Curves

An honest assessment of ramp-up time and friction points for Dagster, Airflow, and Prefect — what trips up analytics engineers and what helps.

dbtdata engineering
Note

dbt Macro Deprecation Pattern

How to change macro behavior without breaking callers — the staged deprecation pattern using exceptions.warn() that dbt-utils demonstrates.

dbtdata engineeringdata modeling
Note

dbt Packages vs Mesh

When to use dbt packages (code sharing) vs dbt Mesh (data product sharing) — the conceptual distinction, practical differences, and how to choose.

dbtdata modelingdata engineering
Note

Data Comparison Tool Landscape

When to use dbt-audit-helper, Elementary, dbt-expectations, Datafold, or Soda for data comparison and validation.

dbtdata qualitytesting
Note

Salesforce Opportunity Stage Duration Analysis

How to calculate time spent in each pipeline stage using OpportunityFieldHistory and LEAD window functions — the SQL pattern, downstream analysis, and win rate metrics.

dbtbigquerydata modelingdata engineering
Note

GA4 Acquisition Performance Mart

A mart at daily × source/medium grain for GA4 acquisition reporting — aggregating sessionized events into dashboard-ready metrics with conversion rates and revenue.

ga4dbtbigquerydata modeling
Note

dbt Materialization Cost Impact on BigQuery

How dbt materialization choices affect BigQuery costs: table vs view vs ephemeral trade-offs, the view chain anti-pattern, and why defaulting to tables usually wins.

dbtbigquerycost optimizationdata modeling
Note

Your First Hour with Claude Code (Analytics Engineer)

A sequenced reading path for getting started with Claude Code as an analytics engineer — from installation through your first useful output.

claude codedata engineeringai
Note

dbt Documentation Scaffolding Tools

How dbt-codegen and dbt-osmosis handle the mechanical parts of documentation — generating YAML skeletons and propagating descriptions through your DAG.

dbtautomationdata quality
Topic guide

GTM Server-Side: Map of Content

Index of garden notes on GTM Server-Side — architecture, Cloud Run deployment, GA4 configuration, Meta CAPI, Google Ads, hosting costs, and common failures.

gtmga4google adsgcp
Note

BigQuery Dynamic Data Masking

Show sensitive column structure without exposing values — SHA256 hashing, nullification, and default masking for analysts who need to write queries but not read PII.

bigquerygcpdata engineering
Note

dbt Model Description Style Guide

Hub note for the dbt documentation style guide — why consistency beats effort, what to put in model and column descriptions, YAML formatting options, doc blocks, CI enforcement, and rollout strategy.

dbtdata qualitydata modeling
Note

EU Cookie Consent Legal Framework

The two overlapping EU legal frameworks governing cookie consent — ePrivacy Directive and GDPR — what valid consent actually requires, which cookies are exempt, and where enforcement stands in 2026.

ga4google adsanalyticsdata quality
Note

GA4 CROSS JOIN versus LEFT JOIN UNNEST

Why the comma syntax in FROM table, UNNEST(array) silently drops rows — and when to use LEFT JOIN UNNEST to preserve events without array data.

ga4bigqueryanalyticsdata modeling
Note

dlt Incremental Loading

How dlt tracks state between pipeline runs using cursor-based incremental loading — the dlt.sources.incremental() helper, declarative REST API config, and why state lives in the destination.

dltdata engineeringetlincremental processing
Note

Lead Scoring Signal Dimensions

The four categories of signals that drive lead scoring — demographic fit, firmographic fit, behavioral engagement, and recency — and why the warehouse sees all of them when the CRM can't.

dbtbigqueryanalyticsdata modeling
Note

Removal Effect in Attribution

The removal effect measures how much conversion probability drops when a channel is removed. It is the mathematical foundation of both Markov chain and Shapley value attribution.

bigqueryanalyticsdata modeling
Note

OpenClaw for dbt Monitoring

Using OpenClaw as an always-on monitoring layer for dbt projects — cron-based testing, Slack alerting, mobile access, and practical use cases for solo consultants.

dbtaiautomationdata quality
Note

FastMCP Server Skeleton

Minimal MCP server examples in Python (FastMCP) and TypeScript (McpServer) — the starting point for any custom server build.

mcpdata engineering
Note

AI Judgment Failures in dbt Development

The category of mistakes AI makes in dbt projects that aren't syntax errors — wrong joins, rebuilt existing assets, wrong layer sourcing — and why they require business context that no prompt can fully provide.

dbtclaude codedata engineeringai
Note

MCP Protocol Architecture

What the Model Context Protocol is, how clients and servers communicate, and why it matters for connecting AI tools to your data infrastructure.

mcpaidata engineering
Note

Claude Code Behind the Scenes

What commands Claude Code actually runs when it explores code, searches for patterns, edits files, and manages git — understanding the mechanics builds confidence and helps you learn.

claude codeai
Note

MCP Data Engineering Servers

The MCP servers that actually matter for data engineering work — Snowflake, BigQuery, ClickHouse, centralmind/gateway, MindsDB, and Confluent.

mcpbigquerysnowflakedata engineering
Note

BigQuery Autoscaling Cost Overhead

Why theoretical slot-hour costs rarely match your actual BigQuery bill — the 1.5x autoscaling multiplier, 60-second billing window, and how workload shape changes everything.

bigquerygcpcost optimization
Note

dbt Operational Slash Commands

Practical Claude Code slash commands for daily dbt operations — building models, generating base models, running modified code, auditing quality, and cleaning up artifacts.

claude codedbtautomationdata engineering
Note

dbt-utils SQL Generators

Reference for dbt-utils SQL generation macros: date_spine, deduplicate, star, union_relations, pivot, unpivot, and the smaller helpers. What each does, how to call it, and the gotchas.

dbtdata engineeringdata modeling
Note

Automating dbt Docs Deployment

Patterns for keeping dbt docs automatically updated — CI/CD workflows, Astronomer Cosmos operators, and tools that push documentation to platforms like Notion.

dbtdata engineeringautomation
Note

AI SQL Review Tradeoffs

The practical costs of AI SQL review — false positive rates, conflicting tool feedback, CI latency, annual spend, and the configuration investment that makes it worthwhile.

dbtdata qualityaitesting
Note

Window Function Patterns for Analytics SQL

Practical window function patterns for analytics SQL — ROW_NUMBER, LEAD/LAG, running totals, session detection, and deduplication.

bigqueryanalyticsdata modeling
Note

GA4 dbt Package Ecosystem

An overview of the major open-source dbt packages for GA4 BigQuery exports — what they optimize for, what they miss, and when to build custom.

ga4dbtbigquerydata modeling
Note

HubSpot Lifecycle Stages in the Warehouse

How HubSpot's lifecycle stage model maps to warehouse columns, why forward-only transitions make funnel analysis straightforward, and how to handle merged contact artifacts.

dbtbigquerydata modelinganalytics
Note

Elementary data quality dashboards

Hub for building data quality dashboards with Elementary: generating reports, hosting them for team access, building custom BI dashboards, and designing KPIs.

elementarydbtdata qualityanalytics
Topic guide

OpenClaw for Freelance Consultants

A reading path through the OpenClaw admin automation use cases for solo consultants — morning briefings, expense capture, personal CRM, and meeting prep.

automationai
Note

Server-Side Cookies and Safari ITP Bypass

How setting cookies via HTTP Set-Cookie header from a same-domain server bypasses Safari's 7-day JavaScript cookie cap — the FPID mechanism, the IP mismatch problem, and the three approaches that solve it.

ga4google adsanalyticsdata quality
Note

Google Ads to BigQuery: Loading Approaches

Four ways to load Google Ads data into BigQuery — a map through the decision landscape.

google adsbigquerydata engineeringetl
Note

Salesforce Unified Activity Timeline

Combining Salesforce Tasks and Events into a single activity timeline with consistent column naming and polymorphic entity resolution.

dbtbigquerydata modelingdata engineering
Note

Orchestrator Architectural Philosophies

The three competing mental models in data orchestration — process-oriented (Airflow), data-oriented (Dagster), and function-oriented (Prefect) — and why the abstraction matters more than the feature list.

dbtdata engineeringautomation
Note

Claude Code Stop and Session Hooks

How Stop and SessionStart hooks complement per-tool hooks — running quality gates after Claude finishes responding and loading project context at session start.

claude codedbtautomationdata quality
Note

Meta Ads Actions Array in BigQuery

How to flatten Meta's nested actions JSON array in BigQuery — unnesting patterns, configurable action type pivots, dbt integration, and the action_values companion field.

bigquerydbtdata engineeringdata modeling
Note

GA4 Traffic Source Fields

The four traffic source locations in GA4 BigQuery exports — their scopes, use cases, and the July 2024 cutoff that changed session attribution.

ga4bigqueryanalyticsdata modeling
Note

AI Developer Skill Atrophy

How AI coding tools affect developer comprehension — Anthropic's RCT, the delegation vs. inquiry distinction, and why how you use AI matters as much as which tools you pick.

claude codeaidata engineering
Note

Alternatives to Default dbt Docs

When to move beyond the default dbt docs frontend — Dagster's Next.js replacement, dbterd for ERDs, data catalogs, and dbt Cloud Catalog.

dbtdata engineering
Note

BigQuery SQL Patterns for Analytics Engineers

A reading guide to essential BigQuery SQL patterns covering query optimization, nested data, window functions, dbt incrementals, and marketing analytics.

bigquerydbtdata modelingdata engineering
Note

Elementary alert routing with filters

How to run multiple edr monitor commands with different filters to route alerts by tag, owner, status, or resource type to different channels and incident management tools.

dbtelementarydata qualityautomation
Note

BigQuery Storage Billing Strategies

Physical vs logical storage billing in BigQuery, long-term storage discounts, table expiration policies, and how to evaluate which billing mode saves money.

bigquerygcpcost optimization
Note

dbt Project Structure: Guide Hub

A hub connecting all notes on structuring a dbt project — layers, naming, materialization, YAML, modern features, and marketing analytics patterns.

dbtdata modelingdata engineering
Note

dbt Test Alert Routing and Ownership

How to route dbt test failures to the right people, configure tiered alert severity, and apply the Broken Window principle to test suite health.

dbttestingdata qualityautomation
Note

BigQuery MCP Server Setup

A reading path through connecting BigQuery to AI assistants via MCP — comparing the two official options, authentication, custom queries, and cost control.

mcpbigquerygcpclaude code
Note

dbt Unit Test Patterns

Hub note connecting all unit test patterns for dbt — incremental models, snapshots, window functions, business logic, marketing analytics, and edge cases.

dbtbigquerytesting
Note

IAM Debt Audit for GCP Data Platforms

Bash and SQL queries to surface Editor roles, service accounts with keys, and shared credentials — the starting point for any GCP IAM cleanup.

gcpbigquerydata engineering
Note

Google OAuth CLI Setup Gotchas

The specific mistakes that cause OAuth setup to fail silently for Google Workspace CLI tools — wrong application type, missing test users, and the scope limit trap.

gcpautomationdata engineering
Note

BigQuery Fine-Grained Access Control

Column-level security with policy tags, row-level security with Row Access Policies, and dynamic data masking — the three layers of fine-grained access control in BigQuery beyond basic IAM roles.

bigquerygcpdata engineering
Note

Feature Engineering for ML in dbt

How to structure dbt intermediate models as ML feature tables — including time-windowed aggregations, domain-separated feature sets, and joining them into a labeled training dataset.

dbtbigquerydata modelingdata engineering
Note

Meta Ads Pipeline Maintenance

Operational practices for keeping a Meta Ads pipeline running — token expiry monitoring, spend reconciliation, API version lifecycle management, and circuit breaker patterns.

bigquerydata engineeringdata quality
Note

Markov Chain Attribution

How Markov chains model customer journeys as state transitions to calculate data-driven attribution through transition probabilities and the removal effect.

bigqueryanalyticsdata modeling
Note

MCP Data Catalog Server Pattern

A practical MCP server pattern for exposing internal data catalogs — table search, metadata retrieval, and lineage tracing as AI-accessible tools.

mcpdata engineering
Note

GTM Server-Side Hosting Costs: Self-Hosted vs Managed

The real cost of running GTM Server-Side — Cloud Run pricing by traffic tier, the Cloud Logging cost trap, and a comparison of managed alternatives (Stape, Addingwell, Cloudflare Zaraz).

ga4gcpanalyticscost optimization
Note

Agentic Workflow Shift in Data Engineering

How agentic AI tools change the data engineering workflow from manual template adaptation to describe-and-review — and why the real shift is from syntax to modeling decisions.

claude codedbtdata engineeringai
Note

GA4 dbt Project Template

Hub connecting all concepts in building a production-ready dbt project for GA4 BigQuery exports — from base model to marts, with testing and documentation.

ga4dbtbigquerydata modeling
Note

Cloud Storage Tiering for BigQuery

How to use Cloud Storage tiers and lifecycle policies alongside BigQuery for cost-effective data lake storage, including Autoclass and physical billing.

bigquerygcpcost optimizationdata engineering
Note

dbt Macro Testing Patterns

Two approaches to testing dbt macros — integration test models and dbt 1.8 unit tests — plus the compile-and-inspect workflow for debugging.

dbttestingdata engineering
Note

Ad Platform Metric Divergence

Why impressions, clicks, and conversions mean different things on Google, Meta, and LinkedIn — and why pretending they're equivalent produces misleading cross-platform reports.

google adsanalyticsdata modeling
Note

Debugging dbt with Claude Code

How to use Claude Code for dbt debugging — letting the agent face errors directly, tracing data issues through upstream models, and using subagents for complex investigations.

dbtclaude codebigquerydata engineering
Note

dlt Pipeline Testing

Testing dlt pipelines locally with DuckDB before hitting production — unit tests with resource limits, integration tests for schema validation, and common debugging patterns.

dltdata engineeringetltesting
Note

HubSpot Property History Mechanics

How HubSpot's property history tables work, their retention limits, why CALCULATED properties inflate sync costs, and how to model history data without surprises.

dbtbigquerydata engineeringetl
Note

Data Contract Adoption Friction

Reducing the friction that kills data contract adoption: SDK-based onboarding, audience-specific messaging, post-mortem data as leverage, and the Data Product Manager role.

dbtdata qualitydata engineering
Note

Dagster-dbt Asset Mapping

How dagster-dbt reads your manifest.json to create one Dagster asset per dbt model, with automatic lineage from ref() calls, and how to customize the mapping with DagsterDbtTranslator.

dbtdata engineeringdata modelingautomation
Note

MCP Server Testing and Debugging

Testing MCP servers with the Inspector, the stderr logging gotcha that bites everyone, and a practical three-stage testing workflow.

mcpdata engineeringtesting
Note

Custom Sessionization Patterns

How to build custom session definitions from raw events using LAG and running sums, with configurable timeouts, campaign-based splits, and session metrics.

bigqueryga4analyticsdata modeling
Note

Dagster+ Pricing and Credit Model

How Dagster+ pricing works — the credit model (1 credit = 1 asset materialization), plan tiers, overage costs, and how it compares to dbt Cloud and Cloud Composer for analytics engineering teams.

data engineeringcost optimization
Note

Orchestrator Comparison for dbt Teams Hub

Hub note for the Dagster vs Airflow vs Prefect comparison — architectural philosophies, dbt integration depth, developer experience, pricing, learning curves, and the decision framework.

dbtdata engineeringautomationcost optimization
Note

dlt Core Concepts

The four building blocks of dlt pipelines — sources, resources, pipelines, and schemas — and the three write dispositions that control how data lands.

dltdata engineeringetl
Note

GA4 Ecommerce Checkout Funnel Pattern

Session-based checkout funnel analysis from GA4 BigQuery data — counting distinct sessions at each funnel stage from view_item through purchase.

ga4bigqueryanalytics
Note

Claude Code CLI Basics

Installation, essential CLI flags, built-in slash commands, and how to read Claude Code's output — the practical starting point for new users.

claude codeai
Note

iOS 14.5 Signal Loss and Meta Measurement

How Apple's App Tracking Transparency changed Meta ad measurement — IDFA collapse, default attribution window changes, Aggregated Event Measurement, and Conversions API as the response.

bigqueryanalyticsdata engineeringdata quality
Note

BigQuery Materialized Views

How BigQuery materialized views precompute aggregations, refresh incrementally, and transparently rewrite queries for automatic optimization.

bigquerycost optimizationdata engineering
Note

Unit Testing String Extraction in dbt

How to unit test regex and string manipulation logic in dbt — edge case documentation, graceful failure handling, and regression protection for fragile parsing.

dbttesting
Note

Multi-Source Conflict Resolution

Three patterns for resolving conflicting data when merging records from multiple source systems — priority-based, recency-based, and source-specific fields.

dbtbigquerydata modelingdata engineering
Note

Ad Data Extraction Tools

Managed ELT, open-source, and native integration options for getting advertising data into your warehouse — Fivetran, Airbyte, dlt, Meltano, and BigQuery Data Transfer Service.

bigqueryetldata engineering
Note

Testing Late-Arriving Data Handling in dbt

How to write dbt unit tests that simulate late arrivals, and how to use audit_helper to detect drift between incremental and full-refresh results in production.

dbtincremental processingdata qualitytesting
Note

LinkedIn Ads dbt Modeling

How to model LinkedIn Ads data in dbt — the campaign hierarchy rename, metric normalization, cross-platform integration via dbt_ad_reporting, and the incremental strategy for 90-day attribution windows.

dbtdata modelingdata engineeringincremental processing
Note

GTM Server-Side: Architecture and Four Building Blocks

How GTM Server-Side works as an intermediary layer — the request/response data flow, and the four component types (Clients, Tags, Triggers, Variables/Transformations) that make it up.

ga4google adsanalyticsdata engineering
Note

Semantic Layer Architecture

How semantic layers work in the modern data stack — competing implementations (MetricFlow, Snowflake Semantic Views, Databricks Metric Views), the OSI initiative, and why the semantic layer determines AI accuracy.

dbtbigquerysnowflakeanalytics
Note

BigQuery Slots and Reservations

A reading guide to BigQuery's compute model: slots, reservations, editions, autoscaling, fair scheduling, and slot management for dbt workflows.

bigquerydbtcost optimization
Note

Cloud Run Jobs for dbt

Why Cloud Run Jobs is the optimal dbt execution environment for most GCP teams — capabilities, container setup, authentication, monitoring, and cost profile.

dbtgcpdata engineeringcost optimization
Note

Layered SQL Review Pipeline for dbt

A four-layer architecture for SQL review in dbt projects — IDE feedback, pre-commit hooks, PR-level AI review, and CI testing — each catching a different class of error.

dbtbigquerysnowflakeclaude code
Note

Google Ads BigQuery Data Transfer Service Setup

How the Google Ads BigQuery Data Transfer Service works — what it gives you, how the schema is organized, MCC vs per-account setup, and the defaults that will hurt you.

google adsbigquerygcpdata engineering
Note

Hosting dbt Docs Beyond Localhost

Deployment options for dbt docs by complexity — GitHub Pages, Netlify, GCS with IAP, S3 with CloudFront, and Docker with Nginx.

dbtdata engineering
Note

Unit Testing Attribution Models in dbt

How to unit test first-touch, last-touch, and multi-touch attribution in dbt — multi-session journeys, single-touch conversions, and the no-conversion exclusion pattern.

dbttestinganalytics
Note

dlt: Python-Native Data Loading

A reading path through dlt's core mechanics — from building blocks through BigQuery-specific loading to incremental state tracking.

dltbigquerydata engineeringetl
Note

dlt Google Ads Pipeline

Building a Google Ads to BigQuery pipeline with dlt — the verified source, GAQL query patterns, incremental loading, and deployment options.

google adsbigquerydltdata engineering
Note

dbt vs Dataform Templating Hub

Navigation hub for notes comparing Jinja (dbt) and JavaScript (Dataform) templating in analytics engineering — syntax, philosophy, strengths, and team fit.

dbtdataformdata engineeringdata modeling
Topic guide

OpenClaw dbt Data Quality Assistant

A reading path through the building blocks of a 24/7 automated dbt data quality assistant — test execution and parsing, severity assessment, documentation cross-referencing, morning summaries, and an honest maturity assessment.

dbtdata qualityautomationai
Note

RAG for dbt Documentation

How retrieval-augmented generation bridges the business context gap in AI-generated dbt documentation — from full RAG pipelines to the simpler CLAUDE.md workaround.

dbtclaude codeaidata quality
Note

BigQuery Partitioning and Clustering

A structured reading path for understanding BigQuery partitioning and clustering: mechanics, decision framework, configuration patterns, and anti-patterns.

bigquerydbtdata engineeringdata modeling
Note

dlt Deployment Options

Where and how to run dlt pipelines in production — GitHub Actions, Airflow, Modal serverless, and other platforms — with the dlt deploy command as the starting point.

dltgcpdata engineeringetl
Note

GA4 Event Ordering with Batch Fields

How to use batch_event_index, batch_ordering_id, and batch_page_id for deterministic event sequencing in GA4 BigQuery exports.

ga4bigqueryanalyticsdata modeling
Note

Service Account Key Files vs Impersonation Tokens

The practical tradeoff between GCP service account key files and short-lived impersonation tokens — when each is appropriate and what the honest security calculus looks like for consultants.

gcpdata engineeringautomation
Note

Dataform-to-dbt Concept Mapping

A reference mapping of Dataform concepts to their dbt equivalents — refs, configs, sources, materializations, testing, and directory structure.

dbtdataformdata engineeringdata modeling
Note

BigQuery Remote MCP Server Setup

Google's managed BigQuery MCP endpoint — enabling the service, configuring Claude Desktop and Claude Code, and why token expiration limits its usefulness.

mcpbigquerygcpai
Note

AI Tools for dbt Documentation

A comparison of dbt Copilot, Claude Code with MCP, and Altimate AI for generating dbt model and column documentation — capabilities, limitations, and selection guidance

dbtclaude codemcpai
Note

BigLake Metastore and Catalog Strategy

Why catalog infrastructure matters more than format choice on GCP, and how BigLake Metastore and Dataplex Universal Catalog provide unified governance across engines and formats.

bigquerygcpdata engineeringdata quality
Note

GA4 BigQuery Export Table Types

The four table types in a GA4 BigQuery export dataset (daily events, intraday events, and the two user tables): their timing, limitations, costs, and when to use each.

ga4bigqueryanalyticsdata engineering
Note

dbt Unit Tests BigQuery Workarounds

BigQuery-specific gotchas for dbt unit tests — STRUCT completeness, ARRAY comparisons, column_transformations, slot costs, and common error solutions.

dbtbigquerytesting
Note

Code Generation over Tool Calling Pattern

The emerging pattern of having LLMs write code against APIs rather than generate tool calls — Cloudflare's Code Mode, Anthropic's code execution, and what it means for MCP's future.

mcpclaude codeai
Note

Lightdash Joins and Fanout Protection

How to define joins between dbt models in Lightdash YAML, why the relationship property matters for metric accuracy, and how Lightdash warns about fanout risk in one-to-many joins.

dbtanalyticsdata modeling
Note

CRM Data Architecture Hub

Hub note connecting all garden notes on modeling Salesforce and HubSpot data in a modern warehouse with dbt and BigQuery.

dbtbigquerydata modelingdata engineering
Note

dbt Package Integration Testing

The integration_tests sub-project pattern for testing dbt packages — using seeds as mock data, comparing outputs to expected results, and running the full suite.

dbttestingdata engineering
Note

dbt-audit-helper CI/CD Integration

How to integrate dbt-audit-helper into CI/CD pipelines — dbt Cloud PR jobs, GitHub Actions with --defer, and automated regression detection.

dbtdata qualitytestingautomation
Note

Looker Studio Caching Mechanics

How Looker Studio's per-chart cache works, why date range selection affects cache hit rates, the difference between owner and viewer credential caches, and how to pre-warm dashboards.

bigqueryanalyticscost optimization
Note

Cloud Run Jobs Deployment Script Pattern

An end-to-end deployment script for dbt on Cloud Run Jobs — service accounts, IAM bindings, Artifact Registry, job creation, and scheduling in a single reproducible script.

dbtgcpdata engineeringautomation
Note

The AI Production Gap in Data Engineering

Why AI gets you to 80% fast but the remaining 20% — security, compliance, temporal consistency, governance — is where most of the real work lives.

dbtaidata engineeringdata quality
Note

dbt Doc Block Jinja Limitations

What you cannot do inside dbt doc blocks — restricted Jinja context, the README parsing gotcha, and the missing column description inheritance feature

dbtdata engineering
Note

Prompting Claude Code for dbt

What separates dbt prompts that work from ones that produce generic output — specificity, codebase references, constraint encoding, and the session-less memory problem.

claude codedbtdata engineeringai
Note

LLM as Content Cleaner

Using a cheap LLM like GPT-4o-mini to strip navigation, CTAs, and HTML noise from scraped markdown — a reliable pattern for web content pipelines.

automationaidata engineering
Note

GA4 Unnesting Patterns Hub

Hub connecting all concepts for extracting data from GA4's nested BigQuery schema — UNNEST approaches, JOIN types, engagement recipes, e-commerce funnels, and dbt architecture.

ga4bigquerydbtdata modeling
Note

dbt Documentation Audience Mismatch

Why most dbt documentation goes unread — the fundamental mismatch between who writes docs (engineers) and who needs them (business users, analysts, and increasingly AI tools)

dbtdata engineeringdata quality
Note

dbt Constraint Enforcement Across Warehouses

How dbt constraint types behave across Postgres, Snowflake, BigQuery, Redshift, and Databricks — which constraints actually reject bad data and which are metadata only.

dbtdata qualitydata modeling
Note

dlt RESTClient vs REST API Source

The two approaches dlt offers for building custom API pipelines — imperative RESTClient and declarative REST API Source — and how to choose between them.

dltdata engineeringetl
Note

Reverse ETL Patterns for CRM Activation

How to push warehouse-computed scores and attributes back into Salesforce or HubSpot using reverse ETL tools — sync architecture, field mapping, sync frequency, and downstream automation.

dbtbigquerydata engineeringanalytics
Note

Semantic Layer Adoption Readiness

When to invest in a semantic layer, what barriers you'll face, and how to start small — a practical readiness assessment based on team size, tooling maturity, and organizational commitment.

dbtsnowflakedatabricksdata modeling
Note

BigQuery Reservation Hierarchy

The three layers of BigQuery's capacity model (commitments, reservations, and assignments) and how they work together to manage slot allocation.

bigquerygcpcost optimization
Note

Custom MCP Servers for Data Engineering

A reading path through building custom MCP servers — from decision criteria and SDK selection through tool design, testing, and practical server patterns for data catalogs, pipelines, and quality.

mcpdata engineeringdata quality
Note

Metric Organization in dbt Projects

How to organize semantic models and metrics in dbt — co-located vs parallel subfolder structures, the one-primary-entity rule, and scaling patterns for large projects

dbtdata modelinganalytics
Note

MCP JSON-RPC Wire Format

The actual message format MCP uses under the hood — initialization handshake, capability negotiation, tool discovery, and tool invocation — with examples for debugging.

mcpdata engineeringai
Note

dbt as the Center of Gravity for BI

Why dbt has become the foundation layer that BI tools read from — not a parallel concern — and how the Fivetran merger accelerates this shift

dbtbigquerysnowflakeanalytics
Note

Dagster Asset Checks from dbt Tests

How Dagster has automatically converted dbt tests into asset checks since version 1.7: severity mapping, health badges, and what this means for unified data quality monitoring.

dbtdata qualitytestingautomation
Note

Meta Ads Attribution Windows

How Meta's attribution windows work, the June 2025 on-Meta/off-Meta split, which windows survived the January 2026 deprecation, and what this means for warehouse data.

bigqueryanalyticsdata engineeringetl
Note

Business Cost of Poor Data Quality

The measurable financial and operational impact of data quality failures — industry statistics, high-profile incidents, and why prevention costs a fraction of remediation.

data qualitydata engineering
Note

dbt as AI Knowledge Base

How a well-structured dbt project functions as a shared context layer that improves every AI tool in your stack — models, tests, documentation, and semantic definitions as machine-readable knowledge.

dbtclaude codemcpdata engineering
Note

Looker Studio: Extract vs. Live Connection

When to use Looker Studio's extract mode versus live BigQuery connections, the 100 MB limit that catches teams off guard, and how to combine both in the same report.

bigqueryanalyticscost optimization
Note

Why a dbt Documentation Style Guide Matters More Than Effort

The case for writing a documentation style guide for your dbt project — why inconsistency is the root problem, not effort, and how style guides serve both humans and AI tools

dbtdata qualityai
Note

Data Architecture as Human Judgment

Why data architecture — DAG design, ownership models, temporal logic, team boundaries — resists AI automation and remains a fundamentally human discipline.

dbtdata engineeringdata modeling
Note

Hybrid ELT Strategy

When to buy managed ELT, when to build with dlt + AI, and the practical migration path — a decision framework for splitting your pipeline portfolio strategically

dltbigquerydata engineeringetl
Note

AI SQL Review Tools

A reference of tools that apply AI to SQL and dbt code review — Altimate AI, Greptile, CodeRabbit, and MotherDuck FixIt — with benchmarks and differentiators

dbtbigquerysnowflakedata quality
Note

dbt Docs Site Customization Options

What you can customize in the default dbt docs site — the overview page, DAG node colors, hiding models — and where the customization options end

dbtdata engineering
Note

dbt Cloud Managed Platform

What dbt Cloud provides beyond Core: web IDE, job scheduling, collaboration tools, managed infrastructure, and the pricing model that shapes adoption decisions.

dbtdata engineeringautomation
Note

dbt MCP Server Setup

A reading path through connecting dbt to AI assistants via MCP — choosing between local and remote modes, tool capabilities, configuration, and safety.

mcpdbtclaude codeai
Note

dbt Groups and Access Modifiers

How dbt groups and access modifiers (private, protected, public) organize model ownership and enforce boundaries — and why they're worth using even in single projects.

dbtdata engineeringdata modeling
Note

Consent Mode Common Implementation Failures

The ten most frequent Consent Mode implementation mistakes, ordered by prevalence and damage — from missing defaults to untested consent states.

ga4google adsanalyticsdata quality
Note

Identity Resolution for Customer 360

How to link CRM contact records to GA4 cookie identifiers in BigQuery — the three join key strategies, deterministic vs probabilistic matching, and open-source tooling.

ga4bigquerydbtdata modeling
Note

dbt Unit Test YAML Syntax

Complete reference for dbt unit test YAML structure — required elements, input formats (dict, csv, sql), optional configuration, and version-specific features.

dbttesting
Note

dbt Docs Performance at Scale

Why the default dbt docs site becomes unusable for large projects — the AngularJS frontend, client-side JSON parsing, and the performance ceiling that drives teams to alternatives

dbtdata engineering
Note

CRM Data Extraction Challenges

Why CRM data is harder to warehouse than most sources — mutability, API-based extraction, soft deletes, formula field blind spots, and rate limits.

dbtbigquerydata engineeringetl
Note

dbt Macros

How dbt macros work — Jinja fundamentals, writing custom macros, using dbt_utils, dispatch patterns, and when macros help vs hurt

dbtdata engineeringdata modeling
Note

dbt deps and the Package Lock File

How dbt resolves and installs packages — the difference between packages.yml and dependencies.yml, how the lock file works, and the flags worth knowing.

dbtdata engineering
Note

MetricFlow Setup Hub

Hub note connecting garden notes extracted from the MetricFlow getting started tutorial: installation, semantic model components, time spine, metric types, CLI querying, and organization.

dbtdata modelinganalytics
Note

ML Anomaly Detection vs Statistical Methods

When ML-powered anomaly detection earns its cost over simpler Z-score approaches — and why the answer depends on data complexity, not marketing materials.

dbtelementarydata qualitydata engineering
Note

GCP Application Default Credentials

The difference between gcloud auth login and Application Default Credentials — why they exist, how they work, and why ADC is what MCP servers and SDKs actually use.

gcpbigquerydata engineering
Note

Claude Code ROI for Analytics Engineers

Realistic time-to-value for Claude Code in a dbt workflow — what setup actually costs, when consistent savings emerge, and the qualitative benefit of tasks that finally get done.

claude codedbtdata engineeringai
Note

Fivetran dbt Packages Architecture

How Fivetran structures its 60+ dbt packages — the unified source-plus-transform model, cross-platform reporting bundles, and the installation pattern that avoids version conflicts.

dbtdata engineeringdata modeling
Note

BigQuery Architecture for Analytics Engineers

How BigQuery works under the hood — columnar storage, slots, the separation of compute and storage — and why it matters for your queries and costs.

bigquerygcpanalyticscost optimization
Note

dbt Repository Structure for Cloud Function Deployment

How to restructure a dbt project repository for Cloud Function deployment — the subdirectory pattern, main.py, requirements.txt, and profiles.yml with oauth.

dbtgcpdata engineeringautomation
Note

dbt Single Responsibility Macros

Why dbt macros should do one thing, how to recognize when they've outgrown their scope, and the composition pattern for building complex transformations from focused pieces.

dbtdata modelingdata engineering
Note

GA4 User-Provided Data BigQuery Trap

Enabling User-provided data in GA4 admin permanently disables user_id export to BigQuery with no reversal option — what this means and how to protect your pipelines.

ga4bigqueryanalyticsdata quality
Note

dbt Migration Validation Patterns

How to validate a dbt migration — parallel execution, comparison queries, ML regression testing, and the practical approach to proving equivalence.

dbtbigquerydata engineeringdata quality
Note

GTM Server-Side: Ten Implementation Failures and How to Avoid Them

The ten most common GTM Server-Side implementation mistakes — from missing custom domains and silent trigger failures to Cloud Logging cost surprises and Safari IP mismatch — with diagnostic guidance for each.

ga4google adsanalyticsdata quality
Note

Agent-First CLI Design Principles

Seven principles for building CLIs that AI agents can consume reliably — from Justin Poehnelt's design of the Google Workspace CLI, with implications for any tool targeting agent consumers.

mcpclaude codeaiautomation
Note

on_schema_change in dbt Incremental Models

How dbt handles column additions and removals in incremental models, the four on_schema_change options, and why none of them backfill historical data.

dbtincremental processingdata modeling
Note

Salesforce Person Accounts and Multi-Currency in the Warehouse

Two Salesforce data model quirks that break standard warehouse patterns — Person Accounts that merge Account and Contact, and multi-currency orgs that require exchange rate conversion in dbt.

dbtbigquerydata modelingdata engineering
Note

BigQuery Editions

The three BigQuery Editions tiers (Standard, Enterprise, and Enterprise Plus): what each offers, their limits, and how they compare to on-demand pricing.

bigquerygcpcost optimization
Note

Claude Code Model Selection for Analytics Work

When to use Sonnet vs Opus in Claude Code for analytics engineering — daily work defaults, complex problem escalation, and practical cost-speed tradeoffs

claude codeaidata engineering
Note

Data Quality KPIs from Elementary

Five data quality KPIs you can build from Elementary's warehouse tables, how to interpret them, and how they map to standard data quality dimensions.

elementarydbtdata qualityanalytics
Note

BigQuery Slots

What BigQuery slots are, how queries use them, what happens during slot contention, and the two ways to get slots.

bigquerycost optimization
Note

dbt Materialization Default: Tables Everywhere

Why materializing every dbt model as a table by default — not views, not ephemeral — produces more debuggable, stable, and maintainable projects.

dbtdata engineeringdata modelingcost optimization
Note

SQL Attribution Patterns

SQL implementation patterns for marketing attribution — first-touch, last-touch, linear, position-based, time-decay, and algorithmic models

bigqueryanalyticsdata modeling
Note

Salesforce Polymorphic Relationship Resolution

How to resolve Salesforce's WhoId and WhatId polymorphic foreign keys in the warehouse using ID prefix routing — the pattern, the SQL, and where it recurs.

dbtbigquerydata modelingdata engineering
Note

BigQuery Table Types

Native BigQuery tables, BigLake external tables, and BigLake Iceberg tables — what each optimizes for, when to use them, and a decision framework for choosing.

bigquerygcpdata engineeringdata modeling
Note

Consent Mode Basic vs Advanced

How Basic and Advanced Consent Mode differ in tag behavior, cookieless pings, and conversion modeling — and the traffic thresholds that determine whether Advanced mode actually helps.

ga4google adsanalyticsdata quality
Note

MCP Apps vs Traditional BI

When to use MCP Apps for data visualization versus a dedicated BI tool — the honest comparison, what each does well, and the hybrid architecture that makes sense for most teams.

mcpclaude codeaianalytics
Note

Salesforce to BigQuery Pipeline

Hub note for the Salesforce-to-BigQuery pipeline — from ingestion tool selection through polymorphic resolution, stage tracking, account hierarchies, and activity timelines.

dbtbigquerydata modelingdata engineering
Note

MCP Resources and Prompts

Beyond tools — using MCP resources for read-only data exposure, prompts for reusable templates, and the Context object for progress reporting in long-running operations.

mcpdata engineering
Note

GA4 Events Sessionized Model

The implementation of the wide event-grain intermediate model for GA4 — the CTE structure, window function patterns, and design decisions that make downstream analysis flexible.

ga4dbtbigquerydata modeling
Note

Self-Hosting Lightdash with Docker Compose

How to run Lightdash with Docker Compose — required services, environment variables, known gotchas, and what to expect in small-team production deployments.

dbtanalyticsdata modeling
Note

ELT Connector Quality and Coverage Comparison

How Fivetran, Airbyte, and dlt differ in connector count, quality tiers, and their approaches to handling sources that don't have pre-built connectors.

dltdata engineeringetl
Note

RSS Feed Deduplication in n8n

How to prevent duplicate Notion pages when polling RSS feeds in n8n, using a Merge node configured as a left anti-join.

automationdata engineering
Note

dbt-audit-helper Macro Reference

Reference for every dbt-audit-helper macro — parameters, output format, platform support, and practical usage notes.

dbtdata qualitytesting
Note

Elementary Slack and Teams Integration

How to connect Elementary alerts to Slack (token-based and webhook) and Microsoft Teams, including the tradeoffs between integration methods.

elementarydata qualityautomation
Note

Probabilistic Matching Limitations in GA4

Why probabilistic identity matching fails with GA4's BigQuery export — the signals GA4 intentionally excludes, what coarse data remains, and the compounding cost of false positives.

ga4bigqueryanalyticsdata quality
Note

Dagster Freshness Policies and Scheduling

How Dagster tracks asset freshness rather than just execution timestamps, and how to schedule dbt runs using cron schedules, sensors, and automation conditions.

dbtdata engineeringautomation
Note

dbt Package Installation Types

The three ways to install dbt packages — Hub, Git, and local — and how to choose between them. Includes version conflict patterns and best practices for your root packages.yml.

dbtdata engineering
Note

Orchestrator Developer Experience Comparison

Local development, testing patterns, and CI/CD workflows across Dagster, Airflow, and Prefect — where the day-to-day friction lives.

dbtdata engineeringtestingautomation
Note

dbt Microbatch Strategy Tradeoffs

The practical limitations and design tradeoffs of dbt's microbatch incremental strategy — UTC assumptions, no sub-hourly batches, and sequential execution.

dbtbigquerysnowflakedatabricks
Note

Custom dbt Materializations

Hub note for custom dbt materializations — anatomy, decision framework, zero-downtime swap, secured table, and debugging patterns.

dbtbigquerydata engineeringdata modeling
Note

dbt Integration Depth Across Orchestrators

How dagster-dbt, astronomer-cosmos, and prefect-dbt differ in integration depth — from first-class asset mapping to operational wrappers — and what that means when something breaks.

dbtdata engineeringautomation
Note

BigQuery Clustering Mechanics

How BigQuery clustering sorts data within storage blocks, why column order is critical, and how automatic re-clustering works at no cost.

bigquerydata engineeringdata modeling
Note

Salesforce Record Type Partitioning in dbt

How to handle Salesforce RecordTypeId in the warehouse — filtering by record type in base models, splitting objects into separate models, and storing IDs in dbt vars.

dbtbigquerydata modelingdata engineering
Note

Building Custom API Pipelines with dlt

A map of the concepts and patterns involved in building production API pipelines with dlt — from choosing an approach through deployment.

dltbigquerydata engineeringetl
Note

Cursor for dbt Development

How Cursor works as the IDE layer for dbt projects — strengths with dbt Power User, limitations for multi-file work, and where it fits alongside Claude Code

dbtaiautomation
Note

Building dlt Pipelines: From First Run to Incremental Loading

A reading path through the concepts in the hands-on dlt tutorial — environment setup, REST API Source config, dependent resources, and incremental loading.

dltdata engineeringetlincremental processing
Note

MCP Apps Protocol Internals

How MCP Apps extend the Model Context Protocol to render interactive HTML interfaces inside AI clients — the ui:// resource mechanism, iframe sandboxing, and bidirectional JSON-RPC communication.

mcpclaude codeaidata engineering
Note

GA4 E-commerce Schema in BigQuery

The ecommerce RECORD and items REPEATED RECORD in GA4's BigQuery export — field reference, nested item_params, and query patterns for purchase analysis.

ga4bigqueryanalyticsdata engineering
Note

Snowflake Cost Monitoring with Warehouse History

SQL patterns for Snowflake cost monitoring using QUERY_HISTORY and WAREHOUSE_METERING_HISTORY — daily cost summaries, per-warehouse breakdowns, and translating credits into dollars for non-technical stakeholders.

snowflakecost optimizationdata engineering
Note

Terminal Cross-Platform Setup

How to set up and use the terminal on macOS, Linux, and Windows — including WSL, Git Bash, and PowerShell options with a command equivalence table

claude codeai
Note

Idempotent Incremental Models in dbt

How to build dbt incremental models that produce identical results regardless of how many times they run, using pre-deduplication and proper unique_key design.

dbtincremental processingdata quality
Topic guide

Lead Scoring in the Warehouse

Hub note for warehouse-native lead scoring — from rule-based weighted models in dbt to BigQuery ML classification, feature engineering, and reverse ETL back to the CRM.

dbtbigqueryanalyticsdata modeling
Note

BigQuery Column-Level Security with Policy Tags

Replace view-based column hiding with Data Catalog policy tags — storage-layer security that survives schema changes and doesn't require view maintenance.

bigquerygcpdata engineering
Note

Customer 360 dbt DAG Architecture

How to structure a dbt project for Customer 360 models — the identity resolution layer between base and mart, the wide customer table, and materialization choices.

dbtbigqueryga4data modeling
Note

Codebase Refactoring with Claude Code

How Claude Code enables project-wide dbt refactoring — column renames, naming convention migrations, and ref() updates across dozens of files without the manual search-and-miss problem.

claude codedbtdata engineeringai
Note

Dagster + dbt Integration Hub

Hub note for the dagster-dbt integration — how the mapping works, quality checks, freshness monitoring, CI/CD workflows, and the case for choosing Dagster over dbt Cloud.

dbtdata engineeringautomation
Note

dbt Incremental Strategy Warehouse Behaviors

How dbt incremental strategies behave differently on BigQuery, Snowflake, and Databricks — the platform-specific quirks, gotchas, and limitations that the documentation doesn't emphasize enough.

dbtbigquerysnowflakedatabricks
Note

Identity Resolution for Ad Measurement

How Enhanced Conversions, Unified ID 2.0, and data clean rooms recover attribution signal after cookies fail — what each approach does, what it requires, and realistic uplift estimates.

ga4google adsanalyticsdata quality
Note

OpenClaw Ecosystem and Community

The community and ecosystem around OpenClaw — ClawHub, ClawData, the viral growth story, the naming history, and what the ecosystem state means for adoption decisions.

aiautomation
Note

dbt Package Development Hub

A hub connecting all notes on building, testing, and publishing dbt packages — from project anatomy to CI/CD to Hub distribution.

dbtdata engineeringdata modeling
Note

Cloud Composer Cost and Capabilities

Cloud Composer 3's pricing model, committed use discounts, and the specific scenarios where its orchestration capabilities justify the $300-400/month minimum.

gcpdbtdata engineeringcost optimization
Note

MetricFlow Installation and Setup

Installing MetricFlow for dbt Core with adapter-specific packages, the dbt Cloud alternative, and the initial project configuration steps needed before defining semantic models.

dbtdata engineeringanalytics
Note

OpenClaw Skills for Monitoring

How to write OpenClaw skill files for data pipeline monitoring — structuring SKILL.md instructions, categorizing failure types, formatting output for Slack, and adding context that makes alerts actionable.

dbtautomationdata qualityai
Note

dbt Weighted Attribution Models

Implementing position-based and time-decay attribution in dbt with configurable weights via dbt variables — model SQL, project configuration, and revenue integrity testing

dbtbigquerydata modelinganalytics
Note

Attribution Channel Grouping Strategy

How to group marketing channels for data-driven attribution: balancing granularity against data sparsity to produce stable, actionable model results

bigqueryanalyticsdata modeling
Note

AI Agent Regulatory Exposure for Data Teams

Why running AI agents against client data creates contractual and regulatory exposure for data teams — GDPR, data processing agreements, the open-source liability argument, and what the Dutch DPA warning actually means.

aidata engineeringanalytics
Note

Custom Parameterized MCP Queries

Using the MCP Toolbox's tools.yaml to define constrained, parameterized queries that give AI assistants structured access to data without arbitrary SQL.

mcpbigqueryaidata engineering
Note

dbt Identity Resolution Pipeline

Production dbt DAG structure for GA4 identity resolution — the incremental identity mapping model, stitched events model, schema tests, and the 3-day lookback window for late-arriving data.

ga4bigquerydbtdata modeling
Note

BigQuery Pricing Policy Changes 2024–2025

Three BigQuery policy changes that affect cost modeling in 2024–2025: the flat-rate deprecation, the 200 TiB daily on-demand quota, and new Cloud Storage fees for external tables.

bigquerygcpcost optimization
Note

MCP Setup Troubleshooting

Common failure modes when setting up MCP servers — macOS PATH problems, silent JSON config failures, tool count limits, and where to find debug logs.

mcpclaude codeaidata engineering
Note

MetricFlow Advanced Patterns

Complex metric patterns in MetricFlow — period-over-period comparisons with offset_window, filtered metrics with Jinja, and handling null gaps in time series

dbtdata modelinganalytics
Note

dbt Schema Validation and Data Products Hub

Hub connecting notes on dbt's three validation mechanisms, source schema gaps, the Mesh governance triad, and contract-first development.

dbtdata qualitydata modelingdata engineering
Note

TDD with Claude Code for dbt

How test-driven development works with Claude Code for dbt models — write tests first, let the agent iterate to pass them, then refactor with confidence

dbtclaude codetestingai
Note

Security Posture for AI Agents

How to scope permissions, isolate environments, and treat always-on AI agents like OpenClaw as untrusted actors — practical security practices for data teams

dbtaidata quality
Note

BigQuery BI Engine

How BigQuery BI Engine provides in-memory acceleration for dashboard queries, what it supports, what it silently skips, and how to verify it's actually working.

bigquerygcpcost optimizationanalytics
Note

Dagster Resources

How Dagster resources work as centrally configured, injectable external connections — BigQueryResource, DbtCliResource, and the pattern for swapping environments without changing asset code.

bigquerydata engineeringautomation
Note

dbt MCP Server Tool Reference

Complete reference for the 20+ tools exposed by the dbt MCP server — CLI commands, metadata discovery, Semantic Layer queries, and job management.

mcpdbtaidata engineering
Note

Silent SQL Errors in AI-Generated Code

Why AI-generated SQL that compiles and runs is more dangerous than SQL that fails — the 3% warning rate, temporal filter inconsistencies, and the review practices that catch what linters miss

dbtbigquerysnowflakeai
Note

Signals That Your Cron-Based dbt Setup Has Outgrown Itself

Five concrete indicators that a simple cron-scheduled dbt job has hit its limits — and what each one tells you about the orchestration capability you actually need.

dbtgcpdata engineeringautomation
Note

dbt Production Safety Hooks

Using Claude Code PreToolUse hooks to block dangerous dbt commands before they execute — full-refresh on production, unscoped builds, and other high-risk operations

claude codedbtautomationdata quality
Note

Context Engineering for Data Pipelines

How the value in data engineering is shifting from writing code to structuring context — the emerging discipline of context engineering, the ETL-to-ECL reframe, and the skills pipeline risk.

dbtaidata engineering
Note

dbt-utils Introspective Macros

How dbt-utils compile-time introspection macros work — get_column_values, get_relations_by_pattern, get_query_results_as_dict, and get_single_value — and when they cause problems.

dbtdata engineeringdata modeling
Note

Google DDA Silent Fallback

GA4's Data-Driven Attribution silently falls back to last-click when data thresholds aren't met: how to detect it and why warehouse-native attribution avoids this trap

ga4analytics
Note

GA4 Ecommerce Items UNNEST Pattern

How to handle GA4's nested items array in dbt — building a separate item-level grain model with intentional Cartesian UNNEST.

ga4dbtbigquerydata modeling
Note

dbt Incremental Strategy Configuration Patterns

Complete, runnable dbt config blocks for each incremental strategy — merge with predicates, delete+insert on Snowflake, insert_overwrite with static partitions, and replace_where on Databricks.

dbtbigquerysnowflakedatabricks
Note

The Context Gap in AI Data Engineering

Why business context — what 'Status' means, whether 'Amount' is net or gross, tacit SAP knowledge — is the core limitation of AI in data engineering.

dbtaidata modelingdata quality
Note

Open Data Contract Standard

ODCS v3.1.0 under the Linux Foundation's Bitol project — what it covers, how it compares to the Data Contract Specification, and where harmonization stands.

dbtdata qualitydata engineering
Note

dbt Features Without a Dataform Equivalent

The dbt capabilities that simply don't exist in Dataform — snapshots, the package ecosystem, microbatch incremental strategy, and Slim CI. These are the blockers that stall dbt-to-Dataform migrations.

dbtdataformbigquerydata engineering
Note

Incrementality Testing for Attribution

How to validate attribution models with causal experiments — holdout tests, geo tests, and platform lift studies that measure whether a channel actually drives conversions

ga4analytics
Note

dbt-expectations row_condition Pattern

How the row_condition parameter in dbt-expectations enables conditional test filtering — applying tests to specific segments without custom SQL.

dbtdata qualitytesting
Note

Expense Capture as a Habit Layer

Using natural language logging and receipt OCR to close the gap between 'I spent money' and 'that expense is recorded somewhere useful' — why capture is the real problem, not the accounting.

automationai
Note

GA4 user_id Data Quality

Common implementation bugs that corrupt GA4 user_id data — string 'null' values, logout tagging errors, suspicious high-cardinality IDs — and the SQL patterns to detect and filter them.

ga4bigqueryanalyticsdata quality
Note

Agent Dashboard Scraping: The Fragility Problem

How browser automation works for dashboards without APIs, the five-step scraping loop, session management patterns, and why silent failure is the central limitation that makes this a fallback of last resort.

automationanalyticsai
Note

Data Quality Validation Layers

The three-layer model for data quality — proactive contracts, reactive schema tests, and anomaly detection — and why you need all three.

dbtdata qualitydata engineering
Note

Unit Tests vs Data Tests in dbt

The two-checkpoint model for dbt testing: unit tests gate deployments by verifying transformation logic; data tests gate production by verifying data health.

dbttestingdata quality
Note

dbt Testing Strategy by Layer

What to test at each layer of the dbt DAG — sources, base, intermediate, and mart — and why testing intensity should increase toward the edges.

dbttestingdata qualitydata modeling
Note

Unit Testing Snapshot Consumers in dbt

Three strategies for testing snapshot-related logic — pre-snapshot base models, SCD2 date range calculations in downstream models, and change detection hashing.

dbttestingdata modeling
Note

GA4 Flattened Events Materialization

When and how to pre-unnest GA4 events into a flat table — the cost-performance tradeoff, the CREATE TABLE pattern, and why dbt models formalize this approach.

ga4bigquerydata engineeringcost optimization
Note

MetricFlow semantic model components

The three building blocks of a MetricFlow semantic model: entities (join keys), dimensions (group-by columns), and measures (numeric aggregations that feed metrics).

dbtdata modelinganalytics
Note

Secured Table Materialization in dbt

A custom dbt materialization that automatically reapplies BigQuery row access policies, column descriptions, and data masking tags after every table rebuild.

dbtbigquerydata engineeringdata quality
Note

Organizing dbt Unit Tests at Scale

Tag strategies, CI pipeline tiers, and selection patterns for managing hundreds of dbt unit tests across a growing project.

dbttesting
Note

Elementary dashboard organization

How to organize Elementary dashboards and reports by domain, criticality, and refresh cadence so they stay useful as your project grows.

elementarydbtdata qualitydata engineering
Note

Self-healing pipeline maturity spectrum

Five levels of self-healing capability in data pipelines, from basic retries to fully agentic systems, and where production value actually concentrates.

data engineeringautomationai
Note

Salesforce Account Hierarchy with Recursive CTEs

How to resolve Salesforce's self-referential ParentAccountId into a flattened hierarchy using recursive CTEs in BigQuery — the SQL pattern, ultimate parent resolution, and revenue rollup.

dbtbigquerydata modelingdata engineering
Note

BigQuery Partitioning vs Clustering Decision Framework

A practical decision framework for choosing between BigQuery partitioning, clustering, or both based on table size, query patterns, and operational needs.

bigquerydata engineeringdata modelingcost optimization
Note

dbt MCP Server: Local vs Remote

The two deployment modes for dbt's MCP server: local gives full CLI access and works without dbt Cloud; remote exposes read-only metadata and requires a Cloud plan.

mcpdbtaidata engineering
Note

Jinja Templating for SQL Practitioners

Why Jinja feels natural to SQL-first analytics engineers — the double-brace model, macros as SQL helpers, and the separation of concerns that keeps transformation files focused.

dbtdata engineeringdata modeling
Note

Dataform-to-dbt Migration

Migration paths between Dataform and dbt — tooling, realistic timelines by project size, and why macro conversion is where migrations get painful

dataformdbtbigquerydata engineering
Note

dbt MCP Server Setup and Configuration

Step-by-step installation and configuration of the dbt MCP server — uv, environment variables, feature toggles, and client setup for Claude Code and Claude Desktop.

mcpdbtclaude codeai
Note

Data team on-call strategies

How data teams structure on-call rotations, triage processes, and runbooks differently from software engineering on-call, and which metrics reveal whether the system is working.

data qualitydata engineering
Note

dbt-expectations Setup and Configuration

How to install and configure dbt-expectations — packages.yml, timezone variable, platform compatibility, and dependency management.

dbtdata qualitytesting
Note

Attribution Analysis

A structured guide to marketing attribution — from SQL implementation patterns through multi-model comparison, dashboard design, and incrementality testing

bigqueryga4dbtanalytics
Note

Google Ads BigQuery Data Transfer Service

Hub note for the free Google Ads → BigQuery pipeline — setup, schema quirks, known data gaps, and dbt modeling patterns.

google adsbigquerygcpdata engineering
Note

Dagster Software-Defined Assets

The core building block of Dagster — how @dg.asset works, automatic dependency inference, the Definitions object, and how SDAs differ from traditional orchestrator primitives.

data engineeringautomation
Note

dbt Unit Testing Implementation

Hub note for implementing dbt unit tests — from YAML syntax and mocking patterns to BigQuery workarounds and CI/CD integration.

dbtbigquerytesting
Note

Dagster UI for Analytics Engineers

A walkthrough of Dagster's web UI — the Asset Catalog, Global Asset Lineage, Run Details, health indicators, and the Dagster+ Pro features that matter most for analytics engineers on dbt + BigQuery.

dbtbigquerydata engineeringautomation
Note

Workload Identity Federation for CI/CD

Replace service account keys in GitHub Actions and other CI systems with keyless OIDC authentication — no credentials to store, rotate, or leak.

gcpdata engineeringautomation
Note

dbt MCP Server Safety Considerations

The risks of giving an AI assistant dbt CLI access — production data modification, credential scope, Copilot credit consumption, and practical mitigations.

mcpdbtclaude codeai
Note

dbt persist_docs for Warehouse Comments

How persist_docs pushes dbt descriptions directly to your data warehouse as table and column comments, making documentation available where analysts already work

dbtdata modelingdata engineering
Note

dbt profiles.yml with env_var for Multi-Client GCP

Using env_var() interpolation in profiles.yml so dbt reads GCP credentials and project from environment variables — enabling seamless client switching via direnv.

dbtgcpbigquerydata engineering
Note

GA4 BigQuery Query Patterns

Efficient querying of GA4 date-sharded tables — _TABLE_SUFFIX filtering, inline vs FROM clause UNNEST, reusable dbt macros, and cost control practices.

ga4bigqueryanalyticsdata engineering
Note

GA4 User Mart Pattern

Building a user-grain mart from GA4 session data — first/last touch attribution, lifetime value aggregation, and identity stitching with user_pseudo_id and user_id.

ga4dbtbigquerydata modeling
Note

Consent Mode US Privacy Requirements

Why US-only sites increasingly need Consent Mode — Enhanced Conversions requirements, expanding state privacy laws, and the recommended region-specific configuration.

ga4google adsanalyticsdata quality
Note

BigQuery IAM Patterns

Least-privilege IAM for BigQuery — predefined roles, the data vs. compute permission split, service account strategy, and common anti-patterns.

bigquerygcpdata engineering
Note

Visualization MCP Server Ecosystem

The available MCP servers for generating charts and interactive visualizations — AntV, Vega-Lite, DuckDB-Plotly, and how to pick between them.

mcpclaude codeaianalytics
Note

Markdown-to-Notion Blocks Parser

How to convert markdown to Notion's block API format in JavaScript, including handling rich_text objects, the 2000-character limit, and the 100-block request cap.

automationdata engineering
Note

Consent Mode Implementation Mechanics

The technical implementation of Consent Mode v2: default state configuration, CMP integration, GTM trigger ordering, and the wait_for_update race condition.

ga4google adsanalyticsdata quality
Note

Agent Skill Supply Chain Attacks

How malicious skills in agent ecosystems like ClawHub bypass traditional antivirus detection, why natural-language malware is a fundamentally different threat class, and how to evaluate skills before installing them.

aiautomationdata engineering
Note

MCP Ecosystem Governance

How MCP became a vendor-neutral open standard — the Linux Foundation donation, corporate adoption, and what broad industry support means in practice.

mcpaidata engineering
Note

MCP Pipeline Monitoring Server Pattern

A practical MCP server pattern for pipeline monitoring — checking job status, listing failures, and triggering reruns across orchestrators like Airflow and Dagster.

mcpdata engineering
Note

Orchestration Market Landscape in 2026

Where each major data orchestrator stands in 2026 — Airflow's scale, Dagster's dbt dominance, Prefect's developer velocity, Kestra's rapid rise, and the tools in decline.

dbtbigquerygcpdata engineering
Note

Kestra Declarative Orchestration

Kestra's YAML-first orchestration model — how it differs from Python-decorator tools, its rapid growth, enterprise adoption, and why production evidence at small-to-mid scale is still thin.

dbtdata engineeringautomation
Note

dbt documentation drift detection

Techniques for detecting when dbt documentation falls out of sync with reality — column-level drift, git-based staleness signals, and schema drift for sources

dbtdata qualityautomation
Note

dbt Test Output Parsing for Automated Monitoring

How to extract structured, actionable information from dbt test output — distinguishing failure types, capturing sample rows, and handling partial runs so automated monitoring doesn't miss anything.

dbtdata qualityautomationai
Note

BigQuery Editions Migration Anti-Patterns

Five mistakes teams make when migrating from BigQuery on-demand to Editions — and how to avoid them.

bigquerygcpcost optimizationdata engineering
Note

dbt Validation Mechanisms Compared

How dbt contracts, data tests, and dbt-expectations differ in when they fire, what they cover, and what they cost — and why you need all three.

dbtdata qualitytesting
Note

CLAUDE.md BigQuery Specifics

What to put in CLAUDE.md when your dbt project runs on BigQuery — GoogleSQL dialect enforcement, partition filter requirements, and incremental model config templates.

claude codedbtbigqueryai
Note

GTM Server-Side Hosting on Azure

How to host the GTM Server-Side tagging container on Azure using App Service or Container Apps, with pricing tiers and SSL configuration notes.

gtmanalyticscost optimization
Note

Dataform as a GCP Service

What Dataform is in 2026 — a fully managed BigQuery transformation service with deep GCP integration, zero licensing cost, and SQLX/JavaScript templating

dataformbigquerygcpdata engineering
Note

GTM Server-Side on Cloud Run: Deployment and Configuration

How to deploy GTM Server-Side on Google Cloud Run — automatic vs manual provisioning, production configuration settings, custom domain setup, and multi-region architecture for global traffic.

ga4gcpanalyticsdata engineering
Note

JavaScript vs Jinja in Analytics Engineering

The philosophical and practical differences between Dataform's JavaScript templating and dbt's Jinja2 — where they diverge, what each excels at, and how to convert between them.

dbtdataformdata engineeringdata modeling
Note

Metrics as Code

The practice of defining business metrics in version-controlled YAML — reviewed in pull requests, tested in CI/CD, and consumed by BI tools and AI agents

dbtanalyticsdata modelingdata quality
Note

LinkedIn Ads OAuth Token Management

LinkedIn's OAuth token expiration model for the Marketing API — 60-day access tokens, 365-day refresh tokens, forced annual re-authentication, and operational strategies for custom pipelines.

data engineeringetl
Note

dlt Pagination Patterns

The built-in paginators dlt provides for common API patterns, and how to extend BasePaginator for APIs that don't follow standard conventions.

dltdata engineeringetl
Topic guide

MCP Protocol Fundamentals

Reading map for the foundational MCP concepts — how the protocol works, what messages look like, what primitives exist, and how they fit together for data engineering.

mcpdata engineeringai
Note

Schema Registry for Contract Enforcement

How schema registries enforce data contracts on event streams before data reaches the warehouse — compatibility modes, CEL validation rules, and production practices.

data qualitydata engineering
Note

dbt Orchestration Decision Framework for GCP

A decision framework for choosing between Cloud Run Jobs, Cloud Workflows, and Cloud Composer for dbt orchestration on GCP — based on actual requirements, not arbitrary complexity thresholds.

dbtgcpdata engineeringcost optimization
Note

Documentation Quality Determines AI Usefulness

Why the quality of your dbt documentation directly determines how useful AI tools can be — the Roche chatbot failure, the docs-to-AI feedback loop, and case studies in enforcement

dbtaidata quality
Note

Looker Studio + BigQuery Performance — Hub

Map of garden notes on optimizing Looker Studio dashboards backed by BigQuery: BI Engine, extract mode, blending pitfalls, caching, credentials, and upgrade decisions.

bigqueryanalyticscost optimization
Note

dbt Mart Layer Patterns

What belongs in dbt mart models — reporting aggregations, activation exports, ML feature tables — and the principle that every mart serves a specific consumer.

dbtdata modelingdata engineering
Note

dbt Private Packages via Git

How to distribute internal dbt packages as Git dependencies — version pinning, authentication options, and trade-offs compared to Hub packages.

dbtdata engineering
Note

The Chatbot → Copilot → Agent Paradigm Shift

How AI's relationship to the developer changed across three distinct eras: chatbot (on demand), copilot (alongside), agent (autonomous), and why each phase is qualitatively different, not incrementally better.

claude codeaiautomationdata engineering
Note

dbt Package Ecosystem Hub

Navigation hub for the dbt package ecosystem — how installation works, what's available, version compatibility, and how to evaluate packages for production use.

dbtdata engineering
Note

Dagster Branch Deployments for dbt

How Dagster+ branch deployments create ephemeral preview environments for dbt changes on PR, with state-based selection and partitioned execution for CI/CD workflows.

dbtdata engineeringtestingautomation
Note

OpenClaw Security Risks — What's Documented

A factual catalogue of the specific, documented security incidents, CVEs, regulatory warnings, and threat patterns that analytics engineers need to know before running OpenClaw near client data.

aidata engineering
Note

Looker Studio Data Blending Pitfalls

Why Looker Studio data blending silently creates cartesian products, how to identify it, and why pre-joining in BigQuery is almost always the right fix.

bigqueryanalyticscost optimization
Note

Google Ads Performance Max Data Gaps

Why Performance Max campaign data is incomplete in BigQuery DTS, what's actually missing, and how to get the data you need.

google adsbigquerydata qualitydata engineering
Note

OpenClaw for Data People — Hub

A reading map for the OpenClaw introductory guide — architecture and design principles, tool comparison, security risks, persistent memory, and the ecosystem around OpenClaw.

claude codeaiautomationdata engineering
Note

Meta CAPI Server-Side Setup: Deduplication and Event Match Quality

How to configure Meta Conversions API via server-side GTM — event deduplication with shared event_id, user data mapping for EMQ score, and forwarding the _fbp and _fbc cookies.

ga4google adsanalyticsdata quality
Note

AI-Powered dbt Documentation

A reading path through automating dbt documentation — from scaffolding tools to AI generation, business context enrichment, and CI enforcement

dbtclaude codeautomationai
Note

Orchestrator Pricing for dbt Teams

Managed orchestration costs compared — Dagster+, Prefect Cloud, Astronomer, Cloud Composer, and dbt Cloud — with entry-tier pricing, scaling models, and the hidden costs that shift the math.

dbtgcpcost optimizationdata engineering
Note

Proactive vs. Reactive AI Agents

The distinction between AI tools that respond to prompts and AI agents that act on schedules — why this shift matters for automation use cases, and where each model fits.

claude codeaiautomation
Note

Dagster GCP Deployment

How to deploy Dagster on GCP — Serverless vs Hybrid modes, GKE with Helm, Workload Identity authentication, Cloud SQL for storage, and the community Cloud Run option.

gcpdata engineeringautomation
Note

Choosing Between BigQuery MCP Options

Decision framework for BigQuery MCP access — Remote Server vs Toolbox vs bq CLI, matched to your client, team setup, and use case.

mcpbigqueryclaude codeai
Note

Dagster vs dbt Cloud Orchestration

When Dagster's dagster-dbt integration is worth the setup cost over dbt Cloud's built-in scheduler: cost comparison, capability gaps, and the vendor independence argument after the Fivetran merger.

dbtdata engineeringautomationcost optimization
Note

MCP Official Reference Servers

The servers maintained by the MCP Steering Group — which are actively developed, which have been handed to vendors, and why the distinction matters.

mcpdata engineeringai
Note

GTM Server-Side Managed Hosting Providers

Comparison of Stape, Addingwell, TAGGRS, and Cloudflare Zaraz as managed alternatives to self-hosting GTM Server-Side containers on cloud infrastructure.

gtmanalyticscost optimization
Note

dbt Cross-Database Macros

Hub for writing dbt macros that work across BigQuery, Snowflake, and Databricks — dialect differences, dispatch configuration, built-in macros, and array operations.

dbtbigquerysnowflakedatabricks
Note

dbt Core vs Cloud Hub

Hub note connecting garden notes decomposed from the dbt Core vs dbt Cloud comparison article.

dbtdata engineering
Note

dbt Agent Skills

dbt Labs' official Markdown skill files that teach AI coding agents how to follow dbt best practices — what they cover, how they work, and what the benchmarks actually show.

dbtclaude codedata engineeringai
Note

Incremental Strategy Decision Framework

A decision framework for choosing the right dbt incremental materialization strategy — merge, delete+insert, insert_overwrite, append, and microbatch

dbtincremental processingdata modeling
Note

Consent Mode v2 Hub

Hub note connecting all concepts involved in implementing, debugging, and maintaining Google Consent Mode v2 across web and server-side GTM containers.

ga4google adsanalyticsdata quality
Note

dlt REST API Source Configuration

How to configure dlt's declarative REST API Source — the client block, resources block, endpoint paths, pagination wiring, and what dlt does automatically with the data.

dltdata engineeringetl
Note

Semantic Validation in dbt

How to encode business rules as dbt tests — regex pattern validation, cross-column logic, natural language AI validation, and when each approach fits.

dbtelementarydata qualitytesting
Note

Customer 360 Modeling

Hub note connecting the concepts involved in building a unified Customer 360 model from CRM and GA4 data — identity resolution, DAG architecture, conflict resolution, and privacy constraints.

dbtbigqueryga4data modeling
Note

Eventarc Event-Driven dbt Triggers

Using Eventarc to trigger dbt runs when upstream data arrives — Cloud Storage object creation, BigQuery audit log events, and combining event-driven with scheduled execution.

dbtgcpdata engineeringautomation
Note

Writing Reusable dbt Macros

A map through the garden notes on designing, naming, documenting, testing, and evolving dbt macros — from when to extract to how to handle breaking changes.

dbtdata modelingdata engineering
Topic guide

Lightdash Open Source & Self-Hosting Hub

Hub note for Lightdash self-hosting — connecting to dbt, Docker Compose setup, Kubernetes deployment, and the open-source vs paid tier tradeoffs.

dbtanalyticsdata modeling
Topic guide

MCP Ecosystem Overview

A reading map for the MCP ecosystem — from protocol fundamentals through official servers, clients, data engineering integrations, and building custom servers.

mcpdata engineeringai
Note

Looker Studio Credentials and Security

The security risks of owner's credentials in public Looker Studio reports, the LeakyLooker vulnerability, cost attribution, and using service accounts for production dashboards.

bigquerygcpanalytics
Note

Rule-Based Lead Scoring in dbt

How to build a configurable weighted lead scoring model in dbt using vars, seed files, and Jinja macros — so marketing can adjust weights without touching SQL.

dbtbigquerydata modelinganalytics
Note

Data Observability Total Cost of Ownership

The true cost comparison between OSS and managed data observability — accounting for engineering time, warehouse compute, training, and the costs that don't appear on invoices.

dbtelementarydata qualitycost optimization
Note

GA4 Sharded-to-Partitioned Base Model

How to convert GA4's date-sharded BigQuery export into a properly partitioned incremental dbt model, and why the static lookback pattern is critical for correctness.

ga4dbtbigquerydata modeling
Note

Google Ads Developer Token

What the Google Ads developer token is, how access levels work, why approval takes months, and which loading tools require one.

google adsdata engineeringetl
Note

dbt Package Ecosystem Governance

Who maintains the dbt package ecosystem — dbt Labs, Fivetran, and community contributors — and how to evaluate a package's reliability before committing to it in production.

dbtdata engineering
Note

CLAUDE.md for dbt Projects

A concrete CLAUDE.md template for dbt projects — what to include, what to leave out, and why the file should be grown reactively from real mistakes rather than written upfront.

claude codedbtaiautomation
Note

GA4 Window Function Pitfalls

Three window function traps specific to GA4 sessionization: the LAST_VALUE framing trap, IGNORE NULLS for sparse event data, and MAX for session-scoped boolean flags.

ga4bigquerydata modelinganalytics
Note

Late-Arriving Data in dbt — Hub

Hub note connecting all concepts around handling late-arriving data in dbt incremental models: measurement, lookback windows, partition strategies, deduplication, testing, and operational safety.

dbtbigquerysnowflakedatabricks
Note

dbt Macro Documentation in YAML

Why _macros.yml beats inline SQL comments for documenting dbt macros, and how to write entries that developers actually use.

dbtdata modelingdata engineering
Note

Google Workspace CLI (gws)

The gws CLI gives programmatic access to every Google Workspace API through a single binary — Gmail, Drive, Calendar, Sheets, Docs — filling the gap gcloud has never covered.

gcpmcpautomationai
Note

LinkedIn Ads B2B Data Value

What makes LinkedIn Ads data uniquely valuable for B2B analytics — professional demographic pivots, the negative CTR-to-pipeline correlation, company-level impression attribution, and what metrics actually matter.

analyticsdata engineering
Note

Unit Testing Incremental Models in dbt

The dual-mode testing pattern for incremental models — overriding is_incremental, mocking this, and understanding that expect blocks show inserts, not final state.

dbtbigquerytestingincremental processing
Note

GTM Server-Side Hosting: Decision Framework

How to choose between Cloud Run, AWS ECS Fargate, Azure App Service, and managed providers for hosting your GTM Server-Side container in production.

gtmgcpanalyticscost optimization
Note

dbt-utils v1.0 Migration: What Moved to dbt-core

The complete list of macros that moved from dbt-utils to the dbt namespace at v1.0, what was removed entirely, and how to migrate an existing project.

dbtdata engineering
Note

Managed ELT Tool Architectures: Fivetran, Airbyte, and dlt

How the three dominant data ingestion tools approach the same problem differently — fully managed connectors, self-hosted open source, and Python-native libraries.

dltdata engineeringetl
Note

dbt-Fivetran Merger and the 2026 Transformation Landscape

How the October 2025 dbt-Fivetran merger reshaped the analytics engineering landscape — unified platform strategy, Core/Cloud divergence, and what it means for tool choice.

dbtdataformdata engineeringcost optimization
Note

Data Observability Scaling Thresholds

Team size and technical complexity thresholds that determine when to move from dbt tests to OSS observability to paid platforms.

dbtdata qualitydata engineeringcost optimization
Note

Unit Testing CASE WHEN Boundary Logic in dbt

Systematic boundary testing for CASE WHEN statements — testing threshold values, just-under values, null handling, and implicit ELSE behavior.

dbttesting
Note

Elementary setup troubleshooting

Fixes for the most common Elementary installation failures: empty reports, missing edr command, BigQuery location errors, tables materialized as views, and Databricks permission issues.

dbtelementarybigquerydatabricks
Note

Unit Testing GA4 Sessionization

How to unit test GA4 sessionization logic in dbt — session boundary detection, cross-midnight sessions, microsecond timestamps, and single-event sessions.

dbtga4bigquerytesting
Note

Claude Code Bang Prefix for Shell Commands

Using the ! prefix to run shell commands directly inside Claude Code — how it saves tokens, speeds up authentication, and keeps your flow uninterrupted

claude codeaidata engineering
Note

dbt Test Failure Severity Framework

A four-tier framework for prioritizing dbt test failures by impact — combining test type, model layer, downstream dependents, and historical context into an actionable severity ranking.

dbtdata qualitytestingautomation
Note

Google Ads Server-Side: Conversion Linker and Enhanced Conversions

How to configure Google Ads conversion tracking server-side — the Conversion Linker tag that manages the FPGCLAW cookie, Enhanced Conversions for hashed user data, and realistic uplift expectations.

google adsga4analyticsdata quality
Note

When to Write dbt Unit Tests

Specific decision criteria for where native dbt unit tests pay off — complex logic scenarios, the incremental model override pattern, and what to skip.

dbttestingdata quality
Note

dbt Documentation People Actually Read

A reading path through writing dbt documentation that gets used — from diagnosing why docs go unread to writing patterns, delivery mechanisms, and the AI quality feedback loop

dbtdata engineeringdata quality
Note

Claude Code Skills Activation

How Claude Code skills work under the hood — keyword matching against YAML frontmatter, the ~20% auto-activation rate, and why skills fit background domain knowledge better than repeatable workflows

claude codeaiautomation
Note

Dataform vs dbt Cost Comparison

The real cost equation between Dataform and dbt — licensing savings vs ecosystem gaps, migration costs, and hidden engineering overhead

dataformdbtbigquerycost optimization
Note

Dataform Testing Limitations

Dataform's built-in assertions cover three scenarios — uniqueness, null checks, and row conditions. Everything else requires custom implementation.

dataformbigquerydbttesting
Note

Soda Data Contract Verification

How Soda's contract engine validates schema, freshness, and quality rules against warehouse tables after loading but before transformation — filling the gap between EL and dbt.

dbtdata qualitydata engineeringtesting
Note

BigQuery On-Demand Billing Mechanics

How BigQuery on-demand pricing actually charges you — columnar billing, the LIMIT clause trap, 10 MB minimums, caching, the free tier, and cross-cloud pricing.

bigquerygcpcost optimization
Note

BigQuery Cost Model

How BigQuery pricing works across on-demand and editions models — bytes billed, slot hours, storage costs, and optimization levers

bigquerygcpcost optimization
Note

BigQuery Slot Usage Monitoring

How to monitor BigQuery slot usage with INFORMATION_SCHEMA, the Slot Estimator, and Cloud Monitoring, with practical queries and tools for capacity planning.

bigquerycost optimizationdata engineering
Note

dbt Attribution Packages Landscape

Open-source dbt packages and Python libraries for production-ready attribution models: Snowplow, Tasman, Rittman Analytics, ChannelAttribution, and when to build your own

dbtbigqueryanalyticsdata modeling
Note

Deploying dbt Core on Cloud Functions

A step-by-step guide to deploying dbt Core on Google Cloud Functions — repository structure, service account setup, deployment, and scheduling with Cloud Scheduler.

dbtgcpbigquerydata engineering
Note

dbt Package Anti-Patterns

Common mistakes in dbt packages — hardcoded schemas, missing dispatch, tight version constraints, generic model names, table defaults, and missing version bounds.

dbtdata engineeringdata modeling
Note

The Freelance Admin Overhead Problem

Why solo consultants spend 20-30% of their time on non-billable admin, why the standard fixes don't work, and what makes a single agent different from another SaaS subscription.

automationai
Note

generate_schema_name: Environment-Aware Schema Naming in dbt

How to override dbt's generate_schema_name macro so dev environments get prefixed schema names while prod uses clean custom schema names directly.

dbtdata engineeringdata modeling
Note

Campaign Naming and UTM Standardization

How to standardize campaign names across ad platforms using naming conventions, regex parsing, and seed overrides — plus UTM hygiene rules that make cross-platform attribution possible.

dbtgoogle adsdata modelinganalytics
Note

Multi-Client Agent Reporting Architecture

How to structure per-client isolation for OpenClaw reporting workflows — separate cron jobs, credential management at scale, failure containment, and the security tradeoffs of running multiple clients on a single machine.

automationdata engineeringai
Note

Lightdash Metric Types and Definition Syntax

The three categories of Lightdash metrics — aggregate, non-aggregate, and post-calculation — plus column-level vs model-level placement, filters, and display configuration.

dbtanalyticsdata modeling
Note

Dataform for BigQuery

A structured guide to evaluating Dataform as a BigQuery transformation tool — what it is, how it compares to dbt, and when it makes sense

dataformbigquerydbtgcp
Note

Claude Code Status Line Configuration

How to set up Claude Code's status line to display git branch, active model, and context usage — practical setup for analytics engineers

claude codeai
Note

Cloud Workflows Orchestration

GCP Cloud Workflows as a middle-ground orchestration layer between Cloud Scheduler and Cloud Composer — serverless, cheap, and capable enough for multi-step pipelines.

gcpdata engineeringautomation
Note

What dbt docs generate actually produces

The static site artifacts that dbt docs generate creates — manifest.json, catalog.json, index.html — and the flags that control how they are built

dbtdata engineering
Note

Microbatch Automatic Upstream Filtering

How dbt's microbatch strategy automatically filters upstream models by event_time, reducing full table scans — and when to opt out with .render().

dbt, incremental processing, cost optimization
Note

BigQuery Editions Testing Without Commitment

How to evaluate BigQuery Editions on real workloads before committing — creating a test reservation, rolling back instantly, opting out of org-level reservations, and using the Slot Estimator.

bigquery, gcp, cost optimization, data engineering
Note

Build vs. Buy Data Pipelines

A reading path through the shifting economics of managed vs. custom data pipelines — from Fivetran's pricing changes through AI-assisted development with dlt to the hybrid strategy

dlt, bigquery, data engineering, etl
Note

Claude Code Skill Description Engineering

How to write Claude Code skill descriptions that actually trigger activation — explicit keywords, negative boundaries, and the specificity principle

claude code, ai, automation
Note

BigQuery Cost Attribution with INFORMATION_SCHEMA

Using INFORMATION_SCHEMA queries to find expensive queries, attribute costs by user and dataset, identify unoptimized tables, and build a weekly cost review practice.

bigquery, gcp, cost optimization, data engineering
Note

Dagster Components

Dagster's newest major abstraction — YAML-configured objects that generate assets, checks, and schedules with minimal Python, lowering the barrier for SQL-first analytics engineers.

dbt, data engineering, automation
Note

Prompt Injection and the Lethal Trifecta

Simon Willison's lethal trifecta — why combining private data access, untrusted content exposure, and external communication ability creates a uniquely dangerous attack surface for AI agents handling data work.

ai, data engineering
Note

GCP Processing Engine Selection: Dataflow, Dataproc, and BigQuery

When to use Dataflow, Dataproc, Dataproc Serverless, and BigQuery SQL for data transformation on GCP — matched to team expertise and workload type, not arbitrary scale thresholds.

gcp, bigquery, data engineering
Note

BigQuery Resource Hierarchy

How BigQuery organizes resources from organization to table level — projects as billing boundaries, datasets as access control units, and naming conventions that scale.

bigquery, gcp, data engineering
Note

Star Schema vs One Big Table

When to use entity-separated star schema vs wide denormalized tables in your data warehouse — BigQuery performance characteristics, OBT benchmarks, and the practical answer of building both.

dbt, bigquery, data modeling, data engineering
Note

GA4 Parameter Extraction Macro

A reusable dbt macro for extracting GA4 event parameters without row multiplication, including the numeric variant for int/float/double fields.

ga4, dbt, bigquery, data modeling
Note

Warehouse Attribution Data Sources

The three categories of data required for warehouse-based attribution (website interactions, ad platform spend, and conversions), with platform-specific loading patterns and common data quality traps

bigquery, ga4, dbt, analytics
Note

BigQuery Fair Scheduling

How BigQuery distributes slots among competing queries: the two-level fair scheduling algorithm, its project-level implications, and why project architecture matters for performance.

bigquery, cost optimization, data engineering
Note

Asset-Centric Orchestration

The paradigm shift from task-based orchestration (what to run) to asset-based orchestration (what data should exist) — why it matters for analytics engineers and how it changes debugging, monitoring, and pipeline design.

data engineering, automation
Note

dbt Testing Pyramid

The layered testing pyramid for dbt projects: broad data test coverage at the base, targeted unit tests in the middle, anomaly detection and data diffs at the top.

dbt, testing, data quality
Note

CLAUDE.md as Project Memory

How CLAUDE.md gives Claude Code persistent project context — what to include, what to leave out, and why reactive additions beat proactive documentation

claude code, ai, automation
Note

KPI Reporting via Direct Warehouse Queries

Why querying the warehouse directly beats dashboard scraping for scheduled KPI delivery — the BigQuery and Snowflake CLI patterns, how to structure pre-written SQL for agent-driven reporting, and the tradeoffs of the approach.

bigquery, snowflake, analytics, automation
Note

LLM Training Data Asymmetry for Tool Use

Why LLMs write better shell commands than MCP tool calls — the training data distribution that makes CLI fluency outperform structured tool-calling for well-established tools.

claude code, ai
Note

dbt Documentation Rollout Strategy

A practical week-by-week approach to rolling out dbt documentation standards — starting with model descriptions, adding enforcement incrementally, and using AI tools to close coverage gaps

dbt, data quality, automation
Note

Choosing Between Fivetran, Airbyte, and dlt

A decision framework for picking the right ELT tool based on team skills, budget, connector needs, and tolerance for operational burden — with practitioner sentiment from the field.

dlt, data engineering, etl, cost optimization
Note

CLAUDE.md for Analytics Engineering — Hub

Hub note connecting all CLAUDE.md configuration concepts for dbt and BigQuery analytics engineering — project memory, dbt templates, BigQuery specifics, hooks, and slash commands.

claude code, dbt, ai, automation
Note

Position-Based Attribution Models

U-shaped and W-shaped attribution models that weight credit by journey position — formulas, edge cases, industry weight variations, and BigQuery SQL implementation

bigquery, analytics, data modeling
Note

dbt Documentation Freshness

A reading path through keeping dbt documentation accurate as your project evolves — from the case for automation to drift detection, coverage tracking, and a graduated rollout strategy

dbt, automation, data quality
Note

dbt Scheduling Without an Orchestrator

How to run dbt in production without Airflow, Dagster, or Prefect — the practical options from $0/month GitHub Actions to Cloud Run Jobs, when each fits, and when to move on.

dbt, gcp, data engineering, automation
Note

YAML Formatting Options for dbt Descriptions

The four ways to write descriptions in dbt YAML — inline strings, folded scalars, literal scalars, and doc blocks — and when to use each one

dbt, data modeling, data engineering
Note

GA4 BigQuery Timezone Handling

Three different timezone contexts coexist in GA4 BigQuery exports — event_timestamp, event_date, and _TABLE_SUFFIX each use different references that silently break date-range queries.

ga4, bigquery, analytics, data engineering
Note

Pipeline Enforcement Layer Strategy

The four-layer model for data contract enforcement across the full pipeline — pre-warehouse, post-load, transformation, and continuous observability — with practical adoption ordering.

dbt, dlt, data quality, data engineering
Note

dbt Core Open-Source Fundamentals

What dbt Core is, how its CLI-driven workflow operates, the open-source ecosystem that powers it, and the technical profile of teams that choose it.

dbt, data engineering
Note

LinkedIn Ads Analytics Endpoint

The engineering quirks of LinkedIn's adAnalytics endpoint — no pagination, 15K element cap, 20-metric limit per request, query tunneling, cursor pagination migration, and monthly API versioning.

data engineering, etl
Note

BigQuery Regional Architecture

How BigQuery's region model works — multi-region vs. single region, the cross-region join constraint, and how to choose a region you'll live with permanently.

bigquery, gcp, data engineering
Note

Data Observability Minimum Viable Stack

The four non-negotiable observability capabilities every data team needs regardless of tooling — primary key tests, freshness monitoring, volume anomaly detection, and actionable alerting.

dbt, elementary, data quality, data engineering
Note

Lightdash Dimension Configuration in dbt YAML

How Lightdash turns dbt column definitions into dimensions — types, display properties, time intervals, and computed additional_dimensions.

dbt, analytics, data modeling
Note

Google Ads Scripts for BigQuery Export

Using Google Ads Scripts to export performance data directly to BigQuery — how the authentication model works, what the execution limits are, and when this approach beats the alternatives.

google ads, bigquery, data engineering, etl
Note

Advanced Claude Code Workflows for dbt

A reading path through Claude Code configuration, testing, documentation, and debugging workflows for analytics engineers working with dbt on BigQuery

claude code, dbt, bigquery, data engineering
Topic guide

Meta Ads to BigQuery Pipeline — Hub

Map of content for building and maintaining a Meta Ads to BigQuery pipeline — API structure, actions array flattening, attribution windows, iOS signal loss, and operational maintenance.

bigquery, dbt, data engineering, etl
Note

dbt Macro Naming Conventions

Naming patterns for dbt macros that make them discoverable, communicative, and well-organized — verb prefixes, descriptive names, internal helper conventions, and the one-macro-per-file rule.

dbt, data modeling, data engineering
Note

Pipeline Alerting Delivery Patterns

How to structure pipeline monitoring alerts — tiered severity routing, Slack vs. Telegram tradeoffs, delivery modes (channel, DM, webhook, silent), and designing alert systems that don't become noise.

dbt, automation, data quality, data engineering
Note

OpenClaw Architecture and Design Principles

How OpenClaw is built — the Gateway daemon, model-agnostic BYOK design, HEARTBEAT.md proactive loop, and plain-text-first philosophy that makes it feel natural to data people.

ai, automation
Note

Late-Arriving Data and the Lookback Window Pattern

How to handle late-arriving data in dbt incremental models using lookback windows, including window sizing trade-offs and the limits of any lookback approach.

dbt, incremental processing, data quality
Note

dbt-to-Dataform Migration Process

The step-by-step process for migrating a dbt project to Dataform — auditing what you have, running the automated tool, converting macros to JavaScript includes, recreating tests as assertions, and setting up orchestration.

dbt, dataform, bigquery, data engineering
Note

BigQuery Cross-Organization Data Sharing

Patterns for sharing BigQuery data across organizations — agency/client models, Analytics Hub, authorized views, and row/column-level security.

bigquery, gcp, data engineering
Note

BigQuery Multi-Environment Patterns

Three patterns for separating dev, staging, and production in BigQuery — separate projects, dataset prefixes, and central data lake with department marts.

bigquery, dbt, gcp, data engineering
Note

CLOUDSDK_CONFIG for Per-Project gcloud Isolation

How CLOUDSDK_CONFIG isolates all gcloud state per project — credentials, ADC files, active config — and why it's the missing piece for multi-client GCP work.

gcp, data engineering, automation
Note

Elementary materialization override for dbt 1.8+

Why Elementary requires a materialization override macro in dbt 1.8+ projects, what happens without it, and how to write it correctly for BigQuery and Snowflake.

dbt, elementary, data quality, testing
Note

Claude Code Authentication Options

The two ways to authenticate Claude Code — subscription OAuth and API keys — when to use each, and the precedence rule that trips people up

claude code, ai
Note

Time-Decay Attribution Model

Time-decay attribution using exponential decay with a configurable half-life — the formula, choosing half-life by industry, BigQuery SQL implementation, and parameterization

bigquery, analytics, data modeling
Note

Meta Ads Insights API Structure

How the Meta Marketing API is organized — the five-level object hierarchy, Insights API as a reporting edge, versioning cadence, authentication models, and rate limit system.

bigquery, data engineering, etl
Note

Try-Heal-Retry pattern

How to add AI-powered remediation to data pipelines using structured LLM output, Pydantic schemas, and circuit breakers, with production examples using Claude.

claude code, data engineering, ai, automation
Note

MCP Client Primitives

The three capabilities MCP clients expose to servers — sampling (server-requested LLM completions), elicitation (server-requested user input), and roots (filesystem boundaries) — and when they matter for data engineering.

mcp, data engineering, ai
Note

Elementary report sections

What each section of the Elementary HTML report shows and when to use each one during a data quality review.

elementary, dbt, data quality, testing
Note

dbt-utils Generic Tests

Full reference for dbt-utils generic tests: YAML syntax, the Fusion arguments: key change, group_by_columns support, and when to use each test.

dbt, testing, data quality
Note

Layered AI Stack for Analytics Engineering

The mental model of thinking about AI tools in layers — IDE, coding agent, orchestration, review — rather than choosing a single tool for everything

claude code, dbt, ai, automation
Note

Terminal Safety for Beginners

Which terminal commands are safe, which are dangerous, how to read error messages, and the keyboard shortcuts that save you when something goes wrong

claude code, ai
Note

Consent Mode Impact on Identity Resolution

How GA4 Consent Mode V2 changes what identity data reaches BigQuery — cookieless pings without identifiers, the same-page backstitch nuance, and filtering consented data for stitching pipelines.

ga4, bigquery, analytics, data quality
Note

dbt Unit Test Edge Case Patterns

Three essential edge case patterns for dbt unit tests — null handling, empty tables with format: sql, and date boundary testing.

dbt, bigquery, testing
Note

dbt-expectations Hub

Hub note for dbt-expectations — setup, test reference, conditional filtering, severity tuning, BigQuery implementation patterns, and the unit test vs data test distinction.

dbt, data quality, testing
Note

AI Agent Data Quality: What Works Today vs. What's Aspirational

An honest assessment of which AI agent capabilities for dbt data quality are production-ready, which require significant work but are achievable, and which are still too unreliable to depend on.

dbt, elementary, data quality, automation
Note

MCP Apps for Data Engineers

A reading path through MCP Apps — the January 2026 extension to MCP that renders interactive HTML visualizations directly inside AI client conversations.

mcp, claude code, ai, analytics
Note

BigQuery Partitioning Configuration Patterns

Domain-specific partitioning and clustering configurations for BigQuery in dbt: event data, marketing, multi-tenant SaaS, and IoT patterns with rationale.

bigquery, dbt, data engineering, data modeling
Note

Data Contracts Hub

Hub note connecting garden notes on data contracts — definitions, specifications, ownership, tooling, validation layers, and adoption challenges.

dbt, data quality, data engineering
Note

BigQuery Idle Slot Sharing

How idle slot sharing works in BigQuery Enterprise editions: requirements, configuration, preemption behavior, and when to disable it.

bigquery, cost optimization
Note

dbt Three-Layer Architecture

How the base, intermediate, and mart layers organize a dbt project, what belongs in each, and how data flows between them.

dbt, data modeling, data engineering
Note

GA4 Session Key Construction

Why ga_session_id alone fails as a session identifier, how to build the correct composite key, and the edge cases that produce null sessions.

ga4, bigquery, data modeling, analytics
Note

Triangulated Marketing Measurement

Why resilient marketing measurement combines three approaches: multi-touch attribution for daily optimization, media mix modeling for strategic allocation, and incrementality testing for causal validation

bigquery, ga4, analytics
Note

GCP IAM Least Privilege for Data Teams

A sequenced guide to auditing and fixing IAM debt on GCP data platforms — from surfacing over-permissioned principals to implementing policy tags and row-level security.

gcp, bigquery, data engineering
Note

CI/CD Data Quality Testing in dbt

How to integrate data quality testing into CI/CD pipelines — Slim CI with state:modified+, GitHub Actions workflows, and tools like Datafold and Recce for regression detection.

dbt, data quality, testing, automation
Note

Debugging Custom dbt Materializations

Common errors in custom dbt materializations, what causes them, and how to test materializations systematically before deploying to production.

dbt, bigquery, data engineering, testing
Note

dbt Materialization Anatomy

The six-step structure every dbt materialization follows — setup, pre-hooks, main SQL, post-hooks, cleanup, and return — plus the key objects and adapter methods.

dbt, data engineering, data modeling
Note

dbt Doc Block Syntax and Reuse Patterns

How dbt doc blocks work — syntax, naming rules, cross-package references, and patterns for writing column and model descriptions once and reusing them across your project

dbt, data modeling, data engineering
Note

dbt Quality Morning Summary Pattern

A two-cycle design for automated dbt quality reporting — daily morning summaries with Slack threading and follow-up capability, plus a weekly digest that surfaces patterns individual days miss.

dbt, data quality, automation, ai
Note

AI-Generated SQL Failure Modes

Why AI-generated SQL is dangerous — it runs without errors but returns wrong results. Research on temporal filter inconsistencies, join failures, and the confidence problem.

dbt, ai, data quality
Note

dbt Documentation with Claude Code

A systematic approach to dbt documentation using Claude Code — the codegen-plus-AI pattern, docs blocks for consistency, lineage diagrams, and slash commands for automation

dbt, claude code, automation, ai
Note

OpenClaw GA4 Skill Integration

How to use community GA4 skills from ClawHub to pull analytics metrics into OpenClaw — the two main options, what each extracts, and how to feed the output into scheduled reporting.

ga4, analytics, automation, ai
Note

HubSpot Deal Stage Modeling

Why deal stage transitions live in DEAL_STAGE not DEAL_PROPERTY_HISTORY, how to use the is_closed and label columns correctly, and patterns for time-in-stage and pipeline analysis.

dbt, bigquery, data modeling, analytics
Note

Elementary for dbt: setup guide

A sequenced map of notes covering Elementary installation from scratch: dbt package, materialization override, CLI profile configuration, and troubleshooting.

dbt, elementary, data quality, testing
Note

Server-Side Tracking Data Quality Evidence

The quantitative case for server-side tracking — the 41% average data quality improvement, case studies from Finobo, Forward Media, and seoplus+, ad platform Conversions API adoption, and the cost-benefit calculation that has flipped.

ga4, google ads, analytics, data quality
Note

Dataform-to-dbt Migration Decision Criteria

When migrating from Dataform to dbt makes sense, when it doesn't, and the realistic cost-benefit calculation.

dbt, dataform, bigquery, data engineering
Note

dbt Model Versioning

How dbt model versions work — breaking vs non-breaking changes, the state:modified selector, version integers, deprecation dates, and the friction points.

dbt, data modeling, data quality
Note

Cross-Platform Ad Metric Comparability

Why only five metrics can be meaningfully compared across ad platforms, how to handle platform-specific metrics, and conversion configuration details that determine what your 'conversions' column actually means.

dbt, google ads, data modeling, analytics
Note

GA4 Identity Stitching Techniques

The four SQL patterns for resolving GA4 anonymous-to-known user identity — last-touch, first-touch, full backstitch, and session-scoped — with a decision framework for choosing between them.

ga4, bigquery, analytics, data modeling
Note

Cascading Agent Pattern

The architecture where an always-on monitoring agent detects issues and triggers a coding agent to investigate and fix them — how OpenClaw and Claude Code hand off work

claude code, dbt, ai, automation
Note

dlt Environment Setup

Setting up a dlt project from scratch — Python virtual environment, installation, dlt init, and the project scaffold it creates.

dlt, data engineering, etl
Note

Google Ads ClickType Impression Trap

Why Google Ads DTS stats tables silently inflate impression counts 3-6x, and the exact SQL filter that fixes it without breaking click counts.

google ads, bigquery, data quality, analytics
Note

dbt Mesh Governance Triad

How contracts, access controls, and model versioning combine in dbt Mesh to turn models into data products — and which models actually deserve that treatment.

dbt, data modeling, data quality, data engineering
Note

Custom MCP Server Decision Criteria

When to build a custom MCP server versus using an existing one — the build-vs-browse decision framework for data engineering teams.

mcp, data engineering
Note

Consent Mode v2 Parameter Architecture

The four Consent Mode v2 parameters, how upstream browser controls differ from downstream server instructions, and the legal mandate that forced the change.

ga4, google ads, analytics, data quality
Note

dlt for AI-Assisted Pipeline Development

Why dlt's Python-native, declarative design maps well to AI-assisted development — the REST API builder, BigQuery-specific features, LLM-friendly docs, and production results

dlt, bigquery, data engineering, etl
Note

Slack KPI Summary Format for Agent-Delivered Reports

A practical template for agent-generated Slack KPI summaries — directional arrows, week-over-week structure, percentage points vs. percentages, and how to handle the LLM math reliability problem in the output layer.

analytics, automation, ai
Note

GA4 session_start Event Unreliability

Why counting session_start events produces wrong session counts in GA4 BigQuery data, and the correct approach using distinct session IDs.

ga4, bigquery, analytics, data quality
Note

MCP Tool Design Patterns

How to design MCP tools that work well with AI — docstrings as descriptions, Pydantic models for structured output, and input validation with schemas.

mcp, data engineering
Note

Identity Resolution Monitoring

Key metrics and anomaly detection SQL for monitoring a GA4 identity stitching pipeline — stitch rate, consolidation rate, shared device exposure, and week-over-week change alerts.

ga4, bigquery, dbt, analytics
Note

dbt-to-Dataform Migration Hub

Hub note for migrating from dbt to Dataform — the decision, the concept mapping, the procedural steps, and what you'll lose. For BigQuery teams evaluating the switch.

dbt, dataform, bigquery, data engineering
Note

dbt-expectations BigQuery Implementation Patterns

Real-world dbt-expectations implementation on BigQuery — complete GA4 and advertising data quality YAML, test placement by DAG layer, and a practical starting checklist.

dbt, bigquery, ga4, data quality
Note

Ad Platform API Landscape

API characteristics, authentication models, and engineering gotchas for Google Ads, Meta, LinkedIn, Microsoft, TikTok, Pinterest, and Twitter ad platforms

google ads, data engineering, etl
Note

Self-healing risk tiering

A framework for deciding which pipeline failures can self-heal automatically, which need human approval, and which should never be auto-remediated.

data engineering, data quality, ai
Note

dbt Testing Anti-Patterns

Four common testing mistakes in dbt projects (over-testing, happy-path-only coverage, drifting thresholds, and testing warehouse functions) and what to do instead.

dbt, testing, data quality
Note

OpenClaw Persistent Memory Model

How OpenClaw's Markdown-based persistent memory differs from session-based tools, what it enables for long-running data monitoring, and how memory files work in practice.

ai, automation, data engineering
Note

Cloud Scheduler OIDC Authentication for HTTP Triggers

How Cloud Scheduler authenticates to secure HTTP endpoints using OIDC tokens — the service account requirements, the gcloud setup, and the pattern for Cloud Functions and Cloud Run.

gcp, data engineering, automation
Note

HubSpot Associations as Bridge Tables

HubSpot's many-to-many association model requires bridge tables at every layer. How to model them correctly, handle fan-out, and resolve the primary company problem.

dbt, bigquery, data modeling, data engineering
Note

dbt Fusion Package Compatibility

How the dbt Fusion engine (v2.0) affects package compatibility — version bounds, manifest format changes, the Fusion badge, and how to prepare your project and packages for migration.

dbt, data engineering
Note

BigQuery Partition Pruning Patterns

How to combine partitioning and clustering in BigQuery for maximum scan reduction, including anti-patterns that silently defeat pruning.

bigquery, cost optimization, data engineering
Note

dbt Testing Strategy

Hub note for building a complete dbt testing strategy — taxonomy, layer placement, unit test selection, alert routing, and package ecosystem.

dbt, testing, data quality
Note

GCP Auth Constraints for AI Coding Agents

How Claude Code, Codex, and Cursor each handle GCP authentication — and where each one breaks when tokens expire, contexts conflict, or interactive flows are required.

gcp, claude code, ai, automation
Note

Consent Mode Server-Side GTM Propagation

How consent signals travel from the web container to server-side GTM via gcs and gcd parameters, and why non-Google vendor tags require manual consent enforcement.

ga4, google ads, analytics, data quality
Note

BigQuery Job Failure Monitoring with INFORMATION_SCHEMA

SQL patterns for monitoring BigQuery job failures and detecting cost anomalies using INFORMATION_SCHEMA.JOBS — with filtering strategies for multi-project setups.

bigquery, gcp, data quality, cost optimization
Note

Data Observability Build vs. Buy

A reading path through the data observability decision — from the tool landscape through scaling thresholds, ML vs statistical detection, TCO, and the minimum viable stack.

dbt, elementary, data quality, data engineering
Note

BI Tool Self-Service Models

Three different approaches to self-service BI: governed exploration (Lightdash), visual query builder (Metabase), and LookML-powered Explore (Looker). How to match the model to your users.

dbt, analytics, data modeling
Note

OpenClaw Pipeline Monitoring

A reading path through the OpenClaw pipeline monitoring tutorial — cron scheduler mechanics, writing monitoring skills, tiered alerting delivery, BigQuery failure checks, and Snowflake cost monitoring.

dbt, bigquery, snowflake, automation
Note

GA4 Sessionization Hub

Hub note connecting all concepts involved in building session tables from GA4 BigQuery event data.

ga4, bigquery, dbt, data modeling
Note

MetricFlow CLI querying

How to query MetricFlow metrics from the CLI in dbt Core (mf) and dbt Cloud (dbt sl): group-by, filters with Jinja dimension syntax, multi-metric queries, and the semantic manifest.

dbt, analytics, data modeling
Topic guide

GA4 User Identity

Map of content for GA4 identity resolution in BigQuery — from understanding the two identifier types through stitching techniques, production pipelines, and ongoing monitoring.

ga4, bigquery, dbt, analytics
Note

AI Limitations in Data Engineering

A reading path through the five core limitations of AI in data engineering — SQL failure modes, the context gap, architectural judgment, the production gap, and context engineering as the response.

dbt, ai, data engineering
Note

dbt Docs Customization and Deployment

A reading path through customizing and deploying dbt docs beyond localhost — from understanding the build artifacts to choosing a hosting platform, automating deployment, and knowing when to replace the default frontend

dbt, data engineering
Note

dbt Docker Containerization

Patterns for containerizing dbt Core for production — multi-stage Dockerfiles, version pinning, Artifact Registry, and the two-repository strategy that separates transformation logic from infrastructure.

dbt, gcp, data engineering, automation
Note

Data Contract Tooling Ecosystem

The landscape of data contract tools in 2026 — dedicated contract tools, quality frameworks with contract support, and governance platforms.

dbt, data quality, data engineering
Note

dbt Attribution Comparison Pattern

How to structure a dbt project for multi-model attribution — running first-touch, last-touch, linear, position-based, and time-decay models in parallel with a union comparison layer

dbt, bigquery, ga4, data modeling
Note

AI Personal CRM Pattern

Using an AI agent to auto-scan email and calendar for contact relationship tracking — how the pattern works, what SQLite with vector embeddings enables, and why this is the highest-risk integration to configure carefully.

automation, ai
Note

2-Layer RBAC with Google Groups

Bind IAM roles to Google Groups representing job functions, not individual users — the pattern that makes onboarding, offboarding, and permission audits tractable.

gcp, bigquery, data engineering
Note

Elementary report hosting

How to host Elementary HTML reports on S3, GCS, or Azure Blob Storage so the whole team has access, and how to automate report generation in CI pipelines.

elementary, dbt, data quality, automation
Note

Incremental Predicates for dbt Merge

How incremental_predicates limit destination table scans during dbt merge operations, turning full table scans into partition-pruned reads.

dbt, bigquery, snowflake, incremental processing
Note

OpenClaw Security Risks — Hub

A reading map for the OpenClaw security risks guide — documented incidents, CVEs, regulatory warnings, supply chain attacks, context window safety failures, and what data teams specifically need to know.

claude code, ai, automation, data engineering
Note

dbt BigQuery Configuration

How to configure dbt for BigQuery — profiles.yml setup, authentication methods, generate_schema_name, job labels for cost attribution, and cost control settings.

dbt, bigquery, data engineering, cost optimization
Note

Shapley Value Attribution

How cooperative game theory's Shapley values produce provably fair attribution by calculating each channel's average marginal contribution across all possible channel coalitions

bigquery, analytics, data modeling
Note

Claude Code Strengths and Limitations for Data Work

Where Claude Code delivers real value in data engineering — boilerplate, multi-file changes, pattern replication — and where it struggles with novel logic, ambiguity, and over-engineering.

claude code, dbt, data engineering, ai
Note

MCP Discovery Resources

Where to find MCP servers — the official registry, community directories, and how to evaluate what you find before installing.

mcp, ai, data engineering
Note

MetricFlow Metric Types

The five metric types in dbt MetricFlow — simple, cumulative, derived, ratio, and conversion — with syntax, use cases, and gotchas for each

dbt, data modeling, analytics
Note

Data Contract Ownership Models

Producer-defined vs consumer-defined data contracts — why who writes the contract determines whether the initiative succeeds.

dbt, data quality, data engineering
Note

GA4 Engagement Event Query Recipes

Production-ready BigQuery SQL for GA4 engagement events — page views, scroll depth, outbound clicks, file downloads, and video engagement funnels.

ga4, bigquery, analytics
Note

GA4 First dbt Models Tutorial

Hub note for building your first GA4 dbt models — from understanding the raw event schema through base, intermediate, and mart layers.

ga4, dbt, bigquery, data engineering
Note

Advertising Data in the Warehouse

Hub note for the complete guide to centralizing advertising data — from the measurement problem through extraction, pipeline challenges, and dbt transformation patterns

bigquery, dbt, google ads, data engineering
Note

Unit Testing Conversion Funnels in dbt

How to unit test funnel analysis models in dbt — step-over-step conversion rates, user drop-off tracking, and the step-skipping edge case.

dbt, ga4, testing, analytics
Note

Organizing Lightdash Metrics at Scale

How to keep a large Lightdash implementation navigable — groups, group_details, the Metrics Catalog with Spotlight categories, and reusable parameters for values that change across deployments.

dbt, analytics, data modeling
Note

Migrating Incremental Models to Microbatch

How to convert traditional dbt incremental models to the microbatch strategy — step-by-step migration, side-by-side code examples, and first-run considerations.

dbt, incremental processing, data engineering
Note

dbt Intermediate Layer Patterns

What belongs in dbt intermediate models — joins, business logic, window functions — and the critical rule of never reducing grain.

dbt, data modeling, data engineering
Note

dbt Ad Reporting Patterns

How to model advertising data in dbt — the dbt_ad_reporting package, cross-platform UNION patterns, platform-specific normalization, and reconciliation testing

dbt, google ads, data modeling, analytics
Note

Ad Platform Attribution Bias

Why every ad platform overcounts conversions, how walled-garden incentives create measurement gaps, and what only becomes visible when ad data lives in the warehouse

google ads, analytics
Note

Measuring Data Latency Before Choosing an Incremental Strategy

How to profile the gap between event time and load time in your source tables, and use that distribution to size lookback windows and choose the right incremental strategy.

dbt, bigquery, incremental processing, data quality
Note

Elementary CLI profile configuration

How to configure the Elementary CLI (edr) profile for BigQuery, Snowflake, and Databricks, including the gotchas that differ from your dbt profile.

dbt, elementary, bigquery, snowflake
Note

MetricFlow time spine

The MetricFlow time spine is a continuous date table used for cumulative metrics and time series gap filling. How to create it, configure it, and understand when it's required.

dbt, data modeling, analytics
Note

Lightdash's Semantic Layer vs MetricFlow

How Lightdash's native metric layer differs from MetricFlow — simpler syntax, tighter coupling, no cross-platform API — and when the tradeoffs favor each approach.

dbt, analytics, data modeling
Note

dbt-expectations Test Reference

A categorized reference of the highest-value dbt-expectations tests — table-level, pattern, range, multi-column, and completeness — with BigQuery-ready YAML examples.

dbt, data quality, testing
Note

Markov Attribution SQL Implementation

SQL patterns for extracting journey paths and calculating transition probabilities in BigQuery, the data preparation layer for Markov chain attribution

bigquery, dbt, analytics, data modeling
