A garden of working notes.
Short, atomic notes on analytics engineering, dbt, BigQuery, marketing data, and AI agents. Topic guides stitch them into starting points — pick one and follow the threads. Filter by domain or topic, or just browse.
Terminal Fundamentals
What the terminal actually is, how it differs from a shell, and the working directory mental model that makes navigation intuitive
Data Contract Anti-Patterns
Where data contract initiatives go wrong: misplaced enforcement, paper-only contracts, one-size-fits-all implementations, and unfunded ownership.
HubSpot to BigQuery Pipeline Hub
All the moving parts for a HubSpot-to-BigQuery pipeline with dbt: associations, lifecycle stages, deal stages, property history, ingestion tools, and the dbt_hubspot package.
How Lightdash Connects to Your dbt Project
The three mechanisms for connecting Lightdash to a dbt project — Git repository integration, CLI deployment, and CI/CD automation — and how Lightdash generates a BI layer from dbt YAML.
Data Contract Definition
What a data contract is, how it differs from schema tests and data quality checks, and why the 'non-consensual API' framing matters.
MCP Data Quality Server Pattern
A practical MCP server pattern for data quality — running validation checks, retrieving quality scores, and surfacing tables that need attention.
BigQuery Row Access Policies
Dynamic row-level filtering using CREATE ROW ACCESS POLICY — replace per-segment views with policies that apply automatically based on querying user identity.
Attribution Model Disagreement as Signal
Why running multiple attribution models in parallel reveals more than any single model, and how to use the disagreement between them to communicate uncertainty and drive better decisions
GA4 BigQuery Schema Hub
Hub note connecting all concepts needed to understand and query the GA4 BigQuery export schema — table types, nested structures, gotchas, and query patterns.
Contract-First Development in dbt
Defining the contract before writing the SQL — the API design analogy, the workflow, and how ODCS + Data Contract CLI can generate dbt model YAML.
IAM Drift Monitoring for GCP
Catch IAM debt before it accumulates — IAM Recommender, INFORMATION_SCHEMA job monitoring, and audit log queries to detect permission drift quarterly.
BI Tool Self-Hosting and Licensing
How MIT, AGPL, and proprietary licensing affect what you can do with self-hosted BI tools — feature gates, copyleft obligations, and what 'free' actually means for Lightdash, Metabase, and Looker.
BigQuery Cost Optimization
A structured guide to BigQuery cost optimization covering the cost model, query patterns, dbt configurations, pricing models, storage billing, and governance.
Lightdash in Production: Kubernetes Deployment
Moving Lightdash from Docker Compose to Kubernetes with the community Helm chart — production checklist, external dependencies, authentication options, and upgrade strategy.
MCP SDK Selection for Data Engineering
Choosing between the Python and TypeScript MCP SDKs — installation, capabilities, and which one fits your data engineering team.
Analytics Engineer Skills in the Agent Era
Seven skills worth investing in now that agents handle execution — AI orchestration, specification engineering, critical code review, domain expertise, governance, systems thinking, and tool fluency.
Microbatch Backfill and Full Refresh Protection
How to use dbt's built-in microbatch backfill commands, retry failed batches, and protect large incremental tables from accidental full refreshes.
dbt Documentation CI Enforcement
Tools and patterns for enforcing dbt documentation completeness in CI — dbt-coverage, dbt-checkpoint, dbt-score, and dbt-bouncer
Google Ads DTS dbt Integration
How to model Google Ads BigQuery DTS tables in dbt — source configuration, incremental strategy for partition replacement, and conversion lookback windows.
dlt and BigQuery Integration
How dlt loads data into BigQuery — the two loading strategies (streaming vs. GCS staging), the bigquery_adapter for partitioning and clustering, nested JSON normalization, and the metadata tables dlt creates.
dbt observe-fix remediation pattern
How to embed self-healing logic directly in the dbt DAG by detecting problems in base models and applying fixes in downstream layers.
BigQuery Cost Governance Guardrails
Query-level limits, project-level quotas, authorized views, and access patterns that prevent expensive BigQuery mistakes before they happen.
Elementary edr monitor alerting
How edr monitor works, how it differs from edr report, and how to configure alert metadata in model YAML to control who gets notified and when.
Baseline vs. Autoscaling Slots in BigQuery
How baseline and autoscaling slots work in BigQuery Editions: guaranteed capacity vs. elastic scaling, the 60-second autoscale window, and slot usage priority.
dbt-project-evaluator for documentation enforcement
How dbt-project-evaluator and dbt_meta_testing enforce documentation completeness in CI — materializing coverage as models and setting folder-level requirements
Attribution Dashboard Design
How to design attribution dashboards for multiple audiences — essential metrics, audience-tiered hierarchy, Looker Studio implementation patterns, and working around BI tool limitations
Analytics Engineer as Director of AI
The role identity shift as agents take over execution — from producing analytical work to directing it. What stays human, what moves to agents, and how to think about your own value in the transition.
Lightdash + dbt YAML: Metrics Reference Hub
Hub note for Lightdash metric configuration in dbt YAML — dimensions, metric types, joins, and scaling organization.
Elementary alert fatigue reduction
How to configure suppression intervals, alert grouping, and sampling controls in Elementary to keep signal-to-noise ratio high as test suites grow.
Modern BI Landscape
Hub note for understanding BI in 2026 — the semantic layer, metrics-as-code, headless BI, dbt centrality, and how to choose a tool
dbt Package CI/CD
How to set up CI/CD for dbt packages — matrix testing across warehouses and dbt versions with GitHub Actions, credential management, and the integration test workflow.
dbt Doc Block File Organization
How to organize doc block files in a dbt project — per-directory, per-model, centralized, and hybrid approaches with practical tradeoffs
Medallion Lakehouse on GCP
How the bronze-silver-gold medallion architecture maps to BigQuery table types, with BigLake Iceberg for flexibility and native tables for performance.
Fivetran dbt Packages for CRM
What dbt_salesforce and dbt_hubspot provide out of the box — model coverage, configuration, pass-through columns, history mode support, and naming convention tradeoffs.
BigQuery Editions and Slot-Based Pricing
When to switch from on-demand to slot-based pricing, how autoscaling works, committed use discounts, and a feature comparison across BigQuery editions.
The Rule of Three for dbt Macros
Why you should wait until the third occurrence of a pattern before extracting a dbt macro — and what goes wrong when you don't.
GA4 event_params Type Detection
How GA4 auto-detects parameter types across string_value, int_value, and double_value fields — and the defensive COALESCE pattern when the type isn't guaranteed.
GA4 Reporting Identity Modes
How GA4's three reporting identity modes (Blended, Observed, Device-based) apply user resolution in the interface — and why none of that logic reaches BigQuery.
dbt Source Schema Validation
How to validate source schema in dbt when contracts can't reach — using dbt-expectations on sources to catch column drift before transformation runs.
dlt RESTClient Mechanics
How dlt's RESTClient works — instantiation, the paginate() method, key parameters, and built-in error handling with retry and backoff.
OpenClaw Reporting Assistant
A reading map for the OpenClaw client KPI reporting guide — GA4 skill integration, dashboard scraping tradeoffs, direct warehouse queries, multi-client architecture, and Slack summary formatting.
Entity-Centric Naming for dbt Intermediate Models
Why intermediate models should be named for the entity they represent, not the transformation they perform — and the self-documenting join notation that makes it work.
Elementary alerting hub
A reading path through Elementary's alerting system, from the edr monitor command through Slack/Teams setup, filter-based routing, alert fatigue reduction, and on-call strategy.
Elementary HTML report generation
How the edr report command works, which flags matter in practice, and patterns for generating targeted reports for different audiences.
Claude Code Slash Commands for dbt
How to create custom slash commands in Claude Code that automate repeatable dbt workflows — test generation, model documentation, and prompt validation
MCP Server Project Setup
Step-by-step project initialization for a custom MCP server — directory structure, dependencies, client installation, and the typical project layout.
EL Tool Schema Contract Modes
How dlt, Fivetran, and Airbyte handle schema changes during extraction and loading — from dlt's granular freeze/evolve/discard modes to Fivetran's blunt blocking settings.
Data Observability Tool Landscape
A reference comparison of data observability tools in 2026 — Elementary, Monte Carlo, Soda, Bigeye, Datafold, and Atlan — covering capabilities, pricing, and positioning.
dbt Cross-Database Array Operations
How array syntax diverges across BigQuery, Snowflake, and Databricks — UNNEST vs LATERAL FLATTEN vs EXPLODE — and dispatch macros to handle it.
Dataform-to-dbt Migration Hub
Hub note connecting all garden notes related to migrating from Dataform to dbt — decision criteria, concept mapping, templating differences, and validation.
Floating-Point Precision in Data Comparison
Why exact equality fails for floating-point values in data comparison, and practical strategies for handling precision mismatches.
MCP Context Window Overhead
The concrete token cost of MCP tool definitions in an LLM's context window — measurements from Anthropic and practitioners, and why it matters for long sessions.
BI Tool Migration and Portability
Switching costs between BI tools depend on where your metric definitions live. LookML is proprietary and expensive to migrate away from. dbt YAML and Metabase's per-question definitions are more portable.
GA4 Event Data Structure
How GA4 structures event data in BigQuery — the event model, nested parameters, and the patterns you need to query it effectively.
Data Contract Adoption Challenges
Why data contract initiatives fail — the execution gap between contract-as-documentation and contract-as-enforcement, and the cultural change that matters more than the YAML.
Zero-Downtime Table Materialization in dbt
A custom dbt materialization that builds to a temp name, validates row counts, then swaps via rename — keeping the old table queryable until the new one is confirmed ready.
Airbyte Pricing and Self-Hosting Costs
Airbyte's February 2025 capacity-based pricing model and the hidden infrastructure costs of self-hosting — NAT Gateway, Kubernetes overhead, and what 'free' actually costs.
GitHub Actions for dbt Scheduling
Using GitHub Actions scheduled workflows as a zero-infrastructure dbt runner — what it covers well, where it falls short, and when to use it over Cloud Run.
GTM Server-Side Hosting on AWS
How to host the GTM Server-Side tagging container on AWS using ECS Fargate, why App Runner costs more, and why Lambda is architecturally incompatible.
Fivetran MAR Pricing Shift
How Fivetran's March 2025 shift to per-connector MAR pricing broke the economics of managed ELT — bulk discount elimination, 4-8x cost increases, and the marketing data problem
Browser Cookie Restrictions in 2026
How Safari ITP, Firefox Total Cookie Protection, and Chrome handle tracking cookies differently in 2026 — and why the combined effect means client-side tracking misses 20-40% of visitors.
dbt Unit Test CI/CD Workflow
A production-ready GitHub Actions workflow for running dbt unit tests on BigQuery — unique CI datasets, the --empty flag, cost optimization, and production exclusion.
Stale documentation is worse than missing documentation
Why outdated documentation that looks complete causes more damage than obvious gaps — the false confidence problem in data teams
n8n RSS-to-Notion Workflow
How to build an automated RSS reader that fetches, cleans, and stores articles in Notion using n8n, Jina AI, and ChatGPT.
GA4 Identity Graph in BigQuery
How to build a production identity graph from GA4 BigQuery data — mapping user_id to all associated devices, detecting shared devices and anomalies, and structuring forward and reverse lookups.
When to Write Custom dbt Materializations
Decision framework for when custom dbt materializations are worth the maintenance burden versus post-hooks, macros, or built-in incremental strategies.
OpenClaw Persistent Memory for dbt Context
How to load dbt project documentation, schema descriptions, and failure history into OpenClaw's persistent memory so that monitoring reports include business context rather than just technical output.
BigQuery Data Lake Patterns
A reading guide for understanding BigQuery data lake architecture: table types, the medallion lakehouse pattern, catalog strategy, performance, cost optimization, and common mistakes.
dbt Project Structure and Naming
How to organize a dbt project — folder structure, model naming conventions, layer responsibilities, and dbt_project.yml configuration patterns
BigQuery ML for Lead Scoring
Train a logistic regression or boosted tree model to predict lead conversion directly in BigQuery SQL — including the TRANSFORM clause, class imbalance, and how to evaluate model quality.
Salesforce Ingestion Tool Selection
Choosing between Fivetran, Airbyte, dlt, Hevo, and custom Python for Salesforce extraction — connector mechanics, cost realities, and the AppExchange dispute.
Headless BI Pattern
The architectural pattern of decoupling the semantic layer from visualization — exposing metrics via APIs so any frontend, AI agent, or application can consume governed data
GA4 dbt Project Configuration
The dbt_project.yml setup for a GA4 project — variable-driven configuration, folder-level materializations, and the project variables that make the template reusable.
dbt Hub Publishing
How to publish a dbt package to the dbt Hub — requirements, the registration process, hubcap automation, and best practices for version management.
dbt Test Severity and Performance Tuning
How to configure dbt test severity levels, optimize expensive tests on BigQuery, and structure test execution for cost-effective data quality.
BigQuery MCP Toolbox Setup
Installing and configuring Google's open-source MCP Toolbox for Databases — the self-hosted option for connecting BigQuery to AI assistants with ADC authentication.
Salesforce vs HubSpot Data Models
How Salesforce and HubSpot structure CRM data differently — metadata-driven relational models vs many-to-many associations — and what that means for warehouse modeling.
The full_refresh: false Guard in dbt
When and why to set full_refresh: false on dbt incremental models — preventing accidental multi-hour rebuilds while keeping intentional full refreshes possible.
dlt Authentication Patterns
The authentication strategies dlt provides for API pipelines — bearer tokens, API keys, OAuth2 client credentials — and how to extend them for non-standard flows.
MCP Transport Configuration
Practical configuration for MCP's two transport modes — stdio for local development and streamable HTTP for production deployment.
BigQuery Data Lake Common Mistakes
Three anti-patterns that cause the most problems in BigQuery data lake implementations: missing metadata caching, skipped partition filters, and over-engineered architectures.
Google Workspace CLI for AI Agents (Hub)
Hub note for the gws CLI ecosystem — the tool itself, agent-first design principles, OAuth setup, CLI vs MCP tradeoffs, and Sheets as a data source.
Dataform Dynamic Model Generation
How Dataform's JavaScript enables programmatic DAG construction — generating dozens of models from a single loop — and what dbt teams do instead.
Context Window Compaction and Agent Safety
How LLM context window compaction causes AI agents to lose or deprioritize stop commands during long-running tasks — and why bulk data operations are the highest-risk scenario.
dlt Secrets Management
How dlt's configuration hierarchy keeps credentials out of code — the priority order, secrets.toml for local development, environment variables for CI/CD, and vault integrations.
dbt Package Anatomy
What makes a dbt package different from a regular project — the three design principles, standard directory structure, and dbt_project.yml configuration for reusable packages.
Unified Ad Model Downstream Patterns
What becomes practical once you have a unified cross-platform ad model — blended ROAS, budget pacing, and Marketing Mix Modeling data preparation.
CRM Modeling Patterns in dbt
How to apply the three-layer dbt architecture to Salesforce and HubSpot data — base model conventions, intermediate enrichment, mart design, and incremental strategies.
dbt Testing Decision Framework
A three-question framework and decision tree for choosing the right dbt testing approach — unit tests, generic tests, singular tests, dbt-expectations, Elementary, or dbt-audit-helper.
Cloud Functions as a dbt Execution Environment
When and why to use Google Cloud Functions to run dbt Core — how it compares to Cloud Run Jobs, what it's good at, and where it falls short.
Fivetran-dbt Merger and Orchestration Independence
Why the October 2025 Fivetran-dbt merger makes external orchestration more strategically important — vendor optionality, platform lock-in risk, and the case for controlling your orchestration layer.
dbt Docs Markdown Capabilities
What Markdown works in dbt docs and what does not — supported syntax, YAML scalar styles, image embedding, cross-referencing models, and known limitations
GA4 User Backstitching
How to retroactively apply GA4 user_id to anonymous sessions in the warehouse — the SQL pattern, shared device handling, and when backstitching is worth the complexity.
Build vs. Buy Data Pipeline Economics
The three converging shifts that flipped the build-vs-buy calculation for data pipelines — pricing changes, AI-assisted development velocity, and open-source maturity
OpenClaw vs Claude Code vs Cursor for Data Work
A clear-eyed comparison of three AI tools data people actually use — what each is for, where each falls short, and why the best practitioners run all three as a layered stack.
GA4 dbt Unnesting Layer Architecture
How to structure a dbt project for GA4 unnesting — base layer for parameter extraction, intermediate for event-specific models, mart for analytics-ready aggregations.
Pipeline retry and catch-up patterns
How to configure retries, exponential backoff, and catch-up mechanisms in data pipelines so that transient failures resolve themselves without human intervention.
BigQuery HyperLogLog Sketches
How HyperLogLog++ sketches in BigQuery enable composable, approximate distinct counts at a fraction of the cost of exact counting.
Attribution Touchpoint Table Design
How to design and build the touchpoint table that all attribution models consume: field requirements, identity considerations, and the intermediate dbt model that maps raw events to attribution-ready rows
LinkedIn Ads Pipeline — Hub
dbt-audit-helper Hub
Hub note for dbt-audit-helper — the progressive validation workflow, macro reference, CI/CD integration, and related comparison topics.
Privacy Constraints for Linked Analytics Data
GDPR and CNIL implications when linking GA4 cookie identifiers to CRM contact records — consent exemption loss, right to deletion cascades, and the architectural requirements for compliant Customer 360 models.
Elementary for dbt
How Elementary extends dbt with data observability — anomaly detection, automated freshness monitoring, test result history, and Slack alerting
AI Tool Tiers for Data Engineering
The four capability tiers of AI tools for data engineering — autonomous agents, copilots, chat assistants, and platform-embedded AI — and why context determines which tier delivers value
First-Party Data and Compliance Hub
Hub connecting the browser restrictions, server-side infrastructure, EU/US legal frameworks, and identity resolution approaches that together determine how much advertising and analytics signal you can legally collect in 2026.
dbt-utils generate_surrogate_key
How generate_surrogate_key works, why null handling matters, and why migrating from the old surrogate_key() macro can silently break incremental models and snapshots.
dbt Service Account Setup for Multi-Project GCP Architectures
How to create and configure a dbt service account when your source data, transformation output, and compute infrastructure live in separate GCP projects.
Claude Code for dbt Development
A reading path through the core workflows for using Claude Code in a dbt project — base models, tests, documentation, debugging, refactoring, and prompting.
Looker Studio Limits and Upgrade Path
The hard technical limits of Looker Studio that optimization can't fix, what Looker Studio Pro actually adds, and when to evaluate enterprise Looker or alternative BI tools.
LinkedIn Marketing API Access
How to get approved for LinkedIn's Marketing API — the developer app setup, super admin verification, manual review process, rejection handling, and what to include in your application.
Elementary custom BI dashboards
How to build custom data quality dashboards in any BI tool by querying Elementary's warehouse tables directly, with example SQL for the most useful metrics.
dbt Dispatch Configuration
How to configure dbt's dispatch search order in dbt_project.yml — overriding package macros, adding Databricks support via spark_utils, and namespace resolution.
Essential Terminal Commands
The core terminal commands for navigation, file operations, viewing content, and finding things — the foundation of terminal literacy
dbt Testing Taxonomy
A taxonomy of dbt test types — generic tests, singular tests, unit tests, contract tests, and data quality packages like dbt-expectations
Ad Pipeline Engineering Challenges
The operational challenges of maintaining advertising data pipelines — API rate limits, schema changes, attribution window normalization, currency handling, and privacy compliance
BigQuery Partitioning Mechanics
How BigQuery partitioning physically divides tables, the three partitioning types, key constraints, and when partition pruning does and doesn't work.
Incremental Models in dbt
How dbt incremental models work, when to use them, the available strategies, and the trade-offs you need to understand.
MCP Client Landscape
The major MCP clients — desktop apps, code editors, and CLI tools — and how to choose between them based on your workflow.
Building MCP Apps Visualization Server
How to build a custom MCP Apps visualization server in TypeScript — registering app tools with UI metadata, serving HTML resources, and implementing the client SDK for bidirectional communication.
LLM Accuracy With Semantic Layers
Research benchmarks showing how semantic layers improve LLM accuracy on enterprise data questions from ~17% to 54-92% — the data.world study, Spider 2.0, and dbt Labs replication.
dbt Packageable Model Patterns
Three patterns that make dbt models installable by anyone — configurable sources with var(), enable/disable flags, and namespaced model names.
dbt Unit Test CLI Commands
How to run, filter, debug, and exclude dbt unit tests from the command line — including output interpretation and production exclusion patterns.
Metric Naming Conventions in dbt
How to name MetricFlow metrics so they stay discoverable and consistent as your project scales — patterns by metric type, grouping families, and the name vs label distinction
dbt documentation coverage tracking
Measuring and trending dbt documentation coverage over time with dbt-coverage, dbt-score, and dbt Cloud — moving beyond pass/fail CI checks to spot erosion early
GA4-Specific dbt Testing Patterns
Data quality tests for GA4 dbt projects that catch tracking failures standard schema tests miss — missing session_start events, orphaned transactions, suspicious session metrics.
Dagster Fundamentals Hub
Hub note connecting all Dagster core concept notes — the asset-centric model, SDAs, resources, components, UI, pricing, GCP deployment, learning curve, and the dbt integration.
BigLake Performance Characteristics
How BigLake external and Iceberg tables perform relative to native BigQuery tables, the role of metadata caching, and where the remaining gaps matter.
dbt Model Description Writing Patterns
Practical patterns for writing dbt model, column, and source descriptions that serve both business users and engineers — the three-question framework and when to use meta instead of description
Base Model Generation with Claude Code
How to use Claude Code to generate dbt base models — the pattern-replication workflow, prompting constraints, and CLAUDE.md defaults that eliminate inconsistency.
Data Contract Rollout Change Management
The organizational change management strategy for data contracts: start with two datasets, create urgency through visible cost, and measure conversations rather than coverage.
dbt-utils Hub
Navigation hub for dbt-utils v1.3 — the full scope of the package, what moved to dbt-core, and pointers to each section of the reference.
Dataform Ecosystem and Tooling Gaps
Where Dataform falls short beyond testing — CI/CD automation, IDE tooling, package ecosystem, and platform lock-in compared to dbt
GA4 Consent Mode Orphaned Events
How Consent Mode creates rows in GA4 BigQuery exports with null user_pseudo_id and session identifiers — what they are, how they affect counts, and same-page backstitching behavior.
SCD Type 2 with dbt Snapshots
How dbt snapshots implement slowly changing dimension Type 2 — tracking every version of a record over time with timestamp and check strategies, plus Fivetran History Mode as an alternative.
dbt Base Layer Patterns
What belongs in dbt base models — renaming, casting, deduplication, unnesting — and the one exception to the no-joins rule.
Dagster Learning Curve for Analytics Engineers
Where the friction shows up when analytics engineers adopt Dagster — Python proficiency, conceptual overhead, manifest management, pricing surprises, and the best onboarding path.
dbt documentation automation strategy
A graduated approach to automating dbt documentation freshness — from a single pre-commit hook to comprehensive drift detection, coverage tracking, and AI remediation
Event-Grain Sessionization
Why enriching events with session context beats building session-grain tables, and how the pattern enables flexible downstream analysis.
dbt Model Contract Mechanics
How dbt's native model contracts work — the preflight check, DDL generation, fail-fast behavior, configuration options, and what contracts do and don't validate.
dlt Dependent Resources
How dlt lets one resource use another's output to configure its endpoint — the path template syntax for multi-step API traversal.
AI Query Cost Control for BigQuery MCP
Managing the cost and safety risks of AI assistants running BigQuery queries through MCP — cost mitigation, write protection, and practical guardrails.
OpenClaw Morning Briefing Pattern
How to configure an OpenClaw cron job to deliver a daily personal briefing — covering calendar, email priority, pipeline status, and time tracking — to Telegram before your first coffee.
dbt Unit Test Mocking Dependencies
How to mock refs, sources, macros, variables, and the 'this' keyword in dbt unit tests — with patterns for multi-join models and incremental overrides.
Consent Mode Debugging Network Parameters
How to decode the gcs and gcd parameters in Google Analytics network requests to verify Consent Mode implementation without relying on CMP interfaces.
GA4 BigQuery Number Discrepancies
Why your BigQuery session and user counts won't match the GA4 interface, and the practical approach to handling the 1-5% variance.
Preparing for the dbt Analytics Engineering Certification
What the dbt Analytics Engineering certification actually tests, where people get tripped up, and why hands-on project experience matters more than studying.
Agentic AI Fit for Data Work
Why data engineering is structurally well-suited for agentic AI tools — repetitive patterns, multi-language context-switching, and cross-layer debugging make the case.
dbt Unit Test File Organization
Where to put dbt unit test files, how to name tests consistently, and the co-location pattern with _unit_tests.yml.
direnv for Multi-Client GCP Credential Management
Automate per-project GCP credential loading with direnv — .envrc configuration, the four-variable pattern, and a five-minute setup for each new client.
OpenClaw Cron Scheduler Mechanics
How OpenClaw's built-in cron scheduler works — session modes, job persistence, exponential backoff, and the configuration patterns that make scheduled monitoring reliable.
BI Tool Selection Framework
A decision framework for choosing a BI tool in 2026 — four key questions, a comparison of Lightdash vs Looker vs Metabase, and the market landscape from dbt-native to enterprise tools
dbt-utils Web Macros for URL Parsing
dbt-utils URL extraction macros for marketing analytics: get_url_host, get_url_path, and get_url_parameter. What they do, where they're useful, and what they don't handle.
Dataform Decision Framework
When Dataform is the right choice and when dbt wins — a decision framework based on platform commitment, budget, team preferences, and use case complexity
dbt Built-In Cross-Database Macros
Reference for dbt's built-in cross-database macros in the dbt namespace — dateadd, datediff, safe_cast, concat, type helpers, and the migration path from dbt_utils.
Templating Language and Team Skills
How a team's existing skill mix — SQL practitioner, Python engineer, JavaScript developer — should shape the choice between Jinja and JavaScript templating in analytics engineering.
GA4 Channel Grouping Macro
A dbt macro that encapsulates Google's default channel grouping logic as reusable SQL, with the regex patterns and edge cases you need to know.
GCP Authentication for Multi-Client Consulting Work (Hub)
Hub note for GCP credential isolation across multiple client projects — the problem, the four-variable solution, tool-specific agent constraints, and the service account vs impersonation tradeoff.
Google Sheets as Analytics Data Source
How Google Sheets functions as a shadow data source in GCP analytics stacks — the integration patterns, the automation gap gws fills, and the convergence of data and productivity tooling.
Dagster Full-Stack Pipeline Architecture
How Dagster unifies ingestion, transformation, Python processing, and downstream triggers in a single asset graph — the pattern that justifies Dagster over simpler orchestration approaches.
dbt Contract Rollout Strategy
How to adopt dbt model contracts in an existing project — identifying candidates, scaffolding YAML, phased enablement, and CI/CD integration for governance-only checks.
GA4 Schema Evolution Monitoring
GA4's BigQuery export schema changes without announcement, and new fields are never populated retroactively. How to detect additions before they break production queries.
Per-Workload Service Account Naming Conventions
One service account per workload with a compute-platform prefix — so logs, cost attribution, and incident response all point to the right place immediately.
dbt Slot Management on BigQuery
How dbt's execution model interacts with BigQuery slots: why dbt is compute-heavy, the multi-project workaround, and best practices for sizing slots for dbt workflows.
Privacy Sandbox Collapse
How Google's Privacy Sandbox went from the industry's best hope for a cookie replacement to a quiet retirement — the timeline, what survived, and why it sealed the case for server-side infrastructure.
dbt Core vs Cloud Decision Framework
A structured comparison of dbt Core and dbt Cloud across deployment, interface, features, pricing, and team profile -- with decision heuristics for choosing between them.
Attribution Lookback Windows
How to set attribution lookback windows by industry and purchase cycle -- benchmarks, consequences of wrong windows, and implementation in SQL
CLI vs MCP for AI Agents
The practical tradeoffs between CLI commands and MCP tool calls for AI agent workflows — benchmark data, token efficiency, and when each approach wins.
AI Tooling Cost for Solo Consultants
What a four-layer AI stack actually costs per month for an independent analytics engineering consultant — tool-by-tool breakdown, ROI assessment, and cost visibility gaps
dbt-audit-helper Progressive Validation
The broad-to-narrow validation workflow for dbt-audit-helper — start with schema checks, escalate to row-level diffs only when needed.
Claude Code Hooks
How hooks give Claude Code deterministic guardrails — shell commands that execute at specific lifecycle points to enforce rules, auto-format code, and block dangerous operations
Cross-Platform Ad Testing Patterns
How to test unified ad reporting models in dbt — source freshness, spend reconciliation, grain testing, and the manual checks that automated tests can't replace.
Metric Anti-Patterns in dbt
Common mistakes when defining MetricFlow metrics — one-off models for metrics, sum-of-ratios errors, hardcoded measure filters, and missing descriptions
Unit Testing Window Functions in dbt
How to design test data that validates window function partitioning, ordering, and framing — with patterns for ROW_NUMBER, FIRST_VALUE, cumulative sums, and deliberate out-of-order inputs.
BigQuery CLI Capabilities Beyond MCP
What the bq command-line tool can do that BigQuery MCP servers cannot — data loading, exports, table management, and the full feature gap with examples.
SQL Dialect Divergences Across Warehouses
Where SQL syntax breaks across BigQuery, Snowflake, and Databricks — date functions, type casting, and argument ordering differences that matter for portable dbt code.
Orchestrator Learning Curves
An honest assessment of ramp-up time and friction points for Dagster, Airflow, and Prefect — what trips up analytics engineers and what helps.
dbt Macro Deprecation Pattern
How to change macro behavior without breaking callers — the staged deprecation pattern using exceptions.warn() that dbt-utils demonstrates.
dbt Packages vs Mesh
When to use dbt packages (code sharing) vs dbt Mesh (data product sharing) — the conceptual distinction, practical differences, and how to choose.
Data Comparison Tool Landscape
When to use dbt-audit-helper, Elementary, dbt-expectations, Datafold, or Soda for data comparison and validation.
Salesforce Opportunity Stage Duration Analysis
How to calculate time spent in each pipeline stage using OpportunityFieldHistory and LEAD window functions — the SQL pattern, downstream analysis, and win rate metrics.
GA4 Acquisition Performance Mart
A daily × source/medium grain mart for GA4 acquisition reporting — aggregating sessionized events into dashboard-ready metrics with conversion rates and revenue.
dbt Materialization Cost Impact on BigQuery
How dbt materialization choices affect BigQuery costs -- table vs view vs ephemeral trade-offs, the view chain anti-pattern, and why defaulting to tables usually wins.
Your First Hour with Claude Code (Analytics Engineer)
A sequenced reading path for getting started with Claude Code as an analytics engineer — from installation through your first useful output
dbt Documentation Scaffolding Tools
How dbt-codegen and dbt-osmosis handle the mechanical parts of documentation — generating YAML skeletons and propagating descriptions through your DAG
GTM Server-Side: Map of Content
Index of garden notes on GTM Server-Side — architecture, Cloud Run deployment, GA4 configuration, Meta CAPI, Google Ads, hosting costs, and common failures.
BigQuery Dynamic Data Masking
Show sensitive column structure without exposing values — SHA256 hashing, nullification, and default masking for analysts who need to write queries but not read PII.
dbt Model Description Style Guide
Hub note for the dbt documentation style guide — why consistency beats effort, what to put in model and column descriptions, YAML formatting options, doc blocks, CI enforcement, and rollout strategy
EU Cookie Consent Legal Framework
The two overlapping EU legal frameworks governing cookie consent — ePrivacy Directive and GDPR — what valid consent actually requires, which cookies are exempt, and where enforcement stands in 2026.
GA4 CROSS JOIN versus LEFT JOIN UNNEST
Why the comma syntax in FROM table, UNNEST(array) silently drops rows — and when to use LEFT JOIN UNNEST to preserve events without array data.
dlt Incremental Loading
How dlt tracks state between pipeline runs using cursor-based incremental loading — the dlt.sources.incremental() helper, declarative REST API config, and why state lives in the destination.
Lead Scoring Signal Dimensions
The four categories of signals that drive lead scoring — demographic fit, firmographic fit, behavioral engagement, and recency — and why the warehouse sees all of them when the CRM can't.
Removal Effect in Attribution
The removal effect measures how much conversion probability drops when a channel is removed -- the mathematical foundation of both Markov chain and Shapley value attribution
OpenClaw for dbt Monitoring
Using OpenClaw as an always-on monitoring layer for dbt projects — cron-based testing, Slack alerting, mobile access, and practical use cases for solo consultants
FastMCP Server Skeleton
Minimal MCP server examples in Python (FastMCP) and TypeScript (McpServer) — the starting point for any custom server build.
AI Judgment Failures in dbt Development
The category of mistakes AI makes in dbt projects that aren't syntax errors — wrong joins, rebuilt existing assets, wrong layer sourcing — and why they require business context that no prompt can fully provide.
MCP Protocol Architecture
What the Model Context Protocol is, how clients and servers communicate, and why it matters for connecting AI tools to your data infrastructure.
Claude Code Behind the Scenes
What commands Claude Code actually runs when it explores code, searches for patterns, edits files, and manages git — understanding the mechanics builds confidence and helps you learn
MCP Data Engineering Servers
The MCP servers that actually matter for data engineering work — Snowflake, BigQuery, ClickHouse, centralmind/gateway, MindsDB, and Confluent.
BigQuery Autoscaling Cost Overhead
Why theoretical slot-hour costs rarely match your actual BigQuery bill — the 1.5x autoscaling multiplier, 60-second billing window, and how workload shape changes everything.
dbt Operational Slash Commands
Practical Claude Code slash commands for daily dbt operations — building models, generating base models, running modified code, auditing quality, and cleaning up artifacts
dbt-utils SQL Generators
Reference for dbt-utils SQL generation macros: date_spine, deduplicate, star, union_relations, pivot, unpivot, and the smaller helpers. What each does, how to call it, and the gotchas.
Automating dbt Docs Deployment
Patterns for keeping dbt docs automatically updated — CI/CD workflows, Astronomer Cosmos operators, and tools that push documentation to platforms like Notion
AI SQL Review Tradeoffs
The practical costs of AI SQL review — false positive rates, conflicting tool feedback, CI latency, annual spend, and the configuration investment that makes it worthwhile
Window Function Patterns for Analytics SQL
Practical window function patterns for analytics SQL — ROW_NUMBER, LEAD/LAG, running totals, session detection, and deduplication
GA4 dbt Package Ecosystem
An overview of the major open-source dbt packages for GA4 BigQuery exports — what they optimize for, what they miss, and when to build custom.
HubSpot Lifecycle Stages in the Warehouse
How HubSpot's lifecycle stage model maps to warehouse columns, why forward-only transitions make funnel analysis straightforward, and how to handle merged contact artifacts.
Elementary Data Quality Dashboards
Hub for building data quality dashboards with Elementary: generating reports, hosting them for team access, building custom BI dashboards, and designing KPIs.
OpenClaw for Freelance Consultants
A reading path through the OpenClaw admin automation use cases for solo consultants — morning briefings, expense capture, personal CRM, and meeting prep.
Server-Side Cookies and Safari ITP Bypass
How setting cookies via HTTP Set-Cookie header from a same-domain server bypasses Safari's 7-day JavaScript cookie cap — the FPID mechanism, the IP mismatch problem, and the three approaches that solve it.
Google Ads to BigQuery: Loading Approaches
Four ways to load Google Ads data into BigQuery — a map through the decision landscape.
Salesforce Unified Activity Timeline
Combining Salesforce Tasks and Events into a single activity timeline with consistent column naming and polymorphic entity resolution.
Orchestrator Architectural Philosophies
The three competing mental models in data orchestration — process-oriented (Airflow), data-oriented (Dagster), and function-oriented (Prefect) — and why the abstraction matters more than the feature list.
Claude Code Stop and Session Hooks
How Stop and SessionStart hooks complement per-tool hooks — running quality gates after Claude finishes responding and loading project context at session start
Meta Ads Actions Array in BigQuery
How to flatten Meta's nested actions JSON array in BigQuery — unnesting patterns, configurable action type pivots, dbt integration, and the action_values companion field.
GA4 Traffic Source Fields
The four traffic source locations in GA4 BigQuery exports — their scopes, use cases, and the July 2024 cutoff that changed session attribution.
AI Developer Skill Atrophy
How AI coding tools affect developer comprehension — Anthropic's RCT, the delegation vs. inquiry distinction, and why the way you use AI matters as much as which tools you pick
Alternatives to Default dbt Docs
When to move beyond the default dbt docs frontend — Dagster's Next.js replacement, dbterd for ERDs, data catalogs, and dbt Cloud Catalog
BigQuery SQL Patterns for Analytics Engineers
A reading guide to essential BigQuery SQL patterns covering query optimization, nested data, window functions, dbt incrementals, and marketing analytics.
Elementary Alert Routing with Filters
How to run multiple edr monitor commands with different filters to route alerts by tag, owner, status, or resource type to different channels and incident management tools.
BigQuery Storage Billing Strategies
Physical vs logical storage billing in BigQuery, long-term storage discounts, table expiration policies, and how to evaluate which billing mode saves money.
dbt Project Structure: Guide Hub
A hub connecting all notes on structuring a dbt project — layers, naming, materialization, YAML, modern features, and marketing analytics patterns.
dbt Test Alert Routing and Ownership
How to route dbt test failures to the right people, configure tiered alert severity, and apply the Broken Window principle to test suite health.
BigQuery MCP Server Setup
A reading path through connecting BigQuery to AI assistants via MCP — comparing the two official options, authentication, custom queries, and cost control.
dbt Unit Test Patterns
Hub note connecting all unit test patterns for dbt — incremental models, snapshots, window functions, business logic, marketing analytics, and edge cases.
IAM Debt Audit for GCP Data Platforms
Bash commands and SQL queries to surface Editor roles, service accounts with keys, and shared credentials — the starting point for any GCP IAM cleanup.
Google OAuth CLI Setup Gotchas
The specific mistakes that cause OAuth setup to fail silently for Google Workspace CLI tools — wrong application type, missing test users, and the scope limit trap.
BigQuery Fine-Grained Access Control
Column-level security with policy tags, row-level security with Row Access Policies, and dynamic data masking — the three layers of fine-grained access control in BigQuery beyond basic IAM roles.
Feature Engineering for ML in dbt
How to structure dbt intermediate models as ML feature tables — including time-windowed aggregations, domain-separated feature sets, and joining them into a labeled training dataset.
Meta Ads Pipeline Maintenance
Operational practices for keeping a Meta Ads pipeline running — token expiry monitoring, spend reconciliation, API version lifecycle management, and circuit breaker patterns.
Markov Chain Attribution
How Markov chains model customer journeys as state transitions to calculate data-driven attribution through transition probabilities and the removal effect
MCP Data Catalog Server Pattern
A practical MCP server pattern for exposing internal data catalogs — table search, metadata retrieval, and lineage tracing as AI-accessible tools.
GTM Server-Side Hosting Costs: Self-Hosted vs Managed
The real cost of running GTM Server-Side — Cloud Run pricing by traffic tier, the Cloud Logging cost trap, and a comparison of managed alternatives (Stape, Addingwell, Cloudflare Zaraz).
Agentic Workflow Shift in Data Engineering
How agentic AI tools change the data engineering workflow from manual template adaptation to describe-and-review — and why the real shift is from syntax to modeling decisions.
GA4 dbt Project Template
Hub connecting all concepts in building a production-ready dbt project for GA4 BigQuery exports — from base model to marts, with testing and documentation.
Cloud Storage Tiering for BigQuery
How to use Cloud Storage tiers and lifecycle policies alongside BigQuery for cost-effective data lake storage, including Autoclass and physical billing.
dbt Macro Testing Patterns
Two approaches to testing dbt macros — integration test models and dbt 1.8 unit tests — plus the compile-and-inspect workflow for debugging.
Ad Platform Metric Divergence
Why impressions, clicks, and conversions mean different things on Google, Meta, and LinkedIn — and why pretending they're equivalent produces misleading cross-platform reports.
Debugging dbt with Claude Code
How to use Claude Code for dbt debugging — letting the agent face errors directly, tracing data issues through upstream models, and using subagents for complex investigations
dlt Pipeline Testing
Testing dlt pipelines locally with DuckDB before hitting production — unit tests with resource limits, integration tests for schema validation, and common debugging patterns.
HubSpot Property History Mechanics
How HubSpot's property history tables work, their retention limits, why CALCULATED properties inflate sync costs, and how to model history data without surprises.
Data Contract Adoption Friction
Reducing the friction that kills data contract adoption: SDK-based onboarding, audience-specific messaging, post-mortem data as leverage, and the Data Product Manager role.
Dagster-dbt Asset Mapping
How dagster-dbt reads your manifest.json to create one Dagster asset per dbt model, with automatic lineage from ref() calls, and how to customize the mapping with DagsterDbtTranslator.
MCP Server Testing and Debugging
Testing MCP servers with the Inspector, the stderr logging gotcha that bites everyone, and a practical three-stage testing workflow.
Custom Sessionization Patterns
How to build custom session definitions from raw events using LAG and running sums, with configurable timeouts, campaign-based splits, and session metrics.
Dagster+ Pricing and Credit Model
How Dagster+ pricing works — the credit model (1 credit = 1 asset materialization), plan tiers, overage costs, and how it compares to dbt Cloud and Cloud Composer for analytics engineering teams.
Orchestrator Comparison for dbt Teams Hub
Hub note for the Dagster vs Airflow vs Prefect comparison — architectural philosophies, dbt integration depth, developer experience, pricing, learning curves, and the decision framework.
dlt Core Concepts
The four building blocks of dlt pipelines — sources, resources, pipelines, and schemas — and the three write dispositions that control how data lands.
GA4 Ecommerce Checkout Funnel Pattern
Session-based checkout funnel analysis from GA4 BigQuery data — counting distinct sessions at each funnel stage from view_item through purchase.
Claude Code CLI Basics
Installation, essential CLI flags, built-in slash commands, and how to read Claude Code's output — the practical starting point for new users
iOS 14.5 Signal Loss and Meta Measurement
How Apple's App Tracking Transparency changed Meta ad measurement — IDFA collapse, default attribution window changes, Aggregated Event Measurement, and Conversions API as the response.
BigQuery Materialized Views
How BigQuery materialized views precompute aggregations, refresh incrementally, and transparently rewrite queries for automatic optimization.
Unit Testing String Extraction in dbt
How to unit test regex and string manipulation logic in dbt — edge case documentation, graceful failure handling, and regression protection for fragile parsing.
Multi-Source Conflict Resolution
Three patterns for resolving conflicting data when merging records from multiple source systems — priority-based, recency-based, and source-specific fields.
Ad Data Extraction Tools
Managed ELT, open-source, and native integration options for getting advertising data into your warehouse — Fivetran, Airbyte, dlt, Meltano, and BigQuery Data Transfer Service
Testing Late-Arriving Data Handling in dbt
How to write dbt unit tests that simulate late arrivals, and how to use audit_helper to detect drift between incremental and full-refresh results in production.
LinkedIn Ads dbt Modeling
How to model LinkedIn Ads data in dbt — the campaign hierarchy rename, metric normalization, cross-platform integration via dbt_ad_reporting, and the incremental strategy for 90-day attribution windows.
GTM Server-Side: Architecture and Four Building Blocks
How GTM Server-Side works as an intermediary layer — the request/response data flow, and the four component types (Clients, Tags, Triggers, Variables/Transformations) that make it up.
Semantic Layer Architecture
How semantic layers work in the modern data stack — competing implementations (MetricFlow, Snowflake Semantic Views, Databricks Metric Views), the OSI initiative, and why the semantic layer determines AI accuracy
BigQuery Slots and Reservations
A reading guide to BigQuery's compute model -- slots, reservations, editions, autoscaling, fair scheduling, and slot management for dbt workflows.
Cloud Run Jobs for dbt
Why Cloud Run Jobs is the optimal dbt execution environment for most GCP teams — capabilities, container setup, authentication, monitoring, and cost profile.
Layered SQL Review Pipeline for dbt
A four-layer architecture for SQL review in dbt projects — IDE feedback, pre-commit hooks, PR-level AI review, and CI testing — each catching a different class of error
Google Ads BigQuery Data Transfer Service Setup
How the Google Ads BigQuery Data Transfer Service works — what it gives you, how the schema is organized, MCC vs per-account setup, and the defaults that will hurt you.
Hosting dbt Docs Beyond Localhost
Deployment options for dbt docs by complexity — GitHub Pages, Netlify, GCS with IAP, S3 with CloudFront, and Docker with Nginx
Unit Testing Attribution Models in dbt
How to unit test first-touch, last-touch, and multi-touch attribution in dbt — multi-session journeys, single-touch conversions, and the no-conversion exclusion pattern.
dlt: Python-Native Data Loading
A reading path through dlt's core mechanics — from building blocks through BigQuery-specific loading to incremental state tracking.
dlt Google Ads Pipeline
Building a Google Ads to BigQuery pipeline with dlt — the verified source, GAQL query patterns, incremental loading, and deployment options.
dbt vs Dataform Templating Hub
Navigation hub for notes comparing Jinja (dbt) and JavaScript (Dataform) templating in analytics engineering — syntax, philosophy, strengths, and team fit.
OpenClaw dbt Data Quality Assistant
A reading path through the building blocks of a 24/7 automated dbt data quality assistant — test execution and parsing, severity assessment, documentation cross-referencing, morning summaries, and an honest maturity assessment.
RAG for dbt Documentation
How retrieval-augmented generation bridges the business context gap in AI-generated dbt documentation — from full RAG pipelines to the simpler CLAUDE.md workaround
BigQuery Partitioning and Clustering
A structured reading path for understanding BigQuery partitioning and clustering -- mechanics, decision framework, configuration patterns, and anti-patterns.
dlt Deployment Options
Where and how to run dlt pipelines in production — GitHub Actions, Airflow, Modal serverless, and other platforms — with the dlt deploy command as the starting point.
GA4 Event Ordering with Batch Fields
How to use batch_event_index, batch_ordering_id, and batch_page_id for deterministic event sequencing in GA4 BigQuery exports.
Service Account Key Files vs Impersonation Tokens
The practical tradeoff between GCP service account key files and short-lived impersonation tokens — when each is appropriate and what the honest security calculus looks like for consultants.
Dataform-to-dbt Concept Mapping
A reference mapping of Dataform concepts to their dbt equivalents — refs, configs, sources, materializations, testing, and directory structure.
BigQuery Remote MCP Server Setup
Google's managed BigQuery MCP endpoint — enabling the service, configuring Claude Desktop and Claude Code, and why token expiration limits its usefulness.
AI Tools for dbt Documentation
A comparison of dbt Copilot, Claude Code with MCP, and Altimate AI for generating dbt model and column documentation — capabilities, limitations, and selection guidance
BigLake Metastore and Catalog Strategy
Why catalog infrastructure matters more than format choice on GCP, and how BigLake Metastore and Dataplex Universal Catalog provide unified governance across engines and formats.
GA4 BigQuery Export Table Types
The four table types in a GA4 BigQuery export dataset — daily events, intraday events, and the two user tables (pseudonymous users and users) — their timing, limitations, costs, and when to use each.
dbt Unit Tests BigQuery Workarounds
BigQuery-specific gotchas for dbt unit tests — STRUCT completeness, ARRAY comparisons, column_transformations, slot costs, and common error solutions.
Code Generation over Tool Calling Pattern
The emerging pattern of having LLMs write code against APIs rather than generate tool calls — Cloudflare's Code Mode, Anthropic's code execution, and what it means for MCP's future.
Lightdash Joins and Fanout Protection
How to define joins between dbt models in Lightdash YAML, why the relationship property matters for metric accuracy, and how Lightdash warns about fanout risk in one-to-many joins.
CRM Data Architecture Hub
Hub note connecting all garden notes on modeling Salesforce and HubSpot data in a modern warehouse with dbt and BigQuery.
dbt Package Integration Testing
The integration_tests sub-project pattern for testing dbt packages — using seeds as mock data, comparing outputs to expected results, and running the full suite.
dbt-audit-helper CI/CD Integration
How to integrate dbt-audit-helper into CI/CD pipelines — dbt Cloud PR jobs, GitHub Actions with --defer, and automated regression detection.
Looker Studio Caching Mechanics
How Looker Studio's per-chart cache works, why date range selection affects cache hit rates, the difference between owner and viewer credential caches, and how to pre-warm dashboards.
Cloud Run Jobs Deployment Script Pattern
An end-to-end deployment script for dbt on Cloud Run Jobs — service accounts, IAM bindings, Artifact Registry, job creation, and scheduling in a single reproducible script.
The AI Production Gap in Data Engineering
Why AI gets you to 80% fast but the remaining 20% — security, compliance, temporal consistency, governance — is where most of the real work lives.
dbt Doc Block Jinja Limitations
What you cannot do inside dbt doc blocks — restricted Jinja context, the README parsing gotcha, and the missing column description inheritance feature
Prompting Claude Code for dbt
What separates dbt prompts that work from ones that produce generic output — specificity, codebase references, constraint encoding, and the session-less memory problem.
LLM as Content Cleaner
Using a cheap LLM like GPT-4o-mini to strip navigation, CTAs, and HTML noise from scraped markdown — a reliable pattern for web content pipelines.
GA4 Unnesting Patterns Hub
Hub connecting all concepts for extracting data from GA4's nested BigQuery schema — UNNEST approaches, JOIN types, engagement recipes, e-commerce funnels, and dbt architecture.
dbt Documentation Audience Mismatch
Why most dbt documentation goes unread — the fundamental mismatch between who writes docs (engineers) and who needs them (business users, analysts, and increasingly AI tools)
dbt Constraint Enforcement Across Warehouses
How dbt constraint types behave across Postgres, Snowflake, BigQuery, Redshift, and Databricks — which constraints actually reject bad data and which are metadata only.
dlt RESTClient vs REST API Source
The two approaches dlt offers for building custom API pipelines — imperative RESTClient and declarative REST API Source — and how to choose between them.
Reverse ETL Patterns for CRM Activation
How to push warehouse-computed scores and attributes back into Salesforce or HubSpot using reverse ETL tools — sync architecture, field mapping, sync frequency, and downstream automation.
Semantic Layer Adoption Readiness
When to invest in a semantic layer, what barriers you'll face, and how to start small — a practical readiness assessment based on team size, tooling maturity, and organizational commitment.
BigQuery Reservation Hierarchy
The three layers of BigQuery's capacity model -- commitments, reservations, and assignments -- and how they work together to manage slot allocation.
Custom MCP Servers for Data Engineering
A reading path through building custom MCP servers — from decision criteria and SDK selection through tool design, testing, and practical server patterns for data catalogs, pipelines, and quality.
Metric Organization in dbt Projects
How to organize semantic models and metrics in dbt — co-located vs parallel subfolder structures, the one-primary-entity rule, and scaling patterns for large projects
MCP JSON-RPC Wire Format
The actual message format MCP uses under the hood — initialization handshake, capability negotiation, tool discovery, and tool invocation — with examples for debugging.
dbt as the Center of Gravity for BI
Why dbt has become the foundation layer that BI tools read from — not a parallel concern — and how the Fivetran merger accelerates this shift
Dagster Asset Checks from dbt Tests
How Dagster automatically converts dbt tests into asset checks since version 1.7 -- severity mapping, health badges, and what this means for unified data quality monitoring.
Meta Ads Attribution Windows
How Meta's attribution windows work, the June 2025 on-Meta/off-Meta split, which windows survived the January 2026 deprecation, and what this means for warehouse data.
Business Cost of Poor Data Quality
The measurable financial and operational impact of data quality failures — industry statistics, high-profile incidents, and why prevention costs a fraction of remediation.
dbt as AI Knowledge Base
How a well-structured dbt project functions as a shared context layer that improves every AI tool in your stack — models, tests, documentation, and semantic definitions as machine-readable knowledge.
Looker Studio: Extract vs. Live Connection
When to use Looker Studio's extract mode versus live BigQuery connections, the 100 MB limit that catches teams off guard, and how to combine both in the same report.
Why a dbt Documentation Style Guide Matters More Than Effort
The case for writing a documentation style guide for your dbt project — why inconsistency is the root problem, not effort, and how style guides serve both humans and AI tools
Data Architecture as Human Judgment
Why data architecture — DAG design, ownership models, temporal logic, team boundaries — resists AI automation and remains a fundamentally human discipline.
Hybrid ELT Strategy
When to buy managed ELT, when to build with dlt + AI, and the practical migration path — a decision framework for splitting your pipeline portfolio strategically
AI SQL Review Tools
A reference of tools that apply AI to SQL and dbt code review — Altimate AI, Greptile, CodeRabbit, and MotherDuck FixIt — with benchmarks and differentiators
dbt Docs Site Customization Options
What you can customize in the default dbt docs site — the overview page, DAG node colors, hiding models — and where the customization options end
dbt Cloud Managed Platform
What dbt Cloud provides beyond Core -- web IDE, job scheduling, collaboration tools, managed infrastructure, and the pricing model that shapes adoption decisions.
dbt MCP Server Setup
A reading path through connecting dbt to AI assistants via MCP — choosing between local and remote modes, tool capabilities, configuration, and safety.
dbt Groups and Access Modifiers
How dbt groups and access modifiers (private, protected, public) organize model ownership and enforce boundaries — and why they're worth using even in single projects.
Consent Mode Common Implementation Failures
The ten most frequent Consent Mode implementation mistakes, ordered by prevalence and damage — from missing defaults to untested consent states.
Identity Resolution for Customer 360
How to link CRM contact records to GA4 cookie identifiers in BigQuery — the three join key strategies, deterministic vs probabilistic matching, and open-source tooling.
dbt Unit Test YAML Syntax
Complete reference for dbt unit test YAML structure — required elements, input formats (dict, csv, sql), optional configuration, and version-specific features.
dbt Docs Performance at Scale
Why the default dbt docs site becomes unusable for large projects — the AngularJS frontend, client-side JSON parsing, and the performance ceiling that drives teams to alternatives
CRM Data Extraction Challenges
Why CRM data is harder to warehouse than most sources — mutability, API-based extraction, soft deletes, formula field blind spots, and rate limits.
dbt Macros
How dbt macros work — Jinja fundamentals, writing custom macros, using dbt_utils, dispatch patterns, and when macros help vs hurt
dbt deps and the Package Lock File
How dbt resolves and installs packages — the difference between packages.yml and dependencies.yml, how the lock file works, and the flags worth knowing.
MetricFlow Setup Hub
Hub note connecting garden notes extracted from the MetricFlow getting started tutorial: installation, semantic model components, time spine, metric types, CLI querying, and organization.
ML Anomaly Detection vs Statistical Methods
When ML-powered anomaly detection earns its cost over simpler Z-score approaches — and why the answer depends on data complexity, not marketing materials.
GCP Application Default Credentials
The difference between gcloud auth login and Application Default Credentials — why they exist, how they work, and why ADC is what MCP servers and SDKs actually use.
Claude Code ROI for Analytics Engineers
Realistic time-to-value for Claude Code in a dbt workflow — what setup actually costs, when consistent savings emerge, and the qualitative benefit of tasks that finally get done.
Fivetran dbt Packages Architecture
How Fivetran structures its 60+ dbt packages — the unified source-plus-transform model, cross-platform reporting bundles, and the installation pattern that avoids version conflicts.
BigQuery Architecture for Analytics Engineers
How BigQuery works under the hood — columnar storage, slots, the separation of compute and storage — and why it matters for your queries and costs.
dbt Repository Structure for Cloud Function Deployment
How to restructure a dbt project repository for Cloud Function deployment — the subdirectory pattern, main.py, requirements.txt, and profiles.yml with oauth.
dbt Single Responsibility Macros
Why dbt macros should do one thing, how to recognize when they've outgrown their scope, and the composition pattern for building complex transformations from focused pieces.
GA4 User-Provided Data BigQuery Trap
Enabling User-provided data in GA4 admin permanently disables user_id export to BigQuery with no reversal option — what this means and how to protect your pipelines.
dbt Migration Validation Patterns
How to validate a dbt migration — parallel execution, comparison queries, ML regression testing, and the practical approach to proving equivalence.
GTM Server-Side: Ten Implementation Failures and How to Avoid Them
The ten most common GTM Server-Side implementation mistakes — from missing custom domains and silent trigger failures to Cloud Logging cost surprises and Safari IP mismatch — with diagnostic guidance for each.
Agent-First CLI Design Principles
Seven principles for building CLIs that AI agents can consume reliably — from Justin Poehnelt's design of the Google Workspace CLI, with implications for any tool targeting agent consumers.
on_schema_change in dbt Incremental Models
How dbt handles column additions and removals in incremental models, the four on_schema_change options, and why none of them backfill historical data.
Salesforce Person Accounts and Multi-Currency in the Warehouse
Two Salesforce data model quirks that break standard warehouse patterns — Person Accounts that merge Account and Contact, and multi-currency orgs that require exchange rate conversion in dbt.
BigQuery Editions
The three BigQuery Editions tiers (Standard, Enterprise, and Enterprise Plus): what each offers, their limits, and how they compare to on-demand pricing.
Claude Code Model Selection for Analytics Work
When to use Sonnet vs Opus in Claude Code for analytics engineering — daily work defaults, complex problem escalation, and practical cost-speed tradeoffs
Data quality KPIs from Elementary
Five data quality KPIs you can build from Elementary's warehouse tables, how to interpret them, and how they map to standard data quality dimensions.
BigQuery Slots
What BigQuery slots are, how queries use them, what happens during slot contention, and the two ways to get slots.
dbt Materialization Default: Tables Everywhere
Why materializing every dbt model as a table by default — not views, not ephemeral — produces more debuggable, stable, and maintainable projects.
SQL Attribution Patterns
SQL implementation patterns for marketing attribution — first-touch, last-touch, linear, position-based, time-decay, and algorithmic models
Salesforce Polymorphic Relationship Resolution
How to resolve Salesforce's WhoId and WhatId polymorphic foreign keys in the warehouse using ID prefix routing — the pattern, the SQL, and where it recurs.
BigQuery Table Types
Native BigQuery tables, BigLake external tables, and BigLake Iceberg tables — what each optimizes for, when to use them, and a decision framework for choosing.
Consent Mode Basic vs Advanced
How Basic and Advanced Consent Mode differ in tag behavior, cookieless pings, and conversion modeling — and the traffic thresholds that determine whether Advanced mode actually helps.
MCP Apps vs Traditional BI
When to use MCP Apps for data visualization versus a dedicated BI tool — the honest comparison, what each does well, and the hybrid architecture that makes sense for most teams.
Salesforce to BigQuery Pipeline
Hub note for the Salesforce-to-BigQuery pipeline — from ingestion tool selection through polymorphic resolution, stage tracking, account hierarchies, and activity timelines.
MCP Resources and Prompts
Beyond tools — using MCP resources for read-only data exposure, prompts for reusable templates, and the Context object for progress reporting in long-running operations.
GA4 Events Sessionized Model
The implementation of the wide event-grain intermediate model for GA4 — the CTE structure, window function patterns, and design decisions that make downstream analysis flexible.
Self-Hosting Lightdash with Docker Compose
How to run Lightdash with Docker Compose — required services, environment variables, known gotchas, and what to expect in small-team production deployments.
ELT Connector Quality and Coverage Comparison
How Fivetran, Airbyte, and dlt differ in connector count, quality tiers, and their approaches to handling sources that don't have pre-built connectors.
RSS Feed Deduplication in n8n
How to prevent duplicate Notion pages when polling RSS feeds in n8n, using a Merge node configured as a left anti-join.
dbt-audit-helper Macro Reference
Reference for every dbt-audit-helper macro — parameters, output format, platform support, and practical usage notes.
Elementary Slack and Teams integration
How to connect Elementary alerts to Slack (token-based and webhook) and Microsoft Teams, including the tradeoffs between integration methods.
Probabilistic Matching Limitations in GA4
Why probabilistic identity matching fails with GA4's BigQuery export — the signals GA4 intentionally excludes, what coarse data remains, and the compounding cost of false positives.
Dagster Freshness Policies and Scheduling
How Dagster tracks asset freshness rather than just execution timestamps, and how to schedule dbt runs using cron schedules, sensors, and automation conditions.
dbt Package Installation Types
The three ways to install dbt packages — Hub, Git, and local — and how to choose between them. Includes version conflict patterns and best practices for your root packages.yml.
Orchestrator Developer Experience Comparison
Local development, testing patterns, and CI/CD workflows across Dagster, Airflow, and Prefect — where the day-to-day friction lives.
dbt Microbatch Strategy Tradeoffs
The practical limitations and design tradeoffs of dbt's microbatch incremental strategy — UTC assumptions, no sub-hourly batches, and sequential execution.
Custom dbt Materializations
Hub note for custom dbt materializations — anatomy, decision framework, zero-downtime swap, secured table, and debugging patterns.
dbt Integration Depth Across Orchestrators
How dagster-dbt, astronomer-cosmos, and prefect-dbt differ in integration depth — from first-class asset mapping to operational wrappers — and what that means when something breaks.
BigQuery Clustering Mechanics
How BigQuery clustering sorts data within storage blocks, why column order is critical, and how automatic re-clustering works at no cost.
Salesforce Record Type Partitioning in dbt
How to handle Salesforce RecordTypeId in the warehouse — filtering by record type in base models, splitting objects into separate models, and storing IDs in dbt vars.
Building Custom API Pipelines with dlt
A map of the concepts and patterns involved in building production API pipelines with dlt — from choosing an approach through deployment.
Cursor for dbt Development
How Cursor works as the IDE layer for dbt projects — strengths with dbt Power User, limitations for multi-file work, and where it fits alongside Claude Code
Building dlt Pipelines: From First Run to Incremental Loading
A reading path through the concepts in the hands-on dlt tutorial — environment setup, REST API Source config, dependent resources, and incremental loading.
MCP Apps Protocol Internals
How MCP Apps extend the Model Context Protocol to render interactive HTML interfaces inside AI clients — the ui:// resource mechanism, iframe sandboxing, and bidirectional JSON-RPC communication.
GA4 E-commerce Schema in BigQuery
The ecommerce RECORD and items REPEATED RECORD in GA4's BigQuery export — field reference, nested item_params, and query patterns for purchase analysis.
Snowflake Cost Monitoring with Warehouse History
SQL patterns for Snowflake cost monitoring using QUERY_HISTORY and WAREHOUSE_METERING_HISTORY — daily cost summaries, per-warehouse breakdowns, and translating credits into dollars for non-technical stakeholders.
Terminal Cross-Platform Setup
How to set up and use the terminal on macOS, Linux, and Windows — including WSL, Git Bash, and PowerShell options with a command equivalence table
Idempotent Incremental Models in dbt
How to build dbt incremental models that produce identical results regardless of how many times they run, using pre-deduplication and proper unique_key design.
Lead Scoring in the Warehouse
Hub note for warehouse-native lead scoring — from rule-based weighted models in dbt to BigQuery ML classification, feature engineering, and reverse ETL back to the CRM.
BigQuery Column-Level Security with Policy Tags
Replace view-based column hiding with Data Catalog policy tags — storage-layer security that survives schema changes and doesn't require view maintenance.
Customer 360 dbt DAG Architecture
How to structure a dbt project for Customer 360 models — the identity resolution layer between base and mart, the wide customer table, and materialization choices.
Codebase Refactoring with Claude Code
How Claude Code enables project-wide dbt refactoring — column renames, naming convention migrations, and ref() updates across dozens of files without the manual search-and-miss problem.
Dagster + dbt Integration Hub
Hub note for the dagster-dbt integration — how the mapping works, quality checks, freshness monitoring, CI/CD workflows, and the case for choosing Dagster over dbt Cloud.
dbt Incremental Strategy Warehouse Behaviors
How dbt incremental strategies behave differently on BigQuery, Snowflake, and Databricks — the platform-specific quirks, gotchas, and limitations that the documentation doesn't emphasize enough.
Identity Resolution for Ad Measurement
How Enhanced Conversions, Unified ID 2.0, and data clean rooms recover attribution signal after cookies fail — what each approach does, what it requires, and realistic uplift estimates.
OpenClaw Ecosystem and Community
The community and ecosystem around OpenClaw — ClawHub, ClawData, the viral growth story, the naming history, and what the ecosystem state means for adoption decisions.
dbt Package Development Hub
A hub connecting all notes on building, testing, and publishing dbt packages — from project anatomy to CI/CD to Hub distribution.
Cloud Composer Cost and Capabilities
Cloud Composer 3's pricing model, committed use discounts, and the specific scenarios where its orchestration capabilities justify the $300-400/month minimum.
MetricFlow installation and setup
Installing MetricFlow for dbt Core with adapter-specific packages, the dbt Cloud alternative, and the initial project configuration steps needed before defining semantic models.
OpenClaw Skills for Monitoring
How to write OpenClaw skill files for data pipeline monitoring — structuring SKILL.md instructions, categorizing failure types, formatting output for Slack, and adding context that makes alerts actionable.
dbt Weighted Attribution Models
Implementing position-based and time-decay attribution in dbt with configurable weights via dbt variables — model SQL, project configuration, and revenue integrity testing
Attribution Channel Grouping Strategy
How to group marketing channels for data-driven attribution: balancing granularity against data sparsity to produce stable, actionable model results
AI Agent Regulatory Exposure for Data Teams
Why running AI agents against client data creates contractual and regulatory exposure for data teams — GDPR, data processing agreements, the open-source liability argument, and what the Dutch DPA warning actually means.
Custom Parameterized MCP Queries
Using the MCP Toolbox's tools.yaml to define constrained, parameterized queries that give AI assistants structured access to data without arbitrary SQL.
dbt Identity Resolution Pipeline
Production dbt DAG structure for GA4 identity resolution — the incremental identity mapping model, stitched events model, schema tests, and the 3-day lookback window for late-arriving data.
BigQuery Pricing Policy Changes 2024–2025
Three BigQuery policy changes that affect cost modeling in 2024–2025: the flat-rate deprecation, the 200 TiB daily on-demand quota, and new Cloud Storage fees for external tables.
MCP Setup Troubleshooting
Common failure modes when setting up MCP servers — macOS PATH problems, silent JSON config failures, tool count limits, and where to find debug logs.
MetricFlow Advanced Patterns
Complex metric patterns in MetricFlow — period-over-period comparisons with offset_window, filtered metrics with Jinja, and handling null gaps in time series
dbt Schema Validation and Data Products Hub
Hub connecting notes on dbt's three validation mechanisms, source schema gaps, the Mesh governance triad, and contract-first development.
TDD with Claude Code for dbt
How test-driven development works with Claude Code for dbt models — write tests first, let the agent iterate to pass them, then refactor with confidence
Security Posture for AI Agents
How to scope permissions, isolate environments, and treat always-on AI agents like OpenClaw as untrusted actors — practical security practices for data teams
BigQuery BI Engine
How BigQuery BI Engine provides in-memory acceleration for dashboard queries, what it supports, what it silently skips, and how to verify it's actually working.
Dagster Resources
How Dagster resources work as centrally configured, injectable external connections — BigQueryResource, DbtCliResource, and the pattern for swapping environments without changing asset code.
dbt MCP Server Tool Reference
Complete reference for the 20+ tools exposed by the dbt MCP server — CLI commands, metadata discovery, Semantic Layer queries, and job management.
Silent SQL Errors in AI-Generated Code
Why AI-generated SQL that compiles and runs is more dangerous than SQL that fails — the 3% warning rate, temporal filter inconsistencies, and the review practices that catch what linters miss
Signals That Your Cron-Based dbt Setup Has Outgrown Itself
Five concrete indicators that a simple cron-scheduled dbt job has hit its limits — and what each one tells you about the orchestration capability you actually need.
dbt Production Safety Hooks
Using Claude Code PreToolUse hooks to block dangerous dbt commands before they execute — full-refresh on production, unscoped builds, and other high-risk operations
Context Engineering for Data Pipelines
How the value in data engineering is shifting from writing code to structuring context — the emerging discipline of context engineering, the ETL-to-ECL reframe, and the skills pipeline risk.
dbt-utils Introspective Macros
How dbt-utils compile-time introspection macros work — get_column_values, get_relations_by_pattern, get_query_results_as_dict, and get_single_value — and when they cause problems.
Google DDA Silent Fallback
GA4's Data-Driven Attribution silently falls back to last-click when data thresholds aren't met: how to detect it and why warehouse-native attribution avoids this trap
GA4 Ecommerce Items UNNEST Pattern
How to handle GA4's nested items array in dbt — building a separate item-level grain model with intentional Cartesian UNNEST.
dbt Incremental Strategy Configuration Patterns
Complete, runnable dbt config blocks for each incremental strategy — merge with predicates, delete+insert on Snowflake, insert_overwrite with static partitions, and replace_where on Databricks.
The Context Gap in AI Data Engineering
Why business context — what 'Status' means, whether 'Amount' is net or gross, tacit SAP knowledge — is the core limitation of AI in data engineering.
Open Data Contract Standard
ODCS v3.1.0 under the Linux Foundation's Bitol project — what it covers, how it compares to the Data Contract Specification, and where harmonization stands.
dbt Features Without a Dataform Equivalent
The dbt capabilities that simply don't exist in Dataform — snapshots, the package ecosystem, microbatch incremental strategy, and Slim CI. These are the blockers that stall dbt-to-Dataform migrations.
Incrementality Testing for Attribution
How to validate attribution models with causal experiments — holdout tests, geo tests, and platform lift studies that measure whether a channel actually drives conversions
dbt-expectations row_condition Pattern
How the row_condition parameter in dbt-expectations enables conditional test filtering — applying tests to specific segments without custom SQL.
Expense Capture as a Habit Layer
Using natural language logging and receipt OCR to close the gap between 'I spent money' and 'that expense is recorded somewhere useful' — why capture is the real problem, not the accounting.
GA4 user_id Data Quality
Common implementation bugs that corrupt GA4 user_id data — string 'null' values, logout tagging errors, suspicious high-cardinality IDs — and the SQL patterns to detect and filter them.
Agent Dashboard Scraping: The Fragility Problem
How browser automation works for dashboards without APIs, the five-step scraping loop, session management patterns, and why silent failure is the central limitation that makes this a fallback of last resort.
Data Quality Validation Layers
The three-layer model for data quality — proactive contracts, reactive schema tests, and anomaly detection — and why you need all three.
Unit Tests vs Data Tests in dbt
The two-checkpoint model for dbt testing — unit tests gate deployments by verifying transformation logic, data tests gate production by verifying data health.
dbt Testing Strategy by Layer
What to test at each layer of the dbt DAG — sources, base, intermediate, and mart — and why testing intensity should increase toward the edges.
Unit Testing Snapshot Consumers in dbt
Three strategies for testing snapshot-related logic — pre-snapshot base models, SCD2 date range calculations in downstream models, and change detection hashing.
GA4 Flattened Events Materialization
When and how to pre-unnest GA4 events into a flat table — the cost-performance tradeoff, the CREATE TABLE pattern, and why dbt models formalize this approach.
MetricFlow semantic model components
The three building blocks of a MetricFlow semantic model: entities (join keys), dimensions (group-by columns), and measures (numeric aggregations that feed metrics).
Secured Table Materialization in dbt
A custom dbt materialization that automatically reapplies BigQuery row access policies, column descriptions, and data masking tags after every table rebuild.
Organizing dbt Unit Tests at Scale
Tag strategies, CI pipeline tiers, and selection patterns for managing hundreds of dbt unit tests across a growing project.
Elementary dashboard organization
How to organize Elementary dashboards and reports by domain, criticality, and refresh cadence so they stay useful as your project grows.
Self-healing pipeline maturity spectrum
Five levels of self-healing capability in data pipelines, from basic retries to fully agentic systems, and where production value actually concentrates.
Salesforce Account Hierarchy with Recursive CTEs
How to resolve Salesforce's self-referential ParentAccountId into a flattened hierarchy using recursive CTEs in BigQuery — the SQL pattern, ultimate parent resolution, and revenue rollup.
BigQuery Partitioning vs Clustering Decision Framework
A practical decision framework for choosing between BigQuery partitioning, clustering, or both based on table size, query patterns, and operational needs.
dbt MCP Server: Local vs Remote
The two deployment modes for dbt's MCP server — local gives full CLI access and works without dbt Cloud, remote is read-only metadata and requires a Cloud plan.
Jinja Templating for SQL Practitioners
Why Jinja feels natural to SQL-first analytics engineers — the double-brace model, macros as SQL helpers, and the separation of concerns that keeps transformation files focused.
Dataform-to-dbt Migration
Migration paths between Dataform and dbt — tooling, realistic timelines by project size, and why macro conversion is where migrations get painful
dbt MCP Server Setup and Configuration
Step-by-step installation and configuration of the dbt MCP server — uv, environment variables, feature toggles, and client setup for Claude Code and Claude Desktop.
Data team on-call strategies
How data teams structure on-call rotations, triage processes, and runbooks differently from software engineering on-call, and which metrics reveal whether the system is working.
dbt-expectations Setup and Configuration
How to install and configure dbt-expectations — packages.yml, timezone variable, platform compatibility, and dependency management.
Attribution Analysis
A structured guide to marketing attribution — from SQL implementation patterns through multi-model comparison, dashboard design, and incrementality testing
Google Ads BigQuery Data Transfer Service
Hub note for the free Google Ads → BigQuery pipeline — setup, schema quirks, known data gaps, and dbt modeling patterns.
Dagster Software-Defined Assets
The core building block of Dagster — how @dg.asset works, automatic dependency inference, the Definitions object, and how SDAs differ from traditional orchestrator primitives.
dbt Unit Testing Implementation
Hub note for implementing dbt unit tests — from YAML syntax and mocking patterns to BigQuery workarounds and CI/CD integration.
Dagster UI for Analytics Engineers
A walkthrough of Dagster's web UI — the Asset Catalog, Global Asset Lineage, Run Details, health indicators, and the Dagster+ Pro features that matter most for analytics engineers on dbt + BigQuery.
Workload Identity Federation for CI/CD
Replace service account keys in GitHub Actions and other CI systems with keyless OIDC authentication — no credentials to store, rotate, or leak.
dbt MCP Server Safety Considerations
The risks of giving an AI assistant dbt CLI access — production data modification, credential scope, Copilot credit consumption, and practical mitigations.
dbt persist_docs for Warehouse Comments
How persist_docs pushes dbt descriptions directly to your data warehouse as table and column comments, making documentation available where analysts already work
dbt profiles.yml with env_var for Multi-Client GCP
Using env_var() interpolation in profiles.yml so dbt reads GCP credentials and project from environment variables — enabling seamless client switching via direnv.
GA4 BigQuery Query Patterns
Efficient querying of GA4 date-sharded tables — _TABLE_SUFFIX filtering, inline vs FROM clause UNNEST, reusable dbt macros, and cost control practices.
GA4 User Mart Pattern
Building a user-grain mart from GA4 session data — first/last touch attribution, lifetime value aggregation, and identity stitching with user_pseudo_id and user_id.
Consent Mode US Privacy Requirements
Why US-only sites increasingly need Consent Mode — Enhanced Conversions requirements, expanding state privacy laws, and the recommended region-specific configuration.
BigQuery IAM Patterns
Least-privilege IAM for BigQuery — predefined roles, the data vs. compute permission split, service account strategy, and common anti-patterns.
Visualization MCP Server Ecosystem
The available MCP servers for generating charts and interactive visualizations — AntV, Vega-Lite, DuckDB-Plotly, and how to pick between them.
Markdown-to-Notion Blocks Parser
How to convert markdown to Notion's block API format in JavaScript, including handling rich_text objects, the 2000-character limit, and the 100-block request cap.
Consent Mode Implementation Mechanics
The technical implementation of Consent Mode v2: default state configuration, CMP integration, GTM trigger ordering, and the wait_for_update race condition.
Agent Skill Supply Chain Attacks
How malicious skills in agent ecosystems like ClawHub bypass traditional antivirus detection, why natural-language malware is a fundamentally different threat class, and how to evaluate skills before installing them.
MCP Ecosystem Governance
How MCP became a vendor-neutral open standard — the Linux Foundation donation, corporate adoption, and what broad industry support means in practice.
MCP Pipeline Monitoring Server Pattern
A practical MCP server pattern for pipeline monitoring — checking job status, listing failures, and triggering reruns across orchestrators like Airflow and Dagster.
Orchestration Market Landscape in 2026
Where each major data orchestrator stands in 2026 — Airflow's scale, Dagster's dbt dominance, Prefect's developer velocity, Kestra's rapid rise, and the tools in decline.
Kestra Declarative Orchestration
Kestra's YAML-first orchestration model — how it differs from Python-decorator tools, its rapid growth, enterprise adoption, and why production evidence at small-to-mid scale is still thin.
dbt documentation drift detection
Techniques for detecting when dbt documentation falls out of sync with reality — column-level drift, git-based staleness signals, and schema drift for sources
dbt Test Output Parsing for Automated Monitoring
How to extract structured, actionable information from dbt test output — distinguishing failure types, capturing sample rows, and handling partial runs so automated monitoring doesn't miss anything.
BigQuery Editions Migration Anti-Patterns
Five mistakes teams make when migrating from BigQuery on-demand to Editions — and how to avoid them.
dbt Validation Mechanisms Compared
How dbt contracts, data tests, and dbt-expectations differ in when they fire, what they cover, and what they cost — and why you need all three.
CLAUDE.md BigQuery Specifics
What to put in CLAUDE.md when your dbt project runs on BigQuery — GoogleSQL dialect enforcement, partition filter requirements, and incremental model config templates.
GTM Server-Side Hosting on Azure
How to host the GTM Server-Side tagging container on Azure using App Service or Container Apps, with pricing tiers and SSL configuration notes.
Dataform as a GCP Service
What Dataform is in 2026 — a fully managed BigQuery transformation service with deep GCP integration, zero licensing cost, and SQLX/JavaScript templating
GTM Server-Side on Cloud Run: Deployment and Configuration
How to deploy GTM Server-Side on Google Cloud Run — automatic vs manual provisioning, production configuration settings, custom domain setup, and multi-region architecture for global traffic.
JavaScript vs Jinja in Analytics Engineering
The philosophical and practical differences between Dataform's JavaScript templating and dbt's Jinja2 — where they diverge, what each excels at, and how to convert between them.
Metrics as Code
The practice of defining business metrics in version-controlled YAML — reviewed in pull requests, tested in CI/CD, and consumed by BI tools and AI agents
LinkedIn Ads OAuth Token Management
LinkedIn's OAuth token expiration model for the Marketing API — 60-day access tokens, 365-day refresh tokens, forced annual re-authentication, and operational strategies for custom pipelines.
dlt Pagination Patterns
The built-in paginators dlt provides for common API patterns, and how to extend BasePaginator for APIs that don't follow standard conventions.
MCP Protocol Fundamentals
Reading map for the foundational MCP concepts — how the protocol works, what messages look like, what primitives exist, and how they fit together for data engineering.
Schema Registry for Contract Enforcement
How schema registries enforce data contracts on event streams before data reaches the warehouse — compatibility modes, CEL validation rules, and production practices.
dbt Orchestration Decision Framework for GCP
A decision framework for choosing between Cloud Run Jobs, Cloud Workflows, and Cloud Composer for dbt orchestration on GCP — based on actual requirements, not arbitrary complexity thresholds.
Documentation Quality Determines AI Usefulness
Why the quality of your dbt documentation directly determines how useful AI tools can be — the Roche chatbot failure, the docs-to-AI feedback loop, and case studies in enforcement
Looker Studio + BigQuery Performance — Hub
Map of garden notes on optimizing Looker Studio dashboards backed by BigQuery: BI Engine, extract mode, blending pitfalls, caching, credentials, and upgrade decisions.
dbt Mart Layer Patterns
What belongs in dbt mart models — reporting aggregations, activation exports, ML feature tables — and the principle that every mart serves a specific consumer.
dbt Private Packages via Git
How to distribute internal dbt packages as Git dependencies — version pinning, authentication options, and trade-offs compared to Hub packages.
The Chatbot → Copilot → Agent Paradigm Shift
How AI's relationship to the developer changed across three distinct eras — chatbot (demand), copilot (alongside), agent (autonomous) — and why each phase is qualitatively different, not incrementally better.
dbt Package Ecosystem Hub
Navigation hub for the dbt package ecosystem — how installation works, what's available, version compatibility, and how to evaluate packages for production use.
Dagster Branch Deployments for dbt
How Dagster+ branch deployments create ephemeral preview environments for dbt changes on PR, with state-based selection and partitioned execution for CI/CD workflows.
OpenClaw Security Risks — What's Documented
A factual catalogue of the specific, documented security incidents, CVEs, regulatory warnings, and threat patterns that analytics engineers need to know before running OpenClaw near client data.
Looker Studio Data Blending Pitfalls
Why Looker Studio data blending silently creates Cartesian products, how to identify them, and why pre-joining in BigQuery is almost always the right fix.
Google Ads Performance Max Data Gaps
Why Performance Max campaign data is incomplete in BigQuery DTS, what's actually missing, and how to get the data you need.
OpenClaw for Data People — Hub
A reading map for the OpenClaw introductory guide — architecture and design principles, tool comparison, security risks, persistent memory, and the ecosystem around OpenClaw.
Meta CAPI Server-Side Setup: Deduplication and Event Match Quality
How to configure Meta Conversions API via server-side GTM — event deduplication with shared event_id, user data mapping for EMQ score, and forwarding the _fbp and _fbc cookies.
AI-Powered dbt Documentation
A reading path through automating dbt documentation — from scaffolding tools to AI generation, business context enrichment, and CI enforcement
Orchestrator Pricing for dbt Teams
Managed orchestration costs compared — Dagster+, Prefect Cloud, Astronomer, Cloud Composer, and dbt Cloud — with entry-tier pricing, scaling models, and the hidden costs that shift the math.
Proactive vs. Reactive AI Agents
The distinction between AI tools that respond to prompts and AI agents that act on schedules — why this shift matters for automation use cases, and where each model fits.
Dagster GCP Deployment
How to deploy Dagster on GCP — Serverless vs Hybrid modes, GKE with Helm, Workload Identity authentication, Cloud SQL for storage, and the community Cloud Run option.
Choosing Between BigQuery MCP Options
Decision framework for BigQuery MCP access — Remote Server vs Toolbox vs bq CLI, matched to your client, team setup, and use case.
Dagster vs dbt Cloud Orchestration
When Dagster's dagster-dbt integration is worth the setup cost over dbt Cloud's built-in scheduler: cost comparison, capability gaps, and the vendor independence argument after the Fivetran merger.
MCP Official Reference Servers
The servers maintained by the MCP Steering Group — which are actively developed, which have been handed to vendors, and why the distinction matters.
GTM Server-Side Managed Hosting Providers
Comparison of Stape, Addingwell, TAGGRS, and Cloudflare Zaraz as managed alternatives to self-hosting GTM Server-Side containers on cloud infrastructure.
dbt Cross-Database Macros
Hub for writing dbt macros that work across BigQuery, Snowflake, and Databricks — dialect differences, dispatch configuration, built-in macros, and array operations.
dbt Core vs Cloud Hub
Hub note connecting garden notes decomposed from the dbt Core vs dbt Cloud comparison article.
dbt Agent Skills
dbt Labs' official Markdown skill files that teach AI coding agents how to follow dbt best practices — what they cover, how they work, and what the benchmarks actually show.
Incremental Strategy Decision Framework
A decision framework for choosing the right dbt incremental materialization strategy — merge, delete+insert, insert_overwrite, append, and microbatch
Consent Mode v2 Hub
Hub note connecting all concepts involved in implementing, debugging, and maintaining Google Consent Mode v2 across web and server-side GTM containers.
dlt REST API Source Configuration
How to configure dlt's declarative REST API Source — the client block, resources block, endpoint paths, pagination wiring, and what dlt does automatically with the data.
Semantic Validation in dbt
How to encode business rules as dbt tests — regex pattern validation, cross-column logic, natural language AI validation, and when each approach fits.
Customer 360 Modeling
Hub note connecting the concepts involved in building a unified Customer 360 model from CRM and GA4 data — identity resolution, DAG architecture, conflict resolution, and privacy constraints.
Eventarc Event-Driven dbt Triggers
Using Eventarc to trigger dbt runs when upstream data arrives — Cloud Storage object creation, BigQuery audit log events, and combining event-driven with scheduled execution.
Writing Reusable dbt Macros
A map through the garden notes on designing, naming, documenting, testing, and evolving dbt macros — from when to extract to how to handle breaking changes.
Lightdash Open Source & Self-Hosting Hub
Hub note for Lightdash self-hosting — connecting to dbt, Docker Compose setup, Kubernetes deployment, and the open-source vs paid tier tradeoffs.
MCP Ecosystem Overview
A reading map for the MCP ecosystem — from protocol fundamentals through official servers, clients, data engineering integrations, and building custom servers.
Looker Studio Credentials and Security
The security risks of owner's credentials in public Looker Studio reports, the LeakyLooker vulnerability, cost attribution, and using service accounts for production dashboards.
Rule-Based Lead Scoring in dbt
How to build a configurable weighted lead scoring model in dbt using vars, seed files, and Jinja macros — so marketing can adjust weights without touching SQL.
Data Observability Total Cost of Ownership
The true cost comparison between OSS and managed data observability — accounting for engineering time, warehouse compute, training, and the costs that don't appear on invoices.
GA4 Sharded-to-Partitioned Base Model
How to convert GA4's date-sharded BigQuery export into a properly partitioned incremental dbt model, and why the static lookback pattern is critical for correctness.
Google Ads Developer Token
What the Google Ads developer token is, how access levels work, why approval takes months, and which loading tools require one.
dbt Package Ecosystem Governance
Who maintains the dbt package ecosystem — dbt Labs, Fivetran, and community contributors — and how to evaluate a package's reliability before committing to it in production.
CLAUDE.md for dbt Projects
A concrete CLAUDE.md template for dbt projects — what to include, what to leave out, and why the file should be grown reactively from real mistakes rather than written upfront.
GA4 Window Function Pitfalls
Three window function traps specific to GA4 sessionization: the LAST_VALUE framing trap, IGNORE NULLS for sparse event data, and MAX for session-scoped boolean flags.
Late-Arriving Data in dbt — Hub
Hub note connecting all concepts around handling late-arriving data in dbt incremental models: measurement, lookback windows, partition strategies, deduplication, testing, and operational safety.
dbt Macro Documentation in YAML
Why _macros.yml beats inline SQL comments for documenting dbt macros, and how to write entries that developers actually use.
Google Workspace CLI (gws)
The gws CLI gives programmatic access to every Google Workspace API through a single binary — Gmail, Drive, Calendar, Sheets, Docs — filling the gap gcloud has never covered.
LinkedIn Ads B2B Data Value
What makes LinkedIn Ads data uniquely valuable for B2B analytics — professional demographic pivots, the negative CTR-to-pipeline correlation, company-level impression attribution, and what metrics actually matter.
Unit Testing Incremental Models in dbt
The dual-mode testing pattern for incremental models — overriding is_incremental, mocking this, and understanding that expect blocks show inserts, not final state.
GTM Server-Side Hosting: Decision Framework
How to choose between Cloud Run, AWS ECS Fargate, Azure App Service, and managed providers for hosting your GTM Server-Side container in production.
dbt-utils v1.0 Migration: What Moved to dbt-core
The complete list of macros that moved from dbt-utils to the dbt namespace at v1.0, what was removed entirely, and how to migrate an existing project.
Managed ELT Tool Architectures: Fivetran, Airbyte, and dlt
How the three dominant data ingestion tools approach the same problem differently — fully managed connectors, self-hosted open source, and Python-native libraries.
dbt-Fivetran Merger and the 2026 Transformation Landscape
How the October 2025 dbt-Fivetran merger reshaped the analytics engineering landscape — unified platform strategy, Core/Cloud divergence, and what it means for tool choice.
Data Observability Scaling Thresholds
Team size and technical complexity thresholds that determine when to move from dbt tests to OSS observability to paid platforms.
Unit Testing CASE WHEN Boundary Logic in dbt
Systematic boundary testing for CASE WHEN statements — testing threshold values, just-under values, null handling, and implicit ELSE behavior.
Elementary setup troubleshooting
Fixes for the most common Elementary installation failures: empty reports, missing edr command, BigQuery location errors, tables materialized as views, and Databricks permission issues.
Unit Testing GA4 Sessionization
How to unit test GA4 sessionization logic in dbt — session boundary detection, cross-midnight sessions, microsecond timestamps, and single-event sessions.
Claude Code Bang Prefix for Shell Commands
Using the ! prefix to run shell commands directly inside Claude Code — how it saves tokens, speeds up authentication, and keeps your flow uninterrupted
dbt Test Failure Severity Framework
A four-tier framework for prioritizing dbt test failures by impact — combining test type, model layer, downstream dependents, and historical context into an actionable severity ranking.
Google Ads Server-Side: Conversion Linker and Enhanced Conversions
How to configure Google Ads conversion tracking server-side — the Conversion Linker tag that manages the FPGCLAW cookie, Enhanced Conversions for hashed user data, and realistic uplift expectations.
When to Write dbt Unit Tests
Specific decision criteria for where native dbt unit tests pay off — complex logic scenarios, the incremental model override pattern, and what to skip.
dbt Documentation People Actually Read
A reading path through writing dbt documentation that gets used — from diagnosing why docs go unread to writing patterns, delivery mechanisms, and the AI quality feedback loop
Claude Code Skills Activation
How Claude Code skills work under the hood — keyword matching against YAML frontmatter, the ~20% auto-activation rate, and why skills fit background domain knowledge better than repeatable workflows
Dataform vs dbt Cost Comparison
The real cost equation between Dataform and dbt — licensing savings vs ecosystem gaps, migration costs, and hidden engineering overhead
Dataform Testing Limitations
Dataform's built-in assertions cover three scenarios — uniqueness, null checks, and row conditions. Everything else requires custom implementation.
Soda Data Contract Verification
How Soda's contract engine validates schema, freshness, and quality rules against warehouse tables after loading but before transformation — filling the gap between EL and dbt.
BigQuery On-Demand Billing Mechanics
How BigQuery on-demand pricing actually charges you — columnar billing, the LIMIT clause trap, 10 MB minimums, caching, the free tier, and cross-cloud pricing.
BigQuery Cost Model
How BigQuery pricing works across on-demand and editions models — bytes billed, slot hours, storage costs, and optimization levers
BigQuery Slot Usage Monitoring
How to monitor BigQuery slot usage with INFORMATION_SCHEMA, the Slot Estimator, and Cloud Monitoring -- practical queries and tools for capacity planning.
dbt Attribution Packages Landscape
Open-source dbt packages and Python libraries for production-ready attribution models -- Snowplow, Tasman, Rittman Analytics, ChannelAttribution, and when to build your own
Deploying dbt Core on Cloud Functions
A step-by-step guide to deploying dbt Core on Google Cloud Functions — repository structure, service account setup, deployment, and scheduling with Cloud Scheduler.
dbt Package Anti-Patterns
Common mistakes in dbt packages — hardcoded schemas, missing dispatch, tight version constraints, generic model names, table defaults, and missing version bounds.
The Freelance Admin Overhead Problem
Why solo consultants spend 20-30% of their time on non-billable admin, why the standard fixes don't work, and what makes a single agent different from another SaaS subscription.
generate_schema_name: Environment-Aware Schema Naming in dbt
How to override dbt's generate_schema_name macro so dev environments get prefixed schema names while prod uses clean custom schema names directly.
Campaign Naming and UTM Standardization
How to standardize campaign names across ad platforms using naming conventions, regex parsing, and seed overrides — plus UTM hygiene rules that make cross-platform attribution possible.
Multi-Client Agent Reporting Architecture
How to structure per-client isolation for OpenClaw reporting workflows — separate cron jobs, credential management at scale, failure containment, and the security tradeoffs of running multiple clients on a single machine.
Lightdash Metric Types and Definition Syntax
The three categories of Lightdash metrics — aggregate, non-aggregate, and post-calculation — plus column-level vs model-level placement, filters, and display configuration.
Dataform for BigQuery
A structured guide to evaluating Dataform as a BigQuery transformation tool — what it is, how it compares to dbt, and when it makes sense
Claude Code Status Line Configuration
How to set up Claude Code's status line to display git branch, active model, and context usage — practical setup for analytics engineers
Cloud Workflows Orchestration
GCP Cloud Workflows as a middle-ground orchestration layer between Cloud Scheduler and Cloud Composer — serverless, cheap, and capable enough for multi-step pipelines.
What dbt docs generate actually produces
The static site artifacts that dbt docs generate creates — manifest.json, catalog.json, index.html — and the flags that control how they are built
Microbatch Automatic Upstream Filtering
How dbt's microbatch strategy automatically filters upstream models by event_time, reducing full table scans — and when to opt out with .render().
BigQuery Editions Testing Without Commitment
How to evaluate BigQuery Editions on real workloads before committing — creating a test reservation, rolling back instantly, opting out of org-level reservations, and using the Slot Estimator.
Build vs. Buy Data Pipelines
A reading path through the shifting economics of managed vs. custom data pipelines — from Fivetran's pricing changes through AI-assisted development with dlt to the hybrid strategy
Claude Code Skill Description Engineering
How to write Claude Code skill descriptions that actually trigger activation — explicit keywords, negative boundaries, and the specificity principle
BigQuery Cost Attribution with INFORMATION_SCHEMA
Using INFORMATION_SCHEMA queries to find expensive queries, attribute costs by user and dataset, identify unoptimized tables, and build a weekly cost review practice.
Dagster Components
Dagster's newest major abstraction — YAML-configured objects that generate assets, checks, and schedules with minimal Python, lowering the barrier for SQL-first analytics engineers.
Prompt Injection and the Lethal Trifecta
Simon Willison's lethal trifecta — why combining private data access, untrusted content exposure, and external communication ability creates a uniquely dangerous attack surface for AI agents handling data work.
GCP Processing Engine Selection: Dataflow, Dataproc, and BigQuery
When to use Dataflow, Dataproc, Dataproc Serverless, and BigQuery SQL for data transformation on GCP — matched to team expertise and workload type, not arbitrary scale thresholds.
BigQuery Resource Hierarchy
How BigQuery organizes resources from organization to table level — projects as billing boundaries, datasets as access control units, and naming conventions that scale.
Star Schema vs One Big Table
When to use entity-separated star schema vs wide denormalized tables in your data warehouse — BigQuery performance characteristics, OBT benchmarks, and the practical answer of building both.
GA4 Parameter Extraction Macro
A reusable dbt macro for extracting GA4 event parameters without row multiplication, including the numeric variant for int/float/double fields.
Warehouse Attribution Data Sources
The three categories of data required for warehouse-based attribution -- website interactions, ad platform spend, and conversions -- with platform-specific loading patterns and common data quality traps
BigQuery Fair Scheduling
How BigQuery distributes slots among competing queries -- the two-level fair scheduling algorithm, its project-level implications, and why project architecture matters for performance.
Asset-Centric Orchestration
The paradigm shift from task-based orchestration (what to run) to asset-based orchestration (what data should exist) — why it matters for analytics engineers and how it changes debugging, monitoring, and pipeline design.
dbt Testing Pyramid
The layered testing pyramid for dbt projects -- broad data test coverage at the base, targeted unit tests in the middle, anomaly detection and data diffs at the top.
CLAUDE.md as Project Memory
How CLAUDE.md gives Claude Code persistent project context — what to include, what to leave out, and why reactive additions beat proactive documentation
KPI Reporting via Direct Warehouse Queries
Why querying the warehouse directly beats dashboard scraping for scheduled KPI delivery — the BigQuery and Snowflake CLI patterns, how to structure pre-written SQL for agent-driven reporting, and the tradeoffs of the approach.
LLM Training Data Asymmetry for Tool Use
Why LLMs write better shell commands than MCP tool calls — the training data distribution that makes CLI fluency outperform structured tool-calling for well-established tools.
dbt Documentation Rollout Strategy
A practical week-by-week approach to rolling out dbt documentation standards — starting with model descriptions, adding enforcement incrementally, and using AI tools to close coverage gaps
Choosing Between Fivetran, Airbyte, and dlt
A decision framework for picking the right ELT tool based on team skills, budget, connector needs, and tolerance for operational burden — with practitioner sentiment from the field.
CLAUDE.md for Analytics Engineering — Hub
Hub note connecting all CLAUDE.md configuration concepts for dbt and BigQuery analytics engineering — project memory, dbt templates, BigQuery specifics, hooks, and slash commands.
Position-Based Attribution Models
U-shaped and W-shaped attribution models that weight credit by journey position — formulas, edge cases, industry weight variations, and BigQuery SQL implementation
dbt Documentation Freshness
A reading path through keeping dbt documentation accurate as your project evolves — from the case for automation to drift detection, coverage tracking, and a graduated rollout strategy
dbt Scheduling Without an Orchestrator
How to run dbt in production without Airflow, Dagster, or Prefect — the practical options from $0/month GitHub Actions to Cloud Run Jobs, when each fits, and when to move on.
YAML Formatting Options for dbt Descriptions
The four ways to write descriptions in dbt YAML — inline strings, folded scalars, literal scalars, and doc blocks — and when to use each one
GA4 BigQuery Timezone Handling
Three different timezone contexts coexist in GA4 BigQuery exports — event_timestamp, event_date, and _TABLE_SUFFIX each use different references that silently break date-range queries.
Pipeline Enforcement Layer Strategy
The four-layer model for data contract enforcement across the full pipeline — pre-warehouse, post-load, transformation, and continuous observability — with practical adoption ordering.
dbt Core Open-Source Fundamentals
What dbt Core is, how its CLI-driven workflow operates, the open-source ecosystem that powers it, and the technical profile of teams that choose it.
LinkedIn Ads Analytics Endpoint
The engineering quirks of LinkedIn's adAnalytics endpoint — no pagination, 15K element cap, 20-metric limit per request, query tunneling, cursor pagination migration, and monthly API versioning.
BigQuery Regional Architecture
How BigQuery's region model works — multi-region vs. single region, the cross-region join constraint, and how to choose a region you'll live with permanently.
Data Observability Minimum Viable Stack
The four non-negotiable observability capabilities every data team needs regardless of tooling — primary key tests, freshness monitoring, volume anomaly detection, and actionable alerting.
Lightdash Dimension Configuration in dbt YAML
How Lightdash turns dbt column definitions into dimensions — types, display properties, time intervals, and computed additional_dimensions.
Google Ads Scripts for BigQuery Export
Using Google Ads Scripts to export performance data directly to BigQuery — how the authentication model works, what the execution limits are, and when this approach beats the alternatives.
Advanced Claude Code Workflows for dbt
A reading path through Claude Code configuration, testing, documentation, and debugging workflows for analytics engineers working with dbt on BigQuery
Meta Ads to BigQuery Pipeline — Hub
Map of content for building and maintaining a Meta Ads to BigQuery pipeline — API structure, actions array flattening, attribution windows, iOS signal loss, and operational maintenance.
dbt Macro Naming Conventions
Naming patterns for dbt macros that make them discoverable, communicative, and well-organized — verb prefixes, descriptive names, internal helper conventions, and the one-macro-per-file rule.
Pipeline Alerting Delivery Patterns
How to structure pipeline monitoring alerts — tiered severity routing, Slack vs. Telegram tradeoffs, delivery modes (channel, DM, webhook, silent), and designing alert systems that don't become noise.
OpenClaw Architecture and Design Principles
How OpenClaw is built — the Gateway daemon, model-agnostic BYOK design, HEARTBEAT.md proactive loop, and plain-text-first philosophy that makes it feel natural to data people.
Late-Arriving Data and the Lookback Window Pattern
How to handle late-arriving data in dbt incremental models using lookback windows, including window sizing trade-offs and the limits of any lookback approach.
dbt-to-Dataform Migration Process
The step-by-step process for migrating a dbt project to Dataform — auditing what you have, running the automated tool, converting macros to JavaScript includes, recreating tests as assertions, and setting up orchestration.
BigQuery Cross-Organization Data Sharing
Patterns for sharing BigQuery data across organizations — agency/client models, Analytics Hub, authorized views, and row/column-level security.
BigQuery Multi-Environment Patterns
Three patterns for separating dev, staging, and production in BigQuery — separate projects, dataset prefixes, and central data lake with department marts.
CLOUDSDK_CONFIG for Per-Project gcloud Isolation
How CLOUDSDK_CONFIG isolates all gcloud state per project — credentials, ADC files, active config — and why it's the missing piece for multi-client GCP work.
Elementary materialization override for dbt 1.8+
Why Elementary requires a materialization override macro in dbt 1.8+ projects, what happens without it, and how to write it correctly for BigQuery and Snowflake.
Claude Code Authentication Options
The two ways to authenticate Claude Code — subscription OAuth and API keys — when to use each, and the precedence rule that trips people up
Time-Decay Attribution Model
Time-decay attribution using exponential decay with a configurable half-life — the formula, choosing half-life by industry, BigQuery SQL implementation, and parameterization
Meta Ads Insights API Structure
How the Meta Marketing API is organized — the five-level object hierarchy, Insights API as a reporting edge, versioning cadence, authentication models, and rate limit system.
Try-Heal-Retry pattern
How to add AI-powered remediation to data pipelines using structured LLM output, Pydantic schemas, and circuit breakers, with production examples using Claude.
MCP Client Primitives
The three capabilities MCP clients expose to servers — sampling (server-requested LLM completions), elicitation (server-requested user input), and roots (filesystem boundaries) — and when they matter for data engineering.
Elementary report sections
What each section of the Elementary HTML report shows and when to use each one during a data quality review.
dbt-utils Generic Tests
Full reference for dbt-utils generic tests: YAML syntax, the Fusion arguments: key change, group_by_columns support, and when to use each test.
Layered AI Stack for Analytics Engineering
The mental model of thinking about AI tools in layers — IDE, coding agent, orchestration, review — rather than choosing a single tool for everything
Terminal Safety for Beginners
Which terminal commands are safe, which are dangerous, how to read error messages, and the keyboard shortcuts that save you when something goes wrong
Consent Mode Impact on Identity Resolution
How GA4 Consent Mode v2 changes what identity data reaches BigQuery: cookieless pings without identifiers, the same-page backstitch nuance, and filtering consented data for stitching pipelines.
dbt Unit Test Edge Case Patterns
Three essential edge case patterns for dbt unit tests — null handling, empty tables with format: sql, and date boundary testing.
dbt-expectations Hub
Hub note for dbt-expectations — setup, test reference, conditional filtering, severity tuning, BigQuery implementation patterns, and the unit test vs data test distinction.
AI Agent Data Quality: What Works Today vs. What's Aspirational
An honest assessment of which AI agent capabilities for dbt data quality are production-ready, which require significant work but are achievable, and which are still too unreliable to depend on.
MCP Apps for Data Engineers
A reading path through MCP Apps — the January 2026 extension to MCP that renders interactive HTML visualizations directly inside AI client conversations.
BigQuery Partitioning Configuration Patterns
Domain-specific partitioning and clustering configurations for BigQuery in dbt -- event data, marketing, multi-tenant SaaS, and IoT patterns with rationale.
Data Contracts Hub
Hub note connecting garden notes on data contracts — definitions, specifications, ownership, tooling, validation layers, and adoption challenges.
BigQuery Idle Slot Sharing
How idle slot sharing works in BigQuery Enterprise editions -- requirements, configuration, preemption behavior, and when to disable it.
dbt Three-Layer Architecture
How the base, intermediate, and mart layers organize a dbt project, what belongs in each, and how data flows between them.
GA4 Session Key Construction
Why ga_session_id alone fails as a session identifier, how to build the correct composite key, and the edge cases that produce null sessions.
Triangulated Marketing Measurement
Why resilient marketing measurement combines three approaches -- multi-touch attribution for daily optimization, media mix modeling for strategic allocation, and incrementality testing for causal validation
GCP IAM Least Privilege for Data Teams
A sequenced guide to auditing and fixing IAM debt on GCP data platforms — from surfacing over-permissioned principals to implementing policy tags and row-level security.
CI/CD Data Quality Testing in dbt
How to integrate data quality testing into CI/CD pipelines — Slim CI with state:modified+, GitHub Actions workflows, and tools like Datafold and Recce for regression detection.
Debugging Custom dbt Materializations
Common errors in custom dbt materializations, what causes them, and how to test materializations systematically before deploying to production.
dbt Materialization Anatomy
The six-step structure every dbt materialization follows — setup, pre-hooks, main SQL, post-hooks, cleanup, and return — plus the key objects and adapter methods.
dbt Doc Block Syntax and Reuse Patterns
How dbt doc blocks work — syntax, naming rules, cross-package references, and patterns for writing column and model descriptions once and reusing them across your project
dbt Quality Morning Summary Pattern
A two-cycle design for automated dbt quality reporting — daily morning summaries with Slack threading and follow-up capability, plus a weekly digest that surfaces patterns individual days miss.
AI-Generated SQL Failure Modes
Why AI-generated SQL is dangerous — it runs without errors but returns wrong results. Research on temporal filter inconsistencies, join failures, and the confidence problem.
dbt Documentation with Claude Code
A systematic approach to dbt documentation using Claude Code — the codegen-plus-AI pattern, docs blocks for consistency, lineage diagrams, and slash commands for automation
OpenClaw GA4 Skill Integration
How to use community GA4 skills from ClawHub to pull analytics metrics into OpenClaw — the two main options, what each extracts, and how to feed the output into scheduled reporting.
HubSpot Deal Stage Modeling
Why deal stage transitions live in DEAL_STAGE not DEAL_PROPERTY_HISTORY, how to use the is_closed and label columns correctly, and patterns for time-in-stage and pipeline analysis.
Elementary for dbt: setup guide
A sequenced map of notes covering Elementary installation from scratch -- dbt package, materialization override, CLI profile configuration, and troubleshooting.
Server-Side Tracking Data Quality Evidence
The quantitative case for server-side tracking — the 41% average data quality improvement, case studies from Finobo, Forward Media, and seoplus+, ad platform Conversions API adoption, and the cost-benefit calculation that has flipped.
Dataform-to-dbt Migration Decision Criteria
When migrating from Dataform to dbt makes sense, when it doesn't, and the realistic cost-benefit calculation.
dbt Model Versioning
How dbt model versions work — breaking vs non-breaking changes, the state:modified selector, version integers, deprecation dates, and the friction points.
Cross-Platform Ad Metric Comparability
Why only five metrics can be meaningfully compared across ad platforms, how to handle platform-specific metrics, and conversion configuration details that determine what your 'conversions' column actually means.
GA4 Identity Stitching Techniques
The four SQL patterns for resolving GA4 anonymous-to-known user identity — last-touch, first-touch, full backstitch, and session-scoped — with a decision framework for choosing between them.
Cascading Agent Pattern
The architecture where an always-on monitoring agent detects issues and triggers a coding agent to investigate and fix them — how OpenClaw and Claude Code hand off work
dlt Environment Setup
Setting up a dlt project from scratch — Python virtual environment, installation, dlt init, and the project scaffold it creates.
Google Ads ClickType Impression Trap
Why Google Ads DTS stats tables silently inflate impression counts 3-6x, and the exact SQL filter that fixes it without breaking click counts.
dbt Mesh Governance Triad
How contracts, access controls, and model versioning combine in dbt Mesh to turn models into data products — and which models actually deserve that treatment.
Custom MCP Server Decision Criteria
When to build a custom MCP server versus using an existing one — the build-vs-browse decision framework for data engineering teams.
Consent Mode v2 Parameter Architecture
The four Consent Mode v2 parameters, how upstream browser controls differ from downstream server instructions, and the legal mandate that forced the change.
dlt for AI-Assisted Pipeline Development
Why dlt's Python-native, declarative design maps well to AI-assisted development — the REST API builder, BigQuery-specific features, LLM-friendly docs, and production results
Slack KPI Summary Format for Agent-Delivered Reports
A practical template for agent-generated Slack KPI summaries — directional arrows, week-over-week structure, percentage points vs. percentages, and how to handle the LLM math reliability problem in the output layer.
GA4 session_start Event Unreliability
Why counting session_start events produces wrong session counts in GA4 BigQuery data, and the correct approach using distinct session IDs.
MCP Tool Design Patterns
How to design MCP tools that work well with AI — docstrings as descriptions, Pydantic models for structured output, and input validation with schemas.
Identity Resolution Monitoring
Key metrics and anomaly detection SQL for monitoring a GA4 identity stitching pipeline — stitch rate, consolidation rate, shared device exposure, and week-over-week change alerts.
dbt-to-Dataform Migration Hub
Hub note for migrating from dbt to Dataform — the decision, the concept mapping, the procedural steps, and what you'll lose. For BigQuery teams evaluating the switch.
dbt-expectations BigQuery Implementation Patterns
Real-world dbt-expectations implementation on BigQuery — complete GA4 and advertising data quality YAML, test placement by DAG layer, and a practical starting checklist.
Ad Platform API Landscape
API characteristics, authentication models, and engineering gotchas for Google Ads, Meta, LinkedIn, Microsoft, TikTok, Pinterest, and Twitter ad platforms
Self-healing risk tiering
A framework for deciding which pipeline failures can self-heal automatically, which need human approval, and which should never be auto-remediated.
dbt Testing Anti-Patterns
Four common testing mistakes in dbt projects -- over-testing, happy-path-only coverage, drifting thresholds, and testing warehouse functions -- and what to do instead.
OpenClaw Persistent Memory Model
How OpenClaw's Markdown-based persistent memory differs from session-based tools, what it enables for long-running data monitoring, and how memory files work in practice.
Cloud Scheduler OIDC Authentication for HTTP Triggers
How Cloud Scheduler authenticates to secure HTTP endpoints using OIDC tokens — the service account requirements, the gcloud setup, and the pattern for Cloud Functions and Cloud Run.
HubSpot Associations as Bridge Tables
HubSpot's many-to-many association model requires bridge tables at every layer. How to model them correctly, handle fan-out, and resolve the primary company problem.
dbt Fusion Package Compatibility
How the dbt Fusion engine (v2.0) affects package compatibility — version bounds, manifest format changes, the Fusion badge, and how to prepare your project and packages for migration.
BigQuery Partition Pruning Patterns
How to combine partitioning and clustering in BigQuery for maximum scan reduction, including anti-patterns that silently defeat pruning.
dbt Testing Strategy
Hub note for building a complete dbt testing strategy — taxonomy, layer placement, unit test selection, alert routing, and package ecosystem.
GCP Auth Constraints for AI Coding Agents
How Claude Code, Codex, and Cursor each handle GCP authentication — and where each one breaks when tokens expire, contexts conflict, or interactive flows are required.
Consent Mode Server-Side GTM Propagation
How consent signals travel from the web container to server-side GTM via gcs and gcd parameters, and why non-Google vendor tags require manual consent enforcement.
BigQuery Job Failure Monitoring with INFORMATION_SCHEMA
SQL patterns for monitoring BigQuery job failures and detecting cost anomalies using INFORMATION_SCHEMA.JOBS — with filtering strategies for multi-project setups.
Data Observability Build vs. Buy
A reading path through the data observability decision — from the tool landscape through scaling thresholds, ML vs statistical detection, TCO, and the minimum viable stack.
BI Tool Self-Service Models
Three approaches to self-service BI: governed exploration (Lightdash), visual query builder (Metabase), and LookML-powered Explore (Looker). How to match the model to your users.
OpenClaw Pipeline Monitoring
A reading path through the OpenClaw pipeline monitoring tutorial — cron scheduler mechanics, writing monitoring skills, tiered alerting delivery, BigQuery failure checks, and Snowflake cost monitoring.
GA4 Sessionization Hub
Hub note connecting all concepts involved in building session tables from GA4 BigQuery event data.
MetricFlow CLI querying
How to query MetricFlow metrics from the CLI in dbt Core (mf) and dbt Cloud (dbt sl): group-by, filters with Jinja dimension syntax, multi-metric queries, and the semantic manifest.
GA4 User Identity
Map of content for GA4 identity resolution in BigQuery — from understanding the two identifier types through stitching techniques, production pipelines, and ongoing monitoring.
AI Limitations in Data Engineering
A reading path through the five core limitations of AI in data engineering — SQL failure modes, the context gap, architectural judgment, the production gap, and context engineering as the response.
dbt Docs Customization and Deployment
A reading path through customizing and deploying dbt docs beyond localhost — from understanding the build artifacts to choosing a hosting platform, automating deployment, and knowing when to replace the default frontend
dbt Docker Containerization
Patterns for containerizing dbt Core for production — multi-stage Dockerfiles, version pinning, Artifact Registry, and the two-repository strategy that separates transformation logic from infrastructure.
Data Contract Tooling Ecosystem
The landscape of data contract tools in 2026 — dedicated contract tools, quality frameworks with contract support, and governance platforms.
dbt Attribution Comparison Pattern
How to structure a dbt project for multi-model attribution — running first-touch, last-touch, linear, position-based, and time-decay models in parallel with a union comparison layer
AI Personal CRM Pattern
Using an AI agent to auto-scan email and calendar for contact relationship tracking — how the pattern works, what SQLite with vector embeddings enables, and why this is the highest-risk integration to configure carefully.
2-Layer RBAC with Google Groups
Bind IAM roles to Google Groups representing job functions, not individual users — the pattern that makes onboarding, offboarding, and permission audits tractable.
Elementary report hosting
How to host Elementary HTML reports on S3, GCS, or Azure Blob Storage so the whole team has access, and how to automate report generation in CI pipelines.
Incremental Predicates for dbt Merge
How incremental_predicates limit destination table scans during dbt merge operations, turning full table scans into partition-pruned reads.
OpenClaw Security Risks — Hub
A reading map for the OpenClaw security risks guide — documented incidents, CVEs, regulatory warnings, supply chain attacks, context window safety failures, and what data teams specifically need to know.
dbt BigQuery Configuration
How to configure dbt for BigQuery — profiles.yml setup, authentication methods, generate_schema_name, job labels for cost attribution, and cost control settings.
Shapley Value Attribution
How cooperative game theory's Shapley values produce provably fair attribution by calculating each channel's average marginal contribution across all possible channel coalitions
Claude Code Strengths and Limitations for Data Work
Where Claude Code delivers real value in data engineering — boilerplate, multi-file changes, pattern replication — and where it struggles with novel logic, ambiguity, and over-engineering.
MCP Discovery Resources
Where to find MCP servers — the official registry, community directories, and how to evaluate what you find before installing.
MetricFlow Metric Types
The five metric types in dbt MetricFlow — simple, cumulative, derived, ratio, and conversion — with syntax, use cases, and gotchas for each
Data Contract Ownership Models
Producer-defined vs consumer-defined data contracts — why who writes the contract determines whether the initiative succeeds.
GA4 Engagement Event Query Recipes
Production-ready BigQuery SQL for GA4 engagement events — page views, scroll depth, outbound clicks, file downloads, and video engagement funnels.
GA4 First dbt Models Tutorial
Hub note for building your first GA4 dbt models — from understanding the raw event schema through base, intermediate, and mart layers.
Advertising Data in the Warehouse
Hub note for the complete guide to centralizing advertising data — from the measurement problem through extraction, pipeline challenges, and dbt transformation patterns
Unit Testing Conversion Funnels in dbt
How to unit test funnel analysis models in dbt — step-over-step conversion rates, user drop-off tracking, and the step-skipping edge case.
Organizing Lightdash Metrics at Scale
How to keep a large Lightdash implementation navigable — groups, group_details, the Metrics Catalog with Spotlight categories, and reusable parameters for values that change across deployments.
Migrating Incremental Models to Microbatch
How to convert traditional dbt incremental models to the microbatch strategy — step-by-step migration, side-by-side code examples, and first-run considerations.
dbt Intermediate Layer Patterns
What belongs in dbt intermediate models — joins, business logic, window functions — and the critical rule of never reducing grain.
dbt Ad Reporting Patterns
How to model advertising data in dbt — the dbt_ad_reporting package, cross-platform UNION patterns, platform-specific normalization, and reconciliation testing
Ad Platform Attribution Bias
Why every ad platform overcounts conversions, how walled-garden incentives create measurement gaps, and what only becomes visible when ad data lives in the warehouse
Measuring Data Latency Before Choosing an Incremental Strategy
How to profile the gap between event time and load time in your source tables, and use that distribution to size lookback windows and choose the right incremental strategy.
Elementary CLI profile configuration
How to configure the Elementary CLI (edr) profile for BigQuery, Snowflake, and Databricks, including the gotchas that differ from your dbt profile.
MetricFlow time spine
The MetricFlow time spine is a continuous date table used for cumulative metrics and time series gap filling. How to create it, configure it, and understand when it's required.
Lightdash's Semantic Layer vs MetricFlow
How Lightdash's native metric layer differs from MetricFlow — simpler syntax, tighter coupling, no cross-platform API — and when the tradeoffs favor each approach.
dbt-expectations Test Reference
A categorized reference of the highest-value dbt-expectations tests — table-level, pattern, range, multi-column, and completeness — with BigQuery-ready YAML examples.
Markov Attribution SQL Implementation
SQL patterns for extracting journey paths and calculating transition probabilities in BigQuery, the data preparation layer for Markov chain attribution