The shift from copilot to agent changes which analytics engineering skills are most relevant. Copilots reward SQL fluency, modeling instincts, and domain knowledge. Agents handle more execution directly, shifting value toward directing execution, reviewing output, and making judgment calls that agents cannot make autonomously.
The seven skill areas below map to where that shift is most pronounced.
1. AI Orchestration
Setting up Skills, MCPs, hooks, and guardrails. Not prompting agents, but configuring them.
The distinction matters. Prompting is session-level: you describe what you want, the agent does it, the session ends. Orchestration is project-level: you define what the agent should always do, what it should never do, which tools it has access to, how it should handle failures, and what it should report back. The orchestration layer is what makes an agent into a system rather than a fast chatbot.
A well-crafted CLAUDE.md is orchestration. A library of tested skills is orchestration. Production hooks that block edits to production schemas and automatically run SQLFluff after every edit — also orchestration. These are the artifacts that separate useful agent output from constant correction cycles.
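To make the hooks half of that list concrete: in Claude Code, hooks live in the project's settings file and fire around tool calls. The sketch below runs SQLFluff after every file edit. The exact hook schema varies across versions, so treat the event names and structure as illustrative rather than definitive; blocking edits to production schemas would be the same idea expressed as a pre-edit hook that inspects the target path.

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "sqlfluff lint models/"
          }
        ]
      }
    ]
  }
}
```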
This skill is distinct from any specific tool — OpenClaw, Claude Code, and dbt Agent Skills each have different configuration mechanisms, but the underlying discipline of defining agent behavior precisely is the same.
```markdown
# Example: A guardrail in CLAUDE.md that prevents a specific class of error

## Safety Rules

- NEVER run `dbt run` without first running `dbt compile`
- NEVER modify models in marts/ without checking downstream impact first (`dbt ls --select model_name+`)
- NEVER commit changes without running tests on affected models
```

The key judgment is knowing which rules need to exist — which agent behaviors, left unconstrained, will produce errors that outweigh the configuration overhead.
2. Specification Engineering
Writing requirements that machines can follow.
This is different from traditional documentation. Documentation is written for humans who will interpret, apply judgment, and ask clarifying questions when something is ambiguous. Specifications for AI have to be precise enough that an agent with no judgment about context can follow them correctly.
Think of it as writing a detailed ticket for a very capable junior engineer who has no access to you for follow-up questions and will interpret every instruction literally. Where a human would infer from context, an agent will do exactly what you said — and if what you said was ambiguous, it will make a choice you didn’t intend.
CLAUDE.md is the primary artifact of specification engineering for dbt work. But the skill extends to how you write skill descriptions, how you frame prompts for complex tasks, and how you structure test expectations. The ability to write precise, unambiguous instructions becomes a core competency because it directly determines the quality of agent output.
Good specification engineering produces:
- Clear naming conventions that leave no room for interpretation (“base__, not base_”)
- Explicit workflow sequences (“compile first, then run, then test — in that order”)
- Negative boundaries that define what the agent should NOT do, not just what it should
- Context about why rules exist, so the agent applies them appropriately in edge cases
The last point is underappreciated. Instructions without rationale get applied mechanically. Instructions with rationale get applied with something closer to judgment:
```markdown
# Without rationale (applied mechanically, may get edge cases wrong)
- Use LEFT JOIN when joining to the orders table

# With rationale (applied with judgment)
- Use LEFT JOIN when joining to the orders table — some customers have accounts but no orders yet, and we never want to drop them from customer-grain analyses
```

3. Critical Code Review
Catching the AI’s specific failure modes before they reach production.
AI-generated code tends to look correct — it follows patterns, uses consistent naming, and compiles without errors. Failures are typically semantic, not syntactic, and catching them requires understanding what the data should represent.
The failure modes are predictable enough to review for deliberately:
Join type errors. The most common silent failure. An inner join where a left join was needed drops records without any visible error. Revenue reports look cleaner than expected. The gap is invisible until someone checks row counts against source.
```sql
-- Agent wrote this (drops customers with no orders)
SELECT c.customer_id, SUM(o.revenue) AS total_revenue
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY 1
```

```sql
-- Should be this (preserves all customers)
SELECT c.customer_id, COALESCE(SUM(o.revenue), 0) AS total_revenue
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY 1
```

Temporal filter inconsistencies. Applying date filters to one table in a join but not the other. The Thomson Reuters finding — 73% of AI-generated time-based analyses had this error — means it’s not an edge case. It’s the default failure mode when an agent applies filters without understanding the temporal model of the warehouse.
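This failure mode can be reviewed for just as deliberately. A sketch of the pattern, using hypothetical orders and refunds tables (assume at most one refund per order): the date window is applied to one side of the join, and the temporal treatment of the other side is decided implicitly.

```sql
-- Orders are restricted to Q1, but refunds are not restricted at all:
-- whether a July refund should count against a Q1 order is a business
-- decision, and here the agent has made it implicitly.
SELECT
    o.customer_id,
    SUM(o.revenue) - COALESCE(SUM(r.refund_amount), 0) AS net_revenue
FROM orders o
LEFT JOIN refunds r ON o.order_id = r.order_id
WHERE o.order_date >= '2024-01-01'
  AND o.order_date < '2024-04-01'
GROUP BY 1

-- If only in-window refunds should count, say so explicitly, in the join
-- condition rather than the WHERE clause so the LEFT JOIN is preserved:
--   AND r.refund_date >= '2024-01-01' AND r.refund_date < '2024-04-01'
```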
Silent data filtering. Agents make data quality decisions without flagging them. Null values filtered, records excluded by an unanticipated condition, rows dropped because a join key was mismatched. The output is smaller than the input and nobody knows why.
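A cheap defense is a reconciliation query run before anything ships: compare row counts at the boundaries of the transformation and demand an explanation for any gap. A minimal sketch, assuming hypothetical raw.customers and analytics.customers relations:

```sql
-- If model_rows < source_rows, someone (or some agent) filtered data;
-- the gap should be explainable before the model ships.
SELECT
    (SELECT COUNT(*) FROM raw.customers)       AS source_rows,
    (SELECT COUNT(*) FROM analytics.customers) AS model_rows,
    (SELECT COUNT(*) FROM raw.customers)
      - (SELECT COUNT(*) FROM analytics.customers) AS unexplained_gap
```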
Wrong layer references. In a three-layer dbt project, an intermediate model that directly references a source (bypassing the base layer) or a mart that directly references a base model (bypassing intermediate transformation) violates the architecture without producing errors. The model runs correctly in isolation and fails subtly in the system.
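This one is mechanical to check, because in a dbt project the layer a model reads from is visible in its ref and source calls. A sketch with hypothetical model and source names:

```sql
-- models/marts/fct_orders.sql

-- Violation: the mart reaches past the intermediate layer to a raw source
-- SELECT * FROM {{ source('shop', 'orders') }}

-- Correct: each layer references only the layer beneath it
SELECT * FROM {{ ref('int_orders__enriched') }}
```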
Critical code review for agent output is a different skill from code review for human output. For humans: look for logical errors, edge cases, performance issues. For agents: look for semantic mismatches — did the agent understand what the data means, or just what it looks like?
4. Business Domain Expertise
The value shifts from typing SQL to understanding business questions.
This skill exists at every stage of analytics engineering, but it becomes relatively more important as agents handle more execution. When execution is commoditized, the differentiated value is deep knowledge of the business that makes the execution correct.
An agent can calculate attribution. It cannot tell you that this client’s attribution model has a 30-day lookback window because their sales cycle is long, and that the marketing team has been lobbying to shorten it to 14 days, and that the data supporting the 14-day model exists but the business decision hasn’t been made yet. That context is what makes the difference between producing a technically correct model and producing one that’s actually useful for the decision at hand.
Domain expertise accumulates through stakeholder investment: time with the people who use the data, documentation of the tribal knowledge that makes pipelines correct for a specific business, and understanding of decisions that haven’t been finalized.
5. Data Governance and Ethics
Ensuring AI-generated pipelines meet quality, compliance, and regulatory standards.
This becomes especially critical when agents run autonomously. When a human writes every transformation, governance decisions are visible and deliberate. When an agent generates transformations in batches, governance decisions happen implicitly — and unless you’ve defined what “compliant” means in the agent’s configuration, the agent will optimize for functionality without awareness of regulatory constraints.
For European consulting work, GDPR touches almost every pipeline. Which tables contain personal data. Where that data can be processed. How long it can be retained. Who can access it. An agent that generates a pipeline without these constraints isn’t being malicious — it’s being context-free. Governance constraints have to be encoded into the specifications the agent follows.
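Encoding those constraints uses the same specification discipline as skill 1, pointed at governance instead of workflow. An illustrative sketch for a CLAUDE.md — the file name, tag, and retention period are hypothetical and would come from the client's actual GDPR assessment:

```markdown
## Data Protection Rules

- Columns listed in pii_columns.yml are personal data under GDPR.
  NEVER select them into marts/ without masking or explicit approval.
- NEVER move tables tagged `pii` outside the EU region of the warehouse.
- NEVER retain raw event data older than 13 months in new models;
  filter it at the base layer.
```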
The accountability question is practical, not theoretical: when an agent’s cron job processes personal data in a way that turns out to violate GDPR, who’s accountable? The agent isn’t. The person who configured and supervised it is. This shifts the governance responsibility from “make sure the code is compliant” to “make sure the agent only produces compliant code.” Same outcome, different skill.
6. Architecture and Systems Thinking
Designing the overall data platform, not individual models.
An agent can build a model. It cannot design a platform. The meta-decisions — which tools fit in the stack, where to draw layer boundaries, how to handle data across organizational boundaries, what trade-offs to accept between flexibility and governance — require understanding of the system as a whole.
The agent era makes systems thinking more important, not less, for a structural reason: when agents write more of the code, architectural decisions multiply through more code. A good architectural decision about how time is modeled across the warehouse makes every AI-generated query more likely to be correct. A bad architectural decision about temporal logic gets reproduced in every generated model that touches time.
This includes knowing when to use which tools: when OpenClaw versus Claude Code, when an incremental model versus a full refresh, when dbt Core versus dbt Cloud, when to build a custom solution versus use a package. These are architectural choices. The agent implements whatever architecture you give it.
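What makes these choices architectural rather than cosmetic is that each one is encoded once and then propagated by the agent through everything it builds. The incremental-versus-full-refresh decision, for example, comes down to a few lines of standard dbt configuration (model and column names here are hypothetical):

```sql
-- models/marts/fct_events.sql
{{ config(materialized='incremental', unique_key='event_id') }}

SELECT event_id, event_type, occurred_at
FROM {{ ref('base_app__events') }}

{% if is_incremental() %}
-- On incremental runs, process only rows newer than the last build
WHERE occurred_at > (SELECT MAX(occurred_at) FROM {{ this }})
{% endif %}
```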
7. AI Tool Fluency
Understanding the sweet spot of each tool in your stack.
The tooling is moving fast. Staying current with what’s available and what’s actually production-ready (versus demo-ready) is itself a skill. The four-layer mental model helps here — IDE assistant, coding agent, orchestration layer, review layer — because it provides a stable framework for evaluating new tools against the roles they fill.
Tool fluency isn’t about keeping up with every new release. It’s about knowing, for any given task:
- Which tool is best suited to it
- What the failure modes of that tool are for this task
- What guardrails need to be in place before you rely on it for this type of work
A concrete example: Claude Code is excellent for building models end-to-end from a specification, but it silently filters nulls when making data quality decisions. That failure mode means every model Claude Code builds should be reviewed with specific attention to null handling. OpenClaw is excellent for scheduled monitoring but has a serious security profile to weigh before using it with client data; Security Posture for AI Agents covers this. Knowing these limitations in advance is what allows the tools to be used without being surprised by them.