Agent-First CLI Design Principles

In March 2026, Justin Poehnelt — the Google DevRel engineer behind the [[Google Workspace CLI (gws)|gws CLI]] — published “You Need to Rewrite Your CLI for AI Agents.” The central argument: human developer experience optimizes for discoverability and forgiveness, while agent developer experience optimizes for predictability and defense-in-depth. The seven principles he articulates apply to any tool designed for agent consumption, not just gws.

1. Raw JSON Payloads Over Bespoke Flags

Traditional CLIs use flat flag namespaces: --title "Q1 Budget" --locale "en_US". This works for simple inputs but breaks for nested structures — you end up with invented serialization conventions (--filter.key=value --filter.operator=eq) that don’t map cleanly to the underlying API.

The agent-first approach: accept full API payloads via --json and --params flags that map directly to the API schema. LLMs can generate these with zero translation loss because the JSON structure is the API contract:

# Human-first CLI: flat flags that serialize to an API call
gws drive files create --title "Q1 Budget" --mime-type "application/vnd.google-apps.spreadsheet"

# Agent-first CLI: pass the API payload directly
gws drive files create --json '{"name": "Q1 Budget", "mimeType": "application/vnd.google-apps.spreadsheet"}'

The agent doesn’t need to learn the CLI’s custom flag conventions. It generates the API payload it already knows.
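A minimal sketch of how an agent harness might assemble such a call. The `build_gws_argv` helper is hypothetical, and the `properties` field is included only to show nesting; the command and the `--json` flag are taken from the example above. Building an argument list, rather than a shell string, means the JSON needs no shell quoting.

```python
import json

def build_gws_argv(command: list[str], payload: dict) -> list[str]:
    """Return an argv list that passes `payload` as a single --json argument.

    Hypothetical helper: the caller would hand this list to subprocess.run
    without shell=True, so no shell quoting is involved.
    """
    # Compact separators keep the argument short; sort_keys makes it deterministic.
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True)
    return ["gws", *command, "--json", body]

payload = {
    "name": "Q1 Budget",
    "mimeType": "application/vnd.google-apps.spreadsheet",
    # Nested structure needs no invented flag syntax (illustrative field).
    "properties": {"team": "finance"},
}
argv = build_gws_argv(["drive", "files", "create"], payload)
```

The payload survives a round trip unchanged: parsing the last element of `argv` yields the original dictionary, which is the "zero translation loss" property described above.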

2. Schema Introspection Replacing Documentation

Baking static documentation into agent system prompts is expensive in tokens and prone to staleness. When the API updates, the docs in your prompt are wrong.

The alternative: make the tool itself queryable for its schema. gws schema drive.files.list dumps full method signatures as machine-readable JSON — parameters, request body, response types, OAuth scopes. The Discovery Document is the single source of truth, and it’s always current.

# Agent queries schema at task time rather than loading stale docs at startup
gws schema gmail.users.messages.list

This approach connects to the LLM Training Data Asymmetry for Tool Use observation: agents are better at using tools they can inspect dynamically than tools they have to memorize.
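One way an agent harness could use an introspected schema is to check proposed parameters before making the call. The sketch below assumes a descriptor shaped like the `parameters` map of a Discovery Document; the actual output format of `gws schema` is not shown in this note, so the `schema` dictionary here is a hand-written illustration, and `check_params` is a hypothetical helper.

```python
def check_params(method_schema: dict, params: dict) -> list[str]:
    """Return a list of problems; an empty list means the params fit the schema."""
    declared = method_schema.get("parameters", {})
    problems = []
    # Names the schema does not declare are likely hallucinated or misspelled.
    for name in params:
        if name not in declared:
            problems.append(f"unknown parameter: {name}")
    # Declared parameters marked required must be present.
    for name, spec in declared.items():
        if spec.get("required") and name not in params:
            problems.append(f"missing required parameter: {name}")
    return problems

# Hand-written, illustrative fragment (assumed shape, not real gws output).
schema = {
    "parameters": {
        "userId": {"type": "string", "required": True, "location": "path"},
        "maxResults": {"type": "integer", "location": "query"},
        "q": {"type": "string", "location": "query"},
    }
}

# The agent misspelled maxResults and forgot userId.
problems = check_params(schema, {"maxResult": 10})
```

Because the check runs against a schema fetched at task time, it stays correct when the API changes, which a copy of the docs pasted into a prompt would not.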

3. Context Window Discipline

A single Gmail message can consume a meaningful fraction of an LLM’s context window. Long-form responses are expensive, and a tool that returns walls of JSON will burn through context quickly.

Two mechanisms address this:

Field masks limit what the API returns to only what the agent needs:

# Without field mask: returns every field on every file object
gws drive files list --params '{"pageSize": 10}'

# With field mask: returns only id, name, mimeType
gws drive files list --params '{"pageSize": 10, "fields": "files(id,name,mimeType)"}'
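To make the saving concrete, the sketch below applies the same three-field selection locally to a made-up file object and compares serialized sizes. In real use the API applies the mask server-side; the `project` helper and the sample record are illustrative assumptions, not gws behavior.

```python
import json

def project(files: list[dict], keep: tuple[str, ...]) -> list[dict]:
    """Keep only the named top-level fields of each file object."""
    return [{k: f[k] for k in keep if k in f} for f in files]

# Illustrative file object; real responses carry many more fields.
full = [{
    "id": "abc123",
    "name": "Q1 Budget",
    "mimeType": "application/vnd.google-apps.spreadsheet",
    "createdTime": "2026-01-05T09:00:00Z",
    "modifiedTime": "2026-02-01T12:30:00Z",
    "owners": [{"displayName": "Example User", "emailAddress": "user@example.com"}],
    "webViewLink": "https://example.com/placeholder",
}]

masked = project(full, ("id", "name", "mimeType"))
full_size = len(json.dumps(full))      # characters the agent would read unmasked
masked_size = len(json.dumps(masked))  # characters with the mask applied
```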

NDJSON pagination (--page-all) emits one JSON object per page rather than buffering the entire response. This makes large result sets stream-processable — the agent can process and discard each page rather than holding everything in context.
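A minimal sketch of consuming such a stream, assuming each line is one page shaped like a Drive `files.list` response (an object with a `files` array). The `iter_pages` and `count_by_mime_type` helpers are hypothetical, and `io.StringIO` stands in for the CLI's stdout.

```python
import io
import json
from typing import Iterator, TextIO

def iter_pages(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed page per non-empty NDJSON line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

def count_by_mime_type(stream: TextIO) -> dict[str, int]:
    """Aggregate across pages while holding only one page in memory at a time."""
    counts: dict[str, int] = {}
    for page in iter_pages(stream):
        for f in page.get("files", []):
            mime = f.get("mimeType", "unknown")
            counts[mime] = counts.get(mime, 0) + 1
    return counts

# Stand-in for the CLI's stdout: two pages, one JSON object per line.
fake_stdout = io.StringIO(
    '{"files": [{"id": "1", "mimeType": "text/plain"}]}\n'
    '{"files": [{"id": "2", "mimeType": "text/plain"}, {"id": "3", "mimeType": "image/png"}]}\n'
)
counts = count_by_mime_type(fake_stdout)
```

Only the small `counts` summary needs to reach the model; the pages themselves are discarded as soon as they are processed.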

Both techniques treat context as a scarce resource to be conserved, not a buffer to fill.

4. Input Hardening Against Hallucinations

Poehnelt’s framing: “Agents hallucinate. Build like it.”

This is the principle most easily dismissed and most important not to dismiss. When an agent generates inputs for a CLI, it will sometimes produce subtly wrong values. The question is whether those wrong values cause silent data corruption or get caught at the input layer.

gws validates inputs before executing:

File path validation: reject directory traversal (agents sometimes hallucinate paths like ../../.ssh/authorized_keys)
Control character rejection: reject any character below ASCII 0x20
Query parameter injection prevention: reject ? and # in resource identifiers (agents occasionally embed query parameters inside resource IDs)
Double-encoding prevention: reject % so that already-encoded values are not URL-encoded a second time

These aren’t hypothetical attack patterns. They’re observed failure modes from running AI agents against real APIs. The validation layer is the difference between a bad input that fails fast and a bad input that silently corrupts data.
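The four checks listed in this section could look roughly like this in Python. This is a sketch under stated assumptions, not the gws implementation: `InputRejected`, `validate_resource_id`, and `validate_path` are hypothetical names, and confining paths to a base directory is one possible way to block traversal.

```python
from pathlib import Path

class InputRejected(ValueError):
    """Raised when an agent-supplied value fails validation."""

def validate_resource_id(value: str) -> str:
    """Reject control characters and URL-significant characters in an ID."""
    for ch in value:
        if ord(ch) < 0x20:
            raise InputRejected(f"control character {ord(ch):#04x} in resource ID")
        if ch in "?#%":
            # ? and # smuggle query strings or fragments; % invites double encoding.
            raise InputRejected(f"character {ch!r} not allowed in resource ID")
    return value

def validate_path(user_path: str, base_dir: Path) -> Path:
    """Resolve user_path under base_dir and reject anything that escapes it."""
    base = base_dir.resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # requires Python 3.9+
        raise InputRejected(f"path escapes {base}: {user_path}")
    return candidate
```

With these in place, `validate_resource_id("abc?fields=name")` and `validate_path("../../.ssh/authorized_keys", Path("/tmp/work"))` both raise `InputRejected` before any request is built, so the failure is loud and immediate.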

5. Shipping Agent Skills, Not Just Commands

--help output describes syntax. It doesn’t encode operational wisdom.

Skills are structured Markdown files with YAML frontmatter — one per API surface plus higher-level workflow recipes — that tell agents things the help text doesn’t:

“Always use --dry-run for mutating operations”
“Always confirm with user before executing write/delete commands”
“When listing files, use field masks to avoid large responses”
“For large mailboxes, paginate rather than requesting all messages at once”

The repo ships 100+ of these. They’re loaded into the agent’s context alongside the CLI, so the agent has operational guidance without requiring a system prompt that enumerates every edge case:

npx skills add https://github.com/googleworkspace/cli/tree/main/skills/gws-drive

This is a pattern worth borrowing: if you’re building a tool agents will use, ship the skill files alongside the tool. Let the agent load them on demand rather than embedding everything in a system prompt.
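Loading on demand implies the harness can read a skill's metadata without pulling the whole body into context. The sketch below splits frontmatter from body for a made-up skill file; the `name` and `description` keys are assumptions about what such a file might contain, and the parser handles only flat `key: value` lines (a real loader would use a proper YAML library).

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (metadata, body).

    Handles only flat `key: value` lines between two `---` delimiters.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text  # no closing delimiter: treat everything as body
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body

# Made-up skill file for illustration.
skill_text = """---
name: gws-drive
description: Guidance for Drive operations
---
When listing files, use field masks to avoid large responses.
"""
meta, body = split_frontmatter(skill_text)
```

A harness could index every skill by `meta` at startup and inject `body` only when the matching API surface is actually in use.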

6. Multi-Surface Delivery

The same core binary should work as a human CLI, an MCP server, a scripting tool, and a headless automation target. Forcing a choice between these modes means some legitimate use cases are second-class citizens.

gws delivers:

CLI mode: standard interactive terminal use
MCP server mode: gws mcp -s drive,gmail exposes Workspace as structured tools over stdio
Gemini CLI extension: registers as a tool for Gemini’s agent
Headless/CI mode: credentials via environment variables, no interactive prompts

# MCP mode for Claude Code integration
gws mcp -s drive,gmail,calendar

# Headless mode for pipeline use
export GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE=/path/to/creds.json
gws drive files list

The key design decision: don’t optimize the CLI for one consumer and bolt on support for others. Design the core interface to work across all surfaces, then add surface-specific protocol adapters.
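The shape of that decision can be sketched as one core function with thin adapters on top. Everything here is illustrative: `list_files` is a stub, the `example` program name and the `drive_files_list` tool name are invented, and `tool_adapter` only mimics the general shape of a structured tool call rather than implementing the MCP protocol.

```python
import argparse
import json

def list_files(params: dict) -> dict:
    """Core operation, independent of any surface. Stubbed for illustration."""
    page_size = int(params.get("pageSize", 10))
    return {"operation": "drive.files.list", "pageSize": page_size}

def cli_adapter(argv: list[str]) -> str:
    """Terminal surface: parse flags, call the core, return print-ready JSON."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("--params", default="{}")
    args = parser.parse_args(argv)
    return json.dumps(list_files(json.loads(args.params)))

def tool_adapter(call: dict) -> dict:
    """Tool-call surface: take a structured request, call the same core."""
    handlers = {"drive_files_list": list_files}
    return handlers[call["name"]](call.get("arguments", {}))

from_cli = json.loads(cli_adapter(["--params", '{"pageSize": 5}']))
from_tool = tool_adapter({"name": "drive_files_list", "arguments": {"pageSize": 5}})
```

Both surfaces return the same result because neither contains any business logic; adding another surface means adding another adapter, not another code path through the core.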

7. Safety Rails

Two safety mechanisms address the distinct failure modes of agent-driven CLI use:

--dry-run validates requests against the API without executing them. Agents should default to dry-run for any mutating operation and only proceed after the user confirms the intended action. This is the “always confirm before write/delete” guidance from the skill files, made mechanical.
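A minimal sketch of the gating logic, with dry run as the default for mutating requests. This is a local illustration only: `Request`, `execute`, and `fake_send` are hypothetical, and unlike the flag described above this version does not contact any API during a dry run.

```python
from dataclasses import dataclass
from typing import Callable

MUTATING_METHODS = {"POST", "PUT", "PATCH", "DELETE"}

@dataclass
class Request:
    method: str
    path: str
    body: dict

def execute(req: Request, send: Callable[[Request], dict], *, dry_run: bool = True) -> dict:
    """Send the request, unless it mutates state and dry_run is still on."""
    if req.method.upper() in MUTATING_METHODS and dry_run:
        # Report what would happen; nothing leaves the process.
        return {"dryRun": True, "method": req.method, "path": req.path, "body": req.body}
    return send(req)

sent = []

def fake_send(req: Request) -> dict:
    """Stand-in transport that records what was actually sent."""
    sent.append(req)
    return {"status": "ok"}

delete = Request("DELETE", "/files/abc123", {})
preview = execute(delete, fake_send)                   # default: dry run, nothing sent
confirmed = execute(delete, fake_send, dry_run=False)  # after the user confirms
```

Making the safe path the default means a forgotten flag produces a preview, not a deletion; the caller has to opt in explicitly to the destructive path.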

--sanitize pipes responses through Google Cloud Model Armor to defend against prompt injection embedded in data. The threat model: a malicious email body instructs the agent to forward all messages to an attacker. Without sanitization, an agent reading email via gws gmail and then acting on the content is vulnerable to exactly this. Model Armor strips injection patterns from responses before they reach the model’s context.

The sanitize flag is opt-in because it adds latency and cost, but any workflow in which the agent reads data it didn’t create and then acts on that data should treat the flag as mandatory.

Design Implications

For most CLIs, agent-first design means adding these capabilities to an existing tool rather than rebuilding from scratch: a --json input path alongside flags, a schema subcommand, a skill file directory, input validation, and a --dry-run flag. None require rewriting the core tool. Multi-surface delivery and NDJSON pagination are the two decisions that require planning before v1.0.