Agentic Harness Primitives
A framework of 12 production-grade infrastructure primitives for building reliable AI agents, derived from Nate B Jones’s analysis of the leaked Claude Code architecture. Organized into three tiers by implementation priority.
The central thesis: successful agents are 80% plumbing, 20% model. The primitives that separate production systems from demos are mostly boring backend engineering — crash recovery, permissions, logging, state management.
Tier 1 — Day One Non-Negotiables
These are the foundations everything else builds on. Teams that skip them build demos that don’t survive real use.
1. Tool Registry with Metadata-First Design
Define agent capabilities as a data structure before writing any implementation code. The registry answers what exists and what each entry does without executing anything. Claude Code maintains a 207-entry command registry and a 184-entry tool registry; each entry has a name, a source hint, and a description, and implementations load on demand.
Without a registry: can’t filter tools by context, can’t introspect without side effects, every new tool requires orchestration changes.
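A metadata-first registry can be sketched in Python as below. This is a minimal illustration, not Claude Code's implementation. `ToolEntry`, `ToolRegistry`, and the `"module:attribute"` format for the source hint are all assumptions. Listing entries touches only data, and an implementation is imported the first time `load` is called.

```python
from dataclasses import dataclass
from importlib import import_module
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ToolEntry:
    name: str
    source_hint: str  # hypothetical "module:attribute" path to the implementation
    description: str


class ToolRegistry:
    """Metadata-first registry: entries are plain data, code is imported on demand."""

    def __init__(self) -> None:
        self._entries: Dict[str, ToolEntry] = {}
        self._loaded: Dict[str, Callable] = {}

    def register(self, entry: ToolEntry) -> None:
        self._entries[entry.name] = entry

    def describe(self) -> List[Tuple[str, str]]:
        # Introspection has no side effects: nothing is imported or executed here.
        return [(e.name, e.description) for e in self._entries.values()]

    def load(self, name: str) -> Callable:
        # Resolve the implementation the first time it is actually needed.
        if name not in self._loaded:
            module_name, attr = self._entries[name].source_hint.split(":")
            self._loaded[name] = getattr(import_module(module_name), attr)
        return self._loaded[name]


registry = ToolRegistry()
# Standard-library functions stand in for real tool implementations.
registry.register(ToolEntry("join_path", "os.path:join", "Join path segments"))
registry.register(ToolEntry("to_json", "json:dumps", "Serialize a value to JSON"))
```

Because `describe()` never imports anything, the orchestrator can filter or display tools by context before deciding what to load.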
2. Permission System and Trust Tiers
Classify tools by risk. Claude Code uses three tiers — built-in (always on, highest trust), plugin (medium trust, disableable), skills (user-defined, lowest trust). The bash tool has an 18-module security architecture: pre-approved patterns, destructive command detection, git-specific checks, sandbox termination.
An agent that can take real-world actions without a permissions layer is a demo, not a product.
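The tier model and a couple of bash checks might look roughly like this. The three tiers follow the text; the regex patterns, the `check_bash` name, and the mapping from tier to allow/ask/deny are hypothetical, and the sketch covers only two of the four bash concerns listed above (pre-approved patterns and destructive command detection).

```python
import re
from enum import IntEnum


class Trust(IntEnum):
    SKILL = 1     # user-defined, lowest trust
    PLUGIN = 2    # medium trust, can be disabled
    BUILT_IN = 3  # always on, highest trust


# Hypothetical rules; a production policy would be far more extensive.
PRE_APPROVED = [re.compile(r"^(ls|pwd|git status)\b")]
DESTRUCTIVE = [re.compile(r"\brm\s+-rf\b"), re.compile(r"\bgit\s+push\s+--force\b")]


def check_bash(command: str, caller_tier: Trust) -> str:
    """Classify a shell command as 'allow', 'ask', or 'deny' for a caller's tier."""
    if any(p.search(command) for p in DESTRUCTIVE):
        # Destructive commands are never auto-approved; low-trust callers are refused.
        return "ask" if caller_tier == Trust.BUILT_IN else "deny"
    if any(p.match(command) for p in PRE_APPROVED):
        return "allow"
    return "ask"  # unknown commands fall through to a human decision
```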
3. Session Persistence That Survives Crashes
An agent session is more than conversation history: it is recoverable state. It must include the conversation, usage metrics, permission decisions, and configuration. Claude Code stores sessions as JSON, and the full agentic engine can be reconstructed from stored state. Agents crash constantly, and every unrecoverable interruption degrades the customer experience.
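A crash-safe session store, sketched under the assumption that a session is a single JSON document with the four parts listed above. The field names and helper functions are hypothetical. Writing to a temporary file and then calling `os.replace` means a crash during a save leaves the previous session intact.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Session:
    session_id: str
    messages: List[Dict[str, Any]] = field(default_factory=list)
    usage: Dict[str, int] = field(default_factory=lambda: {"input_tokens": 0, "output_tokens": 0})
    permission_decisions: Dict[str, str] = field(default_factory=dict)
    config: Dict[str, Any] = field(default_factory=dict)


def save_session(session: Session, path: str) -> None:
    # Write to a temp file in the same directory, then atomically swap it in,
    # so a crash mid-write never leaves a truncated session file behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(asdict(session), f)
    os.replace(tmp_path, path)


def load_session(path: str) -> Session:
    # Rebuild the full session object from the stored JSON.
    with open(path) as f:
        return Session(**json.load(f))
```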
4. Workflow State vs Conversation State
These are distinct and almost every framework conflates them.
| | Conversation State | Workflow State |
|---|---|---|
| Answers | What have we said? | What step are we on? |
| Handles | Chat history | Side effects, retry safety |
| Scope | Within the agent | Persists beyond the agent |
Without workflow state, a crash mid-execution may duplicate writes or re-run expensive operations. Model explicit states: planned, awaiting-approval, executing, waiting-on-external.
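One way to make workflow state explicit is a small state machine per step, sketched below. The four non-terminal states come from the text; the `done` state, the transition table, and the `Workflow` class are assumptions. A step already recorded as done is skipped on replay, so a restart after a crash does not repeat its write.

```python
from enum import Enum
from typing import Callable, Dict


class StepState(Enum):
    PLANNED = "planned"
    AWAITING_APPROVAL = "awaiting-approval"
    EXECUTING = "executing"
    WAITING_ON_EXTERNAL = "waiting-on-external"
    DONE = "done"  # assumption: a terminal state the text does not name


# Legal transitions; anything else indicates a bug in the orchestration.
TRANSITIONS = {
    StepState.PLANNED: {StepState.AWAITING_APPROVAL, StepState.EXECUTING},
    StepState.AWAITING_APPROVAL: {StepState.EXECUTING},
    StepState.EXECUTING: {StepState.WAITING_ON_EXTERNAL, StepState.DONE},
    StepState.WAITING_ON_EXTERNAL: {StepState.EXECUTING, StepState.DONE},
    StepState.DONE: set(),
}


class Workflow:
    def __init__(self) -> None:
        # In a real harness this dict would be persisted, not held in memory.
        self.steps: Dict[str, StepState] = {}

    def advance(self, step_id: str, new_state: StepState) -> None:
        current = self.steps.get(step_id, StepState.PLANNED)
        if new_state not in TRANSITIONS[current]:
            raise ValueError(f"{step_id}: {current.value} -> {new_state.value} is not allowed")
        self.steps[step_id] = new_state

    def run_step(self, step_id: str, action: Callable[[], None]) -> bool:
        """Run a step's side effect unless the recorded state says it already finished."""
        state = self.steps.get(step_id, StepState.PLANNED)
        if state is StepState.DONE:
            return False  # replay after a crash: skip the duplicate write
        if state is StepState.AWAITING_APPROVAL:
            raise PermissionError(f"{step_id} is still awaiting approval")
        if state is not StepState.EXECUTING:
            self.advance(step_id, StepState.EXECUTING)
        action()
        self.advance(step_id, StepState.DONE)
        return True
```

A step that crashed while `executing` will run again, so this is at-least-once execution; the side effect itself still needs to be idempotent (for example, keyed by `step_id`) for retries to be fully safe.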
5. Token Budget Tracking with Pre-Turn Checks
Define hard limits: max turns, max token budget, compaction threshold. Before each API call, project token usage — stop with a structured reason if the budget would be exceeded. Prevents runaway cost loops. Claude Code implements this as a customer trust feature, not a revenue optimization.
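A pre-turn budget check can be as simple as the sketch below. The three limits come from the text. How `projected_tokens` is estimated (for instance, prompt size plus the maximum output length) is left to the caller, and the stop-reason codes are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Budget:
    max_turns: int
    max_tokens: int
    compaction_threshold: int  # history size in tokens that triggers compaction


@dataclass(frozen=True)
class StopReason:
    code: str    # hypothetical machine-readable reason
    detail: str  # human-readable explanation


def pre_turn_check(budget: Budget, turns_used: int, tokens_used: int,
                   projected_tokens: int) -> Optional[StopReason]:
    """Run before each API call; return a structured stop reason, or None to proceed."""
    if turns_used >= budget.max_turns:
        return StopReason("max_turns", f"{turns_used}/{budget.max_turns} turns used")
    if tokens_used + projected_tokens > budget.max_tokens:
        return StopReason(
            "max_budget_tokens",
            f"{tokens_used} used + {projected_tokens} projected > {budget.max_tokens}",
        )
    return None


def should_compact(budget: Budget, history_tokens: int) -> bool:
    return history_tokens >= budget.compaction_threshold
```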
Tier 2 — Operational Foundation
6. Structured Streaming Events
Emit typed events at each step: message_start, command_match, tool_match, etc. Include a crash event (with reason) as the final stream message on failure — a black-box recorder. Streaming is not just for showing text; it’s how users understand what the agent is doing and when to intervene.
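A minimal event stream for one turn, sketched as a Python generator. `message_start` and `tool_match` are event names from the text; `message_stop`, the `route` callback, and the `Event` shape are assumptions. Any exception becomes a final `crash` event carrying the reason, instead of a silently truncated stream.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator


@dataclass
class Event:
    kind: str  # e.g. "message_start", "tool_match", "crash"
    data: Dict[str, Any] = field(default_factory=dict)


def run_with_events(prompt: str, route: Callable[[str], str]) -> Iterator[Event]:
    """Yield typed events for one turn; a failure always ends with a crash event."""
    try:
        yield Event("message_start", {"prompt": prompt})
        tool = route(prompt)  # hypothetical router standing in for tool matching
        yield Event("tool_match", {"tool": tool})
        yield Event("message_stop")  # hypothetical terminal event name
    except Exception as exc:
        # Black-box recorder: the last message explains why the stream ended.
        yield Event("crash", {"reason": f"{type(exc).__name__}: {exc}"})
```

Each event can be sent over the wire as one JSON line with `json.dumps(dataclasses.asdict(event))`.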
7. System Event Logging
Maintain a history log that records what the agent did, separate from what it said. Contents: context loaded, registry initialization, routing decisions, execution counts, permission grants/denials. Required for enterprise auditability. Allows provable reconstruction of any run.
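The action log can be an append-only file of JSON lines, kept apart from the transcript. This sketch assumes that format and uses hypothetical names (`SystemLog`, `replay`); the event kinds are illustrative.

```python
import json
import time
from typing import Any, List, TextIO


class SystemLog:
    """Append-only record of what the agent did, kept apart from what it said."""

    def __init__(self, sink: TextIO) -> None:
        self.sink = sink

    def record(self, kind: str, **details: Any) -> None:
        entry = {"ts": time.time(), "kind": kind, **details}
        self.sink.write(json.dumps(entry) + "\n")  # one self-describing JSON object per line


def replay(lines: List[str]) -> List[str]:
    """Recover the ordered sequence of actions from a stored log."""
    return [json.loads(line)["kind"] for line in lines if line.strip()]
```

In use, `log.record("permission_denied", tool="bash")` writes one line per action, and `replay` recovers the order of a run from the stored file.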
8. Verification at Two Levels
- Level 1 (common): Verify completed work is correct.
- Level 2 (often skipped): Test that harness changes don’t break existing guardrails. E.g., “Do destructive tools still require approval after this change?” Harnesses evolve — each change affects all future runs.
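The level-2 example question above can be turned into an automated regression check. In this illustrative sketch, each tool carries a hypothetical `destructive` flag and the harness policy is a dict mapping tool names to `"allow"` or `"ask"`; the check reports any destructive tool that would now run without approval.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ToolSpec:
    name: str
    destructive: bool


def requires_approval(tool: ToolSpec, policy: Dict[str, str]) -> bool:
    # Tools the policy does not mention default to "ask".
    return policy.get(tool.name, "ask") != "allow"


def guardrail_violations(tools: List[ToolSpec], policy: Dict[str, str]) -> List[str]:
    """Names of destructive tools that no longer require approval under this policy."""
    return [t.name for t in tools if t.destructive and not requires_approval(t, policy)]
```

Running this in CI on every policy or harness change turns the guardrail into a test rather than a convention.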
Tier 3 — Operational Maturity
9. Tool Pool Assembly
Don’t load all tools on every run. Assemble a session-specific pool based on mode flags, permission context, and deny lists. For general-purpose agents, give access to a broader tool set and let the agent select what it needs — rather than hardcoding a fixed list.
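Pool assembly reduces to filtering the full registry with session context. The sketch below uses three hypothetical inputs matching the text: a mode flag, a write permission standing in for the permission context, and a deny list.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Tool:
    name: str
    modes: FrozenSet[str]      # hypothetical: modes in which the tool is offered
    needs_write: bool = False  # whether the tool mutates the workspace


def assemble_pool(all_tools: List[Tool], mode: str, can_write: bool,
                  deny_list: Set[str]) -> List[str]:
    """Build a session-specific tool pool instead of loading every tool."""
    return [
        t.name for t in all_tools
        if mode in t.modes
        and (can_write or not t.needs_write)
        and t.name not in deny_list
    ]
```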
10. Transcript Compaction
Auto-compact conversation history after a configurable turn count: retain recent entries, discard older ones, and track which entries have been persisted. The balance is to keep the initial instructions while shedding irrelevant intermediate turns. This is critical for long-running agents that hit context limits.
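A turn-count compaction pass, sketched under two assumptions: the first entry holds the initial instructions, and the discarded middle is replaced by a placeholder line rather than a model-written summary. The `persisted` flag is one reading of "track persistence status": entries are only dropped once they have been saved.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    role: str
    text: str
    persisted: bool = False  # has this entry been written to session storage?


def compact(history: List[Entry], max_turns: int, keep_recent: int) -> List[Entry]:
    """Past max_turns, keep the initial instructions plus the most recent entries
    and replace everything in between with a one-line placeholder."""
    if not 0 < keep_recent < max_turns:
        raise ValueError("keep_recent must be between 1 and max_turns - 1")
    if len(history) <= max_turns:
        return history
    head, middle, tail = history[:1], history[1:-keep_recent], history[-keep_recent:]
    if any(not e.persisted for e in middle):
        # Assumption: never discard entries that exist only in memory.
        raise RuntimeError("persist entries before compacting them away")
    marker = Entry("system", f"[{len(middle)} earlier entries compacted]")
    return head + [marker] + tail
```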
11. Permission Audit Trail
Permissions are first-class objects, not boolean gates. Claude Code has three permission handlers: interactive (human in the loop), coordinator (multi-agent orchestrator), swarm worker (autonomous under orchestrator). Permissions must be queryable, not just enforced.
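Treating each decision as a record makes permissions queryable after the fact. The three handler kinds are taken from the text; the `PermissionDecision` fields and the `PermissionLedger` class are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Handler(Enum):
    INTERACTIVE = "interactive"    # human in the loop
    COORDINATOR = "coordinator"    # multi-agent orchestrator
    SWARM_WORKER = "swarm_worker"  # autonomous under an orchestrator


@dataclass(frozen=True)
class PermissionDecision:
    tool: str
    action: str
    granted: bool
    handler: Handler
    decided_by: str  # the human or agent that made the call


class PermissionLedger:
    """Permissions as queryable records rather than one-off boolean checks."""

    def __init__(self) -> None:
        self._decisions: List[PermissionDecision] = []

    def record(self, decision: PermissionDecision) -> bool:
        # Enforcement reads the same object that lands in the audit trail.
        self._decisions.append(decision)
        return decision.granted

    def query(self, tool: Optional[str] = None, granted: Optional[bool] = None,
              handler: Optional[Handler] = None) -> List[PermissionDecision]:
        return [
            d for d in self._decisions
            if (tool is None or d.tool == tool)
            and (granted is None or d.granted == granted)
            and (handler is None or d.handler == handler)
        ]
```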
12. Agent Type System
Define constrained agent types with specific prompts, allowed tools, and behavioral constraints. Claude Code’s six built-in types: Explore (cannot edit files), Plan (cannot execute code), Verify, Guide, General-purpose, Status-line-setup. Don’t spawn agents randomly — define observable types to manage agent populations and efficiency.
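Agent types can be plain data that the harness enforces. The sketch below defines two of the six types named above with hypothetical prompts, tool names, and a capability map. Only registered types can be spawned, and a per-type counter keeps the population observable.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Hypothetical tool-to-capability map used to enforce behavioral constraints.
TOOL_CAPABILITY: Dict[str, str] = {
    "read_file": "read",
    "search": "read",
    "edit_file": "edit",
    "run_tests": "execute",
}


@dataclass(frozen=True)
class AgentType:
    name: str
    system_prompt: str
    allowed_tools: FrozenSet[str]
    forbidden: FrozenSet[str]  # capabilities this type may never use


# Illustrative definitions for two of the built-in types named above.
EXPLORE = AgentType("Explore", "Investigate and report; never modify files.",
                    frozenset({"read_file", "search", "run_tests"}), frozenset({"edit"}))
PLAN = AgentType("Plan", "Produce a step-by-step plan; do not run anything.",
                 frozenset({"read_file", "search", "run_tests"}), frozenset({"execute"}))

AGENT_TYPES: Dict[str, AgentType] = {t.name: t for t in (EXPLORE, PLAN)}
population: Counter = Counter()  # number of agents spawned per type


def spawn(type_name: str) -> AgentType:
    """Only registered types can be spawned; unknown names raise KeyError."""
    agent_type = AGENT_TYPES[type_name]
    population[type_name] += 1
    return agent_type


def authorize(agent: AgentType, tool: str) -> bool:
    # A tool must be on the type's allowlist AND map to a capability the type permits.
    return tool in agent.allowed_tools and TOOL_CAPABILITY.get(tool) not in agent.forbidden
```

Keeping a `forbidden` capability set alongside the allowlist means that widening a type's tools later cannot quietly give Plan the ability to execute code.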
Common Failure Modes
- Building multi-agent coordination before permissions work
- Implementing a plugin marketplace before sessions survive crashes
- Conflating conversation state with workflow state
- Hardcoding tool sets instead of using dynamic pool assembly
- No token budgets → runaway cost loops
Relation to Claude Code Architecture
Claude Code is the primary public example of these primitives at production scale ($2.5B run rate, millions of users). The leak exposed the internal structure confirming these patterns. The primitives are model-agnostic — applicable to any agent system regardless of LLM choice.