agentic harness

Aryan Iyappan edited this page Apr 28, 2026 · 1 revision

title: Agentic Harness
category: concepts
tags: [harness, agents, pipeline, architecture]
status: developing
created: 2026-04-28
updated: 2026-04-28
sources:
  - "harness-implementation-plan"
related:
  - "spec-hardening"
  - "structured-planning"
  - "grounding-checkpoints"
  - "adversarial-verification"
  - "automated-observability"
  - "persistent-memory"
  - "schema-orchestration"
  - "wiki-query-interface"
summary: An 8-layer mandatory pipeline that every task flows through: spec hardening, planning, execution, QA, critics, observability, memory, and orchestration. No layer can be skipped.
provenance:
  extracted: 0.8
  inferred: 0.2
  ambiguous: 0.0

Agentic Harness

What It Is

An 8-layer agent-native engineering harness for the ultimate-pi project. The pipeline is mandatory and always-on — every task flows through all 8 layers in sequence. Tuning parameters control behavior within layers, not whether they run.

Architecture: pi.dev TypeScript extensions implement intelligence (spec hardening, planning, QA, critics, observability, memory). Archon implements orchestration (workflow DAG execution, worktree isolation, human approval gates, loop nodes, persistent run history).

Pipeline Flow

User request
 → [L1] harden-spec → resolve ambiguities → spec_hardened
 → [L2] create-plan → review-plan → approve-plan → plan_approved
 → [Archon] orchestrate via workflow YAML
 → [L3] execute-next-task (grounding checkpoints)
 → subtask_completed
 → [L4] generate-tests + run-tests (spec-only, black-box)
 → subtask_tested (if green) or rework (if red)
 → [L5] run-critics → subtask_verified or subtask_failed
 → if verified: [L6] enforce-observability → subtask_observable
 → if failed: rework loop
 → ALL_TASKS_COMPLETE
 → [L7] capture-memory → store patterns + decisions
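
The per-subtask part of the flow above (L4 through L6) can be read as a small state machine. The following is a minimal sketch under that reading: the status strings come from the flow, while `StepOutcome` and `nextStatus` are hypothetical names introduced only for illustration and are not taken from the harness code.

```typescript
// Status strings are copied from the pipeline flow; everything else here
// (type names, function name, outcome shapes) is an illustrative assumption.
type SubtaskStatus =
  | "subtask_completed"
  | "subtask_tested"
  | "subtask_verified"
  | "subtask_failed"
  | "subtask_observable"
  | "rework";

// Result reported by one layer for one subtask.
type StepOutcome =
  | { layer: "L4"; testsGreen: boolean }
  | { layer: "L5"; criticsPassed: boolean }
  | { layer: "L6" };

// Returns the status a subtask moves to after a layer reports its outcome.
// Throws if a layer is run out of order, so no layer can be skipped.
function nextStatus(current: SubtaskStatus, outcome: StepOutcome): SubtaskStatus {
  switch (outcome.layer) {
    case "L4":
      if (current !== "subtask_completed") {
        throw new Error(`L4 cannot run from status ${current}`);
      }
      return outcome.testsGreen ? "subtask_tested" : "rework";
    case "L5":
      if (current !== "subtask_tested") {
        throw new Error(`L5 cannot run from status ${current}`);
      }
      return outcome.criticsPassed ? "subtask_verified" : "subtask_failed";
    case "L6":
      if (current !== "subtask_verified") {
        throw new Error(`L6 cannot run from status ${current}`);
      }
      return "subtask_observable";
  }
}
```

In this sketch, a red test run sends the subtask to `rework`, and attempting L6 on a subtask that has not passed the critics raises an error instead of silently proceeding.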

The 8 Layers

#	Layer	Purpose	Key Principle
1	spec-hardening	Block execution until underspecified inputs are resolved	Ambiguity is a bug
2	structured-planning	Machine-readable task DAG reviewed before code begins	No code without a plan
3	grounding-checkpoints	Smallest verifiable change + mandatory re-grounding against spec	Drift detection prevents scope creep
4	adversarial-verification	Critic agents attack solutions; no cooperative review	No code ships without adversarial review
5	automated-observability	Instrumentation is part of definition-of-done	Metrics before merge
6	persistent-memory	Wiki as knowledge base + claude-obsidian skills	The system must improve over time
7	schema-orchestration	Archon workflow DAG orchestrates all layers	Structured data flow, not narrative coordination
8	wiki-query-interface	LLM-native search via claude-obsidian skills (ADR-009)	hot.md → index.md → pages

Key Design Decisions

ADR-008: Layer 4 (QA) generates tests from spec ONLY — never from implementation code (black-box testing)
ADR-009: Layer 6 (Memory) uses claude-obsidian skills in GitHub Mode B — no custom WikiKnowledgeBase class, no Vectra, no embedding model
No skip rule: Verification is mandatory. Agent confidence is not evidence. Every layer runs for every task.
Config tunes behavior, not enablement: e.g., which critics to run, grounding interval, coverage target — but layers cannot be disabled
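
One way to picture "config tunes behavior, not enablement" is a config type that has no on/off switches at all. The sketch below assumes field names (`critics`, `groundingInterval`, `coverageTarget`), default values, and critic names purely for illustration; the page names only the three kinds of setting, not their actual schema.

```typescript
// Hypothetical shape: the page lists critics, grounding interval, and
// coverage target as tunables; names, units, and defaults are assumptions.
interface HarnessConfig {
  critics: string[];         // which critics run; must not be empty
  groundingInterval: number; // positive integer; the unit is an assumption
  coverageTarget: number;    // fraction in the range (0, 1]
}

const DEFAULT_CONFIG: HarnessConfig = {
  critics: ["correctness", "security"], // hypothetical critic names
  groundingInterval: 1,
  coverageTarget: 0.8,
};

// Merges overrides onto the defaults and rejects anything that is not a
// known tunable, so a key such as `enabled` cannot switch a layer off.
function resolveConfig(overrides: Record<string, unknown>): HarnessConfig {
  const allowed = new Set(Object.keys(DEFAULT_CONFIG));
  for (const key of Object.keys(overrides)) {
    if (!allowed.has(key)) {
      throw new Error(`Unknown or non-tunable setting: ${key}`);
    }
  }
  const merged = { ...DEFAULT_CONFIG, ...overrides } as HarnessConfig;
  if (!Array.isArray(merged.critics) || merged.critics.length === 0) {
    throw new Error("critics must list at least one critic");
  }
  if (!Number.isInteger(merged.groundingInterval) || merged.groundingInterval < 1) {
    throw new Error("groundingInterval must be a positive integer");
  }
  if (
    typeof merged.coverageTarget !== "number" ||
    merged.coverageTarget <= 0 ||
    merged.coverageTarget > 1
  ) {
    throw new Error("coverageTarget must be in the range (0, 1]");
  }
  return merged;
}
```

Rejecting an empty `critics` list is a deliberate choice in this sketch: an empty list would disable the critic layer in effect, which the no-skip rule forbids.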

Build Order

Phased: schemas → memory (L7) → spec hardening (L1) → planner (L2) → execution+grounding (L3) → QA (L4) → critics (L5) → observability (L6) → Archon workflow (L8)

Token Budget

~17,500 tokens overhead per subtask. Typical 5-subtask plan: ~83,500 tokens + coding tokens.

Shared Foundation

All inter-layer data contracts live in lib/harness-schemas.ts. Config schema in lib/harness-config.ts — tuning only, no enable/disable.
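
As a rough illustration of what an inter-layer data contract might look like, the sketch below defines a payload handed from spec hardening to planning, plus a runtime guard applied at the layer boundary. The `HardenedSpec` shape, its field names, and `isHardenedSpec` are assumptions made for this example; the actual contents of lib/harness-schemas.ts are not described on this page.

```typescript
// Hypothetical contract: only the "spec_hardened" status string appears in
// the pipeline flow; the remaining fields are assumed for illustration.
interface HardenedSpec {
  status: "spec_hardened";
  requirements: string[];
  openQuestions: string[]; // must be empty before planning may start
}

// Runtime guard for data crossing a layer boundary. Returns true only when
// the value has the expected shape and no ambiguities remain unresolved.
function isHardenedSpec(value: unknown): value is HardenedSpec {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.status === "spec_hardened" &&
    Array.isArray(v.requirements) &&
    v.requirements.every((r) => typeof r === "string") &&
    Array.isArray(v.openQuestions) &&
    v.openQuestions.length === 0
  );
}
```

A planner in this sketch would call `isHardenedSpec` on its input and refuse to proceed when it returns false, which keeps the hand-off a matter of structured data instead of narrative coordination.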
