
agentic harness

Aryan Iyappan edited this page Apr 28, 2026 · 1 revision

title: Agentic Harness
category: concepts
tags: [harness, agents, pipeline, architecture]
status: developing
created: 2026-04-28
updated: 2026-04-28
sources:


Agentic Harness

What It Is

An 8-layer agent-native engineering harness for the ultimate-pi project. The pipeline is mandatory and always on: every task flows through all 8 layers in sequence. Tuning parameters control behavior within a layer, not whether the layer runs.

Architecture: pi.dev TypeScript extensions implement intelligence (spec hardening, planning, QA, critics, observability, memory). Archon implements orchestration (workflow DAG execution, worktree isolation, human approval gates, loop nodes, persistent run history).

Pipeline Flow

User request
 → [L1] harden-spec → resolve ambiguities → spec_hardened
 → [L2] create-plan → review-plan → approve-plan → plan_approved
 → [Archon] orchestrate via workflow YAML
 → [L3] execute-next-task (grounding checkpoints)
 → subtask_completed
 → [L4] generate-tests + run-tests (spec-only, black-box)
 → subtask_tested (if green) or rework (if red)
 → [L5] run-critics → subtask_verified or subtask_failed
 → if verified: [L6] enforce-observability → subtask_observable
 → if failed: rework loop
 → ALL_TASKS_COMPLETE
 → [L7] capture-memory → store patterns + decisions
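
The per-subtask part of this flow can be read as a small state machine. The sketch below is illustrative only: the stage names, `nextStage`, and `traceSubtask` are hypothetical, and it assumes that both a red test run and a failed critic review send the subtask back to execution, which the flow above implies but does not state outright.

```typescript
// Hypothetical stage names for the per-subtask loop (L3 to L6 in the flow above).
type Stage = "execute" | "test" | "critics" | "observability" | "done";
type ActiveStage = Exclude<Stage, "done">;

// Return the stage that follows `stage`, given whether it passed.
// Assumption: any failure routes back to "execute" (the rework loop).
function nextStage(stage: ActiveStage, passed: boolean): Stage {
  switch (stage) {
    case "execute":
      return "test"; // subtask_completed
    case "test":
      return passed ? "critics" : "execute"; // subtask_tested, or rework if red
    case "critics":
      return passed ? "observability" : "execute"; // subtask_verified, or subtask_failed
    case "observability":
      return "done"; // subtask_observable
  }
}

// Replay a list of test/critic verdicts and record every stage visited.
// Stages without a verdict (execute, observability) always advance.
function traceSubtask(verdicts: boolean[]): Stage[] {
  const trace: Stage[] = [];
  let stage: Stage = "execute";
  let i = 0;
  while (stage !== "done") {
    trace.push(stage);
    const needsVerdict = stage === "test" || stage === "critics";
    const passed = needsVerdict ? (verdicts[i++] ?? true) : true;
    stage = nextStage(stage, passed);
  }
  return trace;
}
```

Note that there is no branch that bypasses a stage: the only way to reach `done` is through tests, critics, and observability, in that order.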

The 8 Layers

| # | Layer | Purpose | Key Principle |
|---|-------|---------|---------------|
| 1 | spec-hardening | Block execution until underspecified inputs are resolved | Ambiguity is a bug |
| 2 | structured-planning | Machine-readable task DAG reviewed before code begins | No code without a plan |
| 3 | grounding-checkpoints | Smallest verifiable change + mandatory re-grounding against spec | Drift detection prevents scope creep |
| 4 | adversarial-verification | Critic agents attack solutions; no cooperative review | No code ships without adversarial review |
| 5 | automated-observability | Instrumentation is part of definition-of-done | Metrics before merge |
| 6 | persistent-memory | Wiki as knowledge base + claude-obsidian skills | The system must improve over time |
| 7 | schema-orchestration | Archon workflow DAG orchestrates all layers | Structured data flow, not narrative coordination |
| 8 | wiki-query-interface | LLM-native search via claude-obsidian skills (ADR-009) | hot.md → index.md → pages |

Key Design Decisions

  • ADR-008: Layer 4 (QA) generates tests from spec ONLY — never from implementation code (black-box testing)
  • ADR-009: Layer 6 (Memory) uses claude-obsidian skills in GitHub Mode B — no custom WikiKnowledgeBase class, no Vectra, no embedding model
  • No skip rule: Verification is mandatory. Agent confidence is not evidence. Every layer runs for every task.
  • Config tunes behavior, not enablement: e.g., which critics to run, grounding interval, coverage target — but layers cannot be disabled
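
One way to enforce the tuning-only rule is to give the config no enablement fields at all and to reject them at parse time. The following is a minimal sketch under assumptions: the `HarnessConfig` fields, the forbidden key names, and `parseHarnessConfig` are hypothetical and are not taken from lib/harness-config.ts, whose contents this page does not show.

```typescript
// Hypothetical tuning parameters, named after the examples in the list above.
interface HarnessConfig {
  critics: string[];         // which critics to run
  groundingInterval: number; // how often to re-ground (unit is an assumption)
  coverageTarget: number;    // fraction in (0, 1]
}

// Keys that would turn a layer off; these names are assumptions for illustration.
const FORBIDDEN_KEYS = ["enabled", "disabled", "skip"];

function parseHarnessConfig(raw: Record<string, unknown>): HarnessConfig {
  for (const key of Object.keys(raw)) {
    if (FORBIDDEN_KEYS.indexOf(key) !== -1) {
      throw new Error(`"${key}" is not a tuning parameter: layers cannot be disabled`);
    }
  }
  const { critics, groundingInterval, coverageTarget } = raw;
  // An empty critic list would disable verification in practice, so reject it too.
  if (!Array.isArray(critics) || critics.length === 0 ||
      !critics.every((c) => typeof c === "string")) {
    throw new Error("critics must be a non-empty list of critic names");
  }
  if (typeof groundingInterval !== "number" ||
      !Number.isInteger(groundingInterval) || groundingInterval < 1) {
    throw new Error("groundingInterval must be a positive integer");
  }
  if (typeof coverageTarget !== "number" || coverageTarget <= 0 || coverageTarget > 1) {
    throw new Error("coverageTarget must be in (0, 1]");
  }
  return { critics: critics as string[], groundingInterval, coverageTarget };
}
```

Rejecting an empty `critics` list is a design choice in this sketch: it closes the loophole where a layer technically runs but does nothing.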

Build Order

Phased: schemas → memory (L7) → spec hardening (L1) → planner (L2) → execution+grounding (L3) → QA (L4) → critics (L5) → observability (L6) → Archon workflow (L8)

Token Budget

~17,500 tokens overhead per subtask. Typical 5-subtask plan: ~83,500 tokens + coding tokens.

Shared Foundation

All inter-layer data contracts live in lib/harness-schemas.ts. Config schema in lib/harness-config.ts — tuning only, no enable/disable.
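
As an illustration of what an inter-layer contract might look like, the sketch below encodes ADR-008 in a function signature: the test generator receives only the hardened spec, so it has no access to implementation code. All names here (`HardenedSpec`, `TestCase`, `generateTestCases`) are hypothetical and are not taken from lib/harness-schemas.ts.

```typescript
// Hypothetical output contract of the spec-hardening layer.
interface HardenedSpec {
  id: string;
  acceptanceCriteria: string[];
}

// Hypothetical contract consumed by the test runner.
interface TestCase {
  specId: string;
  name: string;
  criterion: string; // the spec statement this test checks
}

// Black-box by construction: the only input is the spec, never source files.
function generateTestCases(spec: HardenedSpec): TestCase[] {
  return spec.acceptanceCriteria.map((criterion, index) => ({
    specId: spec.id,
    name: `${spec.id}-criterion-${index + 1}`,
    criterion,
  }));
}
```

A real generator would presumably call a model to turn each criterion into executable tests; the point of the sketch is only that the contract, not convention, keeps implementation details out of reach.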
