agentic harness
---
title: Agentic Harness
category: concepts
tags: [harness, agents, pipeline, architecture]
status: developing
created: 2026-04-28
updated: 2026-04-28
sources:
  - "harness-implementation-plan"
related:
  - "spec-hardening"
  - "structured-planning"
  - "grounding-checkpoints"
  - "adversarial-verification"
  - "automated-observability"
  - "persistent-memory"
  - "schema-orchestration"
  - "wiki-query-interface"
summary: "An 8-layer mandatory pipeline that every task flows through: spec hardening, planning, execution, QA, critics, observability, memory, and orchestration. No layer can be skipped."
provenance:
  extracted: 0.8
  inferred: 0.2
  ambiguous: 0.0
---
An 8-layer agent-native engineering harness for the ultimate-pi project. The pipeline is mandatory and always-on — every task flows through all 8 layers in sequence. Tuning parameters control behavior within layers, not whether they run.
Architecture: pi.dev TypeScript extensions implement intelligence (spec hardening, planning, QA, critics, observability, memory). Archon implements orchestration (workflow DAG execution, worktree isolation, human approval gates, loop nodes, persistent run history).
    User request
      → [L1] harden-spec → resolve ambiguities → spec_hardened
      → [L2] create-plan → review-plan → approve-plan → plan_approved
      → [Archon] orchestrate via workflow YAML
          → [L3] execute-next-task (grounding checkpoints)
              → subtask_completed
          → [L4] generate-tests + run-tests (spec-only, black-box)
              → subtask_tested (if green) or rework (if red)
          → [L5] run-critics → subtask_verified or subtask_failed
              → if verified: [L6] enforce-observability → subtask_observable
              → if failed: rework loop
          → ALL_TASKS_COMPLETE
      → [L7] capture-memory → store patterns + decisions
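
The per-subtask part of this flow (L3 through L6, including the rework loop) can be read as a small state machine. The following is a minimal sketch only: the state names come from the flow above, but `SubtaskSteps`, `runSubtask`, and the `maxAttempts` limit are hypothetical and not taken from the harness code.

```typescript
// Hypothetical names throughout: the source names the states, not these types.
type SubtaskState =
  | "subtask_completed"
  | "subtask_tested"
  | "subtask_verified"
  | "subtask_failed"
  | "subtask_observable";

// Each step stands in for a layer; a real step would call an agent or tool.
interface SubtaskSteps {
  execute(): void;              // L3: execute-next-task
  runTests(): boolean;          // L4: true when the spec-derived tests are green
  runCritics(): boolean;        // L5: true when critics find no blocking issue
  enforceObservability(): void; // L6: enforce-observability
}

// Runs one subtask through L3..L6, looping back to L3 on red tests or a
// failed critic pass. maxAttempts is an assumption that keeps the loop finite.
function runSubtask(steps: SubtaskSteps, maxAttempts = 3): SubtaskState[] {
  const trace: SubtaskState[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    steps.execute();
    trace.push("subtask_completed");
    if (!steps.runTests()) continue; // red: rework
    trace.push("subtask_tested");
    if (!steps.runCritics()) {
      trace.push("subtask_failed");
      continue; // failed: rework loop
    }
    trace.push("subtask_verified");
    steps.enforceObservability();
    trace.push("subtask_observable");
    return trace;
  }
  throw new Error(`subtask not verified after ${maxAttempts} attempts`);
}

// Usage: the tests fail on the first attempt, then everything passes.
let testRuns = 0;
const trace = runSubtask({
  execute: () => {},
  runTests: () => ++testRuns > 1,
  runCritics: () => true,
  enforceObservability: () => {},
});
```

In this sketch the observability step is only reachable after both the tests and the critics pass, which mirrors the "if verified" branch in the flow.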
| # | Layer | Purpose | Key Principle |
|---|---|---|---|
| 1 | spec-hardening | Block execution until underspecified inputs are resolved | Ambiguity is a bug |
| 2 | structured-planning | Machine-readable task DAG reviewed before code begins | No code without a plan |
| 3 | grounding-checkpoints | Smallest verifiable change + mandatory re-grounding against spec | Drift detection prevents scope creep |
| 4 | adversarial-verification | Critic agents attack solutions; no cooperative review | No code ships without adversarial review |
| 5 | automated-observability | Instrumentation is part of definition-of-done | Metrics before merge |
| 6 | persistent-memory | Wiki as knowledge base + claude-obsidian skills | The system must improve over time |
| 7 | schema-orchestration | Archon workflow DAG orchestrates all layers | Structured data flow, not narrative coordination |
| 8 | wiki-query-interface | LLM-native search via claude-obsidian skills (ADR-009) | hot.md → index.md → pages |
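
To illustrate what a "machine-readable task DAG" allows, the sketch below orders tasks by their dependencies with Kahn's algorithm and rejects plans that contain a cycle. The `PlanTask` shape and the `executionOrder` function are assumptions for illustration; the source does not give the actual plan schema.

```typescript
// Hypothetical shape: the source says the plan is a machine-readable task DAG
// but does not give its schema.
interface PlanTask {
  id: string;
  dependsOn: string[];
}

// Kahn's algorithm: returns task ids in an order that respects dependencies,
// or throws if the plan references an unknown task or contains a cycle.
function executionOrder(tasks: PlanTask[]): string[] {
  const remaining = new Map<string, number>();    // id -> unmet dependency count
  const dependents = new Map<string, string[]>(); // id -> tasks waiting on it
  for (const t of tasks) {
    remaining.set(t.id, t.dependsOn.length);
    dependents.set(t.id, []);
  }
  for (const t of tasks) {
    for (const dep of t.dependsOn) {
      const waiting = dependents.get(dep);
      if (!waiting) throw new Error(`unknown dependency: ${dep}`);
      waiting.push(t.id);
    }
  }
  const ready = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const left = (remaining.get(next) ?? 0) - 1;
      remaining.set(next, left);
      if (left === 0) ready.push(next);
    }
  }
  if (order.length !== tasks.length) throw new Error("plan contains a cycle");
  return order;
}

// Usage with made-up task ids.
const order = executionOrder([
  { id: "api", dependsOn: ["schema"] },
  { id: "schema", dependsOn: [] },
  { id: "ui", dependsOn: ["api", "schema"] },
]);
// order is ["schema", "api", "ui"]
```

A check like this is one way a plan could be validated during review, before any code is written.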
- ADR-008: Layer 4 (QA) generates tests from spec ONLY — never from implementation code (black-box testing)
- ADR-009: Layer 6 (Memory) uses claude-obsidian skills in GitHub Mode B — no custom WikiKnowledgeBase class, no Vectra, no embedding model
- No skip rule: Verification is mandatory. Agent confidence is not evidence. Every layer runs for every task.
- Config tunes behavior, not enablement: e.g., which critics to run, grounding interval, coverage target — but layers cannot be disabled
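
One way to read "config tunes behavior, not enablement" is that the config type has no on/off switch, and validation rejects values that would quietly turn a layer off. The sketch below assumes this reading. The field names (`critics`, `groundingInterval`, `coverageTarget`) follow the examples in the bullet above, but their types, their ranges, and the `"security"` critic name are hypothetical and not taken from lib/harness-config.ts.

```typescript
// Hypothetical config shape; field names are illustrative, not taken from
// lib/harness-config.ts. Note there is no `enabled` flag anywhere.
interface HarnessConfig {
  critics: string[];         // which critics run in the critic layer
  groundingInterval: number; // re-ground against the spec every N changes
  coverageTarget: number;    // assumed to be a fraction in (0, 1]
}

// Returns a list of problems; an empty list means the config is acceptable.
// Settings that would effectively switch a layer off are rejected.
function validateConfig(config: HarnessConfig): string[] {
  const problems: string[] = [];
  if (config.critics.length === 0) {
    problems.push("critics must name at least one critic");
  }
  if (!Number.isInteger(config.groundingInterval) || config.groundingInterval < 1) {
    problems.push("groundingInterval must be a positive integer");
  }
  if (!(config.coverageTarget > 0 && config.coverageTarget <= 1)) {
    problems.push("coverageTarget must be in (0, 1]");
  }
  return problems;
}

// Usage: the first config passes, the second is rejected on all three fields.
const okProblems = validateConfig({ critics: ["security"], groundingInterval: 3, coverageTarget: 0.8 });
const badProblems = validateConfig({ critics: [], groundingInterval: 0, coverageTarget: 0 });
```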
Implementation is phased in this order: schemas → memory (L7) → spec hardening (L1) → planner (L2) → execution+grounding (L3) → QA (L4) → critics (L5) → observability (L6) → Archon workflow (L8)
The harness adds ~17,500 tokens of overhead per subtask. A typical 5-subtask plan costs ~83,500 tokens of overhead plus coding tokens.
All inter-layer data contracts live in lib/harness-schemas.ts. Config schema in lib/harness-config.ts — tuning only, no enable/disable.
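
As an illustration of what an inter-layer contract might look like, the sketch below defines a payload passed from spec hardening to planning, plus a runtime guard for untyped input such as parsed agent output. Everything here is an assumption: `HardenedSpec`, its fields, and the helper functions are hypothetical, and the real definitions in lib/harness-schemas.ts may look quite different.

```typescript
// Hypothetical contract between spec hardening and planning; the real
// definitions live in lib/harness-schemas.ts and may look quite different.
interface HardenedSpec {
  kind: "spec_hardened";
  requirements: string[];
  openAmbiguities: string[]; // must be empty before planning starts
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Runtime guard for untyped payloads (for example, parsed JSON from an agent).
function isHardenedSpec(value: unknown): value is HardenedSpec {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v["kind"] === "spec_hardened" &&
    isStringArray(v["requirements"]) &&
    isStringArray(v["openAmbiguities"])
  );
}

// The planner only accepts a well-formed spec with no open ambiguities.
function readyForPlanning(payload: unknown): boolean {
  return isHardenedSpec(payload) && payload.openAmbiguities.length === 0;
}

// Usage with made-up payloads: the first is accepted, the second is blocked.
const specReady = readyForPlanning(
  JSON.parse('{"kind":"spec_hardened","requirements":["login works"],"openAmbiguities":[]}')
);
const specBlocked = readyForPlanning({
  kind: "spec_hardened",
  requirements: [],
  openAmbiguities: ["which auth provider?"],
});
```

Checking the payload at the layer boundary keeps the coordination in structured data: the planner refuses anything that is malformed or still has unresolved ambiguities.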