harness implementation plan
title: Harness Implementation Plan
category: projects
tags: [harness, ultimate-pi, implementation, architecture]
status: developing
created: 2026-04-28
updated: 2026-04-28
sources:
  - "harness-implementation-plan"
related:
  - "agentic-harness"
  - "spec-hardening"
  - "structured-planning"
  - "grounding-checkpoints"
  - "adversarial-verification"
  - "persistent-memory"
  - "automated-observability"
path: "ultimate-pi"
purpose: "Full implementation plan for the 8-layer agentic harness"
summary: Project page for the ultimate-pi agentic harness implementation. Covers 9 build phases, shared schemas, key ADRs, token budgets, and risk surface.
provenance:
  extracted: 0.9
  inferred: 0.1
  ambiguous: 0.0
The agentic harness is an 8-layer mandatory pipeline for the ultimate-pi project. Every task flows through all 8 layers. Build order: schemas → memory → spec hardening → planner → execution+grounding → QA → critics → observability → Archon workflow.
For the full concept, see agentic-harness.
| Phase | What | Files |
|---|---|---|
| 0 | Foundation | `lib/harness-schemas.ts`, `lib/harness-config.ts`, `.pi/harness.example.json` |
| 1 | Memory (L6) | Install claude-obsidian skills, scaffold vault Mode B, `extensions/harness-knowledge-base.ts` |
| 2 | Spec Hardening (L1) | `lib/harness-spec.ts`, `extensions/harness-spec.ts` |
| 3 | Planning (L2) | `lib/harness-planner.ts`, `extensions/harness-planner.ts` |
| 4 | Execution+Grounding (L3) | `lib/harness-executor.ts`, `extensions/harness-executor.ts` |
| 5 | Automated QA (L4) | `lib/harness-qa.ts`, `extensions/harness-qa.ts` |
| 6 | Critics (L5) | `lib/harness-critics.ts`, `extensions/harness-critics.ts` |
| 7 | Observability (L6) | `lib/harness-observability.ts`, `extensions/harness-observability.ts` |
| 8 | Archon Integration (L7) | `.archon/workflows/*.yaml`, `.archon/commands/harness.md` |
| 9 | Package integration | Update `package.json`, README, `PLAN.md` |
- ADR-008 (Black-Box QA): Tests from spec only, never from implementation — see adversarial-verification
- ADR-009 (claude-obsidian Mode B): Replaces Vectra + embeddings with LLM-native search — see persistent-memory
All inter-layer data contracts in `lib/harness-schemas.ts`:
- `HardenedSpec`: Layer 1 output (success criteria, anti-criteria, ambiguities, scope)
- `ExecutionPlan` / `PlanNode`: Layer 2 output (DAG with dependencies, risk, verification)
- `GroundingCheckpoint`: Layer 3 checkpoints (spec version, drift, state hash)
- `SubtaskResult`: Layer 3 output
- `TestSuite` / `QAResult`: Layer 4 output (spec-only test cases, coverage, verdict)
- `CriticReview` / `CriticFailure`: Layer 5 output (verdict, failures, severity)
- `ObservabilitySpec`: Layer 6 output (metrics, health checks, alerts)
- `WikiEntryType`: Layer 7+8 wiki categories
- `AgentCapability` / `TaskRouting`: Layer 7 orchestration
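As a rough illustration of what two of these contracts might look like, here is a minimal sketch. The source only names what each contract covers, so every field name below and the `isStale` helper are assumptions, not the actual exports of `lib/harness-schemas.ts`.

```typescript
// Hypothetical field names: the plan lists what each contract covers,
// not its concrete shape.
interface HardenedSpec {
  version: number;           // assumed: bumped whenever the spec is revised
  successCriteria: string[];
  antiCriteria: string[];
  ambiguities: string[];
  scope: string;
}

interface GroundingCheckpoint {
  specVersion: number; // spec version the subtask was grounded against
  drift: boolean;      // true when the work has diverged from the spec
  stateHash: string;   // hash of the workspace state at checkpoint time
}

// Assumed helper: a checkpoint is stale if it was taken against a
// different spec version than the current one.
function isStale(cp: GroundingCheckpoint, spec: HardenedSpec): boolean {
  return cp.specVersion !== spec.version;
}
```

Keeping the spec version on the checkpoint lets a later layer detect, with a single comparison, that the spec changed underneath in-flight work.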
`lib/harness-config.ts` holds all tunable parameters. There is no enable/disable switch for individual layers: the harness is always on when installed.
The default config lives in `.pi/harness.json`.
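A minimal sketch of how tunable parameters could be merged over defaults, with no per-layer on/off flag. Only the three key names appear elsewhere in this plan; the type, the default values, and the `resolveConfig` helper are assumptions for illustration.

```typescript
// Hypothetical shape: only these three key names appear in the plan.
type HarnessConfig = {
  max_ambiguity_retries: number;
  max_attack_rounds: number;
  max_entries: number;
};

// Placeholder defaults, chosen for illustration only.
const DEFAULT_CONFIG: HarnessConfig = {
  max_ambiguity_retries: 3,
  max_attack_rounds: 2,
  max_entries: 500,
};

// Merge user overrides over the defaults and reject anything that is
// not a positive integer. There is deliberately no "enabled" flag.
function resolveConfig(overrides: Partial<HarnessConfig> = {}): HarnessConfig {
  const merged: HarnessConfig = { ...DEFAULT_CONFIG, ...overrides };
  for (const key of Object.keys(merged) as (keyof HarnessConfig)[]) {
    const value = merged[key];
    if (!Number.isInteger(value) || value < 1) {
      throw new Error(`${key} must be a positive integer, got ${value}`);
    }
  }
  return merged;
}
```

For example, `resolveConfig({ max_attack_rounds: 5 })` keeps the other two defaults and overrides only the critic rework cap.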
| Layer | Tokens/subtask |
|---|---|
| Spec hardening | ~2,000 |
| Planning + review | ~5,000 |
| Grounding checkpoints | ~500 |
| Automated QA (test gen + run) | ~3,500 |
| Critics (2 focus areas) | ~4,000 |
| Observability | ~1,500 |
| Memory writes | ~500 (standard) / ~1,500 (deep) |
| Total per subtask | ~17,500 |
Typical 5-subtask plan: ~83,500 tokens overhead + coding tokens.
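The per-subtask figure can be reproduced by summing the table rows. The sketch below copies the estimates from the table; the constant and function names are made up for illustration.

```typescript
type MemoryMode = "standard" | "deep";

// Per-subtask estimates copied from the token budget table.
const LAYER_TOKENS: ReadonlyArray<[string, number]> = [
  ["Spec hardening", 2000],
  ["Planning + review", 5000],
  ["Grounding checkpoints", 500],
  ["Automated QA (test gen + run)", 3500],
  ["Critics (2 focus areas)", 4000],
  ["Observability", 1500],
];

// Memory writes depend on the write mode.
const MEMORY_WRITE_TOKENS: Record<MemoryMode, number> = {
  standard: 500,
  deep: 1500,
};

// Sum every layer's estimate and add the memory-write cost for the mode.
function perSubtaskOverhead(mode: MemoryMode): number {
  const layers = LAYER_TOKENS.reduce((sum, [, tokens]) => sum + tokens, 0);
  return layers + MEMORY_WRITE_TOKENS[mode];
}
```

`perSubtaskOverhead("standard")` gives 17,000 and `perSubtaskOverhead("deep")` gives 18,000; the table's ~17,500 total sits between the two.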
Major risks and mitigations:
- AI hallucinates ambiguity → `max_ambiguity_retries` cap; human force-approve via `approve-spec`
- Invalid plan DAG → Automated cycle detection + validation
- QA tests don't compile → Fallback to schema-only validation if runner fails
- QA test writer leaks implementation → Architectural enforcement: prompt never includes implementation; `spec_only` is immutable
- Critic false positives → `conditional_pass` doesn't block; `max_attack_rounds` caps rework
- Memory grows unbounded → `max_entries` config; wiki-lint detects orphans; failures never evicted
- Archon runtime dependency → Harness normalizes terminal states; compensation/resume in own state machine
- claude-obsidian is external → Pin skill versions; skills are ~50KB text, vendorable
- LLM-native search cost → hot.md short-circuits most queries at ~500 tokens; deep queries optional
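The "automated cycle detection + validation" mitigation for invalid plan DAGs could be sketched as below, using Kahn's algorithm. The `PlanNode` shape (`id`, `dependsOn`) and the `validatePlan` name are assumptions; the real `PlanNode` also carries risk and verification data not modeled here.

```typescript
// Hypothetical minimal PlanNode: only the fields needed for validation.
interface PlanNode {
  id: string;
  dependsOn: string[];
}

// Kahn's algorithm: returns a topological order of node ids, or throws
// if the plan has duplicate ids, unknown dependencies, or a cycle.
function validatePlan(nodes: PlanNode[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const n of nodes) {
    if (indegree.has(n.id)) throw new Error(`Duplicate node id: ${n.id}`);
    indegree.set(n.id, 0);
    dependents.set(n.id, []);
  }
  for (const n of nodes) {
    for (const dep of n.dependsOn) {
      const list = dependents.get(dep);
      if (list === undefined) throw new Error(`Unknown dependency: ${dep}`);
      list.push(n.id);
      indegree.set(n.id, (indegree.get(n.id) ?? 0) + 1);
    }
  }
  // Start from nodes with no dependencies and peel the graph layer by layer.
  const ready = nodes.filter((n) => n.dependsOn.length === 0).map((n) => n.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  // Any node never reaching indegree 0 sits on a cycle.
  if (order.length !== nodes.length) throw new Error("Plan contains a cycle");
  return order;
}
```

A side benefit of this approach is that a valid plan comes back already ordered, so the same pass can feed the executor a safe execution sequence.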
Each phase is complete when:
- TypeScript compiles (`npm run check:ts`)
- Extension loads in pi.dev
- Unit tests pass for core logic
- Integration test: minimal task through the layer in isolation
- Phase 5: generated tests pass; spec-only constraint verified
- Phase 8: `archon run harness-pipeline` completes end-to-end
- Workflow status round-trips through Archon JSON