harness implementation plan

Aryan Iyappan edited this page Apr 28, 2026 · 2 revisions

title: Harness Implementation Plan category: projects tags: [harness, ultimate-pi, implementation, architecture] status: developing created: 2026年04月28日 updated: 2026年04月28日 sources:

"harness-implementation-plan" related:
"agentic-harness"
"spec-hardening"
"structured-planning"
"grounding-checkpoints"
"adversarial-verification"
"persistent-memory"
"automated-observability" path: "ultimate-pi" purpose: "Full implementation plan for the 8-layer agentic harness" summary: Project page for the ultimate-pi agentic harness implementation. Covers 9 build phases, shared schemas, key ADRs, token budgets, and risk surface. provenance: extracted: 0.9 inferred: 0.1 ambiguous: 0.0

Harness Implementation Plan

Overview

The agentic harness is an 8-layer mandatory pipeline for the ultimate-pi project. Every task flows through all 8 layers. Build order: schemas → memory → spec hardening → planner → execution+grounding → QA → critics → observability → Archon workflow.

For the full concept, see agentic-harness.

Build Phases

Phase	What	Files
0	Foundation	`lib/harness-schemas.ts`, `lib/harness-config.ts`, `.pi/harness.example.json`
1	Memory (L6)	Install claude-obsidian skills, scaffold vault Mode B, `extensions/harness-knowledge-base.ts`
2	Spec Hardening (L1)	`lib/harness-spec.ts`, `extensions/harness-spec.ts`
3	Planning (L2)	`lib/harness-planner.ts`, `extensions/harness-planner.ts`
4	Execution+Grounding (L3)	`lib/harness-executor.ts`, `extensions/harness-executor.ts`
5	Automated QA (L4)	`lib/harness-qa.ts`, `extensions/harness-qa.ts`
6	Critics (L5)	`lib/harness-critics.ts`, `extensions/harness-critics.ts`
7	Observability (L6)	`lib/harness-observability.ts`, `extensions/harness-observability.ts`
8	Archon Integration (L7)	`.archon/workflows/*.yaml`, `.archon/commands/harness.md`
9	Package integration	Update package.json, README, PLAN.md

Key Architecture Decisions

ADR-008 (Black-Box QA): Tests from spec only, never from implementation — see adversarial-verification
ADR-009 (claude-obsidian Mode B): Replaces Vectra + embeddings with LLM-native search — see persistent-memory

Shared Schemas

All inter-layer data contracts in lib/harness-schemas.ts:

HardenedSpec — Layer 1 output (success criteria, anti-criteria, ambiguities, scope)
ExecutionPlan / PlanNode — Layer 2 output (DAG with dependencies, risk, verification)
GroundingCheckpoint — Layer 3 checkpoints (spec version, drift, state hash)
SubtaskResult — Layer 3 output
TestSuite / QAResult — Layer 4 output (spec-only test cases, coverage, verdict)
CriticReview / CriticFailure — Layer 5 output (verdict, failures, severity)
ObservabilitySpec — Layer 6 output (metrics, health checks, alerts)
WikiEntryType — Layer 7+8 wiki categories
AgentCapability / TaskRouting — Layer 7 orchestration

Config Schema

lib/harness-config.ts — all tunable parameters. No enable/disable for layers. The harness is always on when installed.

Default config in .pi/harness.json.

Token Budget

Layer	Tokens/subtask
Spec hardening	~2,000
Planning + review	~5,000
Grounding checkpoints	~500
Automated QA (test gen + run)	~3,500
Critics (2 focus areas)	~4,000
Observability	~1,500
Memory writes	~500 (standard) / ~1,500 (deep)
Total per subtask	~17,500

Typical 5-subtask plan: ~83,500 tokens overhead + coding tokens.

Risk Surface

Major risks and mitigations:

AI hallucinates ambiguity → max_ambiguity_retries cap; human force-approve via approve-spec
Invalid plan DAG → Automated cycle detection + validation
QA tests don't compile → Fallback to schema-only validation if runner fails
QA test writer leaks implementation → Architectural enforcement: prompt never includes implementation; spec_only is immutable
Critic false positives → conditional_pass doesn't block; max_attack_rounds caps rework
Memory grows unbounded → max_entries config; wiki-lint detects orphans; failures never evicted
Archon runtime dependency → Harness normalizes terminal states; compensation/resume in own state machine
claude-obsidian is external → Pin skill versions; skills are ~50KB text, vendorable
LLM-native search cost → hot.md short-circuits most queries at ~500 tokens; deep queries optional

Verification Criteria

Each phase complete when:

TypeScript compiles (npm run check:ts)
Extension loads in pi.dev
Unit tests pass for core logic
Integration test: minimal task through the layer in isolation
Phase 5: generated tests pass; spec-only constraint verified
Phase 8: archon run harness-pipeline completes end-to-end
Workflow status round-trips through Archon JSON

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

harness implementation plan

Harness Implementation Plan

Overview

Build Phases

Key Architecture Decisions

Shared Schemas

Config Schema

Token Budget

Risk Surface

Verification Criteria

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally