Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

harness implementation plan

Aryan Iyappan edited this page Apr 28, 2026 · 2 revisions

title: Harness Implementation Plan category: projects tags: [harness, ultimate-pi, implementation, architecture] status: developing created: 2026年04月28日 updated: 2026年04月28日 sources:


Harness Implementation Plan

Overview

The agentic harness is an 8-layer mandatory pipeline for the ultimate-pi project. Every task flows through all 8 layers. Build order: schemas → memory → spec hardening → planner → execution+grounding → QA → critics → observability → Archon workflow.

For the full concept, see agentic-harness.

Build Phases

Phase What Files
0 Foundation lib/harness-schemas.ts, lib/harness-config.ts, .pi/harness.example.json
1 Memory (L6) Install claude-obsidian skills, scaffold vault Mode B, extensions/harness-knowledge-base.ts
2 Spec Hardening (L1) lib/harness-spec.ts, extensions/harness-spec.ts
3 Planning (L2) lib/harness-planner.ts, extensions/harness-planner.ts
4 Execution+Grounding (L3) lib/harness-executor.ts, extensions/harness-executor.ts
5 Automated QA (L4) lib/harness-qa.ts, extensions/harness-qa.ts
6 Critics (L5) lib/harness-critics.ts, extensions/harness-critics.ts
7 Observability (L6) lib/harness-observability.ts, extensions/harness-observability.ts
8 Archon Integration (L7) .archon/workflows/*.yaml, .archon/commands/harness.md
9 Package integration Update package.json, README, PLAN.md

Key Architecture Decisions

  • ADR-008 (Black-Box QA): Tests from spec only, never from implementation — see adversarial-verification
  • ADR-009 (claude-obsidian Mode B): Replaces Vectra + embeddings with LLM-native search — see persistent-memory

Shared Schemas

All inter-layer data contracts in lib/harness-schemas.ts:

  • HardenedSpec — Layer 1 output (success criteria, anti-criteria, ambiguities, scope)
  • ExecutionPlan / PlanNode — Layer 2 output (DAG with dependencies, risk, verification)
  • GroundingCheckpoint — Layer 3 checkpoints (spec version, drift, state hash)
  • SubtaskResult — Layer 3 output
  • TestSuite / QAResult — Layer 4 output (spec-only test cases, coverage, verdict)
  • CriticReview / CriticFailure — Layer 5 output (verdict, failures, severity)
  • ObservabilitySpec — Layer 6 output (metrics, health checks, alerts)
  • WikiEntryType — Layer 7+8 wiki categories
  • AgentCapability / TaskRouting — Layer 7 orchestration

Config Schema

lib/harness-config.ts — all tunable parameters. No enable/disable for layers. The harness is always on when installed.

Default config in .pi/harness.json.

Token Budget

Layer Tokens/subtask
Spec hardening ~2,000
Planning + review ~5,000
Grounding checkpoints ~500
Automated QA (test gen + run) ~3,500
Critics (2 focus areas) ~4,000
Observability ~1,500
Memory writes ~500 (standard) / ~1,500 (deep)
Total per subtask ~17,500

Typical 5-subtask plan: ~83,500 tokens overhead + coding tokens.

Risk Surface

Major risks and mitigations:

  • AI hallucinates ambiguitymax_ambiguity_retries cap; human force-approve via approve-spec
  • Invalid plan DAG → Automated cycle detection + validation
  • QA tests don't compile → Fallback to schema-only validation if runner fails
  • QA test writer leaks implementation → Architectural enforcement: prompt never includes implementation; spec_only is immutable
  • Critic false positivesconditional_pass doesn't block; max_attack_rounds caps rework
  • Memory grows unboundedmax_entries config; wiki-lint detects orphans; failures never evicted
  • Archon runtime dependency → Harness normalizes terminal states; compensation/resume in own state machine
  • claude-obsidian is external → Pin skill versions; skills are ~50KB text, vendorable
  • LLM-native search cost → hot.md short-circuits most queries at ~500 tokens; deep queries optional

Verification Criteria

Each phase complete when:

  1. TypeScript compiles (npm run check:ts)
  2. Extension loads in pi.dev
  3. Unit tests pass for core logic
  4. Integration test: minimal task through the layer in isolation
  5. Phase 5: generated tests pass; spec-only constraint verified
  6. Phase 8: archon run harness-pipeline completes end-to-end
  7. Workflow status round-trips through Archon JSON

Clone this wiki locally

AltStyle によって変換されたページ (->オリジナル) /