---
title: Adversarial Verification
category: concepts
tags: [harness, critics, testing, layer-4, quality]
status: developing
created: 2026-04-28
updated: 2026-04-28
sources:
  - "harness-implementation-plan"
related:
  - "agentic-harness"
  - "spec-hardening"
  - "agentic-harness"
layer: "Layer_4"
summary: Layer 4 runs adversarial critic agents against every subtask. Critics attack, not review. Multiple critics with different focus areas (correctness, security, performance, spec_compliance). A single critical failure blocks the subtask.
provenance:
  extracted: 0.85
  inferred: 0.15
  ambiguous: 0.0
---
Human code review is partly social and partly quality control. For agents, only quality control matters, and it should be much stronger. The critic agent's explicit role is to find failures, and multiple independent critics with different focus areas are better than one. This layer is mandatory: no code ships without passing adversarial review.
| Focus | Attack Vector | On by default |
|---|---|---|
| Correctness | Logic errors, off-by-one, unhandled edges, null derefs, type mismatches | Yes |
| Security | Injection, auth bypass, data exposure, missing input validation | No |
| Performance | N+1 queries, unbounded loops, memory leaks, blocking async | No |
| Spec compliance | Missing functionality, anti-criteria violations, scope creep | Yes |
Security and performance critics are added for risk-sensitive changes (configurable via `focus_areas`).
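The selection rule above can be sketched as follows. This is a minimal illustration, not the harness's actual code: `selectFocusAreas` and its `riskSensitive` flag are hypothetical names, and the defaults simply mirror the "On by default" column of the table.

```typescript
// Focus areas taken from the table above.
type FocusArea = "correctness" | "security" | "performance" | "spec_compliance";

// Mirrors the "On by default" column.
const DEFAULT_FOCUS_AREAS: FocusArea[] = ["correctness", "spec_compliance"];

// Hypothetical helper: `riskSensitive` is an assumption made for this sketch,
// not a documented option of the harness.
function selectFocusAreas(
  configured: FocusArea[] | undefined,
  riskSensitive: boolean,
): FocusArea[] {
  const base = configured !== undefined ? configured : DEFAULT_FOCUS_AREAS;
  const extra: FocusArea[] = riskSensitive ? ["security", "performance"] : [];
  const merged = [...base, ...extra];
  // Keep the first occurrence of each area so the order stays stable.
  return merged.filter((area, index) => merged.indexOf(area) === index);
}
```

Under these assumptions, `selectFocusAreas(undefined, true)` would run all four critics, while an explicit `focus_areas` list replaces the defaults.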
| Verdict | Meaning | Action |
|---|---|---|
| `pass` | No failures found | Proceed to automated-observability |
| `fail` | At least one critical/major failure | Rework subtask |
| `conditional_pass` | Only minor issues | Proceed with logged caveats |
| Severity | Blocks? | Description |
|---|---|---|
| `critical` | Yes | Incorrect behavior, data loss, security breach |
| `major` | Yes | Significant defect, must fix before proceeding |
| `minor` | No | Style/cosmetic or unlikely edge case |
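The verdict and severity tables together imply a simple mapping from reported failures to a verdict. The sketch below is one way to express it; `verdictFor` and `combineVerdicts` are hypothetical names, and the rule for combining several critics (any `fail` wins, then any `conditional_pass`) is an assumption consistent with "a single critical failure blocks the subtask".

```typescript
type Severity = "critical" | "major" | "minor";
type Verdict = "pass" | "fail" | "conditional_pass";

// Critical and major failures block; minor ones do not.
function blocks(severity: Severity): boolean {
  return severity === "critical" || severity === "major";
}

// Derive one critic's verdict from the severities of its reported failures.
function verdictFor(severities: Severity[]): Verdict {
  if (severities.length === 0) return "pass";
  return severities.some((s) => blocks(s)) ? "fail" : "conditional_pass";
}

// Assumed combination rule across critics: the strictest verdict wins.
function combineVerdicts(verdicts: Verdict[]): Verdict {
  if (verdicts.some((v) => v === "fail")) return "fail";
  if (verdicts.some((v) => v === "conditional_pass")) return "conditional_pass";
  return "pass";
}
```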
A failed subtask is reworked and then re-reviewed, for up to `max_attack_rounds` rounds (default: 2). If the rounds are exhausted, the plan is blocked and escalated to a human.
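The rework cycle can be sketched as a bounded loop. This is an illustration only: `reviewLoop`, its callbacks, and the outcome names are hypothetical, and it assumes `max_attack_rounds` counts review rounds rather than rework attempts. A real harness would presumably be asynchronous; the sketch is synchronous to keep it short.

```typescript
type ReviewVerdict = "pass" | "fail" | "conditional_pass";
type LoopOutcome = "verified" | "escalated";

// Hypothetical driver for the review -> rework -> re-review cycle.
function reviewLoop(
  review: (round: number) => ReviewVerdict,
  rework: (round: number) => void,
  maxAttackRounds: number = 2,
): LoopOutcome {
  for (let round = 1; round <= maxAttackRounds; round++) {
    // pass and conditional_pass both let the subtask proceed.
    if (review(round) !== "fail") return "verified";
    // Rework only if another review round remains.
    if (round < maxAttackRounds) rework(round);
  }
  // Rounds exhausted: block the plan and hand it to a human.
  return "escalated";
}
```

With the default of 2, a subtask that fails its first review gets exactly one rework before the final review decides between proceeding and escalation.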
All critics use adversarial prompts: "Your ONLY job is to find failures. Do NOT suggest improvements. Do NOT be constructive. ATTACK."
Each critic receives the diff, the task description, and the relevant criteria. Its output is structured JSON containing a `verdict` and a list of `failures`, each with `severity`, `description`, `evidence`, and `remediation`.
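Because critic output is model-generated, it is worth validating before acting on it. The sketch below assumes the JSON uses exactly the field names listed above, with string values for the failure fields; the type names and `parseCriticReport` are hypothetical and not taken from `lib/harness-critics.ts`.

```typescript
const VERDICTS = ["pass", "fail", "conditional_pass"] as const;
const SEVERITIES = ["critical", "major", "minor"] as const;
type ReportVerdict = (typeof VERDICTS)[number];
type FailureSeverity = (typeof SEVERITIES)[number];

interface CriticFailure {
  severity: FailureSeverity;
  description: string;
  evidence: string;
  remediation: string;
}

interface CriticReport {
  verdict: ReportVerdict;
  failures: CriticFailure[];
}

// Type guard: is `value` one of the allowed string literals?
function isOneOf<T extends string>(allowed: readonly T[], value: unknown): value is T {
  return typeof value === "string" && (allowed as readonly string[]).indexOf(value) !== -1;
}

// Parse and validate a critic's raw JSON output; throws on malformed reports.
function parseCriticReport(raw: string): CriticReport {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("report must be a JSON object");
  }
  const { verdict, failures } = data as { verdict?: unknown; failures?: unknown };
  if (!isOneOf(VERDICTS, verdict)) throw new Error("invalid verdict");
  if (!Array.isArray(failures)) throw new Error("failures must be an array");

  const parsed: CriticFailure[] = failures.map((f: unknown, i: number) => {
    const rec = (typeof f === "object" && f !== null ? f : {}) as Record<string, unknown>;
    const { severity, description, evidence, remediation } = rec;
    if (!isOneOf(SEVERITIES, severity)) throw new Error(`failures[${i}]: invalid severity`);
    if (
      typeof description !== "string" ||
      typeof evidence !== "string" ||
      typeof remediation !== "string"
    ) {
      throw new Error(`failures[${i}]: description, evidence, remediation must be strings`);
    }
    return { severity, description, evidence, remediation };
  });
  return { verdict, failures: parsed };
}
```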
| Type | Name | Description |
|---|---|---|
| Event consumed | `subtask_completed` | Begin critic review |
| Event emitted | `subtask_verified` | All critics pass |
| Event emitted | `subtask_failed` | Critical/major failure found |
| Tool | `run-critics` | Run all/specified critics |
| Tool | `run-critic-focus` | Run single focus-area critic |
| Command | `/harness-critic-status` | Recent reviews, pass/fail counts |
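The event flow in the table can be sketched as a mapping from the critics' verdicts to the emitted event. `eventForVerdicts` is a hypothetical name, and treating `conditional_pass` as verified is an assumption based on its "proceed with logged caveats" action; the source does not say which event it emits.

```typescript
type CriticVerdict = "pass" | "fail" | "conditional_pass";
type EmittedEvent = "subtask_verified" | "subtask_failed";

// Called after subtask_completed has triggered the critic run.
// Any single failing critic is enough to emit subtask_failed.
function eventForVerdicts(verdicts: CriticVerdict[]): EmittedEvent {
  const anyFailed = verdicts.some((v) => v === "fail");
  return anyFailed ? "subtask_failed" : "subtask_verified";
}
```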
```json
{
  "critics": {
    "focus_areas": ["correctness", "spec_compliance"],
    "max_attack_rounds": 2
  }
}
```

- `lib/harness-critics.ts`: CriticAgent class, adversarial prompt templates
- `extensions/harness-critics.ts`: Extension with review routing