Releases: cskwork/supergoal-skill
v0.3.0 - REVIEW-ONLY mode
v0.3.0 - REVIEW-ONLY mode, lean-loop coherence, leaner domain-context
New: REVIEW-ONLY mode (9th mode)
"review / audit this code/diff/PR - no fixes" is now a routed intent (reference/review-only.md):
- Two independent reviewers in parallel: code-reviewer in findings-only stance (each untested required behavior becomes a finding naming the missing test - the repo stays untouched) plus security-reviewer.
- Every CRITICAL/HIGH finding is re-checked against the cited code before it enters the report.
- Deliverable: severity-ordered report.md with "Untested behaviors:" and "Not covered:" anchors. Fixes are a new objective - route to DEBUG/LEGACY with the report as input.
- New contract test (tests/review-only-contract.test.sh, 15 checks). Mode grids updated in SKILL.md, README, README.ko, and the landing page (8 -> 9 modes).
Changed: reference/agents aligned with the lean loop
The baseline-first rewrite had removed the gated ceremony (Validate / Human Feedback / Committee /
Deliver, delivery-gate.sh, claims.md, ten-rules), but many reference and persona files still
spoke that vocabulary. All live files now speak the current loop (Frame -> Build -> Critic -> Fixer
-> Verify):
- agents/code-reviewer.md is now the Critic persona (it previously lacked the Write tool and could not author the failing tests role-loop requires).
- agents/executor.md is now Builder/Fixer with the fixer constraints (never edit test files, no padding); claims.md is retired in favor of a run-to-prove line in the run vault README.md.
- DEBUG no longer blocks on a "Human Feedback" round-trip: read-only until the root cause is confirmed by direct evidence, user stops only on hard stops (destructive fix / genuine ambiguity).
- The external diagnose-skill dependency is gone; the trusted feedback-loop requirement is inlined.
- tests/role-loop-contract.test.sh retargeted to the docs/changelog/<YYYY-MM>/<DD-topic>/ vault (it was failing against the old path).
Changed: reference/domain-context.md compressed 255 -> 142 lines (-44%)
Same contract, fewer tokens: per-file contract folded to one line per file, duplicated loops merged,
all contract-pinned strings and the Domain Brief format preserved.
Verification
11/11 contract suites, 363 assertions pass, 0 fail.
Full changelog: log/changelog-2026-06-10.md (commits 98c6bf8, 4d99bf5)
v0.2.2
Run-vault doc layout: month-grouped + surfaced-requirements unified
Per-task artifacts now live in one month-grouped run vault instead of a flat date-prefixed folder plus a separate global file.
- Run vault: docs/changelog/<date>-<slug>/ -> docs/changelog/<YYYY-MM>/<DD-topic>/ (e.g. docs/changelog/2026-06/10-add-auth/).
- Surfaced requirements: flat global docs/surfaced-requirements.md -> per-run .../<DD-topic>/surfaced-requirements.md - one file per run, same lifecycle as the run's other evidence.
- QA vault and harness-eval persist_path follow the same layout.
- Keeps the changelog/ namespace so run vaults don't pollute docs/ root; existing history left as-is.
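The layout change above can be illustrated with a small path-mapping sketch. It assumes the old `<date>` prefix has the shape YYYY-MM-DD, which the release notes do not state outright, and the function names are hypothetical. The release leaves existing history as-is, so this only shows how the two naming schemes relate and is not a migration tool.

```python
import re
from pathlib import PurePosixPath

# Assumed old folder name shape: YYYY-MM-DD-<slug>, e.g. "2026-06-10-add-auth".
_OLD_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)$")


def month_grouped_vault(old_dir: str) -> str:
    """Map docs/changelog/<date>-<slug>/ to docs/changelog/<YYYY-MM>/<DD-topic>/."""
    path = PurePosixPath(old_dir)
    match = _OLD_NAME.match(path.name)
    if match is None:
        raise ValueError(f"not a date-prefixed run folder: {old_dir}")
    year, month, day, slug = match.groups()
    return str(path.parent / f"{year}-{month}" / f"{day}-{slug}") + "/"


def surfaced_requirements_path(vault_dir: str) -> str:
    """Per-run surfaced-requirements file that lives inside the run vault."""
    return str(PurePosixPath(vault_dir) / "surfaced-requirements.md")


vault = month_grouped_vault("docs/changelog/2026-06-10-add-auth/")
print(vault)
print(surfaced_requirements_path(vault))
```

Grouping by month keeps each run's evidence, including its surfaced requirements, under one folder.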
Also includes: interview-style checks in LEARN docs.
Files: SKILL.md, reference/{domain-context,role-loop,qa-only}.md, templates/surfaced-requirements.md, templates/qa-gate.sh, qa-only-gate.sh, templates/harness-eval-case.yaml + harness-eval-cases/*.yaml (16).
v0.2.1
Portable /supergoal docs and contract release.
Changed
- Present /supergoal as a portable agent skill for Claude Code, Codex, agy, and other agent CLIs instead of a Claude Code-only workflow.
- Polish the Korean README and landing page copy so the value proposition, workflow, install path, and proof sections read naturally for developers and non-developers.
- Sync the landing page interview metric to the current <=5 clarification-question contract.
- Add the difficulty-gate contract: beyond very easy tasks, red-green evidence is required, and DB evidence is required when persisted data is load-bearing.
Verified
- tests/*.test.sh all pass.
- README.md and README.ko.md local links pass.
- docs/index.html EN/KO language blocks are balanced at 79/79, with key HTML tag counts balanced.
v0.2.0
Baseline-first supergoal skill: the role-separated critic→fixer→verify loop is the default, plus new recording / DB / aesthetic capabilities, a polished-by-default UI/UX overlay, and a hardened harness-eval methodology backed by an extensive eval sweep.
New
- Surfaced-requirements record. The role-loop critic now writes each implicit requirement it surfaces to docs/surfaced-requirements.md (what the spec implies / why it's required though unstated / covering test / status); the verifier closes them out. A durable, human-readable trail alongside the failing tests. (reference/role-loop.md, templates/surfaced-requirements.md, tests/role-loop-contract.test.sh)
- Self-contained read-only DB access. templates/db-access/ (db-access.mjs + helper scripts + .env.example) and reference/db-access.md, wired into GREENFIELD/DEBUG/LEGACY/QA-ONLY for optional schema/data evidence. (tests/db-access-contract.test.sh)
- Expressive aesthetic families. Selectable UI/UX families (minimalist-ui, high-end-visual-design, industrial-brutalist-ui) in reference/taste-aesthetics.md, wired through reference/ui-ux.md and agents/designer.md.
UI/UX: polished by default
- Expressive/polished is the default for ALL user-facing UI. The overlay no longer classifies a surface into Expressive vs Functional tiers where Functional shipped a plainer result. reference/taste-skill-v2.md is now the authority for every user-facing surface, applied from Frame through Build and QA.
- functional-ui.md demoted to a density add-on. It is layered ON TOP of the Expressive baseline for dense admin/dashboard surfaces (density discipline + complete UI states), never a reason to lower polish.
- Core-principle carve-out. Scope-minimalism governs code surface area and feature count, NOT visual quality: for user-facing UI a polished result is baseline correctness, not padding to defer until asked. (SKILL.md, reference/ui-ux.md)
Harness-eval methodology (hardened)
- Pick a discriminating regime (spec-completeness × baseline strength, not difficulty tier), validate that the fixture discriminates (stub fails / reference passes / lazy fails) before spending compute, and run a mandatory equal-compute control with a two-claim framing: (a) skill vs one-shot default, (b) mechanism vs equal compute.
- RevFactory corpus + runnable fixtures (templates/harness-eval-cases/fixtures/, authored under-spec specs).
- Eval sweep (codex via headroom, ground-truth scoring): explicit-spec medium/hard/expert all tie baseline vs harness; on an under-specified latent-correctness task the skill catches a prototype-pollution false-GREEN that a one-shot ships (useful via forced verification), but an equal-compute naive loop matches it, so the active ingredient is the extra passes, not the role-separation.
Changed / removed
- Removed the dead HARNESS-MAKE machinery and its test suites.
- All 10 contract suites green.
v0.1.8 - SKILL-MINE mode
7373ad7 v0.1.8 — SKILL-MINE mode
A seventh mode for turning your repeated work into a reusable skill. Trigger it with "make a skill", "스킬 만들어", "learn new skill", "make skill from history", or "이거 자주 하는데 스킬로".
It mines your recent agent session history (~/.claude/projects/*.jsonl, adaptive 7-30 day window), surfaces 3-5 candidate skills ranked by frequency × payoff, and lets you pick / reject / name a new one. On your pick it forges ONE cross-agent-portable SKILL.md (the agentskills.io open standard) and installs it. The human pick is a hard gate: it never creates or installs a skill you did not approve (the validation step that Hermes-style autonomous skill creation lacks). It writes no production code and no worktree.
Highlights
- Pipeline: Intake → Window → Mine → Rank → Suggest → Human pick/reject → Forge → Verify → Install → Journal. No worktree, no delivery gate.
- Mechanical miner (templates/skill-mine/mine.mjs, pure Node, no deps): reads the full tool-call transcripts (not the prompt-only history.jsonl), picks an adaptive window, and emits frequent Bash command signatures + first-prompt intent clusters + already-used skills. A real-data experiment confirmed raw tool-name n-grams are generic noise; the signal lives in command signatures and intent clusters.
- Portable forge: generates a directory + SKILL.md to the agentskills.io standard. A frontmatter gate (templates/skill-frontmatter-gate.mjs) enforces the portable limits: name lowercase/≤64/no reserved word, description + when_to_use ≤1536 chars, body ≤~5k tokens.
- Cross-agent install, no auto-sync: one canonical real dir (e.g. ~/.agents/skills/<name>/) symlinked into each agent (Claude Code / Codex / opencode / Hermes), so one edit lands everywhere. Every run ends by stating where the source skill lives.
- Anti over-suggestion gate: surface a candidate only when frequency ≥ minsup AND it plausibly recurs (Horvitz mixed-initiative); rejection is free and ends the run.
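The portable limits listed for the frontmatter gate can be sketched as a small validator. This is an illustration, not the shipped templates/skill-frontmatter-gate.mjs: the allowed character set for names, the reserved-word list, and the function name are assumptions, and the body token limit is left out because it would need a tokenizer.

```python
import re

NAME_MAX = 64        # name length limit from the release notes
COMBINED_MAX = 1536  # description + when_to_use limit from the release notes

# Hypothetical placeholder: the release notes do not list the reserved words.
EXAMPLE_RESERVED = frozenset({"reserved-example"})


def check_frontmatter(name, description, when_to_use="", reserved=EXAMPLE_RESERVED):
    """Return a list of violations; an empty list means the frontmatter passes."""
    errors = []
    # Assumption: "lowercase" means lowercase letters, digits, and hyphens only.
    if not re.fullmatch(r"[a-z0-9-]+", name):
        errors.append("name must be lowercase letters, digits, or hyphens")
    if len(name) > NAME_MAX:
        errors.append(f"name longer than {NAME_MAX} characters")
    if any(word in name for word in reserved):
        errors.append("name contains a reserved word")
    if not description.strip():
        errors.append("description is empty")
    if len(description) + len(when_to_use) > COMBINED_MAX:
        errors.append(f"description + when_to_use longer than {COMBINED_MAX} characters")
    return errors


print(check_frontmatter("gh-release", "Cut a GitHub release from the current branch."))
print(check_frontmatter("GH-Release", ""))
```

Returning every violation at once, instead of stopping at the first, lets a forge step repair the whole frontmatter in a single pass.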
Wiring & tests
New files: reference/skill-mine.md, templates/skill-mine/mine.mjs, templates/skill-frontmatter-gate.mjs, templates/skill.md.template, agents/skill-miner.md, agents/skill-forger.md. Wired into SKILL / Step 0 routing / reference map / template scripts; README + landing updated to 7 modes.
Frontmatter gate: 5/5 (valid passes; uppercase name, empty description, combined >1536, missing SKILL.md each fail). Miner verified on real history (7d, 19 sessions, 1290 tool-calls), sparse-history widen-to-30d, and --all (850 sessions). Dogfooded end-to-end — forged 4 real skills (gh-release, sync-skill, refine-skill-terse, update-gh-pages) installed across 4 agents.
Full rationale: log/changelog-2026-06-05.md.
Full changelog: v0.1.7...v0.1.8
v0.1.7 - QA-ONLY mode
v0.1.7 — QA-ONLY mode
A sixth mode for when you want QA, data verification, or a data comparison and explicitly no code change. Trigger it with "QA only", "검증만", "데이터 정합성 / 데이터 비교", "API 수정 전후 확인", or "A/B".
It exercises an already-running app (and a read-only, DB-independent database) like a user, writes no production code, creates no worktree, and runs none of the implementation gates. It produces a human-friendly report.md (what worked / what didn't / what it discovered) and persists a reusable, indexed QA suite under .domain-agent/qa/ so the same check re-runs fast.
Highlights
- Pipeline: Intake → Target & Access → Scenario checkpoint → Exercise → Cross-check → Report → Persist. No worktree, no delivery gate.
- Browser driver: agent-browser by default; attach-to-browser (Playwright CLI, attach-to-browser-skill) for authenticated / real logged-in sessions. The agent-browser doctor preflight is always recorded.
- Read-only, DB-independent DB access: MySQL / PostgreSQL / SQLite via an intelligence skill if installed (e.g. aidt-mysql-cli, postgres-intelligence), else the raw CLI. SELECT-class only, enforced both in the agent prompt and by a gate SQL scan (per-DB:-line read-only; blocks INSERT/UPDATE/DELETE/MERGE/REPLACE/DDL/GRANT/REVOKE/CALL).
- Two isolated subagents: qa-auditor drives the app, db-reader reads the DB - raw rows/PII never enter the browser agent. Conductor-mediated handoff via a small sanitized qa/expected.md (which also serves as the before-after baseline); auth is transient and never written to a file.
- Action cap (default 100): each browser interaction and DB query counts as one action; summed across both agents and gate-enforced so a run stays scoped and the user does not get lost in an unbounded crawl.
- New terminal gate templates/qa-only-gate.sh: report anchors + qa-gate.sh browser/CLI evidence + action_count <= action_cap + DB read-only backstop.
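The read-only backstop described above could look roughly like the keyword scan below. It is a sketch under assumptions, not the shipped gate: it assumes each recorded query sits on a line beginning with "DB:", and it expands "DDL" into CREATE/ALTER/DROP/TRUNCATE, a list the release notes do not spell out.

```python
import re

# Statement keywords named in the release notes, plus an assumed expansion of "DDL".
BLOCKED = (
    "INSERT", "UPDATE", "DELETE", "MERGE", "REPLACE", "GRANT", "REVOKE", "CALL",
    "CREATE", "ALTER", "DROP", "TRUNCATE",
)
_BLOCKED_RE = re.compile(r"\b(" + "|".join(BLOCKED) + r")\b", re.IGNORECASE)


def write_statements(report_text):
    """Return (line_number, keyword) for every DB: line that is not read-only."""
    violations = []
    for number, line in enumerate(report_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith("DB:"):
            continue  # only recorded queries are scanned
        match = _BLOCKED_RE.search(stripped[len("DB:"):])
        if match:
            violations.append((number, match.group(1).upper()))
    return violations


report = "DB: SELECT id FROM users\nnote: update docs\nDB: delete from users\n"
print(write_statements(report))
```

A bare keyword scan is deliberately conservative: it also flags a SELECT that calls a REPLACE() string function, so a false block is possible but a missed write is not.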
Wiring & tests
New files: reference/qa-only.md, reference/db-access.md, agents/qa-auditor.md, agents/db-reader.md, templates/qa-report.md, templates/qa-only-gate.sh, tests/qa-only-contract.test.sh. Wired into SKILL / pipeline / qa / experts / domain-context / index template / vault / state.json / README.
Tests: qa-only-contract 56/0; full contract suite 293/0, no regressions. Dogfooded end-to-end against a live Cloudflare-protected site — the mode correctly tried agent-browser, recorded an honest BLOCKED verdict (no fake pass), recommended attach-to-browser as the remediation, and its gate passed on truthful evidence.
Full rationale: docs/changelog/changelog-2026-06-05.md.
Full changelog: v0.1.6...v0.1.7
v0.1.6
Seventh release. Focus: a human-facing onboarding handbook for LEARN-DOMAIN, plus wiring so the saved domain knowledge actually reaches the agents that build/debug/extend code. After the agent learns a codebase, the new Onboard step renders the grounded .domain-agent/ knowledge into one self-contained HTML page for fast human onboarding - what the domain is, key terms, architecture, flows, and the rules that must not break - styled to the Functional UI tier. And the ## Domain Brief is now a first-class dispatch input, so GREENFIELD/DEBUG/LEGACY agents receive it instead of it living only in conductor-facing docs. Plus README/docs surfacing and a repo-root cleanup.
LEARN-DOMAIN Onboard step (new)
- New terminal step in the LEARN-DOMAIN pipeline: Intake -> Survey -> Scope checkpoint -> Map -> Deepen -> Ground -> Persist -> **Onboard** -> Freshness.
- Renders the grounded pack into one self-contained <knowledgePath>/onboarding.html (default .domain-agent/onboarding.html, gitignored) for humans only. The markdown pack stays the agent's source of truth; the HTML is a derived snapshot, regenerated on every Freshness run.
- Seven sections, plain summary first then expert detail in <details>: Orientation (mental model), Key terms (glossary in this repo's meaning), Architecture (systems/contexts/entry points + one inline diagram), Key flows (per flows/*.md), Rules that must not break (invariants with grounding status), Get hands-on (test-map.md commands), and Trust & freshness (verified/unverified legend + lastUpdated).
- Grounding discipline carries over: rendered from the pack only (no new facts; never upgrade unverified to verified), and every load-bearing fact shows a visible verified/unverified badge so a reader is never misled.
- Functional tier (reference/functional-ui.md), not the Expressive taste tier: one accent + one type/spacing/radius scale, computed WCAG-AA contrast (body AAA), information density, minimal motion honoring prefers-reduced-motion, declared color-scheme with light+dark tokens, no empty decoration.
- Self-contained and offline: inline CSS only, no external scripts/fonts/CDN/network requests, inline SVG for diagrams (zero added security surface). This constraint overrides functional-ui's "adopt one named design system" rule, so the baseline is implemented with a small inline token set instead of Fluent/Carbon/shadcn.
- LEARN-DOMAIN runs no implementation gates, so the render deliberately does not pull the product Designer's claims.md / QA contrast gate / committee apparatus; the agent self-applies the functional-ui baseline. Dispatch is explore (reads the persisted pack, writes the doc).
- New templates/domain-onboarding.html skeleton (token set, the seven sections with source-mapped <!-- FILL ... --> markers, ToC, badge legend, <details> progressive disclosure, light/dark). Wired into SKILL.md (pipeline row, LEARN-vs-LEARN-DOMAIN note, template-script table) and reference/learn-domain.md (Step 7 + dispatch row + stop condition). tests/learn-domain-contract.test.sh grows from 17 to 29 assertions.
Domain Brief reaches the dispatched agents (new)
- Closes a consumption gap: the saved .domain-agent/ knowledge was described only in conductor-facing docs - neither the personas nor the locked-prompt template referenced the pack or the Brief, and debugger/explore did not even have it in their READ scope. Domain context could be built and still never reach the agent doing the work.
- reference/experts.md: adds a DOMAIN BRIEF slot to the locked-prompt template (routing index only; verify against current code, current code wins; never bulk-read the pack) and a dispatch step that passes the Brief to the architect (GREENFIELD Plan), debugger/tracer (DEBUG Reproduce/Diagnose), and explorer (LEGACY Explore).
- agents/explore.md + agents/debugger.md: add the run's ## Domain Brief to READ scope plus a one-line rule to route by it but verify load-bearing facts against current code.
- agents/architect.md: names the Brief explicitly (was a vague "selected domain/architecture docs"). executor is unchanged - it builds from the frozen plan.md.
- Passes the capped, current-code-verified Brief (<=5 files, <=80 lines), never the raw pack path, so the retrieval cap / current-code-wins / no-bulk-read discipline holds. The human onboarding.html is never an agent input. tests/domain-context-contract.test.sh grows from 30 to 37 assertions.
Other
- docs: add LEARN-DOMAIN mode + Onboard handbook to README - the Modes table listed only four modes (prose said "three modes") even though LEARN-DOMAIN shipped in v0.1.5; adds the LEARN-DOMAIN row (pipeline ends in Onboard (human handbook)), a usage example, and lists interview + learn-domain in the Layout summary.
- chore: keep root to SKILL/README/LICENSE; move DESIGN + preference files - repo root trimmed to SKILL.md, README.md, LICENSE; DESIGN.md -> docs/DESIGN.md, USER_PREFERENCE.template.md and the git-ignored USER_PREFERENCE.md -> learn/.
Verification
Full contract suite: domain-context 37/0, gate-scenarios 100/0, interview-contract 26/0, learn-contract 11/0, learn-domain-contract 29/0, ui-ux 17/0, worktree 17/0 = 237 passed, 0 failed. No regressions. Template confirmed self-contained (a grep for external scripts/links/CDN/url() matches only the in-comment instruction forbidding them).
Included changes
- feat: pass the Domain Brief to dispatched agents (close consumption gap)
- docs: add LEARN-DOMAIN mode + Onboard handbook to README
- feat: complete LEARN-DOMAIN Onboard step (Step 7 spec, Functional tier, tests)
- feat: wire LEARN-DOMAIN Onboard step (SKILL spine + handbook template)
- chore: keep root to SKILL/README/LICENSE; move DESIGN + preference files
Full changelog: v0.1.5...v0.1.6
v0.1.5
Sixth release. Focus: an ambiguity-gated clarifying interview before plan freeze, a new LEARN-DOMAIN mode that learns a codebase for the agent with execution-grounded verification, distributed-systems DEBUG hardening, user-language output, and Windows (LF) compatibility.
Clarifying interview before plan freeze (new)
- reference/interview.md (new): a conditional, ambiguity-gated clarifying interview inserted after context-gathering and before plan freeze for GREENFIELD, DEBUG, and LEGACY (LEARN / LEARN-DOMAIN exempt).
- Fires only on genuine ambiguity (multiple plausible interpretations or an unclear load-bearing detail); skips when clear or a cheap code read answers it, and logs the skip.
- Code-first: code-answerable questions are resolved by reading current docs/code (reuses plan-grounding.md); only user-only load-bearing choices reach the user.
- Capped at 3-5 high-leverage questions, one round, one at a time, each with a recommended answer; drawn from a six-dimension menu (objective, definition-of-done, scope, constraints, environment, safety/reversibility) and selected by information gain.
- Hard gate: blocks plan freeze until must-have answers or a user-approved assumption. DEBUG variant presents 3-5 ranked root-cause hypotheses for re-ranking at end of Diagnose (non-blocking; proceeds on its own ranking if AFK).
- Backed by a fact-checked deep-research pass (Ambig-SWE/CMU, Active Task Disambiguation/ICLR 2025, SAGE-Agent, Ask-before-Plan, ClarEval). Procedural step recorded in plan.md ## Interview; no new machine gate.
- Wired into SKILL.md, reference/pipeline.md, reference/debugging.md; new tests/interview-contract.test.sh (26 assertions).
LEARN-DOMAIN mode (new)
- A fifth mode that learns a large/cryptic codebase for the agent and persists a source-grounded .domain-agent/ wiki (distinct from LEARN, which teaches a human).
- Pipeline: Intake -> Survey -> Scope checkpoint -> Map -> Deepen -> Ground -> Persist -> Freshness. Writes no production code; its only writes are the knowledge pack plus throwaway sandbox probes.
- reference/learn-domain.md (new) encodes six research-backed choices: agentic discovery (not embeddings/RAG), markdown-first persistence (Aider repo-map pattern), bottom-up symbol->repo hierarchy, optional structural index only (cache, never required), balanced retrieval budget, and execution-grounded verification.
- New gate templates/learn-grounding-gate.mjs: every populated invariant/flow must carry a Grounding: verified|unverified marker, index.md must name a concrete entry point, and a high-precision secret scan must pass. Facts that cannot be executed are marked unverified, never faked.
- Supporting edits to SKILL.md, templates/domain-agent/code-map.md (Aider-style key-symbol signatures), templates/domain-agent/invariants.md, templates/domain-agent/flows/README.md; new tests/learn-domain-contract.test.sh (17 assertions) plus a 7-case grounding-gate scenario.
DEBUG hardening (distributed triage, F->P repro, evidence ledger)
- Six senior-engineer debugging disciplines encoded into reference/debugging.md, sourced from Google SRE Book/Workbook, OpenTelemetry, W3C Trace Context, AWS Builders' Library, and SWE-bench-line agent papers (Agentless, SWE-Adept, SWT-Bench):
  - Golden-signal triage + symptom-vs-cause split (chase only definite, imminent causes).
  - Correlation-ID propagation as a precondition for cross-service RCA.
  - Known-good vs known-bad differential.
  - Hypothesis ledger with evidence on both sides ("definite & imminent?"); only a confirmed cause advances.
  - Reproduce-first as a fail-to-pass (F->P) gate; flaky/timing bugs must fail consistently over N runs.
  - Context isolation + minimal-diff checkpointing + breadth-based escalation (single-driver stays the default).
- Adds a microservice failure-pattern checklist (cascading overload, retry storms, missing deadline propagation, partial-failure bimodal latency). Supporting edits in reference/pipeline.md and reference/vault.md.
Output language follows the user
- Agent-authored prose (vault README.md/brief.md/plan.md/claims.md/verification.md, both Human Feedback briefs, run notes, changelog, LEARN journals, returned summaries) is written in the user's language, defaulting to English when unknown.
- Machine-checked anchors, structural keys, code, identifiers, file paths, shell commands, and commit messages stay verbatim English so the gates' literal greps keep matching. Applied in SKILL.md, reference/experts.md, reference/vault.md.
Windows compatibility (LF)
- .gitattributes (new) forces LF (eol=lf) on *.sh, *.mjs, *.js, *.md, *.json, and tracked scripts were re-checked-out to LF. A Windows checkout with core.autocrlf=true had materialized the gate/test scripts as CRLF, which bash refused to parse ($'\r': command not found) - the real blocker to running the suite on Windows. Target support is bash-on-Windows (Git Bash or WSL); run the suite under WSL bash.
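To see whether a checkout still carries CRLF in the file types named above, a scan along these lines may help. It is a hypothetical helper, not part of the release; the suffix list mirrors the patterns in the notes and the function names are made up for illustration.

```python
from pathlib import Path

# Suffixes the release notes say are forced to LF.
LF_SUFFIXES = {".sh", ".mjs", ".js", ".md", ".json"}


def has_crlf(data: bytes) -> bool:
    """True when the bytes contain at least one Windows line ending."""
    return b"\r\n" in data


def to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to LF and leave everything else untouched."""
    return data.replace(b"\r\n", b"\n")


def crlf_files(root: str) -> list[str]:
    """List tracked-looking files under root that still contain CRLF."""
    offenders = []
    for path in sorted(Path(root).rglob("*")):
        if ".git" in path.parts or not path.is_file():
            continue  # skip git internals and directories
        if path.suffix in LF_SUFFIXES and has_crlf(path.read_bytes()):
            offenders.append(path.relative_to(root).as_posix())
    return offenders


print(has_crlf(b"#!/bin/bash\r\necho ok\r\n"))
print(to_lf(b"echo ok\r\n"))
```

A CRLF shebang or command line is what produces the `$'\r': command not found` error quoted above, so an empty result from `crlf_files(".")` is a quick sanity check before running the suite.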
Other
- chore: Require explicit merge-commit for integrations - stricter integration policy (explicit merge commit required); tests/worktree-contract.test.sh re-anchored to the new wording.
Verification
Full suite under WSL: interview-contract 26/0, gate-scenarios 100/0, domain-context 30/0, worktree 17/0, ui-ux 17/0, learn-domain 17/0, learn 11/0 = 218 passed, 0 failed. No regressions.
Included changes
- feat: add ambiguity-gated clarifying interview before plan freeze
- feat: add LEARN-DOMAIN mode with agentic-discovery wiki and grounding gate
- feat: agent-authored docs follow the user's language
- feat: harden DEBUG with distributed triage, F->P repro, evidence ledger; add Windows LF support
- chore: Require explicit merge-commit for integrations
Full changelog: v0.1.4...v0.1.5
v0.1.4
Fifth release. Focus: a new Functional UI design tier so dashboards, admin panels, and internal tools get design discipline (not just landing pages), a repo-local Domain Context overlay, LEARN-mode atomic-decomposition teaching, worktree-isolation hardening, and a round of gate hardening that turns four prose-only contract claims into machine-checked gates.
Functional UI design tier (new)
- reference/functional-ui.md (new): a lightweight design authority for dashboards, data tables, admin panels, internal tools, settings, wizards, and CRUD forms. Enforces a real design system (Carbon/Fluent/Material/Polaris/Atlaskit/Radix/shadcn), computed WCAG contrast, consistent accent/type/spacing/radius tokens, complete UI states (loading/empty/error/focus/...), density-first dials, and minimal motion - without the marketing-only rules.
- reference/ui-ux.md: now a surface -> tier dispatcher. Expressive (landing/marketing) routes to taste-skill-v2; Functional (dashboard/admin) routes to functional-ui. Dashboards were previously skipped entirely, so neither the Designer agent nor the contrast gate ran on them.
- agents/designer.md: tier-aware authority. Universal bans (one accent, computed contrast, declared color-scheme, no empty decoration) are marked with (*) and apply to both tiers; Functional adds its own bans (design-system-not-hand-rolled, table empty/loading/error states, density-matches-dial, low motion).
Gate hardening (prose contracts -> machine gates)
- Contrast is now gate-enforced. templates/qa-gate.sh detects a UI-tier: line (or a qa/contrast-pairs.json) and runs contrast-gate.mjs, failing QA - and, via the delivery backstop, Deliver - on a missing pair list or any sub-threshold pair. The "contrast blocks Deliver" guarantee was previously prose with no wired gate.
- Committee approval + plan freeze enforced at Deliver. templates/delivery-gate.sh now requires a Committee: line in verification.md with architect + security + code-review all APPROVED (no reject/changes-requested), and recomputes a CR-stripped sha256 of plan.md to match state.json.plan_hash (with a RE-PLAN: escape in README.md). Both were declared in the contract but checked nowhere.
- Cycle bound. templates/cycle-bound.mjs (new) trips at max_cycles_per_phase (default 5) regardless of error identity - the runaway-retry case the identical-signature circuit breaker never caught.
- reference/experts.md: the Designer dispatch row was stale (pointed only at taste-skill-v2, mis-routing Functional surfaces); now tier-routed via ui-ux.md.
- tests/ui-ux-contract.test.sh (new): 17-assertion contract test guarding the tier split. Suite is now 167 green (gate-scenarios 74 -> 92).
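The plan-freeze check in the list above can be pictured with the sketch below. It assumes "CR-stripped" means removing every carriage-return byte before hashing, and that the RE-PLAN: escape is a line in README.md starting with that marker; both readings are assumptions, and the function names are hypothetical.

```python
import hashlib


def plan_hash(plan_bytes: bytes) -> str:
    """sha256 over plan.md with carriage returns removed, so CRLF and LF checkouts agree."""
    return hashlib.sha256(plan_bytes.replace(b"\r", b"")).hexdigest()


def plan_is_frozen(plan_bytes: bytes, recorded_hash: str, readme_text: str = "") -> bool:
    """Pass when the plan matches the recorded hash or a RE-PLAN: line explains the change."""
    if plan_hash(plan_bytes) == recorded_hash:
        return True
    # Assumed shape of the escape: a README.md line that begins with "RE-PLAN:".
    return any(line.startswith("RE-PLAN:") for line in readme_text.splitlines())


recorded = plan_hash(b"# Plan\n- step 1\n")
print(plan_is_frozen(b"# Plan\r\n- step 1\r\n", recorded))   # same plan, CRLF checkout
print(plan_is_frozen(b"# Plan\n- step 2\n", recorded))       # edited after freeze
```

Stripping carriage returns before hashing keeps the recorded hash stable across Windows and Unix checkouts of the same plan.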
Domain Context overlay (new)
- reference/domain-context.md + templates/domain-agent/: a compact repo-local ## Domain Brief built from a .domain-agent/ knowledge pack, loaded at GREENFIELD Plan, DEBUG Reproduce/Diagnose, and LEGACY Explore. When the pack is missing, the skill prompts for a path and gitignores it.
LEARN mode
- Human-to-code bridge, atomic concept decomposition guidance, enforced atom map + process traces in explanations, and naturalized Korean term labels (핵심 용어 / 구성 요소 instead of literal 원자).
Worktree isolation hardening
- Verify both base and target branch refs exist in the target repo before git worktree add; ask for corrected names instead of substituting nearby branches.
- Retention contract: keep the three most recent completed run worktrees; prune only the oldest repo-managed one, never the active run, original checkout, or manual worktrees.
Plan grounding
- Added a standalone plan-grounding pressure test.
Included changes
- feat: enforce committee, plan-freeze, contrast, cycle-bound; finish UI/UX tier split
- feat: add Functional UI design tier for dashboard/admin surfaces
- docs: naturalize LEARN Korean term labels
- docs: enforce LEARN atom process traces
- fix: Verify branch refs before worktree creation
- Update supergoal worktree retention contract
- docs: add standalone plan grounding pressure test
- Add domain context overlay to supergoal
- docs: Add atomic concept decomposition guidance
- docs: add human-to-code bridge to learn mode
Full changelog: v0.1.3...v0.1.4
v0.1.3
Fourth release. Focus: QA and the "contrast is computed" UI/UX rule are now machine-enforceable literal gates, runs are isolated in a git worktree, and browser QA requires an agent-browser preflight instead of silently falling back to headless Chrome.
QA + UI/UX gates (now machine-checkable)
- templates/qa-gate.sh (new): browser apps must produce qa/as-is-* + qa/to-be-* evidence and a ## QA Tool: line; any non-agent-browser driver needs a Fallback: justification, blocking silent headless-Chrome fallback. Wired into the GREENFIELD/LEGACY QA exit gate.
- templates/contrast-gate.mjs (new): the agent enumerates text/bg pairs into qa/contrast-pairs.json; the script computes WCAG ratios and judges them (body AAA, other text AA, large >=3) so contrast can't be eyeballed.
- templates/delivery-gate.sh: if the vault has a qa/ dir, delivery re-runs qa-gate.sh as a final backstop (CLI/DEBUG runs have no qa/ dir, unaffected).
- Gate test suite grew 51 -> 73 cases, all green.
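The contrast judgment described above rests on the standard WCAG 2.x contrast-ratio formula, which can be sketched as follows. This is an illustration, not the shipped contrast-gate.mjs: the pair shape (fg, bg, role) is a hypothetical stand-in for qa/contrast-pairs.json, and the thresholds use the usual WCAG figures of 7:1 for AAA body text, 4.5:1 for AA text, and 3:1 for large text.

```python
def _linear(channel: int) -> float:
    """Linearize one 8-bit sRGB channel as defined by WCAG 2.x."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #rrggbb color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio, always lighter over darker."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


# Assumed role names; the numbers are the standard WCAG minimums.
THRESHOLDS = {"body": 7.0, "text": 4.5, "large": 3.0}


def failing_pairs(pairs):
    """Return the pairs whose computed ratio is below the threshold for their role."""
    return [p for p in pairs if contrast_ratio(p["fg"], p["bg"]) < THRESHOLDS[p["role"]]]


pairs = [
    {"fg": "#000000", "bg": "#ffffff", "role": "body"},
    {"fg": "#999999", "bg": "#ffffff", "role": "large"},
]
print(failing_pairs(pairs))
```

Computing the ratio from the declared colors, instead of trusting a claimed value, is what stops a pair from being eyeballed as "good enough".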
Browser QA preflight
- The agent-browser-first sequence is now two steps (npm i -g + agent-browser install for the Chrome binary).
- A missing browser binary is explicitly NOT "install impossible"; browser QA is gated on the preflight succeeding rather than falling back silently.
Worktree isolation
- supergoal runs are now required to execute inside an isolated git worktree.
Docs
- Compressed supergoal skill references and refreshed the landing page.
Included changes
- feat: make QA + UI/UX contrast machine-enforceable gates
- fix: require agent-browser CLI preflight for QA
- fix: gate browser QA on agent-browser preflight
- Require worktree isolation for supergoal runs
- docs: compress supergoal skill references
- docs: refresh supergoal landing page
Full changelog: v0.1.2...v0.1.3