Releases: memorysaver/agentic-engineering-patterns
v2.3.0 — BDD layer-gate e2e skill + coverage-checked gates + per-project policy
7bd2a40 [2.3.0] - 2026-06-25
Promote the downstream BDD layer-gate E2E pattern into AEP and make project
setup idempotent. The e2e-test skill that a project receives is now a
tool-agnostic, natural-language journey library (Given/When/Then/Verify), with
each journey mapped to a layer gate so that quality accrues layer by layer. Each
gate is two-phase and coverage-checked: scripted_passed (framework tests green)
→ passed (all applicable tiers green and every layer acceptance criterion proven
by a test). /aep-build auto-authors any missing scenario to close a coverage gap,
and /aep-wrap asks the human before advancing. The skill ships in canonical
cross-tool form (a real skills/e2e-test/ directory plus .claude/skills and
.agents/skills symlinks), so Claude Code, Codex, and Pi all see one copy. Which
browser/device tool drives the UI is resolved by a separate tool-selection.md
across four tracks (agent-browser/playwright/codex web set, webwright,
agent-device mobile, desktop/computer-use). This release also retires
/aep-testing-guide (its content is redistributed, so no capability is lost) and
the legacy sync.sh / sync-downstream.sh / migrate-downstream-layout.sh scripts
(the canonical install is the skills CLI). Downstreams pick this up on their
next deliberate re-pin.
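The gate decision above can be sketched in Python as follows. This is a hypothetical illustration, not AEP's implementation: it assumes each check records its tier, whether it passed, and the set of acceptance-criterion IDs it covers (mirroring a journey's covers: field), and that a criterion counts as proven only when a passing check covers it. The names GateCheck and gate_status and the initial "pending" status are invented for the example; scripted_passed, passed, and the criteria_total / criteria_covered / uncovered keys come from these notes.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass
class GateCheck:
    """Hypothetical record of one scripted test, journey, or API-driver run."""
    name: str
    tier: str                                      # e.g. "scripted", "journey", "api"
    passed: bool
    covers: Set[str] = field(default_factory=set)  # acceptance criteria it proves


def gate_status(criteria: Set[str], checks: List[GateCheck], applicable_tiers: Set[str],
                waived: FrozenSet[str] = frozenset()) -> dict:
    # A criterion counts as proven only when a *passing* check covers it.
    covered = set().union(*(c.covers for c in checks if c.passed)) & criteria
    uncovered = sorted(criteria - covered - waived)

    scripted = [c for c in checks if c.tier == "scripted"]
    scripted_green = bool(scripted) and all(c.passed for c in scripted)

    # Every applicable tier needs at least one check, and all of them green.
    tiers_green = all(
        any(c.tier == tier for c in checks)
        and all(c.passed for c in checks if c.tier == tier)
        for tier in applicable_tiers
    )

    if tiers_green and not uncovered:
        status = "passed"            # second phase: all tiers green, coverage closed
    elif scripted_green:
        status = "scripted_passed"   # first phase: machinery green, dogfood pending
    else:
        status = "pending"           # hypothetical initial state
    return {
        "status": status,
        "coverage": {
            "criteria_total": len(criteria),
            "criteria_covered": len(covered),
            "uncovered": uncovered,
        },
    }
```

The human confirmation that /aep-wrap asks for before advancing sits outside this function; the sketch only computes which phase a gate has reached.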
Added
- /aep-e2e-skill-scaffolding skill (project-setup): generates or upgrades a
project's e2e-test skill into the BDD layer-gate three-tier shape (scripted
gates / journey dogfood / API drivers), in canonical cross-tool placement. Ships
templates (e2e-test.SKILL.md, journeys/README.md, 00-walking-skeleton.md,
tool-selection.md, seed.sh) and references (bdd-journeys.md,
three-tier-model.md, layer-gate-loop.md). Idempotent: migrates a legacy
.claude/skills/e2e-test real directory into skills/ and never overwrites
hand-written journeys. Registered in the marketplace.json project-setup plugin.
- scaffold/references/workspace-hook.md: the workspace-setup hook contract and
template (salvaged from the removed testing-guide Part 1).
- Coverage-checked layer gates: the generated e2e-test skill ships a
layer-gate-evidence.template.md (acceptance-traceability and scripted-coverage
matrices, a dogfood checklist, and waivers), and journeys carry a covers:
front-matter field tying each journey to the acceptance criteria it proves.
"Coverage" means acceptance/requirements coverage (every layer criterion proven
by at least one test), not a line/branch percentage.
- Per-project E2E policy (policy.md in the generated skill): the single source
of truth for applicable tiers, dogfood target (none / local / deployed:<url>),
and journey timing (pre-merge / post-deploy). The generator proposes values from
the stack and confirms them with the user; /aep-build and /aep-wrap read the
file. A CLI project with target none gets no journey tier (and no
tool-selection.md), while a pre-release app can dogfood post-deploy against a
deployed (e.g. Cloudflare) URL. The policy is skill-managed, never silently
overwritten, and has no AGENTS.md copy: the skill is canonical cross-tool, so
every runtime reads the same file.
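As a sketch of how the dogfood target in policy.md might be interpreted, the hypothetical helpers below parse none / local / deployed:<url> and decide whether a journey tier applies. The names DogfoodTarget, parse_dogfood_target, journey_tier_applies, and check_timing are invented, and the rule that post-deploy journeys require a deployed:<url> target is an assumption for illustration, not something these notes state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DogfoodTarget:
    kind: str                  # "none" | "local" | "deployed"
    url: Optional[str] = None  # only set for deployed:<url>


def parse_dogfood_target(raw: str) -> DogfoodTarget:
    """Parse a policy dogfood target: none, local, or deployed:<url>."""
    value = raw.strip()
    if value in ("none", "local"):
        return DogfoodTarget(value)
    if value.startswith("deployed:"):
        url = value[len("deployed:"):].strip()
        if not url:
            raise ValueError("deployed target needs a URL, e.g. deployed:https://...")
        return DogfoodTarget("deployed", url)
    raise ValueError(f"unknown dogfood target: {raw!r}")


def journey_tier_applies(target: DogfoodTarget) -> bool:
    # A "none" target (e.g. a CLI) gets no journey tier and no tool-selection.md.
    return target.kind != "none"


def check_timing(target: DogfoodTarget, timing: str) -> None:
    # Assumption: post-deploy journeys need a deployed URL to run against.
    if timing not in ("pre-merge", "post-deploy"):
        raise ValueError(f"unknown journey timing: {timing!r}")
    if timing == "post-deploy" and target.kind != "deployed":
        raise ValueError("post-deploy journeys need a deployed:<url> target")
```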
Changed
- e2e_tool(target_type) (patterns/executor/references/dogfood-validation.md):
generalizes the web-only dogfood_method() over web / mobile / desktop targets,
adding webwright (web) and agent-device (mobile) plus per-target health probes
and a topology.routing.e2e.tool.* pin. dogfood_method() is kept as a wrapper
(dogfood_method() := e2e_tool('web')), so /aep-build Phase 6 and the post-merge
guard are unchanged. The generated tool-selection.md is a self-contained
projection of it.
- /aep-scaffold: Phase 8 now delegates to /aep-e2e-skill-scaffolding; the
existing-project path is an idempotent audit → confirm → converge flow that
repairs drift (canonical skills layout, e2e-test shape, infra) and recommends
(but never auto-runs) the skills-CLI version re-pin.
- /aep-build Phases 6–8: reference BDD journeys and e2e_tool(target_type) and
record the layer-gate evidence, instead of generating a one-off bash
<feature>-e2e.sh script. Phase 6 now computes the layer's coverage matrix and
auto-authors missing tests to close gaps; Phase 8 replays prior-layer journeys
(regression) and checks coverage.
- layer_gates schema (product-context-schema.yaml and the generated gate loop):
reconciled to one canonical list shape and enriched with a two-phase status
(scripted_passed → passed), a coverage block (criteria_total /
criteria_covered / uncovered), and structured evidence (scripted / journeys /
matrix). Older gates without these keys still parse.
- /aep-dispatch and /aep-wrap: dispatch reports a scripted_passed gate as
"machinery green, dogfood pending" (it still blocks the next layer); wrap
performs the two-phase flip (all applicable tiers green, coverage complete or
waived, and a regression replay), then asks the human before advancing to the
next layer.
- /aep-workflow-feedback: pushes skill improvements downstream via a deliberate
skills-CLI re-pin (README upgrade flow) instead of the removed
sync-downstream.sh.
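A minimal sketch of the e2e_tool(target_type) shape with its dogfood_method() wrapper. The candidate ordering, the pins mapping (modeled loosely on topology.routing.e2e.tool.*), and the PATH lookup used as a health probe are all assumptions; a real probe for a track such as computer-use would not be a binary lookup, and the Codex in-app browser is omitted.

```python
import shutil
from typing import Callable, Dict, Mapping, Optional, Tuple

# Hypothetical per-target candidate order. Tool names come from the release
# notes; the ordering and the PATH-based health probe are assumptions.
CANDIDATES: Dict[str, Tuple[str, ...]] = {
    "web": ("agent-browser", "playwright", "webwright"),
    "mobile": ("agent-device",),
    "desktop": ("computer-use",),
}


def on_path(tool: str) -> bool:
    """Placeholder health probe: is an executable with this name on PATH?"""
    return shutil.which(tool) is not None


def e2e_tool(target_type: str,
             pins: Optional[Mapping[str, str]] = None,
             healthy: Callable[[str], bool] = on_path) -> Optional[str]:
    if target_type not in CANDIDATES:
        raise ValueError(f"unknown target type: {target_type!r}")
    # A pin for this target type wins outright (assumed: no health probe on pins).
    pinned = (pins or {}).get(target_type)
    if pinned:
        return pinned
    for tool in CANDIDATES[target_type]:
        if healthy(tool):
            return tool
    return None  # caller degrades instead of failing the build


def dogfood_method(**kwargs) -> Optional[str]:
    """Backward-compatible wrapper: dogfood_method() := e2e_tool('web')."""
    return e2e_tool("web", **kwargs)
```

Keeping dogfood_method() as a one-line delegate is what lets existing callers stay untouched while new callers pass a target type.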
Removed
- /aep-testing-guide skill: content redistributed into the
/aep-e2e-skill-scaffolding references and scaffold/references/workspace-hook.md.
- Legacy maintainer scripts scripts/sync.sh, scripts/sync-downstream.sh, and
scripts/migrate-downstream-layout.sh: superseded by the skills CLI as the
canonical install/upgrade mechanism.
v2.2.0 — /aep-workflow dynamic-workflow pattern
9cb48a5
Adds a first-class dynamic-workflow pattern, /aep-workflow, extracting the idea from Anthropic's "A harness for every task: dynamic workflows in Claude Code" (Thariq Shihipar & Sid Bidasaria).
A dynamic workflow is a harness that Claude writes for itself for one task: a deterministic JS script that orchestrates context-isolated subagents. It is framed as a structural fix for three failure modes of long single-context work: agentic laziness, self-preferential bias, and goal drift.
Added
- /aep-workflow skill (patterns): a dual library + standalone skill (same shape
as /aep-gen-eval) covering the failure modes, the "when (and when not) to reach
for a workflow" judgment, invocation (ultracode, "...with workflow", /loop and
/goal, token budgets), and AEP touchpoints.
- Sub-pattern catalog (references/pattern-catalog.md): classify-and-route,
fan-out-and-synthesize, adversarial verification, generate-and-filter,
tournament, and loop-until-done, each with the Workflow primitive to use, an AEP
example, and a skeleton; plus architectural levers (per-agent model, worktree
isolation, quarantine, budgets, resumability).
- Research note docs/research/dynamic-workflows.md (source + rationale).
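The fan-out-and-synthesize and adversarial-verification sub-patterns can be sketched in Python (the real harness is a JS script that Claude writes). The worker and verifier callables here are hypothetical stand-ins for subagent calls; each call sees only its own input, which models context isolation, and acceptance is decided by independent verifiers rather than by the worker that produced the output.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-ins for subagent calls: input in, answer out.
Worker = Callable[[str], str]
Verifier = Callable[[str, str], bool]


def fan_out_and_verify(items: List[str], worker: Worker, verifiers: List[Verifier],
                       max_workers: int = 4) -> Dict[str, Dict[str, object]]:
    # Fan out: one fresh worker call per item, run in parallel. No call sees
    # another item's context, so one long task cannot drift across items.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(worker, items))  # map preserves input order

    # Synthesize: independent verifiers judge each output, so the worker never
    # grades its own work (the self-preferential bias failure mode).
    report: Dict[str, Dict[str, object]] = {}
    for item, output in zip(items, outputs):
        votes = [verify(item, output) for verify in verifiers]
        report[item] = {"output": output, "accepted": all(votes)}
    return report
```

Requiring every verifier to agree is one possible acceptance rule; a majority vote would be an equally plausible choice for the N-verifier variant.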
Changed
- /aep-executor and /aep-gen-eval now cross-link /aep-workflow (the narrow
executor workflow mode → general catalog; gen/eval → its N-verifier
generalization).
- README / orientation / quick-reference / glossary updated; skill count 19 → 20.
Full notes: CHANGELOG.md · PR #14.
Downstream: consumer repos pick this up on their next re-pin via the skills CLI.
v2.1.0 — /aep-model object-first design stage (OOUX/ORCA)
Add an object-first design stage, /aep-model, that turns the verb-first story
map into a noun-first Object Map (OOUX/ORCA) before UI is built, so build agents
stop inventing one-step-one-screen task-wizard UIs. The structural UI plan
(objects, attributes, relationships, CTAs, screens) is auto-drafted from
artifacts AEP already produces, approved by a human at a short gate, and then
governs the build, leaving only taste (look/voice/journey) to /aep-calibrate.
Background and the verb-first vs noun-first analysis:
docs/research/ooux-object-modeling.md.
Added
- /aep-model skill (product-context): runs ORCA (Objects → Relationships →
Calls-to-action → Attributes → screens) to draft an Object Map, holds a short
human review gate (object boundaries, primary anchor, task-flow exceptions), and
writes the approved noun-first blueprint. Sits between /aep-map and
/aep-dispatch for UI-facing products. Registered in the marketplace.json
product-context plugin.
- Object Map artifacts and schemas: product/object-model.yaml
(cross-capability object ontology) and product/maps/<capability>/object-map.yaml
(capability-scoped ORCA/IA projection), via the templates
_shared/templates/object-model-schema.yaml and object-map-schema.yaml.
Object-first is the default; task-oriented flows are an opt-in escape hatch
recorded with a reason.
- ORCA reference (product-context/model/references/orca-process.md):
round-by-round derivation from AEP inputs, plus the object-first vs
task-oriented decision framework and the completeness checks.
- Glossary terms: Object Model, Object Map, ORCA, Call-to-Action (CTA), Nested
Object Matrix, Object-First vs Task-Oriented.
- Research note docs/research/ooux-object-modeling.md and a Research category
in docs/README.md.
Changed
- Schema (product-context-schema.yaml): adds stories[].object_model_refs,
stories[].capability, architecture.modules[].kind, and the object-model
quality dimension. Object-map approvals are tracked as thin calibration.history
references; the artifact bodies stay under product/ rather than being inlined
into product-context.yaml.
- /aep-envision: declares the object-model structural gate by default for
UI-facing products.
- /aep-map: auto-drafts Object Maps after decomposition; sets module.kind and
story.capability; flips an approved map to stale on re-decompose; routes Next
Step to /aep-model.
- /aep-dispatch: injects the minimal Object Map slice into a story's context
package and refuses UI-facing stories without an approved (non-stale) map.
- /aep-launch: aborts a UI-facing story when no approved Object Map covers it.
- /aep-build: UI implementation obeys the injected Object Map slice (object
structure and CTA grammar; taste still comes from calibration).
- /aep-validate: Mode A gains Object Map completeness checks (coverage, object
homes, anchors, task-flow justification, ref resolution).
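A hypothetical sketch of the gate these changes describe: /aep-map flipping approved maps to stale, and dispatch refusing a UI-facing story unless an approved, non-stale map covers it. The status values, the object-to-CTA dictionary, and the names DispatchRefused, mark_stale, and object_map_slice are assumptions; the real artifacts are YAML files under product/.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


class DispatchRefused(Exception):
    """Raised when a UI-facing story lacks an approved, non-stale Object Map."""


@dataclass
class ObjectMap:
    capability: str
    status: str                                            # assumed: draft | approved | stale
    ctas: Dict[str, List[str]] = field(default_factory=dict)  # object -> its CTAs


@dataclass
class Story:
    id: str
    capability: str
    ui_facing: bool
    object_model_refs: List[str] = field(default_factory=list)


def mark_stale(maps: Dict[str, ObjectMap], redecomposed: Iterable[str]) -> None:
    """On re-decompose, approved maps for the affected capabilities go stale."""
    for cap in redecomposed:
        m = maps.get(cap)
        if m is not None and m.status == "approved":
            m.status = "stale"


def object_map_slice(story: Story, maps: Dict[str, ObjectMap]) -> Dict[str, List[str]]:
    """Minimal slice for the story's context package (empty if not UI-facing)."""
    if not story.ui_facing:
        return {}
    m = maps.get(story.capability)
    if m is None or m.status != "approved":
        raise DispatchRefused(f"{story.id}: no approved, non-stale Object Map for {story.capability}")
    missing = [ref for ref in story.object_model_refs if ref not in m.ctas]
    if missing:
        raise DispatchRefused(f"{story.id}: unresolved object refs {missing}")
    return {ref: m.ctas[ref] for ref in story.object_model_refs}
```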
All additions are backward-compatible — the object-model path only engages for
UI-facing products that opt in.
v2.0.1 — operationalize dogfood → reflect → story
[2.0.1] - 2026-06-16
Operationalize the dogfood → reflect classifier → story link so the G6
self-feeding loop fires for every dogfood trigger, not just the autopilot
post-merge guard. Previously, "feed the report to the /aep-reflect classifier"
was prose-only: no adapter parsed the unified markdown report into the
classifier's normalized record, so a standalone or ad-hoc dogfood run (and even
the guard path) left findings on disk with no auto-filing. A dogfood run that
surfaced real bugs would stop and wait for a human to hand-author stories.
Added
- dogfood_report source adapter (product-context/_shared/references/telemetry-ingestion.md):
parses each ## finding in .dev-workflow/dogfood-*.md (unified
severity/category/repro format) into the normalized observation record that
the /aep-reflect Step 2 classifier consumes. Maps Severity → priority and
Category → a suggested_class hint, and assigns a deterministic
external_id = dogfood:<report>:<hash> so re-running the same dogfood never
duplicates stories. Uses a self-describing file glob and is not gated by
coverage_check.
- dogfood_report watch source (/aep-watch): a standalone, local, or post-deploy
dogfood report is now ingested headlessly: classified, deduped, and (under
full_auto / watch.auto_create) auto-filed as a bug/refinement story that
autopilot dispatches on its next tick. Calibration / discovery /
opportunity-shift / process findings still surface to a human.
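The adapter's contract (one record per finding, a severity-to-priority map, a category hint, and a deterministic external_id) can be sketched as below. The report layout (a ## heading followed by "- Key: value" lines), the P0–P3 labels, the category table, the unseen() dedup helper, and hashing the title plus repro with SHA-256 are all assumptions for illustration.

```python
import hashlib
import re
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical mappings; the real adapter's tables are not reproduced here.
SEVERITY_TO_PRIORITY = {"critical": "P0", "high": "P1", "medium": "P2", "low": "P3"}
CATEGORY_TO_CLASS = {"bug": "bug", "ux": "refinement", "polish": "refinement"}
FIELD = re.compile(r"^-\s*(\w+)\s*:\s*(.*)$")


def parse_dogfood_report(report: str, text: str) -> List[Dict[str, Optional[str]]]:
    """Turn each '## <title>' finding into one normalized observation record."""
    records = []
    # Split at level-2 headings; element 0 is the preamble before the first finding.
    for block in re.split(r"^## ", text, flags=re.MULTILINE)[1:]:
        title, _, body = block.partition("\n")
        fields = {}
        for line in body.splitlines():
            m = FIELD.match(line.strip())
            if m:
                fields[m.group(1).lower()] = m.group(2).strip()
        repro = fields.get("repro", "")
        # Same report + same finding -> same ID, so re-runs never file duplicates.
        digest = hashlib.sha256(f"{title.strip()}\n{repro}".encode()).hexdigest()[:12]
        records.append({
            "external_id": f"dogfood:{report}:{digest}",
            "title": title.strip(),
            "priority": SEVERITY_TO_PRIORITY.get(fields.get("severity", "").lower(), "P2"),
            "suggested_class": CATEGORY_TO_CLASS.get(fields.get("category", "").lower()),
            "repro": repro,
        })
    return records


def unseen(records: Iterable[Dict], seen_ids: Set[str]) -> List[Dict]:
    """Dedup step: keep only records whose external_id has not been filed yet."""
    return [r for r in records if r["external_id"] not in seen_ids]
```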
Changed
- dogfood-validation.md "On issue": replaced the prose assertion with the
concrete adapter + watch ingestion path, and made the report-file path an
explicit contract: findings left only in chat are a dead end.
- /aep-reflect Step 1: adds .dev-workflow/dogfood-*.md to the gathered feedback
sources (it was previously omitted), normalized via the same adapter that
/aep-watch uses.
- /aep-build Phase 6: writes the dogfood report file even when findings are
clean, since that path is the ingestion contract.
v2.0.0 — Autonomy Loop
071f98c
The autonomy loop release. Closes the loop-engineering gaps identified in
docs/research/loop-engineering-autonomy-gap.md (G2–G7) and adds a full_auto
master switch. Every new capability defaults to human-in-the-loop; autonomy is
opt-in via topology.routing flags.
Added
- /aep-watch skill (G6): continuously ingests telemetry / error streams / bug
trackers, classifies findings with the /aep-reflect classifier, and auto-files
bug/refinement stories so reflect → dispatch becomes self-feeding.
- Change-strategy recovery ladder (G2), in gen-eval/references/recovery-ladder.md:
on repeated eval FAIL the build climbs same-fix → re-ground → fresh
native-bg-subagent generator → decompose before the eval_not_converging human
gate.
- Host-aware post-deploy dogfood (G4b), in
executor/references/dogfood-validation.md: dogfood_method() (Claude →
agent-browser; Codex → native in-app browser / computer-use, or Playwright
headless) and target_url() (config-first, CI fallback).
- Post-merge guard (G4a), in autopilot/references/post-merge-guard.md and tick
Step 3.5: monitors merged stories' deploy health; dogfood issues → reflect
story; hard regression → conservative auto_revert (default off; warn and
escalate).
- Telemetry-driven reflect (G5), in reflect/references/telemetry-ingestion.md:
automated source ingestion and quantitative outcome-contract auto-evaluation.
- Telemetry source determination: projects decide sources via a hybrid
metric-driven rule. /aep-scaffold and /aep-onboard detect the observability
stack (candidate sources); /aep-map binds each quantitative success_metric and
health_signal to a source (metric_map); a shared coverage_check() lets
/aep-watch, /aep-reflect, and the post-merge guard block automatic action when
the binding is incomplete instead of silently no-op'ing.
- Visual Design evaluator dimension (G3): vision-model scoring of screenshots
against the design system, for both Claude and Codex (multimodal).
- full_auto master switch (A1): topology.routing.full_auto (default false)
gates the strategic human pauses (design escalation, qualitative outcome eval)
and implies auto_design, auto_outcome_eval, and watch.auto_create. New config
keys are added to the product-context schema.
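The recovery ladder reduces to an ordered retry loop. In this hypothetical sketch, attempt(rung) stands in for applying one change strategy, re-running the evaluator, and returning True on PASS; "fresh-generator" abbreviates the fresh native-bg-subagent generator rung, and tries_per_rung is an invented knob.

```python
from typing import Callable

# Rungs in climbing order, as listed in the release notes.
LADDER = ("same-fix", "re-ground", "fresh-generator", "decompose")
HUMAN_GATE = "eval_not_converging"


def climb(attempt: Callable[[str], bool], tries_per_rung: int = 1) -> str:
    """Retry build+eval under each strategy; return the rung that passed or the human gate."""
    for rung in LADDER:
        for _ in range(tries_per_rung):
            if attempt(rung):
                return rung
    # Every strategy failed: stop guessing and hand the decision to a human.
    return HUMAN_GATE
```

Changing strategy per rung, rather than repeating the same fix, is the point: each rung gives the generator a different starting context before escalating.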
Changed
- /aep-build Phase 5 climbs the recovery ladder; Phase 6 dogfood is host-aware
(it degrades instead of skipping when agent-browser is absent).
- /aep-reflect Step 1 supports automated ingestion; Step 2.75 auto-evaluates
quantitative outcome contracts (qualitative ones still pause unless full_auto).
- /aep-autopilot gains the post-merge guard step and full_auto-aware routing;
loop hygiene is unified on --max-turns (G7).
Fixed
- Carries forward the v1.8.0 executor fix (claude-team removed;
native-bg-subagent default + post-spawn liveness probe). Every new spawn path
uses it.
v1.8.0 — native-bg-subagent default; remove claude-team
[1.8.0] - 2026-06-15
Changed
- Executor: removed the claude-team backend. Its agent-teams spawn path fails
silently on Claude Code ≥ 2.1.x: the launch command is truncated in a detached
claude-swarm tmux pane and never submitted, so no worker starts, yet the team
roster still reports the member "active". native-bg-subagent replaces it as
the Claude Code default. The agent-teams env flag is no longer consulted and
there is no "...with agent team" opt-in. See docs/decisions/remove-claude-team.md.
- Orphan/stuck detection now decides liveness by the real-liveness probe
(process/agent exists AND worktree shows activity), never by roster/state
membership.
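A POSIX-only sketch of the real-liveness rule: a worker counts as live only if its process exists and its worktree shows recent file activity, regardless of what a roster says. The 10-minute window, the mtime scan, and all function names are assumptions. os.kill(pid, 0) only checks that the process exists and sends no signal on POSIX, but it must not be used this way on Windows.

```python
import os
import time
from pathlib import Path
from typing import Optional, Union


def pid_exists(pid: int) -> bool:
    """POSIX existence check: signal 0 performs error checking but sends nothing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True          # process exists but belongs to another user
    return True


def latest_mtime(worktree: Path) -> float:
    """Newest modification time of any file in the worktree (0.0 if none)."""
    newest = 0.0
    for root, _dirs, files in os.walk(worktree):
        for name in files:
            try:
                newest = max(newest, os.path.getmtime(os.path.join(root, name)))
            except OSError:
                pass         # file vanished mid-scan
    return newest


def is_live(pid: int, worktree: Union[str, Path], window_s: float = 600.0,
            now: Optional[float] = None) -> bool:
    """Live = process exists AND worktree activity within window_s; no roster lookup."""
    now = time.time() if now is None else now
    return pid_exists(pid) and now - latest_mtime(Path(worktree)) <= window_s
```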
Added
- native-bg-subagent executor backend (Claude Code default): the Agent tool with
run_in_background: true and no team. The success signature is a bare-hex
agentId plus a JSONL output_file; steering is via SendMessage(to: agentId) and
feedback.md; the human gate is gate-and-park.
- Mandatory post-spawn liveness probe with automatic fallback to
native-bg-subagent on failure
(skills/patterns/executor/scripts/spawn-liveness-probe.sh).
- Explicit "one launch = one worktree = one subagent = one story" invariant in
/aep-launch (the compile_mode: grouped_change story group is the one
documented exception).
Fixed
- claude-bg is now gated on BG_AVAILABLE; documented that the claude --bg
one-shot spawn flag was removed on Claude Code ≥ 2.1.x (so claude-bg is skipped
there and native-bg-subagent is used).
v1.7.0 — goal-driven autopilot driver
0a35505
Goal-driven autopilot driver: /aep-autopilot now keeps itself ticking with a
goal driver by default. The driver is the host-native /goal primitive (Claude
Code v2.1.139+ and Codex's experimental goals feature), which re-fires a tick
when there is work and self-terminates when the current layer is complete or a
human-judgment gate is hit. The fixed-interval /loop driver is retained as a
fallback (--loop). Only the driver changes; the 7-step CHECK→ACT tick, the
delegated cheap CHECK, the signals protocol, and the orchestrator boundary are
unchanged. Decision record: docs/decisions/goal-driven-autopilot.md.
Added
- Goal driver (default): /aep-autopilot with no --loop flag builds a one-layer
goal condition and drives it via /goal: "layer N complete (all stories merged +
wrapped) OR autopilot paused". Scoped to one layer per run: it stops at the
layer boundary so the human runs the layer gate / /aep-reflect and re-invokes
for the next layer. Native and near-symmetric on both hosts (Claude Code
Haiku-evaluator Stop hook; Codex persisted thread goal with token_budget).
- Per-tick surface + wait tail (step 7, goal driver only): each tick surfaces a
signals-only AUTOPILOT ... status line for the goal evaluator to judge
(boundary-safe: never workspace code), then waits a bounded floor (default 5m,
--floor) before ending the turn. The floor is the anti-hot-loop mechanism:
Claude Code uses Monitor with a hard timeout (a raw foreground sleep is blocked
in a turn); Codex uses shell sleep.
- Goal-driver flags: --floor <dur> (per-tick wait floor) and --max-turns <n>
(runaway backstop, default 200); on Codex a token_budget is set as the hard
wall (soft-stops to budget_limited).
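The goal driver's control flow can be sketched as a bounded loop. This is hypothetical: on real hosts /goal and its evaluator decide when to re-fire, whereas here tick() returns a state dict with assumed keys (layer, done, total, paused). The defaults mirror the documented 5-minute floor and 200-turn backstop; sleep and emit are injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Dict


def goal_driver(tick: Callable[[], Dict], floor_s: float = 300.0, max_turns: int = 200,
                sleep: Callable[[float], None] = time.sleep,
                emit: Callable[[str], None] = print) -> str:
    for turn in range(1, max_turns + 1):
        s = tick()                                   # one CHECK -> ACT tick
        # Signals-only status line for the goal evaluator (no workspace code).
        emit(f"AUTOPILOT turn={turn} layer={s['layer']} done={s['done']}/{s['total']} "
             f"paused={s['paused']}")
        if s["paused"]:
            return "paused"                          # human-judgment gate hit
        if s["done"] == s["total"]:
            return "layer_complete"                  # all stories merged + wrapped
        sleep(floor_s)                               # anti-hot-loop wait floor
    return "max_turns"                               # runaway backstop
```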
Changed
- /aep-autopilot default behavior: the default driver is now goal-driven and
self-terminating per layer. The command surface is unchanged; --loop <interval>
selects the prior fixed-interval behavior exactly.
- /aep-autopilot stop: cancels whichever driver is active (/goal clear for the
goal driver; /loop cancel or cron/launchd removal for the loop driver).
- Driver × backend compatibility (executor backends.md): the long-lived session
class now names two in-session variants, /goal (default) and /loop; the goal
driver is in-session-only, so the cron/launchd row stays the /loop / codex exec
path.
v1.6.0 — native-first executor backends + hub-and-spoke human gates
108d4c1
Native-first executor backends: launch/dispatch/build/autopilot/wrap now
target each host's native parallel-agent machinery instead of tmux+cmux. The
B1–B4 ladder is replaced by named launch modes; every mode still runs its
worker in the AEP-created worktree at .feature-workspaces/<ws> with the
file-based signals protocol as the source of truth. Decision record:
docs/decisions/native-first-executor.md.
Added
- claude-team mode: Claude Code agent teams, with one teammate per story in a
standing team (TeamCreate once; spawn/shutdown teammates per tick), push
steering via SendMessage, and native display via teammateMode. Requires the
experimental CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS flag (project
.claude/settings.json env block; /aep-onboard documents it).
- claude-bg mode: Claude Code native background sessions
(claude --bg / attach / logs / stop / respawn): the GA fallback; OS-bound,
cron-driver compatible, worktree-enforced by process cwd.
- codex-subagent mode: Codex multi_agent workers (desktop app + CLI) with
shipped aep-builder / aep-evaluator roles (.codex/agents/*.toml), send_input
steering, and the native approval overlay as a human gate.
- codex-exec mode: headless codex exec --cd <worktree> workers, steerable
across sessions via codex exec resume <id>; the Codex mode for cron-driven
autopilot and hard cwd isolation.
- Human-gate protocol (hub-and-spoke): .dev-workflow/signals/needs-human.md and
status.json blocked_on: "human", a host-agnostic record of decisions only the
human can make. The main agent is the canonical human console: the question
flows back to the orchestrator, the human answers there, and the answer is
relayed per mode, either by push on steerable modes (block-in-place) or by
resuming a parked worker into its worktree on batch/pull modes (gate-and-park:
the worker commits WIP, records the gate, and ends its run cleanly). Direct
worker surfaces (teammate pane, claude attach, Codex thread, tmux attach) are
optional conveniences. Gated workspaces count as waiting, not stuck.
- Workflow mode is now a complete backend: gate-and-park gives the
dynamic-workflow fan-out a human-gate path. Build agents return a structured
gated result instead of guessing or stalling; the main agent collects gated
stories after the run, asks the human, and resumes them into their existing
worktrees (continuation via resumeFromRunId or re-launch). New "Mode: workflow"
recipe in backends.md; the /aep-dispatch ... with workflow path now creates the
.feature-workspaces/<ws> worktrees and announces gate behavior instead of
"no mid-flight feedback".
- Orphan re-adoption: lead restarts no longer strand session-bound workers;
autopilot re-spawns into the existing worktree with the recovery bootstrap
instead of failing the story.
- New executor references: claude-native.md, codex-native.md,
tmux-session.md; a new gate() operation; a driver × backend compatibility
matrix (session-bound vs OS-bound worker lifetimes).
- Autopilot state: backend, agent_id, last_liveness_hash (generalizes
last_tmux_hash), the human_gate escalation type, and the readopted action.
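The hub-and-spoke relay can be sketched as two small decisions: how to classify a workspace, and how to deliver the human's answer. Which modes are push versus park is an assumption drawn from the descriptions above (claude-bg is left out because the notes do not say), and the names workspace_state and relay_answer, along with the return shape, are hypothetical.

```python
from typing import Dict

# Assumed split of launch modes (claude-bg omitted: the notes do not say).
PUSH_MODES = {"claude-team", "codex-subagent", "legacy"}   # block-in-place
PARK_MODES = {"codex-exec", "workflow"}                     # gate-and-park


def workspace_state(status: Dict, live: bool) -> str:
    """A workspace blocked on the human is waiting, not stuck."""
    if status.get("blocked_on") == "human":
        return "waiting"
    return "running" if live else "stuck"


def relay_answer(mode: str, ws: str, answer: str) -> Dict[str, str]:
    """Describe how the orchestrator would deliver the human's answer."""
    worktree = f".feature-workspaces/{ws}"
    if mode in PUSH_MODES:
        # The worker is still running and blocked in place: push the answer to it.
        return {"action": "push", "worktree": worktree, "answer": answer}
    if mode in PARK_MODES:
        # The worker committed WIP and ended its run: resume it into the same worktree.
        return {"action": "resume", "worktree": worktree, "answer": answer}
    raise ValueError(f"unknown launch mode: {mode!r}")
```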
Changed
- Behavior change: Claude Code with tmux installed no longer auto-selects tmux;
native modes win. Pin the old workflow with git config aep.executor-backend tmux
(or "...with tmux"). tmux+cmux recipes are preserved verbatim as the legacy
mode (generic-host fallback).
- Autopilot accepts any steerable, driver-compatible mode (was: B1/B2 only);
Codex autopilot is now supported (in-thread ticks → codex-subagent; scheduled
codex exec ticks → codex-exec).
- /aep-build Phase 5 evaluator spawn is mode-dispatched: a foreground Task
subagent (Claude native modes) or codex exec --cd with the aep-evaluator role
(Codex modes). The evaluator prompt is delivered at spawn time, so there is no
sleep/poll/kill-pane on native modes.
- /aep-wrap teardown is mode-dispatched (teammate shutdown / claude stop /
close_agent / no-op / tmux kill-session).
- Command naming normalized to /aep-* across all skills, README, and docs:
every AEP skill is invoked by its canonical registered name (/aep-autopilot,
/aep-dispatch, /aep-launch, ...). The unprefixed forms (/autopilot, /launch)
no longer appear, eliminating the prefix confusion. Non-AEP commands (/loop,
/opsx:*, Codex /agent, /workflows) are unchanged. (Changelog entries v1.5.0
and older keep their original wording as historical record.)
v1.4.0
Codex /launch now uses Codex-native, worktree-bound subagents for coding launches by default, while Claude/generic executors keep the tmux session path.
Changed
- aep-executor backend selection now chooses B3 for Codex hosts before checking
cmux/tmux, so Codex coding launches no longer create tmux sessions just because
tmux is installed.
- The B3 spawn recipe still creates the standard AEP worktree first
(.feature-workspaces/<name> on feat/<name>), then starts the Codex
worker/subagent with an explicit "operate only in this worktree" contract.
- /launch, /dispatch, /build, and /autopilot docs now distinguish Codex subagent
launches from steerable tmux sessions. Autopilot remains B1/B2-only because it
requires live nudge()/liveness().
- .claude-plugin/marketplace.json version 1.3.2 → 1.4.0.
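The v1.4.0 selection order can be sketched as follows. B3 for Codex hosts, checked first, comes from these notes; reading B1 as a cmux-attached tmux session and B2 as headless tmux is inferred from the v1.3.2 notes, and using a PATH lookup for cmux/tmux simplifies the real detection (v1.3.2 also requires a target pane to resolve). The function names are hypothetical.

```python
import shutil
from typing import Callable, Optional


def select_backend(host: str,
                   has: Callable[[str], bool] = lambda cmd: shutil.which(cmd) is not None
                   ) -> Optional[str]:
    # Codex hosts are checked first, so an installed tmux no longer matters there.
    if host == "codex":
        return "B3"
    if has("cmux"):
        return "B1"    # assumed: tmux session with a cmux review surface
    if has("tmux"):
        return "B2"    # headless tmux ("headless B2" in the v1.3.2 notes)
    return None        # no steerable backend in this sketch


def autopilot_supported(backend: Optional[str]) -> bool:
    # Autopilot needs live nudge()/liveness(), which only B1/B2 provide in v1.4.0.
    return backend in ("B1", "B2")
```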
v1.3.2 — cmux launch-surface fix
1f7d452
Fixes how /launch and aep-executor attach a cmux review surface (reported and verified from a downstream session).
Fixed
- cmux detection no longer requires $CMUX_SOCKET: it keys on "cmux CLI
reachable + a target pane resolves" (cmux tree ◀ here / $CMUX_PANE_ID), so a
cmux host where $CMUX_SOCKET is unset no longer falls back to headless B2.
- The review tab opens as a sibling of the orchestrator's own tab: the pane is
resolved from cmux tree and opened with new-surface --workspace <ws> --pane
<pane> --focus true, instead of a bare new-surface (unset $CMUX_WORKSPACE_ID).
cmux new-workspace is kept out of the launch path.
- Bootstrap is sent before the cmux surface attaches: an attached surface
focuses the tmux composer and blocks send-keys, so the order is now spawn →
ready → bootstrap (tmux send-keys) → attach tab.
Full changelog: v1.3.1...v1.3.2