
feat(recipes): the PR Factory — a published recipe for PR review, triage & testing #2742

Open: gavrielc wants to merge 8 commits into main from feat/pr-factory-recipe


@gavrielc (Collaborator) commented Jun 11, 2026 (edited)

PR Factory

Turn every pull request into a reviewed, test-planned, human-approved Slack thread, run by your own agents, on your own host.

When a PR opens, the Factory spins up a dedicated worker agent for it, posts a thread to Slack, and the worker triages the change, reviews the diff, and proposes a test plan. Nothing consequential happens on its own: merges, test runs, and credentialed gh actions each surface as an approval card in the thread, and only fire when a human clicks approve. An optional supervisor bot lets you improve the worker by talking to it ("you keep missing N+1 queries, fix the review skill"), and an optional tester bot runs approved plans on ephemeral VMs and reports back.

It's packaged as a recipe: a thin top-level skill that applies a set of component skills, each independently installable and removable. Point Claude at .claude/skills/recipes/pr-factory/SKILL.md and follow it. The recipe first probes whether your core is new enough, then applies the components in order and validates the composed result.
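The core-version gate can be pictured as a plain semantic-version comparison against 2.1.11. A minimal sketch, assuming `MAJOR.MINOR.PATCH` strings without pre-release tags; `parseVersion` and `meetsMinimum` are hypothetical helpers, and the actual recipe probes for specific core capabilities rather than trusting a version number alone.

```typescript
// Hypothetical helpers: compare plain MAJOR.MINOR.PATCH versions numerically.
// Pre-release and build metadata (e.g. "2.1.11-beta.1") are not handled.
function parseVersion(v: string): [number, number, number] {
  const parts = v.trim().replace(/^v/, "").split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`not a MAJOR.MINOR.PATCH version: ${v}`);
  }
  return [parts[0], parts[1], parts[2]];
}

function meetsMinimum(installed: string, minimum: string): boolean {
  const a = parseVersion(installed);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return true; // equal versions satisfy ">="
}

const MIN_CORE = "2.1.11"; // minimum core version stated in this PR

// meetsMinimum("2.1.9", MIN_CORE) is false: 9 < 11 compared as numbers, not strings.
```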

How it works

GitHub PR ──webhook──▶ host ──▶ per-PR worker session ──▶ Slack thread
 (open/push)                      │ triage · review · test-plan │
                                  │                             ▼
                                  │                 approval card per action
                                  │              (merge · gh write · run tests)
                                  ▼                             │
                   supervisor bot (skill feedback)         tester bot ──▶ ephemeral VM ──▶ results

  • One worker per PR. Each PR gets its own agent session/container; a synchronize (new push) re-triages against the fresh diff.
  • Approval-gated by construction. The worker's GitHub credential is read-only; every write goes through a credentialed_gh action that posts an approval card first. The agent cannot merge, comment, or mutate anything a human didn't approve.
  • The bots are Slack app instances. Worker, supervisor, and tester are three Slack apps in one workspace, running as distinct channel instances on the same NanoClaw host.
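The approval-gated write path can be modelled as a queue of pending actions that only a human decision releases. A minimal sketch under stated assumptions: `ApprovalGate`, `request`, and `resolve` are hypothetical names, not the Factory's MCP tools or delivery actions, and Slack card rendering is left out.

```typescript
// Hypothetical model of an approval-gated write path: the agent can only
// *request* an action; nothing runs until a human resolves the card.
interface PendingAction {
  description: string;
  run: () => Promise<string>;
}

class ApprovalGate {
  private readonly pending = new Map<string, PendingAction>();
  private nextId = 1;

  // Agent side: record the action and return a card id to post in the thread.
  request(description: string, run: () => Promise<string>): string {
    const id = `card-${this.nextId++}`;
    this.pending.set(id, { description, run });
    return id;
  }

  // Human side: called from the Slack interactivity handler on approve/reject.
  async resolve(id: string, approved: boolean): Promise<string> {
    const action = this.pending.get(id);
    if (!action) throw new Error(`unknown or already-resolved card: ${id}`);
    this.pending.delete(id); // each card can be resolved at most once
    if (!approved) return `rejected: ${action.description}`;
    return action.run();
  }
}
```

Resolving a card deletes it before anything runs, so a second click on the same card is refused instead of firing the action twice.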

Components

| Component | What it adds |
| --- | --- |
| pr-factory-core | the engine: GitHub webhook → per-PR thread + worker session, the approval flows, the MCP tools, and the default review/triage/test-plan instructions |
| slack-bots | the supervisor + tester Slack apps as channel instances, with sibling-bot echo suppression |
| gh-action-approval | approval-gated credentialed gh actions (the only write path to GitHub) |
| vm-test-orchestrator | runs approved test plans on ephemeral VMs over SSH |
| slack-canvas | renders the worker's .md reports as Slack Canvases |
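The webhook-to-session routing in pr-factory-core can be sketched as a pure function over GitHub's `pull_request` event. The `action`, `number`, `pull_request.head.sha`, and `repository.full_name` fields are real payload fields; `routePullRequest`, the `Decision` type, and the in-memory session set are hypothetical stand-ins for the Factory's own state.

```typescript
// The subset of GitHub's pull_request webhook payload this sketch reads.
interface PullRequestEvent {
  action: string; // e.g. "opened", "synchronize", "closed"
  number: number;
  pull_request: { head: { sha: string } };
  repository: { full_name: string };
}

type Decision =
  | { kind: "start-worker"; pr: number; sha: string }
  | { kind: "retriage"; pr: number; sha: string }
  | { kind: "ignore"; reason: string };

// Hypothetical router: at most one worker session per PR number.
function routePullRequest(evt: PullRequestEvent, sessions: Set<number>): Decision {
  const pr = evt.number;
  const sha = evt.pull_request.head.sha;
  switch (evt.action) {
    case "opened":
      if (sessions.has(pr)) return { kind: "ignore", reason: "worker already exists" };
      sessions.add(pr);
      return { kind: "start-worker", pr, sha };
    case "synchronize":
      // New push: re-triage the existing worker against the fresh head commit.
      if (sessions.has(pr)) return { kind: "retriage", pr, sha };
      sessions.add(pr);
      return { kind: "start-worker", pr, sha };
    default:
      return { kind: "ignore", reason: `unhandled action: ${evt.action}` };
  }
}
```

Starting a worker on a `synchronize` for an unknown PR (for example, one opened before the Factory was installed) is a design choice made in this sketch, not documented behaviour.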

Make it yours

The Factory ships with generic review, triage, and test-planning instructions so it works on any repo out of the box — but the point is to tailor it to your standards. The recipe's "Tailoring the bots" section walks you (interview-style) through writing your own container skills: it asks about your review dimensions, your triage categories and routing, and your test environments, then generates per-group skills you drop in and point PR_FACTORY_REVIEW_SKILL at. From there you refine them by talking to the supervisor bot — every fix lands as a diff behind an approval card.
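The precedence rule for tailored skills (a set `PR_FACTORY_REVIEW_SKILL` replaces the seeded CLAUDE.local.md defaults in every PR trigger, rather than adding to them) can be sketched as follows. `buildTriggerInstructions`, its input shape, and the prompt wording are hypothetical.

```typescript
// Hypothetical sketch of the precedence rule: a configured review skill
// replaces the seeded defaults instead of being appended to them.
interface TriggerInput {
  prNumber: number;
  reviewSkill?: string; // value of PR_FACTORY_REVIEW_SKILL, if set
  defaultInstructions: string; // stand-in for the seeded CLAUDE.local.md defaults
}

function buildTriggerInstructions(input: TriggerInput): string {
  const header = `PR #${input.prNumber} needs triage, review, and a test plan.`;
  const skill = input.reviewSkill?.trim();
  if (skill) {
    return `${header}\nFollow the "${skill}" skill.`;
  }
  return `${header}\n${input.defaultInstructions}`;
}
```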

Setup (sketch)

Full, ordered steps live in the recipe and component SKILL.md files. In brief:

  1. Slack: create three apps (worker / supervisor / tester) in one workspace; the recipe gives the scopes and event/interactivity config.
  2. GitHub: set GITHUB_WEBHOOK_SECRET, add a repository webhook that sends Pull requests events to /webhook/github, and provision a read-only fine-grained PAT for the worker; approvers authenticate their own gh for the write path.
  3. Channels + repo: set PR_FACTORY_SLACK_CHANNEL_ID, optionally PR_FACTORY_SUPERVISOR_SLACK_CHANNEL_ID, and PR_FACTORY_DEFAULT_REPO.
  4. (Optional) testing: a pr-tester group + a template VM and SSH keys for the tester→VM path (TEST_VM_*).
  5. Apply the recipe; its validation chain (build, typecheck, host + container tests, composed-stack guards) confirms the install.
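For step 2, incoming deliveries can be checked against `GITHUB_WEBHOOK_SECRET` using GitHub's documented scheme: the `X-Hub-Signature-256` header carries `sha256=` followed by the hex HMAC-SHA256 of the raw request body. A minimal Node sketch; `verifyGitHubSignature` is a hypothetical helper, and the Factory's `/webhook/github` handler may implement this differently.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: verify GitHub's X-Hub-Signature-256 header.
// GitHub sends "sha256=" + hex(HMAC-SHA256(secret, rawBody)).
function verifyGitHubSignature(
  rawBody: string | Buffer,
  header: string | undefined,
  secret: string,
): boolean {
  if (!header || !header.startsWith("sha256=")) return false;
  const expected = Buffer.from(
    "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex"),
    "utf8",
  );
  const received = Buffer.from(header, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The HMAC must be computed over the exact bytes GitHub sent, not over re-serialized JSON, or valid deliveries will fail verification.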

Good to know

  • One instance, one repository. A Factory instance serves a single repo (PR_FACTORY_DEFAULT_REPO); run/VM/timeout state is per-PR within it. Point separate instances at separate repos.
  • Single trust domain. The Factory registers host-wide delivery actions and trusts every agent group on the host — run it on a host you control, not co-located with untrusted agents.
  • Cost scales with PRs. A container and a full-diff LLM session per PR, re-triaged on each push — budget for your repo's PR/push rate. The docs have a cost/scale + operational section (including host-restart behaviour).
  • gh in the container. The default worker workflow uses gh; the stock image doesn't ship it — add it to the Dockerfile or supply a REST-based review skill (documented).
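The one-repository rule implies a guard on inbound events. A minimal sketch, assuming the check compares the payload's `repository.full_name` with `PR_FACTORY_DEFAULT_REPO`; `isForConfiguredRepo` is hypothetical. Names are compared case-insensitively here because GitHub resolves owner and repository names that way.

```typescript
// Hypothetical inbound guard: one Factory instance serves exactly one repo.
function isForConfiguredRepo(fullName: string, defaultRepo: string | undefined): boolean {
  if (!defaultRepo || defaultRepo.trim() === "") {
    throw new Error("PR_FACTORY_DEFAULT_REPO is not set");
  }
  // owner/repo names are compared case-insensitively, as GitHub treats them.
  return fullName.trim().toLowerCase() === defaultRepo.trim().toLowerCase();
}

// e.g. isForConfiguredRepo(event.repository.full_name, process.env.PR_FACTORY_DEFAULT_REPO)
```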

How it was tested

Because applying the recipe is what touches core, the test is the apply itself, on a fresh clone of main:

  • Apply — all core-version probes pass, every component applies cleanly, and all six validation checkpoints are green (the host test suite grows from 458 to 547 tests); the container suite is green too. An agent completes the apply unaided.
  • Idempotent — re-running every component is byte-for-byte a no-op.
  • Reversible — the REMOVE flow returns the tree byte-identical to pre-apply (only the recipe folder remains).
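Idempotent apply steps follow a "skip if already present" rule. A minimal sketch of such an append step; `appendIfMissing` is hypothetical and operates on strings rather than files.

```typescript
// Hypothetical idempotent append: the block is added only if it is not
// already present, so re-applying leaves the content byte-for-byte unchanged.
function appendIfMissing(fileContent: string, block: string): string {
  if (fileContent.includes(block)) return fileContent; // skip if already present
  const sep = fileContent === "" || fileContent.endsWith("\n") ? "" : "\n";
  return fileContent + sep + block + (block.endsWith("\n") ? "" : "\n");
}
```

The substring check is deliberately simple; a real step would anchor on a more specific marker so that a partial match elsewhere in the file is not mistaken for the applied block.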

The diff in this PR is the recipe folder plus a two-line amendment to docs/skills-model.md and docs/skill-guidelines.md that defines published recipes; there are no src/ changes.

Requirements

NanoClaw ≥ 2.1.11 (the recipe probes for the specific core capabilities it needs and tells you if anything's missing). Slack workspace, a GitHub repo you can add a webhook to, and — for the testing path — an SSH-reachable VM host.

🤖 Generated with Claude Code

gavrielc and others added 8 commits June 11, 2026 22:31
One reviewable bundle under .claude/skills/recipes/pr-factory/: the recipe
SKILL.md/REMOVE.md on top, five component skills inside it (slack-bots,
pr-factory-core, gh-action-approval, vm-test-orchestrator, slack-canvas),
each with its own SKILL.md, REMOVE.md, files.txt manifest, and generated
files/ mirror. The recipe's own files.txt carries the architecture doc,
the composed-stack test, and the manifest/mirror sync infrastructure
(scripts/sync-skill-files.sh + src/skill-sync.test.ts) that every files/
folder in the bundle is generated by — applied by copy, like everything
else in the bundle. Nothing lands in src/, container/, or scripts/ until
an install applies the recipe.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
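The files.txt manifest and the generated files/ mirror can drift if one is edited without re-running the sync. A minimal drift check, assuming the manifest lists one relative path per line and that `#` starts a comment line (both assumptions); `manifestDrift` is hypothetical, and scripts/sync-skill-files.sh plus src/skill-sync.test.ts are the bundle's real mechanism.

```typescript
// Hypothetical drift check between a files.txt manifest and a files/ mirror.
// Assumes one relative path per line; blank lines and "#" lines are skipped.
interface DriftReport {
  missingFromMirror: string[]; // listed in files.txt but absent from files/
  notInManifest: string[]; // present in files/ but not listed in files.txt
  inSync: boolean;
}

function manifestDrift(manifestText: string, mirrorPaths: string[]): DriftReport {
  const manifest = new Set(
    manifestText
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line !== "" && !line.startsWith("#")),
  );
  const mirror = new Set(mirrorPaths);
  const missingFromMirror = Array.from(manifest).filter((p) => !mirror.has(p)).sort();
  const notInManifest = Array.from(mirror).filter((p) => !manifest.has(p)).sort();
  return {
    missingFromMirror,
    notInManifest,
    inSync: missingFromMirror.length === 0 && notInManifest.length === 0,
  };
}
```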
...context
Audited every SKILL.md step against a clean nanoclaw 2.1.11 tree (all seven
core-version probes pass literally; every cp source resolves against its
component's files/ mirror; every barrel/anchor the edit steps quote exists
verbatim). Two fixes: the /add-slack version override now spells out
'pnpm install @chat-adapter/slack@4.26.0 --save-exact' (a caret range
re-resolves to 4.27.0, whose chat@4.27.0 types break the build against
core's chat@^4.24.0), and every apply section states that its commands run
from the repo root.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
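The exact pin matters because of npm-style caret semantics: `^4.26.0` accepts any `4.x.y` at or above 4.26.0, so it re-resolves to 4.27.0, while `--save-exact` records the bare version. A simplified checker that ignores pre-release tags; `satisfiesCaret` and `satisfiesExact` are hypothetical (real tooling would call `semver.satisfies` from the `semver` package).

```typescript
// Simplified npm-style range checks; pre-release tags are not handled.
type Version = [number, number, number];

function toVersion(s: string): Version {
  const [major, minor, patch] = s.split(".").map(Number);
  return [major, minor, patch];
}

function compare(x: Version, y: Version): number {
  for (let i = 0; i < 3; i++) {
    if (x[i] !== y[i]) return x[i] - y[i];
  }
  return 0;
}

// "^base": at least base, without changing the left-most non-zero component.
function satisfiesCaret(version: string, base: string): boolean {
  const v = toVersion(version);
  const b = toVersion(base);
  if (compare(v, b) < 0) return false;
  if (b[0] > 0) return v[0] === b[0];
  if (b[1] > 0) return v[0] === 0 && v[1] === b[1];
  return v[0] === 0 && v[1] === 0 && v[2] === b[2];
}

// An exact pin ("4.26.0", as written by --save-exact) matches one version only.
function satisfiesExact(version: string, pinned: string): boolean {
  return compare(toVersion(version), toVersion(pinned)) === 0;
}

// satisfiesCaret("4.27.0", "4.26.0") is true; satisfiesExact("4.27.0", "4.26.0") is false.
```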
...bulary
The recipe is a standalone artifact now that every core capability it
presupposes ships in nanoclaw 2.1.11: the prerequisite table anchors on
'requires nanoclaw >= 2.1.11' with the probe commands as the real check
(no PR numbers as history), failed-probe guidance says 'update core'
instead of naming PRs to land, and the bot_id->instance migration is
described as a legacy-substrate upgrade rather than a fork upgrade
throughout (SKILL.mds, REMOVE.md, files.txt, the shipped architecture
doc).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
...t skills
New 'Tailoring the bots' section in the recipe SKILL.md: the verified core
mechanism (container/skills/<name> + per-group skills selection symlinked
at spawn; group-private project-level skills under
groups/<folder>/.claude/skills/), the precedence rule
(PR_FACTORY_REVIEW_SKILL replaces the seeded CLAUDE.local.md defaults in
every PR trigger), an interview flow an agent runs with the operator
(review standards -> triage categories and routing -> test environments
and depth) that generates the skill files, the supervisor-bot
propose_skill_edit loop as the iteration path, and one compact worked
review-skill skeleton.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
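Per-group skill selection, with `skills: 'all'` as the default, can be sketched as a resolver over the skills present under container/skills/. The `"all" | string[]` setting shape and `resolveSkills` are assumptions for illustration; the symlinking at spawn is not shown.

```typescript
// Hypothetical per-group skill selection: "all" mounts every container skill,
// an explicit list mounts only the named ones.
type SkillsSetting = "all" | string[];

function resolveSkills(setting: SkillsSetting | undefined, available: string[]): string[] {
  const selected = setting ?? "all"; // the default: no config change needed
  if (selected === "all") return Array.from(new Set(available)).sort();
  const unknown = selected.filter((name) => !available.includes(name));
  if (unknown.length > 0) {
    throw new Error(`unknown container skills: ${unknown.join(", ")}`);
  }
  return Array.from(new Set(selected)).sort();
}
```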
One paragraph in docs/skills-model.md: a composition worth sharing whole
can be contributed upstream as a single reviewable bundle under
.claude/skills/recipes/<name>/, components inside it each held to the
standalone-skill guidelines; fork-private recipes stay the default. One
matching line on the skill-guidelines checklist's recipe entry.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
...orkflow
The clear_session + retrigger capability was broken (a clear was meant to
always pair with a retrigger, but the flow killed the container, deleted the
session, and removed the pr_threads index entry — leaving retrigger unable to
re-resolve the thread). Rather than fix it, rip it out entirely.
Removed across the pr-factory-core component mirror:
- session-ops.ts (clearWorkerSession / retriggerWorker) — file deleted; its
 only exports were those two functions + their result type.
- rebootstrapPrSession() in handler.ts — used solely by retriggerWorker; the
 synchronize/new-push path has its own inline re-bootstrap (handleSynchronize)
 and does not share it, so it goes too.
- The clear_session / retrigger container MCP tools (defs + registerTools entry)
 in container/agent-runner/src/mcp-tools/pr-factory.ts.
- The pr_clear_session / pr_retrigger host delivery-action registrations and the
 clearWorkerSession/retriggerWorker re-exports in index.ts.
- Test coverage scoped to clear/retrigger (the two tool cases + the four-action
 registration list); the host default-repo guard now rides pr_submit_test_results.
- All clear/retrigger references in SKILL.md, files.txt, and docs/pr-factory.md.
The supervisor's propose_skill_edit / skill-feedback capability stays. A skill
edit now applies to the next PR the worker triages; the docs say so plainly.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
...w one
orchestrator.onVmReady() armed a 30-minute timeout keyed by prNumber but never
cleared a timer already armed for the same PR. A second VM-ready for the same PR
(e.g. a re-run) orphaned the first timer, which then fired against the wrong run
and was never cancelled. Clear the prior timer before setting the new one.
Keying stays by prNumber — a factory instance serves one repo (now documented).
Adds an orchestrator guard test: two onVmReady calls for the same PR plus one
results submission leave no orphaned timer to fire past the ceiling.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
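The timer fix can be illustrated with a per-PR map of timeouts in which arming always clears any existing entry first. `RunTimeouts` is a hypothetical stand-in for the orchestrator's internal state; only the keying by PR number and the clear-before-set rule come from the commit.

```typescript
// Hypothetical per-PR run timeouts illustrating the clear-before-set fix.
class RunTimeouts {
  private readonly timers = new Map<number, ReturnType<typeof setTimeout>>();
  private readonly ceilingMs: number;
  private readonly onTimeout: (prNumber: number) => void;

  constructor(ceilingMs: number, onTimeout: (prNumber: number) => void) {
    this.ceilingMs = ceilingMs;
    this.onTimeout = onTimeout;
  }

  // Called when a VM reports ready for a PR (cf. orchestrator.onVmReady()).
  arm(prNumber: number): void {
    this.clear(prNumber); // the fix: cancel any timer already armed for this PR
    const timer = setTimeout(() => {
      this.timers.delete(prNumber);
      this.onTimeout(prNumber);
    }, this.ceilingMs);
    this.timers.set(prNumber, timer);
  }

  // Called when results are submitted (or the run is abandoned).
  clear(prNumber: number): void {
    const existing = this.timers.get(prNumber);
    if (existing !== undefined) clearTimeout(existing);
    this.timers.delete(prNumber);
  }

  get armedCount(): number {
    return this.timers.size;
  }
}

// Usage: const timeouts = new RunTimeouts(30 * 60 * 1000, (pr) => console.warn(`PR ${pr} timed out`));
```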
...notes
DOCS-1 (read-only token): make load-bearing in pr-factory-core SKILL.md +
docs/pr-factory.md that the injected GitHub credential MUST be a fine-grained
read-only PAT (contents/pull_requests/metadata: read) — all writes go through
the human-approved credentialed_gh path. Framed as a required setup step + a
security-model note.
DOCS-2 (single-repo): state plainly that a factory instance serves ONE repo
(PR_FACTORY_DEFAULT_REPO); run/VM/timeout state is keyed per-PR within it.
Reframe PR_FACTORY_GH_REPO_ALLOWLIST as a write-target guard, not a multi-repo
switch.
DOCS-3 (apply/remove fixes):
- Fix the writeOutboundDirect core-version probe: replace the markdown-escaped
 pipe (broke on copy-paste) with a pipe-free awk.
- Recipe REMOVE.md deletes the recipe-owned guard tests FIRST; per-component
 REMOVE validate steps say they are skipped during full-recipe removal.
- Surface the @chat-adapter/slack 4.26.0 exact-pin at the /add-slack step with a
 version check; drop the refuted claim about that skill's text.
- Warn not to re-run /add-slack after slack-bots (clobbers the 3-line patch;
 slack-ignore-senders.test.ts catches it); add "skip if already present" to
 every append step.
- Replace slack-canvas's prose delivery.ts revert with the exact code blocks.
- Reconcile the contradictory review-skill container-config guidance to one
 instruction (default skills:'all' → no config change).
- Drop the "optional component" framing; the recipe is applied as a unit so the
 stack/sync tests are always satisfied.
DOCS-4 (adopter docs): gh-in-container requirement (base image ships no gh),
tester→VM SSH provisioning, host-restart drops in-flight runs / orphans VMs,
cost/scale, single-trust-domain, and a de-meta sweep of the docs/SKILL prose.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
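DOCS-2 reframes `PR_FACTORY_GH_REPO_ALLOWLIST` as a write-target guard. A minimal sketch, assuming a comma-separated list of `owner/repo` entries (the real format is defined in the component docs); `parseAllowlist` and `assertWriteTargetAllowed` are hypothetical, and an unset or empty list denies every write here.

```typescript
// Hypothetical write-target guard for credentialed gh actions.
// Assumes a comma-separated "owner/repo" list; unset or empty denies all writes.
function parseAllowlist(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((entry) => entry.trim().toLowerCase())
      .filter((entry) => entry.length > 0),
  );
}

function assertWriteTargetAllowed(targetRepo: string, allowlist: Set<string>): void {
  if (!allowlist.has(targetRepo.trim().toLowerCase())) {
    throw new Error(`refusing gh write to ${targetRepo}: not in PR_FACTORY_GH_REPO_ALLOWLIST`);
  }
}
```

In this sketch the check runs inside the approved action, right before the gh call, so the guard sits on the write path itself rather than on the webhook side.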
@gavrielc force-pushed the feat/pr-factory-recipe branch from e2b7f85 to 2be6dbc on June 11, 2026 22:43
@gavrielc changed the title from "feat(recipes): the PR Factory — published recipe of five component skills" to "feat(recipes): the PR Factory — a published recipe for PR review, triage & testing" on Jun 11, 2026
@github-actions (bot) added the "PR: Skill" label (Skill package or skill-related changes) on Jun 11, 2026

Reviewers

Awaiting requested review from @gabi-simons

Assignees

No one assigned

Labels

follows-guidelines (PR was created using the current contributing template); PR: Skill (Skill package or skill-related changes)

Projects

None yet

Milestone

No milestone


1 participant
