Releases: popup-studio-ai/bkit-claude-code

v2.1.22 — Hardening Release

02 Jun 02:12

@agent-kay-it

v2.1.22

f077545

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

bkit v2.1.22 — Hardening Release

A stability- and consistency-focused release. There are no new commands or workflows to learn — instead, this release removes friction that previously affected Windows users and anyone running Sprint/PDCA stop hooks, and pays down internal complexity so future changes ship more safely.

This release is regression-free: verified by a full-suite baseline diff (320 → 321 test files, 0 net new failures).

✨ Highlights

🪟 Windows is no longer broken at load time. The frontmatter parser used a /^---\n/ fence pattern that silently failed on Windows CRLF (---\r\n) files — meaning skills, agents, and output-styles could fail to load entirely. Fixed across 6 fence sites and 34 line-splitting sites, while staying byte-identical on macOS/Linux (LF).
🛑 No more "Hook JSON output validation failed" errors. Running /sprint list (and other Sprint/PDCA actions) could surface Stop hook error: Hook JSON output validation failed — (root): Invalid input. The root cause — a mistyped hook decision enum shared by 5 Stop emitters — is fixed, with a contract test added so it can't regress.
🧹 Leaner, safer internals. The 4 largest "god files" (>700 LOC) were split down to focused modules (sprint-handler.js 1509 → 271, state-machine.js 985 → 406, automation.js 770 → 451, unified-stop.js 751 → 693) with behavior fully preserved.
🌐 English CHANGELOG. The full CHANGELOG.md (59 release sections) is now in English, aligning with the project's global-service language policy.
🔗 Up-to-date with Claude Code. Validated against CC v2.1.146 → v2.1.159; recommended CC version bumped to v2.1.159 (balanced) / v2.1.150 (conservative). Continuous-compatibility streak extended 101 → 112.
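A minimal sketch of the CRLF-tolerant fence handling described in the Windows highlight. The `parseFrontmatter` name, its `{ data, body }` return shape, and the simple `key: value` parsing are hypothetical, not bkit's actual parser. The point is that `\r?\n` accepts both line endings, so LF files parse exactly as before while CRLF files stop falling through.

```javascript
// Hypothetical frontmatter splitter illustrating the CRLF fix.
// The old pattern /^---\n/ never matches "---\r\n", so Windows files fell through.
const FENCE_OPEN = /^---\r?\n/; // opening fence, LF or CRLF
const FENCE_BLOCK = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/; // whole frontmatter block

function parseFrontmatter(text) {
  if (!FENCE_OPEN.test(text)) return { data: {}, body: text };
  const match = FENCE_BLOCK.exec(text);
  if (!match) return { data: {}, body: text };
  const data = {};
  // Split on LF or CRLF so no stray "\r" ends up inside parsed values.
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx > 0) data[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { data, body: text.slice(match[0].length) };
}

const lf = "---\nname: pdca\ndescription: demo\n---\nBody\n";
const crlf = lf.replace(/\n/g, "\r\n");
console.log(parseFrontmatter(lf).data);   // { name: 'pdca', description: 'demo' }
console.log(parseFrontmatter(crlf).data); // same result for the CRLF copy
```

Because the LF path matches the same characters as before, this style of fix can stay byte-identical on macOS/Linux while still accepting Windows files.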

👤 What changes for you

If you...	What you'll notice
Run bkit on Windows	Skills/agents/output-styles that previously failed to load now load correctly (CRLF handled).
Use `/sprint` or PDCA stop hooks	The intermittent `Hook JSON output validation failed` error is gone; stop output renders a clean summary + next step as intended.
Read the CHANGELOG	It's now in English end-to-end (Korean kept only for intentional sample data and the "(KO)" tagline).
Pin a Claude Code version	Recommended versions updated to v2.1.159 (balanced) / v2.1.150 (conservative).
Don't hit any of the above	No action needed — your existing commands and workflows are unchanged.

No migration steps are required.

🔧 Under the hood

S1 — CC v2.1.159 Response (ENH-324~328): cancelled ENH-317 as MOOT after the upstream /simplify rename was reverted (v2.1.152/154); verified sessionTitle-on-resume and multi-Agent frontmatter compatibility; registered 2 new regression monitors.
S2 — Cross-Platform Verification (ENH-329~335): CRLF fixes above; confirmed shell-branching is unnecessary (only git/gh/node/npx are exec'd). Carry: native Windows runtime CI matrix.
S6 — Stop Hook Schema Compliance (ENH-361~366): corrected cc-payload.port.js decision typedef; added outputStopSurface() / outputStopAllow() single-source helpers in lib/core/io.js.
S4 — Tech-Debt & Dead-Code (ENH-336~342): field audit found 0 removable dead code; the 6 pdca-eval-* stubs are confirmed permanent (required by immutable contract baselines + L4 deprecation governance).
S3a — God-File Split (ENH-343~348): 4 → 0 files over 700 LOC; contract baselines (255 / 234 assertions) unchanged.
S3b — Layer Consolidation (ENH-349~354): verified 0 redundancy to merge; documented as ADR 0013 (no code change).
S5 — Final QA + i18n + Docs-Sync (ENH-355~360): full QA, 8-language trigger-keyword audit (44 skills + 40 agents), code=docs inventory sync, version bump 2.1.21 → 2.1.22.
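The single-source helper idea from S6 can be sketched as follows. This is not the contents of lib/core/io.js: `buildStopOutput`, `STOP_DECISIONS`, and the helper bodies are assumptions. The sketch relies on the Claude Code hooks documentation, where a Stop hook either omits `decision` to let Claude stop or returns `decision: "block"` with a `reason` to keep it working, and where `systemMessage` is an output field shown to the user. Routing all five emitters through one builder means a mistyped enum value fails in one place.

```javascript
// Hypothetical single-source builder for Stop hook JSON output.
// Assumption (per Claude Code hooks docs): a Stop hook either omits "decision"
// to let Claude stop, or sends decision "block" plus a "reason" to keep it working.
const STOP_DECISIONS = Object.freeze(["block"]);

function buildStopOutput({ decision, reason, systemMessage } = {}) {
  const out = {};
  if (decision !== undefined) {
    // Every emitter goes through this check, so a mistyped enum fails here, once.
    if (!STOP_DECISIONS.includes(decision)) {
      throw new TypeError(`invalid Stop decision: ${JSON.stringify(decision)}`);
    }
    if (!reason) throw new TypeError('decision "block" requires a reason');
    out.decision = decision;
    out.reason = reason;
  }
  if (systemMessage) out.systemMessage = systemMessage; // text shown to the user
  return out;
}

// Sketches of the two helpers named in S6; bkit's real bodies may differ.
function outputStopSurface(summary) {
  process.stdout.write(JSON.stringify(buildStopOutput({ systemMessage: summary })) + "\n");
}

function outputStopAllow() {
  process.stdout.write(JSON.stringify(buildStopOutput()) + "\n");
}
```

A contract test can then assert on `buildStopOutput` directly instead of parsing each emitter's stdout.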

🧪 Quality & regression note

The initial S5 QA pass was incomplete (it ran only tests/, 43 files, and missed test/, 278 files). A full-suite run surfaced 6 failures, of which 5 were self-introduced regressions from the S3a refactor (fixed in 795c724) and 1 was pre-existing. Final baseline diff:

Baseline b591410 full-suite: 320 files / 28 fail
Released HEAD full-suite: 321 files / 26 fail
Net new regressions: 0 — and 2 baseline failures (docs-code-sync, sprint-alpha-e2e) were actually fixed.

The remaining 26 failures are pre-existing test-debt, scheduled for cleanup in v2.1.23 (along with the native Windows CI matrix).

Full Changelog: https://github.com/popup-studio-ai/bkit-claude-code/blob/main/CHANGELOG.md

🤖 Generated with Claude Code

v2.1.21 — Session Title Isolation (#111) + Sprint Output Enforcement (#113)

29 May 07:44

@popup-kay

v2.1.21

6e56b2f

bkit v2.1.21 — Issue Response Sprint

This release closes two external dogfooder issues in a single unified sprint, each fix validated against actual codebase file:line evidence rather than accepted on report alone.

Issue	Title	Reporter
#111	Session title collision across parallel sessions	@wonuseo
#113	Sprint screen-output enforcement gap	@rohwonseok-ops

✨ Highlights

🪟 #111 — Parallel sessions now get distinct window titles

Opening two Claude Code windows in the same project folder on the same feature used to label both windows identically ([bkit] PLAN f1) — making it dangerously easy to type a command into the wrong terminal. This was latent across all of v2.1.6 → v2.1.20. bkit now appends a stable per-session tag so each window is uniquely identifiable.

📋 #113 — Sprint operations now print a human-readable summary

Sprint commands (phase, status, watch, report) previously emitted raw JSON only, leaving you to trust the model's narration of it. Sprint now enforces the same Stop-hook output discipline as PDCA: an Executive Summary, an interactive next-step prompt, and a per-feature progress table — surfaced directly on screen.

🔬 A deeper fix than reported

While verifying #113 with real claude -p runs, we found that no skill Stop handler was actually firing in production (PDCA included) — Claude Code omits skill_name from the Stop payload, so every detection path silently fell through. v2.1.21 introduces a cross-process active-skill marker that fixes dispatch for Sprint and lays the groundwork for the PDCA family in v2.1.22+.
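A cross-process marker can be pictured as a small file that the skill entry point writes and the separate Stop hook process later reads, since the Stop payload itself carries no `skill_name`. This is a sketch under stated assumptions: the file name `active-skill.json`, the 30-minute staleness window, and the session-ID check are all hypothetical, not taken from lib/core/active-skill-marker.js.

```javascript
// Hypothetical cross-process active-skill marker.
// The skill entry point writes a small JSON file; a separate Stop hook process
// reads ("peeks") it because the Stop payload carries no skill_name.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

const MARKER_TTL_MS = 30 * 60 * 1000; // assumption: ignore markers older than 30 min

function markerPath(dir) {
  return path.join(dir, "active-skill.json");
}

function writeActiveSkill(dir, sessionId, skillName, now = Date.now()) {
  fs.mkdirSync(dir, { recursive: true });
  const tmp = markerPath(dir) + ".tmp";
  fs.writeFileSync(tmp, JSON.stringify({ sessionId, skillName, at: now }));
  fs.renameSync(tmp, markerPath(dir)); // swap the marker in one step
}

function peekActiveSkill(dir, sessionId, now = Date.now()) {
  let marker;
  try {
    marker = JSON.parse(fs.readFileSync(markerPath(dir), "utf8"));
  } catch {
    return null; // missing or unreadable marker: no skill handler fires
  }
  if (marker.sessionId !== sessionId) return null;  // another window's skill
  if (now - marker.at > MARKER_TTL_MS) return null; // stale marker
  return marker.skillName;
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "bkit-marker-"));
writeActiveSkill(dir, "sess-1", "sprint");
console.log(peekActiveSkill(dir, "sess-1")); // sprint
console.log(peekActiveSkill(dir, "sess-2")); // null
```

Scoping the marker to a session ID keeps a skill running in one window from triggering the Stop handler of another window in the same project.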

👤 What changes for you

Multi-window users (#111)

Each session window now shows a short unique tag, e.g. [bkit] PLAN f1 ·a1b2.
The tag is derived deterministically from the session ID — same session, same tag for its whole lifetime.
Fully backward-compatible: if there's no session ID, the title looks exactly as before. Legacy title caches migrate automatically on first read. No config change required.
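One way to get a stable per-session tag is to hash the session ID and keep a few hex characters. The release notes only say the tag is derived deterministically from the session ID; the choice of SHA-256, the 4-character length, and the `sessionTag` / `formatTitle` names here are assumptions for illustration.

```javascript
// Hypothetical sketch: derive a short, stable window tag from the session ID.
const crypto = require("node:crypto");

function sessionTag(sessionId, length = 4) {
  if (!sessionId) return "";
  return crypto.createHash("sha256").update(sessionId).digest("hex").slice(0, length);
}

function formatTitle(base, sessionId) {
  const tag = sessionTag(sessionId);
  return tag ? `${base} ·${tag}` : base; // no session ID: title unchanged
}

console.log(formatTitle("[bkit] PLAN f1", "session-A"));
console.log(formatTitle("[bkit] PLAN f1", "session-B"));
console.log(formatTitle("[bkit] PLAN f1", undefined)); // [bkit] PLAN f1 (no tag)
```

Four hex characters give 65,536 possible tags, which makes a collision between a handful of open windows unlikely but not impossible.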

Sprint users (#113)

After /sprint phase ... --to <p>, you'll see a Sprint Executive Summary (Mission / Result / matchRate / Cross-Sprint Integration / Invariant) instead of a JSON blob.
/sprint status and /sprint watch now render a per-feature table with a one-line quality-gate summary; raw JSON is still available for programmatic use.
/sprint report finalization prepends a full KPI + carry-items summary with a clear next action.
The session title updates to reflect the current sprint phase.

🧱 Under the hood

New modules: lib/sprint/executive-summary.js, scripts/sprint-skill-stop.js, lib/core/active-skill-marker.js
Refactors: session-title-cache.js (per-session map + GC + legacy migration), session-title.js (sessionId tag), 4 Stop emitters (session_id threading), unified-stop.js (sprint handler + marker peek), advance-phase.usecase.js (DI-based transition summary), sprint-handler.js (display field)
ADR 0012 — Sprint Stop Hook Output Enforcement (run-export pattern · separate sprint shape · usecase-purity DI · cross-process active-skill marker)
92 new/extended test cases across 6 files — all passing, including real claude -p --plugin-dir . runtime dispatch verification

🔭 Known follow-up

CARRY-#113-1 — PDCA-family Stop handlers share the same production no-op root cause and will be migrated to the run-export + marker pattern in v2.1.22+ (pending separate regression verification).

Full changelog: see CHANGELOG.md · Cross-references: #111 ⊃ #77 · #113 ⊃ #93

🤖 Generated with Claude Code

Contributors

wonuseo and rohwonseok-ops

bkit v2.1.20 — Marketplace Recovery + Plugin Manifest Schema Compliance

26 May 07:58

@popup-kay

v2.1.20

2fc529f

Released: 2026-05-26
Trigger: install incident reported by external dogfooder @BJ (정병진) on 2026-05-26, failing with Validation errors: : Unrecognized key: "displayName"
Sprint: 14 features / 3 sub-sprints / 3 new ENH / 1 new ADR / 1 external dogfooder #2 / 13/13 quality gates PASS / 0 auto-pause triggers
Minimum Claude Code: v2.1.143 (the strict plugin-manifest path recognizes the official displayName field only from v2.1.143)

✨ Highlights

🚨 Minimum Claude Code v2.1.143 advisory in README, README-FULL, marketplace.json, plus a new self-service guide docs/06-guide/cc-compatibility.guide.md. Advisory only — no hard reject to keep UX friendly for users still on older Claude Code.
🛡 21-key plugin-manifest whitelist CI gate (ENH-322) — scripts/validate-plugin.js --strict enforces the Anthropic official schema. New EXPECTED_PLUGIN_JSON_KEYS SoT in lib/domain/rules/docs-code-invariants.js (Object.freeze, pure domain). v2.1.20: advisory only for one week; v2.1.21+: strict.
🔧 ADR 0006 § Empirical Validation Gate recovery — scripts/release-plugin-tag.sh now wires claude plugin validate . (~30-day wire delay closed). Non-zero exit blocks release.
🚦 cc-regression Defense Layer 6 reinforcement (ENH-321) — new entry R3-321 tracks the displayName strict-reject regression with daily 09:00 KST reconcile cycle integration. 22 guards total, 0 warnings.
🔍 SessionStart runtime advisory (ENH-323) — hooks/startup/session-context.js now detects the installed Claude Code version (200 ms timeout cap + 1-hour .bkit/runtime/cc-version.json cache + opt-out via BKIT_DISABLE_CC_VERSION_DETECTION=1). Forward-proofs users who upgrade bkit before Claude Code.
📜 ADR 0011 Plugin Manifest Schema Compliance Policy (Accepted) — formalizes the 5-layer policy (minimum CC + 21-key whitelist + claude plugin validate wire + R3-321 + SessionStart detection).
🌟 External Dogfooder Hall of Fame #2 — @BJ (정병진) entry at docs/external-dogfooders/bj.md. bkit Early Adopter Program DA-4 status: N=2 confirmed (first-follower effect validated 28 days after the v2.1.19 policy introduction).

🧭 User-Experience Changes

For users running Claude Code v2.1.143 or later (≈95% of users)

README and README-FULL now display a one-line minimum Claude Code v2.1.143 advisory at the top.
The marketplace.json bkit entry description starts with Requires Claude Code v2.1.143+.
No functional change. Existing PDCA + Sprint workflows continue unchanged.
The new SessionStart CC version check returns isOldVersion=false and emits no advisory — invisible to you.

For users running Claude Code ≤ v2.1.142

claude plugin install bkit will still fail with Unrecognized key: "displayName". This is a schema rejection on the Claude Code side that bkit cannot bypass without removing the displayName field, and the Anti-Mission forbids that removal because it would regress the UI picker for users on Claude Code v2.1.143+.

However, the new docs/06-guide/cc-compatibility.guide.md gives a clear, self-service workaround:

npm install -g @anthropic-ai/claude-code@latest
rm -rf ~/.claude/plugins/cache/temp_git_*
claude plugin install bkit

The README advisory + marketplace.json description prefix surface this requirement before you hit the install error.

For users running an existing bkit session on Claude Code < v2.1.143

SessionStart now detects the Claude Code version and surfaces a bkit Compatibility Notice in the additionalContext (one-time-per-session, cached for one hour at .bkit/runtime/cc-version.json).
The notice includes the workaround command and a link to the compatibility guide.
Performance budget: 200 ms hard cap on claude --version execution + cache + one-call-per-session — typical session impact zero (cache hit) or sub-100 ms (first call).
Opt-out: set BKIT_DISABLE_CC_VERSION_DETECTION=1.
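The detection budget above (200 ms cap, one-hour cache, env opt-out) can be sketched with Node's `child_process.execFileSync`, whose `timeout` option kills the child and throws when exceeded. Only `detectCCVersion`, the cache path, the env variable, and the v2.1.143 minimum come from these notes; the cache JSON shape, the `run` injection point, and the assumption that `claude --version` prints a dotted version somewhere in its output are hypothetical.

```javascript
// Hypothetical sketch of SessionStart version detection with a timeout cap,
// a one-hour cache, and an environment opt-out.
const { execFileSync } = require("node:child_process");
const fs = require("node:fs");
const path = require("node:path");

const MIN_VERSION = "2.1.143";
const CACHE_TTL_MS = 60 * 60 * 1000;

function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const d = (pa[i] || 0) - (pb[i] || 0);
    if (d !== 0) return Math.sign(d);
  }
  return 0;
}

function runClaudeVersion() {
  // 200 ms hard cap; throws on timeout or when the CLI is missing from PATH.
  return execFileSync("claude", ["--version"], { timeout: 200, encoding: "utf8" });
}

function detectCCVersion(cacheFile, { run = runClaudeVersion, now = Date.now(), env = process.env } = {}) {
  if (env.BKIT_DISABLE_CC_VERSION_DETECTION === "1") return null;
  try {
    const cached = JSON.parse(fs.readFileSync(cacheFile, "utf8"));
    if (now - cached.checkedAt < CACHE_TTL_MS) return cached; // cache hit: no process spawn
  } catch { /* no usable cache: fall through to a live check */ }
  let version = null;
  try {
    const m = /(\d+\.\d+\.\d+)/.exec(run());
    version = m ? m[1] : null;
  } catch { /* timeout or CLI absent: stay silent */ }
  const isOldVersion = version !== null && compareVersions(version, MIN_VERSION) < 0;
  const result = { version, isOldVersion, checkedAt: now };
  fs.mkdirSync(path.dirname(cacheFile), { recursive: true });
  fs.writeFileSync(cacheFile, JSON.stringify(result));
  return result;
}
```

Caching a failed lookup as well (version `null`) is a deliberate choice in this sketch: a slow or missing CLI then costs at most one timeout per hour rather than one per session.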

For bkit contributors / future PR authors

A new CI step Release Gate — plugin.json schema validation (21-key whitelist) runs node scripts/validate-plugin.js --strict on every PR (.github/workflows/contract-check.yml).
v2.1.20 = advisory only (continue-on-error: true) to avoid PR backlog during the rollout week.
v2.1.21+ = strict (continue-on-error: false) — extra keys in plugin.json will block PR merge.
Add new plugin.json keys only if they appear in the 21-key whitelist at lib/domain/rules/docs-code-invariants.js EXPECTED_PLUGIN_JSON_KEYS. If Anthropic publishes a new manifest key, update the SoT first.
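A whitelist gate of this kind reduces to a set difference between the manifest's keys and a frozen list. In the sketch below, the key list is an illustrative subset of common manifest fields, NOT bkit's 21-key EXPECTED_PLUGIN_JSON_KEYS, and the array return shape of `diffPluginJsonKeys` is an assumption. `strictExitCode` is a hypothetical helper that mirrors the exit code 2 these notes give for extra-key failures.

```javascript
// Illustrative subset only; bkit's real SoT lists 21 keys.
const EXPECTED_PLUGIN_JSON_KEYS = Object.freeze([
  "name", "displayName", "version", "description", "author",
  "homepage", "repository", "license", "keywords",
]);

// Returns the keys present in plugin.json but absent from the whitelist.
function diffPluginJsonKeys(manifest, expected = EXPECTED_PLUGIN_JSON_KEYS) {
  const allowed = new Set(expected);
  return Object.keys(manifest).filter((key) => !allowed.has(key));
}

// Hypothetical exit code for a --strict run: 0 = clean, 2 = extra keys.
function strictExitCode(manifest) {
  return diffPluginJsonKeys(manifest).length === 0 ? 0 : 2;
}

const manifest = { name: "bkit", displayName: "bkit", version: "2.1.20", colour: "blue" };
console.log(diffPluginJsonKeys(manifest)); // [ 'colour' ]
console.log(strictExitCode(manifest));     // 2
```

Freezing the list keeps the SoT from being mutated at runtime, so adding a key is always an explicit source change.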

For bkit release engineers

scripts/release-plugin-tag.sh now runs claude plugin validate . (per ADR 0006) between the CI-invariants check and tag-conflict detection. Non-zero exit blocks the release.
If the claude CLI is absent from PATH (some CI environments), the script logs a WARN and falls back gracefully — release continues.
This is the v2.1.20 release itself dogfooding ADR 0006's Empirical Validation Gate for the first time.

For external dogfooders / Early Adopter Program

Hall of Fame now has two entries: @pruge (v2.1.19, first follower) + @BJ (v2.1.20, second entry, first-follower effect validation).
DA-4 acquisition goal status: N=2 confirmed — first-follower effect validated 28 days after policy introduction. v2.1.21+ continues active outreach to grow N≥3.
New dogfooders are encouraged to file detailed issues per the 5-stage User-Feedback Lifecycle at docs/external-dogfooders/_README.md.

📦 What's in this release

Added

EXPECTED_PLUGIN_JSON_KEYS SoT + diffPluginJsonKeys in lib/domain/rules/docs-code-invariants.js
--strict flag in scripts/validate-plugin.js (exit codes 2 / 3 for extra-key / SoT-import failures)
Release Gate — plugin.json schema validation (21-key whitelist) step in .github/workflows/contract-check.yml
claude plugin validate . wire in scripts/release-plugin-tag.sh (ADR 0006 § Empirical Validation Gate)
R3-321 entry in lib/cc-regression/registry.js (cc-regression entry #22)
detectCCVersion() + buildCCVersionAdvisoryContext() in hooks/startup/session-context.js
ccVersionAdvisory section in bkit.config.json:ui.contextInjection.sections (default-on)
ADR 0011 Plugin Manifest Schema Compliance Policy (Status: Accepted)
docs/06-guide/cc-compatibility.guide.md — user-facing self-service guide
docs/external-dogfooders/bj.md — Hall of Fame entry #2
test/e2e/external-dogfood/cc-min-version.test.js — 5 TC permanent regression lock (Lifecycle Stage 4)

Changed

README.md / README-FULL.md — Claude Code badge v2.1.123+ → v2.1.143+, Version badge 2.1.19 → 2.1.20, one-line minimum CC advisory
.claude-plugin/plugin.json — version 2.1.19 → 2.1.20 (displayName unchanged per Anti-Mission)
.claude-plugin/marketplace.json — bkit + marketplace version 2.1.19 → 2.1.20, description prefix advisory
bkit.config.json — version 2.1.19 → 2.1.20, ui.contextInjection.sections adds ccVersionAdvisory
test/integration/config-sync.test.js CS-015 — diffPluginJsonKeys 21-key whitelist enforced
docs/external-dogfooders/_README.md — @BJ entry added under "v2.1.20", DA-4 status N=2 confirmed

Anti-Mission preserved (no regressions to v2.1.143+ users)

displayName field NOT removed (v2.1.143+ official schema key — removal would regress UI picker)
v2.1.142-and-below users NOT hard-rejected (advisory only)
Anthropic docs vs implementation lenient/strict mismatch (Q1) NOT touched (external responsibility)
bkit-starter plugin unchanged (no displayName field, zero impact)
Trust L3/L4 default unchanged

🌟 External Dogfooder Contributions

@BJ (정병진) drove the entire v2.1.20 sprint by sharing the precise error message, the cache path, and Cursor IDE environment metadata. The reproduction scenario is absorbed at test/e2e/external-dogfood/cc-min-version.test.js (5 TC, Lifecycle Stage 4 Regression Lock achieved), and the Trust Score externalDogfoodFeedbackResponseRate component (weight 0.05) accumulated this contribution. See docs/external-dogfooders/bj.md for the full contribution archive.

Thank you @BJ for sharing the precise error message that scoped the entire sprint correctly. 🙏

🔮 Roll-forward markers (v2.1.21+)

F6 contract-check.yml continue-on-error → false (one-week advisory-only window closes)
F8 R3-321 telemetry 3-month analysis → demote/keep decision
F10 ENH-323 SessionStart detection telemetry 3-month analysis → timeout elevation decision
F14 Hall of Fame @BJ Stage 3 (Fix Released) ⏳ → ✅ on v2.1.20 GA tag — this release achieves it

📊 Sprint outcome (verification)

14/14 features delivered (4 + 5 + 5 across the 3 sub-sprints)
13/13 quality gates PASS (M1-M10 + S1-S4 all clean)
0 auto-pause triggers fired (QUALITY_GATE_FAIL / ITERATION_EXHAUSTED / BUDGET_EXCEEDED / PHASE_TIMEOUT)
matchRate 100%, dataFlowIntegrity 100%
Sprint cycle time 3 days vs 14-day budget (25% utilization)
5/5 side-effect verification gates PASS post-archive patches (CO-4 + CO-5 + CO-2-partial-fix)
3 of 6 carry items closed by the post-archive patch: CO-2 (test-side mitigated), CO-4 (Q3 partially resolved), CO-5 (@BJ Stages 3 + 5 ✅)

🔗 References

Full sprint planning: docs/sprint/v2120-marketplace-recovery/
Final report: docs/04-report/features/v2120-marketplace-recovery.report.md
ADR 0011: docs/adr/0011-plugin-manifest-schema-compliance.md
Compatibility guide: docs/06-guide/cc-compatibility.guide.md
Hall of Fame @BJ : docs/external-dogfooders/bj.md
Root cause analysis: [`d...

Contributors

pruge and BJ

bkit v2.1.19-hotfix.1 — CI Hardening & SQM Panel Activation

21 May 13:06

@popup-kay

v2.1.19-hotfix.1

b12b3b8

A focused CI hotfix that restores the Invocation Contract Check workflow to green after v2.1.19 GA, and quietly activates the new SQM dashboard panel that the v2.1.19 design (S5 ADR S5-003) promised but never wired up.

Highlights

🟢 CI green again — Invocation Contract Check workflow flipped back from red to green after v2.1.19 GA merge.
🩹 Dead-code detector blind spot patched — scripts/check-deadcode.js now correctly recognises require(path.join(ROOT, '...')) references, not just direct string literals.
📊 New SessionStart panel: SQM (Sprint Quality Maturity) — v2.1.19's promised quality maturity index is now visible at session start.
🛡️ Zero behaviour regression — verified against the full 4,168 TC QA aggregate.

User experience change

After installing v2.1.19-hotfix.1, every new bkit session (/clear or fresh CC launch) will show one extra compact panel at the top of the context, if and only if the project has at least one entry in .bkit/state/sqm-history.jsonl:

┌─── SQM (Sprint Quality Maturity) ─────────── 64.00 / 100 ─┐
│ Docs=Code 100 │ Self-Dogfood 10 │ Dogfooder 100 │
│ Report KPI 80 │ Dispatch — │ Convention 0 │
└──────────────────────────────────────────────────────────┘

The panel:

Reflects project-wide quality maturity across 6 weighted components (Docs=Code 30 %, Self-Dogfood 20 %, External Dogfooder 20 %, Report KPI 15 %, Sub-Agent Dispatch 10 %, Convention 5 %).
Is fail-silent on new projects — when .bkit/state/sqm-history.jsonl is missing, the panel renders nothing and adds 0 chars. No impact on first-run UX.
Adds ~259 chars to additionalContext when active. The other 4 dashboard sections (progress / workflow / impact / control) are unchanged.
Can be opted out by removing 'sqm' from bkit.config.json → ui.dashboard.sections.
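The weighting above can be checked against the sample panel with a short calculation. The component keys are hypothetical names; the weights are the ones listed. The assumption that an unmeasured component ("—", here Dispatch) contributes 0 without renormalizing the remaining weights is inferred from the numbers: it is the reading that reproduces the 64.00 shown in the panel (renormalizing would give about 71.1).

```javascript
// Minimal sketch of the 6-component SQM weighting (weights sum to 1.0).
// Assumption: an unmeasured component ("—") contributes 0 and weights are not renormalized.
const SQM_WEIGHTS = Object.freeze({
  docsCode: 0.30, selfDogfood: 0.20, externalDogfooder: 0.20,
  reportKpi: 0.15, dispatch: 0.10, convention: 0.05,
});

function sqmScore(components) {
  let total = 0;
  for (const [key, weight] of Object.entries(SQM_WEIGHTS)) {
    const value = components[key];
    if (typeof value === "number") total += weight * value; // skip unmeasured components
  }
  return Number(total.toFixed(2));
}

console.log(sqmScore({
  docsCode: 100, selfDogfood: 10, externalDogfooder: 100,
  reportKpi: 80, dispatch: null, convention: 0,
})); // 64
```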

No other user-facing behaviour changes.

Root cause

Two independent issues coincided immediately after the v2.1.19 GA merge:

Detector blind spot — scripts/check-deadcode.js matched only direct string-literal requires (require('./foo')). The new v2.1.19 scripts (S0 measure, S3 docs-sync, S4 feedback refresh) all use require(path.join(ROOT, '...')), so the detector reported 5 false-positive dead modules and failed the CI gate.
Design/runtime gap — lib/ui/sqm-panel.js was specified by S5 ADR S5-003 to render in the SessionStart dashboard, but the wiring in hooks/session-start.js was never landed. bkit.config.json ui.dashboard.sections also needed 'sqm' for the new panel to opt in.

Fixes (4 files, +56 / -4)

scripts/check-deadcode.js — split scanProductionRequires() into two regexes. reDirect keeps the original behaviour; reIndirect matches require(<wrapper>(..., '<lib path>', ...)), where <wrapper> is any identifier and the captured string literal contains lib/ or starts with ./ or ../. Restricting matches to library-shaped paths avoids overmatching arbitrary strings.
hooks/session-start.js — wired lib/ui/sqm-panel + lib/quality/sqm-history into the SessionStart dashboard. Render block is independent of the primaryFeature gate (SQM is project-wide, not feature-scoped) and fail-silent when the history file is missing.
bkit.config.json — added 'sqm' to ui.dashboard.sections. Runtime config takes precedence over the hook default, so without this the SqmPanel rendered to 0 chars even after wiring.
CHANGELOG.md — added [2.1.19-hotfix.1] section.
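The two-regex scan can be approximated as below. The patterns are a reconstruction from the description of reDirect and reIndirect, not copied from scripts/check-deadcode.js, and the `[\w$.]+` wrapper pattern is an assumption chosen so that member calls such as `path.join` count as the wrapper.

```javascript
// Hypothetical reconstruction of the two-regex scan; bkit's actual patterns may differ.
// reDirect: a string literal directly inside require(), e.g. require('./foo')
const reDirect = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
// reIndirect: require(<wrapper>(..., '<lib path>', ...)), keeping only literals that
// start with ./ or ../ or contain lib/, e.g. require(path.join(ROOT, 'lib/a'))
const reIndirect = /require\(\s*[\w$.]+\(\s*[^)]*?['"]((?:\.{1,2}\/|[^'"]*lib\/)[^'"]*)['"][^)]*\)\s*\)/g;

function scanProductionRequires(source) {
  const found = new Set();
  for (const re of [reDirect, reIndirect]) {
    for (const m of source.matchAll(re)) found.add(m[1]);
  }
  return [...found];
}

const src = [
  "const a = require('./foo');",
  "const b = require(path.join(ROOT, 'lib/quality/sqm-history'));",
  "const c = require(path.join(ROOT, 'README.md'));", // not library-shaped: ignored
].join("\n");
console.log(scanProductionRequires(src)); // [ './foo', 'lib/quality/sqm-history' ]
```

Keeping the two patterns separate means the original direct-literal behaviour cannot change when the indirect pattern is tuned.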

Verification

✅ scripts/check-deadcode.js: NEW dead 5 → 0, Live 134 → 139 (exactly the 5 intended modules — set-diff verified against the precise v2.1.19 GA baseline, NEW DEAD 0).
✅ Contract suite 9-step spot-check all PASS: domain-purity (18 files) · guards (21 entries) · test-tracking (314 files) · docs-code-sync (5/5) · integration-runtime (23/23) · l2-smoke (101/101) · bkit-full-system (36/0/0) · contract-test-run vs v2.1.16 L1,L4 (255 assertions).
✅ Full QA aggregate (4,168 TC across 157 files): 12 FAIL + 6 errors reproduced identically against the HEAD~1 (v2.1.19 GA) baseline → confirmed pre-existing carryover (ACTION_TYPES baseline drift, trust-engine score change). This hotfix introduced 0 new regressions.
✅ SessionStart hook live run with a fresh session id: JSON contract valid; all 5 dashboard sections including SQM (64.00 / 100) render correctly; additionalContext 5,482 → 5,741 chars (+259 for SQM panel).

Upgrade

# CC plugin users — version bumps automatically on next /clear or session start
# Manual verification:
jq '.version' bkit.config.json  # → "2.1.19" (no version bump for hotfix)
node scripts/check-deadcode.js # → Dead (NEW) : 0

Note: bkit.config.json version stays at 2.1.19. The -hotfix.1 suffix lives only on the git tag and this release, following the bkit hotfix convention (see v2.1.12-hotfix precedent).

What's next

The 12 pre-existing FAILs surfaced by the full QA aggregate (ACTION_TYPES baseline drift, trust-engine score change, sprint-4-presentation AUDIT-01) will be resolved in v2.1.20. They are tracked as CARRY-v2119-1 in memory and do not gate this hotfix.

Full changelog

CHANGELOG.md → v2.1.19-hotfix.1

🤖 Generated with Claude Code

v2.1.19 — Quality Maturation Sprint (5 sub-sprints + ENH-318)

21 May 12:01

@popup-kay

v2.1.19

228b6bc

bkit v2.1.19 — Quality Maturation Sprint

Permanent closure of pruge's sprint domain issue cluster (10 GitHub issues over 1.5 days, all in the sprint domain). Rather than another reactive fix release, v2.1.19 treats the cluster as a systemic sprint domain maturity gap and addresses it with a 5 sub-sprint master plan.

All 5 sub-sprints + outer master sprint archived. ENH-318 formally adopted (differentiators 6/6 → 7/7). First Real User Hall of Fame entry: @pruge.

Released: 2026-05-21
Predecessor: v2.1.18 GA (Sprint Trust UX Fix)
PR: #108 · Merge commit: 228b6bc (admin merge)
Reporter who shaped this release: @pruge (James Kim), dandi-village-ledger project. First entry in the Real User Hall of Fame 🏆

TL;DR

🎯 Closes 4 GitHub Issues — #103, #104, #105, #107 (P0 / sprint domain maturity)
🏗️ 5 sub-sprint master plan — S0 baseline + S1 Foundation + S2 Defense + S3 Polish + S4 Proactive + S5 Measurement (all archived)
✅ 152 TC across 30 test files — 100% PASS, 0 regressions
🚀 ENH-318 formally adopted: Trust Score 7th component + Real User Hall of Fame + User-Feedback Lifecycle 5-stage
🏆 First external dogfooder Hall of Fame entry: @pruge — 10 issues, 5 absorbed E2E scenarios
🛡️ 9 new ACTION_TYPES for governance traceability
🔧 3 architectural fixes: S0 measurement bug (S2 evolution) + findFirstMatching bug (S5) + CO-S0-6 --approve semantic (S1 docs)
🤝 100% backward compat — Trust Score Δ ≤5% verified (worked example numerical proof)

🌟 Highlights — What Changed for You

1️⃣ `/sprint init` auto-imports Context Anchor (Closes #104)

# Before: empty context placeholders
/sprint init my-sprint --name "Q2 Launch"
# Generates "(not set)" in report Context section even when master-plan exists
# After: auto-import from master-plan or PRD
/sprint init my-sprint --name "Q2 Launch"
# Reads docs/01-plan/features/my-sprint.master-plan.md → WHY/WHO/RISK/SUCCESS/SCOPE populated
# audit emit: sprint_context_imported

Fallback chain: args.context (explicit) > master-plan.md > PRD.md > defaultContext.
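The fallback chain amounts to "first non-empty source wins". In this sketch, the function name, the already-parsed context objects, and the field-level merge with placeholders are assumptions; the field names WHY/WHO/RISK/SUCCESS/SCOPE and the "(not set)" placeholder come from the before/after example above. File reading and Markdown parsing are left out.

```javascript
// Placeholder used when no source provides a field.
const DEFAULT_CONTEXT = Object.freeze({
  WHY: "(not set)", WHO: "(not set)", RISK: "(not set)", SUCCESS: "(not set)", SCOPE: "(not set)",
});

// Picks the first non-empty source in precedence order; missing fields fall back to the placeholder.
function resolveSprintContext({ argsContext, masterPlanContext, prdContext } = {}) {
  const chain = [["args", argsContext], ["master-plan", masterPlanContext], ["prd", prdContext]];
  for (const [source, ctx] of chain) {
    if (ctx && Object.keys(ctx).length > 0) {
      return { source, context: { ...DEFAULT_CONTEXT, ...ctx } };
    }
  }
  return { source: "default", context: { ...DEFAULT_CONTEXT } };
}

const resolved = resolveSprintContext({
  masterPlanContext: { WHY: "Q2 Launch" },
  prdContext: { WHY: "from PRD" },
});
console.log(resolved.source, resolved.context.WHY); // master-plan Q2 Launch
```

Returning the winning `source` alongside the context is what lets an audit event such as sprint_context_imported record where the values came from.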

2️⃣ Sprint reports now have `## Quality Gates` section + KPI SoT (Closes #105)

## 2. KPI Snapshot
| matchRate | 100% | # qualityGates.M1 wins over kpi.matchRate
| dataFlowIntegrity (S1) | 100% |
...
## Quality Gates (11 gates, 11 passed)
| Gate | current | threshold | passed | lastMeasuredAt | source |
|------|---------|-----------|--------|----------------|--------|
| M1 | 100 | ≥90 | ✓ | 2026-05-21T10:00:00Z | gap-detector |
...

Precedence: qualityGates > featureMap > kpi. Divergence detection emits audit sprint_kpi_divergence.
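Precedence plus divergence detection can be sketched for a single metric. Treating gate M1 as the match-rate source follows the sample table's comment (qualityGates.M1 wins over kpi.matchRate); the object shapes, the `emitAudit` callback, and the audit payload are hypothetical.

```javascript
// Resolve one KPI (match rate) by precedence: qualityGates > featureMap > kpi.
function resolveMatchRate(sprint, emitAudit = () => {}) {
  const candidates = [
    ["qualityGates", sprint.qualityGates?.M1?.current],
    ["featureMap", sprint.featureMap?.matchRate],
    ["kpi", sprint.kpi?.matchRate],
  ].filter(([, value]) => typeof value === "number");
  if (candidates.length === 0) return { value: null, source: null };
  const [source, value] = candidates[0]; // highest-precedence source present wins
  // Any lower-precedence source that disagrees is reported, not silently used.
  if (candidates.some(([, v]) => v !== value)) {
    emitAudit("sprint_kpi_divergence", {
      metric: "matchRate", winner: source, values: Object.fromEntries(candidates),
    });
  }
  return { value, source };
}

// A stale kpi.matchRate alongside a fresh gate measurement triggers the audit.
const stale = { qualityGates: { M1: { current: 100 } }, kpi: { matchRate: 72 } };
console.log(resolveMatchRate(stale, (type) => console.log("audit:", type)));
```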

3️⃣ Gate-fail reports auto-marked RESOLVED on successful transition (Closes #103)

# Pre-v2.1.19: docs/03-analysis/<sprint>-gate-fail-*.md
# "STATUS: BLOCKED" — even after sprint completed
# sprint.lastGateFailure never cleared
# v2.1.19: when advancePhase succeeds after a gate_fail:
# - File header prepended with "> **STATUS: RESOLVED** at <ISO>"
# - sprint.lastGateFailure.resolvedAt / resolvedBy / resolutionReason populated
# - audit emit: gate_fail_resolved (idempotent, atomic write)
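The "idempotent, atomic write" behaviour can be illustrated with two small functions: one stamps the report file once, and one records the resolution fields on the sprint state. The banner text follows the comment above; the function names, the temp-file-then-rename approach, and the state shape are assumptions.

```javascript
const fs = require("node:fs");

const RESOLVED_PREFIX = "> **STATUS: RESOLVED**";

// Prepend the RESOLVED banner once; a second call is a no-op (idempotent).
function markGateFailResolved(file, isoTime = new Date().toISOString()) {
  const text = fs.readFileSync(file, "utf8");
  if (text.startsWith(RESOLVED_PREFIX)) return false;
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, `${RESOLVED_PREFIX} at ${isoTime}\n\n${text}`);
  fs.renameSync(tmp, file); // replace the report in one step instead of writing in place
  return true;
}

// Record the resolution on the sprint state without mutating the input.
function resolveLastGateFailure(sprint, { resolvedBy, reason, isoTime }) {
  const failure = sprint.lastGateFailure;
  if (!failure || failure.resolvedAt) return sprint; // nothing to resolve, or already resolved
  return {
    ...sprint,
    lastGateFailure: { ...failure, resolvedAt: isoTime, resolvedBy, resolutionReason: reason },
  };
}
```

Checking for the existing banner before writing is what makes a retried advancePhase safe: the file is never stamped twice.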

4️⃣ SKILL.md path drift permanently blocked (Closes #107)

sprint SKILL.md explicit <bkit-root> convention with #107 reference inline
scripts/check-skills-docs-code-sync.js CI gate (invariant checked across all 44 skills, code-block-aware)
scripts/lint-skill-md.js PreToolUse hook (warning-only)
test/contract/baseline/skills-convention.json frozen baseline

★ Critical evolution: S2 introduced stripCodeBlocks parser — S0 measurement had false positives on phase-3-mockup + phase-9-deployment (JavaScript/YAML samples inside code blocks). Real drift was 1 skill (sprint #107), not 3.
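A code-block-aware check strips fenced blocks before scanning, so sample JavaScript or YAML inside a SKILL.md is not mistaken for real path drift. This is a minimal sketch of that idea; the actual S2 stripCodeBlocks parser may handle more cases (for example, info strings on closing fences or indented code blocks).

```javascript
// Minimal sketch: remove fenced code blocks (``` or ~~~) before a drift scan.
function stripCodeBlocks(markdown) {
  const out = [];
  let fence = null; // the fence string that opened the current block, if any
  for (const line of markdown.split(/\r?\n/)) {
    const m = /^\s*(`{3,}|~{3,})/.exec(line);
    if (m) {
      if (fence === null) { fence = m[1]; continue; } // opening fence
      // A closing fence uses the same character and is at least as long.
      if (m[1][0] === fence[0] && m[1].length >= fence.length) { fence = null; continue; }
    }
    if (fence === null) out.push(line); // keep prose lines only
  }
  return out.join("\n");
}

const skillMd = [
  "Run `<bkit-root>/scripts/sprint.js`.",
  "```js",
  "require('./scripts/sprint.js'); // sample code, not a path reference",
  "```",
  "Done.",
].join("\n");
console.log(stripCodeBlocks(skillMd)); // only the two prose lines remain
```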

5️⃣ bkit Early Adopter Program / Real User Hall of Fame (ENH-318)

# Run bkit on your production project + file detailed bug reports →
# - 🏆 Public recognition (README "Real User Hall of Fame")
# - 🔒 Your reproductions become permanent E2E regression tests
# - 📊 Activity powers Trust Score externalDogfoodFeedbackResponseRate component
# - 📝 CHANGELOG attribution per release

First entry: @pruge (James Kim) — 10 issues / 5 scenarios absorbed. See docs/external-dogfooders/pruge.md.

6️⃣ Trust Score 7-Component (ENH-318 governance)

// Before (6 components, sum 1.0):
// pdcaCompletionRate 0.25 / gatePassRate 0.20 / rollbackFrequency 0.15
// destructiveBlockRate 0.15 / iterationEfficiency 0.15 / userOverrideRate 0.10
// After (7 components, sum 1.0):
// pdcaCompletionRate 0.2375 / gatePassRate 0.19 / rollbackFrequency 0.1425
// destructiveBlockRate 0.1425 / iterationEfficiency 0.1425 / userOverrideRate 0.095
// + externalDogfoodFeedbackResponseRate 0.05 ← NEW

Worked example: existing user with values [80/85/90/95/75/70] = old score 83.00 → new score 78.85. Δ -4.15pt (-5.0%), at the ≤5% R-10 mitigation boundary (float epsilon tolerance verified).
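The worked example can be reproduced directly from the weight tables above. The one assumption is that the new externalDogfoodFeedbackResponseRate component is 0 for an existing user; that is what makes the new score exactly 0.95 × 83.00 = 78.85. The `trustScore` helper is illustrative, not bkit's trust engine.

```javascript
// Reproduces the worked example. Assumption: the new
// externalDogfoodFeedbackResponseRate component is 0 for an existing user.
const OLD_WEIGHTS = {
  pdcaCompletionRate: 0.25, gatePassRate: 0.20, rollbackFrequency: 0.15,
  destructiveBlockRate: 0.15, iterationEfficiency: 0.15, userOverrideRate: 0.10,
};

// Each legacy weight is scaled by 0.95 to free 0.05 for the new component.
const NEW_WEIGHTS = Object.fromEntries(
  Object.entries(OLD_WEIGHTS).map(([key, w]) => [key, w * 0.95]),
);
NEW_WEIGHTS.externalDogfoodFeedbackResponseRate = 0.05;

function trustScore(values, weights) {
  return Object.entries(weights).reduce((sum, [key, w]) => sum + w * (values[key] ?? 0), 0);
}

const user = {
  pdcaCompletionRate: 80, gatePassRate: 85, rollbackFrequency: 90,
  destructiveBlockRate: 95, iterationEfficiency: 75, userOverrideRate: 70,
};
const before = trustScore(user, OLD_WEIGHTS);
const after = trustScore(user, NEW_WEIGHTS);
console.log(before.toFixed(2), after.toFixed(2)); // 83.00 78.85
```

Scaling every legacy weight by the same factor preserves their relative proportions, which is why the legacy-profile migration only needs the new defaults and the stored values.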

Legacy 6-component trust-profile.json auto-migrated (weights from defaults SoT, values from disk).

7️⃣ `--approve` semantic clarified (S1 + CO-S0-6 absorbed)

# --approve ONLY bypasses Trust Level scope boundary (e.g., L3 stopAfter=qa)
# --approve does NOT bypass Quality Gate failures (M*/S*)
# For gate failures, use: /sprint measure <id> --gate <key> first

Documented in skills/sprint/SKILL.md §10.1.1.1. Future --allowGateOverride flag carry to v2.1.20+.
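The --approve rule reduces to an ordering of checks: quality-gate failures are evaluated first and cannot be overridden, and only then is the trust-scope boundary checked, which --approve may bypass. The function name, argument shape, and reason strings below are hypothetical.

```javascript
// Minimal sketch of the --approve rule; names and messages are hypothetical.
function canAdvancePhase({ scopeBlocked = false, failedGates = [], approve = false } = {}) {
  if (failedGates.length > 0) {
    // --approve never overrides quality gates; re-measure first.
    return { ok: false, reason: `quality gate failed: ${failedGates.join(", ")}` };
  }
  if (scopeBlocked && !approve) {
    return { ok: false, reason: "trust scope boundary reached; rerun with --approve" };
  }
  return { ok: true, reason: null };
}

console.log(canAdvancePhase({ scopeBlocked: true, approve: true }).ok);  // true
console.log(canAdvancePhase({ failedGates: ["M1"], approve: true }).ok); // false
```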

8️⃣ `/sprint dogfood` action for bkit self-dogfooding

/sprint dogfood v2.1.20 --release-tag v2.1.20-rc.0
# Creates self-dogfood-2.1.20 sprint container with auto-derived context
# Idempotent (existing id graceful skip), audit: sprint_dogfood_started

9️⃣ Self-dogfood CI gate

./scripts/check-self-dogfood.sh
# Verifies recent release was operated as sprint container (4 invariants per release)
# --bootstrap-mode: skip invariant #1 (master plan §19.5 Exception)
# --emergency-override <reason>: skip all invariants (SQM penalty -10)

v2.1.20 will be the first true gate activation; v2.1.19 is the final Bootstrap Exception.

🔟 SQM (Sprint Quality Maturity Index) introduced

┌─── SQM (Sprint Quality Maturity) ─────────── 64.00 / 100 ─┐
│ Docs=Code 100 │ Self-Dogfood 10 │ Dogfooder 100 │
│ Report KPI 80 │ Dispatch — │ Convention 0 │
└──────────────────────────────────────────────────────────┘

6-component weighted score, .bkit/state/sqm-history.jsonl append-only history, SessionStart-ready dashboard panel. Projected v2.1.19 GA SQM: ~96 (master plan §7.2 target ≥85 well exceeded).

👀 User Experience Impact

Scenario	Before v2.1.19	After v2.1.19
L1 sprint init	trustLevelAtStart=L3 silent default, no L1 warning	default L2 (Safe Defaults), `--trust L1` emits stderr warning + audit
Sprint report	KPI Snapshot only, no Quality Gates section, kpi.matchRate=stale despite qg.M1=100	`## Quality Gates` section + qualityGates SoT precedence + divergence detection
Sprint context	`(not set)` placeholders despite master-plan.md filled	Auto-import from master-plan/PRD fallback chain
Gate-fail report	docs/03-analysis/ permanently "BLOCKED" after fix	Auto-marked "RESOLVED" with timestamp + reason on successful transition
External dogfooder feedback	No governance signal (internal trust components only)	Trust Score 7th component + Hall of Fame archive
Sprint orchestrator dispatch	Declared in v2.1.18 frontmatter but no test evidence	Contract baseline (5 TC) + e2e mocked + production code path verified
bkit self-dogfooding	bkit never ran its own sprint mgmt on its own releases	Bootstrap Exception pattern documented + `/sprint dogfood` + CI gate ready for v2.1.20
SQM measurement	No quantitative sprint domain health metric	6-component weighted score + history + SessionStart panel

🧪 Tests (152 NEW, 100% PASS)

Sub-sprint	New tests	Breakdown
S0 baseline	30	sqm-calculator unit (19) + result schema contract (7) + measure e2e (4)
S1 Foundation	28	dogfood (6) + default L2 (4) + annotate (3) + contract (5) + e2e dispatch (3) + ci-gate (7) [1 design-skip]
S2 Defense	35	path fix (4) + check-skills (17) + sprint audit (6) + baseline (5) + linter (3)
S4 Proactive	14	trust 7-component (5) + feedback-tracker (3) + 5 dandi scenarios (6)
S3 Polish	40	context-importer (10) + kpi-resolver (5) + generate-report-sot (9) + carry rationale (4) + lessons (4) + failure-resolution (8)
S5 Measurement	5	sqm-evolve (2) + history (1) + panel (2)
Total	152	30 test files

Final regression: all 30 files re-run, 152/152 PASS, 0 failures.

🧩 Methodology — Bootstrap Exception Pattern Validated 5x

S0, S1, S4, S2, S3, and S5 were all completed in PDCA-with-sprint-shadow mode, with the main session acting as a manual proxy for sub-agents. These 6 successful instantiations validate the pattern as a transitional protocol.

v2.1.20 will be the first true self-dogfood CI gate activation: run without the --bootstrap-mode flag, scripts/check-self-dogfood.sh will hard-fail any release that does not operate as a sprint container.

🆙 Upgrade Guide

From v2.1.18

cd ~/your/bkit-claude-code
git fetch --tags
git checkout v2.1.19
# Restart Claude Code (or reload the plugin)
cat bkit.config.json | grep version # → "version": "2.1.19"

Trust Score impact (if you have existing trust-profile.json)

Auto-migration:...

@pruge

v2.1.18 — Sprint Trust UX Fix (Issues #100/#101/#102)

21 May 06:39

@popup-kay

v2.1.18

e726f15

bkit v2.1.18 — Sprint Trust UX Fix

The first release dedicated entirely to triaging issues from the bkit community. v2.1.18 permanently closes the L1 sprint lockout incident class that @pruge discovered and reported on his dandi-village-ledger s1-foundation sprint. Three GitHub issues (#100/#101/#102), all filed within the same minute, were traced to a single root-cause cluster (drift across 3 layers) and fixed together in one integrated sprint.

Released: 2026-05-21
Predecessor: v2.1.17 GA (5-axis matrix 5/5 close)
Merge commit: e726f15 (PR #106)
Reporter: @pruge — thank you for the precise repro and root-cause analysis 🙏

TL;DR

🔧 3 issues closed — #100, #101, #102 (all P0/P1 sprint-blocking bugs)
🚀 New command — /sprint trust <id> --to <Level> mutates an existing sprint's trust level in place, no more /sprint init destruction
🧠 sprint-orchestrator actually works now — Task tool added to allowlist; sub-agent dispatch live for the first time (ENH-292 activation milestone)
🤝 --trust alias honored — CLI behavior finally matches skills/sprint/SKILL.md §10.2
✅ 40 tests passing (17 contract + 15 unit + 8 e2e), about 2.9× the 14 TC target
🔄 100% backward compatible — no sprint state schema change; existing --trustLevel users unaffected

🌟 Highlights — What Changed for You

F1 · `sprint-*` agents now declare `tools:` (Closes #100)

Four sprint agents — sprint-orchestrator, sprint-master-planner, sprint-qa-flow, sprint-report-writer — now have an explicit tools: frontmatter that includes the Task tool where required. Before v2.1.18, the orchestrator could not actually dispatch the measurement agents it was specced to route through (measure-router.js:233-253 returned no_agent_runner). With v2.1.18, sub-agent dispatch is live for the first time.

Differentiation milestone: ENH-292 Sequential Dispatch promoted from "declared" to "live".

F2 · New `/sprint trust` command (Closes #101)

# Mutate an existing sprint's trust level without losing state
/sprint trust s1-foundation --to L3 --reason "P0 32/32 ready"

Before v2.1.18, a sprint initialized with --trust L1 was permanently locked in preview mode. The only escape was /sprint init — which destroyed phaseHistory, qualityGates, and featureMap accumulated over hours or days of work.

v2.1.18 adds:

New CLI action /sprint trust <id> --to <L0|L1|L2|L3|L4> [--reason "<text>"] [--force]
New audit ACTION_TYPE sprint_trust_changed (ACTION_TYPES 29 → 30)
Downgrade guardrail: dropping ≥2 levels (e.g. L4 → L1) requires trustScore ≥ 80 (from .bkit/state/trust-profile.json) or explicit --force
Idempotent path: from === to emits an audit entry with noop: true so monitoring never sees a blind spot
Actor auto-detection: explicit --actor > CLAUDE_AGENT_ID env (→ 'agent') > default 'user'
Defense Layer 6 natural integration: --force sets blastRadius: 'high' for the audit alarm pipeline (live evidence: .bkit/audit/2026-05-21.jsonl)

Documentation: skills/sprint/SKILL.md §10.1.3 (new section with comparison table and audit JSON example).
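The guardrail, idempotency, and actor rules above can be condensed into one decision function. The following is a minimal JavaScript sketch, not the shipped handleTrust: the names planTrustChange and detectActor and the returned object shapes are hypothetical. Only the rules themselves come from these notes: a drop of two or more levels needs trustScore ≥ 80 or --force, from === to yields noop: true, the actor precedence, and blastRadius: 'high' under --force.

```javascript
// Hypothetical sketch of the /sprint trust rules; not bkit's handleTrust.
const LEVELS = ["L0", "L1", "L2", "L3", "L4"];

// Actor precedence: explicit --actor > CLAUDE_AGENT_ID env ('agent') > 'user'.
function detectActor(args, env) {
  if (args.actor) return args.actor;
  if (env.CLAUDE_AGENT_ID) return "agent";
  return "user";
}

function planTrustChange({ from, to, trustScore, force = false, reason = null, actor }) {
  const fromIdx = LEVELS.indexOf(from);
  const toIdx = LEVELS.indexOf(to);
  if (fromIdx < 0 || toIdx < 0) {
    return { ok: false, error: `invalid trust level: ${from} -> ${to}` };
  }

  // Shared audit payload (ACTION_TYPE sprint_trust_changed).
  const audit = { action: "sprint_trust_changed", from, to, actor, reason };

  // Idempotent path: nothing changes, but the audit entry is still emitted.
  if (fromIdx === toIdx) {
    return { ok: true, changed: false, audit: { ...audit, noop: true } };
  }

  // Downgrade guardrail: dropping two or more levels needs trustScore >= 80 or --force.
  const drop = fromIdx - toIdx;
  const scoreOk = typeof trustScore === "number" && trustScore >= 80;
  if (drop >= 2 && !scoreOk && !force) {
    return { ok: false, error: "downgrade_guardrail" };
  }

  // --force marks the entry as high blast radius for the audit alarm pipeline.
  return { ok: true, changed: true, audit: force ? { ...audit, blastRadius: "high" } : audit };
}

// Example: an agent session attempting L4 -> L1 with a trust score of 60.
const actor = detectActor({}, { CLAUDE_AGENT_ID: "sprint-orchestrator" });
console.log(actor, planTrustChange({ from: "L4", to: "L1", trustScore: 60, actor }).error);
// prints: agent downgrade_guardrail
```

These notes do not say whether a blocked downgrade is itself audited, so the sketch simply returns an error in that case.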

F3 · `--trust` CLI alias now honored (Closes #102)

# Both forms now produce identical behavior (v2.1.18)
/sprint measure s1-foundation --gate M1 --trust L3
/sprint measure s1-foundation --gate M1 --trustLevel L3

scripts/sprint-handler.js:942 and :974 previously bypassed normalizeTrustLevel and read args.trustLevel directly, so --trust L3 was silently ignored at the measure and phase paths. Both call sites now go through normalizeTrustLevel(args), restoring the declared precedence chain trustLevel > trust > trustLevelAtStart. Docs=Code drift resolved.
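A minimal sketch of the precedence chain trustLevel > trust > trustLevelAtStart. It keeps the normalizeTrustLevel(args) name from the notes, but the body is illustrative: it assumes all three values arrive on one parsed-args object and silently skips malformed values, where a stricter version might reject them instead.

```javascript
// Illustrative sketch; the real helper lives in scripts/sprint-handler.js.
// Precedence: --trustLevel > --trust > trustLevelAtStart.
// Assumption: all three values are fields of the same parsed-args object.
function normalizeTrustLevel(args) {
  const candidates = [args.trustLevel, args.trust, args.trustLevelAtStart];
  for (const value of candidates) {
    if (typeof value === "string" && /^L[0-4]$/.test(value)) return value;
  }
  return null; // no usable value: the caller applies its own default
}

// Before the fix, the measure/phase paths effectively read only args.trustLevel,
// so `--trust L3` was dropped; routing through the helper honors it.
console.log(normalizeTrustLevel({ trust: "L3", trustLevelAtStart: "L1" })); // prints L3
```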

F4 · `sprint-master-planner` orchestrator expansion (★ user-requested)

sprint-master-planner now declares pm-lead, cto-lead, qa-lead in its Task allowlist — in addition to the existing specialist agents. Future sprints can natively orchestrate all three teams in parallel rather than relying on main-session manual dispatch.

👀 User Experience Impact

Scenario	Before v2.1.18	After v2.1.18
L1 sprint lockout	Sprint stuck in `do` phase forever; only escape is `/sprint init` (destroys all phase history)	`/sprint trust <id> --to L3` mutates trust in place; phase history, quality gates, and feature map preserved
Sprint orchestration	`sprint-orchestrator` could not dispatch measurement agents; main session had to act as pass-through	Orchestrator fulfills its specced routing responsibility; no main-session workaround required
CLI trust override	`--trust L3` silently ignored at `measure`/`phase`; only `--trustLevel L3` worked	Both `--trust` and `--trustLevel` honored per `SKILL.md §10.2` precedence
Audit trail	Trust mutations invisible to `/sprint status` and downstream consumers	Every trust change persisted as `sprint_trust_changed` in `.bkit/audit/<date>.jsonl` with downgrade guardrail and `--force` alarm
Reporter scenario	@pruge `s1-foundation` sprint with `--trust L1` — 32/32 P0 implementation done, but `/sprint measure` always preview, phase advance always `gate_fail`	8-step E2E test reproduces the original scenario 1:1 and runs green; sprint recoverable without reset

🧪 Tests (40 TC live PASS)

Layer	File	TC	Purpose
Contract	`test/contract/sprint-agents-tools.test.js`	17	F1 — 4 sprint-* agents' `tools:` field invariant
Unit	`test/unit/sprint-trust-normalization.test.js`	7 (A-G)	F3 — `normalizeTrustLevel` precedence chain
Unit	`test/unit/sprint-handler-trust-action.test.js`	8	F2 — `handleTrust` (mutation, guardrail, audit, actor)
E2E	`test/e2e/sprint-l1-lockout-recovery.test.js`	8	Reporter scenario 1:1 reproduction (init L1 → trust L1→L3 → measure record → audit verify → restart persistence)
Total		40	About 2.9× the 14 TC target

🧩 Methodology — First PM + CTO + QA Team Integration Sprint

v2.1.18 is the first bkit release driven end-to-end by all three Agent Teams in parallel:

PM Team (pm-lead orchestrating 4 PM agents) — produced a 570-line PRD with Beachhead 19/20 (Geoffrey Moore framework), JTBD 6-Part, 5 User Stories, 6 Test Scenarios, and a Pre-mortem Top 3
CTO Team (cto-lead) — architectural review APPROVE with CONCERNS; 3 BLOCKERs (controlScore → trustScore correction, ACTION_TYPES count 27 → 29, NDJSON injection assessment) + 3 MEDIUMs all addressed via redline
QA Team (qa-lead) — L1-L5 integrated verification report + reporter-scenario evidence (574 lines, QA_PASS)

Self-referential meta-risk: the sprint that fixes sprint-orchestrator itself cannot use sprint container automation. We documented this chicken-and-egg pattern in Plan §6.1: sprint init is used for state tracking only, while phase advance and measurement run through the PDCA cycle. Once F1 lands, future sprints regain full orchestration.

🔐 Differentiation 6/6 Status

ENH-289 Defense Layer 6 — strengthened (sprint_trust_changed joined the L6 audit pipeline, live evidence)
ENH-292 Sequential Dispatch — activation milestone (declared → live in this release)
ENH-286 Memory Enforcer / ENH-300 Effort-aware Defense / ENH-303 PostToolUse continueOnBlock / ENH-310 Heredoc Detector — unaffected, no regression

🆙 Upgrade Guide

From v2.1.17

# 1. Pull the latest plugin (in your bkit-claude-code clone)
cd ~/your/bkit-claude-code
git fetch --tags
git checkout v2.1.18
# 2. Restart Claude Code (or reload the plugin) to pick up the new agents
# 3. Verify version
cat bkit.config.json | grep version
# → "version": "2.1.18"

If you have a sprint stuck in L1 lockout

# Old workaround (v2.1.16/v2.1.17): re-init (destroys state)
# ❌ /sprint init my-sprint --name "..." --trust L2 # phaseHistory lost
# New v2.1.18 path: mutate in place
/sprint trust my-sprint --to L3 --reason "ready to record measurements"
/sprint measure my-sprint --gate M1 # now records (mode: "record"), no longer preview
/sprint phase my-sprint --to iterate # gate transitions succeed

Breaking changes

None. v2.1.18 is 100% backward compatible:

Existing --trustLevel L<N> users: precedence chain preserved (F3 Case G test)
Sprint state schema: unchanged
ADR 0003: 14/14 PASS — 16-cycle consistency milestone

🗂️ Carryovers — deferred to v2.1.19+

Not release blockers, captured in the sprint report (docs/04-report/features/v2118-sprint-trust-ux-fix.report.md):

CO-1 — baseline v2.1.18 capture script
CO-2 — qa-aggregate script
CO-3 — sprint-orchestrator auto phase-advance live test
CO-4 — sub-agent-dispatcher state transition test
CO-5 — L0 Manual mode escalate-path E2E
CO-6 — sprint-report-writer agent timeout fallback

Follow-up issues from the same reporter (#103, #104, #105 — failure-reporter resolution markers, sprint init context auto-import, generate-report quality-gates section) are acknowledged and slated for v2.1.19+ planning.

🙏 Credits

Massive thanks to @pruge (james kim) for the high-signal bug reports. The reproductions in #100/#101/#102 were so precise we could turn them directly into an E2E test (test/e2e/sprint-l1-lockout-recovery.test.js) before writing a single line of fix code. That's the kind of issue every maintainer dreams of.

If you hit similar sprint-related friction, please open an issue — the v2.1.18 turnaround (issue filed → fix released in 24h) is the sta...

@pruge

bkit v2.1.17 — CI/CD Hardening, 5-Axis Matrix 5/5 Close

20 May 11:21

@popup-kay

v2.1.17

39f89e6

Headline: permanent closure of the 8-day incident class in which the Invocation Contract Check workflow stayed red, from 2026-05-12 to 2026-05-20. The CI/CD maturity matrix (Detection / Enforcement / Recovery / Governance / Evolution) is now closed on all 5 axes, and all 11 carryover items are resolved.

🎯 Highlights

Incident Class Permanently Closed

On 2026-05-12, commit 967cd8f (refactor v2.1.13) removed six pdca-eval-* agents as dead code cleanup. The baseline v2.1.9 manifest was not updated, and the Agent surface lacked a deprecatedIn governance mechanism (which Skill already had). This caused the Invocation Contract Check workflow to fail on every push for 8 consecutive days. Releases v2.1.15 and v2.1.16 GA shipped while CI was red. This v2.1.17 release closes every known root cause and carryover in the incident class.

5-Axis Matrix Progression

Axis	v2.1.16 GA	v2.1.17
Detection	◐ L1+L4 only	●● Dual baseline + L2 + L3 + L5 mandatory + MCP schema
Enforcement	✗	● Branch protection auto-applied (2 Required Status Checks)
Recovery	✗	●● Rollforward SOP + tracked file policy guide
Governance	◐ Skill only	●● Skill + Agent + MCP symmetric + isolated tests (5+6 scenarios)
Evolution	✗	●● Dual baseline + frontmatter util + SoT canonical names

5/5 close ✅

📦 Changes

Detection

Dual baseline: v2.1.9 LTS (long-term drift) + v2.1.16 Latest (noise floor) compared simultaneously
L2 mandatory: l2-smoke.test.js (98 TC) + l2-hook-attribution.test.js (13 TC) integrated into workflow
L3 mandatory: l3-mcp-compat.test.js (92 TC) + l3-mcp-runtime.test.js (48 TC) integrated into workflow
L5 mandatory (CO-3): removed continue-on-error: true from invocation-inventory.test.js + added needs: contract-l1-l4 (203 → 210 TC with SoT-driven lists)
MCP deprecation schema (CO-2): inline // @deprecated since vX.X.X replacedBy=Y annotation parsing
scripts/check-test-tracking.js (CO-7): detects untracked test files across 18 production test paths (CI gate)

Enforcement

scripts/setup-branch-protection.sh (CO-1, idempotent gh api wrapper) — auto-applied to main:
- Required Status Checks: Contract Test (L1 Frontmatter + L4 Deprecation), Contract Test L5 (Invocation Inventory)
- strict: true, allow_force_pushes: false, allow_deletions: false
- enforce_admins: false (admin override allowed for emergency hotfixes)

Recovery

docs/06-guide/contract-baseline-rollforward.guide.md: LTS vs Latest policy, decision tree, capture/deprecation stub procedures, PR self-review checklist, incident log (8 sections)
docs/06-guide/test-file-tracking-policy.guide.md (CO-6): .gitignore policy + PR checklist + incident log (9 sections)
docs/06-guide/branch-protection-setup.guide.md (CO-1): admin SOP

Governance

Agent deprecation governance: agents/<name>.md frontmatter with deprecatedIn: vX.X.X bypasses L4 — symmetric with the Skill pattern
6 pdca-eval-* deprecation tombstones: agents/pdca-eval-{act,check,design,do,plan,pm}.md (permanent tombstones for the 5/12 cleanup)
MCP tool deprecation governance: L4 bypass via baseline JSON deprecatedIn field — full symmetry across 3 surfaces (Skill / Agent / MCP)
Agent-deprecation isolated test (CO-4): test/contract/agent-deprecation.test.js, 5 scenario fixture, 5/5 PASS
MCP-deprecation e2e test (CO-2.1): test/contract/mcp-deprecation.test.js, 6 scenario fixture, 6/6 PASS

Evolution

lib/util/frontmatter.js (CO-5): consolidated 5-site duplication — parseFrontmatter, parseFrontmatterFile, hasDeprecatedInFrontmatter, hasDeprecatedInFrontmatterFile, coerce
v2.1.16 baseline captured (test/contract/baseline/v2.1.16/, 106 files)
SoT canonical names lists (CO-3.1): added 6 lists to lib/domain/rules/docs-code-invariants.js — EXPECTED_ACTIVE_AGENT_NAMES, EXPECTED_DEPRECATED_AGENT_NAMES, EXPECTED_SKILL_NAMES, EXPECTED_HOOK_EVENT_NAMES, EXPECTED_PDCA_MCP_TOOLS, EXPECTED_ANALYSIS_MCP_TOOLS
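To illustrate the deprecatedIn governance that the consolidated frontmatter util supports, here is a small JavaScript sketch. It borrows three export names listed above (parseFrontmatter, hasDeprecatedInFrontmatter, coerce), but the bodies are hypothetical: only flat key: value pairs are handled, and the fence pattern accepts both LF and CRLF line endings.

```javascript
// Illustrative sketch only; lib/util/frontmatter.js may differ.
// Matches a leading "---" fence block, tolerating LF or CRLF line endings.
const FENCE = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/;

// Hypothetical coerce: strip quotes, map true/false and plain numbers.
function coerce(raw) {
  const v = raw.trim().replace(/^["']|["']$/g, "");
  if (v === "true") return true;
  if (v === "false") return false;
  if (v !== "" && !Number.isNaN(Number(v))) return Number(v);
  return v; // e.g. "v2.1.13" stays a string
}

function parseFrontmatter(text) {
  const match = FENCE.exec(text);
  if (!match) return {};
  const data = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx <= 0 || /^\s/.test(line)) continue; // skip blank/nested lines
    data[line.slice(0, idx).trim()] = coerce(line.slice(idx + 1));
  }
  return data;
}

// An agent/skill file carrying deprecatedIn is treated as a tombstone (L4 bypass).
function hasDeprecatedInFrontmatter(text) {
  const { deprecatedIn } = parseFrontmatter(text);
  return typeof deprecatedIn === "string" && deprecatedIn.length > 0;
}

console.log(hasDeprecatedInFrontmatter("---\ndeprecatedIn: v2.1.13\n---\n")); // prints true
```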

Hygiene

Removed 12 orphan JSON files from test/contract/baseline/v2.1.9/ (sprint-* agents/MCP tools/skills missing from manifest)
Force-tracked 35+ previously untracked test files: tests/qa/ 29 + test/contract/ 5 + test/e2e/ 6 + test/integration/ 3 + test/unit/ 2 + test/v2110-qa/ 2
.gitignore narrowed: removed test/ + tests/* blanket ignore → explicit local-only patterns
scripts/check-deadcode.js EXEMPT pattern broadened (v2.1.13 sprint barrel, 3 files)

Framework Side-Effect Blocking

collect* implicit-write prevention: { persist: false } option blocks baseline self-mutation
--version path-injection validation (CO-1.1): regex ^[A-Za-z0-9._-]+$, exits with code 2 on invalid input
--project-root flag: makes contract-test-run.js + contract-baseline-collect.js fixture-aware
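A hedged sketch of the CO-1.1 validation. The regex and exit code 2 come from the notes; the function names and the assumption that the value is joined into test/contract/baseline/<version>/ are hypothetical. Note that the character class alone still accepts "." and "..", so the sketch rejects those explicitly as an extra precaution the notes do not mention.

```javascript
// Hypothetical sketch of the CO-1.1 check; not the actual script.
const path = require("path");

const VERSION_RE = /^[A-Za-z0-9._-]+$/;

// Extra precaution (not from the notes): the regex alone allows "." and "..".
function isSafeVersion(version) {
  return typeof version === "string" && VERSION_RE.test(version) &&
    version !== "." && version !== "..";
}

// Assumption: the version names a folder under test/contract/baseline/.
function baselineDir(projectRoot, version) {
  if (!isSafeVersion(version)) {
    console.error(`invalid --version value: ${JSON.stringify(version)}`);
    process.exit(2); // exit code 2 on invalid input, as described above
  }
  return path.join(projectRoot, "test", "contract", "baseline", version);
}

console.log(isSafeVersion("v2.1.16"), isSafeVersion("../../etc")); // prints: true false
```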

📊 Quantitative Results

Metric	v2.1.16 GA	v2.1.17	Delta
qa-aggregate PASS	3,808	4,103	+295
qa-aggregate FAIL	31	0	-31
qa-aggregate Errors	4	0	-4
Mandatory workflow steps	13	18	+5
Baseline snapshots	1	2 (LTS + Latest)	+1
Active agents	34	34	0
Deprecation tombstones	0	6	+6
Frontmatter parse sites	5 (duplicate)	1 (`lib/util/`)	-4
Hardcoded EXPECTED lists	7	0 (SoT)	-7
Branch protection	✗	2 Required Checks	—
Carryover items	11	0	-11
5-Axis Matrix	0/5	5/5 ✅	—

🗂 11 Carryover Closures

ID	Item	Status
CO-1	Branch protection automation	✅ Script + applied
CO-1.1	--version path-injection validation	✅ Regex
CO-2	MCP tool deprecation schema	✅ `parseMCPToolBlocks`
CO-2.1	MCP deprecation e2e test	✅ 6/6 PASS
CO-3	L5 E2E mandatory promotion	✅ Workflow
CO-3.1	L5 dynamic EXPECTED lists	✅ SoT integration
CO-4	Agent-deprecation isolated test	✅ 5/5 PASS
CO-5	frontmatter util extraction	✅ 5 sites → 1
CO-6	Tracked file policy	✅ Narrow + 35+ files
CO-7	tests/qa dependency automation	✅ check-test-tracking
CO-8	branch-protection apply audit	✅ admin applied & verified

🔗 Pull Requests

PR #97 (7acdd4f): v2.1.17 main scope — 4/5 axes close
PR #99 (39f89e6): v2.1.17 final — 5 carryover items absorbed + 5/5 axes close

📚 Documentation

docs/01-plan/features/v2117-ci-cd-hardening.plan.md — Plan
docs/02-design/features/v2117-ci-cd-hardening.design.md — Design
docs/03-analysis/features/v2117-ci-cd-hardening.analysis.md — Gap analysis
docs/04-report/features/v2117-ci-cd-hardening.report.md — Completion report
docs/06-guide/contract-baseline-rollforward.guide.md — Baseline SOP
docs/06-guide/branch-protection-setup.guide.md — Branch protection SOP
docs/06-guide/test-file-tracking-policy.guide.md — Test tracking policy

🙏 Origin

The incident class started with commit 967cd8f (refactor v2.1.13, 2026年05月12日) — a 6-agent dead code cleanup combined with an unupdated baseline produced an 8-day red period. This release closes every known framework gap.

🤖 Released with Claude Code

v2.1.16 — Quality Gates & Approval UX + Release Hardening

20 May 05:35

@popup-kay

v2.1.16

7330103

bkit v2.1.16 — Quality Gates & Approval UX + Release Hardening

The only Claude Code plugin that verifies AI-generated code against its own design specs.

This patch release closes 4 GitHub issues reported by L2 trust users who hit Sprint quality-gate deadlocks. It also includes a release-hardening sub-sprint that ends the "thoroughly verified, but defects still shipped" pattern that recurred in v2.1.14, v2.1.15, and v2.1.16.

✨ Highlights

1. L2 trust deadlock is gone

Default-trust users (L2) can now drive Sprints to completion without escalating trust level or editing state JSON manually. Every quality gate hold now has a user-invokable command:

/sprint phase <id> --to <next> --approve --reason "<text>" — single-use Trust Level scope-boundary escape hatch (Issue #95)
/sprint measure <id> --gate <key> — partial-gate manual measurement when the orchestrator missed one (Issue #94)

2. Quality gates auto-document their own failures

When advancePhase is blocked by gate_fail, bkit now automatically writes docs/03-analysis/<sprintId>-gate-fail-<phase>-<timestamp>.md with a detailed gate summary, suggested next commands (including /sprint measure --gate <key> and --approve hints), and persists lastGateFailure to sprint state for /sprint status to surface (Issue #93).
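A small sketch of how such a report filename could be built. The helper name is hypothetical; the assumption that the timestamp is the ISO-8601 UTC time with ":" and "." replaced by "-" is inferred from the example report path in the "Quality gate failure UX" section of these notes.

```javascript
// Hypothetical helper for docs/03-analysis/<sprintId>-gate-fail-<phase>-<timestamp>.md.
const path = require("path");

function gateFailReportPath(sprintId, phase, when = new Date()) {
  // ISO time made filename-safe: 05:31:22.411Z -> 05-31-22-411Z
  const stamp = when.toISOString().replace(/[:.]/g, "-");
  return path.posix.join("docs", "03-analysis", `${sprintId}-gate-fail-${phase}-${stamp}.md`);
}

console.log(gateFailReportPath("my-sprint", "design", new Date("2026-05-20T05:31:22.411Z")));
// prints docs/03-analysis/my-sprint-gate-fail-design-2026-05-20T05-31-22-411Z.md
```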

3. `sprint-orchestrator` measurement responsibility clarified

At design exit, the orchestrator now records both M4_apiComplianceRate and M8_designCompleteness atomically through the new lifecycle.measurePhaseGates() use case. This eliminates the silent "M8=100, M4=null → gate_fail" deadlock reported in Issue #92. The canonical measurement path (lib/application/quality-gates/measure-router.js) is shared between auto-advance and manual /sprint measure so values agree.
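The "M8=100, M4=null" deadlock and the atomic fix can be sketched as follows. This is illustrative JavaScript, not measure-gate.usecase.js: the thresholds (M4 95, M8 85) are taken from the sample gate-failure report in these notes, treating them as ≥ lower bounds is an assumption, and the measure callback stands in for the agent dispatch performed by measure-router.js.

```javascript
// Illustrative sketch; thresholds from the sample gate-failure report.
const DESIGN_EXIT_GATES = { M4_apiComplianceRate: 95, M8_designCompleteness: 85 };

// A gate with no recorded value fails as not_measured.
function evaluateGates(qualityGates, thresholds) {
  const results = {};
  for (const [key, threshold] of Object.entries(thresholds)) {
    const current = qualityGates[key]?.current ?? null;
    if (current === null) results[key] = { pass: false, status: "not_measured" };
    else results[key] = { pass: current >= threshold, status: current >= threshold ? "pass" : "below_threshold" };
  }
  const ok = Object.values(results).every((r) => r.pass);
  return { ok, reason: ok ? null : "gate_fail", results };
}

// "Atomic" record: measure every gate first, then write all values in one
// state update, so a failed measurement never leaves M8 recorded without M4.
function measurePhaseGates(sprint, thresholds, measure) {
  const measured = {};
  for (const key of Object.keys(thresholds)) measured[key] = measure(key); // may throw
  const qualityGates = { ...sprint.qualityGates };
  for (const [key, value] of Object.entries(measured)) {
    qualityGates[key] = { ...qualityGates[key], current: value };
  }
  return { ...sprint, qualityGates };
}

const before = { qualityGates: { M8_designCompleteness: { current: 100 } } };
console.log(evaluateGates(before.qualityGates, DESIGN_EXIT_GATES).reason); // prints gate_fail
const after = measurePhaseGates(before, DESIGN_EXIT_GATES, (k) => (k.startsWith("M4") ? 97 : 100));
console.log(evaluateGates(after.qualityGates, DESIGN_EXIT_GATES).ok); // prints true
```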

4. Release hardening — `tests exist but unused` anti-pattern terminated

A user asked, "Are these 5 fixes really everything? Did you properly verify all bkit features?" That question triggered a deeper audit, which uncovered 31 stale test failures, 4 orphan test files (for modules deleted in v2.1.10), and 3 release-metadata drift defects. Existing tests had been catching these all along, but those tests were never wired into CI.

All resolved. CI now mandates two new release gates: bkit-full-system (version sync + structure) and docs-code-sync (counts SoT drift detector).

🎯 User Experience Changes

Before v2.1.16 (the L2 user's journey)

$ /sprint init my-sprint --name "Test" --trust L2 --features f1,f2
$ /sprint start my-sprint
# ... auto-advances prd → plan → design ...
# pauses at design boundary (L2 scope.stopAfter)
$ /sprint phase my-sprint --to do
{
  "ok": false,
  "reason": "requires_user_approval",
  "stopAfter": "design"
}
# 🛑 deadlock: no command to give approval
# the only workarounds (editing state JSON or escalating trust) violated bkit's philosophy

After v2.1.16

$ /sprint phase my-sprint --to do --approve --reason "design review complete"
{
  "ok": true,
  "phase": "do",
  "approvalRecord": {
    "sprintId": "my-sprint",
    "from": "design",
    "to": "do",
    "trustLevel": "L2",
    "stopAfter": "design",
    "approvedBy": "user",
    "reason": "design review complete"
  }
}
# ✅ proceeds — single-use approval, sprint.autoRun.scope NOT mutated
# next boundary will face the same check (deliberate "controllable AI" design)
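A hypothetical sketch of the single-use approval semantics shown above: the check blocks at scope.stopAfter unless --approve is passed, and an approval returns a record without touching sprint.autoRun.scope. Field names mirror the JSON output; the function checkScopeBoundary itself is invented for illustration and is not bkit's advancePhase.

```javascript
// Hypothetical sketch of the scope-boundary check with a single-use approval.
function checkScopeBoundary(sprint, to, { approve = false, reason = null, approvedBy = "user" } = {}) {
  const stopAfter = sprint.autoRun?.scope?.stopAfter;
  const atBoundary = stopAfter !== undefined && sprint.phase === stopAfter;
  if (!atBoundary) return { ok: true };
  if (!approve) return { ok: false, reason: "requires_user_approval", stopAfter };
  // The approval covers this one transition only: sprint.autoRun.scope is not
  // modified, so the next boundary triggers the same check again.
  return {
    ok: true,
    approvalRecord: {
      sprintId: sprint.id, from: sprint.phase, to,
      trustLevel: sprint.trustLevel, stopAfter, approvedBy, reason,
    },
  };
}

const demo = { id: "my-sprint", phase: "design", trustLevel: "L2", autoRun: { scope: { stopAfter: "design" } } };
console.log(checkScopeBoundary(demo, "do").reason); // prints requires_user_approval
```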

Quality gate failure UX

$ /sprint phase my-sprint --to do
{
  "ok": false,
  "reason": "gate_fail",
  "reportPath": "docs/03-analysis/my-sprint-gate-fail-design-2026-05-20T05-31-22-411Z.md",
  "gateResults": { ... }
}
$ cat docs/03-analysis/my-sprint-gate-fail-design-*.md
# Gate Failure Report — my-sprint @ design→do
# ...
| Sprint Phase | Gate | Status | Expected | Actual | Suggested Action |
| design | M4 | FAIL | threshold = 95 | null (not_measured) | /sprint measure --gate M4 |
| design | M8 | FAIL | threshold = 85 | null (not_measured) | /sprint measure --gate M8 |
# ...suggested next commands inline...

`/sprint measure` — first-class command

$ /sprint measure my-sprint --gate M4
# routes to agents/gap-detector.md via GATE_MEASUREMENT_ROUTES
# writes back to sprint.qualityGates.M4_apiComplianceRate.current
# audit-logged as gate_measured (at L2+ trust)

7 gates supported: M1 M2 M3 M4 M7 M8 S1.

📊 Numbers

Metric	Before	After
Open issues blocking L2 users	4 (#92/93/94/95)	0
L3 sprint contract tests	10/10 PASS	14/14 PASS (+4 new SC)
Aggregate test PASS	3,808	3,844 (+36)
Aggregate test FAIL	31	0
Test files with errors	15	0
Test files	151	147 (-4 orphan)
Release metadata defects	3	0
Mandatory CI gates	8	10 (+2 release gates)

🐛 Bug Fixes

#92 — sprint-orchestrator agent records both M4_apiComplianceRate and M8_designCompleteness at design exit (atomic dual record via lifecycle.measurePhaseGates). Resolves silent gate_fail deadlock on design → do transition. (Reporter: @pruge, bkit v2.1.14, CC v2.1.140, L2 trust)

✨ Added

#95 — /sprint phase <id> --to <next> --approve [--reason "<text>"] — single-use Trust Level scope-boundary escape hatch. Records audit-logger entry scope_boundary_approved with full provenance. sprint.autoRun.scope not mutated (deliberate "controllable AI" design). (Reporter: @pruge)
#94 — /sprint measure <id> --gate <key> and --gates <K1,K2,...> and --phase <name> — partial-gate manual measurement command. Routes through lib/application/quality-gates/measure-router.js to the canonical measurement agent (gap-detector for M4, etc.). Supports 7 gates: M1, M2, M3, M4, M7, M8, S1. (Reporter: @pruge)
#93 — Automatic gate-failure report generation. advancePhase now invokes a failureReporter dep on gate_fail, writing docs/03-analysis/<sprintId>-gate-fail-<phase>-<ts>.md with gate summary, suggested next commands, and audit trail pointers. sprint.lastGateFailure state field persists for /sprint status surfacing. (Reporter: @pruge)

🛠 Release Hardening (sub-sprint)

Layer A — Release metadata sync (3 files): README badge Version-2.1.14 → Version-2.1.16, hooks/session-start.js and hooks/startup/session-context.js version comments synced.
Layer B — Test maintenance (15 files, 31 stale FAIL → 0): 4 orphan test files removed (modules deleted in v2.1.10 Sprint 6), 11 stale baselines updated to current SoT (skills 43 → 44, agents 36 → 34, mcpTools 16 → 19, lib modules 142 → 177+, status-core exports 17 → 19, L3 contract 10/10 → 14/14, JS floating-point comparison epsilon fix, etc.).
Layer C — CI gate reinforcement (1 file): .github/workflows/contract-check.yml adds two mandatory steps — bkit-full-system (version sync + agent/skill structure) and docs-code-sync (counts SoT drift detector). Future PR/push auto-blocks on release metadata drift.

🧱 Architecture

New modules:

lib/application/quality-gates/measure-router.js — Pure dispatcher mapping gate keys (M1/M2/M3/M4/M7/M8/S1) to agents and result schemas
lib/application/quality-gates/failure-reporter.js — Pure markdown builder + side-effecting writeReport + factory createFailureReporter
lib/application/sprint-lifecycle/measure-gate.usecase.js — measureGate / measureGates / measurePhaseGates (atomic phase exit measurement)
templates/gate-failure-report.template.md — substitutable template

New audit action: scope_boundary_approved (28th ACTION_TYPE).

L3 contract grew with 4 new structural invariants: SC-11 (#92), SC-12 (#95), SC-13 (#94), SC-14 (#93).

📚 Documentation

Following bkit PDCA methodology (Plan → Design → Do → Check → Act), full documentation chain shipped:

docs/01-plan/features/v2116-issue-fixes.master-plan.md — sprint master plan
docs/01-plan/features/v2116-release-hardening.plan.md — hardening plan
docs/02-design/features/v2116-release-hardening.design.md — hardening design
docs/04-report/features/v2116-issue-fixes.report.md — feature fix report
docs/04-report/features/v2116-release-hardening.report.md — hardening report
CHANGELOG.md v2.1.16 section with full hardening sub-section

🙏 Thanks

Special thanks to @pruge for the 4 well-written reports (#92, #93, #94, #95) that pinpointed the exact L2 deadlock surfaces.

📦 Install / Upgrade

# Claude Code plugin marketplace
claude plugin install bkit
# Or upgrade existing
claude plugin upgrade bkit

Recommended CC version: v2.1.123+ (conservative) or v2.1.144 (balanced) — bkit v2.1.16 verified compatible with 99 consecutive CC releases from v2.1.34 through v2.1.144.

🔗 Links

PR: #96
Closed issues: #92, #93, #94, #95
Plan / Design / Report: see Documentation section above
Full changelog: CHANGELOG.md

🤖 Generated with Claude Code

Contributors

@pruge

v2.1.15 — Issue #89 fix: .pdca-status.json pollution (6-Layer Defense)

18 May 11:27

@popup-kay

v2.1.15

b65d336

🎯 What's New in v2.1.15

Patch release: a defense-in-depth fix for #89, reported by @doing27. This release stops .pdca-status.json from growing without bound as garbage feature entries piled up on every source-file edit.

TL;DR

Before: .pdca-status.json could grow to 294 KB with 138 of 147 features entries being garbage and 1,661 of 1,669 history entries being noise, all written silently by the PreToolUse(Write|Edit) hook.

After v2.1.15: 6-Layer Defense ensures only PDCA-registered features (with an existing plan or design document) ever enter .pdca-status.json, history is deduplicated and ring-buffered to 100 entries, and 48 unit tests guard against regressions.

🔍 Root-Cause Analysis (Deep)

Issue #89 surfaced two bugs: extractFeature misextraction and a missing validation gate in updatePdcaStatus. While working on this release we found three more latent bugs in the same code path and fixed all of them together. The table lists all six (rows 1-3 cover the two reported bugs):

#	Latent bug	File:Line	Symptom
1	`extractFeature` captures filenames as features	`lib/core/file.js:109`	`app/services/broadcast_service.py` → `'broadcast_service.py'` (a file, not a feature)
2	`extractFeature` fallback returns generic dirs	`lib/core/file.js:129`	`apps/cms/v1/users.py` → `'cms'`/`'v1'` registered as features
3	`updatePdcaStatus` has no validation gate	`lib/pdca/status-core.js:164`	Every invocation registers a feature unconditionally
4	`pre-write.js` reads a non-existent schema field	`scripts/pre-write.js:101`	Read `currentFeature` (v1 schema name) instead of `primaryFeature` (v2/v3 schema). The v2.1.7 "phantom-feature guard" never actually fired
5	`extractFeatureFromContext` duplicates the buggy pattern-match	`lib/pdca/status-core.js:320`	DRY violation — same bug as 1+2 cloned into a second module
6	`updatePdcaStatus.history.push` has no limit/dedup	`lib/pdca/status-core.js:210`	Unbounded growth; `addPdcaHistory` had a 100-entry cap, but the direct push did not

🛡️ The 6-Layer Defense

edit event
 │
 ▼
[L1] extractFeature → reject file-like captures + extended GENERIC_NAMES + fallback opt-in
 │
 ▼ (legitimate feature candidate)
[L2] extractFeatureFromContext → delegates to L1 (DRY)
 │
 ▼
[L4] pre-write.js → corrected schema field (primaryFeature)
 │
 ▼
[L3] updatePdcaStatus → plan/design document gate (default ON)
 │
 ▼
[L5] history dedup + ring buffer
 │
 ▼
[L6] 48 unit tests → regression prevention (CI-tracked)

Layer 1 — `lib/core/file.js`

Reject filename captures: the pattern-matching loop now skips any candidate that has a file extension (path.extname(captured) non-empty). app/services/broadcast_service.py no longer extracts 'broadcast_service.py'.
GENERIC_NAMES expanded 19 → 65 entries — covers common backend/frontend layout directories (api, web, mobile, client, server, backend, frontend, admin, auth, cms, database, config, core, helpers, middleware, plugins, scripts, styles, static, public, assets, tests, tenants, versions, tmp, audit, dashboard), version directories (v1–v9), and Next.js route groups ((dashboard), (auth), (public), (admin), (api)).
Fallback is now explicit opt-in: extractFeature(filePath, { allowFallback: true }). Existing callers receive the safer default (no parent-directory walk) automatically. Backward-compatible — second argument is optional.
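A simplified sketch of the three Layer 1 rules. FEATURE_PATTERNS and the GENERIC_NAMES set below are hypothetical stand-ins: the real pattern list is not given here, and the real GENERIC_NAMES has 65 entries, so this set is only loosely based on the list above. The extension check, the generic-name filter, and the opt-in fallback follow the description.

```javascript
// Illustrative sketch of Layer 1; patterns and names are hypothetical stand-ins.
const path = require("path");

const FEATURE_PATTERNS = [/(?:^|\/)features\/([^/]+)/, /(?:^|\/)services\/([^/]+)/];
const GENERIC_NAMES = new Set([
  "api", "app", "apps", "src", "lib", "cms", "auth", "admin", "dashboard",
  "services", "components", "v1", "v2", "v3", "(auth)", "(dashboard)",
]);

function extractFeature(filePath, { allowFallback = false } = {}) {
  const p = filePath.split(path.sep).join("/");
  for (const re of FEATURE_PATTERNS) {
    const m = re.exec(p);
    if (!m) continue;
    const captured = m[1];
    if (path.extname(captured) !== "") continue; // a file, not a feature
    if (GENERIC_NAMES.has(captured)) continue;   // layout directory, not a feature
    return captured;
  }
  if (!allowFallback) return ""; // safer default: no parent-directory walk
  // Opt-in fallback: nearest non-generic parent directory.
  const dirs = path.posix.dirname(p).split("/").reverse();
  return dirs.find((d) => d && d !== "." && !GENERIC_NAMES.has(d)) ?? "";
}

console.log(JSON.stringify(extractFeature("app/services/broadcast_service.py"))); // prints ""
console.log(extractFeature("src/features/billing/page.tsx")); // prints billing
```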

Layer 2 — `lib/pdca/status-core.js`

extractFeatureFromContext now delegates to extractFeature. The duplicated, buggy pattern-matching loop is removed (DRY).

Layer 3 — `lib/pdca/status-core.js`

updatePdcaStatus(feature, phase, data, opts = {}) accepts a new opts.requireDocs flag (default true). When enabled, the function performs a findPlanDoc(feature) || findDesignDoc(feature) check; if neither exists, it returns silently (with a debugLog entry for forensics).

All 16 existing callers continue to work unchanged — PDCA lifecycle always creates a plan document before downstream phases invoke updatePdcaStatus.
The shouldUpdate(feature, requireDocs, docCheckFn) helper is extracted as a pure function for unit testability and dependency injection (docCheckFn parameter lets tests inject a stub).
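A minimal sketch of the Layer 3 gate. The shouldUpdate(feature, requireDocs, docCheckFn) signature comes from the notes; the body, the empty-feature check, and the stub document checker are illustrative assumptions. In production, docCheckFn would wrap findPlanDoc(feature) || findDesignDoc(feature).

```javascript
// Hypothetical body for the pure Layer 3 gate; signature from the notes.
function shouldUpdate(feature, requireDocs, docCheckFn) {
  if (!feature) return false;          // assumption: nothing to register
  if (!requireDocs) return true;       // explicit opt-out keeps the old behavior
  return Boolean(docCheckFn(feature)); // plan or design document must exist
}

// Tests inject a stub instead of touching the filesystem.
const stubDocs = new Set(["billing"]);
const hasDocs = (f) => stubDocs.has(f);

console.log(shouldUpdate("billing", true, hasDocs)); // prints true
console.log(shouldUpdate("cms", true, hasDocs));     // prints false
console.log(shouldUpdate("cms", false, hasDocs));    // prints true
```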

Layer 4 — `scripts/pre-write.js`

The v2.1.7 phantom-feature guard read currentStatus?.currentFeature, a v1-schema field that does not exist in the v2/v3 schemas. The migration code in status-migration.js:31,74 renames currentFeature → primaryFeature, so the guard always compared undefined === <feature> (always false) and was silently bypassed on every edit.

Fixed: now reads currentStatus?.primaryFeature, restoring the guard's intended behavior. Combined with the L3 document gate, this gives two independent layers of protection.

Layer 5 — `lib/pdca/status-core.js`

A new appendHistoryEntry(history, entry, limit = 100) pure function:

If the last entry has the same feature, phase, and action, only the timestamp is updated — no push.
Otherwise, push and apply a ring buffer of limit entries (default 100).

Effect: 100 identical edits = 1 history entry (timestamp refreshed each time). Phase transitions and feature changes still produce new entries normally.
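A sketch of appendHistoryEntry(history, entry, limit = 100) following the two rules above. It assumes entries shaped like { feature, phase, action, timestamp } and returns a new array rather than mutating history; whether the real function mutates in place is not stated.

```javascript
// Sketch of the Layer 5 rules; returns a new array instead of mutating.
function appendHistoryEntry(history, entry, limit = 100) {
  const last = history[history.length - 1];
  if (last && last.feature === entry.feature && last.phase === entry.phase && last.action === entry.action) {
    // Same feature/phase/action as the newest entry: refresh the timestamp only.
    return [...history.slice(0, -1), { ...last, timestamp: entry.timestamp }];
  }
  const next = [...history, entry];
  // Ring buffer: keep only the newest `limit` entries.
  return next.length > limit ? next.slice(next.length - limit) : next;
}

// 100 identical edits collapse into one entry with the latest timestamp.
let history = [];
for (let i = 0; i < 100; i++) {
  history = appendHistoryEntry(history, { feature: "auth", phase: "do", action: "updated", timestamp: i });
}
console.log(history.length, history[0].timestamp); // prints: 1 99
```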

Layer 6 — Unit tests (`tests/unit/`)

Test file	TCs	Verifies
`tests/unit/file-extract-feature.test.js`	20	L1 (`extractFeature`) — filename rejection, GENERIC_NAMES enforcement, fallback opt-in
`tests/unit/extract-feature-from-context.test.js`	10	L2 (delegation correctness)
`tests/unit/pdca-status-gating.test.js`	18	L3 (`shouldUpdate`) + L5 (`appendHistoryEntry`) — dedup, ring buffer, custom-limit, Issue #89 scenario
Total	48	48/48 PASS

.gitignore now tracks tests/unit/** (matching the existing policy for tests/contract/**), so these tests run in CI.

👤 User Experience — What Changes

1. `.pdca-status.json` stays small and accurate

Scenario	Before v2.1.15	After v2.1.15
Edit `app/services/broadcast_service.py` 50 times	`features.broadcast_service.py` appears + 50 history entries	No `features` entry created; no history entries
Edit `apps/cms/v1/users.py`	`features.cms` appears as phantom	No entry — `cms` and `v1` are now in `GENERIC_NAMES`
Edit files in feature `auth` (registered via `docs/01-plan/features/auth.plan.md`) 50 times	50 history entries	1 history entry (timestamp refreshed)
File path with no matching pattern (e.g. `random/path/foo.py`)	`'random'` registered via fallback	Returns `''` (fallback is now opt-in)
Long-running session, hundreds of edits	`.pdca-status.json` grows to hundreds of KB	Stays under a few KB, history capped at 100

2. PDCA workflow is unchanged for legitimate cycles

/pdca plan billing creates docs/01-plan/features/billing.plan.md → L3 gate passes → updatePdcaStatus('billing', 'do', ...) registers normally.
All 16 existing updatePdcaStatus call sites (hook scripts, lifecycle, batch orchestrator, etc.) continue to work because PDCA's natural ordering guarantees a plan document exists before downstream phases run.

3. Generic directory names are no longer auto-registered

auth, cms, dashboard, admin, api, etc. are now in GENERIC_NAMES. Trade-off: if you want one of these as a real PDCA feature, register it explicitly with /pdca plan auth (which creates the plan document and sets primaryFeature). The L3 gate then accepts it on subsequent edits. The previous behavior of silently auto-creating feature entries for such common directory names was the primary source of garbage.
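The two rejection rules that `file-extract-feature.test.js` covers (filenames and `GENERIC_NAMES`) can be sketched as a per-segment filter. This is illustrative only. The set below holds just the names these notes mention, while the real `GENERIC_NAMES` is larger. The extension regex is an assumption. The sketch also omits the path-pattern matching and the opt-in fallback that `extractFeature` performs.

```javascript
// Illustrative subset: only names mentioned in these notes; the real set is larger.
const GENERIC_NAMES = new Set(["auth", "cms", "v1", "dashboard", "admin", "api"]);

// Assumption: a segment ending in ".<ext>" (e.g. broadcast_service.py) is a filename.
function looksLikeFilename(segment) {
  return /\.[A-Za-z0-9]+$/.test(segment);
}

// A path segment may become a feature name only if it passes both checks.
function isFeatureCandidate(segment) {
  return segment.length > 0 && !looksLikeFilename(segment) && !GENERIC_NAMES.has(segment);
}

for (const s of ["broadcast_service.py", "cms", "v1", "billing"]) {
  console.log(s, isFeatureCandidate(s)); // only "billing" prints true
}
```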

4. History is now noise-free

The bkit dashboard, audit log, and any tools that read .pdca-status.json.history get meaningful entries only: real phase transitions and real feature changes. No more "updated × 1,661" noise drowning out the 8 real cycles.

5. Zero migration required

This is a patch release with no breaking changes. Upgrade by re-installing or running git pull in your bkit plugin directory. The fix is active immediately.

Optional cleanup: if your existing .pdca-status.json is already polluted, back it up and let bkit re-initialize:
```bash
mv .bkit/state/pdca-status.json .bkit/state/pdca-status.json.bak
# bkit will recreate a clean v3 schema on next session
```

🔁 Compatibility

Item	Status	Note
Breaking changes	✅ None	All 16 existing `updatePdcaStatus` callers receive default behavior
Schema migration	✅ Not required	v3 schema unchanged
`extractFeature(filePath)` → `extractFeature(filePath, opts = {})`	✅ Backward-compatible	Second argument optional
`updatePdcaStatus(feature, phase, data)` → `updatePdcaStatus(feature, phase, data, opts = {})`	✅ Backward-compatible	Fourth argument optional
Sprint Management (v2.1.13 GA)	✅ Unaffected
bkit Trust Level (L0–L4)	✅ Unaffected
Soft change	⚠️ `auth`/`cms`/`dashboard` are now in `GENERIC_NAMES`	Workaround: register explicitly via `/pdca plan <feature>`

📁 Files Changed (16)

Code (3)

lib/core/file.js (Layer 1)
lib/pdca/status-core.js (Layers 2, 3, 5)
scripts/pre-write.js (Layer 4)

Unit tests (3, new)

tests/unit/file-extract-feature.test.js (20 TCs)
tests/unit/extract-feature-from-context.test.js (10 TCs)
tests/unit/pdca-status-gating.test.js (18 TCs)

Metadata + version sync (6)

bkit.config.json
.claude-plugin/plugin.json
.claude-plugin/marketplace.json
hooks/hooks.json
scripts/unified-bash-post.js
.gitignore (added `!t...

@doing27

v2.1.14 — Differentiation Release (Memory Enforcer + Layer 6 + Sequential Dispatch + Effort-aware + PostToolUse + Heredoc-bypass)

14 May 07:45

@tomo-kay

v2.1.14

a8091c0

bkit v2.1.14 — Differentiation Release

The only Claude Code plugin that verifies AI-generated code against its own design specs.

bkit v2.1.14 is a product-moat release. Every one of bkit's differentiations against vanilla Claude Code is now physically implemented, contract-tested, and proven by live dogfooding evidence collected during this release cycle.

Verified on Claude Code v2.1.140. 95 consecutive compatible CC releases (v2.1.34 ~ v2.1.141, R-2 skip versions excluded). Zero breaking changes for existing v2.1.13 users — the upgrade is drop-in.

✨ Highlights

Six differentiations, all enforced

The previous "five differentiations" framing is retired. v2.1.14 promotes one new defense and makes the full set physically implemented (no more "advisory") and contract-tested.

#	Differentiation	Mitigates	Implementation
1	Memory Enforcer	CC's CLAUDE.md is advisory — models override it	PreToolUse deny with audit trail
2	Layer 6 Defense	R-3 safety-hook-ignored regressions (10+ evolved forms tracked)	Post-hoc audit + alarm + auto-rollback
3	Sequential dispatch	CC #56293 sub-agent caching 10x regression (11-streak unresolved)	8-state dispatcher: first sequential → warmup sample → restore parallel
4	Effort-aware adaptive defense	CC's new `effort.level` hook field exposed but no defense uses it	Invariant 10 (ADR 0010) routes defense intensity off effort tier
5	PostToolUse `continueOnBlock`	CC silent PostToolUse drop (#57317) — bkit's three PostToolUse hooks survive failures gracefully	Block-aware continuation flag across `unified-bash-post`, `unified-write-post`, `skill-post`
6	Heredoc-pipe bypass guard	CC #58904 — model can smuggle shell input through `$(cmd <<TAG ... TAG)` past defense	Token-aware heredoc-inside-substitution detector

Live dogfooding caught two events during this release

Two events were captured while shipping v2.1.14 itself, recorded in docs/sprint/v2114/sub-sprint-6-observation.report.md:

Memory Enforcer deny on an out-of-scope Write from a sub-agent — the exact case the feature exists for.
Heredoc Detector classify on the v2.1.14 release commit message itself — the guard correctly fired on git commit -m "$(cat <<'EOF' ... EOF)", forcing this release to use the safer git commit -F file.txt path instead. Reviewers can see this in the PR conversation.

Architecture growth (v2.1.13 → v2.1.14)

Metric	v2.1.13	v2.1.14	Delta
Lib subdirs	19	20	`+lib/defense/`
Port↔Adapter pairs	7	8	`+caching-cost`
ADR invariants	9	10	`+0010 effort-aware`
Lib modules	163	174	+11 (5 defense + 6 orchestrator/domain/infra)
Contract suites	3 base + 5 v2.1.13	3 base + 5 v2.1.13 + 5 v2.1.14	+22 TC
Skills / Agents / Hooks	44 / 34 / 21 events / 24 blocks	unchanged	—

Closed work this release

CARRY-5 — OTEL token-meter zero-entries root cause identified and closed (subprocess env hydration via lib/infra/otel-env-capturer.js).
F9-120 closure — `claude plugin validate .` exits 0 across 10 consecutive CC releases (v2.1.120/121/123/129/132/133/137/139/140/141). Carryover officially retired.

🧑‍💻 User Experience Changes

This section explains what changes for you in day-to-day use. TL;DR: bkit gets safer and quieter — you do not need to change your workflow.

What you'll feel differently

1. Some shell commands you used to run will now be blocked

If you (or a sub-agent on your behalf) try to run a shell command that contains a heredoc inside a command substitution — for example:

```bash
git commit -m "$(cat <<'EOF'
... message ...
EOF
)"
```

...you'll see a clear block message with safer alternatives. Use git commit -F file.txt or git commit -m '...' instead. This is the heredoc-bypass guard (differentiation #6) protecting against CC #58904. The block ships with three suggested alternatives so you can usually fix it in one move.

If you're certain a specific case is safe in your context, the guard's source (lib/defense/heredoc-detector.js) is auditable and the block can be lifted per-tool-call after manual review.
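To make the guard's target concrete, below is a sketch of a quote- and nesting-aware scanner that flags `<<` appearing inside `$( ... )`. It is not `lib/defense/heredoc-detector.js`, whose API these notes do not describe, and `hasHeredocInSubstitution` is a hypothetical name. This simplified version has known gaps. It also flags here-strings (`<<<`) and arithmetic shifts inside `$(( ))`, and an unbalanced quote in a top-level heredoc body can hide a later match.

```javascript
// Returns true when a `<<` operator appears inside a $( ... ) command substitution.
function hasHeredocInSubstitution(cmd) {
  const stack = []; // open contexts: "sub" for $(, "paren" for (, "'" or '"' for quotes
  for (let i = 0; i < cmd.length; i++) {
    const ch = cmd[i];
    const next = cmd[i + 1];
    const top = stack[stack.length - 1];

    if (top === "'") {                // inside single quotes nothing is special
      if (ch === "'") stack.pop();
      continue;
    }
    if (ch === "\\") {                // a backslash escapes the next character
      i++;
      continue;
    }
    if (ch === "$" && next === "(") { // $( opens a substitution, even inside "..."
      stack.push("sub");
      i++;
      continue;
    }
    if (top === '"') {                // inside double quotes only " and $( matter
      if (ch === '"') stack.pop();
      continue;
    }
    if (ch === "'" || ch === '"') {
      stack.push(ch);
    } else if (ch === "(") {
      stack.push("paren");
    } else if (ch === ")") {
      if (top === "sub" || top === "paren") stack.pop();
    } else if (ch === "<" && next === "<" && stack.includes("sub")) {
      return true;                    // heredoc (or here-string) inside $( ... )
    }
  }
  return false;
}

console.log(hasHeredocInSubstitution(`git commit -m "$(cat <<'EOF'\nmsg\nEOF\n)"`)); // true
console.log(hasHeredocInSubstitution("git commit -F file.txt"));                     // false
```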

2. Sub-agent batches feel slower for the first spawn — then faster overall

When bkit spawns multiple sub-agents (Plan parallel, Design council, Do swarm, Check council), the first sibling now runs sequentially so its cache warms up before the rest go parallel. You'll see one extra spinner tick at the start of a multi-agent step, but total cost drops dramatically — pre-fix runs were paying ~10x cache_creation_input_tokens per spawn because of CC #56293.

If you trust your context and want the old behavior, set BKIT_SEQUENTIAL_DISPATCH=0 before launching CC. At Trust Level L4 sequential dispatch is forced regardless.
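The warmup-then-fan-out idea can be sketched in a few lines. This collapses bkit's 8-state dispatcher into its two visible steps: one sequential warmup spawn, then the remaining siblings in parallel. `dispatchSiblings` and `sequentialDispatchEnabled` are hypothetical names. The env-var and L4 handling simply mirror the two sentences above.

```javascript
// Mirrors the notes above: on by default, BKIT_SEQUENTIAL_DISPATCH=0 opts out, L4 forces it on.
function sequentialDispatchEnabled(env = process.env, trustLevel = "L0") {
  if (trustLevel === "L4") return true;
  return env.BKIT_SEQUENTIAL_DISPATCH !== "0";
}

// Run the first sibling alone so its prompt cache is written, then the rest in parallel.
async function dispatchSiblings(tasks, runTask, { sequential = true } = {}) {
  if (!sequential || tasks.length <= 1) {
    return Promise.all(tasks.map((t) => runTask(t)));
  }
  const first = await runTask(tasks[0]);                                 // warmup spawn
  const rest = await Promise.all(tasks.slice(1).map((t) => runTask(t))); // fan out
  return [first, ...rest];
}
```

Results come back in task order either way. Only the start times differ.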

3. Memory rules are now actually enforced, not just suggested

When CLAUDE.md / project memory says "never touch X", bkit's Memory Enforcer now physically blocks the write at PreToolUse and emits an audit log entry — rather than logging an advisory that the model can ignore. If you've been writing memory rules and seeing them silently bypassed, this is the fix.

If you need to override for a one-off, the deny carries an explicit "manual override" instruction in its block message.
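Below is a sketch of the decision a PreToolUse deny hook can make. The returned object follows Claude Code's documented PreToolUse `hookSpecificOutput` shape: `permissionDecision: "deny"` plus a reason. Everything else is an assumption. That covers the `protectedPaths` list standing in for parsed memory rules, the tool names treated as writes, the `audit` callback, and the message text. A real hook would read the payload from stdin and print the result as JSON.

```javascript
// Assumption: the tools treated as file writes.
const WRITE_TOOLS = new Set(["Write", "Edit"]);

// True when filePath is the protected path itself or lies under it.
function isUnder(filePath, protectedPath) {
  const dir = protectedPath.endsWith("/") ? protectedPath : protectedPath + "/";
  return filePath === protectedPath || filePath.startsWith(dir);
}

// Returns a PreToolUse hook response that denies the call, or null to stay silent.
function memoryEnforcerDecision(payload, protectedPaths, audit = () => {}) {
  if (!WRITE_TOOLS.has(payload?.tool_name)) return null;
  const filePath = payload?.tool_input?.file_path ?? "";
  const rule = protectedPaths.find((p) => isUnder(filePath, p));
  if (!rule) return null;

  audit({ time: new Date().toISOString(), tool: payload.tool_name, filePath, rule });
  return {
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: "deny",
      // Placeholder text; bkit's real message carries its own manual-override instruction.
      permissionDecisionReason: `Project memory protects ${rule}; this write was blocked.`,
    },
  };
}
```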

4. Defense intensity now adapts to `effort.level`

CC v2.1.133+ exposes an effort.level field on hook payloads (think "how hard is this turn working"). bkit's defenses now read this field via Invariant 10 (ADR 0010) and dial themselves down for trivial turns and up for heavy turns. You should see fewer false positives on quick edits and stronger checks on long multi-step turns.
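Here is a sketch of routing defense intensity off the effort tier. The payload location (`payload.effort.level`), the level names and the three intensity labels are all assumptions made for illustration, and ADR 0010 is the authority on the real mapping. Unknown or missing levels fall back to the middle setting, so a payload without the field (CC before v2.1.133) gets a neutral default.

```javascript
// Assumed effort levels → defense intensity. A Map avoids prototype-key surprises.
const INTENSITY_BY_EFFORT = new Map([
  ["low", "light"],      // trivial turns: fewer checks, fewer false positives
  ["medium", "standard"],
  ["high", "strict"],    // long multi-step turns: full checks
]);

function defenseIntensity(payload) {
  const level = payload?.effort?.level;
  return INTENSITY_BY_EFFORT.get(level) ?? "standard";
}

console.log(defenseIntensity({ effort: { level: "high" } })); // strict
console.log(defenseIntensity({}));                            // standard
```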

5. PostToolUse hooks survive their own failures

CC's silent PostToolUse drop (#57317) used to mean a single failing hook could kill all downstream observability. v2.1.14's continueOnBlock flag means a failure in unified-bash-post no longer prevents skill-post from logging. Your audit trail is more complete with no action on your part.
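The effect of `continueOnBlock` can be illustrated with an in-process analogy. Handlers run in order, and a failing handler is recorded instead of ending the chain. In bkit the three hooks are separate scripts that Claude Code invokes, so this models the behavior, not the mechanism. `runPostToolUseChain` and the handler shape are hypothetical.

```javascript
// Each handler: { name, run(payload) } where run may throw or reject.
async function runPostToolUseChain(handlers, payload, { continueOnBlock = true } = {}) {
  const results = [];
  for (const { name, run } of handlers) {
    try {
      results.push({ name, ok: true, value: await run(payload) });
    } catch (err) {
      results.push({ name, ok: false, error: err instanceof Error ? err.message : String(err) });
      if (!continueOnBlock) break; // without the flag, the first failure ends the chain
    }
  }
  return results;
}
```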

6. Cache-cost telemetry is now real

The previous "zero entries" you may have seen in the token-meter adapter (CARRY-5) was a subprocess env-propagation bug. lib/infra/otel-env-capturer.js now hydrates OTEL_* into hook subprocesses on SessionStart, so cost dashboards finally show populated data. Combined with the new caching-cost Port↔Adapter pair, the dispatcher can make data-driven decisions about when to go parallel.
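Below is a sketch of the capture-and-hydrate idea behind `lib/infra/otel-env-capturer.js`. It keeps only `OTEL_*` variables, then passes them explicitly when spawning a hook subprocess. The function names are hypothetical. The rule that values already set in the child's environment win is an assumption. The sketch leaves out persisting the captured values between SessionStart and later hooks.

```javascript
const { spawnSync } = require("child_process");

// Keep only OpenTelemetry settings (OTEL_*) from an environment object.
function captureOtelEnv(env) {
  return Object.fromEntries(Object.entries(env).filter(([key]) => key.startsWith("OTEL_")));
}

// Add captured OTEL_* values to a child env; values already present in childEnv win.
function hydrateEnv(childEnv, captured) {
  return { ...captured, ...childEnv };
}

// Spawn a hook subprocess with the hydrated environment.
function runHookWithOtel(command, args, captured, baseEnv = process.env) {
  return spawnSync(command, args, { env: hydrateEnv(baseEnv, captured), encoding: "utf8" });
}
```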

What stays the same

Skills, Agents, Hook events, MCP tool counts — unchanged from v2.1.13.
PDCA 9-phase enum + Sprint 8-phase enum — frozen (Application Layer pilot, ADR 0005).
Trust Level L0 ~ L4 semantics + SPRINT_AUTORUN_SCOPE — unchanged.
Existing /pdca, /sprint, /control, /bkit-explore, /audit, /rollback slash commands — unchanged signatures.

Upgrade path

```bash
# in your project (Claude Code plugin auto-pulls; no command needed)
# verify:
claude plugin info bkit # should show 2.1.14
```

Project memory files, sprint state files, and PDCA documents written by v2.1.13 are read by v2.1.14 unchanged. No migration scripts.

Recommended CC version

Profile	Recommended CC	Why
Conservative	v2.1.123	Stable, OAuth fix, fail-isolated hooks
Balanced	v2.1.140	Latest before the v2.1.141 60-bullet bumper observation window
Latest	v2.1.141	Acceptable — heredoc bypass guard already covers #58904

See docs/06-guide/version-policy.guide.md for the full dist-tag 3-Bucket framework.

🧪 Verification

Tests — 291 existing + 22 new v2.1.14 contract tests = 313 PASS, 0 FAIL.
scripts/verify-full-system.js — 11/11 categories PASS (BKIT_VERSION sync, hooks reachability, domain purity, contract suites, invariants, defenses, observability ports, dist-tag drift, ENH backlog hygiene, dogfooding evidence, release-bundle inventory).
CC plugin validate — `claude plugin validate .` exits 0 (10 consecutive cycles).
Domain layer purity — 12 files, 0 forbidden imports (CI-enforced).

📚 Documents

Master plan: docs/sprint/v2114/master-plan.md
PRD / Plan / Design: docs/sprint/v2114/{prd,plan,design}.md
Sub-sprint reports: docs/sprint/v2114/sub-sprint-{1..6}-*.report.md
ADR 0010 (Invariant 10 effort-aware): docs/adr/0010-effort-aware-invariant.md
CC version monitoring guide: docs/06-guide/cc-version-monitoring.guide.md
Version policy guide: docs/06-guide/version-policy.guide.md

🙏 Acknowledgements

This release was built using bkit itself across six PDCA-style sub-sprints, and the heredoc-pipe guard caught its own release commit during ship — exactly the kind of dogfooding evidence the differentiation framing is meant to produce.

🤖 Built with Claude Code

v2.1.22 — Hardening Release

bkit v2.1.22 — Hardening Release

✨ Highlights

👤 What changes for you

🔧 Under the hood

🧪 Quality & regression note

v2.1.21 — Session Title Isolation (#111) + Sprint Output Enforcement (#113)

bkit v2.1.21 — Issue Response Sprint

✨ Highlights

🪟 #111 — Parallel sessions now get distinct window titles

📋 #113 — Sprint operations now print a human-readable summary

🔬 A deeper fix than reported

👤 What changes for you

🧱 Under the hood

🔭 Known follow-up

Contributors

bkit v2.1.20 — Marketplace Recovery + Plugin Manifest Schema Compliance

✨ Highlights

🧭 User-Experience Changes

For users running Claude Code v2.1.143 or later (≈95% of users)

For users running Claude Code ≤ v2.1.142

For users running an existing bkit session on Claude Code < v2.1.143

For bkit contributors / future PR authors

For bkit release engineers

For external dogfooders / Early Adopter Program

📦 What's in this release

Added

Changed

Anti-Mission preserved (no regressions to v2.1.143+ users)

🌟 External Dogfooder Contributions

🔮 Roll-forward markers (v2.1.21+)

📊 Sprint outcome (verification)

🔗 References

Contributors

bkit v2.1.19-hotfix.1 — CI Hardening & SQM Panel Activation

Highlights

User experience change

Root cause

Fixes (4 files, +56 / -4)

Verification

Upgrade

What's next

Full changelog

v2.1.19 — Quality Maturation Sprint (5 sub-sprints + ENH-318)

bkit v2.1.19 — Quality Maturation Sprint

TL;DR

🌟 Highlights — What Changed for You

1️⃣ `/sprint init` auto-imports Context Anchor (Closes #104)

2️⃣ Sprint reports now have `## Quality Gates` section + KPI SoT (Closes #105)

3️⃣ Gate-fail reports auto-marked RESOLVED on successful transition (Closes #103)

4️⃣ SKILL.md path drift permanently blocked (Closes #107)

5️⃣ bkit Early Adopter Program / Real User Hall of Fame (ENH-318)

6️⃣ Trust Score 7-Component (ENH-318 governance)

7️⃣ `--approve` semantics clarified (S1 + CO-S0-6 absorbed)

8️⃣ `/sprint dogfood` action for bkit self-dogfooding

9️⃣ Self-dogfood CI gate

🔟 SQM (Sprint Quality Maturity Index) introduced

👀 User Experience Impact

🧪 Tests (152 NEW, 100% PASS)

🧩 Methodology — Bootstrap Exception Pattern Validated 5x

🆙 Upgrade Guide

From v2.1.18

Trust Score impact (if you have existing trust-profile.json)

Contributors

v2.1.18 — Sprint Trust UX Fix (Issues #100/#101/#102)

bkit v2.1.18 — Sprint Trust UX Fix

TL;DR

🌟 Highlights — What Changed for You

F1 · `sprint-*` agents now declare `tools:` (Closes #100)

F2 · New `/sprint trust` command (Closes #101)

F3 · `--trust` CLI alias now honored (Closes #102)

F4 · `sprint-master-planner` orchestrator expansion (★ user-requested)
