Releases: popup-studio-ai/bkit-claude-code
v2.1.22 — Hardening Release
f077545 bkit v2.1.22 — Hardening Release
A stability- and consistency-focused release. There are no new commands or workflows to learn — instead, this release removes friction that previously affected Windows users and anyone running Sprint/PDCA stop hooks, and pays down internal complexity so future changes ship more safely.
This release is regression-free: verified by a full-suite baseline diff (320 → 321 test files, 0 net new failures).
✨ Highlights
- 🪟 Windows is no longer broken at load time. The frontmatter parser used a `/^---\n/` fence pattern that silently failed on Windows CRLF (`---\r\n`) files, meaning skills, agents, and output-styles could fail to load entirely. Fixed across 6 fence sites and 34 line-splitting sites, while staying byte-identical on macOS/Linux (LF).
- 🛑 No more "Hook JSON output validation failed" errors. Running `/sprint list` (and other Sprint/PDCA actions) could surface `Stop hook error: Hook JSON output validation failed — (root): Invalid input`. The root cause, a mistyped hook `decision` enum shared by 5 Stop emitters, is fixed, with a contract test added so it can't regress.
- 🧹 Leaner, safer internals. The 4 largest "god files" (>700 LOC) were split down to focused modules (`sprint-handler.js` 1509 → 271, `state-machine.js` 985 → 406, `automation.js` 770 → 451, `unified-stop.js` 751 → 693) with behavior fully preserved.
- 🌐 English CHANGELOG. The full `CHANGELOG.md` (59 release sections) is now in English, aligning with the project's global-service language policy.
- 🔗 Up-to-date with Claude Code. Validated against CC v2.1.146 → v2.1.159; recommended CC version bumped to v2.1.159 (balanced) / v2.1.150 (conservative). Continuous-compatibility streak extended 101 → 112.
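The CRLF failure described in the first highlight can be reproduced in a few lines. This is a minimal sketch, not bkit's parser: the `splitFrontmatter` helper and its return shape are hypothetical, and only the `/^---\n/` pattern and the `---\r\n` input come from the notes above.

```javascript
// Hypothetical helper illustrating LF-only vs CRLF-tolerant fences.
const LF_ONLY_FENCE = /^---\n/;     // the pattern described in the notes
const ANY_EOL_FENCE = /^---\r?\n/;  // also accepts Windows CRLF

function splitFrontmatter(text) {
  if (!ANY_EOL_FENCE.test(text)) return null; // no opening fence
  const lines = text.split(/\r?\n/);          // EOL-agnostic line split
  const end = lines.indexOf('---', 1);        // index of the closing fence
  if (end === -1) return null;
  return {
    frontmatter: lines.slice(1, end),
    body: lines.slice(end + 1).join('\n'),
  };
}

const crlf = '---\r\nname: demo\r\n---\r\nHello';
const lf = '---\nname: demo\n---\nHello';

console.log(LF_ONLY_FENCE.test(crlf)); // false: the old pattern misses CRLF files
console.log(splitFrontmatter(crlf));   // same result as for the LF input
console.log(splitFrontmatter(lf));
```

Making the carriage return optional leaves LF inputs untouched, which is consistent with the "byte-identical on macOS/Linux" claim.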
👤 What changes for you
| If you... | What you'll notice |
|---|---|
| Run bkit on Windows | Skills/agents/output-styles that previously failed to load now load correctly (CRLF handled). |
| Use `/sprint` or PDCA stop hooks | The intermittent `Hook JSON output validation failed` error is gone; stop output renders a clean summary + next step as intended. |
| Read the CHANGELOG | It's now in English end-to-end (Korean kept only for intentional sample data and the "(KO)" tagline). |
| Pin a Claude Code version | Recommended versions updated to v2.1.159 (balanced) / v2.1.150 (conservative). |
| Don't hit any of the above | No action needed — your existing commands and workflows are unchanged. |
No migration steps are required.
🔧 Under the hood
- S1 — CC v2.1.159 Response (ENH-324~328): cancelled ENH-317 as MOOT after the upstream `/simplify` rename was reverted (v2.1.152/154); verified sessionTitle-on-resume and multi-Agent frontmatter compatibility; registered 2 new regression monitors.
- S2 — Cross-Platform Verification (ENH-329~335): CRLF fixes above; confirmed shell-branching is unnecessary (only `git`/`gh`/`node`/`npx` are exec'd). Carry: native Windows runtime CI matrix.
- S6 — Stop Hook Schema Compliance (ENH-361~366): corrected `cc-payload.port.js` `decision` typedef; added `outputStopSurface()` / `outputStopAllow()` single-source helpers in `lib/core/io.js`.
- S4 — Tech-Debt & Dead-Code (ENH-336~342): field audit found 0 removable dead code; the 6 `pdca-eval-*` stubs are confirmed permanent (required by immutable contract baselines + L4 deprecation governance).
- S3a — God-File Split (ENH-343~348): 4 → 0 files over 700 LOC; contract baselines (255 / 234 assertions) unchanged.
- S3b — Layer Consolidation (ENH-349~354): verified 0 redundancy to merge; documented as ADR 0013 (no code change).
- S5 — Final QA + i18n + Docs-Sync (ENH-355~360): full QA, 8-language trigger-keyword audit (44 skills + 40 agents), code=docs inventory sync, version bump 2.1.21 → 2.1.22.
🧪 Quality & regression note
The initial S5 QA pass was incomplete (it ran only tests/, 43 files, and missed test/, 278 files). A full-suite run surfaced 6 failures, of which 5 were self-introduced regressions from the S3a refactor (fixed in 795c724) and 1 was pre-existing. Final baseline diff:
- Baseline `b591410` full-suite: 320 files / 28 fail
- Released HEAD full-suite: 321 files / 26 fail
- Net new regressions: 0, and 2 baseline failures (`docs-code-sync`, `sprint-alpha-e2e`) were actually fixed.
The remaining 26 failures are pre-existing test-debt, scheduled for cleanup in v2.1.23 (along with the native Windows CI matrix).
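A baseline diff of this kind reduces to a set difference over failing test names. The sketch below is illustrative only: `diffFailures` is a hypothetical helper, and the `legacy-*` names are placeholders rather than real bkit test files.

```javascript
// Hypothetical helper: compare failing-test sets from two full-suite runs.
function diffFailures(baselineFailures, headFailures) {
  const baseline = new Set(baselineFailures);
  const head = new Set(headFailures);
  return {
    // failing at HEAD but not in the baseline: true regressions
    netNew: [...head].filter((name) => !baseline.has(name)).sort(),
    // failing in the baseline but passing at HEAD
    fixed: [...baseline].filter((name) => !head.has(name)).sort(),
  };
}

// Placeholder data, except the two fixed suites named in the notes.
const baselineRun = ['docs-code-sync', 'sprint-alpha-e2e', 'legacy-a', 'legacy-b'];
const headRun = ['legacy-a', 'legacy-b'];

console.log(diffFailures(baselineRun, headRun));
// { netNew: [], fixed: [ 'docs-code-sync', 'sprint-alpha-e2e' ] }
```

Comparing names rather than counts matters: a run can show fewer failures overall and still hide a new one.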
Full Changelog: https://github.com/popup-studio-ai/bkit-claude-code/blob/main/CHANGELOG.md
🤖 Generated with Claude Code
v2.1.21 — Session Title Isolation (#111) + Sprint Output Enforcement (#113)
6e56b2f bkit v2.1.21 — Issue Response Sprint
This release closes two external dogfooder issues in a single unified sprint, each fix validated against actual codebase file:line evidence rather than accepted on report alone.
| Issue | Title | Reporter |
|---|---|---|
| #111 | Session title collision across parallel sessions | @wonuseo |
| #113 | Sprint screen-output enforcement gap | @rohwonseok-ops |
✨ Highlights
🪟 #111 — Parallel sessions now get distinct window titles
Opening two Claude Code windows in the same project folder on the same feature used to label both windows identically ([bkit] PLAN f1) — making it dangerously easy to type a command into the wrong terminal. This was latent across all of v2.1.6 → v2.1.20. bkit now appends a stable per-session tag so each window is uniquely identifiable.
📋 #113 — Sprint operations now print a human-readable summary
Sprint commands (phase, status, watch, report) previously emitted raw JSON only, leaving you to trust the model's narration of it. Sprint now enforces the same Stop-hook output discipline as PDCA: an Executive Summary, an interactive next-step prompt, and a per-feature progress table — surfaced directly on screen.
🔬 A deeper fix than reported
While verifying #113 with real claude -p runs, we found that no skill Stop handler was actually firing in production (PDCA included) — Claude Code omits skill_name from the Stop payload, so every detection path silently fell through. v2.1.21 introduces a cross-process active-skill marker that fixes dispatch for Sprint and lays the groundwork for the PDCA family in v2.1.22+.
👤 What changes for you
Multi-window users (#111)
- Each session window now shows a short unique tag, e.g. `[bkit] PLAN f1 ·a1b2`.
- The tag is derived deterministically from the session ID: same session, same tag for its whole lifetime.
- Fully backward-compatible: if there's no session ID, the title looks exactly as before. Legacy title caches migrate automatically on first read. No config change required.
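One way to get a stable per-session tag is to hash the session ID and keep a short prefix. This is a sketch under stated assumptions: the notes do not say how bkit derives the tag, so the SHA-256 choice, the 4-character length, and the `sessionTag` / `buildTitle` names are all hypothetical.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical: derive a short, deterministic tag from a session ID.
function sessionTag(sessionId) {
  if (!sessionId) return ''; // no session ID: no tag (backward-compatible title)
  return createHash('sha256').update(String(sessionId)).digest('hex').slice(0, 4);
}

// Hypothetical: build a title in the "[bkit] PHASE feature ·tag" shape shown above.
function buildTitle(phase, feature, sessionId) {
  const base = `[bkit] ${phase} ${feature}`;
  const tag = sessionTag(sessionId);
  return tag ? `${base} ·${tag}` : base;
}

console.log(buildTitle('PLAN', 'f1'));              // [bkit] PLAN f1
console.log(buildTitle('PLAN', 'f1', 'session-A')); // same tag on every call
console.log(buildTitle('PLAN', 'f1', 'session-B')); // a different session hashes separately
```

A 4-hex-character tag has only 65,536 possible values, so collisions between two sessions are unlikely but not impossible.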
Sprint users (#113)
- After `/sprint phase ... --to <p>`, you'll see a Sprint Executive Summary (Mission / Result / matchRate / Cross-Sprint Integration / Invariant) instead of a JSON blob.
- `/sprint status` and `/sprint watch` now render a per-feature table with a one-line quality-gate summary; raw JSON is still available for programmatic use.
- `/sprint report` finalization prepends a full KPI + carry-items summary with a clear next action.
- The session title updates to reflect the current sprint phase.
🧱 Under the hood
- New modules: `lib/sprint/executive-summary.js`, `scripts/sprint-skill-stop.js`, `lib/core/active-skill-marker.js`
- Refactors: `session-title-cache.js` (per-session map + GC + legacy migration), `session-title.js` (sessionId tag), 4 Stop emitters (session_id threading), `unified-stop.js` (sprint handler + marker peek), `advance-phase.usecase.js` (DI-based transition summary), `sprint-handler.js` (display field)
- ADR 0012: Sprint Stop Hook Output Enforcement (run-export pattern · separate sprint shape · usecase-purity DI · cross-process active-skill marker)
- 92 new/extended test cases across 6 files, all passing, including real `claude -p --plugin-dir .` runtime dispatch verification
🔭 Known follow-up
- CARRY-#113-1 — PDCA-family Stop handlers share the same production no-op root cause and will be migrated to the run-export + marker pattern in v2.1.22+ (pending separate regression verification).
Full changelog: see CHANGELOG.md · Cross-references: #111 ⊃ #77 · #113 ⊃ #93
🤖 Generated with Claude Code
bkit v2.1.20 — Marketplace Recovery + Plugin Manifest Schema Compliance
2fc529f bkit v2.1.20 — Marketplace Recovery + Plugin Manifest Schema Compliance
Released: 2026-05-26
Trigger: External dogfooder @BJ (정병진) 2026-05-26 install incident: `Validation errors: : Unrecognized key: "displayName"`
Sprint: 14 features / 3 sub-sprints / 3 new ENH / 1 new ADR / 1 external dogfooder #2 / 13/13 quality gates PASS / 0 auto-pause triggers
Minimum Claude Code: v2.1.143 (the strict plugin-manifest path recognizes the official `displayName` field only from v2.1.143)
✨ Highlights
- 🚨 Minimum Claude Code v2.1.143 advisory in README, README-FULL, marketplace.json, plus a new self-service guide `docs/06-guide/cc-compatibility.guide.md`. Advisory only: no hard reject, to keep UX friendly for users still on older Claude Code.
- 🛡 21-key plugin-manifest whitelist CI gate (ENH-322): `scripts/validate-plugin.js --strict` enforces the Anthropic official schema. New `EXPECTED_PLUGIN_JSON_KEYS` SoT in `lib/domain/rules/docs-code-invariants.js` (Object.freeze, pure domain). v2.1.20: advisory only for one week; v2.1.21+: strict.
- 🔧 ADR 0006 § Empirical Validation Gate recovery: `scripts/release-plugin-tag.sh` now wires `claude plugin validate .` (~30-day wire delay closed). Non-zero exit blocks release.
- 🚦 cc-regression Defense Layer 6 reinforcement (ENH-321): new entry R3-321 tracks the displayName strict-reject regression with daily 09:00 KST reconcile cycle integration. 22 guards total, 0 warnings.
- 🔍 SessionStart runtime advisory (ENH-323): `hooks/startup/session-context.js` now detects the installed Claude Code version (200 ms timeout cap + 1-hour `.bkit/runtime/cc-version.json` cache + opt-out via `BKIT_DISABLE_CC_VERSION_DETECTION=1`). Forward-proofs users who upgrade bkit before Claude Code.
- 📜 ADR 0011 Plugin Manifest Schema Compliance Policy (Accepted): formalizes the 5-layer policy (minimum CC + 21-key whitelist + `claude plugin validate` wire + R3-321 + SessionStart detection).
- 🌟 External Dogfooder Hall of Fame #2: @BJ (정병진) entry at `docs/external-dogfooders/bj.md`. bkit Early Adopter Program DA-4 status: N=2 confirmed (first-follower effect validated 28 days after the v2.1.19 policy introduction).
🧭 User-Experience Changes
For users running Claude Code v2.1.143 or later (≈95% of users)
- README and README-FULL now display a one-line minimum Claude Code v2.1.143 advisory at the top.
- The marketplace.json bkit entry description starts with `Requires Claude Code v2.1.143+`.
- No functional change. Existing PDCA + Sprint workflows continue unchanged.
- The new SessionStart CC version check returns `isOldVersion=false` and emits no advisory, so it is invisible to you.
For users running Claude Code ≤ v2.1.142
- `claude plugin install bkit` will still fail with `Unrecognized key: "displayName"`. This is a Claude Code-side schema rejection that bkit cannot bypass without removing the `displayName` field (which is forbidden by Anti-Mission, since removal would regress the UI picker for Claude Code v2.1.143+ users).
- However, the new `docs/06-guide/cc-compatibility.guide.md` gives a clear, self-service workaround:

      npm install -g @anthropic-ai/claude-code@latest
      rm -rf ~/.claude/plugins/cache/temp_git_*
      claude plugin install bkit

- The README advisory + marketplace.json description prefix surface this requirement before you hit the install error.
For users running an existing bkit session on Claude Code < v2.1.143
- SessionStart now detects the Claude Code version and surfaces a `bkit Compatibility Notice` in the additionalContext (one-time-per-session, cached for one hour at `.bkit/runtime/cc-version.json`).
- The notice includes the workaround command and a link to the compatibility guide.
- Performance budget: 200 ms hard cap on `claude --version` execution + cache + one-call-per-session. Typical session impact is zero (cache hit) or sub-100 ms (first call).
- Opt-out: set `BKIT_DISABLE_CC_VERSION_DETECTION=1`.
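The detection flow (opt-out, then cache, then a time-boxed version call) might look roughly like the sketch below. Everything here is an assumption apart from the environment variable, the one-hour cache window, the 200 ms cap and the v2.1.143 threshold: `checkCCVersion`, its argument shape and its return fields are hypothetical, and the version command is injected so the sketch runs without the `claude` CLI installed.

```javascript
// Hypothetical sketch; names and return shape are illustrative, not bkit's API.
const MIN_CC = [2, 1, 143];
const CACHE_TTL_MS = 60 * 60 * 1000; // one-hour cache window

function parseVersion(text) {
  const m = /(\d+)\.(\d+)\.(\d+)/.exec(String(text || ''));
  return m ? m.slice(1, 4).map(Number) : null;
}

function isOlder(a, b) {
  for (let i = 0; i < 3; i += 1) {
    if (a[i] !== b[i]) return a[i] < b[i];
  }
  return false; // equal versions are not "old"
}

// runVersion is injected to keep the sketch testable. A real caller might use
// child_process.execFileSync('claude', ['--version'], { timeout: 200, encoding: 'utf8' }).
function checkCCVersion({ env = {}, cache = null, now = Date.now(), runVersion }) {
  if (env.BKIT_DISABLE_CC_VERSION_DETECTION === '1') return { skipped: true };
  if (cache && now - cache.checkedAt < CACHE_TTL_MS) return { ...cache, fromCache: true };
  let version = null;
  try {
    version = parseVersion(runVersion());
  } catch (err) {
    return { skipped: true }; // fail-silent on timeout or missing CLI
  }
  if (!version) return { skipped: true };
  return { version, isOldVersion: isOlder(version, MIN_CC), checkedAt: now, fromCache: false };
}

console.log(checkCCVersion({ runVersion: () => '2.1.142' }).isOldVersion); // true
console.log(checkCCVersion({ runVersion: () => '2.1.159' }).isOldVersion); // false
```

Checking the opt-out and the cache before spawning a process is what keeps the typical session cost at zero.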
For bkit contributors / future PR authors
- A new CI step `Release Gate — plugin.json schema validation (21-key whitelist)` runs `node scripts/validate-plugin.js --strict` on every PR (`.github/workflows/contract-check.yml`).
- v2.1.20 = advisory only (`continue-on-error: true`) to avoid PR backlog during the rollout week.
- v2.1.21+ = strict (`continue-on-error: false`): extra keys in plugin.json will block PR merge.
- Add new plugin.json keys only if they appear in the 21-key whitelist at `lib/domain/rules/docs-code-invariants.js` `EXPECTED_PLUGIN_JSON_KEYS`. If Anthropic publishes a new manifest key, update the SoT first.
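A whitelist gate of this shape boils down to diffing the manifest's keys against a frozen list. The sketch below is a hedged illustration: `findExtraKeys` is a hypothetical stand-in (the real `diffPluginJsonKeys` signature is not shown in these notes), the four keys are an illustrative subset rather than the actual 21-key list, and `homepageUrl` is an invented extra key.

```javascript
// Illustrative subset only; the real SoT is described as holding 21 keys.
const EXPECTED_KEYS = Object.freeze(['name', 'displayName', 'version', 'description']);

// Hypothetical helper: list manifest keys that are not on the whitelist.
function findExtraKeys(manifest, expected = EXPECTED_KEYS) {
  const allowed = new Set(expected);
  return Object.keys(manifest).filter((key) => !allowed.has(key)).sort();
}

const manifest = { name: 'bkit', displayName: 'bkit', version: '2.1.20', homepageUrl: 'x' };
const extra = findExtraKeys(manifest);

// The release notes mention exit code 2 for extra-key failures in --strict mode.
const exitCode = extra.length > 0 ? 2 : 0;
console.log(extra, exitCode); // [ 'homepageUrl' ] 2
```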
For bkit release engineers
- `scripts/release-plugin-tag.sh` now runs `claude plugin validate .` (per ADR 0006) between the CI-invariants check and tag-conflict detection. Non-zero exit blocks the release.
- If the `claude` CLI is absent from PATH (some CI environments), the script logs a `WARN` and falls back gracefully; the release continues.
- This is the v2.1.20 release itself dogfooding ADR 0006's Empirical Validation Gate for the first time.
For external dogfooders / Early Adopter Program
- Hall of Fame now has two entries: @pruge (v2.1.19, first follower) + @BJ (v2.1.20, second entry, first-follower effect validation).
- DA-4 acquisition goal status: N=2 confirmed — first-follower effect validated 28 days after policy introduction. v2.1.21+ continues active outreach to grow N≥3.
- New dogfooders are encouraged to file detailed issues per the 5-stage User-Feedback Lifecycle at `docs/external-dogfooders/_README.md`.
📦 What's in this release
Added
- `EXPECTED_PLUGIN_JSON_KEYS` SoT + `diffPluginJsonKeys` in `lib/domain/rules/docs-code-invariants.js`
- `--strict` flag in `scripts/validate-plugin.js` (exit codes 2 / 3 for extra-key / SoT-import failures)
- `Release Gate — plugin.json schema validation (21-key whitelist)` step in `.github/workflows/contract-check.yml`
- `claude plugin validate .` wire in `scripts/release-plugin-tag.sh` (ADR 0006 § Empirical Validation Gate)
- `R3-321` entry in `lib/cc-regression/registry.js` (cc-regression entry #22)
- `detectCCVersion()` + `buildCCVersionAdvisoryContext()` in `hooks/startup/session-context.js`
- `ccVersionAdvisory` section in `bkit.config.json:ui.contextInjection.sections` (default-on)
- ADR 0011 Plugin Manifest Schema Compliance Policy (Status: Accepted)
- `docs/06-guide/cc-compatibility.guide.md`: user-facing self-service guide
- `docs/external-dogfooders/bj.md`: Hall of Fame entry #2
- `test/e2e/external-dogfood/cc-min-version.test.js`: 5 TC permanent regression lock (Lifecycle Stage 4)
Changed
- `README.md` / `README-FULL.md`: Claude Code badge v2.1.123+ → v2.1.143+, Version badge 2.1.19 → 2.1.20, one-line minimum CC advisory
- `.claude-plugin/plugin.json`: version 2.1.19 → 2.1.20 (displayName unchanged per Anti-Mission)
- `.claude-plugin/marketplace.json`: bkit + marketplace version 2.1.19 → 2.1.20, description prefix advisory
- `bkit.config.json`: version 2.1.19 → 2.1.20, `ui.contextInjection.sections` adds `ccVersionAdvisory`
- `test/integration/config-sync.test.js` CS-015: `diffPluginJsonKeys` 21-key whitelist enforced
- `docs/external-dogfooders/_README.md`: @BJ entry added under "v2.1.20", DA-4 status N=2 confirmed
Anti-Mission preserved (no regressions to v2.1.143+ users)
- `displayName` field NOT removed (v2.1.143+ official schema key; removal would regress UI picker)
- v2.1.142-and-below users NOT hard-rejected (advisory only)
- Anthropic docs vs implementation lenient/strict mismatch (Q1) NOT touched (external responsibility)
- `bkit-starter` plugin unchanged (no displayName field, zero impact)
- Trust L3/L4 default unchanged
🌟 External Dogfooder Contributions
@BJ (정병진) drove the entire v2.1.20 sprint via precise error message + cache path + Cursor IDE environment metadata sharing. Reproduction scenario absorbed at test/e2e/external-dogfood/cc-min-version.test.js (5 TC, Lifecycle Stage 4 Regression Lock achieved). Trust Score externalDogfoodFeedbackResponseRate (weight 0.05) accumulated. See docs/external-dogfooders/bj.md for the full contribution archive.
Thank you @BJ for sharing the precise error message that scoped the entire sprint correctly. 🙏
🔮 Roll-forward markers (v2.1.21+)
- F6 contract-check.yml `continue-on-error` → `false` (one-week advisory-only window closes)
- F8 R3-321 telemetry 3-month analysis → demote/keep decision
- F10 ENH-323 SessionStart detection telemetry 3-month analysis → timeout elevation decision
- F14 Hall of Fame @BJ Stage 3 (Fix Released) ⏳ → ✅ on v2.1.20 GA tag — this release achieves it
📊 Sprint outcome (verification)
- 14/14 features delivered (4 + 5 + 5 across the 3 sub-sprints)
- 13/13 quality gates PASS (M1-M10 + S1-S4 all clean)
- 0 auto-pause triggers fired (QUALITY_GATE_FAIL / ITERATION_EXHAUSTED / BUDGET_EXCEEDED / PHASE_TIMEOUT)
- matchRate 100%, dataFlowIntegrity 100%
- Sprint cycle time 3 days vs 14-day budget (25% utilization)
- 5/5 side-effect verification gates PASS post-archive patches (CO-4 + CO-5 + CO-2-partial-fix)
- 3 of 6 carry items closed by the post-archive patch: CO-2 (test-side mitigated), CO-4 (Q3 partially resolved), CO-5 (@BJ Stages 3 + 5 ✅)
🔗 References
- Full sprint planning: `docs/sprint/v2120-marketplace-recovery/`
- Final report: `docs/04-report/features/v2120-marketplace-recovery.report.md`
- ADR 0011: `docs/adr/0011-plugin-manifest-schema-compliance.md`
- Compatibility guide: `docs/06-guide/cc-compatibility.guide.md`
- Hall of Fame @BJ: `docs/external-dogfooders/bj.md`
- Root cause analysis: [`d...
bkit v2.1.19-hotfix.1 — CI Hardening & SQM Panel Activation
b12b3b8 bkit v2.1.19-hotfix.1 — CI Hardening & SQM Panel Activation
A focused CI hotfix that restores the Invocation Contract Check workflow to green after v2.1.19 GA, and quietly activates the new SQM dashboard panel that v2.1.19 design (S5 ADR S5-003) promised but never wired up.
Highlights
- 🟢 CI green again: the `Invocation Contract Check` workflow flipped back from red to green after the v2.1.19 GA merge.
- 🩹 Dead-code detector blind spot patched: `scripts/check-deadcode.js` now correctly recognises `require(path.join(ROOT, '...'))` references, not just direct string literals.
- 📊 New SessionStart panel, SQM (Sprint Quality Maturity): v2.1.19's promised quality maturity index is now visible at session start.
- 🛡️ Zero behaviour regression: verified against the full 4,168 TC QA aggregate.
User experience change
After installing v2.1.19-hotfix.1, every new bkit session (/clear or fresh CC launch) will show one extra compact panel at the top of the context, if and only if the project has at least one entry in .bkit/state/sqm-history.jsonl:
┌─── SQM (Sprint Quality Maturity) ─────────── 64.00 / 100 ─┐
│ Docs=Code 100 │ Self-Dogfood 10 │ Dogfooder 100 │
│ Report KPI 80 │ Dispatch — │ Convention 0 │
└──────────────────────────────────────────────────────────┘
The panel:
- Reflects project-wide quality maturity across 6 weighted components (Docs=Code 30 %, Self-Dogfood 20 %, External Dogfooder 20 %, Report KPI 15 %, Sub-Agent Dispatch 10 %, Convention 5 %).
- Is fail-silent on new projects: when `.bkit/state/sqm-history.jsonl` is missing, the panel renders nothing and adds 0 chars. No impact on first-run UX.
- Adds ~259 chars to `additionalContext` when active. Other 4 dashboard sections (progress / workflow / impact / control) are unchanged.
- Can be opted out by removing `'sqm'` from `bkit.config.json` → `ui.dashboard.sections`.
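The headline score in the panel can be reproduced from the six weights listed above. This is a sketch, not bkit's calculator: the component key names are illustrative, and treating the unmeasured Dispatch component (shown as a dash) as contributing 0 is an assumption, chosen because it reproduces the 64.00 figure.

```javascript
// Weights are the percentages from the list above; key names are illustrative.
const SQM_WEIGHTS = {
  docsCode: 30, selfDogfood: 20, dogfooder: 20, reportKpi: 15, dispatch: 10, convention: 5,
};

function sqmScore(components) {
  let weighted = 0;
  for (const [key, weight] of Object.entries(SQM_WEIGHTS)) {
    const value = components[key];
    // Assumption: an unmeasured component contributes 0 to the total.
    weighted += weight * (typeof value === 'number' ? value : 0);
  }
  return weighted / 100; // integer percent weights avoid float drift
}

// Values read off the panel shown above; Dispatch is unmeasured.
const panel = { docsCode: 100, selfDogfood: 10, dogfooder: 100, reportKpi: 80, dispatch: null, convention: 0 };
console.log(sqmScore(panel).toFixed(2)); // 64.00
```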
No other user-facing behaviour changes.
Root cause
Two independent issues coincided immediately after the v2.1.19 GA merge:
- Detector blind spot: `scripts/check-deadcode.js` matched only direct string-literal requires (`require('./foo')`). The new v2.1.19 scripts (S0 measure, S3 docs-sync, S4 feedback refresh) all use `require(path.join(ROOT, '...'))`, so the detector reported 5 false-positive dead modules and failed the CI gate.
- Design/runtime gap: `lib/ui/sqm-panel.js` was specified by S5 ADR S5-003 to render in the SessionStart dashboard, but the wiring in `hooks/session-start.js` never landed. `bkit.config.json` `ui.dashboard.sections` also needed `'sqm'` for the new panel to opt in.
Fixes (4 files, +56 / -4)
- `scripts/check-deadcode.js`: split `scanProductionRequires()` into two regexes. `reDirect` keeps the original behaviour; `reIndirect` matches `require(<wrapper>(..., '<lib path>', ...))` where `<wrapper>` is any identifier and the captured string literal contains `lib/` or starts with `./` or `../`. Restricts to library-shaped paths to avoid overmatching arbitrary strings.
- `hooks/session-start.js`: wired `lib/ui/sqm-panel` + `lib/quality/sqm-history` into the SessionStart dashboard. Render block is independent of the `primaryFeature` gate (SQM is project-wide, not feature-scoped) and fail-silent when the history file is missing.
- `bkit.config.json`: added `'sqm'` to `ui.dashboard.sections`. Runtime config takes precedence over the hook default, so without this the SqmPanel rendered to 0 chars even after wiring.
- `CHANGELOG.md`: added `[2.1.19-hotfix.1]` section.
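The two-regex approach can be approximated as follows. These patterns are written for illustration and are not copied from `scripts/check-deadcode.js`; `scanRequires` and `looksLikeLibPath` are hypothetical names, and the sketch only inspects the first string literal inside the wrapper call.

```javascript
// Approximation of the two-regex idea; not bkit's actual patterns.
const reDirect = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
const reIndirect = /require\(\s*[\w$.]+\(\s*[^)]*?['"]([^'"]+)['"][^)]*\)\s*\)/g;

// Keep only library-shaped paths: contains "lib/" or starts with "./" or "../".
const looksLikeLibPath = (p) => p.includes('lib/') || /^\.\.?\//.test(p);

function scanRequires(source) {
  const direct = [...source.matchAll(reDirect)].map((m) => m[1]);
  const indirect = [...source.matchAll(reIndirect)]
    .map((m) => m[1])
    .filter(looksLikeLibPath);
  return [...direct, ...indirect];
}

const sample = [
  "const a = require('./foo');",
  "const b = require(path.join(ROOT, 'lib/ui/sqm-panel'));",
  "const c = require(wrap('not-a-path'));",
].join('\n');

console.log(scanRequires(sample)); // [ './foo', 'lib/ui/sqm-panel' ]
```

The path-shape filter is what stops the indirect pattern from treating any wrapped string, such as the third line of the sample, as a module reference.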
Verification
- ✅ `scripts/check-deadcode.js`: NEW dead 5 → 0, Live 134 → 139 (exactly the 5 intended modules; set-diff verified against the precise v2.1.19 GA baseline, NEW DEAD 0).
- ✅ Contract suite 9-step spot-check all PASS: domain-purity (18 files) · guards (21 entries) · test-tracking (314 files) · docs-code-sync (5/5) · integration-runtime (23/23) · l2-smoke (101/101) · bkit-full-system (36/0/0) · contract-test-run vs v2.1.16 L1,L4 (255 assertions).
- ✅ Full QA aggregate (4,168 TC across 157 files): 12 FAIL + 6 errors reproduced identically against the HEAD~1 (v2.1.19 GA) baseline → confirmed pre-existing carryover (`ACTION_TYPES` baseline drift, trust-engine score change). This hotfix introduced 0 new regressions.
- ✅ SessionStart hook live run with a fresh session id: JSON contract valid; all 5 dashboard sections including SQM (`64.00 / 100`) render correctly; `additionalContext` 5,482 → 5,741 chars (+259 for SQM panel).
Upgrade

    # CC plugin users: version bumps automatically on next /clear or session start
    # Manual verification:
    cat bkit.config.json | jq '.version'   # → "2.1.19" (no version bump for hotfix)
    node scripts/check-deadcode.js         # → Dead (NEW) : 0

Note: `bkit.config.json` `version` stays at `2.1.19`. The `-hotfix.1` suffix lives only on the git tag and this release, following the bkit hotfix convention (see v2.1.12-hotfix precedent).
What's next
The 12 pre-existing FAILs surfaced by the full QA aggregate (ACTION_TYPES baseline drift, trust-engine score change, sprint-4-presentation AUDIT-01) will be resolved in v2.1.20. They are tracked as CARRY-v2119-1 in memory and do not gate this hotfix.
Full changelog
CHANGELOG.md → v2.1.19-hotfix.1
🤖 Generated with Claude Code
v2.1.19 — Quality Maturation Sprint (5 sub-sprints + ENH-318)
228b6bc bkit v2.1.19 — Quality Maturation Sprint
Permanent closure of pruge's sprint domain issue cluster (10 GitHub issues over 1.5 days, all in the sprint domain). Rather than another reactive fix release, v2.1.19 treats the cluster as a systemic sprint domain maturity gap and addresses it with a 5 sub-sprint master plan.
All 5 sub-sprints + outer master sprint archived. ENH-318 formally adopted (differentiation 6/6 → 7/7). First Real User Hall of Fame entry: @pruge.
Released: 2026-05-21
Predecessor: v2.1.18 GA (Sprint Trust UX Fix)
PR: #108 · Merge commit: 228b6bc (admin merge)
Reporter who shaped this release: @pruge (James Kim), dandi-village-ledger project. Listed as the first Real User Hall of Fame entry 🏆
TL;DR
- 🎯 Closes 4 GitHub Issues — #103, #104, #105, #107 (P0 / sprint domain maturity)
- 🏗️ 5 sub-sprint master plan — S0 baseline + S1 Foundation + S2 Defense + S3 Polish + S4 Proactive + S5 Measurement (all archived)
- ✅ 152 TC across 30 test files — 100% PASS, 0 regressions
- 🚀 ENH-318 formally adopted: Trust Score 7th component + Real User Hall of Fame + User-Feedback Lifecycle 5-stage
- 🏆 First external dogfooder Hall of Fame entry: @pruge — 10 issues, 5 absorbed E2E scenarios
- 🛡️ 9 new ACTION_TYPES for governance traceability
- 🔧 3 architectural fixes: S0 measurement bug (S2 evolution) + findFirstMatching bug (S5) + CO-S0-6 --approve semantic (S1 docs)
- 🤝 100% backward compat — Trust Score Δ ≤5% verified (worked example numerical proof)
🌟 Highlights — What Changed for You
1️⃣ /sprint init auto-imports Context Anchor (Closes #104)

    # Before: empty context placeholders
    /sprint init my-sprint --name "Q2 Launch"
    # Generates "(not set)" in report Context section even when master-plan exists

    # After: auto-import from master-plan or PRD
    /sprint init my-sprint --name "Q2 Launch"
    # Reads docs/01-plan/features/my-sprint.master-plan.md → WHY/WHO/RISK/SUCCESS/SCOPE populated
    # audit emit: sprint_context_imported

Fallback chain: args.context (explicit) > master-plan.md > PRD.md > defaultContext.
2️⃣ Sprint reports now have ## Quality Gates section + KPI SoT (Closes #105)

    ## 2. KPI Snapshot
    | matchRate | 100% |   # qualityGates.M1 wins over kpi.matchRate
    | dataFlowIntegrity (S1) | 100% |
    ...

    ## Quality Gates (11 gates, 11 passed)
    | Gate | current | threshold | passed | lastMeasuredAt | source |
    |------|---------|-----------|--------|----------------|--------|
    | M1 | 100 | ≥90 | ✓ | 2026-05-21T10:00:00Z | gap-detector |
    ...

Precedence: qualityGates > featureMap > kpi. Divergence detection emits audit sprint_kpi_divergence.
3️⃣ Gate-fail reports auto-marked RESOLVED on successful transition (Closes #103)

    # Pre-v2.1.19: docs/03-analysis/<sprint>-gate-fail-*.md
    #   "STATUS: BLOCKED", even after sprint completed
    #   sprint.lastGateFailure never cleared

    # v2.1.19: when advancePhase succeeds after a gate_fail:
    #   - File header prepended with "> **STATUS: RESOLVED** at <ISO>"
    #   - sprint.lastGateFailure.resolvedAt / resolvedBy / resolutionReason populated
    #   - audit emit: gate_fail_resolved (idempotent, atomic write)

4️⃣ SKILL.md path drift permanently blocked (Closes #107)
- sprint SKILL.md explicit `<bkit-root>` convention with #107 reference inline
- `scripts/check-skills-docs-code-sync.js` CI gate (44 skills × invariant, code-block-aware)
- `scripts/lint-skill-md.js` PreToolUse hook (warning-only)
- `test/contract/baseline/skills-convention.json` frozen baseline
★ Critical evolution: S2 introduced stripCodeBlocks parser — S0 measurement had false positives on phase-3-mockup + phase-9-deployment (JavaScript/YAML samples inside code blocks). Real drift was 1 skill (sprint #107), not 3.
5️⃣ bkit Early Adopter Program / Real User Hall of Fame (ENH-318)

    # Run bkit on your production project + file detailed bug reports →
    #   - 🏆 Public recognition (README "Real User Hall of Fame")
    #   - 🔒 Your reproductions become permanent E2E regression tests
    #   - 📊 Activity powers Trust Score externalDogfoodFeedbackResponseRate component
    #   - 📝 CHANGELOG attribution per release

First entry: @pruge (James Kim) — 10 issues / 5 scenarios absorbed. See docs/external-dogfooders/pruge.md.
6️⃣ Trust Score 7-Component (ENH-318 governance)

    // Before (6 components, sum 1.0):
    //   pdcaCompletionRate 0.25 / gatePassRate 0.20 / rollbackFrequency 0.15
    //   destructiveBlockRate 0.15 / iterationEfficiency 0.15 / userOverrideRate 0.10
    // After (7 components, sum 1.0):
    //   pdcaCompletionRate 0.2375 / gatePassRate 0.19 / rollbackFrequency 0.1425
    //   destructiveBlockRate 0.1425 / iterationEfficiency 0.1425 / userOverrideRate 0.095
    //   + externalDogfoodFeedbackResponseRate 0.05 ← NEW

Worked example: existing user with values [80/85/90/95/75/70] = old score 83.00 → new score 78.85. Δ -4.15pt (-5.0%), at the ≤5% R-10 mitigation boundary (float epsilon tolerance verified).
Legacy 6-component trust-profile.json auto-migrated (weights from defaults SoT, values from disk).
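The worked example's two scores can be checked by applying both weight sets to the same values. This is a sketch under two assumptions: the six values map to the components in the order the weights are listed, and the new `externalDogfoodFeedbackResponseRate` component starts at 0 for a legacy profile, which is the value that reproduces the 78.85 figure. `trustScore` is a hypothetical helper, not bkit's trust engine.

```javascript
const OLD_WEIGHTS = {
  pdcaCompletionRate: 0.25, gatePassRate: 0.20, rollbackFrequency: 0.15,
  destructiveBlockRate: 0.15, iterationEfficiency: 0.15, userOverrideRate: 0.10,
};
const NEW_WEIGHTS = {
  pdcaCompletionRate: 0.2375, gatePassRate: 0.19, rollbackFrequency: 0.1425,
  destructiveBlockRate: 0.1425, iterationEfficiency: 0.1425, userOverrideRate: 0.095,
  externalDogfoodFeedbackResponseRate: 0.05,
};

// Hypothetical helper: weighted sum, treating a missing component as 0 (assumption).
function trustScore(values, weights) {
  return Object.entries(weights)
    .reduce((sum, [key, weight]) => sum + weight * (values[key] ?? 0), 0);
}

// Values from the worked example, assumed to follow the weight order above.
const legacy = {
  pdcaCompletionRate: 80, gatePassRate: 85, rollbackFrequency: 90,
  destructiveBlockRate: 95, iterationEfficiency: 75, userOverrideRate: 70,
};

console.log(trustScore(legacy, OLD_WEIGHTS).toFixed(2)); // 83.00
console.log(trustScore(legacy, NEW_WEIGHTS).toFixed(2)); // 78.85
```

Each of the six original weights is scaled by 0.95 to make room for the 0.05 component, so a profile with no external-dogfood activity scores 95% of its old value.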
7️⃣ --approve semantic clarified (S1 + CO-S0-6 absorbed)

    # --approve ONLY bypasses Trust Level scope boundary (e.g., L3 stopAfter=qa)
    # --approve does NOT bypass Quality Gate failures (M*/S*)
    # For gate failures, use: /sprint measure <id> --gate <key> first

Documented in skills/sprint/SKILL.md §10.1.1.1. Future --allowGateOverride flag carry to v2.1.20+.
8️⃣ /sprint dogfood action for bkit self-dogfooding

    /sprint dogfood v2.1.20 --release-tag v2.1.20-rc.0
    # Creates self-dogfood-2.1.20 sprint container with auto-derived context
    # Idempotent (existing id graceful skip), audit: sprint_dogfood_started

9️⃣ Self-dogfood CI gate

    ./scripts/check-self-dogfood.sh
    # Verifies recent release was operated as sprint container (4 invariants per release)
    # --bootstrap-mode: skip invariant #1 (master plan §19.5 Exception)
    # --emergency-override <reason>: skip all invariants (SQM penalty -10)

v2.1.20 will be first true gate activation — v2.1.19 is the final Bootstrap Exception.
🔟 SQM (Sprint Quality Maturity Index) introduced
┌─── SQM (Sprint Quality Maturity) ─────────── 64.00 / 100 ─┐
│ Docs=Code 100 │ Self-Dogfood 10 │ Dogfooder 100 │
│ Report KPI 80 │ Dispatch — │ Convention 0 │
└──────────────────────────────────────────────────────────┘
6-component weighted score, .bkit/state/sqm-history.jsonl append-only history, SessionStart-ready dashboard panel. Projected v2.1.19 GA SQM: ~96 (master plan §7.2 target ≥85 well exceeded).
👀 User Experience Impact
| Scenario | Before v2.1.19 | After v2.1.19 |
|---|---|---|
| L1 sprint init | trustLevelAtStart=L3 silent default, no L1 warning | default L2 (Safe Defaults), --trust L1 emits stderr warning + audit |
| Sprint report | KPI Snapshot only, no Quality Gates section, kpi.matchRate=stale despite qg.M1=100 | ## Quality Gates section + qualityGates SoT precedence + divergence detection |
| Sprint context | `(not set)` placeholders despite master-plan.md filled | Auto-import from master-plan/PRD fallback chain |
| Gate-fail report | docs/03-analysis/ permanently "BLOCKED" after fix | Auto-marked "RESOLVED" with timestamp + reason on successful transition |
| External dogfooder feedback | No governance signal (internal trust components only) | Trust Score 7th component + Hall of Fame archive |
| Sprint orchestrator dispatch | Declared in v2.1.18 frontmatter but no test evidence | Contract baseline (5 TC) + e2e mocked + production code path verified |
| bkit self-dogfooding | bkit never ran its own sprint mgmt on its own releases | Bootstrap Exception pattern documented + /sprint dogfood + CI gate ready for v2.1.20 |
| SQM measurement | No quantitative sprint domain health metric | 6-component weighted score + history + SessionStart panel |
🧪 Tests (152 NEW, 100% PASS)
| Sub-sprint | New tests | Test count |
|---|---|---|
| S0 baseline | 30 | sqm-calculator unit (19) + result schema contract (7) + measure e2e (4) |
| S1 Foundation | 28 | dogfood (6) + default L2 (4) + annotate (3) + contract (5) + e2e dispatch (3) + ci-gate (7) [1 design-skip] |
| S2 Defense | 35 | path fix (4) + check-skills (17) + sprint audit (6) + baseline (5) + linter (3) |
| S4 Proactive | 14 | trust 7-component (5) + feedback-tracker (3) + 5 dandi scenarios (6) |
| S3 Polish | 40 | context-importer (10) + kpi-resolver (5) + generate-report-sot (9) + carry rationale (4) + lessons (4) + failure-resolution (8) |
| S5 Measurement | 5 | sqm-evolve (2) + history (1) + panel (2) |
| Total | 152 | 30 test files |
Final regression: all 30 files re-run, 152/152 PASS, 0 failures.
🧩 Methodology — Bootstrap Exception Pattern Validated 5x
S0 + S1 + S4 + S2 + S3 + S5 all completed under PDCA-with-sprint-shadow mode (main session as sub-agent manual proxy). 6 successful instantiations validate the pattern as a transitional protocol.
v2.1.20 will be the first true self-dogfood CI gate activation — scripts/check-self-dogfood.sh without --bootstrap-mode flag will hard-fail when releases don't operate as sprint containers.
🆙 Upgrade Guide
From v2.1.18

    cd ~/your/bkit-claude-code
    git fetch --tags
    git checkout v2.1.19
    # Restart Claude Code (or reload the plugin)
    cat bkit.config.json | grep version
    # → "version": "2.1.19"

Trust Score impact (if you have existing trust-profile.json)
Auto-migration:...
v2.1.18 — Sprint Trust UX Fix (Issues #100/#101/#102)
e726f15 bkit v2.1.18 — Sprint Trust UX Fix
The first release dedicated entirely to issue triage from the bkit community. v2.1.18 permanently closes the L1 sprint lockout incident class discovered and reported by @pruge on his `dandi-village-ledger` `s1-foundation` sprint. Three GitHub issues (#100/#101/#102), all filed within the same minute, were treated as a single 3-layer drift root cluster and fixed in one integrated sprint.
Released: 2026-05-21
Predecessor: v2.1.17 GA (5-axis matrix 5/5 close)
Merge commit: e726f15 (PR #106)
Reporter: @pruge — thank you for the precise repro and root-cause analysis 🙏
TL;DR
- 🔧 3 issues closed: #100, #101, #102 (all P0/P1 sprint-blocking bugs)
- 🚀 New command: /sprint trust <id> --to <Level> mutates an existing sprint's trust level in place, no more /sprint init destruction
- 🧠 sprint-orchestrator actually works now: Task tool added to allowlist; sub-agent dispatch live for the first time (ENH-292 activation milestone)
- 🤝 --trust alias honored: CLI behavior finally matches skills/sprint/SKILL.md §10.2
- ✅ 40 tests passing (17 contract + 15 unit + 8 e2e), about 2.9× the 14 TC target
- 🔄 100% backward compatible: no sprint state schema change; existing --trustLevel users unaffected
🌟 Highlights — What Changed for You
F1 · sprint-* agents now declare tools: (Closes #100)
Four sprint agents — sprint-orchestrator, sprint-master-planner, sprint-qa-flow, sprint-report-writer — now have an explicit tools: frontmatter that includes the Task tool where required. Before v2.1.18, the orchestrator could not actually dispatch the measurement agents it was specced to route through (measure-router.js:233-253 returned no_agent_runner). With v2.1.18, sub-agent dispatch is live for the first time.
Differentiation milestone: ENH-292 Sequential Dispatch promoted from "declared" to "live".
F2 · New /sprint trust command (Closes #101)
# Mutate an existing sprint's trust level without losing state
/sprint trust s1-foundation --to L3 --reason "P0 32/32 ready"
Before v2.1.18, a sprint initialized with --trust L1 was permanently locked in preview mode. The only escape was /sprint init — which destroyed phaseHistory, qualityGates, and featureMap accumulated over hours or days of work.
v2.1.18 adds:
- New CLI action: /sprint trust <id> --to <L0|L1|L2|L3|L4> [--reason "<text>"] [--force]
- New audit ACTION_TYPE: sprint_trust_changed (ACTION_TYPES 29 → 30)
- Downgrade guardrail: dropping ≥2 levels (e.g. L4 → L1) requires trustScore ≥ 80 (from .bkit/state/trust-profile.json) or explicit --force
- Idempotent path: from === to emits an audit entry with noop: true so monitoring never sees a blind spot
- Actor auto-detection: explicit --actor > CLAUDE_AGENT_ID env (→ 'agent') > default 'user'
- Defense Layer 6 natural integration: --force sets blastRadius: 'high' for the audit alarm pipeline (live evidence: .bkit/audit/2026-05-21.jsonl)
Documentation: skills/sprint/SKILL.md §10.1.3 (new section with comparison table and audit JSON example).
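The guardrail, idempotent path, and actor precedence listed above can be sketched roughly as follows. This is a minimal illustration, not bkit's handleTrust code: the function names (evaluateTrustChange, resolveActor), the reason strings, and the audit-entry shape are all hypothetical, and it assumes blastRadius is set whenever --force is passed.

```javascript
// Sketch only: names and return shapes are hypothetical, not bkit's real API.
const LEVELS = ['L0', 'L1', 'L2', 'L3', 'L4'];

// Actor precedence from the notes: explicit --actor > CLAUDE_AGENT_ID env > 'user'.
function resolveActor(args, env) {
  if (args.actor) return args.actor;
  if (env.CLAUDE_AGENT_ID) return 'agent';
  return 'user';
}

function evaluateTrustChange({ from, to, trustScore = 0, force = false }) {
  const fromIdx = LEVELS.indexOf(from);
  const toIdx = LEVELS.indexOf(to);
  if (fromIdx === -1 || toIdx === -1) {
    return { ok: false, reason: 'invalid_level' }; // hypothetical reason code
  }
  // Idempotent path: still emit an audit entry, flagged as a no-op.
  if (fromIdx === toIdx) {
    return { ok: true, audit: { action: 'sprint_trust_changed', from, to, noop: true } };
  }
  // Downgrade guardrail: a drop of two or more levels needs trustScore >= 80 or --force.
  const drop = fromIdx - toIdx; // positive when downgrading
  if (drop >= 2 && !force && trustScore < 80) {
    return { ok: false, reason: 'downgrade_guardrail' }; // hypothetical reason code
  }
  const audit = { action: 'sprint_trust_changed', from, to, noop: false };
  if (force) audit.blastRadius = 'high'; // assumption: applied on every forced change
  return { ok: true, audit };
}
```

Under these assumptions, an L1 → L3 upgrade passes without conditions, while L4 → L1 is rejected unless the trust score is at least 80 or --force is supplied.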
F3 · --trust CLI alias now honored (Closes #102)
# Both forms now produce identical behavior (v2.1.18)
/sprint measure s1-foundation --gate M1 --trust L3
/sprint measure s1-foundation --gate M1 --trustLevel L3
scripts/sprint-handler.js:942 and :974 previously bypassed normalizeTrustLevel and read args.trustLevel directly, so --trust L3 was silently ignored at the measure and phase paths. Both call sites now go through normalizeTrustLevel(args), restoring the declared precedence chain trustLevel > trust > trustLevelAtStart. Docs=Code drift resolved.
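For readers who want the precedence chain spelled out, here is a minimal sketch. It is not the actual normalizeTrustLevel: the second parameter (a sprint object carrying trustLevelAtStart) and the null return for "nothing set" are assumptions made for illustration.

```javascript
// Sketch of the declared precedence: trustLevel > trust > trustLevelAtStart.
// The `sprint` parameter and the null fallback are assumptions, not bkit's signature.
function normalizeTrustLevel(args = {}, sprint = {}) {
  const candidates = [args.trustLevel, args.trust, sprint.trustLevelAtStart];
  // Take the first candidate that is a non-empty string.
  const chosen = candidates.find((value) => typeof value === 'string' && value.length > 0);
  return chosen ?? null;
}
```

Routing every call site through one function like this is what keeps --trust and --trustLevel from drifting apart again.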
F4 · sprint-master-planner orchestrator expansion (★ user-requested)
sprint-master-planner now declares pm-lead, cto-lead, qa-lead in its Task allowlist — in addition to the existing specialist agents. Future sprints can natively orchestrate all three teams in parallel rather than relying on main-session manual dispatch.
👀 User Experience Impact
| Scenario | Before v2.1.18 | After v2.1.18 |
|---|---|---|
| L1 sprint lockout | Sprint stuck in do phase forever; only escape is /sprint init (destroys all phase history) | /sprint trust <id> --to L3 mutates trust in place; phase history, quality gates, and feature map preserved |
| Sprint orchestration | sprint-orchestrator could not dispatch measurement agents; main session had to act as pass-through | Orchestrator fulfills its specced routing responsibility; no main-session workaround required |
| CLI trust override | --trust L3 silently ignored at measure/phase; only --trustLevel L3 worked | Both --trust and --trustLevel honored per SKILL.md §10.2 precedence |
| Audit trail | Trust mutations invisible to /sprint status and downstream consumers | Every trust change persisted as sprint_trust_changed in .bkit/audit/<date>.jsonl with downgrade guardrail and --force alarm |
| Reporter scenario | @pruge s1-foundation sprint with --trust L1: 32/32 P0 implementation done, but /sprint measure always preview, phase advance always gate_fail | 8-step E2E test reproduces the original scenario 1:1 and runs green; sprint recoverable without reset |
🧪 Tests (40 TC live PASS)
| Layer | File | TC | Purpose |
|---|---|---|---|
| Contract | test/contract/sprint-agents-tools.test.js | 17 | F1: 4 sprint-* agents' tools: field invariant |
| Unit | test/unit/sprint-trust-normalization.test.js | 7 (A-G) | F3: normalizeTrustLevel precedence chain |
| Unit | test/unit/sprint-handler-trust-action.test.js | 8 | F2: handleTrust (mutation, guardrail, audit, actor) |
| E2E | test/e2e/sprint-l1-lockout-recovery.test.js | 8 | Reporter scenario 1:1 reproduction (init L1 → trust L1→L3 → measure record → audit verify → restart persistence) |
| Total | | 40 | About 2.9× the 14 TC target |
🧩 Methodology — First PM + CTO + QA Team Integration Sprint
v2.1.18 is the first bkit release driven end-to-end by all three Agent Teams in parallel:
- PM Team (pm-lead orchestrating 4 PM agents): produced a 570-line PRD with Beachhead 19/20 (Geoffrey Moore framework), JTBD 6-Part, 5 User Stories, 6 Test Scenarios, and a Pre-mortem Top 3
- CTO Team (cto-lead): architectural review APPROVE with CONCERNS; 3 BLOCKERs (controlScore → trustScore correction, ACTION_TYPES count 27 → 29, NDJSON injection assessment) + 3 MEDIUMs all addressed via redline
- QA Team (qa-lead): L1-L5 integrated verification report + reporter-scenario evidence (574 lines, QA_PASS)
Self-referential meta-risk: the sprint that fixes sprint-orchestrator itself cannot use sprint container automation. We outlined the chicken-and-egg pattern in Plan §6.1: sprint init for state tracking only, phase advance + measurement via the PDCA cycle. After F1 lands, future sprints regain full orchestration.
🔐 Differentiation 6/6 Status
- ENH-289 Defense Layer 6: strengthened (sprint_trust_changed joined the L6 audit pipeline, live evidence)
- ENH-292 Sequential Dispatch: activation milestone (declared → live in this release)
- ENH-286 Memory Enforcer / ENH-300 Effort-aware Defense / ENH-303 PostToolUse continueOnBlock / ENH-310 Heredoc Detector: unaffected, no regression
🆙 Upgrade Guide
From v2.1.17
# 1. Pull the latest plugin (in your bkit-claude-code clone)
cd ~/your/bkit-claude-code
git fetch --tags
git checkout v2.1.18
# 2. Restart Claude Code (or reload the plugin) to pick up the new agents
# 3. Verify version
cat bkit.config.json | grep version # → "version": "2.1.18"
If you have a sprint stuck in L1 lockout
# Old workaround (v2.1.16/v2.1.17): re-init (destroys state)
# ❌ /sprint init my-sprint --name "..." --trust L2 # phaseHistory lost
# New v2.1.18 path: mutate in place
/sprint trust my-sprint --to L3 --reason "ready to record measurements"
/sprint measure my-sprint --gate M1 # now records (mode: "record"), no longer preview
/sprint phase my-sprint --to iterate # gate transitions succeed
Breaking changes
None. v2.1.18 is 100% backward compatible:
- Existing --trustLevel L<N> users: precedence chain preserved (F3 Case G test)
- Sprint state schema: unchanged
- ADR 0003: 14/14 PASS, 16-cycle consistency milestone
🗂️ Carryovers — deferred to v2.1.19+
Not release blockers, captured in the sprint report (docs/04-report/features/v2118-sprint-trust-ux-fix.report.md):
- CO-1: baseline v2.1.18 capture script
- CO-2: qa-aggregate script
- CO-3: sprint-orchestrator auto phase-advance live test
- CO-4: sub-agent-dispatcher state transition test
- CO-5: L0 Manual mode escalate-path E2E
- CO-6: sprint-report-writer agent timeout fallback
Follow-up issues from the same reporter (#103, #104, #105 — failure-reporter resolution markers, sprint init context auto-import, generate-report quality-gates section) are acknowledged and slated for v2.1.19+ planning.
🙏 Credits
Massive thanks to @pruge (james kim) for the high-signal bug reports. The reproductions in #100/#101/#102 were so precise we could turn them directly into an E2E test (test/e2e/sprint-l1-lockout-recovery.test.js) before writing a single line of fix code. That's the kind of issue every maintainer dreams of.
If you hit similar sprint-related friction, please open an issue — the v2.1.18 turnaround (issue filed → fix released in 24h) is the sta...
bkit v2.1.17 — CI/CD Hardening, 5-Axis Matrix 5/5 Close
39f89e6 bkit v2.1.17 — CI/CD Hardening, 5-Axis Matrix 5/5 Close
Headline: Permanent closure of the 8-day Invocation Contract Check red incident class from 2026-05-12 to 2026-05-20. CI/CD maturity matrix (Detection / Enforcement / Recovery / Governance / Evolution) closed across all 5 axes. All 11 carryover items resolved.
🎯 Highlights
Incident Class Permanently Closed
On 2026-05-12, commit 967cd8f (refactor v2.1.13) removed six pdca-eval-* agents as dead code cleanup. The baseline v2.1.9 manifest was not updated, and the Agent surface lacked a deprecatedIn governance mechanism (which Skill already had). This caused the Invocation Contract Check workflow to fail on every push for 8 consecutive days. Releases v2.1.15 and v2.1.16 GA shipped while CI was red. This v2.1.17 release closes every known root cause and carryover in the incident class.
5-Axis Matrix Progression
| Axis | v2.1.16 GA | v2.1.17 |
|---|---|---|
| Detection | ◐ L1+L4 only | ●● Dual baseline + L2 + L3 + L5 mandatory + MCP schema |
| Enforcement | ✗ | ● Branch protection auto-applied (2 Required Status Checks) |
| Recovery | ✗ | ●● Rollforward SOP + tracked file policy guide |
| Governance | ◐ Skill only | ●● Skill + Agent + MCP symmetric + isolated tests (5+6 scenarios) |
| Evolution | ✗ | ●● Dual baseline + frontmatter util + SoT canonical names |
5/5 close ✅
📦 Changes
Detection
- Dual baseline: v2.1.9 LTS (long-term drift) + v2.1.16 Latest (noise floor) compared simultaneously
- L2 mandatory: l2-smoke.test.js (98 TC) + l2-hook-attribution.test.js (13 TC) integrated into workflow
- L3 mandatory: l3-mcp-compat.test.js (92 TC) + l3-mcp-runtime.test.js (48 TC) integrated into workflow
- L5 mandatory (CO-3): removed continue-on-error: true from invocation-inventory.test.js + added needs: contract-l1-l4 (203 → 210 TC with SoT-driven lists)
- MCP deprecation schema (CO-2): inline // @deprecated since vX.X.X replacedBy=Y annotation parsing
- scripts/check-test-tracking.js (CO-7): detects untracked test files across 18 production test paths (CI gate)
Enforcement
- scripts/setup-branch-protection.sh (CO-1, idempotent gh api wrapper), auto-applied to main:
  - Required Status Checks: Contract Test (L1 Frontmatter + L4 Deprecation), Contract Test L5 (Invocation Inventory)
  - strict: true, allow_force_pushes: false, allow_deletions: false
  - enforce_admins: false (admin override allowed for emergency hotfixes)
Recovery
- docs/06-guide/contract-baseline-rollforward.guide.md: LTS vs Latest policy, decision tree, capture/deprecation stub procedures, PR self-review checklist, incident log (8 sections)
- docs/06-guide/test-file-tracking-policy.guide.md (CO-6): .gitignore policy + PR checklist + incident log (9 sections)
- docs/06-guide/branch-protection-setup.guide.md (CO-1): admin SOP
Governance
- Agent deprecation governance: agents/<name>.md frontmatter with deprecatedIn: vX.X.X bypasses L4, symmetric with the Skill pattern
- 6 pdca-eval-* deprecation tombstones: agents/pdca-eval-{act,check,design,do,plan,pm}.md (permanent tombstones for the 5/12 cleanup)
- MCP tool deprecation governance: L4 bypass via baseline JSON deprecatedIn field, giving full symmetry across 3 surfaces (Skill / Agent / MCP)
- Agent-deprecation isolated test (CO-4): test/contract/agent-deprecation.test.js, 5 scenario fixture, 5/5 PASS
- MCP-deprecation e2e test (CO-2.1): test/contract/mcp-deprecation.test.js, 6 scenario fixture, 6/6 PASS
Evolution
- lib/util/frontmatter.js (CO-5): consolidated 5-site duplication into parseFrontmatter, parseFrontmatterFile, hasDeprecatedInFrontmatter, hasDeprecatedInFrontmatterFile, coerce
- v2.1.16 baseline captured (test/contract/baseline/v2.1.16/, 106 files)
- SoT canonical names lists (CO-3.1): added 6 lists to lib/domain/rules/docs-code-invariants.js: EXPECTED_ACTIVE_AGENT_NAMES, EXPECTED_DEPRECATED_AGENT_NAMES, EXPECTED_SKILL_NAMES, EXPECTED_HOOK_EVENT_NAMES, EXPECTED_PDCA_MCP_TOOLS, EXPECTED_ANALYSIS_MCP_TOOLS
Hygiene
- Removed 12 orphan JSON files from test/contract/baseline/v2.1.9/ (sprint-* agents/MCP tools/skills missing from manifest)
- Force-tracked 35+ previously untracked test files: tests/qa/ 29 + test/contract/ 5 + test/e2e/ 6 + test/integration/ 3 + test/unit/ 2 + test/v2110-qa/ 2
- .gitignore narrowed: removed test/ + tests/* blanket ignore → explicit local-only patterns
- scripts/check-deadcode.js EXEMPT pattern broadened (v2.1.13 sprint barrel, 3 files)
Framework Side-Effect Blocking
- collect* implicit-write prevention: { persist: false } option blocks baseline self-mutation
- --version path-injection validation (CO-1.1): regex ^[A-Za-z0-9._-]+$, exits with code 2 on invalid input
- --project-root flag: makes contract-test-run.js + contract-baseline-collect.js fixture-aware
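The --version validation can be illustrated with the quoted regex directly. The helper names below (isSafeVersion, validateVersionArg) are hypothetical, and the sketch returns the exit code instead of calling process.exit so it stays easy to test.

```javascript
// The pattern is the one quoted in the release notes; the helpers are illustrative.
const VERSION_PATTERN = /^[A-Za-z0-9._-]+$/;

function isSafeVersion(value) {
  return typeof value === 'string' && VERSION_PATTERN.test(value);
}

// Hypothetical CLI guard: 0 when the value is acceptable, 2 on invalid input.
function validateVersionArg(value) {
  return isSafeVersion(value) ? 0 : 2;
}
```

Because the character class excludes path separators, values such as ../../etc or v2.1.17/x are rejected before they reach any filesystem call. Note that the pattern as quoted still matches a bare `..`; whether bkit rejects that case separately is not stated in these notes.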
📊 Quantitative Results
| Metric | v2.1.16 GA | v2.1.17 | Delta |
|---|---|---|---|
| qa-aggregate PASS | 3,808 | 4,103 | +295 |
| qa-aggregate FAIL | 31 | 0 | -31 |
| qa-aggregate Errors | 4 | 0 | -4 |
| Mandatory workflow steps | 13 | 18 | +5 |
| Baseline snapshots | 1 | 2 (LTS + Latest) | +1 |
| Active agents | 34 | 34 | 0 |
| Deprecation tombstones | 0 | 6 | +6 |
| Frontmatter parse sites | 5 (duplicate) | 1 (lib/util/) | -4 |
| Hardcoded EXPECTED lists | 7 | 0 (SoT) | -7 |
| Branch protection | ✗ | 2 Required Checks | — |
| Carryover items | 11 | 0 | -11 |
| 5-Axis Matrix | 0/5 | 5/5 ✅ | — |
🗂 11 Carryover Closures
| ID | Item | Status |
|---|---|---|
| CO-1 | Branch protection automation | ✅ Script + applied |
| CO-1.1 | --version path-injection validation | ✅ Regex |
| CO-2 | MCP tool deprecation schema | ✅ parseMCPToolBlocks |
| CO-2.1 | MCP deprecation e2e test | ✅ 6/6 PASS |
| CO-3 | L5 E2E mandatory promotion | ✅ Workflow |
| CO-3.1 | L5 dynamic EXPECTED lists | ✅ SoT integration |
| CO-4 | Agent-deprecation isolated test | ✅ 5/5 PASS |
| CO-5 | frontmatter util extraction | ✅ 5 sites → 1 |
| CO-6 | Tracked file policy | ✅ Narrow + 35+ files |
| CO-7 | tests/qa dependency automation | ✅ check-test-tracking |
| CO-8 | branch-protection apply audit | ✅ admin applied & verified |
🔗 Pull Requests
- PR #97 (7acdd4f): v2.1.17 main scope — 4/5 axes close
- PR #99 (39f89e6): v2.1.17 final — 5 carryover items absorbed + 5/5 axes close
📚 Documentation
- docs/01-plan/features/v2117-ci-cd-hardening.plan.md: Plan
- docs/02-design/features/v2117-ci-cd-hardening.design.md: Design
- docs/03-analysis/features/v2117-ci-cd-hardening.analysis.md: Gap analysis
- docs/04-report/features/v2117-ci-cd-hardening.report.md: Completion report
- docs/06-guide/contract-baseline-rollforward.guide.md: Baseline SOP
- docs/06-guide/branch-protection-setup.guide.md: Branch protection SOP
- docs/06-guide/test-file-tracking-policy.guide.md: Test tracking policy
🙏 Origin
The incident class started with commit 967cd8f (refactor v2.1.13, 2026-05-12): a 6-agent dead code cleanup combined with an unupdated baseline produced an 8-day red period. This release closes every known framework gap.
🤖 Released with Claude Code
v2.1.16 — Quality Gates & Approval UX + Release Hardening
7330103 bkit v2.1.16 — Quality Gates & Approval UX + Release Hardening
The only Claude Code plugin that verifies AI-generated code against its own design specs.
This is a patch release that closes 4 GitHub issues reported by L2 trust users hitting Sprint quality-gate deadlocks, plus a release hardening sub-sprint that terminates the v2.1.14/15/16 recurring "thoroughly verified but defects shipped" pattern.
✨ Highlights
1. L2 trust deadlock is gone
Default-trust users (L2) can now drive Sprints to completion without escalating trust level or editing state JSON manually. Every quality gate hold now has a user-invokable command:
- /sprint phase <id> --to <next> --approve --reason "<text>": single-use Trust Level scope-boundary escape hatch (Issue #95)
- /sprint measure <id> --gate <key>: partial-gate manual measurement when the orchestrator missed one (Issue #94)
2. Quality gates auto-document their own failures
When advancePhase is blocked by gate_fail, bkit now automatically writes docs/03-analysis/<sprintId>-gate-fail-<phase>-<timestamp>.md with a detailed gate summary, suggested next commands (including /sprint measure --gate <key> and --approve hints), and persists lastGateFailure to sprint state for /sprint status to surface (Issue #93).
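As a rough illustration of the report naming and the suggested-command hints, consider the sketch below. The function names, the gateResults shape ({ threshold, current }), and the way the timestamp is derived (an ISO-8601 UTC string with colons and dots replaced by hyphens) are assumptions made for the example, not bkit's failure-reporter implementation.

```javascript
// Sketch only: helper names and the gateResults shape are hypothetical.
function gateFailReportPath(sprintId, phase, now = new Date()) {
  // Assumed timestamp form: ISO-8601 UTC with ':' and '.' replaced by '-'.
  const ts = now.toISOString().replace(/[:.]/g, '-');
  return `docs/03-analysis/${sprintId}-gate-fail-${phase}-${ts}.md`;
}

// Suggest a manual measurement for every gate that is unmeasured or below threshold.
function suggestCommands(sprintId, gateResults) {
  return Object.entries(gateResults)
    .filter(([, result]) => result.current === null || result.current < result.threshold)
    .map(([key]) => `/sprint measure ${sprintId} --gate ${key}`);
}
```

Deriving the file name from the sprint ID, phase, and timestamp means repeated failures at the same boundary produce separate reports rather than overwriting one another.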
3. sprint-orchestrator measurement responsibility clarified
At design exit, the orchestrator now records both M4_apiComplianceRate and M8_designCompleteness atomically through the new lifecycle.measurePhaseGates() use case. This eliminates the silent "M8=100, M4=null → gate_fail" deadlock reported in Issue #92. The canonical measurement path (lib/application/quality-gates/measure-router.js) is shared between auto-advance and manual /sprint measure so values agree.
4. Release hardening — tests exist but unused anti-pattern terminated
The user asked "are 5 fixes really all? did you verify all bkit features properly?" — this triggered a deeper audit that uncovered 31 stale test failures, 4 orphan test files (modules deleted in v2.1.10), and 3 release metadata drift defects that existing tests had been catching all along but were never wired into CI.
All resolved. CI now mandates two new release gates: bkit-full-system (version sync + structure) and docs-code-sync (counts SoT drift detector).
🎯 User Experience Changes
Before v2.1.16 (the L2 user's journey)
$ /sprint init my-sprint --name "Test" --trust L2 --features f1,f2
$ /sprint start my-sprint
# ... auto-advances prd → plan → design ...
# pauses at design boundary (L2 scope.stopAfter)
$ /sprint phase my-sprint --to do
{
"ok": false,
"reason": "requires_user_approval",
"stopAfter": "design"
}
# 🛑 deadlock — no command to give approval
# workarounds violated bkit philosophy: edit state JSON OR escalate trust
After v2.1.16
$ /sprint phase my-sprint --to do --approve --reason "design review complete"
{
"ok": true,
"phase": "do",
"approvalRecord": {
"sprintId": "my-sprint",
"from": "design",
"to": "do",
"trustLevel": "L2",
"stopAfter": "design",
"approvedBy": "user",
"reason": "design review complete"
}
}
# ✅ proceeds — single-use approval, sprint.autoRun.scope NOT mutated
# next boundary will face the same check (deliberate "controllable AI" design)
Quality gate failure UX
$ /sprint phase my-sprint --to do
{
"ok": false,
"reason": "gate_fail",
"reportPath": "docs/03-analysis/my-sprint-gate-fail-design-2026-05-20T05-31-22-411Z.md",
"gateResults": { ... }
}
$ cat docs/03-analysis/my-sprint-gate-fail-design-*.md
# Gate Failure Report — my-sprint @ design→do
# ...
| Sprint Phase | Gate | Status | Expected | Actual | Suggested Action |
| design | M4 | FAIL | threshold = 95 | null (not_measured) | /sprint measure --gate M4 |
| design | M8 | FAIL | threshold = 85 | null (not_measured) | /sprint measure --gate M8 |
# ...suggested next commands inline...
/sprint measure — first-class command
$ /sprint measure my-sprint --gate M4
# routes to agents/gap-detector.md via GATE_MEASUREMENT_ROUTES
# writes back to sprint.qualityGates.M4_apiComplianceRate.current
# audit-logged as gate_measured (at L2+ trust)
7 gates supported: M1 M2 M3 M4 M7 M8 S1.
📊 Numbers
| Metric | Before | After |
|---|---|---|
| Open issues blocking L2 users | 4 (#92/93/94/95) | 0 |
| L3 sprint contract tests | 10/10 PASS | 14/14 PASS (+4 new SC) |
| Aggregate test PASS | 3,808 | 3,844 (+36) |
| Aggregate test FAIL | 31 | 0 |
| Test files with errors | 15 | 0 |
| Test files | 151 | 147 (-4 orphan) |
| Release metadata defects | 3 | 0 |
| Mandatory CI gates | 8 | 10 (+2 release gates) |
🐛 Bug Fixes
- #92: sprint-orchestrator agent records both M4_apiComplianceRate and M8_designCompleteness at design exit (atomic dual record via lifecycle.measurePhaseGates). Resolves silent gate_fail deadlock on design → do transition. (Reporter: @pruge, bkit v2.1.14, CC v2.1.140, L2 trust)
✨ Added
- #95: /sprint phase <id> --to <next> --approve [--reason "<text>"] is a single-use Trust Level scope-boundary escape hatch. Records audit-logger entry scope_boundary_approved with full provenance. sprint.autoRun.scope not mutated (deliberate "controllable AI" design). (Reporter: @pruge)
- #94: /sprint measure <id> --gate <key> and --gates <K1,K2,...> and --phase <name> form the partial-gate manual measurement command. Routes through lib/application/quality-gates/measure-router.js to the canonical measurement agent (gap-detector for M4, etc.). Supports 7 gates: M1, M2, M3, M4, M7, M8, S1. (Reporter: @pruge)
- #93: Automatic gate-failure report generation. advancePhase now invokes a failureReporter dep on gate_fail, writing docs/03-analysis/<sprintId>-gate-fail-<phase>-<ts>.md with gate summary, suggested next commands, and audit trail pointers. sprint.lastGateFailure state field persists for /sprint status surfacing. (Reporter: @pruge)
🛠 Release Hardening (sub-sprint)
- Layer A, release metadata sync (3 files): README badge Version-2.1.14 → Version-2.1.16, hooks/session-start.js and hooks/startup/session-context.js version comments synced.
- Layer B, test maintenance (15 files, 31 stale FAIL → 0): 4 orphan test files removed (modules deleted in v2.1.10 Sprint 6), 11 stale baselines updated to current SoT (skills 43 → 44, agents 36 → 34, mcpTools 16 → 19, lib modules 142 → 177+, status-core exports 17 → 19, L3 contract 10/10 → 14/14, JS floating-point comparison epsilon fix, etc.).
- Layer C, CI gate reinforcement (1 file): .github/workflows/contract-check.yml adds two mandatory steps, bkit-full-system (version sync + agent/skill structure) and docs-code-sync (counts SoT drift detector). Future PR/push auto-blocks on release metadata drift.
🧱 Architecture
New modules:
- lib/application/quality-gates/measure-router.js: pure dispatcher mapping gate keys (M1/M2/M3/M4/M7/M8/S1) to agents and result schemas
- lib/application/quality-gates/failure-reporter.js: pure markdown builder + side-effecting writeReport + factory createFailureReporter
- lib/application/sprint-lifecycle/measure-gate.usecase.js: measureGate / measureGates / measurePhaseGates (atomic phase exit measurement)
- templates/gate-failure-report.template.md: substitutable template
New audit action: scope_boundary_approved (28th ACTION_TYPE).
L3 contract grew with 4 new structural invariants: SC-11 (#92), SC-12 (#95), SC-13 (#94), SC-14 (#93).
📚 Documentation
Following bkit PDCA methodology (Plan → Design → Do → Check → Act), full documentation chain shipped:
- docs/01-plan/features/v2116-issue-fixes.master-plan.md: sprint master plan
- docs/01-plan/features/v2116-release-hardening.plan.md: hardening plan
- docs/02-design/features/v2116-release-hardening.design.md: hardening design
- docs/04-report/features/v2116-issue-fixes.report.md: feature fix report
- docs/04-report/features/v2116-release-hardening.report.md: hardening report
- CHANGELOG.md v2.1.16 section with full hardening sub-section
🙏 Thanks
Special thanks to @pruge for the 4 well-written reports (#92, #93, #94, #95) that pinpointed the exact L2 deadlock surfaces.
📦 Install / Upgrade
# Claude Code plugin marketplace
claude plugin install bkit
# Or upgrade existing
claude plugin upgrade bkit
Recommended CC version: v2.1.123+ (conservative) or v2.1.144 (balanced) — bkit v2.1.16 verified compatible with 99 consecutive CC releases from v2.1.34 through v2.1.144.
🔗 Links
- PR: #96
- Closed issues: #92, #93, #94, #95
- Plan / Design / Report: see Documentation section above
- Full changelog: CHANGELOG.md
🤖 Generated with Claude Code
v2.1.15 — Issue #89 fix: .pdca-status.json pollution (6-Layer Defense)
b65d336 🎯 What's New in v2.1.15
Patch release — A defense-in-depth fix for #89, reported by @doing27. This release stops .pdca-status.json from growing unbounded with garbage feature entries on every source-file edit.
TL;DR
Before: .pdca-status.json could grow to 294 KB with 138 of 147 features entries being garbage and 1,661 of 1,669 history entries being noise, all written silently by the PreToolUse(Write|Edit) hook.
After v2.1.15: 6-Layer Defense ensures only PDCA-registered features (with an existing plan or design document) ever enter .pdca-status.json, history is deduplicated and ring-buffered to 100 entries, and 48 unit tests guard against regressions.
🔍 Root-Cause Analysis (Deep)
Issue #89 surfaced two bugs (extractFeature misextraction + updatePdcaStatus gating absence). During this release we discovered three additional latent bugs in the same code path and fixed all of them together:
| # | Latent bug | File:Line | Symptom |
|---|---|---|---|
| 1 | extractFeature captures filenames as features | lib/core/file.js:109 | app/services/broadcast_service.py → 'broadcast_service.py' (a file, not a feature) |
| 2 | extractFeature fallback returns generic dirs | lib/core/file.js:129 | apps/cms/v1/users.py → 'cms'/'v1' registered as features |
| 3 | updatePdcaStatus has no validation gate | lib/pdca/status-core.js:164 | Every invocation registers a feature unconditionally |
| 4 | pre-write.js reads a non-existent schema field | scripts/pre-write.js:101 | Read currentFeature (v1 schema name) instead of primaryFeature (v2/v3 schema). The v2.1.7 "phantom-feature guard" never actually fired |
| 5 | extractFeatureFromContext duplicates the buggy pattern-match | lib/pdca/status-core.js:320 | DRY violation: same bug as 1+2 cloned into a second module |
| 6 | updatePdcaStatus.history.push has no limit/dedup | lib/pdca/status-core.js:210 | Unbounded growth; addPdcaHistory had a 100-entry cap, but the direct push did not |
🛡️ The 6-Layer Defense
edit event
│
▼
[L1] extractFeature → reject file-like captures + extended GENERIC_NAMES + fallback opt-in
│
▼ (legitimate feature candidate)
[L2] extractFeatureFromContext → delegates to L1 (DRY)
│
▼
[L4] pre-write.js → corrected schema field (primaryFeature)
│
▼
[L3] updatePdcaStatus → plan/design document gate (default ON)
│
▼
[L5] history dedup + ring buffer
│
▼
[L6] 48 unit tests → regression prevention (CI-tracked)
Layer 1 — lib/core/file.js
- Reject filename captures: the pattern-matching loop now skips any candidate that has a file extension (path.extname(captured) non-empty). app/services/broadcast_service.py no longer extracts 'broadcast_service.py'.
- GENERIC_NAMES expanded 19 → 65 entries: covers common backend/frontend layout directories (api, web, mobile, client, server, backend, frontend, admin, auth, cms, database, config, core, helpers, middleware, plugins, scripts, styles, static, public, assets, tests, tenants, versions, tmp, audit, dashboard), version directories (v1–v9), and Next.js route groups ((dashboard), (auth), (public), (admin), (api)).
- Fallback is now explicit opt-in: extractFeature(filePath, { allowFallback: true }). Existing callers receive the safer default (no parent-directory walk) automatically. Backward-compatible: the second argument is optional.
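The three Layer 1 rules can be sketched together as follows. This is an illustration under stated assumptions, not the code in lib/core/file.js: the capture patterns (FEATURE_PATTERNS) are invented for the example because the real pattern list is not shown here, and GENERIC_NAMES is an abbreviated subset of the 65 entries.

```javascript
const path = require('node:path');

// Abbreviated subset of GENERIC_NAMES, for illustration only.
const GENERIC_NAMES = new Set(['api', 'auth', 'cms', 'admin', 'dashboard', 'core', 'v1', 'v2']);

// Hypothetical capture patterns; bkit's real patterns are not listed in these notes.
const FEATURE_PATTERNS = [
  /(?:^|\/)features\/([^/]+)/,
  /(?:^|\/)services\/([^/]+)/,
  /(?:^|\/)apps\/([^/]+)/,
];

function extractFeature(filePath, { allowFallback = false } = {}) {
  const normalized = filePath.replace(/\\/g, '/');
  for (const pattern of FEATURE_PATTERNS) {
    const match = pattern.exec(normalized);
    if (!match) continue;
    const captured = match[1];
    if (path.extname(captured) !== '') continue; // file-like capture: reject
    if (GENERIC_NAMES.has(captured.toLowerCase())) continue; // generic directory: reject
    return captured;
  }
  if (!allowFallback) return ''; // safer default: no parent-directory walk
  // Opt-in fallback: first non-generic parent directory.
  const dirs = normalized.split('/').slice(0, -1);
  return dirs.find((dir) => dir && !GENERIC_NAMES.has(dir.toLowerCase())) ?? '';
}
```

With these assumptions, app/services/broadcast_service.py and apps/cms/v1/users.py both yield an empty string, while a path under features/billing/ yields billing.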
Layer 2 — lib/pdca/status-core.js
extractFeatureFromContext now delegates to extractFeature. The duplicated, buggy pattern-matching loop is removed (DRY).
Layer 3 — lib/pdca/status-core.js
updatePdcaStatus(feature, phase, data, opts = {}) accepts a new opts.requireDocs flag (default true). When enabled, the function performs a findPlanDoc(feature) || findDesignDoc(feature) check; if neither exists, it returns silently (with a debugLog entry for forensics).
- All 16 existing callers continue to work unchanged: the PDCA lifecycle always creates a plan document before downstream phases invoke updatePdcaStatus.
- The shouldUpdate(feature, requireDocs, docCheckFn) helper is extracted as a pure function for unit testability and dependency injection (the docCheckFn parameter lets tests inject a stub).
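A minimal sketch of the gate, using the signature given above. The body is an assumption about how such a helper might look, not the actual implementation; the empty-feature check in particular is added for the example.

```javascript
// Signature from the release notes; the body is an illustrative assumption.
function shouldUpdate(feature, requireDocs, docCheckFn) {
  if (typeof feature !== 'string' || feature.length === 0) return false; // assumed guard
  if (!requireDocs) return true; // gate disabled: always update
  return Boolean(docCheckFn(feature)); // gate enabled: a plan or design doc must exist
}

// In a test, docCheckFn can be a stub instead of a filesystem lookup.
const knownFeatures = new Set(['billing']);
const stubDocCheck = (feature) => knownFeatures.has(feature);
```

Injecting docCheckFn keeps the decision logic free of filesystem access, which is what makes it straightforward to cover in unit tests.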
Layer 4 — scripts/pre-write.js
The v2.1.7 phantom-feature guard read currentStatus?.currentFeature, a v1-schema field that does not exist in v2/v3 schemas. The migration code in status-migration.js:31,74 renames currentFeature → primaryFeature, so the guard's comparison was always undefined === <feature> (always false), and the guard was silently skipped on every edit.
Fixed: now reads currentStatus?.primaryFeature, restoring the guard's intended behavior. Combined with the L3 document gate, this gives two independent layers of protection.
Layer 5 — lib/pdca/status-core.js
A new appendHistoryEntry(history, entry, limit = 100) pure function:
- If the last entry has the same feature, phase, and action, only the timestamp is updated (no push).
- Otherwise, push and apply a ring buffer of limit entries (default 100).
Effect: 100 identical edits = 1 history entry (timestamp refreshed each time). Phase transitions and feature changes still produce new entries normally.
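The dedup-plus-ring-buffer behavior can be sketched like this. It follows the signature quoted above but is not the shipped code: it returns a new array rather than mutating its input (one reading of "pure function") and assumes limit is at least 1.

```javascript
// Illustrative sketch; returns a new array and assumes limit >= 1.
function appendHistoryEntry(history, entry, limit = 100) {
  const last = history[history.length - 1];
  const isRepeat =
    last !== undefined &&
    last.feature === entry.feature &&
    last.phase === entry.phase &&
    last.action === entry.action;
  if (isRepeat) {
    // Same feature/phase/action as the last entry: refresh its timestamp only.
    return [...history.slice(0, -1), { ...last, timestamp: entry.timestamp }];
  }
  // New kind of entry: append, then keep only the most recent `limit` entries.
  return [...history, entry].slice(-limit);
}
```

Only consecutive duplicates collapse, so a phase transition followed by a return to the earlier phase still records both entries.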
Layer 6 — Unit tests (tests/unit/)
| Test file | TCs | Verifies |
|---|---|---|
| tests/unit/file-extract-feature.test.js | 20 | L1 (extractFeature): filename rejection, GENERIC_NAMES enforcement, fallback opt-in |
| tests/unit/extract-feature-from-context.test.js | 10 | L2 (delegation correctness) |
| tests/unit/pdca-status-gating.test.js | 18 | L3 (shouldUpdate) + L5 (appendHistoryEntry): dedup, ring buffer, custom-limit, Issue #89 scenario |
| Total | 48 | 48/48 PASS |
.gitignore now tracks tests/unit/** (matching the existing policy for tests/contract/**), so these tests run in CI.
👤 User Experience — What Changes
1. .pdca-status.json stays small and accurate
| Scenario | Before v2.1.15 | After v2.1.15 |
|---|---|---|
| Edit app/services/broadcast_service.py 50 times | features.broadcast_service.py appears + 50 history entries | No features entry created; no history entries |
| Edit apps/cms/v1/users.py | features.cms appears as phantom | No entry: cms and v1 are now in GENERIC_NAMES |
| Edit a file belonging to feature auth (registered via docs/01-plan/features/auth.plan.md) | 50 history entries | 1 history entry (timestamp refreshed) |
| File path with no matching pattern (e.g. random/path/foo.py) | 'random' registered via fallback | Returns '' (fallback is now opt-in) |
| Long-running session, hundreds of edits | .pdca-status.json grows to hundreds of KB | Stays under a few KB, history capped at 100 |
2. PDCA workflow is unchanged for legitimate cycles
- /pdca plan billing creates docs/01-plan/features/billing.plan.md → L3 gate passes → updatePdcaStatus('billing', 'do', ...) registers normally.
- All 16 existing updatePdcaStatus call sites (hook scripts, lifecycle, batch orchestrator, etc.) continue to work because PDCA's natural ordering guarantees a plan document exists before downstream phases run.
3. Generic directory names are no longer auto-registered
auth, cms, dashboard, admin, api, etc. are now in GENERIC_NAMES. Trade-off: if you want one of these as a real PDCA feature, register it explicitly with /pdca plan auth (which creates the plan document and sets primaryFeature). The L3 gate then accepts it on subsequent edits. The previous behavior of silently auto-creating feature entries for such common directory names was the primary source of garbage.
4. History is now noise-free
The bkit dashboard, audit log, and any tools that read .pdca-status.json.history get meaningful entries only: real phase transitions and real feature changes. No more "updated × 1,661" noise drowning out the 8 real cycles.
5. Zero migration required
This is a patch release with no breaking changes. Upgrade by re-installing or running git pull in your bkit plugin directory. The fix is active immediately.
Optional cleanup: if your existing .pdca-status.json is already polluted, back it up and let bkit re-initialize:
mv .bkit/state/pdca-status.json .bkit/state/pdca-status.json.bak
# bkit will recreate a clean v3 schema on next session
🔁 Compatibility
| Item | Status | Note |
|---|---|---|
| Breaking changes | ✅ None | All 16 existing updatePdcaStatus callers receive default behavior |
| Schema migration | ✅ Not required | v3 schema unchanged |
| `extractFeature(filePath)` → `extractFeature(filePath, opts = {})` | ✅ Backward-compatible | Second argument optional |
| `updatePdcaStatus(feature, phase, data)` → `updatePdcaStatus(feature, phase, data, opts = {})` | ✅ Backward-compatible | Fourth argument optional |
| Sprint Management (v2.1.13 GA) | ✅ Unaffected | |
| bkit Trust Level (L0–L4) | ✅ Unaffected | |
| Soft change | `auth`/`cms`/`dashboard` are now in `GENERIC_NAMES` | Workaround: register explicitly via `/pdca plan <feature>` |
📁 Files Changed (16)
Code (3)
- `lib/core/file.js` (Layer 1)
- `lib/pdca/status-core.js` (Layers 2, 3, 5)
- `scripts/pre-write.js` (Layer 4)

Unit tests (3, new)
- `tests/unit/file-extract-feature.test.js` (20 TCs)
- `tests/unit/extract-feature-from-context.test.js` (10 TCs)
- `tests/unit/pdca-status-gating.test.js` (18 TCs)

Metadata + version sync (6)
- `bkit.config.json`
- `.claude-plugin/plugin.json`
- `.claude-plugin/marketplace.json`
- `hooks/hooks.json`
- `scripts/unified-bash-post.js`
- `.gitignore` (added `!t...
v2.1.14 — Differentiation Release (Memory Enforcer + Layer 6 + Sequential Dispatch + Effort-aware + PostToolUse + Heredoc-bypass)
a8091c0 bkit v2.1.14 — Differentiation Release
The only Claude Code plugin that verifies AI-generated code against its own design specs.
bkit v2.1.14 is a product-moat release. Every one of bkit's differentiations against vanilla Claude Code is now physically implemented, contract-tested, and proven by live dogfooding evidence collected during this release cycle.
Verified on Claude Code v2.1.140. 95 consecutive compatible CC releases (v2.1.34 ~ v2.1.141, R-2 skip versions excluded). Zero breaking changes for existing v2.1.13 users — the upgrade is drop-in.
✨ Highlights
Six differentiations, all enforced
The previous "five differentiations" framing is retired. v2.1.14 promotes one new defense and makes the full set physically implemented (no more "advisory") and contract-tested.
| # | Differentiation | Mitigates | Implementation |
|---|---|---|---|
| 1 | Memory Enforcer | CC's CLAUDE.md is advisory — models override it | PreToolUse deny with audit trail |
| 2 | Layer 6 Defense | R-3 safety-hook-ignored regressions (10+ evolved forms tracked) | Post-hoc audit + alarm + auto-rollback |
| 3 | Sequential dispatch | CC #56293 sub-agent caching 10x regression (11-streak unresolved) | 8-state dispatcher: first sequential → warmup sample → restore parallel |
| 4 | Effort-aware adaptive defense | CC's new `effort.level` hook field exposed but no defense uses it | Invariant 10 (ADR 0010) routes defense intensity off effort tier |
| 5 | PostToolUse `continueOnBlock` | CC silent PostToolUse drop (#57317); bkit's three PostToolUse hooks survive failures gracefully | Block-aware continuation flag across `unified-bash-post`, `unified-write-post`, `skill-post` |
| 6 | Heredoc-pipe bypass guard | CC #58904: model can smuggle shell input through `$(cmd <<TAG ... TAG)` past defense | Token-aware heredoc-inside-substitution detector |
Live dogfooding caught two events during this release
Two events were captured while shipping v2.1.14 itself, recorded in docs/sprint/v2114/sub-sprint-6-observation.report.md:
- Memory Enforcer deny on an out-of-scope `Write` from a sub-agent: the exact case the feature exists for.
- Heredoc Detector classify on the v2.1.14 release commit message itself: the guard correctly fired on `git commit -m "$(cat <<'EOF' ... EOF)"`, forcing this release to use the safer `git commit -F file.txt` path instead. Reviewers can see this in the PR conversation.
Architecture growth (v2.1.13 → v2.1.14)
| | v2.1.13 | v2.1.14 | Delta |
|---|---|---|---|
| Lib subdirs | 19 | 20 | +lib/defense/ |
| Port↔Adapter pairs | 7 | 8 | +caching-cost |
| ADR invariants | 9 | 10 | +0010 effort-aware |
| Lib modules | 163 | 174 | +11 (5 defense + 6 orchestrator/domain/infra) |
| Contract suites | 3 base + 5 v2.1.13 | 3 base + 5 v2.1.13 + 5 v2.1.14 | +22 TC |
| Skills / Agents / Hooks | 44 / 34 / 21 events / 24 blocks | unchanged | — |
Closed work this release
- CARRY-5: OTEL token-meter zero-entries root cause identified and closed (subprocess env hydration via `lib/infra/otel-env-capturer.js`).
- F9-120 closure: `claude plugin validate .` exits 0 across 10 consecutive CC releases (v2.1.120/121/123/129/132/133/137/139/140/141). Carryover officially retired.
🧑‍💻 User Experience Changes
This section explains what changes for you in day-to-day use. TL;DR: bkit gets safer and quieter — you do not need to change your workflow.
What you'll feel differently
1. Some shell commands you used to run will now be blocked
If you (or a sub-agent on your behalf) try to run a shell command that contains a heredoc inside a command substitution — for example:
```bash
git commit -m "$(cat <<'EOF'
... message ...
EOF
)"
```
...you'll see a clear block message with safer alternatives. Use git commit -F file.txt or git commit -m '...' instead. This is the heredoc-bypass guard (differentiation #6) protecting against CC #58904. The block ships with three suggested alternatives so you can usually fix it in one move.
If you're certain a specific case is safe in your context, the guard's source (lib/defense/heredoc-detector.js) is auditable and the block can be lifted per-tool-call after manual review.
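To make the pattern concrete, here is a deliberately simplified scanner that flags a heredoc operator (`<<`) appearing inside `$( ... )`. It is a sketch under stated assumptions and not the code in `lib/defense/heredoc-detector.js`: the function name is hypothetical, and it only tracks single quotes, double quotes, parentheses, and here-strings (`<<<`).

```javascript
// Simplified, hypothetical detector for "heredoc inside command substitution".
function hasHeredocInSubstitution(command) {
  const stack = []; // open contexts: 'sub' = $( ... ), 'dq' = "...", 'paren' = ( ... )
  for (let i = 0; i < command.length; i++) {
    const ch = command[i];
    const top = stack[stack.length - 1];

    if (ch === '\\') { i++; continue; } // skip the escaped character

    if (top === 'dq') {
      if (ch === '"') stack.pop(); // closing double quote
      else if (ch === '$' && command[i + 1] === '(') { stack.push('sub'); i++; }
      continue; // everything else inside double quotes is literal text
    }

    if (ch === "'") { // single quotes: literal, no expansion inside
      const close = command.indexOf("'", i + 1);
      if (close === -1) break; // unterminated quote: stop scanning
      i = close;
    } else if (ch === '"') {
      stack.push('dq');
    } else if (ch === '$' && command[i + 1] === '(') {
      stack.push('sub');
      i++;
    } else if (ch === '(') {
      stack.push('paren');
    } else if (ch === ')') {
      if (stack.length > 0) stack.pop();
    } else if (ch === '<' && command[i + 1] === '<') {
      if (command[i + 2] === '<') { i += 2; continue; } // here-string, not a heredoc
      if (stack.includes('sub')) return true;
      i++; // heredoc outside any substitution: allowed by this check
    }
  }
  return false;
}

console.log(hasHeredocInSubstitution('git commit -m "$(cat <<\'EOF\'\nmsg\nEOF\n)"')); // true
console.log(hasHeredocInSubstitution('git commit -F file.txt'));                      // false
```

Known gaps in this sketch: arithmetic expansion such as `$((1 << 2))` would be flagged as a false positive, and backtick substitution is not inspected at all.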
2. Sub-agent batches feel slower for the first spawn — then faster overall
When bkit spawns multiple sub-agents (Plan parallel, Design council, Do swarm, Check council), the first sibling now runs sequentially so its cache warms up before the rest go parallel. You'll see one extra spinner tick at the start of a multi-agent step, but total cost drops dramatically — pre-fix runs were paying ~10x cache_creation_input_tokens per spawn because of CC #56293.
If you trust your context and want the old behavior, set BKIT_SEQUENTIAL_DISPATCH=0 before launching CC. At Trust Level L4 sequential dispatch is forced regardless.
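The warm-up idea can be illustrated with a small batching helper: run the first task alone, then the rest in parallel. This is a loose sketch, not bkit's 8-state dispatcher; the function names and the `'L4'` string representation of the trust level are assumptions.

```javascript
// Decide whether the warm-up step applies (names and trust-level encoding are hypothetical).
function sequentialDispatchEnabled(env, trustLevel) {
  if (trustLevel === 'L4') return true;        // forced at L4, per the notes
  return env.BKIT_SEQUENTIAL_DISPATCH !== '0'; // opt out via the env var
}

// Batches run one after another; tasks inside a batch run in parallel.
function planBatches(tasks, sequentialFirst) {
  if (tasks.length === 0) return [];
  if (!sequentialFirst || tasks.length === 1) return [tasks.slice()];
  return [[tasks[0]], tasks.slice(1)]; // first sibling alone, then the rest together
}

// Each task is assumed to be a function returning a promise.
async function dispatch(tasks, env = process.env, trustLevel = 'L2') {
  const results = [];
  const batches = planBatches(tasks, sequentialDispatchEnabled(env, trustLevel));
  for (const batch of batches) {
    results.push(...(await Promise.all(batch.map((task) => task()))));
  }
  return results;
}

console.log(planBatches(['plan-a', 'plan-b', 'plan-c'], true));
// [ [ 'plan-a' ], [ 'plan-b', 'plan-c' ] ]
```

Keeping the plan separate from the runner makes the ordering decision easy to test without spawning anything.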
3. Memory rules are now actually enforced, not just suggested
When CLAUDE.md / project memory says "never touch X", bkit's Memory Enforcer now physically blocks the write at PreToolUse and emits an audit log entry — rather than logging an advisory that the model can ignore. If you've been writing memory rules and seeing them silently bypassed, this is the fix.
If you need to override for a one-off, the deny carries an explicit "manual override" instruction in its block message.
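As a rough picture of "deny with audit trail", the sketch below checks a write against a list of memory rules and records every denial. The rule format, the function name, and the returned object are all hypothetical; in particular, the return value is not Claude Code's hook output schema.

```javascript
// Hypothetical rule list; in practice rules would be derived from CLAUDE.md / project memory.
const memoryRules = [
  { pattern: /^secrets\//, rule: 'never touch secrets/' },
];

// Returns an allow/deny decision and appends an audit entry on deny.
function checkWrite(toolCall, rules, auditLog) {
  const isWrite = toolCall.tool === 'Write' || toolCall.tool === 'Edit';
  const hit = isWrite ? rules.find((r) => r.pattern.test(toolCall.filePath)) : undefined;
  if (!hit) return { decision: 'allow' };

  auditLog.push({
    event: 'memory-enforcer-deny',
    tool: toolCall.tool,
    filePath: toolCall.filePath,
    rule: hit.rule,
  });
  return {
    decision: 'deny',
    reason: `Blocked by memory rule "${hit.rule}". Manual override: confirm with the user first.`,
  };
}

const auditLog = [];
console.log(checkWrite({ tool: 'Write', filePath: 'secrets/key.pem' }, memoryRules, auditLog).decision); // 'deny'
console.log(checkWrite({ tool: 'Write', filePath: 'src/app.js' }, memoryRules, auditLog).decision);      // 'allow'
```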
4. Defense intensity now adapts to effort.level
CC v2.1.133+ exposes an effort.level field on hook payloads (think "how hard is this turn working"). bkit's defenses now read this field via Invariant 10 (ADR 0010) and dial themselves down for trivial turns and up for heavy turns. You should see fewer false positives on quick edits and stronger checks on long multi-step turns.
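A minimal sketch of routing defense intensity off the effort tier might look like this. The level names (`low`, `medium`, `high`) and the intensity labels are assumptions for illustration; the notes only say that a field named `effort.level` exists on hook payloads.

```javascript
// Assumed effort levels and intensity labels; neither is confirmed by the release notes.
const INTENSITY_BY_EFFORT = new Map([
  ['low', 'light'],
  ['medium', 'standard'],
  ['high', 'strict'],
]);

// Falls back to 'standard' when the field is missing or unrecognized.
function defenseIntensity(hookPayload) {
  const level = hookPayload?.effort?.level;
  return INTENSITY_BY_EFFORT.get(level) ?? 'standard';
}

console.log(defenseIntensity({ effort: { level: 'low' } }));  // 'light'
console.log(defenseIntensity({ effort: { level: 'high' } })); // 'strict'
console.log(defenseIntensity({}));                             // 'standard'
```

Defaulting to the middle tier keeps behavior sensible on CC versions that do not send the field.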
5. PostToolUse hooks survive their own failures
CC's silent PostToolUse drop (#57317) used to mean a single failing hook could kill all downstream observability. v2.1.14's continueOnBlock flag means a failure in unified-bash-post no longer prevents skill-post from logging. Your audit trail is more complete with no action on your part.
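The effect described here, one failing hook no longer silencing the ones after it, can be sketched with a small runner. This is an illustration of the idea only: the hook object shape and the runner are hypothetical, and the exact semantics of `continueOnBlock` in bkit may differ.

```javascript
// Runs hooks in order; a failure stops the chain only when continueOnBlock is not set.
function runPostToolHooks(hooks, payload) {
  const outcomes = [];
  for (const hook of hooks) {
    try {
      hook.run(payload);
      outcomes.push({ name: hook.name, ok: true });
    } catch (err) {
      outcomes.push({ name: hook.name, ok: false, error: String(err.message || err) });
      if (!hook.continueOnBlock) break; // old behavior: downstream hooks never run
    }
  }
  return outcomes;
}

const logged = [];
const outcomes = runPostToolHooks(
  [
    { name: 'unified-bash-post', continueOnBlock: true, run: () => { throw new Error('boom'); } },
    { name: 'skill-post', continueOnBlock: true, run: (p) => logged.push(p.tool) },
  ],
  { tool: 'Bash' }
);
console.log(outcomes.map((o) => `${o.name}:${o.ok}`)); // [ 'unified-bash-post:false', 'skill-post:true' ]
```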
6. Cache-cost telemetry is now real
The previous "zero entries" you may have seen in the token-meter adapter (CARRY-5) was a subprocess env-propagation bug. lib/infra/otel-env-capturer.js now hydrates OTEL_* into hook subprocesses on SessionStart, so cost dashboards finally show populated data. Combined with the new caching-cost Port↔Adapter pair, the dispatcher can make data-driven decisions about when to go parallel.
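Env hydration of this kind usually comes down to capturing the `OTEL_*` variables once and merging them into the environment of every hook subprocess. The sketch below shows that shape with Node's `child_process.spawn`; the function names are hypothetical and this is not the code in `lib/infra/otel-env-capturer.js`.

```javascript
const { spawn } = require('node:child_process');

// Collects only the OTEL_* variables from an environment object.
function captureOtelEnv(env = process.env) {
  return Object.fromEntries(
    Object.entries(env).filter(([key]) => key.startsWith('OTEL_'))
  );
}

// Spawns a hook script with the captured variables merged into its environment.
function spawnHook(scriptPath, capturedOtelEnv) {
  return spawn(process.execPath, [scriptPath], {
    env: { ...process.env, ...capturedOtelEnv },
    stdio: 'inherit',
  });
}

const captured = captureOtelEnv({
  OTEL_EXPORTER_OTLP_ENDPOINT: 'http://localhost:4318',
  PATH: '/usr/bin',
});
console.log(Object.keys(captured)); // [ 'OTEL_EXPORTER_OTLP_ENDPOINT' ]
```

Capturing at session start and re-applying at spawn time matters when the hook process does not inherit the variables on its own, which is the failure mode the notes describe.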
What stays the same
- Skills, Agents, Hook events, MCP tool counts — unchanged from v2.1.13.
- PDCA 9-phase enum + Sprint 8-phase enum — frozen (Application Layer pilot, ADR 0005).
- Trust Level L0 ~ L4 semantics + `SPRINT_AUTORUN_SCOPE`: unchanged.
- Existing `/pdca`, `/sprint`, `/control`, `/bkit-explore`, `/audit`, `/rollback` slash commands: unchanged signatures.
Upgrade path
```bash
# in your project (Claude Code plugin auto-pulls; no command needed)
# verify:
claude plugin info bkit   # should show 2.1.14
```
Project memory files, sprint state files, and PDCA documents written by v2.1.13 are read by v2.1.14 unchanged. No migration scripts.
Recommended CC version
| Profile | Recommended CC | Why |
|---|---|---|
| Conservative | v2.1.123 | Stable, OAuth fix, fail-isolated hooks |
| Balanced | v2.1.140 | Latest before the v2.1.141 60-bullet bumper observation window |
| Latest | v2.1.141 | Acceptable — heredoc bypass guard already covers #58904 |
See docs/06-guide/version-policy.guide.md for the full dist-tag 3-Bucket framework.
🧪 Verification
- Tests: 291 existing + 22 new v2.1.14 contract tests = 313 PASS, 0 FAIL.
- `scripts/verify-full-system.js`: 11/11 categories PASS (BKIT_VERSION sync, hooks reachability, domain purity, contract suites, invariants, defenses, observability ports, dist-tag drift, ENH backlog hygiene, dogfooding evidence, release-bundle inventory).
- CC plugin validate: `claude plugin validate .` exits 0 (10 consecutive cycles).
- Domain layer purity: 12 files, 0 forbidden imports (CI-enforced).
📚 Documents
- Master plan: `docs/sprint/v2114/master-plan.md`
- PRD / Plan / Design: `docs/sprint/v2114/{prd,plan,design}.md`
- Sub-sprint reports: `docs/sprint/v2114/sub-sprint-{1..6}-*.report.md`
- ADR 0010 (Invariant 10 effort-aware): `docs/adr/0010-effort-aware-invariant.md`
- CC version monitoring guide: `docs/06-guide/cc-version-monitoring.guide.md`
- Version policy guide: `docs/06-guide/version-policy.guide.md`
🙏 Acknowledgements
This release was built using bkit itself across six PDCA-style sub-sprints, and the heredoc-pipe guard caught its own release commit during ship — exactly the kind of dogfooding evidence the differentiation framing is meant to produce.
🤖 Built with Claude Code