Releases: popup-studio-ai/bkit-claude-code

v2.1.22 — Hardening Release

02 Jun 02:12
@agent-kay-it agent-kay-it
f077545
This commit was created on GitHub.com and signed with GitHub’s verified signature.
GPG key ID: B5690EEEBB952194
Verified

bkit v2.1.22 — Hardening Release

A stability- and consistency-focused release. There are no new commands or workflows to learn — instead, this release removes friction that previously affected Windows users and anyone running Sprint/PDCA stop hooks, and pays down internal complexity so future changes ship more safely.

This release is regression-free: verified by a full-suite baseline diff (320 → 321 test files, 0 net new failures).


✨ Highlights

  • 🪟 Windows is no longer broken at load time. The frontmatter parser used a /^---\n/ fence pattern that silently failed on Windows CRLF (---\r\n) files — meaning skills, agents, and output-styles could fail to load entirely. Fixed across 6 fence sites and 34 line-splitting sites, while staying byte-identical on macOS/Linux (LF).
  • 🛑 No more "Hook JSON output validation failed" errors. Running /sprint list (and other Sprint/PDCA actions) could surface Stop hook error: Hook JSON output validation failed — (root): Invalid input. The root cause — a mistyped hook decision enum shared by 5 Stop emitters — is fixed, with a contract test added so it can't regress.
  • 🧹 Leaner, safer internals. The 4 largest "god files" (>700 LOC) were split down to focused modules (sprint-handler.js 1509 → 271, state-machine.js 985 → 406, automation.js 770 → 451, unified-stop.js 751 → 693) with behavior fully preserved.
  • 🌐 English CHANGELOG. The full CHANGELOG.md (59 release sections) is now in English, aligning with the project's global-service language policy.
  • 🔗 Up-to-date with Claude Code. Validated against CC v2.1.146 → v2.1.159; recommended CC version bumped to v2.1.159 (balanced) / v2.1.150 (conservative). Continuous-compatibility streak extended 101 → 112.
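
The CRLF failure mode described above can be reproduced in a few lines. This is a minimal sketch rather than bkit's actual parser: `FENCE_OPEN`, `splitLines`, and `extractFrontmatter` are hypothetical names, and the only technique shown is making the line terminator optional-CR (`\r?\n`) in both the fence pattern and the line splitter.

```javascript
// Hypothetical helpers for illustration only; not bkit's real parser.
// Matches an opening "---" fence followed by either LF or CRLF.
const FENCE_OPEN = /^---\r?\n/;

// Splits text into lines on both LF and CRLF line endings.
function splitLines(text) {
  return text.split(/\r?\n/);
}

// Returns the raw frontmatter lines between the two "---" fences, or null.
function extractFrontmatter(text) {
  if (!FENCE_OPEN.test(text)) return null;
  const lines = splitLines(text);
  const end = lines.indexOf('---', 1);
  return end === -1 ? null : lines.slice(1, end);
}

const lf = '---\nname: demo\n---\nbody\n';
const crlf = '---\r\nname: demo\r\n---\r\nbody\r\n';

console.log(/^---\n/.test(crlf));      // false: the LF-only pattern misses CRLF files
console.log(extractFrontmatter(lf));   // [ 'name: demo' ]
console.log(extractFrontmatter(crlf)); // [ 'name: demo' ]
```

Because `\r?` matches zero characters on LF input, the LF path produces the same result as before, which is consistent with the byte-identical claim for macOS/Linux.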

👤 What changes for you

| If you... | What you'll notice |
|-----------|--------------------|
| Run bkit on Windows | Skills/agents/output-styles that previously failed to load now load correctly (CRLF handled). |
| Use /sprint or PDCA stop hooks | The intermittent Hook JSON output validation failed error is gone; stop output renders a clean summary + next step as intended. |
| Read the CHANGELOG | It's now in English end-to-end (Korean kept only for intentional sample data and the "(KO)" tagline). |
| Pin a Claude Code version | Recommended versions updated to v2.1.159 (balanced) / v2.1.150 (conservative). |
| Don't hit any of the above | No action needed — your existing commands and workflows are unchanged. |

No migration steps are required.


🔧 Under the hood

  • S1 — CC v2.1.159 Response (ENH-324~328): cancelled ENH-317 as MOOT after the upstream /simplify rename was reverted (v2.1.152/154); verified sessionTitle-on-resume and multi-Agent frontmatter compatibility; registered 2 new regression monitors.
  • S2 — Cross-Platform Verification (ENH-329~335): CRLF fixes above; confirmed shell-branching is unnecessary (only git/gh/node/npx are exec'd). Carry: native Windows runtime CI matrix.
  • S6 — Stop Hook Schema Compliance (ENH-361~366): corrected cc-payload.port.js decision typedef; added outputStopSurface() / outputStopAllow() single-source helpers in lib/core/io.js.
  • S4 — Tech-Debt & Dead-Code (ENH-336~342): field audit found 0 removable dead code; the 6 pdca-eval-* stubs are confirmed permanent (required by immutable contract baselines + L4 deprecation governance).
  • S3a — God-File Split (ENH-343~348): 4 → 0 files over 700 LOC; contract baselines (255 / 234 assertions) unchanged.
  • S3b — Layer Consolidation (ENH-349~354): verified 0 redundancy to merge; documented as ADR 0013 (no code change).
  • S5 — Final QA + i18n + Docs-Sync (ENH-355~360): full QA, 8-language trigger-keyword audit (44 skills + 40 agents), code=docs inventory sync, version bump 2.1.21 → 2.1.22.
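
A contract test of the kind mentioned for S6 might look roughly like the sketch below. It assumes that a Stop hook's JSON output may either omit `decision` or set it to `"block"` with a string `reason`; that allowed set is an assumption to check against the Claude Code hooks reference. `checkStopOutput` is a hypothetical helper, not the `outputStopSurface()` / `outputStopAllow()` code in lib/core/io.js.

```javascript
// Assumption: for Stop hooks, "decision" is either omitted or "block".
const STOP_DECISIONS = Object.freeze(['block']);

// Hypothetical contract check; returns a list of problems (empty means valid).
function checkStopOutput(output) {
  if (output === null || typeof output !== 'object' || Array.isArray(output)) {
    return ['output must be a JSON object'];
  }
  const problems = [];
  if ('decision' in output && !STOP_DECISIONS.includes(output.decision)) {
    problems.push(`unsupported decision: ${JSON.stringify(output.decision)}`);
  }
  if (output.decision === 'block' && typeof output.reason !== 'string') {
    problems.push('decision "block" requires a string reason');
  }
  return problems;
}

console.log(checkStopOutput({}));                                         // []
console.log(checkStopOutput({ decision: 'block', reason: 'next step' })); // []
console.log(checkStopOutput({ decision: 'allow' }).length);               // 1
```

Routing every Stop emitter through one checked helper is what lets a single test cover all of them.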

🧪 Quality & regression note

The initial S5 QA pass was incomplete (it ran only tests/, 43 files, and missed test/, 278 files). A full-suite run surfaced 6 failures, of which 5 were self-introduced regressions from the S3a refactor (fixed in 795c724) and 1 was pre-existing. Final baseline diff:

  • Baseline b591410 full-suite: 320 files / 28 fail
  • Released HEAD full-suite: 321 files / 26 fail
  • Net new regressions: 0 — and 2 baseline failures (docs-code-sync, sprint-alpha-e2e) were actually fixed.
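
The baseline diff above is a set difference over failing test names. A minimal sketch follows, with a hypothetical `diffFailures` helper and placeholder names (`legacy-a`, `legacy-b`) standing in for the unchanged pre-existing failures:

```javascript
// Hypothetical helper; only the set arithmetic is the point here.
function diffFailures(baselineFailures, headFailures) {
  const baseline = new Set(baselineFailures);
  const head = new Set(headFailures);
  return {
    // Failing at HEAD but not in the baseline: a regression introduced since.
    newRegressions: [...head].filter((name) => !baseline.has(name)),
    // Failing in the baseline but passing at HEAD: fixed along the way.
    fixed: [...baseline].filter((name) => !head.has(name)),
  };
}

const baseline = ['docs-code-sync', 'sprint-alpha-e2e', 'legacy-a', 'legacy-b'];
const head = ['legacy-a', 'legacy-b'];
const result = diffFailures(baseline, head);

console.log(result.newRegressions); // []
console.log(result.fixed);          // [ 'docs-code-sync', 'sprint-alpha-e2e' ]
```

Comparing names rather than counts matters: a drop from 28 to 26 failures could otherwise hide one new failure offset by three fixes.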

The remaining 26 failures are pre-existing test-debt, scheduled for cleanup in v2.1.23 (along with the native Windows CI matrix).

Full Changelog: https://github.com/popup-studio-ai/bkit-claude-code/blob/main/CHANGELOG.md

🤖 Generated with Claude Code


v2.1.21 — Session Title Isolation (#111) + Sprint Output Enforcement (#113)

29 May 07:44
@popup-kay popup-kay
6e56b2f

bkit v2.1.21 — Issue Response Sprint

This release closes two external dogfooder issues in a single unified sprint, each fix validated against actual codebase file:line evidence rather than accepted on report alone.

| Issue | Title | Reporter |
|-------|-------|----------|
| #111 | Session title collision across parallel sessions | @wonuseo |
| #113 | Sprint screen-output enforcement gap | @rohwonseok-ops |

✨ Highlights

🪟 #111 — Parallel sessions now get distinct window titles

Opening two Claude Code windows in the same project folder on the same feature used to label both windows identically ([bkit] PLAN f1) — making it dangerously easy to type a command into the wrong terminal. This was latent across all of v2.1.6 → v2.1.20. bkit now appends a stable per-session tag so each window is uniquely identifiable.

📋 #113 — Sprint operations now print a human-readable summary

Sprint commands (phase, status, watch, report) previously emitted raw JSON only, leaving you to trust the model's narration of it. Sprint now enforces the same Stop-hook output discipline as PDCA: an Executive Summary, an interactive next-step prompt, and a per-feature progress table — surfaced directly on screen.

🔬 A deeper fix than reported

While verifying #113 with real claude -p runs, we found that no skill Stop handler was actually firing in production (PDCA included) — Claude Code omits skill_name from the Stop payload, so every detection path silently fell through. v2.1.21 introduces a cross-process active-skill marker that fixes dispatch for Sprint and lays the groundwork for the PDCA family in v2.1.22+.


👤 What changes for you

Multi-window users (#111)

  • Each session window now shows a short unique tag, e.g. [bkit] PLAN f1 ·a1b2.
  • The tag is derived deterministically from the session ID — same session, same tag for its whole lifetime.
  • Fully backward-compatible: if there's no session ID, the title looks exactly as before. Legacy title caches migrate automatically on first read. No config change required.
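
One way to get a tag with these properties is to hash the session ID and keep a short prefix. The sketch below is an assumption about the approach, not bkit's session-title.js: the SHA-256 choice, the 4-character length, and the names `sessionTag` / `buildTitle` are all hypothetical.

```javascript
const { createHash } = require('node:crypto');

// Assumption: a 4-hex-character tag taken from a SHA-256 digest of the session ID.
function sessionTag(sessionId) {
  if (!sessionId) return '';
  return createHash('sha256').update(String(sessionId)).digest('hex').slice(0, 4);
}

// Appends the tag when a session ID exists; otherwise returns the title unchanged.
function buildTitle(baseTitle, sessionId) {
  const tag = sessionTag(sessionId);
  return tag ? `${baseTitle} ·${tag}` : baseTitle;
}

console.log(buildTitle('[bkit] PLAN f1', 'session-123')); // title followed by " ·" and 4 hex chars
console.log(buildTitle('[bkit] PLAN f1', undefined));     // [bkit] PLAN f1
```

A hash-derived tag needs no shared counter between windows, so two sessions can compute their tags independently. With only four hex characters, collisions remain possible but are unlikely for a handful of parallel windows.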

Sprint users (#113)

  • After /sprint phase ... --to <p>, you'll see a Sprint Executive Summary (Mission / Result / matchRate / Cross-Sprint Integration / Invariant) instead of a JSON blob.
  • /sprint status and /sprint watch now render a per-feature table with a one-line quality-gate summary; raw JSON is still available for programmatic use.
  • /sprint report finalization prepends a full KPI + carry-items summary with a clear next action.
  • The session title updates to reflect the current sprint phase.

🧱 Under the hood

  • New modules: lib/sprint/executive-summary.js, scripts/sprint-skill-stop.js, lib/core/active-skill-marker.js
  • Refactors: session-title-cache.js (per-session map + GC + legacy migration), session-title.js (sessionId tag), 4 Stop emitters (session_id threading), unified-stop.js (sprint handler + marker peek), advance-phase.usecase.js (DI-based transition summary), sprint-handler.js (display field)
  • ADR 0012 — Sprint Stop Hook Output Enforcement (run-export pattern · separate sprint shape · usecase-purity DI · cross-process active-skill marker)
  • 92 new/extended test cases across 6 files — all passing, including real claude -p --plugin-dir . runtime dispatch verification

🔭 Known follow-up

  • CARRY-#113-1 — PDCA-family Stop handlers share the same production no-op root cause and will be migrated to the run-export + marker pattern in v2.1.22+ (pending separate regression verification).

Full changelog: see CHANGELOG.md · Cross-references: #111#77 · #113#93

🤖 Generated with Claude Code

Contributors

wonuseo and rohwonseok-ops

bkit v2.1.20 — Marketplace Recovery + Plugin Manifest Schema Compliance

26 May 07:58
@popup-kay popup-kay
2fc529f

Released: 2026-05-26
Trigger: External dogfooder @BJ (정병진) 2026-05-26 install incident — Validation errors: : Unrecognized key: "displayName"
Sprint: 14 features / 3 sub-sprints / 3 new ENH / 1 new ADR / 1 external dogfooder #2 / 13/13 quality gates PASS / 0 auto-pause triggers
Minimum Claude Code: v2.1.143 (the strict plugin-manifest path recognizes the official displayName field only from v2.1.143)

✨ Highlights

  • 🚨 Minimum Claude Code v2.1.143 advisory in README, README-FULL, marketplace.json, plus a new self-service guide docs/06-guide/cc-compatibility.guide.md. Advisory only — no hard reject to keep UX friendly for users still on older Claude Code.
  • 🛡 21-key plugin-manifest whitelist CI gate (ENH-322) — scripts/validate-plugin.js --strict enforces the Anthropic official schema. New EXPECTED_PLUGIN_JSON_KEYS SoT in lib/domain/rules/docs-code-invariants.js (Object.freeze, pure domain). v2.1.20: advisory only for one week; v2.1.21+: strict.
  • 🔧 ADR 0006 § Empirical Validation Gate recovery: scripts/release-plugin-tag.sh now wires claude plugin validate . (~30-day wire delay closed). Non-zero exit blocks release.
  • 🚦 cc-regression Defense Layer 6 reinforcement (ENH-321) — new entry R3-321 tracks the displayName strict-reject regression with daily 09:00 KST reconcile cycle integration. 22 guards total, 0 warnings.
  • 🔍 SessionStart runtime advisory (ENH-323) — hooks/startup/session-context.js now detects the installed Claude Code version (200 ms timeout cap + 1-hour .bkit/runtime/cc-version.json cache + opt-out via BKIT_DISABLE_CC_VERSION_DETECTION=1). Forward-proofs users who upgrade bkit before Claude Code.
  • 📜 ADR 0011 Plugin Manifest Schema Compliance Policy (Accepted) — formalizes the 5-layer policy (minimum CC + 21-key whitelist + claude plugin validate wire + R3-321 + SessionStart detection).
  • 🌟 External Dogfooder Hall of Fame #2 @BJ (정병진) entry at docs/external-dogfooders/bj.md. bkit Early Adopter Program DA-4 status: N=2 confirmed (first-follower effect validated 28 days after the v2.1.19 policy introduction).

🧭 User-Experience Changes

For users running Claude Code v2.1.143 or later (≈95% of users)

  • README and README-FULL now display a one-line minimum Claude Code v2.1.143 advisory at the top.
  • The marketplace.json bkit entry description starts with Requires Claude Code v2.1.143+.
  • No functional change. Existing PDCA + Sprint workflows continue unchanged.
  • The new SessionStart CC version check returns isOldVersion=false and emits no advisory — invisible to you.

For users running Claude Code ≤ v2.1.142

  • claude plugin install bkit will still fail with Unrecognized key: "displayName" — this is a Claude Code-side schema rejection that bkit cannot bypass without removing the displayName field (which is forbidden by Anti-Mission since removal would regress the UI picker on Claude Code v2.1.143+ users).
  • However, the new docs/06-guide/cc-compatibility.guide.md gives a clear, self-service workaround:
    npm install -g @anthropic-ai/claude-code@latest
    rm -rf ~/.claude/plugins/cache/temp_git_*
    claude plugin install bkit
  • The README advisory + marketplace.json description prefix surface this requirement before you hit the install error.

For users running an existing bkit session on Claude Code < v2.1.143

  • SessionStart now detects the Claude Code version and surfaces a bkit Compatibility Notice in the additionalContext (one-time-per-session, cached for one hour at .bkit/runtime/cc-version.json).
  • The notice includes the workaround command and a link to the compatibility guide.
  • Performance budget: 200 ms hard cap on claude --version execution + cache + one-call-per-session — typical session impact zero (cache hit) or sub-100 ms (first call).
  • Opt-out: set BKIT_DISABLE_CC_VERSION_DETECTION=1.
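
The detection flow described above (opt-out check, bounded `claude --version` call, version comparison) can be sketched as follows. This is not the code in hooks/startup/session-context.js: `detectVersionSketch`, `parseVersion`, and `isOlderThan` are hypothetical names, the one-hour cache is omitted for brevity, and the sample `--version` output format is an assumption.

```javascript
const { execFileSync } = require('node:child_process');

// Parses the first "X.Y.Z" found in the text into [X, Y, Z], or null.
function parseVersion(text) {
  const m = /(\d+)\.(\d+)\.(\d+)/.exec(String(text));
  return m ? m.slice(1).map(Number) : null;
}

// True when `version` is strictly older than `minimum`.
function isOlderThan(version, minimum) {
  for (let i = 0; i < 3; i++) {
    if (version[i] !== minimum[i]) return version[i] < minimum[i];
  }
  return false;
}

// Hypothetical wrapper. `run` is injectable so the sketch can be exercised
// without a real `claude` binary; a missing CLI or a timeout yields null.
function detectVersionSketch(
  env = process.env,
  run = () => execFileSync('claude', ['--version'], { timeout: 200, encoding: 'utf8' })
) {
  if (env.BKIT_DISABLE_CC_VERSION_DETECTION === '1') return null;
  try {
    return parseVersion(run());
  } catch {
    return null;
  }
}

const MINIMUM = [2, 1, 143];
const detected = detectVersionSketch({}, () => '2.1.142 (Claude Code)'); // assumed output format
console.log(detected !== null && isOlderThan(detected, MINIMUM)); // true
```

Returning null on any failure keeps the check advisory: a slow or absent CLI results in no notice instead of a blocked session start.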

For bkit contributors / future PR authors

  • A new CI step Release Gate — plugin.json schema validation (21-key whitelist) runs node scripts/validate-plugin.js --strict on every PR (.github/workflows/contract-check.yml).
  • v2.1.20 = advisory only (continue-on-error: true) to avoid PR backlog during the rollout week.
  • v2.1.21+ = strict (continue-on-error: false) — extra keys in plugin.json will block PR merge.
  • Add new plugin.json keys only if they appear in the 21-key whitelist at lib/domain/rules/docs-code-invariants.js EXPECTED_PLUGIN_JSON_KEYS. If Anthropic publishes a new manifest key, update the SoT first.
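
The whitelist gate reduces to a key diff against a frozen list. The sketch below uses an illustrative four-key subset and a hypothetical `diffKeys` helper; the real 21-key list lives in `EXPECTED_PLUGIN_JSON_KEYS` and is not reproduced here.

```javascript
// Illustrative subset only; the real source of truth is described as holding 21 keys.
const ALLOWED_KEYS = Object.freeze(['name', 'version', 'description', 'displayName']);

// Hypothetical diff: reports manifest keys that are not on the whitelist.
function diffKeys(manifest, allowed = ALLOWED_KEYS) {
  const extra = Object.keys(manifest).filter((key) => !allowed.includes(key));
  return { ok: extra.length === 0, extra };
}

console.log(diffKeys({ name: 'bkit', version: '2.1.20', displayName: 'bkit' }));
// { ok: true, extra: [] }
console.log(diffKeys({ name: 'bkit', madeUpKey: true }));
// { ok: false, extra: [ 'madeUpKey' ] }
```

A CI step would then exit non-zero whenever `ok` is false, which is the behaviour the strict mode describes for extra keys.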

For bkit release engineers

  • scripts/release-plugin-tag.sh now runs claude plugin validate . (per ADR 0006) between the CI-invariants check and tag-conflict detection. Non-zero exit blocks the release.
  • If the claude CLI is absent from PATH (some CI environments), the script logs a WARN and falls back gracefully — release continues.
  • This is the v2.1.20 release itself dogfooding ADR 0006's Empirical Validation Gate for the first time.

For external dogfooders / Early Adopter Program

  • Hall of Fame now has two entries: @pruge (v2.1.19, first follower) + @BJ (v2.1.20, second entry, first-follower effect validation).
  • DA-4 acquisition goal status: N=2 confirmed — first-follower effect validated 28 days after policy introduction. v2.1.21+ continues active outreach to grow N≥3.
  • New dogfooders are encouraged to file detailed issues per the 5-stage User-Feedback Lifecycle at docs/external-dogfooders/_README.md.

📦 What's in this release

Added

  • EXPECTED_PLUGIN_JSON_KEYS SoT + diffPluginJsonKeys in lib/domain/rules/docs-code-invariants.js
  • --strict flag in scripts/validate-plugin.js (exit codes 2 / 3 for extra-key / SoT-import failures)
  • Release Gate — plugin.json schema validation (21-key whitelist) step in .github/workflows/contract-check.yml
  • claude plugin validate . wire in scripts/release-plugin-tag.sh (ADR 0006 § Empirical Validation Gate)
  • R3-321 entry in lib/cc-regression/registry.js (cc-regression entry #22)
  • detectCCVersion() + buildCCVersionAdvisoryContext() in hooks/startup/session-context.js
  • ccVersionAdvisory section in bkit.config.json:ui.contextInjection.sections (default-on)
  • ADR 0011 Plugin Manifest Schema Compliance Policy (Status: Accepted)
  • docs/06-guide/cc-compatibility.guide.md — user-facing self-service guide
  • docs/external-dogfooders/bj.md — Hall of Fame entry #2
  • test/e2e/external-dogfood/cc-min-version.test.js — 5 TC permanent regression lock (Lifecycle Stage 4)

Changed

  • README.md / README-FULL.md — Claude Code badge v2.1.123+ → v2.1.143+, Version badge 2.1.19 → 2.1.20, one-line minimum CC advisory
  • .claude-plugin/plugin.json — version 2.1.19 → 2.1.20 (displayName unchanged per Anti-Mission)
  • .claude-plugin/marketplace.json — bkit + marketplace version 2.1.19 → 2.1.20, description prefix advisory
  • bkit.config.json — version 2.1.19 → 2.1.20, ui.contextInjection.sections adds ccVersionAdvisory
  • test/integration/config-sync.test.js CS-015 — diffPluginJsonKeys 21-key whitelist enforced
  • docs/external-dogfooders/_README.md: @BJ entry added under "v2.1.20", DA-4 status N=2 confirmed

Anti-Mission preserved (no regressions to v2.1.143+ users)

  • displayName field NOT removed (v2.1.143+ official schema key — removal would regress UI picker)
  • v2.1.142-and-below users NOT hard-rejected (advisory only)
  • Anthropic docs vs implementation lenient/strict mismatch (Q1) NOT touched (external responsibility)
  • bkit-starter plugin unchanged (no displayName field, zero impact)
  • Trust L3/L4 default unchanged

🌟 External Dogfooder Contributions

@BJ (정병진) drove the entire v2.1.20 sprint by sharing the precise error message, the cache path, and Cursor IDE environment metadata. The reproduction scenario is now absorbed into test/e2e/external-dogfood/cc-min-version.test.js (5 TC, Lifecycle Stage 4 Regression Lock achieved), and the contribution counts toward the Trust Score externalDogfoodFeedbackResponseRate component (weight 0.05). See docs/external-dogfooders/bj.md for the full contribution archive.

Thank you @BJ for sharing the precise error message that scoped the entire sprint correctly. 🙏

🔮 Roll-forward markers (v2.1.21+)

  • F6 contract-check.yml continue-on-error → false (one-week advisory-only window closes)
  • F8 R3-321 telemetry 3-month analysis → demote/keep decision
  • F10 ENH-323 SessionStart detection telemetry 3-month analysis → timeout elevation decision
  • F14 Hall of Fame @BJ Stage 3 (Fix Released) ⏳ → ✅ on v2.1.20 GA tag — this release achieves it

📊 Sprint outcome (verification)

  • 14/14 features delivered (4 + 5 + 5 across the 3 sub-sprints)
  • 13/13 quality gates PASS (M1-M10 + S1-S4 all clean)
  • 0 auto-pause triggers fired (QUALITY_GATE_FAIL / ITERATION_EXHAUSTED / BUDGET_EXCEEDED / PHASE_TIMEOUT)
  • matchRate 100%, dataFlowIntegrity 100%
  • Sprint cycle time 3 days vs 14-day budget (25% utilization)
  • 5/5 side-effect verification gates PASS post-archive patches (CO-4 + CO-5 + CO-2-partial-fix)
  • 3 of 6 carry items closed by the post-archive patch: CO-2 (test-side mitigated), CO-4 (Q3 partially resolved), CO-5 (@BJ Stages 3 + 5 ✅)

🔗 References

Read more

Contributors

pruge and BJ

bkit v2.1.19-hotfix.1 — CI Hardening & SQM Panel Activation

21 May 13:06
@popup-kay popup-kay
b12b3b8

A focused CI hotfix that restores the Invocation Contract Check workflow to green after the v2.1.19 GA, and quietly activates the new SQM dashboard panel that the v2.1.19 design (S5 ADR S5-003) promised but never wired up.

Highlights

  • 🟢 CI green again: the Invocation Contract Check workflow flipped back from red to green after the v2.1.19 GA merge.
  • 🩹 Dead-code detector blind spot patched: scripts/check-deadcode.js now correctly recognises require(path.join(ROOT, '...')) references, not just direct string literals.
  • 📊 New SessionStart panel: SQM (Sprint Quality Maturity) — v2.1.19's promised quality maturity index is now visible at session start.
  • 🛡️ Zero behaviour regression — verified against the full 4,168 TC QA aggregate.

User experience change

After installing v2.1.19-hotfix.1, every new bkit session (/clear or fresh CC launch) will show one extra compact panel at the top of the context, if and only if the project has at least one entry in .bkit/state/sqm-history.jsonl:

┌─── SQM (Sprint Quality Maturity) ─────────── 64.00 / 100 ─┐
│ Docs=Code 100 │ Self-Dogfood 10 │ Dogfooder 100 │
│ Report KPI 80 │ Dispatch — │ Convention 0 │
└──────────────────────────────────────────────────────────┘

The panel:

  • Reflects project-wide quality maturity across 6 weighted components (Docs=Code 30 %, Self-Dogfood 20 %, External Dogfooder 20 %, Report KPI 15 %, Sub-Agent Dispatch 10 %, Convention 5 %).
  • Is fail-silent on new projects — when .bkit/state/sqm-history.jsonl is missing, the panel renders nothing and adds 0 chars. No impact on first-run UX.
  • Adds ~259 chars to additionalContext when active. Other 4 dashboard sections (progress / workflow / impact / control) are unchanged.
  • Can be opted out by removing 'sqm' from ui.dashboard.sections in bkit.config.json.
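
The 64.00 shown in the panel follows from the six weights listed above. The sketch below assumes that an unmeasured component (Dispatch, shown as a dash in the panel) contributes 0; `SQM_WEIGHTS` and `sqmScore` are hypothetical names, not the API of lib/ui/sqm-panel.js.

```javascript
// Weights in percent, as listed in the panel description.
const SQM_WEIGHTS = {
  docsCode: 30,
  selfDogfood: 20,
  externalDogfooder: 20,
  reportKpi: 15,
  subAgentDispatch: 10,
  convention: 5,
};

// Assumption: a component with no measurement (null/undefined) contributes 0.
function sqmScore(components) {
  let total = 0;
  for (const [name, weight] of Object.entries(SQM_WEIGHTS)) {
    total += (components[name] ?? 0) * weight;
  }
  return total / 100;
}

const score = sqmScore({
  docsCode: 100,
  selfDogfood: 10,
  externalDogfooder: 100,
  reportKpi: 80,
  subAgentDispatch: null,
  convention: 0,
});
console.log(score.toFixed(2)); // 64.00
```

Keeping the weights as integer percentages and dividing once at the end avoids floating-point drift in the displayed score.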

No other user-facing behaviour changes.

Root cause

Two independent issues coincided immediately after the v2.1.19 GA merge:

  1. Detector blind spot: scripts/check-deadcode.js matched only direct string-literal requires (require('./foo')). The new v2.1.19 scripts (S0 measure, S3 docs-sync, S4 feedback refresh) all use require(path.join(ROOT, '...')), so the detector reported 5 false-positive dead modules and failed the CI gate.
  2. Design/runtime gap: lib/ui/sqm-panel.js was specified by S5 ADR S5-003 to render in the SessionStart dashboard, but the wiring in hooks/session-start.js never landed. bkit.config.json ui.dashboard.sections also needed 'sqm' for the new panel to opt in.

Fixes (4 files, +56 / -4)

  • scripts/check-deadcode.js — split scanProductionRequires() into two regexes. reDirect keeps the original behaviour; reIndirect matches require(<wrapper>(..., '<lib path>', ...)) where <wrapper> is any identifier and the captured string literal contains lib/ or starts with ./ or ../. Restricts to library-shaped paths to avoid overmatching arbitrary strings.
  • hooks/session-start.js — wired lib/ui/sqm-panel + lib/quality/sqm-history into the SessionStart dashboard. Render block is independent of the primaryFeature gate (SQM is project-wide, not feature-scoped) and fail-silent when the history file is missing.
  • bkit.config.json — added 'sqm' to ui.dashboard.sections. Runtime config takes precedence over the hook default, so without this the SqmPanel rendered to 0 chars even after wiring.
  • CHANGELOG.md — added [2.1.19-hotfix.1] section.
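
The two-regex split can be approximated as below. These patterns are a sketch written from the description above, not the ones in scripts/check-deadcode.js; `RE_DIRECT`, `RE_INDIRECT`, `looksLikeLibPath`, and `scanRequires` are hypothetical names.

```javascript
// Direct form: require('./foo')
const RE_DIRECT = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
// Indirect form: require(wrapper(..., 'path', ...)); captures the wrapper's argument list.
const RE_INDIRECT = /require\(\s*[\w$.]+\(([^()]*)\)\s*\)/g;
// Any quoted string literal inside that argument list.
const RE_STRING = /['"]([^'"]+)['"]/g;

// Restricts indirect matches to library-shaped paths to avoid overmatching.
function looksLikeLibPath(p) {
  return p.includes('lib/') || p.startsWith('./') || p.startsWith('../');
}

function scanRequires(source) {
  const found = [];
  for (const m of source.matchAll(RE_DIRECT)) found.push(m[1]);
  for (const m of source.matchAll(RE_INDIRECT)) {
    for (const s of m[1].matchAll(RE_STRING)) {
      if (looksLikeLibPath(s[1])) found.push(s[1]);
    }
  }
  return found;
}

const sample = [
  "const a = require('./foo');",
  "const b = require(path.join(ROOT, 'lib/quality/sqm-history'));",
  "const c = require(path.join(ROOT, 'README'));",
].join('\n');

console.log(scanRequires(sample)); // [ './foo', 'lib/quality/sqm-history' ]
```

The third sample line is skipped on purpose: 'README' is not library-shaped, so it does not count as a live module reference.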

Verification

  • scripts/check-deadcode.js: NEW dead 5 → 0, Live 134 → 139 (exactly the 5 intended modules — set-diff verified against the precise v2.1.19 GA baseline, NEW DEAD 0).
  • ✅ Contract suite 9-step spot-check all PASS: domain-purity (18 files) · guards (21 entries) · test-tracking (314 files) · docs-code-sync (5/5) · integration-runtime (23/23) · l2-smoke (101/101) · bkit-full-system (36/0/0) · contract-test-run vs v2.1.16 L1,L4 (255 assertions).
  • ✅ Full QA aggregate (4,168 TC across 157 files): 12 FAIL + 6 errors reproduced identically against the HEAD~1 (v2.1.19 GA) baseline → confirmed pre-existing carryover (ACTION_TYPES baseline drift, trust-engine score change). This hotfix introduced 0 new regressions.
  • ✅ SessionStart hook live run with a fresh session id: JSON contract valid; all 5 dashboard sections including SQM (64.00 / 100) render correctly; additionalContext 5,482 → 5,741 chars (+259 for SQM panel).

Upgrade

# CC plugin users — version bumps automatically on next /clear or session start
# Manual verification:
jq '.version' bkit.config.json # → "2.1.19" (no version bump for hotfix)
node scripts/check-deadcode.js # → Dead (NEW) : 0

Note: bkit.config.json version stays at 2.1.19. The -hotfix.1 suffix lives only on the git tag and this release, following the bkit hotfix convention (see v2.1.12-hotfix precedent).

What's next

The 12 pre-existing FAILs surfaced by the full QA aggregate (ACTION_TYPES baseline drift, trust-engine score change, sprint-4-presentation AUDIT-01) will be resolved in v2.1.20. They are tracked as CARRY-v2119-1 in memory and do not gate this hotfix.

Full changelog

CHANGELOG.md → v2.1.19-hotfix.1

🤖 Generated with Claude Code


v2.1.19 — Quality Maturation Sprint (5 sub-sprints + ENH-318)

21 May 12:01
@popup-kay popup-kay
228b6bc

bkit v2.1.19 — Quality Maturation Sprint

Permanent closure of pruge's sprint domain issue cluster (10 GitHub issues over 1.5 days, all in the sprint domain). Rather than another reactive fix release, v2.1.19 treats the cluster as a systemic sprint domain maturity gap and addresses it with a 5 sub-sprint master plan.

All 5 sub-sprints + outer master sprint archived. ENH-318 formally adopted (differentiators 6/6 → 7/7). First Real User Hall of Fame entry: @pruge.

Released: 2026-05-21
Predecessor: v2.1.18 GA (Sprint Trust UX Fix)
PR: #108 · Merge commit: 228b6bc (admin merge)
Reporter who shaped this release: @pruge (James Kim), dandi-village-ledger project. Listed as the first Real User Hall of Fame entry 🏆


TL;DR

  • 🎯 Closes 4 GitHub Issues: #103, #104, #105, #107 (P0 / sprint domain maturity)
  • 🏗️ 5 sub-sprint master plan — S0 baseline + S1 Foundation + S2 Defense + S3 Polish + S4 Proactive + S5 Measurement (all archived)
  • 152 TC across 30 test files — 100% PASS, 0 regressions
  • 🚀 ENH-318 formally adopted — Trust Score 7th component + Real User Hall of Fame + User-Feedback Lifecycle 5-stage
  • 🏆 First external dogfooder Hall of Fame entry: @pruge — 10 issues, 5 absorbed E2E scenarios
  • 🛡️ 9 new ACTION_TYPES for governance traceability
  • 🔧 3 architectural fixes: S0 measurement bug (S2 evolution) + findFirstMatching bug (S5) + CO-S0-6 --approve semantic (S1 docs)
  • 🤝 100% backward compat — Trust Score Δ ≤5% verified (worked example numerical proof)

🌟 Highlights — What Changed for You

1️⃣ /sprint init auto-imports Context Anchor (Closes #104)

# Before: empty context placeholders
/sprint init my-sprint --name "Q2 Launch"
# Generates "(not set)" in report Context section even when master-plan exists
# After: auto-import from master-plan or PRD
/sprint init my-sprint --name "Q2 Launch"
# Reads docs/01-plan/features/my-sprint.master-plan.md → WHY/WHO/RISK/SUCCESS/SCOPE populated
# audit emit: sprint_context_imported

Fallback chain: args.context (explicit) > master-plan.md > PRD.md > defaultContext.
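
The fallback chain amounts to "first source that yields a context wins". A minimal sketch, assuming each source has already been loaded into an object or left null; `resolveContext` and its parameter names are hypothetical:

```javascript
// Hypothetical resolver; each source is either a context object or null/undefined.
function resolveContext({ explicit, masterPlan, prd, defaultContext }) {
  const candidates = [
    ['args.context', explicit],
    ['master-plan.md', masterPlan],
    ['PRD.md', prd],
  ];
  for (const [source, value] of candidates) {
    if (value != null) return { source, context: value };
  }
  return { source: 'defaultContext', context: defaultContext };
}

const resolved = resolveContext({
  explicit: null,
  masterPlan: { WHY: 'Q2 launch readiness' },
  prd: { WHY: 'older PRD text' },
  defaultContext: { WHY: '(not set)' },
});
console.log(resolved.source); // master-plan.md
```

Returning the winning source alongside the context makes it easy to record where the values came from, for example in an audit event.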

2️⃣ Sprint reports now have ## Quality Gates section + KPI SoT (Closes #105)

## 2. KPI Snapshot
| matchRate | 100% | # qualityGates.M1 wins over kpi.matchRate
| dataFlowIntegrity (S1) | 100% |
...
## Quality Gates (11 gates, 11 passed)
| Gate | current | threshold | passed | lastMeasuredAt | source |
|------|---------|-----------|--------|----------------|--------|
| M1 | 100 | ≥90 | ✅ | 2026-05-21T10:00:00Z | gap-detector |
...

Precedence: qualityGates > featureMap > kpi. Divergence detection emits audit sprint_kpi_divergence.
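
The precedence rule and the divergence check can be sketched for a single metric as follows. `resolveMetric` is a hypothetical helper; the real resolver and the shape of its inputs may differ.

```javascript
// Hypothetical resolver for one metric. Sources are ordered from highest to lowest precedence.
function resolveMetric({ qualityGates, featureMap, kpi }) {
  const ordered = [
    ['qualityGates', qualityGates],
    ['featureMap', featureMap],
    ['kpi', kpi],
  ].filter(([, value]) => typeof value === 'number');

  if (ordered.length === 0) return { value: null, source: null, diverged: false };

  const [source, value] = ordered[0];
  // Divergence: any lower-precedence source disagrees with the winning value.
  const diverged = ordered.some(([, other]) => other !== value);
  return { value, source, diverged };
}

console.log(resolveMetric({ qualityGates: 100, kpi: 85 }));
// { value: 100, source: 'qualityGates', diverged: true }
console.log(resolveMetric({ kpi: 85 }));
// { value: 85, source: 'kpi', diverged: false }
```

In this sketch a stale kpi.matchRate never overrides a fresh gate measurement, and the `diverged` flag is the signal a caller could turn into the sprint_kpi_divergence audit event.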

3️⃣ Gate-fail reports auto-marked RESOLVED on successful transition (Closes #103)

# Pre-v2.1.19: docs/03-analysis/<sprint>-gate-fail-*.md
# "STATUS: BLOCKED" — even after sprint completed
# sprint.lastGateFailure never cleared
# v2.1.19: when advancePhase succeeds after a gate_fail:
# - File header prepended with "> **STATUS: RESOLVED** at <ISO>"
# - sprint.lastGateFailure.resolvedAt / resolvedBy / resolutionReason populated
# - audit emit: gate_fail_resolved (idempotent, atomic write)

4️⃣ SKILL.md path drift permanently blocked (Closes #107)

  • sprint SKILL.md explicit <bkit-root> convention with #107 reference inline
  • scripts/check-skills-docs-code-sync.js CI gate (44 skills × invariant, code-block-aware)
  • scripts/lint-skill-md.js PreToolUse hook (warning-only)
  • test/contract/baseline/skills-convention.json frozen baseline

Critical evolution: S2 introduced stripCodeBlocks parser — S0 measurement had false positives on phase-3-mockup + phase-9-deployment (JavaScript/YAML samples inside code blocks). Real drift was 1 skill (sprint #107), not 3.

5️⃣ bkit Early Adopter Program / Real User Hall of Fame (ENH-318)

# Run bkit on your production project + file detailed bug reports →
# - 🏆 Public recognition (README "Real User Hall of Fame")
# - 🔒 Your reproductions become permanent E2E regression tests
# - 📊 Activity powers Trust Score externalDogfoodFeedbackResponseRate component
# - 📝 CHANGELOG attribution per release

First entry: @pruge (James Kim) — 10 issues / 5 scenarios absorbed. See docs/external-dogfooders/pruge.md.

6️⃣ Trust Score 7-Component (ENH-318 governance)

// Before (6 components, sum 1.0):
// pdcaCompletionRate 0.25 / gatePassRate 0.20 / rollbackFrequency 0.15
// destructiveBlockRate 0.15 / iterationEfficiency 0.15 / userOverrideRate 0.10
// After (7 components, sum 1.0):
// pdcaCompletionRate 0.2375 / gatePassRate 0.19 / rollbackFrequency 0.1425
// destructiveBlockRate 0.1425 / iterationEfficiency 0.1425 / userOverrideRate 0.095
// + externalDogfoodFeedbackResponseRate 0.05 ← NEW

Worked example: an existing user with values [80/85/90/95/75/70] scores 83.00 under the old weights and 78.85 under the new ones (the new component contributes 0). Δ -4.15pt (-5.0%), at the ≤5% R-10 mitigation boundary (float epsilon tolerance verified).

Legacy 6-component trust-profile.json auto-migrated (weights from defaults SoT, values from disk).
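
The worked example can be checked directly from the two weight tables. The sketch below assumes the new externalDogfoodFeedbackResponseRate component has a value of 0 for an existing user, which is what reproduces the 78.85 figure; `weightedScore` is a hypothetical helper.

```javascript
// Weights copied from the before/after listings above.
const OLD_WEIGHTS = [0.25, 0.20, 0.15, 0.15, 0.15, 0.10];
const NEW_WEIGHTS = [0.2375, 0.19, 0.1425, 0.1425, 0.1425, 0.095, 0.05];

function weightedScore(values, weights) {
  return values.reduce((sum, value, i) => sum + value * weights[i], 0);
}

const values = [80, 85, 90, 95, 75, 70];
const oldScore = weightedScore(values, OLD_WEIGHTS);
// Assumption: the new seventh component starts at 0 for an existing user.
const newScore = weightedScore([...values, 0], NEW_WEIGHTS);

console.log(oldScore.toFixed(2));                                 // 83.00
console.log(newScore.toFixed(2));                                 // 78.85
console.log((newScore - oldScore).toFixed(2));                    // -4.15
console.log((((newScore - oldScore) / oldScore) * 100).toFixed(1)); // -5.0
```

Because the six original weights are each scaled by 0.95, the worst-case drop for a user with no dogfood activity is exactly 5%, which is why a float epsilon tolerance is relevant at the boundary.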

7️⃣ --approve semantic clarified (S1 + CO-S0-6 absorbed)

# --approve ONLY bypasses Trust Level scope boundary (e.g., L3 stopAfter=qa)
# --approve does NOT bypass Quality Gate failures (M*/S*)
# For gate failures, use: /sprint measure <id> --gate <key> first

Documented in skills/sprint/SKILL.md §10.1.1.1. A future --allowGateOverride flag is carried over to v2.1.20+.

8️⃣ /sprint dogfood action for bkit self-dogfooding

/sprint dogfood v2.1.20 --release-tag v2.1.20-rc.0
# Creates self-dogfood-2.1.20 sprint container with auto-derived context
# Idempotent (existing id graceful skip), audit: sprint_dogfood_started

9️⃣ Self-dogfood CI gate

./scripts/check-self-dogfood.sh
# Verifies recent release was operated as sprint container (4 invariants per release)
# --bootstrap-mode: skip invariant #1 (master plan §19.5 Exception)
# --emergency-override <reason>: skip all invariants (SQM penalty -10)

v2.1.20 will be the first true gate activation; v2.1.19 is the final Bootstrap Exception.

🔟 SQM (Sprint Quality Maturity Index) introduced

┌─── SQM (Sprint Quality Maturity) ─────────── 64.00 / 100 ─┐
│ Docs=Code 100 │ Self-Dogfood 10 │ Dogfooder 100 │
│ Report KPI 80 │ Dispatch — │ Convention 0 │
└──────────────────────────────────────────────────────────────────────┘

6-component weighted score, .bkit/state/sqm-history.jsonl append-only history, SessionStart-ready dashboard panel. Projected v2.1.19 GA SQM: ~96 (master plan §7.2 target ≥85 well exceeded).


👀 User Experience Impact

| Scenario | Before v2.1.19 | After v2.1.19 |
|----------|----------------|---------------|
| L1 sprint init | trustLevelAtStart=L3 silent default, no L1 warning | default L2 (Safe Defaults), --trust L1 emits stderr warning + audit |
| Sprint report | KPI Snapshot only, no Quality Gates section, kpi.matchRate=stale despite qg.M1=100 | ## Quality Gates section + qualityGates SoT precedence + divergence detection |
| Sprint context | (not set) placeholders despite master-plan.md filled | Auto-import from master-plan/PRD fallback chain |
| Gate-fail report | docs/03-analysis/ permanently "BLOCKED" after fix | Auto-marked "RESOLVED" with timestamp + reason on successful transition |
| External dogfooder feedback | No governance signal (internal trust components only) | Trust Score 7th component + Hall of Fame archive |
| Sprint orchestrator dispatch | Declared in v2.1.18 frontmatter but no test evidence | Contract baseline (5 TC) + e2e mocked + production code path verified |
| bkit self-dogfooding | bkit never ran its own sprint mgmt on its own releases | Bootstrap Exception pattern documented + /sprint dogfood + CI gate ready for v2.1.20 |
| SQM measurement | No quantitative sprint domain health metric | 6-component weighted score + history + SessionStart panel |

🧪 Tests (152 NEW, 100% PASS)

| Sub-sprint | New tests | Breakdown |
|------------|-----------|-----------|
| S0 baseline | 30 | sqm-calculator unit (19) + result schema contract (7) + measure e2e (4) |
| S1 Foundation | 28 | dogfood (6) + default L2 (4) + annotate (3) + contract (5) + e2e dispatch (3) + ci-gate (7) [1 design-skip] |
| S2 Defense | 35 | path fix (4) + check-skills (17) + sprint audit (6) + baseline (5) + linter (3) |
| S4 Proactive | 14 | trust 7-component (5) + feedback-tracker (3) + 5 dandi scenarios (6) |
| S3 Polish | 40 | context-importer (10) + kpi-resolver (5) + generate-report-sot (9) + carry rationale (4) + lessons (4) + failure-resolution (8) |
| S5 Measurement | 5 | sqm-evolve (2) + history (1) + panel (2) |
| Total | 152 | 30 test files |

Final regression: all 30 files re-run, 152/152 PASS, 0 failures.


🧩 Methodology — Bootstrap Exception Pattern Validated 5x

S0 + S1 + S4 + S2 + S3 + S5 all completed under PDCA-with-sprint-shadow mode (main session as sub-agent manual proxy). 6 successful instantiations validate the pattern as a transitional protocol.

v2.1.20 will be the first true self-dogfood CI gate activation: scripts/check-self-dogfood.sh without the --bootstrap-mode flag will hard-fail when releases don't operate as sprint containers.


🆙 Upgrade Guide

From v2.1.18

cd ~/your/bkit-claude-code
git fetch --tags
git checkout v2.1.19
# Restart Claude Code (or reload the plugin)
cat bkit.config.json | grep version # → "version": "2.1.19"

Trust Score impact (if you have existing trust-profile.json)

Auto-migration:...


Contributors

pruge

v2.1.18 — Sprint Trust UX Fix (Issues #100/#101/#102)

21 May 06:39
@popup-kay popup-kay
e726f15

bkit v2.1.18 — Sprint Trust UX Fix

The first release dedicated entirely to issue triage from the bkit community. v2.1.18 permanently closes the L1 sprint lockout incident class discovered and reported by @pruge on his dandi-village-ledger s1-foundation sprint. Three GitHub issues (#100/#101/#102) — all filed within the same minute — were treated as a single 3-layer drift root cluster and fixed in one integrated sprint.

Released: 2026-05-21
Predecessor: v2.1.17 GA (5-axis matrix 5/5 close)
Merge commit: e726f15 (PR #106)
Reporter: @pruge — thank you for the precise repro and root-cause analysis 🙏


TL;DR

  • 🔧 3 issues closed: #100, #101, #102 (all P0/P1 sprint-blocking bugs)
  • 🚀 New command: /sprint trust <id> --to <Level> mutates an existing sprint's trust level in place, no more /sprint init destruction
  • 🧠 sprint-orchestrator actually works now: Task tool added to allowlist; sub-agent dispatch live for the first time (ENH-292 activation milestone)
  • 🤝 --trust alias honored — CLI behavior finally matches skills/sprint/SKILL.md §10.2
  • 40 tests passing (17 contract + 15 unit + 8 e2e), about 2.9× the 14 TC target
  • 🔄 100% backward compatible — no sprint state schema change; existing --trustLevel users unaffected

🌟 Highlights — What Changed for You

F1 · sprint-* agents now declare tools: (Closes #100)

Four sprint agents — sprint-orchestrator, sprint-master-planner, sprint-qa-flow, sprint-report-writer — now have an explicit tools: frontmatter that includes the Task tool where required. Before v2.1.18, the orchestrator could not actually dispatch the measurement agents it was specced to route through (measure-router.js:233-253 returned no_agent_runner). With v2.1.18, sub-agent dispatch is live for the first time.

Differentiation milestone: ENH-292 Sequential Dispatch promoted from "declared" to "live".

F2 · New /sprint trust command (Closes #101)

# Mutate an existing sprint's trust level without losing state
/sprint trust s1-foundation --to L3 --reason "P0 32/32 ready"

Before v2.1.18, a sprint initialized with --trust L1 was permanently locked in preview mode. The only escape was /sprint init — which destroyed phaseHistory, qualityGates, and featureMap accumulated over hours or days of work.

v2.1.18 adds:

  • New CLI action /sprint trust <id> --to <L0|L1|L2|L3|L4> [--reason "<text>"] [--force]
  • New audit ACTION_TYPE sprint_trust_changed (ACTION_TYPES 29 → 30)
  • Downgrade guardrail: dropping ≥2 levels (e.g. L4 → L1) requires trustScore ≥ 80 (from .bkit/state/trust-profile.json) or explicit --force
  • Idempotent path: from === to emits an audit entry with noop: true so monitoring never sees a blind spot
  • Actor auto-detection: explicit --actor > CLAUDE_AGENT_ID env (→ 'agent') > default 'user'
  • Defense Layer 6 natural integration: --force sets blastRadius: 'high' for the audit alarm pipeline (live evidence: .bkit/audit/2026-05-21.jsonl)
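
The guardrail rules in the list above can be read as a small decision function. The following is a minimal sketch under stated assumptions, not the actual handleTrust code: the names evaluateTrustChange and resolveActor, the reason strings, and the argument shapes are hypothetical. Only the rules themselves (a drop of two or more levels needs trustScore ≥ 80 or --force, from === to is a no-op, --force sets blastRadius to 'high', and the actor precedence) come from these notes.

```javascript
// Hypothetical sketch of the /sprint trust guardrail; names and return shapes are assumptions.
const LEVELS = ['L0', 'L1', 'L2', 'L3', 'L4'];

// Actor precedence: explicit --actor > CLAUDE_AGENT_ID env (→ 'agent') > default 'user'.
function resolveActor(args, env) {
  if (args.actor) return args.actor;
  if (env.CLAUDE_AGENT_ID) return 'agent';
  return 'user';
}

function evaluateTrustChange({ from, to, trustScore, force = false }) {
  const fromIdx = LEVELS.indexOf(from);
  const toIdx = LEVELS.indexOf(to);
  if (fromIdx === -1 || toIdx === -1) {
    return { ok: false, reason: 'invalid_level' }; // reason string is illustrative
  }
  if (fromIdx === toIdx) {
    // Idempotent path: still reported so an audit entry with noop: true can be written.
    return { ok: true, noop: true };
  }
  const drop = fromIdx - toIdx; // positive when downgrading
  if (drop >= 2 && !force && !(trustScore >= 80)) {
    return { ok: false, reason: 'downgrade_guardrail' }; // reason string is illustrative
  }
  const result = { ok: true, noop: false };
  if (force) result.blastRadius = 'high'; // flagged for the audit alarm pipeline
  return result;
}
```

Under these assumptions, evaluateTrustChange({ from: 'L4', to: 'L1', trustScore: 50 }) is rejected, while the same call with force: true succeeds and carries blastRadius: 'high'.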

Documentation: skills/sprint/SKILL.md §10.1.3 (new section with comparison table and audit JSON example).

F3 · --trust CLI alias now honored (Closes #102)

# Both forms now produce identical behavior (v2.1.18)
/sprint measure s1-foundation --gate M1 --trust L3
/sprint measure s1-foundation --gate M1 --trustLevel L3

scripts/sprint-handler.js:942 and :974 previously bypassed normalizeTrustLevel and read args.trustLevel directly, so --trust L3 was silently ignored at the measure and phase paths. Both call sites now go through normalizeTrustLevel(args), restoring the declared precedence chain trustLevel > trust > trustLevelAtStart. Docs=Code drift resolved.
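
To make the precedence chain concrete, here is a minimal sketch assuming all three values arrive on one args object. The function name resolveTrustLevel is hypothetical and is not the real normalizeTrustLevel, whose body is not shown in these notes; only the ordering trustLevel > trust > trustLevelAtStart is taken from the text above.

```javascript
// Hypothetical sketch of the precedence chain trustLevel > trust > trustLevelAtStart.
function resolveTrustLevel(args = {}) {
  for (const key of ['trustLevel', 'trust', 'trustLevelAtStart']) {
    const value = args[key];
    // First non-empty string wins; later keys are only consulted as fallbacks.
    if (typeof value === 'string' && value !== '') return value;
  }
  return undefined;
}

console.log(resolveTrustLevel({ trust: 'L3', trustLevelAtStart: 'L1' })); // L3
```

Reading args.trustLevel directly, as the pre-fix call sites did, would skip the second and third steps of this loop, which is why --trust L3 had no effect.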

F4 · sprint-master-planner orchestrator expansion (★ user-requested)

sprint-master-planner now declares pm-lead, cto-lead, qa-lead in its Task allowlist — in addition to the existing specialist agents. Future sprints can natively orchestrate all three teams in parallel rather than relying on main-session manual dispatch.


👀 User Experience Impact

| Scenario | Before v2.1.18 | After v2.1.18 |
|---|---|---|
| L1 sprint lockout | Sprint stuck in do phase forever; only escape is /sprint init (destroys all phase history) | /sprint trust <id> --to L3 mutates trust in place; phase history, quality gates, and feature map preserved |
| Sprint orchestration | sprint-orchestrator could not dispatch measurement agents; main session had to act as pass-through | Orchestrator fulfills its specced routing responsibility; no main-session workaround required |
| CLI trust override | --trust L3 silently ignored at measure/phase; only --trustLevel L3 worked | Both --trust and --trustLevel honored per SKILL.md §10.2 precedence |
| Audit trail | Trust mutations invisible to /sprint status and downstream consumers | Every trust change persisted as sprint_trust_changed in .bkit/audit/<date>.jsonl with downgrade guardrail and --force alarm |
| Reporter scenario | @pruge s1-foundation sprint with --trust L1 — 32/32 P0 implementation done, but /sprint measure always preview, phase advance always gate_fail | 8-step E2E test reproduces the original scenario 1:1 and runs green; sprint recoverable without reset |

🧪 Tests (40 TC live PASS)

| Layer | File | TC | Purpose |
|---|---|---|---|
| Contract | test/contract/sprint-agents-tools.test.js | 17 | F1 — 4 sprint-* agents' tools: field invariant |
| Unit | test/unit/sprint-trust-normalization.test.js | 7 (A-G) | F3 — normalizeTrustLevel precedence chain |
| Unit | test/unit/sprint-handler-trust-action.test.js | 8 | F2 — handleTrust (mutation, guardrail, audit, actor) |
| E2E | test/e2e/sprint-l1-lockout-recovery.test.js | 8 | Reporter scenario 1:1 reproduction (init L1 → trust L1→L3 → measure record → audit verify → restart persistence) |
| Total | | 40 TC | about 2.9× the 14 TC target |

🧩 Methodology — First PM + CTO + QA Team Integration Sprint

v2.1.18 is the first bkit release driven end-to-end by all three Agent Teams in parallel:

  • PM Team (pm-lead orchestrating 4 PM agents) — produced a 570-line PRD with Beachhead 19/20 (Geoffrey Moore framework), JTBD 6-Part, 5 User Stories, 6 Test Scenarios, and a Pre-mortem Top 3
  • CTO Team (cto-lead) — architectural review APPROVE with CONCERNS; 3 BLOCKERs (controlScore → trustScore correction, ACTION_TYPES count 27 → 29, NDJSON injection assessment) + 3 MEDIUMs all addressed via redline
  • QA Team (qa-lead) — L1-L5 integrated verification report + reporter-scenario evidence (574 lines, QA_PASS)

Self-referential meta-risk: the sprint that fixes sprint-orchestrator itself cannot use sprint container automation. We outlined the chicken-and-egg pattern in Plan §6.1: sprint init for state tracking only, phase advance + measurement via the PDCA cycle. After F1 lands, future sprints regain full orchestration.


🔐 Differentiation 6/6 Status

  • ENH-289 Defense Layer 6: strengthened (sprint_trust_changed joined the L6 audit pipeline, live evidence)
  • ENH-292 Sequential Dispatch: activation milestone (declared → live in this release)
  • ENH-286 Memory Enforcer / ENH-300 Effort-aware Defense / ENH-303 PostToolUse continueOnBlock / ENH-310 Heredoc Detector — unaffected, no regression

🆙 Upgrade Guide

From v2.1.17

# 1. Pull the latest plugin (in your bkit-claude-code clone)
cd ~/your/bkit-claude-code
git fetch --tags
git checkout v2.1.18
# 2. Restart Claude Code (or reload the plugin) to pick up the new agents
# 3. Verify version
cat bkit.config.json | grep version
# → "version": "2.1.18"

If you have a sprint stuck in L1 lockout

# Old workaround (v2.1.16/v2.1.17): re-init (destroys state)
# ❌ /sprint init my-sprint --name "..." --trust L2 # phaseHistory lost
# New v2.1.18 path: mutate in place
/sprint trust my-sprint --to L3 --reason "ready to record measurements"
/sprint measure my-sprint --gate M1 # now records (mode: "record"), no longer preview
/sprint phase my-sprint --to iterate # gate transitions succeed

Breaking changes

None. v2.1.18 is 100% backward compatible:

  • Existing --trustLevel L<N> users: precedence chain preserved (F3 Case G test)
  • Sprint state schema: unchanged
  • ADR 0003: 14/14 PASS — 16-cycle consistency milestone

🗂️ Carryovers — deferred to v2.1.19+

Not release blockers, captured in the sprint report (docs/04-report/features/v2118-sprint-trust-ux-fix.report.md):

  • CO-1 — baseline v2.1.18 capture script
  • CO-2 — qa-aggregate script
  • CO-3 — sprint-orchestrator auto phase-advance live test
  • CO-4 — sub-agent-dispatcher state transition test
  • CO-5 — L0 Manual mode escalate-path E2E
  • CO-6 — sprint-report-writer agent timeout fallback

Follow-up issues from the same reporter (#103, #104, #105 — failure-reporter resolution markers, sprint init context auto-import, generate-report quality-gates section) are acknowledged and slated for v2.1.19+ planning.


🙏 Credits

Massive thanks to @pruge (james kim) for the high-signal bug reports. The reproductions in #100/#101/#102 were so precise we could turn them directly into an E2E test (test/e2e/sprint-l1-lockout-recovery.test.js) before writing a single line of fix code. That's the kind of issue every maintainer dreams of.

If you hit similar sprint-related friction, please open an issue — the v2.1.18 turnaround (issue filed → fix released in 24h) is the sta...


Contributors

pruge

bkit v2.1.17 — CI/CD Hardening, 5-Axis Matrix 5/5 Close

20 May 11:21
@popup-kay popup-kay
39f89e6

bkit v2.1.17 — CI/CD Hardening, 5-Axis Matrix 5/5 Close

Headline: Permanent closure of the 8-day Invocation Contract Check red incident class from 2026-05-12 to 2026-05-20. CI/CD maturity matrix (Detection / Enforcement / Recovery / Governance / Evolution) closed across all 5 axes. All 11 carryover items resolved.

🎯 Highlights

Incident Class Permanently Closed

On 2026-05-12, commit 967cd8f (refactor v2.1.13) removed six pdca-eval-* agents as dead code cleanup. The baseline v2.1.9 manifest was not updated, and the Agent surface lacked a deprecatedIn governance mechanism (which Skill already had). This caused the Invocation Contract Check workflow to fail on every push for 8 consecutive days. Releases v2.1.15 and v2.1.16 GA shipped while CI was red. This v2.1.17 release closes every known root cause and carryover in the incident class.

5-Axis Matrix Progression

Axis v2.1.16 GA v2.1.17
Detection ◐ L1+L4 only ●● Dual baseline + L2 + L3 + L5 mandatory + MCP schema
Enforcement ● Branch protection auto-applied (2 Required Status Checks)
Recovery ●● Rollforward SOP + tracked file policy guide
Governance ◐ Skill only ●● Skill + Agent + MCP symmetric + isolated tests (5+6 scenarios)
Evolution ●● Dual baseline + frontmatter util + SoT canonical names

5/5 close

📦 Changes

Detection

  • Dual baseline: v2.1.9 LTS (long-term drift) + v2.1.16 Latest (noise floor) compared simultaneously
  • L2 mandatory: l2-smoke.test.js (98 TC) + l2-hook-attribution.test.js (13 TC) integrated into workflow
  • L3 mandatory: l3-mcp-compat.test.js (92 TC) + l3-mcp-runtime.test.js (48 TC) integrated into workflow
  • L5 mandatory (CO-3): removed continue-on-error: true from invocation-inventory.test.js + added needs: contract-l1-l4 (203 → 210 TC with SoT-driven lists)
  • MCP deprecation schema (CO-2): inline // @deprecated since vX.X.X replacedBy=Y annotation parsing
  • scripts/check-test-tracking.js (CO-7): detects untracked test files across 18 production test paths (CI gate)
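
The inline annotation format quoted in the CO-2 bullet, // @deprecated since vX.X.X replacedBy=Y, lends itself to a single regular expression. The sketch below is an assumption-laden illustration, not the project's parseMCPToolBlocks: the function name parseDeprecation, the returned object shape, the choice to treat replacedBy as optional, and the tool name in the example are all hypothetical.

```javascript
// Hypothetical parser for one annotation line; not the real parseMCPToolBlocks.
const DEPRECATED_RE =
  /^\s*\/\/\s*@deprecated\s+since\s+(v\d+\.\d+\.\d+)(?:\s+replacedBy=(\S+))?\s*$/;

function parseDeprecation(line) {
  const match = DEPRECATED_RE.exec(line);
  if (!match) return null; // not an annotation line
  return { since: match[1], replacedBy: match[2] ?? null };
}

// 'bkit_new_tool' is a made-up name used only for illustration.
console.log(parseDeprecation('// @deprecated since v2.1.17 replacedBy=bkit_new_tool'));
```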

Enforcement

  • scripts/setup-branch-protection.sh (CO-1, idempotent gh api wrapper) — auto-applied to main:
    • Required Status Checks: Contract Test (L1 Frontmatter + L4 Deprecation), Contract Test L5 (Invocation Inventory)
    • strict: true, allow_force_pushes: false, allow_deletions: false
    • enforce_admins: false (admin override allowed for emergency hotfixes)

Recovery

  • docs/06-guide/contract-baseline-rollforward.guide.md: LTS vs Latest policy, decision tree, capture/deprecation stub procedures, PR self-review checklist, incident log (8 sections)
  • docs/06-guide/test-file-tracking-policy.guide.md (CO-6): .gitignore policy + PR checklist + incident log (9 sections)
  • docs/06-guide/branch-protection-setup.guide.md (CO-1): admin SOP

Governance

  • Agent deprecation governance: agents/<name>.md frontmatter with deprecatedIn: vX.X.X bypasses L4 — symmetric with the Skill pattern
  • 6 pdca-eval-* deprecation tombstones: agents/pdca-eval-{act,check,design,do,plan,pm}.md (permanent tombstones for the 5/12 cleanup)
  • MCP tool deprecation governance: L4 bypass via baseline JSON deprecatedIn field — full symmetry across 3 surfaces (Skill / Agent / MCP)
  • Agent-deprecation isolated test (CO-4): test/contract/agent-deprecation.test.js, 5 scenario fixture, 5/5 PASS
  • MCP-deprecation e2e test (CO-2.1): test/contract/mcp-deprecation.test.js, 6 scenario fixture, 6/6 PASS
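
As an illustration of the Agent-side check, the sketch below reads a flat frontmatter block and tests for a deprecatedIn value. It is a simplified stand-in under stated assumptions, not the code in lib/util/frontmatter.js: the function names are hypothetical, only single-line key: value pairs are handled (no nested YAML), and the version string in the example is illustrative.

```javascript
// Hypothetical, simplified frontmatter reader; handles flat "key: value" lines only.
function readFlatFrontmatter(text) {
  const lines = text.split(/\r?\n/); // tolerate both LF and CRLF files
  if (lines[0] !== '---') return {};
  const end = lines.indexOf('---', 1); // closing fence
  if (end === -1) return {};
  const data = {};
  for (const line of lines.slice(1, end)) {
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    data[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return data;
}

// True when the frontmatter carries a deprecatedIn value shaped like vX.X.X.
function isDeprecatedAgent(text) {
  return /^v\d+\.\d+\.\d+$/.test(readFlatFrontmatter(text).deprecatedIn ?? '');
}

// Example tombstone; the version value is illustrative.
const tombstone = '---\nname: pdca-eval-act\ndeprecatedIn: v2.1.13\n---\nTombstone body';
console.log(isDeprecatedAgent(tombstone)); // true
```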

Evolution

  • lib/util/frontmatter.js (CO-5): consolidated 5-site duplication — parseFrontmatter, parseFrontmatterFile, hasDeprecatedInFrontmatter, hasDeprecatedInFrontmatterFile, coerce
  • v2.1.16 baseline captured (test/contract/baseline/v2.1.16/, 106 files)
  • SoT canonical names lists (CO-3.1): added 6 lists to lib/domain/rules/docs-code-invariants.js (EXPECTED_ACTIVE_AGENT_NAMES, EXPECTED_DEPRECATED_AGENT_NAMES, EXPECTED_SKILL_NAMES, EXPECTED_HOOK_EVENT_NAMES, EXPECTED_PDCA_MCP_TOOLS, EXPECTED_ANALYSIS_MCP_TOOLS)

Hygiene

  • Removed 12 orphan JSON files from test/contract/baseline/v2.1.9/ (sprint-* agents/MCP tools/skills missing from manifest)
  • Force-tracked 35+ previously untracked test files: tests/qa/ 29 + test/contract/ 5 + test/e2e/ 6 + test/integration/ 3 + test/unit/ 2 + test/v2110-qa/ 2
  • .gitignore narrowed: removed test/ + tests/* blanket ignore → explicit local-only patterns
  • scripts/check-deadcode.js EXEMPT pattern broadened (v2.1.13 sprint barrel, 3 files)

Framework Side-Effect Blocking

  • collect* implicit-write prevention: { persist: false } option blocks baseline self-mutation
  • --version path-injection validation (CO-1.1): regex ^[A-Za-z0-9._-]+$, exits with code 2 on invalid input
  • --project-root flag: makes contract-test-run.js + contract-baseline-collect.js fixture-aware
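
The --version validation in the list above amounts to an allowlist check before the value is used in a path. A minimal sketch follows, assuming a helper named checkVersionArg and a result object of our own design; only the regex ^[A-Za-z0-9._-]+$ and exit code 2 come from these notes.

```javascript
// Allowlist taken from the release notes; helper name and return shape are hypothetical.
const VERSION_PATTERN = /^[A-Za-z0-9._-]+$/;

function checkVersionArg(value) {
  if (typeof value === 'string' && VERSION_PATTERN.test(value)) {
    return { ok: true, exitCode: 0 };
  }
  return { ok: false, exitCode: 2 }; // invalid input maps to exit code 2
}

// A CLI could then do something like:
//   const result = checkVersionArg(userSuppliedVersion);
//   if (!result.ok) process.exitCode = result.exitCode;
console.log(checkVersionArg('v2.1.16').ok);    // true
console.log(checkVersionArg('../v2.1.9').ok);  // false
```

Because / and \ are outside the allowlist, the value cannot contain a path separator. A bare .. still matches the pattern, so a caller that joins the value into a path may want an extra check for that case.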

📊 Quantitative Results

| Metric | v2.1.16 GA | v2.1.17 | Delta |
|---|---|---|---|
| qa-aggregate PASS | 3,808 | 4,103 | +295 |
| qa-aggregate FAIL | 31 | 0 | -31 |
| qa-aggregate Errors | 4 | 0 | -4 |
| Mandatory workflow steps | 13 | 18 | +5 |
| Baseline snapshots | 1 | 2 (LTS + Latest) | +1 |
| Active agents | 34 | 34 | 0 |
| Deprecation tombstones | 0 | 6 | +6 |
| Frontmatter parse sites | 5 (duplicate) | 1 (lib/util/) | -4 |
| Hardcoded EXPECTED lists | 7 | 0 (SoT) | -7 |
| Branch protection | | 2 Required Checks | |
| Carryover items | 11 | 0 | -11 |
| 5-Axis Matrix | 0/5 | 5/5 | |

🗂 11 Carryover Closures

| ID | Item | Status |
|---|---|---|
| CO-1 | Branch protection automation | ✅ Script + applied |
| CO-1.1 | --version path-injection validation | ✅ Regex |
| CO-2 | MCP tool deprecation schema | parseMCPToolBlocks |
| CO-2.1 | MCP deprecation e2e test | ✅ 6/6 PASS |
| CO-3 | L5 E2E mandatory promotion | ✅ Workflow |
| CO-3.1 | L5 dynamic EXPECTED lists | ✅ SoT integration |
| CO-4 | Agent-deprecation isolated test | ✅ 5/5 PASS |
| CO-5 | frontmatter util extraction | ✅ 5 sites → 1 |
| CO-6 | Tracked file policy | ✅ Narrow + 35+ files |
| CO-7 | tests/qa dependency automation | ✅ check-test-tracking |
| CO-8 | branch-protection apply audit | ✅ admin applied & verified |

🔗 Pull Requests

  • PR #97 (7acdd4f): v2.1.17 main scope — 4/5 axes close
  • PR #99 (39f89e6): v2.1.17 final — 5 carryover items absorbed + 5/5 axes close

📚 Documentation

  • docs/01-plan/features/v2117-ci-cd-hardening.plan.md — Plan
  • docs/02-design/features/v2117-ci-cd-hardening.design.md — Design
  • docs/03-analysis/features/v2117-ci-cd-hardening.analysis.md — Gap analysis
  • docs/04-report/features/v2117-ci-cd-hardening.report.md — Completion report
  • docs/06-guide/contract-baseline-rollforward.guide.md — Baseline SOP
  • docs/06-guide/branch-protection-setup.guide.md — Branch protection SOP
  • docs/06-guide/test-file-tracking-policy.guide.md — Test tracking policy

🙏 Origin

The incident class started with commit 967cd8f (refactor v2.1.13, 2026-05-12): a 6-agent dead code cleanup combined with an unupdated baseline produced an 8-day red period. This release closes every known framework gap.


🤖 Released with Claude Code


v2.1.16 — Quality Gates & Approval UX + Release Hardening

20 May 05:35
@popup-kay popup-kay
7330103

bkit v2.1.16 — Quality Gates & Approval UX + Release Hardening

The only Claude Code plugin that verifies AI-generated code against its own design specs.

This is a patch release that closes 4 GitHub issues reported by L2 trust users hitting Sprint quality-gate deadlocks, plus a release hardening sub-sprint that terminates the v2.1.14/15/16 recurring "thoroughly verified but defects shipped" pattern.


✨ Highlights

1. L2 trust deadlock is gone

Default-trust users (L2) can now drive Sprints to completion without escalating trust level or editing state JSON manually. Every quality gate hold now has a user-invokable command:

  • /sprint phase <id> --to <next> --approve --reason "<text>" — single-use Trust Level scope-boundary escape hatch (Issue #95)
  • /sprint measure <id> --gate <key> — partial-gate manual measurement when the orchestrator missed one (Issue #94)

2. Quality gates auto-document their own failures

When advancePhase is blocked by gate_fail, bkit now automatically writes docs/03-analysis/<sprintId>-gate-fail-<phase>-<timestamp>.md with a detailed gate summary, suggested next commands (including /sprint measure --gate <key> and --approve hints), and persists lastGateFailure to sprint state for /sprint status to surface (Issue #93).
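
The report filename pattern docs/03-analysis/<sprintId>-gate-fail-<phase>-<timestamp>.md can be sketched as a pure path builder. The helper name gateFailReportPath is hypothetical, and the timestamp format is an assumption: an ISO-8601 UTC string with : and . replaced by - so the result is safe as a filename.

```javascript
// Hypothetical path builder; the timestamp normalisation is an assumption.
function gateFailReportPath(sprintId, phase, now = new Date()) {
  // e.g. 2026-05-20T05:31:22.411Z -> 2026-05-20T05-31-22-411Z
  const timestamp = now.toISOString().replace(/[:.]/g, '-');
  return `docs/03-analysis/${sprintId}-gate-fail-${phase}-${timestamp}.md`;
}

console.log(gateFailReportPath('my-sprint', 'design', new Date('2026-05-20T05:31:22.411Z')));
// docs/03-analysis/my-sprint-gate-fail-design-2026-05-20T05-31-22-411Z.md
```

Passing the clock in as a parameter keeps the function deterministic, which makes it straightforward to unit-test.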

3. sprint-orchestrator measurement responsibility clarified

At design exit, the orchestrator now records both M4_apiComplianceRate and M8_designCompleteness atomically through the new lifecycle.measurePhaseGates() use case. This eliminates the silent "M8=100, M4=null → gate_fail" deadlock reported in Issue #92. The canonical measurement path (lib/application/quality-gates/measure-router.js) is shared between auto-advance and manual /sprint measure so values agree.

4. Release hardening — tests exist but unused anti-pattern terminated

The user asked "are 5 fixes really all? did you verify all bkit features properly?" — this triggered a deeper audit that uncovered 31 stale test failures, 4 orphan test files (modules deleted in v2.1.10), and 3 release metadata drift defects that existing tests had been catching all along but were never wired into CI.

All resolved. CI now mandates two new release gates: bkit-full-system (version sync + structure) and docs-code-sync (counts SoT drift detector).


🎯 User Experience Changes

Before v2.1.16 (the L2 user's journey)

$ /sprint init my-sprint --name "Test" --trust L2 --features f1,f2
$ /sprint start my-sprint
# ... auto-advances prd → plan → design ...
# pauses at design boundary (L2 scope.stopAfter)
$ /sprint phase my-sprint --to do
{
 "ok": false,
 "reason": "requires_user_approval",
 "stopAfter": "design"
}
# 🛑 deadlock — no command to give approval
# workarounds violated bkit philosophy: edit state JSON OR escalate trust

After v2.1.16

$ /sprint phase my-sprint --to do --approve --reason "design review complete"
{
 "ok": true,
 "phase": "do",
 "approvalRecord": {
 "sprintId": "my-sprint",
 "from": "design",
 "to": "do",
 "trustLevel": "L2",
 "stopAfter": "design",
 "approvedBy": "user",
 "reason": "design review complete"
 }
}
# ✅ proceeds — single-use approval, sprint.autoRun.scope NOT mutated
# next boundary will face the same check (deliberate "controllable AI" design)
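
The single-use nature of --approve can be sketched as a function that returns an approval record without touching the sprint's scope. This is an illustration under assumptions, not the real advancePhase: the function name advancePhaseSketch and the sprint object shape (id, phase, trustLevel, autoRun.scope.stopAfter) are hypothetical, while the response fields mirror the JSON shown in the before/after examples.

```javascript
// Hypothetical sketch; the sprint shape and function name are assumptions.
function advancePhaseSketch(sprint, to, { approve = false, reason = '' } = {}) {
  const stopAfter = sprint.autoRun.scope.stopAfter;
  const atBoundary = sprint.phase === stopAfter;
  if (atBoundary && !approve) {
    return { ok: false, reason: 'requires_user_approval', stopAfter };
  }
  const result = { ok: true, phase: to };
  if (atBoundary) {
    // The approval is recorded for this transition only; scope is left untouched,
    // so the next boundary runs the same check again.
    result.approvalRecord = {
      sprintId: sprint.id,
      from: sprint.phase,
      to,
      trustLevel: sprint.trustLevel,
      stopAfter,
      approvedBy: 'user',
      reason,
    };
  }
  return result;
}
```

The input object is never mutated here; persisting the new phase and writing the audit entry would be the caller's job in this sketch.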

Quality gate failure UX

$ /sprint phase my-sprint --to do
{
 "ok": false,
 "reason": "gate_fail",
 "reportPath": "docs/03-analysis/my-sprint-gate-fail-design-2026-05-20T05-31-22-411Z.md",
 "gateResults": { ... }
}
$ cat docs/03-analysis/my-sprint-gate-fail-design-*.md
# Gate Failure Report — my-sprint @ design→do
# ...
| Sprint Phase | Gate | Status | Expected | Actual | Suggested Action |
| design | M4 | FAIL | threshold = 95 | null (not_measured) | /sprint measure --gate M4 |
| design | M8 | FAIL | threshold = 85 | null (not_measured) | /sprint measure --gate M8 |
# ...suggested next commands inline...

/sprint measure — first-class command

$ /sprint measure my-sprint --gate M4
# routes to agents/gap-detector.md via GATE_MEASUREMENT_ROUTES
# writes back to sprint.qualityGates.M4_apiComplianceRate.current
# audit-logged as gate_measured (at L2+ trust)

7 gates supported: M1 M2 M3 M4 M7 M8 S1.


📊 Numbers

| Metric | Before | After |
|---|---|---|
| Open issues blocking L2 users | 4 (#92/93/94/95) | 0 |
| L3 sprint contract tests | 10/10 PASS | 14/14 PASS (+4 new SC) |
| Aggregate test PASS | 3,808 | 3,844 (+36) |
| Aggregate test FAIL | 31 | 0 |
| Test files with errors | 15 | 0 |
| Test files | 151 | 147 (-4 orphan) |
| Release metadata defects | 3 | 0 |
| Mandatory CI gates | 8 | 10 (+2 release gates) |

🐛 Bug Fixes

  • #92 sprint-orchestrator agent records both M4_apiComplianceRate and M8_designCompleteness at design exit (atomic dual record via lifecycle.measurePhaseGates). Resolves silent gate_fail deadlock on design → do transition. (Reporter: @pruge, bkit v2.1.14, CC v2.1.140, L2 trust)

✨ Added

  • #95 /sprint phase <id> --to <next> --approve [--reason "<text>"] — single-use Trust Level scope-boundary escape hatch. Records audit-logger entry scope_boundary_approved with full provenance. sprint.autoRun.scope not mutated (deliberate "controllable AI" design). (Reporter: @pruge)
  • #94 /sprint measure <id> --gate <key> and --gates <K1,K2,...> and --phase <name> — partial-gate manual measurement command. Routes through lib/application/quality-gates/measure-router.js to the canonical measurement agent (gap-detector for M4, etc.). Supports 7 gates: M1, M2, M3, M4, M7, M8, S1. (Reporter: @pruge)
  • #93 — Automatic gate-failure report generation. advancePhase now invokes a failureReporter dep on gate_fail, writing docs/03-analysis/<sprintId>-gate-fail-<phase>-<ts>.md with gate summary, suggested next commands, and audit trail pointers. sprint.lastGateFailure state field persists for /sprint status surfacing. (Reporter: @pruge)

🛠 Release Hardening (sub-sprint)

  • Layer A — Release metadata sync (3 files): README badge Version-2.1.14 → Version-2.1.16, hooks/session-start.js and hooks/startup/session-context.js version comments synced.
  • Layer B — Test maintenance (15 files, 31 stale FAIL → 0): 4 orphan test files removed (modules deleted in v2.1.10 Sprint 6), 11 stale baselines updated to current SoT (skills 43 → 44, agents 36 → 34, mcpTools 16 → 19, lib modules 142 → 177+, status-core exports 17 → 19, L3 contract 10/10 → 14/14, JS floating-point comparison epsilon fix, etc.).
  • Layer C — CI gate reinforcement (1 file): .github/workflows/contract-check.yml adds two mandatory steps — bkit-full-system (version sync + agent/skill structure) and docs-code-sync (counts SoT drift detector). Future PR/push auto-blocks on release metadata drift.

🧱 Architecture

New modules:

  • lib/application/quality-gates/measure-router.js — Pure dispatcher mapping gate keys (M1/M2/M3/M4/M7/M8/S1) to agents and result schemas
  • lib/application/quality-gates/failure-reporter.js — Pure markdown builder + side-effecting writeReport + factory createFailureReporter
  • lib/application/sprint-lifecycle/measure-gate.usecase.js: measureGate / measureGates / measurePhaseGates (atomic phase exit measurement)
  • templates/gate-failure-report.template.md — substitutable template

New audit action: scope_boundary_approved (28th ACTION_TYPE).

L3 contract grew with 4 new structural invariants: SC-11 (#92), SC-12 (#95), SC-13 (#94), SC-14 (#93).


📚 Documentation

Following bkit PDCA methodology (Plan → Design → Do → Check → Act), full documentation chain shipped:

  • docs/01-plan/features/v2116-issue-fixes.master-plan.md — sprint master plan
  • docs/01-plan/features/v2116-release-hardening.plan.md — hardening plan
  • docs/02-design/features/v2116-release-hardening.design.md — hardening design
  • docs/04-report/features/v2116-issue-fixes.report.md — feature fix report
  • docs/04-report/features/v2116-release-hardening.report.md — hardening report
  • CHANGELOG.md v2.1.16 section with full hardening sub-section

🙏 Thanks

Special thanks to @pruge for the 4 well-written reports (#92, #93, #94, #95) that pinpointed the exact L2 deadlock surfaces.

📦 Install / Upgrade

# Claude Code plugin marketplace
claude plugin install bkit
# Or upgrade existing
claude plugin upgrade bkit

Recommended CC version: v2.1.123+ (conservative) or v2.1.144 (balanced) — bkit v2.1.16 verified compatible with 99 consecutive CC releases from v2.1.34 through v2.1.144.


🤖 Generated with Claude Code

Contributors

pruge

v2.1.15 — Issue #89 fix: .pdca-status.json pollution (6-Layer Defense)

18 May 11:27
@popup-kay popup-kay
b65d336

🎯 What's New in v2.1.15

Patch release — A defense-in-depth fix for #89, reported by @doing27. This release stops .pdca-status.json from growing unbounded with garbage feature entries on every source-file edit.

TL;DR

Before: .pdca-status.json could grow to 294 KB with 138 of 147 features entries being garbage and 1,661 of 1,669 history entries being noise, all written silently by the PreToolUse(Write|Edit) hook.

After v2.1.15: 6-Layer Defense ensures only PDCA-registered features (with an existing plan or design document) ever enter .pdca-status.json, history is deduplicated and ring-buffered to 100 entries, and 48 unit tests guard against regressions.


🔍 Root-Cause Analysis (Deep)

Issue #89 surfaced two bugs (extractFeature misextraction + updatePdcaStatus gating absence). During this release we discovered three additional latent bugs in the same code path and fixed all of them together:

| # | Latent bug | File:Line | Symptom |
|---|---|---|---|
| 1 | extractFeature captures filenames as features | lib/core/file.js:109 | app/services/broadcast_service.py → 'broadcast_service.py' (a file, not a feature) |
| 2 | extractFeature fallback returns generic dirs | lib/core/file.js:129 | apps/cms/v1/users.py → 'cms'/'v1' registered as features |
| 3 | updatePdcaStatus has no validation gate | lib/pdca/status-core.js:164 | Every invocation registers a feature unconditionally |
| 4 | pre-write.js reads a non-existent schema field | scripts/pre-write.js:101 | Reads currentFeature (v1 schema name) instead of primaryFeature (v2/v3 schema). The v2.1.7 "phantom-feature guard" never actually fired |
| 5 | extractFeatureFromContext duplicates the buggy pattern-match | lib/pdca/status-core.js:320 | DRY violation: same bug as 1+2 cloned into a second module |
| 6 | updatePdcaStatus.history.push has no limit/dedup | lib/pdca/status-core.js:210 | Unbounded growth; addPdcaHistory had a 100-entry cap, but the direct push did not |

🛡️ The 6-Layer Defense

edit event
 │
 ▼
[L1] extractFeature → reject file-like captures + extended GENERIC_NAMES + fallback opt-in
 │
 ▼ (legitimate feature candidate)
[L2] extractFeatureFromContext → delegates to L1 (DRY)
 │
 ▼
[L4] pre-write.js → corrected schema field (primaryFeature)
 │
 ▼
[L3] updatePdcaStatus → plan/design document gate (default ON)
 │
 ▼
[L5] history dedup + ring buffer
 │
 ▼
[L6] 48 unit tests → regression prevention (CI-tracked)

Layer 1 — lib/core/file.js

  • Reject filename captures: the pattern-matching loop now skips any candidate that has a file extension (path.extname(captured) non-empty). app/services/broadcast_service.py no longer extracts 'broadcast_service.py'.
  • GENERIC_NAMES expanded 19 → 65 entries, covering common backend/frontend layout directories (api, web, mobile, client, server, backend, frontend, admin, auth, cms, database, config, core, helpers, middleware, plugins, scripts, styles, static, public, assets, tests, tenants, versions, tmp, audit, dashboard), version directories (v1 through v9), and Next.js route groups ((dashboard), (auth), (public), (admin), (api)).
  • Fallback is now explicit opt-in: extractFeature(filePath, { allowFallback: true }). Existing callers receive the safer default (no parent-directory walk) automatically. Backward-compatible — second argument is optional.
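
The three Layer 1 rules can be combined in one short function. The sketch below is hypothetical throughout and is not the code in lib/core/file.js: the pattern list, the small GENERIC_NAMES subset, the function name extractFeatureSketch, and the direction of the fallback walk are assumptions chosen for illustration. Only the rules themselves (skip captures with a file extension, skip generic names, fallback only when allowFallback is true, empty string when nothing matches) come from these notes.

```javascript
const path = require('node:path');

// Small illustrative subset; the real list is described as having 65 entries.
const GENERIC_NAMES = new Set(['api', 'web', 'auth', 'cms', 'admin', 'core', 'v1', 'v2']);
// Hypothetical pattern; the real pattern list is not shown in the release notes.
const FEATURE_PATTERNS = [/(?:^|\/)(?:features|modules|services)\/([^/]+)/];

function extractFeatureSketch(filePath, { allowFallback = false } = {}) {
  const normalized = filePath.replace(/\\/g, '/');
  for (const pattern of FEATURE_PATTERNS) {
    const match = pattern.exec(normalized);
    if (!match) continue;
    const captured = match[1];
    if (path.extname(captured) !== '') continue; // file-like capture, e.g. 'x.py'
    if (GENERIC_NAMES.has(captured)) continue;   // generic layout directory
    return captured;
  }
  if (!allowFallback) return ''; // safer default: no parent-directory walk
  // Opt-in fallback (assumed behaviour): first non-generic directory in the path.
  const dirs = path.posix.dirname(normalized).split('/').filter((d) => d && d !== '.');
  return dirs.find((d) => !GENERIC_NAMES.has(d)) ?? '';
}

console.log(extractFeatureSketch('src/features/billing/index.js')); // billing
```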

Layer 2 — lib/pdca/status-core.js

extractFeatureFromContext now delegates to extractFeature. The duplicated, buggy pattern-matching loop is removed (DRY).

Layer 3 — lib/pdca/status-core.js

updatePdcaStatus(feature, phase, data, opts = {}) accepts a new opts.requireDocs flag (default true). When enabled, the function performs a findPlanDoc(feature) || findDesignDoc(feature) check; if neither exists, it returns silently (with a debugLog entry for forensics).

  • All 16 existing callers continue to work unchanged — PDCA lifecycle always creates a plan document before downstream phases invoke updatePdcaStatus.
  • The shouldUpdate(feature, requireDocs, docCheckFn) helper is extracted as a pure function for unit testability and dependency injection (docCheckFn parameter lets tests inject a stub).
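
One plausible shape for the extracted gate is shown below. The parameter list mirrors the shouldUpdate(feature, requireDocs, docCheckFn) signature named above, but the body, the suffix Sketch in the name, and the treatment of an empty feature name are assumptions, not the shipped code.

```javascript
// Hypothetical body for the pure gating helper; the real implementation is not shown here.
function shouldUpdateSketch(feature, requireDocs, docCheckFn) {
  if (!feature) return false;         // assumed: nothing to register without a name
  if (!requireDocs) return true;      // gate disabled by the caller
  return Boolean(docCheckFn(feature)); // plan or design document must exist
}

// Tests can inject a stub instead of touching the filesystem.
const hasDocs = (name) => name === 'billing';
console.log(shouldUpdateSketch('billing', true, hasDocs)); // true
console.log(shouldUpdateSketch('cms', true, hasDocs));     // false
```

Injecting docCheckFn keeps the decision logic free of I/O, which is what makes the stub-based unit tests possible.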

Layer 4 — scripts/pre-write.js

The v2.1.7 phantom-feature guard read currentStatus?.currentFeature — a v1-schema field that does not exist in v2/v3 schemas. The migration code in status-migration.js:31,74 renames currentFeature → primaryFeature, so the guard's comparison was always undefined === <feature> (always false), causing every edit to be skipped silently.

Fixed: now reads currentStatus?.primaryFeature, restoring the guard's intended behavior. Combined with the L3 document gate, this gives two independent layers of protection.
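
The field-name mismatch is easy to reproduce in isolation. In this small demonstration the status object is a made-up example in the v2/v3 shape; it only shows why a comparison against the old field name can never be true.

```javascript
// Example status in the v2/v3 shape: the field is primaryFeature, not currentFeature.
const currentStatus = { primaryFeature: 'billing' };
const feature = 'billing';

// Pre-fix read: the v1 field no longer exists, so this evaluates undefined === 'billing'.
const matchedBeforeFix = currentStatus?.currentFeature === feature;

// Post-fix read: compares against the field the migration actually writes.
const matchedAfterFix = currentStatus?.primaryFeature === feature;

console.log(matchedBeforeFix, matchedAfterFix); // false true
```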

Layer 5 — lib/pdca/status-core.js

A new appendHistoryEntry(history, entry, limit = 100) pure function:

  • If the last entry has the same feature, phase, and action, only the timestamp is updated — no push.
  • Otherwise, push and apply a ring buffer of limit entries (default 100).

Effect: 100 identical edits = 1 history entry (timestamp refreshed each time). Phase transitions and feature changes still produce new entries normally.
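
The dedup and ring-buffer rules can be sketched as follows. The signature follows the appendHistoryEntry(history, entry, limit = 100) form named above, but the body is an assumption: in particular, returning a new array rather than mutating the input, and the timestamp field name, are illustrative choices.

```javascript
// Hypothetical body; returns a new array so the function stays pure.
function appendHistoryEntrySketch(history, entry, limit = 100) {
  const last = history[history.length - 1];
  const isRepeat =
    last !== undefined &&
    last.feature === entry.feature &&
    last.phase === entry.phase &&
    last.action === entry.action;
  if (isRepeat) {
    // Same event again: refresh the timestamp on the last entry, no push.
    return [...history.slice(0, -1), { ...last, timestamp: entry.timestamp }];
  }
  const next = [...history, entry];
  // Ring buffer: keep only the newest `limit` entries.
  return next.length > limit ? next.slice(next.length - limit) : next;
}

let history = [];
for (let i = 0; i < 100; i++) {
  history = appendHistoryEntrySketch(history, { feature: 'auth', phase: 'do', action: 'updated', timestamp: i });
}
console.log(history.length); // 1
```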

Layer 6 — Unit tests (tests/unit/)

| Test file | TCs | Verifies |
|---|---|---|
| tests/unit/file-extract-feature.test.js | 20 | L1 (extractFeature) — filename rejection, GENERIC_NAMES enforcement, fallback opt-in |
| tests/unit/extract-feature-from-context.test.js | 10 | L2 (delegation correctness) |
| tests/unit/pdca-status-gating.test.js | 18 | L3 (shouldUpdate) + L5 (appendHistoryEntry) — dedup, ring buffer, custom-limit, Issue #89 scenario |
| Total | 48 | 48/48 PASS |

.gitignore now tracks tests/unit/** (matching the existing policy for tests/contract/**), so these tests run in CI.


👤 User Experience — What Changes

1. .pdca-status.json stays small and accurate

| Scenario | Before v2.1.15 | After v2.1.15 |
|---|---|---|
| Edit app/services/broadcast_service.py 50 times | features.broadcast_service.py appears + 50 history entries | No features entry created; no history entries |
| Edit apps/cms/v1/users.py | features.cms appears as phantom | No entry: cms and v1 are now in GENERIC_NAMES |
| Edit a file belonging to the feature auth (registered via docs/01-plan/features/auth.plan.md) | 50 history entries | 1 history entry (timestamp refreshed) |
| File path with no matching pattern (e.g. random/path/foo.py) | 'random' registered via fallback | Returns '' (fallback is now opt-in) |
| Long-running session, hundreds of edits | .pdca-status.json grows to hundreds of KB | Stays under a few KB, history capped at 100 |

2. PDCA workflow is unchanged for legitimate cycles

  • /pdca plan billing creates docs/01-plan/features/billing.plan.md → L3 gate passes → updatePdcaStatus('billing', 'do', ...) registers normally.
  • All 16 existing updatePdcaStatus call sites (hook scripts, lifecycle, batch orchestrator, etc.) continue to work because PDCA's natural ordering guarantees a plan document exists before downstream phases run.

3. Generic directory names are no longer auto-registered

auth, cms, dashboard, admin, api, etc. are now in GENERIC_NAMES. Trade-off: if you want one of these as a real PDCA feature, register it explicitly with /pdca plan auth (which creates the plan document and sets primaryFeature). The L3 gate then accepts it on subsequent edits. The previous behavior of silently auto-creating feature entries for such common directory names was the primary source of garbage.
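As a rough illustration of the filtering rule, the sketch below rejects filename-like candidates and generic directory names. The helper name isAutoRegistrable and the extension regex are hypothetical, and the set holds only the names mentioned in these notes, not the full GENERIC_NAMES list.

```javascript
// Subset of names mentioned in these notes; the full GENERIC_NAMES set is not shown here.
const GENERIC_NAMES = new Set(['auth', 'cms', 'dashboard', 'admin', 'api', 'v1']);

// Hypothetical helper: decides whether a path segment may be auto-registered as a feature.
function isAutoRegistrable(name) {
  if (name === '') return false;
  if (/\.[A-Za-z0-9]+$/.test(name)) return false; // looks like a filename, e.g. users.py
  return !GENERIC_NAMES.has(name.toLowerCase());
}

console.log(isAutoRegistrable('cms'));                  // false
console.log(isAutoRegistrable('broadcast_service.py')); // false
console.log(isAutoRegistrable('billing'));              // true
```

A name rejected here can still become a real feature through explicit registration with /pdca plan, since that path creates the plan document the L3 gate looks for.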

4. History is now noise-free

The bkit dashboard, audit log, and any tools that read .pdca-status.json.history get meaningful entries only: real phase transitions and real feature changes. No more "updated × 1,661" noise drowning out the 8 real cycles.

5. Zero migration required

This is a patch release with no breaking changes. Upgrade by re-installing or running git pull in your bkit plugin directory. The fix is active immediately.

Optional cleanup: if your existing .pdca-status.json is already polluted, back it up and let bkit re-initialize:

mv .bkit/state/pdca-status.json .bkit/state/pdca-status.json.bak
# bkit will recreate a clean v3 schema on next session

🔁 Compatibility

| Item | Status | Note |
|---|---|---|
| Breaking changes | ✅ None | All 16 existing updatePdcaStatus callers receive default behavior |
| Schema migration | ✅ Not required | v3 schema unchanged |
| extractFeature(filePath) → extractFeature(filePath, opts = {}) | ✅ Backward-compatible | Second argument optional |
| updatePdcaStatus(feature, phase, data) → updatePdcaStatus(feature, phase, data, opts = {}) | ✅ Backward-compatible | Fourth argument optional |
| Sprint Management (v2.1.13 GA) | ✅ Unaffected | |
| bkit Trust Level (L0–L4) | ✅ Unaffected | |
| Soft change | ⚠️ auth/cms/dashboard are now in GENERIC_NAMES | Workaround: register explicitly via /pdca plan <feature> |

📁 Files Changed (16)

Code (3)

  • lib/core/file.js (Layer 1)
  • lib/pdca/status-core.js (Layers 2, 3, 5)
  • scripts/pre-write.js (Layer 4)

Unit tests (3, new)

  • tests/unit/file-extract-feature.test.js (20 TCs)
  • tests/unit/extract-feature-from-context.test.js (10 TCs)
  • tests/unit/pdca-status-gating.test.js (18 TCs)

Metadata + version sync (6)

  • bkit.config.json
  • .claude-plugin/plugin.json
  • .claude-plugin/marketplace.json
  • hooks/hooks.json
  • scripts/unified-bash-post.js
  • .gitignore (added `!t...

Contributors

doing27

v2.1.14 — Differentiation Release (Memory Enforcer + Layer 6 + Sequential Dispatch + Effort-aware + PostToolUse + Heredoc-bypass)

14 May 07:45
@tomo-kay tomo-kay
a8091c0

bkit v2.1.14 — Differentiation Release

The only Claude Code plugin that verifies AI-generated code against its own design specs.

bkit v2.1.14 is a product-moat release. Every one of bkit's differentiations against vanilla Claude Code is now physically implemented, contract-tested, and proven by live dogfooding evidence collected during this release cycle.

Verified on Claude Code v2.1.140. 95 consecutive compatible CC releases (v2.1.34 ~ v2.1.141, R-2 skip versions excluded). Zero breaking changes for existing v2.1.13 users — the upgrade is drop-in.


✨ Highlights

Six differentiations, all enforced

The previous "five differentiations" framing is retired. v2.1.14 promotes one new defense and makes the full set physically implemented (no more "advisory") and contract-tested.

| # | Differentiation | Mitigates | Implementation |
|---|---|---|---|
| 1 | Memory Enforcer | CC's CLAUDE.md is advisory; models override it | PreToolUse deny with audit trail |
| 2 | Layer 6 Defense | R-3 safety-hook-ignored regressions (10+ evolved forms tracked) | Post-hoc audit + alarm + auto-rollback |
| 3 | Sequential dispatch | CC #56293 sub-agent caching 10x regression (11-streak unresolved) | 8-state dispatcher: first sequential → warmup sample → restore parallel |
| 4 | Effort-aware adaptive defense | CC's new effort.level hook field exposed but no defense uses it | Invariant 10 (ADR 0010) routes defense intensity off effort tier |
| 5 | PostToolUse continueOnBlock | CC silent PostToolUse drop (#57317); bkit's three PostToolUse hooks survive failures gracefully | Block-aware continuation flag across unified-bash-post, unified-write-post, skill-post |
| 6 | Heredoc-pipe bypass guard | CC #58904: model can smuggle shell input through $(cmd <<TAG ... TAG) past defense | Token-aware heredoc-inside-substitution detector |

Live dogfooding caught two events during this release

Two events were captured while shipping v2.1.14 itself, recorded in docs/sprint/v2114/sub-sprint-6-observation.report.md:

  1. Memory Enforcer deny on an out-of-scope Write from a sub-agent — exact case the feature exists for.
  2. Heredoc Detector classify on the v2.1.14 release commit message itself — the guard correctly fired on git commit -m "$(cat <<'EOF' ... EOF)", forcing this release to use the safer git commit -F file.txt path instead. Reviewers can see this in the PR conversation.

Architecture growth (v2.1.13 → v2.1.14)

| Metric | v2.1.13 | v2.1.14 | Delta |
|---|---|---|---|
| Lib subdirs | 19 | 20 | +lib/defense/ |
| Port↔Adapter pairs | 7 | 8 | +caching-cost |
| ADR invariants | 9 | 10 | +0010 effort-aware |
| Lib modules | 163 | 174 | +11 (5 defense + 6 orchestrator/domain/infra) |
| Contract suites | 3 base + 5 v2.1.13 | 3 base + 5 v2.1.13 + 5 v2.1.14 | +22 TC |
| Skills / Agents / Hooks | 44 / 34 / 21 events / 24 blocks | unchanged | |

Closed work this release

  • CARRY-5 — OTEL token-meter zero-entries root cause identified and closed (subprocess env hydration via lib/infra/otel-env-capturer.js).
  • F9-120 closure: claude plugin validate . returns Exit 0 across 10 consecutive CC releases (v2.1.120/121/123/129/132/133/137/139/140/141). Carryover officially retired.

🧑‍💻 User Experience Changes

This section explains what changes for you in day-to-day use. TL;DR: bkit gets safer and quieter — you do not need to change your workflow.

What you'll feel differently

1. Some shell commands you used to run will now be blocked

If you (or a sub-agent on your behalf) try to run a shell command that contains a heredoc inside a command substitution — for example:

git commit -m "$(cat <<'EOF'
... message ...
EOF
)"

...you'll see a clear block message with safer alternatives. Use git commit -F file.txt or git commit -m '...' instead. This is the heredoc-bypass guard (differentiation #6) protecting against CC #58904. The block ships with three suggested alternatives so you can usually fix it in one move.

If you're certain a specific case is safe in your context, the guard's source (lib/defense/heredoc-detector.js) is auditable and the block can be lifted per-tool-call after manual review.
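To make the blocked pattern concrete, here is a deliberately simplified sketch of a heredoc-inside-substitution check. It is not the token-aware detector in lib/defense/heredoc-detector.js: the function name is hypothetical, it ignores quoting and backtick substitutions, and it would also flag a shift operator inside arithmetic expansion such as $((1 << 2)).

```javascript
// Simplified, hypothetical sketch: flags a heredoc operator (<<) that appears
// inside a $( ... ) command substitution by tracking parenthesis depth.
function hasHeredocInSubstitution(command) {
  let depth = 0; // nesting depth inside $( ... )
  for (let i = 0; i < command.length; i++) {
    if (command.startsWith('$(', i)) {
      depth++;
      i++; // skip the '(' that belongs to '$('
    } else if (command[i] === '(' && depth > 0) {
      depth++; // nested plain parenthesis inside a substitution
    } else if (command[i] === ')' && depth > 0) {
      depth--;
    } else if (command.startsWith('<<<', i)) {
      i += 2; // here-string, not a heredoc
    } else if (depth > 0 && command.startsWith('<<', i)) {
      return true; // heredoc start inside a substitution
    }
  }
  return false;
}

const blocked = 'git commit -m "$(cat <<\'EOF\'\n... message ...\nEOF\n)"';
console.log(hasHeredocInSubstitution(blocked));                  // true
console.log(hasHeredocInSubstitution('git commit -F file.txt')); // false
console.log(hasHeredocInSubstitution('cat <<EOF\nhi\nEOF'));     // false
```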

2. Sub-agent batches feel slower for the first spawn — then faster overall

When bkit spawns multiple sub-agents (Plan parallel, Design council, Do swarm, Check council), the first sibling now runs sequentially so its cache warms up before the rest go parallel. You'll see one extra spinner tick at the start of a multi-agent step, but total cost drops dramatically — pre-fix runs were paying ~10x cache_creation_input_tokens per spawn because of CC #56293.

If you trust your context and want the old behavior, set BKIT_SEQUENTIAL_DISPATCH=0 before launching CC. At Trust Level L4 sequential dispatch is forced regardless.
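The first-sequential-then-parallel idea can be sketched in a few lines. This is an illustration only, not the 8-state dispatcher: the function names, the trustLevel parameter, and its default value are assumptions; only BKIT_SEQUENTIAL_DISPATCH=0 and the L4 override come from the notes above.

```javascript
// Hypothetical helper: decides whether the first sibling runs alone.
// The default trust level here is an assumption made for the example.
function useSequentialFirst(env = process.env, trustLevel = 'L2') {
  if (trustLevel === 'L4') return true;        // forced at Trust Level L4
  return env.BKIT_SEQUENTIAL_DISPATCH !== '0'; // opt-out via environment
}

// Runs the first task alone so a shared cache can warm up, then the rest in parallel.
async function dispatch(tasks, { sequentialFirst = useSequentialFirst() } = {}) {
  if (!sequentialFirst || tasks.length < 2) {
    return Promise.all(tasks.map((task) => task()));
  }
  const [first, ...rest] = tasks;
  const firstResult = await first();
  const restResults = await Promise.all(rest.map((task) => task()));
  return [firstResult, ...restResults];
}

// Usage: record start/end order to show the first task completes before the others start.
const order = [];
const makeTask = (name) => async () => {
  order.push(`start:${name}`);
  await new Promise((resolve) => setTimeout(resolve, 10));
  order.push(`end:${name}`);
  return name;
};

dispatch([makeTask('a'), makeTask('b'), makeTask('c')], { sequentialFirst: true })
  .then((results) => console.log(results, order.slice(0, 2)));
```

The trade-off is one extra round of latency for the first sibling in exchange for the remaining siblings starting against a warm cache.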

3. Memory rules are now actually enforced, not just suggested

When CLAUDE.md / project memory says "never touch X", bkit's Memory Enforcer now physically blocks the write at PreToolUse and emits an audit log entry — rather than logging an advisory that the model can ignore. If you've been writing memory rules and seeing them silently bypassed, this is the fix.

If you need to override for a one-off, the deny carries an explicit "manual override" instruction in its block message.
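A deny-with-audit check might look roughly like the following. Everything here is hypothetical: the rule shape (protected path prefixes), the function name, the audit entry fields, and the decision object are illustrative and do not reflect how bkit parses memory rules or what Claude Code expects as hook output.

```javascript
// Hypothetical rule shape: path prefixes that project memory says must never be touched.
const protectedPrefixes = ['secrets/', 'infra/prod/'];
const auditLog = [];

// Returns a deny decision and records an audit entry when a write hits a protected prefix.
function checkWrite(filePath, prefixes, audit) {
  const hit = prefixes.find((prefix) => filePath.startsWith(prefix));
  if (hit === undefined) return { decision: 'allow' };
  audit.push({ event: 'memory-enforcer-deny', filePath, rule: hit });
  return { decision: 'deny', reason: `Memory rule protects "${hit}"` };
}

console.log(checkWrite('src/app.js', protectedPrefixes, auditLog).decision);      // allow
console.log(checkWrite('secrets/key.pem', protectedPrefixes, auditLog).decision); // deny
console.log(auditLog.length);                                                     // 1
```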

4. Defense intensity now adapts to effort.level

CC v2.1.133+ exposes an effort.level field on hook payloads (think "how hard is this turn working"). bkit's defenses now read this field via Invariant 10 (ADR 0010) and dial themselves down for trivial turns and up for heavy turns. You should see fewer false positives on quick edits and stronger checks on long multi-step turns.
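As a sketch of routing defense intensity off the effort tier: the level names (low, medium, high), the intensity labels, and the fallback are all assumptions, since these notes do not list the actual values of effort.level or how Invariant 10 maps them.

```javascript
// Hypothetical tiers; the real values of effort.level are not listed in these notes.
const INTENSITY_BY_EFFORT = new Map([
  ['low', 'light'],
  ['medium', 'standard'],
  ['high', 'strict'],
]);

// Reads effort.level from a hook payload and picks a defense intensity.
function defenseIntensity(hookPayload) {
  const level = hookPayload?.effort?.level;
  // Unknown or missing levels fall back to the standard tier (assumption).
  return INTENSITY_BY_EFFORT.get(level) ?? 'standard';
}

console.log(defenseIntensity({ effort: { level: 'low' } }));  // light
console.log(defenseIntensity({ effort: { level: 'high' } })); // strict
console.log(defenseIntensity({}));                            // standard
```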

5. PostToolUse hooks survive their own failures

CC's silent PostToolUse drop (#57317) used to mean a single failing hook could kill all downstream observability. v2.1.14's continueOnBlock flag means a failure in unified-bash-post no longer prevents skill-post from logging. Your audit trail is more complete with no action on your part.
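The failure-isolation idea can be sketched with a small runner. This is an assumption-laden illustration: the runner, its signature, and the result shape are hypothetical, and the real continueOnBlock flag lives in bkit's hook configuration rather than in a function like this.

```javascript
// Hypothetical runner: each hook is a [name, function] pair. A throw in one hook
// is recorded, and the remaining hooks still run when continueOnBlock is true.
function runPostToolUseHooks(hooks, payload, { continueOnBlock = true } = {}) {
  const results = [];
  for (const [name, hook] of hooks) {
    try {
      results.push({ name, ok: true, value: hook(payload) });
    } catch (error) {
      results.push({ name, ok: false, error: error.message });
      if (!continueOnBlock) break; // old behavior: downstream hooks never run
    }
  }
  return results;
}

const hooks = [
  ['unified-bash-post', () => { throw new Error('boom'); }],
  ['unified-write-post', () => 'ok'],
  ['skill-post', () => 'logged'],
];

const results = runPostToolUseHooks(hooks, {});
console.log(results.map((r) => `${r.name}:${r.ok}`));
```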

6. Cache-cost telemetry is now real

The previous "zero entries" you may have seen in the token-meter adapter (CARRY-5) was a subprocess env-propagation bug. lib/infra/otel-env-capturer.js now hydrates OTEL_* into hook subprocesses on SessionStart, so cost dashboards finally show populated data. Combined with the new caching-cost Port↔Adapter pair, the dispatcher can make data-driven decisions about when to go parallel.
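The env-hydration step amounts to capturing OTEL_* variables once and merging them into each hook subprocess environment. The sketch below assumes that shape; the function names are hypothetical and this is not the code in lib/infra/otel-env-capturer.js.

```javascript
// Hypothetical capture step: keep only OTEL_* variables from an environment object.
function captureOtelEnv(env) {
  return Object.fromEntries(
    Object.entries(env).filter(([key]) => key.startsWith('OTEL_'))
  );
}

// Hypothetical hydration step: merge the captured variables into a subprocess env.
function buildHookEnv(baseEnv, captured) {
  return { ...baseEnv, ...captured };
}

const sessionEnv = {
  PATH: '/usr/bin',
  OTEL_EXPORTER_OTLP_ENDPOINT: 'http://localhost:4318',
  OTEL_SERVICE_NAME: 'bkit',
};
const captured = captureOtelEnv(sessionEnv);
const hookEnv = buildHookEnv({ PATH: '/usr/bin' }, captured);

console.log(Object.keys(captured)); // [ 'OTEL_EXPORTER_OTLP_ENDPOINT', 'OTEL_SERVICE_NAME' ]
console.log(hookEnv.OTEL_SERVICE_NAME); // bkit
```

The resulting object could be passed as the env option when spawning a hook subprocess.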

What stays the same

  • Skills, Agents, Hook events, MCP tool counts — unchanged from v2.1.13.
  • PDCA 9-phase enum + Sprint 8-phase enum — frozen (Application Layer pilot, ADR 0005).
  • Trust Level L0 ~ L4 semantics + SPRINT_AUTORUN_SCOPE — unchanged.
  • Existing /pdca, /sprint, /control, /bkit-explore, /audit, /rollback slash commands — unchanged signatures.

Upgrade path

# in your project (Claude Code plugin auto-pulls; no command needed)
# verify:
claude plugin info bkit # should show 2.1.14

Project memory files, sprint state files, and PDCA documents written by v2.1.13 are read by v2.1.14 unchanged. No migration scripts.

Recommended CC version

| Profile | Recommended CC | Why |
|---|---|---|
| Conservative | v2.1.123 | Stable, OAuth fix, fail-isolated hooks |
| Balanced | v2.1.140 | Latest before the v2.1.141 60-bullet bumper observation window |
| Latest | v2.1.141 | Acceptable: heredoc bypass guard already covers #58904 |

See docs/06-guide/version-policy.guide.md for the full dist-tag 3-Bucket framework.


🧪 Verification

  • Tests — 291 existing + 22 new v2.1.14 contract tests = 313 PASS, 0 FAIL.
  • scripts/verify-full-system.js — 11/11 categories PASS (BKIT_VERSION sync, hooks reachability, domain purity, contract suites, invariants, defenses, observability ports, dist-tag drift, ENH backlog hygiene, dogfooding evidence, release-bundle inventory).
  • CC plugin validate: claude plugin validate . returns Exit 0 (10 consecutive cycles).
  • Domain layer purity — 12 files, 0 forbidden imports (CI-enforced).

📚 Documents

  • Master plan: docs/sprint/v2114/master-plan.md
  • PRD / Plan / Design: docs/sprint/v2114/{prd,plan,design}.md
  • Sub-sprint reports: docs/sprint/v2114/sub-sprint-{1..6}-*.report.md
  • ADR 0010 (Invariant 10 effort-aware): docs/adr/0010-effort-aware-invariant.md
  • CC version monitoring guide: docs/06-guide/cc-version-monitoring.guide.md
  • Version policy guide: docs/06-guide/version-policy.guide.md

🙏 Acknowledgements

This release was built using bkit itself across six PDCA-style sub-sprints, and the heredoc-pipe guard caught its own release commit during ship — exactly the kind of dogfooding evidence the differentiation framing is meant to produce.

🤖 Built with Claude Code
