-
Notifications
You must be signed in to change notification settings - Fork 6
fix(audit): stamp seq on tool_exec + session_end events#173
Merged
Merged
Conversation
Three audit-emit sites inside the per-A2A-invocation scope were using plain AuditLogger.Emit() instead of EmitFromContext(ctx, ...), which meant the per-invocation sequence counter installed on ctx at request entry never reached those events. Consumers relying on FWS-8's "every event from a given correlation_id exposes a monotonically increasing seq" guarantee saw gaps in their gap-detection pipeline exactly at tool_exec and the outbound-guardrail-failure session_end branch. Reported on v0.15.0: a Slack-triggered code-review task's audit stream had session_start (seq=1), guardrail_check (seq=2), llm_call (seq=3), then tool_exec start/end with NO seq, then llm_call (seq=4), session_end (seq=5), invocation_complete (seq=6) — confirming the counter advances but tool_exec emits skip it. Three sites fixed: - registerAuditHooks BeforeToolExec (runner.go:~1990) — was `func(_ context.Context, ...)` then plain Emit; now takes ctxStart and calls EmitFromContext. - registerAuditHooks AfterToolExec (runner.go:~2016) — same. - JSON-RPC tasks/send handler outbound-guardrail-failure session_end (runner.go:~1546) — was plain Emit; now EmitFromContext(ctx, ...) matching the parallel success-path SSE branch right below it. Sites left as plain Emit (intentional, documented inline): - makeAuthAuditCallback (auth_verify / auth_fail) — auth middleware runs BEFORE the runner installs the seq counter on ctx. - egress proxy OnAttempt with source=proxy — subprocess HTTP CONNECT has no Go ctx tying it back to the active A2A request. - MCP tool conflict at startup — pre-invocation, no scope. - Schedule dispatcher (schedule_fire / schedule_complete) — scheduler tick runs outside any A2A request scope. Regression test: TestToolExecAudit_CarriesSequenceFromContext fires the registered hooks against a context carrying a SequenceCounter, parses the emitted NDJSON, and asserts BeforeToolExec.seq=1 + AfterToolExec.seq=2.
@initializ-mk
initializ-mk
force-pushed
the
fix/audit-tool-exec-seq
branch
from
June 16, 2026 16:11
2bf4007 to
d857bdd
Compare
Merged
6 tasks
initializ-mk
added a commit
that referenced
this pull request
Jun 16, 2026
...seq=1 (closes #174) Pre-fix: auth_verify / auth_fail audit events lacked the seq field because the SequenceCounter was installed downstream (in the runner's per-A2A-request setup) of the auth middleware. The auth chain's OnAuth callback used plain AuditLogger.Emit and couldn't reach a counter that didn't exist yet on req.Context(). Symptom (user-visible on v0.15.0): auth_verify (no seq) ← BUG session_start seq=1 guardrail_check seq=2 llm_call seq=3 ... The FWS-8 contract promises every event under a given correlation_id exposes a monotonically increasing seq for gap detection. auth_verify sitting outside the sequence breaks that. Fix: install the SequenceCounter on r.Context() BEFORE the auth chain runs, via a thin middleware wrapper around auth.Middleware in the runner. The auth callback now uses EmitFromContext(req.Context(), ...) and stamps seq=1 on auth_verify. The runner's per-A2A-request setup calls coreruntime.EnsureSequenceCounter (new helper) which detects the existing counter and reuses it — so session_start lands seq=2 and the sequence is gap-free. Three pieces: - forge-core/runtime/audit_schema.go gains EnsureSequenceCounter: returns ctx unchanged when a counter is present, installs a fresh one when absent. Used at every invocation-entry point that may run downstream of an upstream middleware. - forge-cli/runtime/sequence_counter_middleware.go (new) is the thin wrapper: composes (next) → authMW(next) → install counter → serve. Cost: ~24B counter allocation per request, even on skip paths — simpler than threading skip-path knowledge into the wrapper, and the alloc is in the same ballpark as the request struct. - forge-cli/runtime/runner.go: * AuthMiddleware: auth.Middleware(authCfg) → installSequenceCounterMiddleware(auth.Middleware(authCfg)) * makeAuthAuditCallback: Emit → EmitFromContext(req.Context()) * Three in-request WithSequenceCounter(ctx, new(...)) calls → EnsureSequenceCounter(ctx). Counter is reused from the auth wrapper (auth ON path) or installed fresh (--no-auth path). * Scheduler dispatcher's WithSequenceCounter call is left alone — it runs outside any HTTP request, no upstream counter. Three other plain-Emit sites stay as documented in PR #173: - source=proxy egress: subprocess HTTP CONNECT has no Go ctx - mcp_tool_conflict: startup-time, pre-invocation - schedule_fire / schedule_complete: scheduler tick, no A2A scope End-to-end smoke (--no-auth path): session_start seq=1 session_end seq=2 invocation_complete seq=3 End-to-end smoke (auth on, internal token): auth_verify seq=1 ← FIXED session_start seq=2 session_end seq=3 invocation_complete seq=4 Tests: - TestAuthAudit_SeqStampedWhenCounterInstalled — the #174 pin: with a counter on req.Context(), auth_verify lands seq=1 - TestAuthAudit_NoSeqWhenCounterAbsent — back-compat: no counter → omitempty drops the field (legacy embedders without the wrapper) - TestSequenceCounterMiddleware_InstallsCounterBeforeNext — the wrapper installs the counter before delegating - TestEnsureSequenceCounter_ReusesExisting — runner must not clobber the auth-installed counter - TestEnsureSequenceCounter_InstallsFresh — --no-auth path: install a fresh counter when none is present Schema impact: none. The events were already declared to carry seq.
3 tasks
initializ-mk
added a commit
that referenced
this pull request
Jun 16, 2026
Reflects PRs #173 and #174 (both merged to main) in the canonical audit doc and the comprehensive knowledge skill. docs/security/audit-logging.md — under "Sequence numbers", added: - Counter installation order — the SequenceCounter is installed by installSequenceCounterMiddleware (wraps the auth chain) so auth_verify / auth_fail land seq=1; the runner's request entry uses EnsureSequenceCounter to reuse the wrapper-installed counter and only allocate fresh on the --no-auth path. Pinned by TestAuthAudit_SeqStampedWhenCounterInstalled + TestEnsureSequenceCounter_ReusesExisting. - Emit invariant — per-invocation events MUST emit via EmitFromContext (or a typed helper); plain Emit skips the counter and the trace cross-link. The regression behind #173 (tool_exec + session_end) and #174 (auth_verify). A 4-row table names the sites that intentionally stay on plain Emit (egress proxy subprocess CONNECT, MCP startup events, scheduler tick, startup banners) and why. Issue #175 is named as the lint follow-up. .claude/skills/forge.md — three sections updated: - § 3 path trace: middleware order now shows installSequenceCounterMiddleware as outermost; runner request entry uses EnsureSequenceCounter. - § 12.4 FWS-8 section: added the Emit invariant paragraph naming the regression pins and exception list. - § 18 FWS-8 row: appended #173 / #174 follow-up summary + #175 lint tracker. No code change. TOC anchors unchanged. Broken-link sweep clean.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Symptom
A user on v0.15.0 reported `tool_exec` rows in the audit stream were missing the `seq` field even though they fired inside an A2A invocation. Their Slack-triggered task showed:
```
session_start seq=1 ✓
guardrail_check seq=2 ✓
llm_call seq=3 ✓
tool_exec (start) (none) ← BUG
tool_exec (end) (none) ← BUG
llm_call seq=4 ✓
session_end seq=5 ✓
invocation_complete seq=6 ✓
```
This breaks the FWS-8 guarantee that every event from a given `correlation_id` exposes a monotonically increasing `seq` consumers use for gap detection. `tool_exec` events also lose `trace_id` / `span_id` cross-link (Phase 4 / #105).
Root cause
`registerAuditHooks` registered the tool-exec hooks with `func(_ context.Context, ...)` — discarding the per-invocation ctx — and called `auditLogger.Emit(...)` instead of `EmitFromContext(ctx, ...)`. The per-invocation `SequenceCounter` lives on ctx; plain `Emit` can't see it.
Same pattern existed at one more site: the JSON-RPC `tasks/send` handler's outbound-guardrail-failure `session_end` branch at `runner.go:~1546` — the parallel success-path SSE branch right below it correctly used `EmitFromContext`, this one had drifted.
Fix
Three sites changed from `Emit(...)` to `EmitFromContext(ctx, ...)`:
Sites left alone (intentional, documented inline)
The in-process egress enforcer's `OnAttempt` at `runner.go:~496` already uses `EmitFromContext` and stamps seq correctly.
Test plan
Schema impact
None. The events were already declared to carry `seq` — the field's `omitempty` JSON tag just dropped it when the value was zero. After the fix, consumers see the field they were already expecting; consumers ignoring unknown keys see no change.
Suggested follow-up (not in this PR)
The auth_verify / auth_fail seq gap is structurally fixable by moving the seq-counter installation earlier in the request pipeline (before the auth middleware). Worth filing as its own issue — same for adding a "audit emit must be EmitFromContext when inside an invocation scope" lint or linter pragma so this kind of drift gets caught at PR time.