Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

fix(audit): stamp seq on tool_exec + session_end events#173

Merged
initializ-mk merged 1 commit into
main from
fix/audit-tool-exec-seq
Jun 16, 2026
Merged

fix(audit): stamp seq on tool_exec + session_end events #173
initializ-mk merged 1 commit into
main from
fix/audit-tool-exec-seq

Conversation

@initializ-mk

@initializ-mk initializ-mk commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Symptom

A user on v0.15.0 reported `tool_exec` rows in the audit stream were missing the `seq` field even though they fired inside an A2A invocation. Their Slack-triggered task showed:

```
session_start seq=1 ✓
guardrail_check seq=2 ✓
llm_call seq=3 ✓
tool_exec (start) (none) ← BUG
tool_exec (end) (none) ← BUG
llm_call seq=4 ✓
session_end seq=5 ✓
invocation_complete seq=6 ✓
```

This breaks the FWS-8 guarantee that every event from a given `correlation_id` exposes a monotonically increasing `seq` consumers use for gap detection. `tool_exec` events also lose `trace_id` / `span_id` cross-link (Phase 4 / #105).

Root cause

`registerAuditHooks` registered the tool-exec hooks with `func(_ context.Context, ...)` — discarding the per-invocation ctx — and called `auditLogger.Emit(...)` instead of `EmitFromContext(ctx, ...)`. The per-invocation `SequenceCounter` lives on ctx; plain `Emit` can't see it.

Same pattern existed at one more site: the JSON-RPC `tasks/send` handler's outbound-guardrail-failure `session_end` branch at `runner.go:~1546` — the parallel success-path SSE branch right below it correctly used `EmitFromContext`, this one had drifted.

Fix

Three sites changed from `Emit(...)` to `EmitFromContext(ctx, ...)`:

Site Was Now
`registerAuditHooks` BeforeToolExec `func(_ context.Context, ...)` + `Emit` `func(ctxStart context.Context, ...)` + `EmitFromContext`
`registerAuditHooks` AfterToolExec `func(_ context.Context, ...)` + `Emit` `func(ctxEnd context.Context, ...)` + `EmitFromContext`
JSON-RPC tasks/send outbound-guardrail-failure session_end `Emit` `EmitFromContext(ctx, ...)`

Sites left alone (intentional, documented inline)

Site Why `Emit` is correct
`makeAuthAuditCallback` (auth_verify / auth_fail) Auth middleware runs before the runner installs the seq counter on ctx.
Egress proxy `OnAttempt` with `source: proxy` Subprocess HTTP CONNECT through the proxy has no Go ctx tying it back to the active A2A request.
MCP tool conflict at startup Pre-invocation, no scope.
Schedule dispatcher (`schedule_fire` / `schedule_complete`) Scheduler tick runs outside any A2A request scope.

The in-process egress enforcer's `OnAttempt` at `runner.go:~496` already uses `EmitFromContext` and stamps seq correctly.

Test plan

  • New regression test `TestToolExecAudit_CarriesSequenceFromContext` fires the registered hooks against a context carrying a `SequenceCounter`, parses the emitted NDJSON, and asserts `BeforeToolExec.seq=1` + `AfterToolExec.seq=2`. Pre-fix this test fails with `seq=0` on both events.
  • `go test -count=1 ./...` clean in forge-cli.
  • `golangci-lint run ./forge-cli/...` → 0 issues.
  • `gofmt -w` applied.

Schema impact

None. The events were already declared to carry `seq` — the field's `omitempty` JSON tag just dropped it when the value was zero. After the fix, consumers see the field they were already expecting; consumers ignoring unknown keys see no change.

Suggested follow-up (not in this PR)

The auth_verify / auth_fail seq gap is structurally fixable by moving the seq-counter installation earlier in the request pipeline (before the auth middleware). Worth filing as its own issue — same for adding a "audit emit must be EmitFromContext when inside an invocation scope" lint or linter pragma so this kind of drift gets caught at PR time.

Three audit-emit sites inside the per-A2A-invocation scope were
using plain AuditLogger.Emit() instead of EmitFromContext(ctx, ...),
which meant the per-invocation sequence counter installed on ctx at
request entry never reached those events. Consumers relying on FWS-8's
"every event from a given correlation_id exposes a monotonically
increasing seq" guarantee saw gaps in their gap-detection pipeline
exactly at tool_exec and the outbound-guardrail-failure session_end
branch.
Reported on v0.15.0: a Slack-triggered code-review task's audit
stream had session_start (seq=1), guardrail_check (seq=2), llm_call
(seq=3), then tool_exec start/end with NO seq, then llm_call (seq=4),
session_end (seq=5), invocation_complete (seq=6) — confirming the
counter advances but tool_exec emits skip it.
Three sites fixed:
 - registerAuditHooks BeforeToolExec (runner.go:~1990) — was
 `func(_ context.Context, ...)` then plain Emit; now takes
 ctxStart and calls EmitFromContext.
 - registerAuditHooks AfterToolExec (runner.go:~2016) — same.
 - JSON-RPC tasks/send handler outbound-guardrail-failure
 session_end (runner.go:~1546) — was plain Emit; now
 EmitFromContext(ctx, ...) matching the parallel success-path
 SSE branch right below it.
Sites left as plain Emit (intentional, documented inline):
 - makeAuthAuditCallback (auth_verify / auth_fail) — auth
 middleware runs BEFORE the runner installs the seq counter on
 ctx.
 - egress proxy OnAttempt with source=proxy — subprocess HTTP
 CONNECT has no Go ctx tying it back to the active A2A request.
 - MCP tool conflict at startup — pre-invocation, no scope.
 - Schedule dispatcher (schedule_fire / schedule_complete) —
 scheduler tick runs outside any A2A request scope.
Regression test: TestToolExecAudit_CarriesSequenceFromContext fires
the registered hooks against a context carrying a SequenceCounter,
parses the emitted NDJSON, and asserts BeforeToolExec.seq=1 +
AfterToolExec.seq=2.
@initializ-mk initializ-mk merged commit 16cfcf8 into main Jun 16, 2026
9 checks passed
initializ-mk added a commit that referenced this pull request Jun 16, 2026
...seq=1 (closes #174)
Pre-fix: auth_verify / auth_fail audit events lacked the seq field
because the SequenceCounter was installed downstream (in the runner's
per-A2A-request setup) of the auth middleware. The auth chain's
OnAuth callback used plain AuditLogger.Emit and couldn't reach a
counter that didn't exist yet on req.Context().
Symptom (user-visible on v0.15.0):
 auth_verify (no seq) ← BUG
 session_start seq=1
 guardrail_check seq=2
 llm_call seq=3
 ...
The FWS-8 contract promises every event under a given correlation_id
exposes a monotonically increasing seq for gap detection. auth_verify
sitting outside the sequence breaks that.
Fix: install the SequenceCounter on r.Context() BEFORE the auth chain
runs, via a thin middleware wrapper around auth.Middleware in the
runner. The auth callback now uses EmitFromContext(req.Context(), ...)
and stamps seq=1 on auth_verify. The runner's per-A2A-request setup
calls coreruntime.EnsureSequenceCounter (new helper) which detects
the existing counter and reuses it — so session_start lands seq=2
and the sequence is gap-free.
Three pieces:
 - forge-core/runtime/audit_schema.go gains EnsureSequenceCounter:
 returns ctx unchanged when a counter is present, installs a fresh
 one when absent. Used at every invocation-entry point that may
 run downstream of an upstream middleware.
 - forge-cli/runtime/sequence_counter_middleware.go (new) is the
 thin wrapper: composes (next) → authMW(next) → install counter
 → serve. Cost: ~24B counter allocation per request, even on
 skip paths — simpler than threading skip-path knowledge into the
 wrapper, and the alloc is in the same ballpark as the request
 struct.
 - forge-cli/runtime/runner.go:
 * AuthMiddleware: auth.Middleware(authCfg) →
 installSequenceCounterMiddleware(auth.Middleware(authCfg))
 * makeAuthAuditCallback: Emit → EmitFromContext(req.Context())
 * Three in-request WithSequenceCounter(ctx, new(...)) calls →
 EnsureSequenceCounter(ctx). Counter is reused from the auth
 wrapper (auth ON path) or installed fresh (--no-auth path).
 * Scheduler dispatcher's WithSequenceCounter call is left alone
 — it runs outside any HTTP request, no upstream counter.
Three other plain-Emit sites stay as documented in PR #173:
 - source=proxy egress: subprocess HTTP CONNECT has no Go ctx
 - mcp_tool_conflict: startup-time, pre-invocation
 - schedule_fire / schedule_complete: scheduler tick, no A2A scope
End-to-end smoke (--no-auth path):
 session_start seq=1
 session_end seq=2
 invocation_complete seq=3
End-to-end smoke (auth on, internal token):
 auth_verify seq=1 ← FIXED
 session_start seq=2
 session_end seq=3
 invocation_complete seq=4
Tests:
 - TestAuthAudit_SeqStampedWhenCounterInstalled — the #174 pin: with
 a counter on req.Context(), auth_verify lands seq=1
 - TestAuthAudit_NoSeqWhenCounterAbsent — back-compat: no counter →
 omitempty drops the field (legacy embedders without the wrapper)
 - TestSequenceCounterMiddleware_InstallsCounterBeforeNext — the
 wrapper installs the counter before delegating
 - TestEnsureSequenceCounter_ReusesExisting — runner must not
 clobber the auth-installed counter
 - TestEnsureSequenceCounter_InstallsFresh — --no-auth path: install
 a fresh counter when none is present
Schema impact: none. The events were already declared to carry seq.
initializ-mk added a commit that referenced this pull request Jun 16, 2026
Reflects PRs #173 and #174 (both merged to main) in the canonical
audit doc and the comprehensive knowledge skill.
docs/security/audit-logging.md — under "Sequence numbers", added:
 - Counter installation order — the SequenceCounter is installed by
 installSequenceCounterMiddleware (wraps the auth chain) so
 auth_verify / auth_fail land seq=1; the runner's request entry
 uses EnsureSequenceCounter to reuse the wrapper-installed counter
 and only allocate fresh on the --no-auth path. Pinned by
 TestAuthAudit_SeqStampedWhenCounterInstalled +
 TestEnsureSequenceCounter_ReusesExisting.
 - Emit invariant — per-invocation events MUST emit via
 EmitFromContext (or a typed helper); plain Emit skips the counter
 and the trace cross-link. The regression behind #173 (tool_exec +
 session_end) and #174 (auth_verify). A 4-row table names the
 sites that intentionally stay on plain Emit (egress proxy
 subprocess CONNECT, MCP startup events, scheduler tick, startup
 banners) and why. Issue #175 is named as the lint follow-up.
.claude/skills/forge.md — three sections updated:
 - § 3 path trace: middleware order now shows
 installSequenceCounterMiddleware as outermost; runner request
 entry uses EnsureSequenceCounter.
 - § 12.4 FWS-8 section: added the Emit invariant paragraph naming
 the regression pins and exception list.
 - § 18 FWS-8 row: appended #173 / #174 follow-up summary +
 #175 lint tracker.
No code change. TOC anchors unchanged. Broken-link sweep clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

1 participant

AltStyle によって変換されたページ (->オリジナル) /