Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Releases: Conalh/AgentPulse

v0.8.0: Headless --once parity (hide-idle/max-sessions), fail-closed CI gate, and hardening

29 May 16:14
@Conalh Conalh

Choose a tag to compare

A correctness-and-hardening pass driven by an external code review. The headless agentpulse live --once path (the form the GitHub Action runs) is brought in line with the TUI and the inputs the Action already documented; the strict CI gate no longer passes when its own analysis crashed; the macOS notifier escaping is fixed; and discovery gains cost bounds while the report gains an opt-in redaction layer. Detector scope, the Action's supply-chain model, and the CI output's privacy surface are now documented honestly instead of being overstated or left implicit.

Fixed — live --once silently ignored --hide-idle and --max-sessions

Both flags are parsed into LiveOptions by the CLI and forwarded by the GitHub Action (max-sessions defaults to 20), and the interactive TUI honours both — but runOnceMode never read them, so in headless/CI mode they were no-ops despite the Action documenting them. --once now applies both: --hide-idle drops zero-activity sessions from the report (error rows are kept; idle sessions never gate, so this can't change the gate), and --max-sessions caps the printed per-session listing. The cap is deliberately display-only — bucketCounts, hasGatingFinding, and the JSON sessions[] still reflect every analyzed session, so a drifting/stuck session below the cap still fails the build — and the text report announces the truncation (showing freshest N of M sessions) rather than hiding it. SessionSnapshot now carries toolInvocationCount (sourced from recap.enriched) for the idle filter and JSON consumers.

Fixed — strict CI gate failed open on transcript/parser errors

In --once, a session whose pulse() threw (unreadable or corrupt transcript, parser failure) becomes a snapshot with bucket error. The gating set is {drifting, stuck} and --strict exited 1 only on hasGatingFinding, so a run consisting solely of broken transcripts exited 0 under --strict — a governance gate passing green though its own analysis never ran. New --fail-on-error (and Action input fail-on-error, defaulting to true) makes --strict also fail when any session is in the error bucket. Bare --strict without the flag keeps the prior advisory-on-error behaviour so local use isn't disrupted. The exit-code decision is now a pure, exported gateExitCode() so the gate contract is unit-tested directly.

Fixed — macOS notification could break out of the AppleScript string literal

The --notify os macOS branch assembles a single display notification "..." with title "..." expression from transcript-derived text (the session label is ultimately basename(cwd), unsanitized) but escaped only double-quotes. A label ending in a backslash turned the closing quote into an escaped quote so the literal never terminated — an AppleScript-injection/breakage vector — and a raw newline made the one-line expression a syntax error. The escaper now strips control characters (CR/LF included) and escapes the backslash before the quote, and all platforms clamp label length. The Windows PowerShell branch was already sound (every value lands in a single-quoted literal with ''' doubling, the complete escaping there) and is unchanged apart from the length clamp; Linux uses pure argv. New direct tests fuzz the escaper with quotes, backslashes, newlines, and an injection-shaped payload.

Changed — narrowed the text-only verification failure heuristic

classifyResultText (consulted only when a verification command has no exit code — callers prefer the exit code) included a bare /\berror\b/i in its failure table, so any result text merely mentioning the word "error" was classified as a failure, which could bias the stuck/converging/sequence verdicts. The bare match is replaced with runner-shaped error contexts (Error: at line start, npm ERR!, error TS####, N error(s), compilation/build/command failed); genuinely-red output still matches, an incidental "error" no longer does. A stale comment in sequences.ts (claiming indeterminate verifications count as failures for stuck_loop, when the detector only counts clear failures) is corrected.

Added — bounded discovery (--max-depth, --exclude)

Discovery and the polling watcher walked transcript roots with an unbounded recursive descent and applied staleness only after every file was found and stat'd, so a broad --roots (or a CI artifact dir with deep history) could incur unnecessary I/O. --max-depth <N> caps how far below each root the walk descends and --exclude <d1,d2,...> prunes directory names (e.g. node_modules,.git) before recursing; both are threaded through the one-shot discovery and the live watcher, and exposed as Action inputs. Defaults are unchanged (unbounded / no exclusions), so existing behaviour is preserved.

Added — opt-in redaction for --once output (--redact)

The Action streams the report into the GitHub step summary and an optional sticky PR comment; narratives, signals, and the transcript path embed transcript-derived absolute paths, path clusters, and file names, with no way to scrub them. New --redact <none|paths|all> (and Action input redact): paths reduces file paths / path clusters in narratives, signals, and the transcript path to basenames; all additionally drops narratives and signals, leaving label, bucket, confidence, and drift count. Applies to both text and json.

Documented — detector scope, supply chain, and CI privacy

  • Drift-detector scope. The README now states that drifting means "a known risky pattern fired," not "this session is safe," and lists the three rule families the detector actually implements (privileged-path access; curl/wget piped to a shell; write outside the repo root) plus the notable patterns it does not cover. The "network execution" wording that overstated the single shell-pipe rule is corrected.
  • Action supply chain. The Action executes the published npm package @conalh/agentpulse@<version>, not the checked-out action source; pinning the git ref pins only the version string while the code and its semver-compatible dependencies resolve from npm at run time. This trust boundary is now documented in the README and action.yml.
  • CI privacy. The README documents exactly what transcript-derived content reaches the step summary / PR comment (and the additional fields in the JSON snapshot), with guidance to use redact and not to upload the raw JSON publicly.
Assets 2
Loading

v0.7.3: Normalize agent-gov-core dependency to ^1.3.0

29 May 03:01
@Conalh Conalh

Choose a tag to compare

Internal

  • Bumped agent-gov-core dependency ^1.2.1^1.3.0 to align with the rest of the suite (all five detectors are on ^1.3.0).
  • No behavior change. AgentPulse uses core's transcript parsers and the Finding/Report contract — all unchanged across 1.2.1→1.3.0. The 1.3.0 additions (diff-input guards) are for PR-diff detectors and aren't on AgentPulse's path. Verdicts and report output are unchanged.
  • README + example workflow self-pins bumped to v0.7.3.
  • Takes effect for the GitHub Action once @conalh/agentpulse@0.7.3 is published to npm (the Action npx-installs the published package).
Loading

v0.7.2: Classifier + parse-cache correctness pass (multibyte fix, dedup, fewer false positives)

29 May 02:17
@Conalh Conalh

Choose a tag to compare

A correctness-and-consistency pass across the classifier pipeline and the TUI. Several rules silently mis-fired on real (non-synthetic) sessions — duplicated verification vocabulary across three layers, a stuck_loop that never triggered on actual transcripts, prose-only sessions flagged as stuck, and an incremental parser that double-counted events on any non-ASCII transcript — alongside internal de-duplication of constants/namespaces and two TUI render/label fixes.

Fixed — verification vocabulary was duplicated across three layers

The set of commands that count as a "verification" (running tests, a type checker, a linter, or a build) and the pass/fail result-text tables were maintained as three divergent copies — one each in enrich.ts, sequences.ts, and trajectory.ts. The lists had genuinely drifted (go vet, cargo clippy, mypy, ruff, gradle, mvn, npm run, playwright, prettier were recognized by the sequence detector but not the verification-trend computation), so a command could count toward the edit→verify cycle but not the pass/fail trend — or vice versa — silently. All three layers now share src/verification.ts (isVerificationCommand + classifyResultText); the trend layer gains the tools it was missing and switches from a loose substring match to the precise word-boundary regex the other layers already used.

Fixed — stuck_loop never fired on real transcripts (only synthetic fixtures)

Layer 2.5's detectStuckLoop read the verification exit code off the Bash tool_use event, but the parser emits tool_use and tool_result as separate events with the exit code on the tool_result (keyed by toolUseId). So every real session's verification outcome resolved to indeterminate, no failures were counted, and a wedged edit→fail→edit→fail loop silently degraded to a healthy tdd_loop. analyzeSequences now links tool_result outcomes by id — the same fix computeVerificationTrend got in v0.6.1 — so stuck_loop fires on real transcripts. Locked in by a new expectedSequence assertion on the golden corpus.

Fixed — prose-only sessions false-positived as stuck

A documentation session (5+ .md edits, no test command) tripped refuse_to_verify and flipped to stuck, because the detector counted every edit toward its ≥4 threshold. Editing a doc set legitimately has nothing to verify, so detectRefuseToVerify now counts only code edits — prose extensions (.md/.mdx/.markdown/.txt/.rst/.adoc/.org) don't count, and unknown paths still count as code (conservative toward firing). New doc-edits-no-tests corpus fixture guards the converging outcome; a .ssh-keys cwd fixture guards that the privileged-path detector doesn't misfire on hyphenated lookalikes.

Fixed — incremental parse cache duplicated events on any non-ASCII session

The incremental tail-reader (parseCache.ts) tracks a byte offset of the last complete line, but it derived how far it advanced from a UTF-16 string index (raw.lastIndexOf('\n')). Any multibyte character — emoji, accents, CJK, all common in agent prose, code, and tool output — made the two diverge, so the next read rewound before the true position, re-parsed already-consumed lines, and appended them as duplicate events. Those duplicates inflate the edit/verification counts the classifier runs on, so the live dashboard could land on the wrong bucket for essentially any real session. (The whole-file recap path was unaffected; this was incremental-only — the live TUI and orchestrator.) The reader now finds the newline in the raw buffer (0x0A can't appear inside a UTF-8 multibyte sequence) so the offset is exact. New regression fixture reproduces the duplicate.

Fixed — narrative privileged-path lede claimed coverage the detector lacked

renderDrifting's privileged-path matcher tested for gnupg and credential, but PRIVILEGED_PATH_RULES only ever emits ssh_path / aws_path / kube_path / etc_shadow / private_var — so those terms were dead and the lede over-promised "credentials." The matcher and wording now reflect what's actually flagged (SSH/AWS/Kube/system paths).

Changed — internal duplication cleanup

Default timing constants (DEFAULT_WINDOW_MS, DEFAULT_REFRESH_INTERVAL_MS, DEFAULT_STALE_MS) were re-spelled as literals across five files with two different spellings of the same value; they now live in src/defaults.ts and are imported everywhere. The agent_pulse.live_drift_<slug> namespace convention was likewise built in one place and parsed back with two divergent copies (one using a magic +7); a new src/drift.ts owns the prefix plus both directions (driftKind / driftSlug). No behaviour change. Also corrected stale comments (orchestrator coalescing "exactly once" → "~2 pulses max", the exploring bucket's "no edits yet", stacked JSDoc on the exceptions loader, the DiscoverOptions.staleMs default documented as "24 hours" when it is 1 hour, and evictParseCache's "called by the watcher" — it's the orchestrator reacting to a watcher remove event).

Fixed — the TUI re-rendered every second even with nothing to animate

App's 1 s wall-clock interval drives only the whitelist-preview countdown banner, but it ran unconditionally for the whole session — so the root reconciled every second (children bail out via React.memo, but the parent render itself doesn't), the exact thrash the v0.5.3 self-clocking-footer refactor set out to remove. The interval now runs only while a preview banner is on screen.

Fixed — session detail header disagreed with its own list row

The detail pane resolved its title as projectName ?? id, while the session list, the sort key, and the headless --once snapshot all use fallbackLabel(). So the same session could read as a 12-char hex id in the detail header but as the path-inferred name (e.g. "AgentPulse") in the list, an empty projectName rendered as a blank header, and — most visibly — a user's rename alias updated the list row but not the detail header. The detail pane now shares fallbackLabel() and applies the same alias prefix.

Loading

v0.7.1: session-noise + cross-runtime accuracy pass

29 May 02:05
@Conalh Conalh

Choose a tag to compare

[0.7.1] — 2026年05月28日

Session-noise and cross-runtime accuracy pass, driven by real multi-terminal session sets. The TUI had quietly accumulated fixes that the headless agentpulse live --once path never inherited, so machine-readable output and --notify diverged from what the dashboard showed.

Fixed — session noise the headless --once snapshot missed (#11)

Three dashboard-hygiene rules lived only inside the TUI (src/tui/App.tsx, src/tui/SessionList.tsx) where the --once path couldn't reach them. They are now in shared modules (src/sessions/subagents.ts, src/labels.ts) and wired into both surfaces:

  • SDK subagents leaked into the snapshot. The TUI filtered agent-<hex>.jsonl subagent transcripts by default (the --show-subagents opt-in, since v0.2.3), but --once did not. A real session with one lab parent + four subagents reported five identical lab rows in JSON. once.ts now uses the shared isSubagentTranscript() predicate — same default-off behaviour, same --show-subagents opt-in. The predicate also catches the structural <parent>/subagents/agent-<hex>.jsonl layout where the parent's project name leaks into the child.
  • Co-named sessions were indistinguishable in JSON/notify. SessionList has suffixed colliding <project>|<runtime> rows with · <hex> since v0.4.4, but --once emitted bare projectName. Two parallel AgentPulse sessions (e.g. one Claude Code, one Codex) couldn't be told apart in output or notifications. A new label field on the snapshot carries the disambiguated string via the shared computeDisambigSuffixes() helper; projectName is preserved unchanged for existing programmatic consumers, and --notify bodies now use label.

Fixed — project names were systematically wrong on multi-word Claude Code projects (#11)

decodeSlug's last-hyphen-segment rule is structurally lossy: a slug like C--FullStee-nutrition-experiment-lab can't be split back into prefix + project because hyphens are both path-separator encodings and legal name characters, so every multi-word project decoded to its last segment (lab, brief, ...). The transcript's first line carries the authoritative cwd, so naming now prefers basename(cwd) and falls back to slug-decode only when cwd is absent. Compounding it, extractCwdFromFirstLine was strictly first-line-only — but Claude Code transcripts open with permission-mode / file-history-snapshot lines before the first cwd-bearing message, so the cwd-first path never ran. It now scans up to 25 lines / 64 KB. A subagent-shaped cwd basename (agent-<hex>) is rejected so a project is never mislabelled as a tooling artifact.

Fixed — exploring narrative contradicted its own signals (#11)

The low-confidence exploring fallback (trajectory.ts:799) fires regardless of edit count, but the narrative renderer unconditionally appended "No edits yet — it's still figuring out the shape." A session with 20 edits in the window directly contradicted its Signals line. renderExploring now branches on edit count and emits an honest hedge ("Mixed signal: N edits in the window but no clear direction yet") when the fallback fires with edits present.

Fixed — current Codex desktop session format

Recognizes the current Codex desktop transcript shape, including custom_tool_call / custom_tool_call_output events that the substrate parser doesn't yet model, so Codex desktop sessions classify (editing / verification / drift) instead of falling through to other. Adds codex-desktop-* golden corpus fixtures.

Bumped — agent-gov-core ^1.2.1

v0.7.0 pinned ^1.2.0 against an unpublished substrate version, so npm ci hit ETARGET in CI. Substrate has since shipped v1.2.0 + v1.2.1 (streaming reader patch on top of the Antigravity parser); the caret now matches the registry.

Tests

240 tests pass (up from 228 at v0.7.0). New coverage for subagent filtering (both directions), co-named disambiguation, cwd-first naming across the metadata prelude, the exploring-with-edits narrative branch, and Codex desktop classification. Windows CI flake fixed by adding maxRetries/retryDelay to ~60 recursive rmSync teardowns (lazy handle release intermittently threw ENOTEMPTY/EBUSY).

Loading

v0.7.0: Native Antigravity session tracking support

25 May 14:43
@Conalh Conalh

Choose a tag to compare

Added — Native Antigravity Session Tracking

Added first-class support for Google DeepMind's Antigravity transcript format (~/.gemini/antigravity/brain). Integrates seamlessly with the thread-safe agent-gov-core@1.2.0 substrate parser to provide fully aligned session tracking and live TUI verdict support.

  • Stateless Sequential toolUseId Linkage: Uses the new threadable activeToolCalls parameter from the substrate parser to map Antigravity's asymmetrical replace_file_content $\leftrightarrow$ REPLACE_FILE_CONTENT tools sequentially while avoiding any module-level state.
  • Empty assistant_message Conditionally Pruned: Automatically checks for empty planner responses and suppresses empty placeholder messages.
  • CommandLine $\to$ command Normalization: Auto-translates Antigravity parameters inside unwrapArgs so that existing verifiers and checkers require zero modification downstream.
  • Cwd-level Path Resolution: Scans early lines in the session JSONL to locate and extract Cwd / DirectoryPath / SearchPath, and populates the event-level cwd for robust drift and relative path analyses.
  • Exit Code Extraction: Grounded parsing of the verified RUN_COMMAND exit-code shapes (failure, completed successfully, and status error).
  • Orphan Cleanup: Integrates memory-safe cleanup inside parseCache.ts to prune active tool call mappings once their corresponding events fall out of the cache's active window.
  • Golden Integration scenario: Added antigravity-converging.jsonl golden replay case to the test corpus to ensure regression protection.
Loading

v0.6.2: Codex external inspection fixes (cross-runtime file-paths + relative path drift)

25 May 14:43
@Conalh Conalh

Choose a tag to compare

Fixed — multi-runtime adapter drift (Codex external inspection)

The README claimed Claude Code / Cursor / Codex support, and the substrate parsers in agent-gov-core@1.1.0 do parse all three line formats correctly. But the consumer layers in this package (enrichment, sequences, trajectory) were written against the Anthropic tool vocabulary and silently misclassified Cursor + Codex events.

A Codex inspection (post-v0.6.1) confirmed: running a real Codex fixture through pulse() counted both Codex tool calls as other — zero editing, zero verification, no primary files. Cursor's toolInput.path (vs Anthropic's toolInput.file_path) was equally invisible to the enrichment and sequence layers (the trajectory layer already handled both — partial coverage that made the gap easy to miss).

The shape of the fix:

  • New src/normalize.ts centralizes two helpers used wherever a consumer reaches into a TranscriptEvent:

    • canonicalToolName(name) maps Codex's shellBash and apply_patchEdit. Unknown names pass through verbatim.
    • extractFilePath(toolInput) tries file_path / path / filePath / notebook_path in order, plus an apply_patch fallback that parses *** Add File: <path> / *** Update File: <path> markers from the Codex patch body.
  • src/enrich.ts:classifyToolUse routes the toolName through canonicalToolName before the switch. getFilePath delegates to the shared normalizer.

  • src/sequences.ts:classifyEvent + extractFilePath — same treatment.

  • src/trajectory.ts:isVerificationEvent accepts canonical Bash (so Codex shell lands). The Write-shaped drift detector also accepts canonical Edit (so Codex apply_patch participates).

  • MCP tool detection stays keyed off the raw name (canonicalization only rewrites known runtime aliases, not MCP-prefix patterns).

Fixed — relative paths false-tripped outside-repo drift

isPathOutsideNormalizedRoot compared the raw file path against the normalized root. A relative path like src/utils/date.ts never starts with c:/dev/agentpulse/, so every relative Write triggered the outside-repo detector — a false positive on legitimate edits. Cursor (and Codex's apply_patch) emit relative paths constantly.

  • src/trajectory.ts:isPathOutsideNormalizedRoot now accepts an optional cwd parameter. New isAbsolutePath helper detects Unix /... and Windows C:/... shapes; relative paths get resolved against the event's cwd before comparison. When cwd is unavailable, the function defensively returns false (NOT outside) — a false negative on drift detection is strictly better than a false positive on legitimate edits.

Fixed — watcher derived project name before extracting cwd

src/sessions/watcher.ts called deriveProjectName(absPath, root.path) BEFORE extractCwdFromFirstLine, missing the cwd-aware fallback that src/sessions/discovery.ts uses. For Codex's date-shaped slugs (~/.codex/sessions/2026/05/23/...), a newly-watched session showed worse labels (date stub) until the next discovery cycle picked it up.

  • Swapped the call order to match discovery: extract cwd first, pass it into deriveProjectName. Three-line fix.

Fixed — npm tarball missed the README hero image

package.json files whitelist included dist, LICENSE, README.md, CHANGELOG.md — but not assets/. The v0.6.1 README references ./assets/dashboard.svg, so anyone viewing the README on npmjs.com saw a broken image. Added assets/dashboard.svg to the whitelist.

Added — cross-runtime corpus fixtures

The golden corpus (added in v0.6.1) was Claude Code-only, which is exactly why the multi-runtime gap stayed hidden through 222 tests. v0.6.2 adds:

  • test/corpus/cursor-converging.jsonl — Anthropic envelope with toolInput.path instead of file_path. Locks in the extractFilePath fix.
  • test/corpus/codex-converging.jsonl — Codex response_item shape with shell + apply_patch. Locks in the canonical-name + apply_patch-path-extraction fixes.

Both expect converging with confidence ≥ 0.6. Pre-fix they would have landed as exploring or idle.

Added — targeted relative-path drift tests

Three new tests in trajectory.test.mjs:

  • Relative Write resolved against cwd inside repoRoot → no drift
  • Relative Write without cwd → no drift (defensive default)
  • Absolute Write outside repo still trips drift (regression check for the v0.6.2 narrowing)

Tests

227 (was 222). Five new: two corpus scenarios + three trajectory tests.

Bumped

  • Action ref in README.md + examples/agentpulse-pr-check.yml: @v0.6.1@v0.6.2.
Loading

v0.6.1: Regression armor — property tests + golden corpus + classifier bug fix

24 May 02:25
@Conalh Conalh

Choose a tag to compare

The "regression armor" batch — property tests (Gemini #9) + golden replay corpus (Gemini #3). Both items closed the inspection backlog. Both invisible to users; both immediately earned their keep.

Added — golden replay corpus (Gemini #3)

test/corpus/ is a small, labelled collection of representative transcript JSONLs paired with the bucket each must classify into. test/corpus.test.mjs reads manifest.json, runs the full pulse() pipeline against each fixture, and asserts the verdict. Six scenarios cover the six trajectory buckets — converging, stuck, exploring, done, drifting (shell_exfil), idle.

Adding a scenario: drop <name>.jsonl into test/corpus/, add a manifest entry with { file, endAt, windowMs, expectedBucket, minConfidence, expectedDriftCount }, run npm test. New scenarios pick up automatically.

Fixed — verification trend was silently no_data for ALL real parsed transcripts

The corpus surfaced this on its first run. computeVerificationTrend in src/trajectory.ts looked for toolResultExitCode on the tool_use event — the synthetic test fixture shape. But the real parser emits tool_use and tool_result as SEPARATE TranscriptEvent objects linked by toolUseId; exit codes live on the tool_result. So every real Claude Code / Cursor / Codex transcript yielded verificationTrend === 'no_data', which meant the converging bucket effectively never fired in production — only in synthetic tests.

  • Fix: computeVerificationTrend now builds a toolUseId → tool_result map up front and falls back to the linked tool_result when the tool_use itself has no exit code. Backward-compatible — the synthetic shape (exit code on the tool_use) still works.
  • The corpus catches the symptom: converging.jsonl has a fail→pass test sequence and asserts the bucket is converging with confidence ≥ 0.6. Pre-fix that asserted, post-fix it passes.

This is the single biggest classifier-quality fix since v0.3.5. Anyone running v0.5.x against a real session was getting a downgraded verdict (likely exploring or idle instead of converging) anytime the agent did TDD-shaped work. v0.6.1 restores the intended behaviour.

Added — property tests on the pure-core layers (Gemini #9)

test/property.test.mjs runs randomized-input fuzz loops (200 iterations each) over the pure-function surfaces and asserts invariants that must hold for ANY input:

  • classifyTrajectory: confidence always in [0, 1], bucket always one of the six known values, deterministic (same inputs → same verdict), drifts array only non-empty on drifting
  • sparkline: output length always equals requested width, empty-input fallback respected
  • parseDuration: linearity (N * parseDuration('1m') === parseDuration('Nm')), malformed inputs reject without throwing
  • applyHysteresis: input state never mutated, stable bucket only flips after two consecutive agreements
  • analyzeSequences: pattern always in the known set, confidence in [0, 1]
  • readOutcomeSignal: idleGapMs never negative

Hand-rolled Math.random() generator + seeded PRNG (set AGENTPULSE_PROPERTY_SEED=<n> to replay a specific run). Zero deps — fast-check would be a single-purpose addition and these invariants are simple enough.

Fixed — CLI module side-effected on import

src/cli.ts called main() unconditionally at module load. Importing parseDuration from it (which the property tests needed) ran the CLI's main function, printed usage, and exited — breaking any consumer that wanted to use the file as a library. Guarded main() with the standard ESM idiom (process.argv[1] === fileURLToPath(import.meta.url)).

Tests

222 (was 204). Eighteen new tests:

  • 12 property tests covering invariants across trajectory, sequences, sparkline, cli (parseDuration), hysteresis, outcome
  • 6 corpus scenarios (one per bucket) running end-to-end through the full pipeline

Bumped

  • Action ref in README.md + examples/agentpulse-pr-check.yml: @v0.6.0@v0.6.1.
Loading

v0.6.0: Incremental transcript parsing — biggest perf win

24 May 02:25
@Conalh Conalh

Choose a tag to compare

Added — incremental transcript parsing (Gemini #1)

The biggest perf win on the inspection backlog. Pre-v0.6, every live-mode refresh re-read the full transcript file end-to-end: a 2-hour Claude Code session with a 10 MB JSONL ate ~10 MB of disk I/O and N thousand JSON.parse calls every 30 s, of which most events were immediately dropped by the windowing filter. With N sessions in the dashboard, multiply by N.

v0.6.0 reads only what's new. Per-path state tracks the byte offset of the last complete line we've parsed; subsequent pulses tail-read [lastOffset, currentSize) instead of the whole file.

How it works (src/parseCache.ts)

On each call to readWindowFromCache(path, since, until):

  • stat the file. Compare mtimeMs + size against the cached values.
  • No change: return window-filtered cached events. Zero I/O beyond the stat.
  • Grew: open the file, read(buf, 0, size - lastOffset, lastOffset) — exactly the new bytes. Parse appended lines, append to cached event list, advance lastOffset past the last complete \n. Bytes after the last \n (an in-progress line) get deferred to the next pulse.
  • Shrank or mtime regressed: file was rotated/truncated. Evict the cache entry and full-re-read on this pulse.
  • Missing file: throws (matches parseTranscriptDir's contract; the orchestrator captures this into SessionState.error).

Cached events are pruned with a 1-hour margin past the window start so a 24-hour session doesn't grow memory without bound, while still tolerating --window resizes without a full re-read.

Wiring

  • PulseOptions.events (new) — caller-supplied pre-parsed events that bypass pulse()'s own parser entirely. CLI callers (recap, live --once) leave this unset; the orchestrator sets it.
  • src/orchestrator.ts:runPulse — calls readWindowFromCache(path, startAt, endAt, { silent: true }) first, then pulse({ events, endAt, ... }). The explicit endAt keeps the window math consistent between the cache filter and pulse's internal startAt computation.
  • evictParseCache(path) wired into the orchestrator's remove(sessionId) so the cache doesn't hang onto state for a deleted session.

Substrate

No agent-gov-core change needed for this release. The substrate's v1.1.0 per-runtime line parsers (parseAnthropicLine, parseCodexLine, isCodexLine, etc.) and timestamp helpers (coerceTimestamp, interpolateTimestamps, isRecord) cover everything the cache needs. The cache is pure AgentPulse logic on top of the existing public surface.

Expected speedup

For a 10 MB transcript on a 30 s refresh:

  • Pre-v0.6: ~10 MB read + N thousand JSON.parse calls per pulse.
  • Post-v0.6: typical pulse reads only what the agent appended in the last 30 s (often a few KB) + parses ~5–50 lines.

The bigger the transcript, the bigger the gain. CI / live --once mode is unaffected — that path keeps using parseTranscriptDir (whole-file, batch).

Why a minor bump (v0.6.0)

PulseOptions.events is an additive optional field — fully backward-compatible. The behaviour change is internal (live-mode orchestrator routes through the cache); external API is unchanged. Calling this v0.6.0 instead of v0.5.7 to mark a meaningful internal performance shift, not because anything broke.

Tests

204 (was 194). Ten new tests in parseCache.test.mjs:

  • First-call full parse
  • Second-call cache hit (no re-read)
  • Tail-read only of appended bytes
  • Window filter applied at each call (sliding window)
  • Incomplete final line deferred to next read
  • Rotation/truncation (file shrinks) evicts cache and re-reads
  • Missing file throws (matches parseTranscriptDir contract)
  • File disappearing between pulses throws on the next call
  • evictParseCache(path) forces a full re-read
  • Old events past the prune threshold drop out of the cached set

The orchestrator's existing 7 tests cover the integration; one test had to verify that "errors are captured" still works — the cache's new "throw on missing file" behaviour preserves it.

Bumped

  • Action ref in README.md + examples/agentpulse-pr-check.yml: @v0.5.6@v0.6.0.
Loading

v0.5.6: JSON read cache + AGENTPULSE_PROFILE dev mode

24 May 02:25
@Conalh Conalh

Choose a tag to compare

Added — mtime-keyed JSON file cache (Gemini #4)

loadAliases and loadExceptions are called on every pulse — with 10 sessions on a 30 s refresh, that's 20+ small JSON file reads per cycle, most returning identical content. v0.5.6 wraps both behind a shared process-wide cache that compares mtime before re-reading.

  • New src/jsonCache.ts exports readJsonCached(path, parse, whenMissing) and invalidateJsonCache(path). On every call: stat the file, compare mtime against cached value; mtime match → return cached parse output, mismatch (or ENOENT vs. present) → re-read + reparse + cache.
  • aliases.ts:readAliasFile + exceptions.ts:loadExceptions now route through it. Same observable behaviour, fewer disk reads.
  • setAlias and appendExceptions invalidate the cache after a successful write so the next read picks up the new content even if mtime resolution lags (Windows + some network filesystems can be ~1 s late).
  • Failure modes (parse error, read error) are cached as null with an mtime sentinel of 0 — so a permanently-broken file doesn't get reparsed every pulse, but a fixed file is detected on the next mtime change.
  • Most visible on Windows where small-file stat+read latency adds up. On a 10-session refresh, this is roughly a ×ばつ reduction in alias/exception disk operations per cycle.

Added — dev-only per-layer profiling (Gemini #10)

Set AGENTPULSE_PROFILE=1 in the environment to log per-layer pulse timings to stderr:

[agentpulse:profile] parse: 142.3ms · enrich: 3.1ms · sequence: 0.4ms · exceptions: 0.2ms · classify: 1.1ms · narrative: 0.1ms
  • Off by default — one env-var read per pulse, zero allocation in the hot path. Reads dynamically (not at module-load) so tests can toggle freely.
  • Goes to stderr because stdout is in the TUI's alt-screen buffer during agentpulse live — any write to stdout there causes whole-window flicker on cmd.exe.
  • Pairs naturally with the incremental-parse work coming in v0.6.0+: the per-layer numbers tell you immediately whether parse: actually dropped after a refactor.

Tests

194 (was 183). Eleven new tests across two new files:

  • jsonCache.test.mjs (9 tests): missing-file → fallback, valid-file → parsed, parser-not-re-invoked-on-cache-hit, mtime-change-forces-reread, malformed-JSON → fallback + cached as missing, missing-then-appearing detected, invalidate() forces re-read, clear() drops everything.
  • cli.test.mjs (2 tests): AGENTPULSE_PROFILE=1 produces the expected stderr line shape AND stdout JSON stays valid; absence of the env var produces no profile line.

Bumped

  • Action ref in README.md + examples/agentpulse-pr-check.yml: @v0.5.5@v0.5.6.
Loading

v0.5.5: CI speedup — parallel --once + Action uses the npm package

24 May 02:25
@Conalh Conalh

Choose a tag to compare

CI speedup batch

Two compounding changes from the external inspection backlog (Gemini #5 + #6). No user-facing behaviour change in the live TUI; the GitHub Action and live --once mode both get visibly faster.

Changed — live --once now runs sessions in parallel

Pre-fix, runOnceMode awaited each session's pulse() sequentially in a for loop. On CI runs against artifact directories with dozens of historical transcripts, this scaled linearly — and CI runs against artifact directories with dozens of historical transcripts is the whole point of --once.

  • New runConcurrent(tasks, concurrency) helper in src/once.ts (exported for direct unit testing). Hand-rolled — 12 lines, zero deps. Spawns up to concurrency workers that pull tasks off the list in order; per-index result assignment preserves the original input order so the text report iterates sessions in their discovery order, same as the pre-fix loop.
  • Default concurrency is 6. Tunable via AP_ONCE_CONCURRENCY=<N> env var for users with unusual CI runner shapes (giant artifacts on slow disks → lower; fast NVMe + many sessions → higher).
  • On a typical run with 24 historical transcripts, CI wall-clock for the analysis step drops from ~24 ×ばつ per-session latency to ~ceil(24/6) ×ばつ per-session latency — roughly ×ばつ faster for that step.

Changed — GitHub Action invokes the published npm package

The composite action used to npm ci --omit=dev && npm run build on every workflow run before invoking AgentPulse. That's 5-15 s of pure overhead and one extra failure mode (a transient npm registry hiccup mid-ci surfaces as "AgentPulse build failed" rather than a real signal).

  • The action now resolves the version from its own checked-out package.json and invokes npx --yes @conalh/agentpulse@${VERSION} live --once .... First call populates the npm cache; second is a cache hit. The action's git ref (e.g. Conalh/AgentPulse@v0.5.5) and the npm version stay in lockstep automatically — no hardcoded version string in action.yml that could drift.
  • The Install AgentPulse dependencies + build step is gone. The actions/setup-node@v4 step stays — npx still needs node, and the runner-default node version isn't guaranteed across runners.
  • The action ref in README.md and examples/agentpulse-pr-check.yml is bumped to @v0.5.5.

Tests

183 (was 179). Four new tests on the runConcurrent helper:

  • Preserves input order in the results array even when tasks resolve in reverse order
  • Caps in-flight tasks at the configured concurrency value
  • Empty task list returns an empty array
  • concurrency = 1 is strictly sequential (degenerate but legal — useful for debugging)

The action.yml change is verified by inspection — the workflow shape doesn't have a unit-test harness, but the script logic is straightforward shell + the existing live --once CLI surface is exhaustively tested.

Loading
Previous 1 3 4 5
Previous

AltStyle によって変換されたページ (->オリジナル) /