Releases: kirder24-code/ai-agent-manager

v0.3.1 — loop detection gated on response signature

10 Jun 00:33

@kirder24-code kirder24-code

v0.3.1

e2f52af

Loop detection now distinguishes circling from convergence. A run is flagged as looping only when the prompts are near-identical AND the upstream response did not move (same error or output). When the error changes from turn to turn, that counts as progress and the run is left alone.

Also fixes two bugs caught by a new HTTP end-to-end test that runs through the live gateway: the response signature was recorded only after the reply had gone back to the client, so a concurrent next turn could miss it, and startEphemeralGateway was not exported.
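
The gating rule can be sketched as below. This is a minimal illustration, not Runcap's actual code: the function names, the way signatures are passed in, and the reuse of the v0.3.0 defaults (92% similarity, 3 repeats) are all assumptions made for the example.

```javascript
// Hypothetical sketch of the gate; names and signature handling are assumptions.
const SIMILARITY_THRESHOLD = 0.92; // default quoted in the v0.3.0 notes
const REPEATS_TO_FLAG = 3;         // default quoted in the v0.3.0 notes

// promptSimilarity: similarity of the current prompt to the previous one (0..1).
// olderSig / newerSig: signatures (for example a hash of the error text) of the
// two most recent upstream responses; null when no response was recorded yet.
function nextRepeats(prevRepeats, promptSimilarity, olderSig, newerSig) {
  const promptsCircling = promptSimilarity >= SIMILARITY_THRESHOLD;
  const responseStuck = olderSig !== null && olderSig === newerSig;
  // Both conditions must hold; a changed response resets the streak.
  return promptsCircling && responseStuck ? prevRepeats + 1 : 0;
}

function isLooping(repeats) {
  return repeats >= REPEATS_TO_FLAG;
}
```

With this shape, three near-identical prompts that each get the same error back reach the flag, while a single changed error resets the count to zero even if the prompt text barely moved.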


v0.3.0 - loop detection

07 Jun 01:15

@kirder24-code kirder24-code

v0.3.0

5491781

What's new

Loop detection - catches the agent that looks productive but is circling the same failure.

The hard case in stuck-detection is the agent that keeps producing output yet is really retrying the same dead end, just reworded each time. Plain hashing misses it: the prompt is similar but never byte-identical between loops, so the hash changes every turn and nothing trips.

Runcap now closes that gap from the one place that can see it - the gateway, which observes every request the agent sends in real time.

How it works

Each request's conversation shape is compared against the recent run with a line-similarity ratio (the same line-diff primitive the v0.2.2 delta-encoder already uses).
When N prompts in a row are near-identical (default: 3 prompts at 92%+ similarity) while the conversation never moves forward, the run is flagged loop.looping.
It surfaces a warning in runcap status, attaches a loop field to every gateway event, and fires an alert - so you can step in before the loop burns more budget.
Pure Node, no model call, single-digit ms. Tune or disable with AIM_LOOP_DETECT=off.
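
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped implementation: the ratio formula (2 x LCS length / total lines), the names lineSimilarity and createLoopDetector, and comparing only against the immediately preceding prompt are all choices made for the example. It also leaves out the "conversation never moves forward" check and looks at prompt similarity alone.

```javascript
// Length of the longest common subsequence of two line arrays.
// O(n*m) time, O(m) extra space.
function lcsLength(a, b) {
  let prev = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    const curr = new Array(b.length + 1).fill(0);
    for (let j = 1; j <= b.length; j++) {
      curr[j] = a[i - 1] === b[j - 1]
        ? prev[j - 1] + 1
        : Math.max(prev[j], curr[j - 1]);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed ratio: shared lines counted on both sides, divided by total lines.
function lineSimilarity(textA, textB) {
  const a = textA.split("\n");
  const b = textB.split("\n");
  return (2 * lcsLength(a, b)) / (a.length + b.length);
}

// Hypothetical detector: counts consecutive near-identical prompts.
function createLoopDetector({ threshold = 0.92, repeatsToFlag = 3 } = {}) {
  let previous = null;
  let repeats = 0;
  return function observe(prompt) {
    const similarity = previous === null ? 0 : lineSimilarity(previous, prompt);
    repeats = similarity >= threshold ? repeats + 1 : 0;
    previous = prompt;
    return { repeats, similarity, looping: repeats >= repeatsToFlag };
  };
}
```

Feeding four prompts that differ only in one reworded line escalates repeats 0, 1, 2, 3 and flags the fourth; one clearly different prompt drops the count back to zero.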

Proven end to end (not estimated)

Four reworded "let me try X instead" prompts pushed through the live gateway:

Prompt #	repeats	similarity	looping
1	0	0%	false
2	1	97.7%	false
3	2	97.7%	false
4	3	97.7%	true

The signal escalates (0 → 1 → 2 → 3) instead of firing on a single slow step, and neither genuine progress nor one long legitimate step trips it.

How this is better than before

detectStuck in v0.2.2 was outcome-based: it scored a run after it ended (non-zero exit code, parsed errors, zero git diff). That catches obvious dead ends but is blind to an in-flight agent that is still "working" while going nowhere. Loop detection adds the missing behavioral, in-flight signal on top of it - you learn the agent is circling during the run, not in the post-mortem.

Honesty note

This is a calculated signal, not a proven dollar saving like the delta-encoder's. It tells you "the agent has sent N near-identical prompts in a row with no progress" so you can intervene. The token-savings claim from v0.2.2 (37.9% on a real call) is unchanged and remains the only proven number.

Tests

5 new tests in scripts/loop-test.mjs, wired into npm test:

reworded same-failure attempts flagged as a loop
genuine progress NOT flagged
single long step NOT flagged
threshold boundary (2 repeats under the bar, not yet a loop)
OpenAI / Anthropic request-shape normalization
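
For the last test, request-shape normalization could look something like the sketch below. It is an assumption-laden illustration, not the code under test: normalizeRequest and contentToText are hypothetical names, and only text content is handled (images, tool calls, and tool results are ignored). It relies on the publicly documented shapes: OpenAI chat requests carry the system prompt as a message, Anthropic requests carry it in a top-level system field, and both allow content as a string or an array of typed blocks.

```javascript
// Flatten message content (a string or an array of typed parts) into text.
// Only parts of type "text" are kept in this sketch.
function contentToText(content) {
  if (typeof content === "string") return content;
  if (!Array.isArray(content)) return "";
  return content
    .filter((part) => part && part.type === "text" && typeof part.text === "string")
    .map((part) => part.text)
    .join("\n");
}

// Hypothetical normalizer: returns a provider-neutral list of { role, text }.
function normalizeRequest(body) {
  const turns = [];
  // Anthropic-style requests put the system prompt in a top-level field.
  if (body.system !== undefined) {
    turns.push({ role: "system", text: contentToText(body.system) });
  }
  for (const message of body.messages ?? []) {
    turns.push({ role: message.role, text: contentToText(message.content) });
  }
  return turns;
}
```

Once both providers reduce to the same list of turns, the similarity comparison no longer depends on which API the agent is speaking.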

Install

npm install -g runcap@0.3.0

Full changelog: v0.2.2...v0.3.0


v0.2.2 - delta-encoding compression

06 Jun 01:57

@kirder24-code kirder24-code

v0.2.2

f7a334d

What's new

Delta-encoding of near-duplicate context blocks - the compression layer no other proxy has.

When an agent reads a file, edits one line, and re-reads it, the block is similar but not identical, so plain identical-dedup saves nothing. Runcap now sends a lossless line-diff against the version the model already saw, and the model reconstructs the current file from it.

Proven on a real call (not estimated)

Two identical requests through the gateway to OpenAI gpt-4o-mini, where the answer depends on the one changed line:

Compression	prompt_tokens (billed by OpenAI)	Answer
OFF (baseline)	1186	"...returns status code 401"
Delta ON	737	"...returns status code 401"

449 tokens saved = 37.9% on a single edited-file re-read. Identical answer. The model never received the full re-read, only the diff, and still answered correctly about the changed line.

How it stays safe

Lossless by construction: the compressor refuses to emit a delta unless it reconstructs the original byte-for-byte.
No false positives: unrelated blocks are left verbatim; identical re-reads still collapse to the cheaper stub.
Hot-path guard: LCS line-diff is O(n*m), so blocks over 2500 lines are skipped rather than stalling the gateway.
6 tests in scripts/delta-test.mjs, wired into npm test (including a regression test for a crash found and fixed during this work).
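
The first and third safeguards can be illustrated with a deliberately simplified sketch. Note the swap: instead of the LCS line-diff named above, this example trims the common head and tail and keeps only the changed middle, which is easier to show in a few lines. The names encodeDelta and applyDelta and the delta shape are hypothetical; only the 2500-line limit and the refuse-unless-it-round-trips rule come from the notes.

```javascript
const MAX_LINES = 2500; // size guard quoted in the release notes

// Rebuild the new text from the base text plus a delta.
function applyDelta(base, delta) {
  const a = base.split("\n");
  return [
    ...a.slice(0, delta.head),
    ...delta.middle,
    ...a.slice(a.length - delta.tail),
  ].join("\n");
}

// Encode `next` against `base` as { head, tail, middle }, or return null
// when the block is too large or the delta fails to round-trip.
function encodeDelta(base, next) {
  const a = base.split("\n");
  const b = next.split("\n");
  if (a.length > MAX_LINES || b.length > MAX_LINES) return null;

  // Count unchanged lines at the start.
  let head = 0;
  while (head < a.length && head < b.length && a[head] === b[head]) head++;

  // Count unchanged lines at the end, without overlapping the head.
  let tail = 0;
  while (
    tail < a.length - head &&
    tail < b.length - head &&
    a[a.length - 1 - tail] === b[b.length - 1 - tail]
  ) tail++;

  const delta = { head, tail, middle: b.slice(head, b.length - tail) };
  // Lossless by construction: only emit a delta that reproduces `next` exactly.
  return applyDelta(base, delta) === next ? delta : null;
}
```

For a one-line edit in a four-line file, the delta carries a single middle line plus two counts. A caller that gets null back would send the block verbatim.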

Proof and reproduction steps: docs/delta-encoding-evidence.md

Install

npm install -g runcap@0.2.2


v0.2.1 - launch-ready

05 Jun 03:55

@kirder24-code kirder24-code

v0.2.1

7911659

First public release. Runcap caps every coding-agent run before it spends, runs 100% on your machine, and is MIT-licensed.

npm install -g runcap

What's in this release

Hard cap, enforced before the call. Point your agent at a local gateway. It prices each request from its own tokens and returns a 429 the moment the projected spend would cross your ceiling, instead of forwarding the call.
Pre-call budget guard. Catches both slow accumulated overspend and a single oversized call in one window.
Cost estimate before you start. runcap plan outputs a USD cost range and a recommended cap.
Built-in token compressor. Compresses JSON, logs, and stack traces in each request before sending. Never touches your prose or code.
Rescue prompt when an agent gets stuck.
Zero-config run. runcap run auto-manages the cap gateway, no manual gateway or base-URL setup.
Guided first run. runcap with no args explains the tool and shows one next step.
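
The pre-call check in the first two items can be pictured with the sketch below. Everything here is illustrative: estimateCostUsd, checkBudget, the price fields, and the choice to price the worst case from a maximum output token count are assumptions, not Runcap's actual pricing logic. Only the behavior (project the spend first, answer 429 instead of forwarding) comes from the notes.

```javascript
// Worst-case cost of one call in USD. Prices are per one million tokens
// and are supplied by the caller; the values used in examples are made up.
function estimateCostUsd(inputTokens, maxOutputTokens, price) {
  return (
    inputTokens * price.inputPerMillion +
    maxOutputTokens * price.outputPerMillion
  ) / 1_000_000;
}

// Decide before forwarding: allow the call only if the projected total
// stays at or under the cap. A denied call gets HTTP 429 from the gateway.
function checkBudget(spentUsd, projectedCallUsd, capUsd) {
  const projectedTotalUsd = spentUsd + projectedCallUsd;
  if (projectedTotalUsd > capUsd) {
    return { allow: false, status: 429, projectedTotalUsd };
  }
  return { allow: true, projectedTotalUsd };
}
```

The same comparison covers both failure modes: many small calls push spentUsd toward the cap, while one oversized call fails on its own projectedCallUsd even when nothing has been spent yet.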

Fixes since early builds

Fixed doubled /v1 in the OpenAI upstream URL that made the gateway return 404.
Removed em-dashes from user-facing output; sub-cent spend now displays correctly.
Fixed false "High-risk" flag on small tasks.
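
As an illustration of the first fix, one way to avoid a doubled /v1 is to join the upstream base and the request path with an explicit prefix check. This is a guess at the general class of bug, not the actual patch: joinUpstreamUrl is a hypothetical helper, and it handles the path only (no query string).

```javascript
// Join an upstream base URL and a request path without repeating a
// shared prefix such as "/v1". Query strings are out of scope here.
function joinUpstreamUrl(base, requestPath) {
  const url = new URL(base);
  // "/v1/" -> "/v1", "/" -> ""
  const basePath = url.pathname.replace(/\/+$/, "");
  const path = requestPath.startsWith("/") ? requestPath : `/${requestPath}`;
  const alreadyPrefixed =
    basePath !== "" && (path === basePath || path.startsWith(`${basePath}/`));
  url.pathname = alreadyPrefixed ? path : basePath + path;
  return url.toString();
}
```

Whether the base is configured with or without the /v1 suffix, and whether the agent's path includes it or not, the result is the same single-prefixed URL.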

Pure Node, ESM, zero runtime dependencies. Nothing leaves your laptop.
