Releases: kirder24-code/ai-agent-manager

v0.3.1 — loop detection gated on response signature

10 Jun 00:33
@kirder24-code kirder24-code


Loop detection now distinguishes circling from convergence. A run is only flagged as looping when prompts are near-identical AND the upstream response did not move (same error/output). When the error changes turn to turn, that's progress and the run is left alone.

Also fixes two bugs caught by a new HTTP e2e test through the live gateway: the response signature was recorded after the client reply (lost to a concurrent next turn), and startEphemeralGateway was not exported.
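
The gating rule described above can be sketched roughly as follows. This is a minimal illustration, not runcap's actual code: `createGatedDetector`, `promptSimilarity`, and `responseSignature` are hypothetical names, and the defaults (92% similarity, 3 repeats) are borrowed from the v0.3.0 notes.

```javascript
// Hypothetical sketch: names and defaults are assumptions, not runcap's API.
// A turn counts as a repeat only when the prompt is near-identical to the
// previous one AND the upstream response signature did not change.
function createGatedDetector({ threshold = 0.92, repeatsToFlag = 3 } = {}) {
  let lastSignature = null;
  let repeats = 0;
  return function observe({ promptSimilarity, responseSignature }) {
    const promptRepeats = promptSimilarity >= threshold;
    const responseStuck =
      lastSignature !== null && responseSignature === lastSignature;
    // A changed error/output means progress, so the counter resets.
    repeats = promptRepeats && responseStuck ? repeats + 1 : 0;
    lastSignature = responseSignature;
    return { repeats, looping: repeats >= repeatsToFlag };
  };
}

const sameError = createGatedDetector();
const changingError = createGatedDetector();
let a, b;
for (let turn = 1; turn <= 4; turn++) {
  a = sameError({ promptSimilarity: 0.97, responseSignature: "err:ENOENT" });
  b = changingError({ promptSimilarity: 0.97, responseSignature: `err:${turn}` });
}
console.log(a.looping, b.looping); // true false
```

Both runs send equally similar prompts; only the one whose response signature stays the same is flagged.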


v0.3.0 - loop detection

07 Jun 01:15
@kirder24-code kirder24-code


What's new

Loop detection - catches the agent that looks productive but is circling the same failure.

The hard case in stuck-detection is the agent that keeps producing output yet is really retrying the same dead end, just reworded each time. Plain hashing misses it: the prompt is similar but never byte-identical between loops, so the hash changes every turn and nothing trips.

Runcap now closes that gap from the one place that can see it - the gateway, which observes every request the agent sends in real time.

How it works

  • Each request's conversation shape is compared against the recent run with a line-similarity ratio (the same line-diff primitive the v0.2.2 delta-encoder already uses).
  • When N prompts in a row are near-identical to their predecessor (default: 3 consecutive repeats at 92%+ similarity, so the fourth near-identical prompt trips it) while the conversation never moves forward, the run is flagged loop.looping.
  • It surfaces a warning in runcap status, attaches a loop field to every gateway event, and fires an alert - so you can step in before the loop burns more budget.
  • Pure Node, no model call, single-digit ms. Tune or disable with AIM_LOOP_DETECT=off.
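
The steps above can be sketched in plain Node as follows. This is an illustration under stated assumptions, not runcap's implementation: the function names, the exact similarity formula (shared lines via LCS, divided by the average line count), and the option names are all hypothetical.

```javascript
// Hypothetical sketch: names and the similarity formula are assumptions.

// Length of the longest common subsequence of two line arrays (classic DP).
function lcsLength(a, b) {
  let prev = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    const curr = [0];
    for (let j = 1; j <= b.length; j++) {
      curr[j] = a[i - 1] === b[j - 1]
        ? prev[j - 1] + 1
        : Math.max(prev[j], curr[j - 1]);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: lines in common relative to the total line count.
function lineSimilarity(textA, textB) {
  const a = textA.split("\n");
  const b = textB.split("\n");
  return (2 * lcsLength(a, b)) / (a.length + b.length);
}

// Counts consecutive near-identical prompts; flags a loop at the threshold.
function createLoopDetector({ threshold = 0.92, repeatsToFlag = 3 } = {}) {
  let previous = null;
  let repeats = 0;
  return function observe(prompt) {
    const similarity = previous === null ? 0 : lineSimilarity(previous, prompt);
    repeats = similarity >= threshold ? repeats + 1 : 0;
    previous = prompt;
    return { repeats, similarity, looping: repeats >= repeatsToFlag };
  };
}

// Four reworded attempts over the same 40 lines of context.
const observe = createLoopDetector();
const base = Array.from({ length: 40 }, (_, i) => `context line ${i}`).join("\n");
for (const attempt of ["A", "B", "C", "D"]) {
  const { repeats, looping } = observe(`${base}\nlet me try ${attempt} instead`);
  console.log(repeats, looping); // looping becomes true on the fourth prompt
}
```

Because the counter resets whenever similarity drops below the threshold, a single long step or a genuinely different prompt never accumulates toward the flag.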

Proven end to end (not estimated)

Four reworded "let me try X instead" prompts pushed through the live gateway:

| Prompt # | repeats | similarity | looping |
|----------|---------|------------|---------|
| 1        | 0       | 0%         | false   |
| 2        | 1       | 97.7%      | false   |
| 3        | 2       | 97.7%      | false   |
| 4        | 3       | 97.7%      | true    |

The signal escalates (0 → 1 → 2 → 3) instead of firing on a single slow step, and genuine progress or one long legit step never trips it.

How this is better than before

detectStuck in v0.2.2 was outcome-based: it scored a run after it ended (non-zero exit code, parsed errors, zero git diff). That catches obvious dead ends but is blind to an in-flight agent that is still "working" while going nowhere. Loop detection adds the missing behavioral, in-flight signal on top of it - you learn the agent is circling during the run, not in the post-mortem.

Honesty note

This is a calculated signal, not a proven dollar-saving like the delta-encoder. It tells you "the agent has sent N near-identical prompts in a row with no progress" so you can intervene. The token savings claim from v0.2.2 (37.9% on a real call) is unchanged and still the proven number.

Tests

5 new tests in scripts/loop-test.mjs, wired into npm test:

  • reworded same-failure attempts flagged as a loop
  • genuine progress NOT flagged
  • single long step NOT flagged
  • threshold boundary (2 repeats under the bar, not yet a loop)
  • OpenAI / Anthropic request-shape normalization

Install

npm install -g runcap@0.3.0

Full changelog: v0.2.2...v0.3.0


v0.2.2 - delta-encoding compression

06 Jun 01:57
@kirder24-code kirder24-code


What's new

Delta-encoding of near-duplicate context blocks - the compression layer no other proxy has.

When an agent reads a file, edits one line, and re-reads it, the block is similar but not identical, so plain identical-dedup saves nothing. Runcap now sends a lossless line-diff against the version the model already saw, and the model reconstructs the current file from it.

Proven on a real call (not estimated)

Two identical requests through the gateway to OpenAI gpt-4o-mini, where the answer depends on the one changed line:

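| Compression    | prompt_tokens (billed by OpenAI) | Answer                        |
|----------------|----------------------------------|-------------------------------|
| OFF (baseline) | 1186                             | "...returns status code 401"  |
| Delta ON       | 737                              | "...returns status code 401"  |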

449 tokens saved = 37.9% on a single edited-file re-read. Identical answer. The model never received the full re-read, only the diff, and still answered correctly about the changed line.
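
The percentage follows directly from the two billed prompt_tokens counts reported above. A quick check (the variable names are illustrative only):

```javascript
// Recompute the saving from the two prompt_tokens figures in the table.
const baseline = 1186;
const withDelta = 737;
const saved = baseline - withDelta;       // tokens not sent
const percent = (saved / baseline) * 100; // share of the baseline prompt
console.log(saved, percent.toFixed(1));   // 449 37.9
```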

How it stays safe

  • Lossless by construction: the compressor refuses to emit a delta unless it reconstructs the original byte-for-byte.
  • No false positives: unrelated blocks are left verbatim; identical re-reads still collapse to the cheaper stub.
  • Hot-path guard: LCS line-diff is O(n*m), so blocks over 2500 lines are skipped rather than stalling the gateway.
  • 6 tests in scripts/delta-test.mjs, wired into npm test (including a regression test for a crash found and fixed during this work).
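
The safety properties above can be sketched as a small encoder that refuses to emit anything it cannot round-trip. This is a hedged illustration, not runcap's compressor: the op format (`keep` / `skip` / `add`), the function names, and returning `null` to mean "send the block verbatim" are all assumptions. Only the 2500-line guard and the LCS line-diff come from the notes.

```javascript
// Hypothetical sketch: op format and names are assumptions, not runcap's wire format.
const MAX_LINES = 2500; // hot-path guard: the LCS table is O(n*m)

// Build a line-level edit script that turns oldLines into newLines.
function diffLines(oldLines, newLines) {
  const n = oldLines.length, m = newLines.length;
  // lcs[i][j] = LCS length of oldLines[i..] and newLines[j..]
  const lcs = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = oldLines[i] === newLines[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const ops = [];
  const push = (type, value) => {
    const last = ops[ops.length - 1];
    if (type !== "add" && last && last[0] === type) last[1] += 1; // run-length
    else ops.push([type, value]);
  };
  let i = 0, j = 0;
  while (i < n || j < m) {
    if (i < n && j < m && oldLines[i] === newLines[j]) { push("keep", 1); i++; j++; }
    else if (j < m && (i === n || lcs[i][j + 1] >= lcs[i + 1][j])) { push("add", newLines[j]); j++; }
    else { push("skip", 1); i++; }
  }
  return ops;
}

// Rebuild the new lines from the old lines plus the edit script.
function applyDelta(oldLines, ops) {
  const out = [];
  let i = 0;
  for (const [type, value] of ops) {
    if (type === "keep") { out.push(...oldLines.slice(i, i + value)); i += value; }
    else if (type === "skip") i += value;
    else out.push(value);
  }
  return out;
}

// Returns the edit script, or null when the block should be sent verbatim.
function encodeDelta(oldText, newText) {
  const oldLines = oldText.split("\n");
  const newLines = newText.split("\n");
  if (oldLines.length > MAX_LINES || newLines.length > MAX_LINES) return null;
  const ops = diffLines(oldLines, newLines);
  // Lossless check: only emit the delta if it reconstructs newText exactly.
  return applyDelta(oldLines, ops).join("\n") === newText ? ops : null;
}

const before = "function login() {\n  return 403;\n}";
const after = "function login() {\n  return 401;\n}";
console.log(JSON.stringify(encodeDelta(before, after)));
```

Verifying the round trip before emitting trades a little CPU for the guarantee that a bug in the differ degrades to "no compression" rather than to a corrupted prompt.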

Proof and reproduction steps: docs/delta-encoding-evidence.md

Install

npm install -g runcap@0.2.2


v0.2.1 - launch-ready

05 Jun 03:55
@kirder24-code kirder24-code


First public release. Runcap caps every coding-agent run before it spends, runs 100% on your machine, and is MIT-licensed.

npm install -g runcap

What's in this release

  • Hard cap, enforced before the call. Point your agent at a local gateway. It prices each request from its own tokens and returns a 429 the moment the projected spend would cross your ceiling, instead of forwarding the call.
  • Pre-call budget guard. Catches both slow accumulated overspend and a single oversized call in one window.
  • Cost estimate before you start. runcap plan outputs a USD cost range and a recommended cap.
  • Built-in token compressor. Compresses JSON, logs, and stack traces in each request before sending. Never touches your prose or code.
  • Rescue prompt when an agent gets stuck.
  • Zero-config run. runcap run auto-manages the cap gateway, no manual gateway or base-URL setup.
  • Guided first run. runcap with no args explains the tool and shows one next step.
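
The pre-call cap described in the first two bullets can be sketched as a guard that prices a request before it is forwarded. This is a minimal illustration under assumptions: `createBudgetGuard`, its option names, and the per-token prices are hypothetical, and a real gateway would derive token counts from the request body.

```javascript
// Hypothetical sketch: names and prices are assumptions, not runcap's API.
function createBudgetGuard({ capUsd, usdPerInputToken, usdPerOutputToken }) {
  let spentUsd = 0;
  return {
    // Decide BEFORE forwarding: would this call push spend over the cap?
    check({ inputTokens, maxOutputTokens }) {
      const projectedUsd =
        inputTokens * usdPerInputToken + maxOutputTokens * usdPerOutputToken;
      const allow = spentUsd + projectedUsd <= capUsd;
      // A denied call is answered locally with 429 and never reaches upstream.
      return allow ? { allow, projectedUsd } : { allow, status: 429, projectedUsd };
    },
    // Record the actual cost once the upstream call has returned.
    record(actualUsd) {
      spentUsd += actualUsd;
    },
  };
}

const guard = createBudgetGuard({
  capUsd: 0.01,
  usdPerInputToken: 0.000001,  // illustrative price only
  usdPerOutputToken: 0.000002, // illustrative price only
});
const request = { inputTokens: 1000, maxOutputTokens: 500 };
console.log(guard.check(request).allow); // true: well under the cap
guard.record(0.009);                     // earlier calls have used most of the budget
console.log(guard.check(request).status); // 429
```

The same check covers both cases from the list: many small calls that accumulate toward the cap, and a single call whose projected cost alone exceeds it.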

Fixes since early builds

  • Fixed doubled /v1 in the OpenAI upstream URL that made the gateway return 404.
  • Removed em-dashes from user-facing output and fixed the display of sub-cent spend.
  • Fixed false "High-risk" flag on small tasks.

Pure Node, ESM, zero runtime dependencies. Nothing leaves your laptop.
