Releases: mbachaud/helix-context

v0.7.1

09 Jun 23:22
@mbachaud mbachaud
23c65f5
This commit was created on GitHub.com and signed with GitHub’s verified signature.
GPG key ID: B5690EEEBB952194
Verified

Patch release: the v0.7.0 headless tray failed to bind the dashboard port under a truly detached pythonw start, where sys.stdout and sys.stderr are None and that kills uvicorn's logging before the bind. _ensure_streams() now routes stdio into --log-file or devnull (#199). Caught and verified during a live v0.7.0 install. Full details in CHANGELOG.md.
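
The failure mode can be reproduced in isolation: under pythonw both standard streams are None, so anything that writes to them during logging setup fails. A minimal sketch of such a guard follows. The helper name `ensure_streams` and its return value are assumptions for illustration, and the shipped `_ensure_streams()` may differ.

```python
import os
import sys


def ensure_streams(log_file=None):
    """Replace missing stdio streams with a log file or the null device.

    Hypothetical stand-in for the release's _ensure_streams(); returns the
    names of the streams it had to replace.
    """
    target = log_file or os.devnull
    replaced = []
    for name in ("stdout", "stderr"):
        if getattr(sys, name) is None:
            # Line-buffered append so log lines land on disk promptly.
            setattr(sys, name, open(target, "a", buffering=1, encoding="utf-8"))
            replaced.append(name)
    return replaced
```

Calling this before the server starts means later writes to `sys.stdout` or `sys.stderr` have a real file object to land on.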

v0.7.0

09 Jun 21:16
@mbachaud mbachaud
e79d16d

Dashboard + UX release. The tray-hosted web dashboard graduates from a read-only status page to the primary control surface, and the launcher grows a dev/configuration dual-port mode.

Highlights:

  • Dev mode dual ports (#197): [server] bench_enabled/bench_port/bench_genome_path runs a second supervised helix on :11439 pinned to the bench genome - chat stays on main while a subagent drives the bench-harness. Default off; --bench / HELIX_BENCH_ENABLED=1; final deployments get exactly one server.
  • Startup UX (#197): loading spinner while encoders warm up; first-boot select-or-create database dialog when no genome exists; start-helix-tray.bat goes fully headless (pythonw + --log-file).
  • Dashboard wiring sweep (#195): genome select/create from the web UI; Observability panel with live sidecar service dots + direct links to all six Grafana dashboards and Prometheus.
  • Fresh-checkout boot fix (#195): KnowledgeStore mkdirs the genome parent chain (clean clones no longer die with 'unable to open database file').
  • Grafana sidecar hardening (#195): 127.0.0.1-pinned, anonymous local access (links land on dashboards, not a login wall), phone-home disabled; 'Helix - Overview' is now a real entry-point dashboard.
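
The fresh-checkout fix comes down to creating the parent directories before SQLite opens the file. A minimal sketch, assuming a hypothetical `open_genome` helper (the real KnowledgeStore constructor is not shown in these notes):

```python
import sqlite3
from pathlib import Path


def open_genome(db_path):
    """Open (or create) a SQLite genome file, creating parent directories first."""
    path = Path(db_path)
    # sqlite3.connect() creates the file but not missing directories, which
    # is what produces "unable to open database file" on a clean clone.
    path.parent.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(path)
```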

Full details in CHANGELOG.md.

v0.6.5

09 Jun 20:09
@mbachaud mbachaud
b3d0c5e

Eleven PRs landed on top of 0.6.4: the open-PR merge train (#182-#190), the full-suite QA de-flake (#191), and the #165 fingerprint-index audit fixes (#192 MAX_PATH guard, #193 path_key_index Option-B compaction — ~21% corpus storage back, score-invariant).

Highlights:

  • path_key_index: WITHOUT ROWID for new DBs, dead idx_pki_lookup dropped, POST /admin/compact-pki converts existing DBs (then /admin/vacuum). Confirmed on a real v2 Onyx shard: 3.92 GB -> 2.99 GB, 25/25 queries score-identical.
  • Fixture builder: file-level resume + SIGINT pause-then-resume (#183), auto-subshard of oversized roots (#186), BenchServer import-identity guard (#184).
  • Retrieval/bench: SPLADE size-aware auto-toggle knobs (#189), dense_additive_weight sweep harness (#188), GENE src source-type prefix fix (#185).
  • Hardware: GB10/Grace+Blackwell launch-blocking shim, default-off (#190, @addiplus). Docs: <=12 GB VRAM dense-ingest runbook (#182).
  • Test suite de-flaked on Windows dev rigs: GPU-livelock parity test, 200K-write metrics test, README-v3 doc contracts, WSL bash probe (#191).

Full details in CHANGELOG.md.

Contributors

addiplus

v0.6.4

09 Jun 07:27
@mbachaud mbachaud
27fe5fc

Master-bound sibling of the v0.6.3 Onyx external-validation snapshot. v0.6.3 stays a tag-only artifact (sibling branch perf/dense-prefilter-via-splade-candidates, not on PyPI); 0.6.4 ships the master-bound subset to the wheel.

Highlights

  • #177 perf(dense): bound CUDA VRAM during batch ingest via periodic empty_cache. HELIX_DENSE_VRAM_RELEASE_EVERY=256 default; holds dense ingest at ~6 GB plateau on a 12 GB 3080 Ti (was climbing to 11.7 GB and spilling to shared-mem).
  • #180 fix(config): wire semantic_dense_additive_weight + semantic_broaden_routing — env-gated by HELIX_SEMANTIC_ARM=1 + query_type == "semantic", default-off so stock retrieval is byte-identical to v0.6.2. Lets downstream consumers running the v0.6.3 fixed-pipeline TOML load on a master-derived build without their config keys silently vanishing.
  • #179 feat(launcher): Manage Database tray + Switchboard + Pipeline viewer. Tray submenu discovers genome .db files, switches via HELIX_GENOME_PATH + restart, with confirmation via Win32 MessageBoxW (avoids tkinter-on-pystray deadlock). Dashboard adds an 11-knob Switchboard panel and a per-request Pipeline viewer (HELIX_PIPELINE_RING=0 to disable). New GET /debug/pipeline/recent + GET /admin/genome.
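
The periodic-release idea in #177 can be sketched without a GPU by injecting the release callback. In this sketch the counter is per batch and a value of 0 disables the release; both are assumptions, as is the `ingest_batches` name. In a CUDA build the callback would typically be `torch.cuda.empty_cache`.

```python
import os


def ingest_batches(batches, encode, release_cache, release_every=None):
    """Encode batches, calling release_cache() every `release_every` batches.

    `encode` and `release_cache` are injected so the sketch runs without a
    GPU; in a CUDA build release_cache would typically be
    torch.cuda.empty_cache.
    """
    if release_every is None:
        release_every = int(os.environ.get("HELIX_DENSE_VRAM_RELEASE_EVERY", "256"))
    results = []
    for i, batch in enumerate(batches, start=1):
        results.append(encode(batch))
        # A value of 0 or less disables the periodic release (assumed semantics).
        if release_every > 0 and i % release_every == 0:
            release_cache()
    return results
```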

Full per-PR notes in CHANGELOG.md.

v0.6.2 — dynamic RAM scaling

30 May 09:40
@mbachaud mbachaud
3173c47

RAM-aware SQLite budget — makes the v0.6.1 memory posture host-aware instead of unconditionally conservative.

v0.6.1 hard-coded mmap_size=0 + 2/4 MB page caches on every host (a 100-shard fan-out guard); the BGE-M3 model singleton was the actual 120 GB → 7 GB fix, so that I/O posture only over-throttled RAM-rich hosts. v0.6.2 scales per-shard mmap_size/cache_size from available RAM: budget = (available − 25% reserve) / n_shards.

  • HELIX_MEM_PROFILE: auto (default) · aggressive · conservative (byte-identical to v0.6.1 — escape hatch) · <N>gb.
  • Budget is a fraction of free RAM → can never OOM; self-throttles at high shard count / low RAM.
  • Largest win on RAM-rich hosts (128 GB unified → near-whole 46 GB corpus mmap-resident).
  • 212 unit tests green. End-to-end 100-shard RAM/latency delta validated separately on the bench fixture.
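
As a worked example of the budget formula, the sketch below assumes the 25% reserve is taken from available RAM. It does not cover how the per-shard figure is split between mmap_size and cache_size, which these notes do not specify. The function name is hypothetical.

```python
GIB = 1024 ** 3


def per_shard_budget(available_bytes, n_shards, reserve_fraction=0.25):
    """budget = (available - reserve) / n_shards, in bytes."""
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    # Assumption: the reserve is a fraction of currently available RAM.
    usable = available_bytes * (1.0 - reserve_fraction)
    return int(usable // n_shards)
```

Because the budget is derived from free RAM and divided by the shard count, it shrinks automatically on small hosts or large fan-outs. With 64 GiB free and 100 shards, each shard gets roughly 0.48 GiB.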

See CHANGELOG.md and docs/prds/2026-05-30-dynamic-ram-scaling.md.

v0.6.1 — parallel fan-out + daemon RAM collapse

30 May 08:39
@mbachaud mbachaud
b7af5b0

Performance release. Verified end-to-end on the EnterpriseRAG-Bench v2 fixture (850K genes / 100 shards, HELIX_SHARD_WORKERS=8): resident RAM ~120 GB → 7.1 GB peak, per-query 125s → 57.6s median, 0 daemon deaths, ranked output byte-identical to the serial path.

Highlights

  • perf(memory) #173 — share ONE BGE-M3 model process-wide (get_shared_codec) instead of ~100 per-shard copies; the dominant 100-shard RAM driver. Default on; HELIX_SHARE_DENSE_CODEC=0 reverts. Plus opt-in fp16 dense matrix (HELIX_DENSE_MATRIX_DTYPE=float16), SQLite cache_size caps, explicit mmap_size=0.
  • perf(retrieval) #172 — concurrent shard fan-out via ThreadPoolExecutor (BLAS-pinned), gated by HELIX_SHARD_WORKERS (default 1 = serial). Byte-identical ranked output. Plus a WAL-checkpoint fix on shard close.
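
One way to get parallel fan-out with output identical to the serial path is to collect results in submission order and merge with a fully deterministic sort key. The sketch below illustrates that idea only. `query_shards`, the `(score, gene_id)` hit shape, and the tie-break rule are assumptions, and BLAS thread pinning is not shown.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def query_shards(shards, query_one, workers=None):
    """Query every shard and merge hits in a deterministic order.

    `shards` is a list of shard handles and `query_one(shard)` returns a list
    of (score, gene_id) tuples; both are placeholders for this sketch.
    """
    if workers is None:
        workers = int(os.environ.get("HELIX_SHARD_WORKERS", "1"))
    if workers <= 1:
        per_shard = [query_one(s) for s in shards]
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() yields results in submission order, not completion order,
            # so the merge below sees the same input as the serial path.
            per_shard = list(pool.map(query_one, shards))
    merged = [hit for hits in per_shard for hit in hits]
    # Sort by score descending, then gene_id, so ties break identically.
    merged.sort(key=lambda hit: (-hit[0], hit[1]))
    return merged
```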

Operator notes

  • Both default-safe — 0.6.0 behavior is unchanged until you set HELIX_SHARD_WORKERS>1.
  • Enable the two together: A1 (model singleton) is the prerequisite that lets the fan-out reach full speedup on a memory-bound rig (without it, per-shard model duplication thrashes the pagefile and caps the win at ~1.5x).
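
The model-singleton prerequisite can be pictured as a lazily built, lock-guarded module global. This is a sketch of the pattern only. The `get_shared_codec` name comes from the notes above, but the `factory` parameter and everything else here are assumptions made so the example runs without model weights.

```python
import threading

_codec = None
_codec_lock = threading.Lock()


def get_shared_codec(factory):
    """Return one process-wide codec, building it on first use.

    `factory` is a zero-argument callable that loads the model; it is a
    parameter here only so the sketch runs without model weights.
    """
    global _codec
    if _codec is None:
        with _codec_lock:
            # Re-check inside the lock so two threads cannot both load.
            if _codec is None:
                _codec = factory()
    return _codec
```

Every shard worker that calls this gets the same object, so memory for the model is paid once rather than once per shard.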

Full notes in CHANGELOG.md.

v0.6.0

28 May 21:12
@mbachaud mbachaud

0.6.0 - 2026-05-28

Substantial release covering corpus-scale retrieval, bench rebuilding,
storage audits, and a stack of stability + portability fixes. Headline
work: EnterpriseRAG-Bench (Layer 3) shipped with 100q variant-A
results (recall@10 = 28% on the 850K-gene v2 fixture); two scaling-wall
bugs (regex ReDoS at ingest, SQL-variable cap at retrieval) fixed and
cross-validated on x86 + ARM64 hardware; ~400 lines of new bench docs.

  • fix(tagger): eliminate catastrophic backtracking in _KV_PAIR_PATTERN
    (PR #155 / PR #162).
    Pre-fix, (\w+(?:_\w+)*) had redundant
    nested-quantifier ambiguity that triggered catastrophic backtracking on
    underscore-heavy content. A single worker spinning on
    tagger.py:439's _KV_PAIR_PATTERN.finditer(content[:5000]) hung the
    EnterpriseRAG-Bench-Onyx-full corpus build for 60+ minutes on a single
    google-drive shared-drives file (underscore-rich JSON keys like
    expected_doc_ids, data_source_id). The fix (\w+) is functionally
    identical (same match set, since \w includes _) but has no nested
    quantifier. Verified on the 3 worst-offender files from the bench corpus:
    0.40-0.52 ms each, down from >60 min hung. 200-underscore stress test:
    0.02 ms. Cross-validated under two independent py-spy investigations
    on different hardware classes (x86 Ryzen + RTX 3080 Ti on 2026-05-19,
    ARM64 Grace + GB10 on 2026-05-27): same line, same root cause, same fix.
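
The equivalence claim can be checked on short inputs. In the sketch below the `\s*=` suffix is a hypothetical stand-in for the rest of `_KV_PAIR_PATTERN`, which is not quoted in these notes. The samples are deliberately short so the nested form stays fast.

```python
import re

# Key sub-patterns quoted in the release note. The "\s*=" suffix is a
# hypothetical stand-in for the rest of _KV_PAIR_PATTERN, which is not shown.
NESTED = re.compile(r"(\w+(?:_\w+)*)\s*=")
FLAT = re.compile(r"(\w+)\s*=")

samples = [
    "expected_doc_ids = [1, 2]",
    "data_source_id=42 and name = x",
    "no pairs here",
]

for text in samples:
    # \w already matches "_", so both patterns capture the same keys.
    assert NESTED.findall(text) == FLAT.findall(text)
```

The nested form offers the engine many ways to split an underscore-heavy key between `\w+` and `(?:_\w+)*`, and those splits are what it explores when the overall match fails. The flat form has only one way to consume the key.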

  • perf(ddl): skip FTS5 orphan cleanup when delta is < 5% of gene count
    (PR #156 / PR #162).
    The previous cleanup ran
    DELETE FROM genes_fts WHERE gene_id NOT IN (SELECT gene_id FROM genes)
    — an O(N·M) correlated subquery that hung the daemon's first-query
    response for hours on the 850K-gene / 105-shard EnterpriseRAG-Bench
    fixture. On a single 18K-gene shard with ~40 orphans (0.2% noise) the
    NOT IN scan pegged a core for 5-10 minutes against a cold OS-cache
    page set. Orphan FTS5 entries are harmless at query time (downstream
    gene_id joins return NULL and filter out before delivery), so
    skipping cleanup for trivial deltas costs nothing in retrieval quality
    and unblocks first-query latency entirely. For the rare significant-drift
    case (delta ≥ 5%), the rewritten query uses indexed NOT EXISTS instead
    of NOT IN, turning O(N^2) into O(N log N). Daemon /health response
    on the 850K-gene fixture went from "hangs forever" to milliseconds.
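
A minimal sketch of the skip-then-NOT-EXISTS logic follows. It assumes the delta is measured as a row-count difference between the two tables, and in the check below a plain table stands in for the FTS5 virtual table. The function name and threshold parameter are hypothetical.

```python
import sqlite3


def cleanup_fts_orphans(conn, threshold=0.05):
    """Delete orphaned genes_fts rows only when drift reaches the threshold.

    Returns the number of rows deleted (0 when cleanup is skipped).
    """
    n_genes = conn.execute("SELECT COUNT(*) FROM genes").fetchone()[0]
    n_fts = conn.execute("SELECT COUNT(*) FROM genes_fts").fetchone()[0]
    # Assumption: drift is the difference in row counts.
    delta = abs(n_fts - n_genes)
    if n_genes and delta < threshold * n_genes:
        return 0  # trivial drift: orphans are filtered out downstream
    cur = conn.execute(
        "DELETE FROM genes_fts WHERE NOT EXISTS "
        "(SELECT 1 FROM genes WHERE genes.gene_id = genes_fts.gene_id)"
    )
    conn.commit()
    return cur.rowcount
```

The NOT EXISTS form lets SQLite probe an index on `genes.gene_id` once per FTS row, provided such an index (for example a primary key) exists.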

  • feat(build): salvage already-complete shards on rebuild (PR #157 /
    PR #162).
    Adds _try_salvage_complete_shard() which opens an existing
    shard .db read-only, verifies the genes table has 100% dense
    backfill coverage and no live WAL sidecar, and returns the same
    result-dict shape that _build_one_shard would normally produce —
    letting the parent's _commit_shard_result re-register the shard via
    INSERT OR REPLACE. Designed for the kill+restart cycle: if
    build_fixture_matrix.py is interrupted (Ctrl+C, OOM, planned restart),
    fully-complete shards on disk are re-registered in seconds instead of
    rebuilt from scratch. Verified at scale: 21 of 22 already-complete
    shards re-registered into a fresh main.genome.db in 2 min 19 sec
    (vs ~13 hours to rebuild from scratch) during a mid-build restart of
    the EnterpriseRAG-Bench-Onyx-full 850K-gene build.

  • fix(knowledge_store): batch IN-clause queries to stay under SQLite cap
    (PR #163).
    SQLite caps WHERE col IN (?, ?, ...) placeholders at
    SQLITE_LIMIT_VARIABLE_NUMBER (999 legacy, 2000 on the Python 3.12 /
    SQLite 3.50 builds we ship to, 32766 on newer compile defaults). Four
    call sites on the gene_scores fan-out path in query_docs
    (_apply_authority_boosts, sema-boost embedding lookup, party-attribution
    lookup, access-rate epigenetics lookup) build the IN clause from a
    caller-determined candidate set that can exceed the cap in production.
    Observed in the 2026-05-28 v2-fixture 100q bench: 3 of 29 queries had a
    per-shard query raise OperationalError: too many SQL variables, which
    the daemon's per-shard try/except swallowed as "shard X query failed;
    skipping" — biasing recall@K by silently dropping shards. Variants where
    SPLADE or the prefilter narrows gene_scores don't hit this; only the
    no-filter SPLADE-off path produces sets large enough to blow up. Adds
    _iter_in_batches(items, batch_size=500) helper and refactors the four
    hot sites. Includes TDD'd regression test at
    tests/test_knowledge_store_batched_in.py that probes the runtime cap
    via conn.getlimit(SQLITE_LIMIT_VARIABLE_NUMBER) and exercises at
    cap, cap + 1, and 4*cap + 7 boundaries.

  • fix(mcp): registry handshake is best-effort, don't kill subprocess on
    failure (PR #169).
    On Windows, claude -p MCP attempts were failing
    with "Connection closed" after ~2 s even when helix was alive on
    http://127.0.0.1:11437. Root cause: _register_with_registry() was
    called synchronously before mcp.run() entered the stdio handshake; an
    exception from register_participant() (auto-heartbeat thread init,
    etc.) propagated out of main() and killed the MCP subprocess before
    the host could complete its handshake. The registry is not load-bearing
    for tool calls — tool calls proxy directly to the helix HTTP API.
    Registry is only used by helix_announce + dashboards. This patch
    wraps _register_with_registry() in a try/except inside main():
    happy path unchanged, failure path logs the exception and continues
    to mcp.run() rather than exiting. Closes #167.
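
The shape of the fix is a plain best-effort wrapper. In this sketch `register` and `run` are stand-ins for `_register_with_registry()` and `mcp.run()`, passed as parameters so the example is self-contained. The logger name is an assumption.

```python
import logging

log = logging.getLogger("helix.mcp")


def main(register, run):
    """Start the MCP server even if registry registration fails.

    `register` and `run` stand in for _register_with_registry() and mcp.run().
    """
    try:
        register()
    except Exception:
        # The registry only feeds announce/dashboards, so log and carry on.
        log.exception("registry handshake failed; continuing without it")
    return run()
```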

  • feat(bench): add --isolated flag to bench_claude_matrix for
    leak-free measurement (PR #170).
    When set, the claude -p sub-agent
    is launched with --tools "" (all built-in tools disabled),
    --strict-mcp-config, and --mcp-config '{"mcpServers":{}}' (no MCP
    servers). Pair with a sterile --cwd (e.g. F:/tmp/bench_sandbox) to
    also block CLAUDE.md auto-discovery. Isolates retrieval-driven answer
    quality from filesystem-tool access. Records isolated + claude_cwd
    in the per-run JSON so post-hoc analysis can distinguish leak-free runs
    from contaminated runs. Brings shipped code into agreement with
    shipped docs (docs/benchmarks/BENCHMARKS.md §"Layer 3 —
    EnterpriseRAG-Bench" and BENCHMARK_RATIONALE.md addendum already
    described this isolation mode). Closes #168.

  • docs(benchmarks): add Layer 3 (EnterpriseRAG-Bench) + EnterpriseRAG
    fixtures (PR #166).
    ~400 lines across four files. BENCHMARKS.md
    gets a new "Layer 3 — EnterpriseRAG-Bench" section covering the
    2026-05-20→21 bench investigation rebuild (isolated=True mode,
    +32.4 pp helix lift, 65% hallucination reduction), cross-corpus results
    (60% recall@10 @ 10K → 71% @ 50K → 28% @ 850K), the expression-budget
    clamp fix (4%→43% correctness), Wall-1 / Wall-2 scaling-wall framing,
    the v2 100q variant-A result table, and cross-host validation of the
    tagger fix. GENOME_FIXTURE_MATRIX.md gets a new EnterpriseRAG-Bench
    fixtures section (5-row fixture table, shared 9-source-root scope,
    excluded-from-ingest list, auto-subsharding behavior, path-portability
    gotcha, branch/PR routing). BENCHMARK_RATIONALE.md gets an addendum
    on how Layer 3 answered the rationale's NIAH-doesn't-fit problems.
    MULTI_VALID_GOLD.md gets an EnterpriseRAG-Bench gold-path matching
    section (schema diff, _rel_after_sources normalization, prefix-tolerant
    match fix).

v0.5.0 — Retrieval Stack Upgrade + README v2

09 May 00:02
@mbachaud mbachaud

What's new in v0.5.0

All new retrieval features are dark-shipped — feature-flagged off by default. Enable them individually as you need them.

Retrieval improvements

  • BM25 pre-filter (tier-0): restricts the scoring corpus to FTS5 top-200 before the 9-tier scorer runs. […]× cheaper SEMA cosine scan, better noise rejection. Enable: bm25_prefilter_enabled = true in helix.toml

  • Sub-query decomposition: broad queries (multi_hop/default) are decomposed into 3 point-fact sub-queries, run in parallel, merged with cross-query hit weighting. Converts 12-gene diluted BROAD results into targeted TIGHT/FOCUSED results. Enable: query_decomposition_enabled = true

  • D8 complete — intent taxonomy + entity graph:

    • IntentClass enum on PromoterTags with heuristic _classify_intent() at ingest
    • intent_router.py for LLM-free template decomposition
    • Entity graph wired as Tier 5b (+0.5 score boost per matched entity node)
    • SR gate-benched at N=50, confirmed live
    • Enable entity graph: entity_graph_retrieval_enabled = true
  • BGE-M3 dense vectors + ANN threshold: bgem3_codec.py with asymmetric query/passage encoding and Matryoshka 256D truncation; query_genes_ann() replaces TIGHT/FOCUSED/BROAD step function with cosine similarity threshold gate. Enable: dense_embedding_enabled = true (run scripts/backfill_bgem3.py first)

Infrastructure

  • helix_context/_asgi.py entry point — server.py is now importable without opening a database connection (fixes pytest collection failures in worktrees and fresh clones)

Documentation

  • README v2: benchmark-led structure, hardware-specified bench data, full 9-tier Mermaid pipeline diagram, collapsible terminal walkthrough
  • docs/api/ — HTTP endpoint and MCP tool reference
  • docs/archive/ — internal research/sprint/design docs reorganised

Benchmarks (Ryzen 7 5800x · 48 GB DDR4 · RTX 3080 Ti 12 GB · gemma4:e4b)

  • Token savings: […]× on WAL checkpoint queries · […]× on port lookups · […]× median across 15 query types
  • GPQA diamond: +4 pp accuracy (Helix ON 26% vs OFF 22%, N=100)
  • Dim-lock axis-4: 34% recall@1 vs 8% single-axis

Breaking changes: none. All new flags default to their previous values.


🤖 Generated with Claude Code

v0.4.0b1 — pathway identity + packet mode

18 Apr 21:52
@mbachaud mbachaud

Pathway-layer reframe — Helix weighs, doesn't retrieve

0.4.0b1 is a meaningful identity shift. Helix is no longer framed as a
knowledge store that competes with vector DBs; it's a coordinate index
layer that emits confidence so agents can decide know-vs-go without
being coaxed into it. It composes on top of the bundled SQLite genome
today and stacks on any content store tomorrow.

Two product surfaces

  • /context/packet — agent-safe index. Returns pointers +
    verified / stale_risk / needs_refresh verdict + refresh plan.
    Caller fetches content. Task-sensitive (plan / explain / review /
    edit / debug / ops / quote).
  • /context — decoder path. Helix assembles + compresses the
    context window. Downstream LLM consumes directly. Unchanged
    behavior.

Weighing layer (the conceptual center of gravity)

coord_conf × (freshness × authority × specificity) = is-it-safe-to-act

  • freshness_score = exp(-age / half_life[volatility_class]) with
    stable=7d / medium=12h / hot=15min half-lives
  • authority_score — primary=1.0, derived=0.75, inferred=0.45
  • specificity_score — literal=1.0, span=0.9, doc=0.75,
    assertion=0.45
  • coord confidence — path_token_coverage between query signals
    and delivered gene source paths (hit mean 1.00 vs miss mean 0.52 on
    the 10-needle bench)
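
The weighing formula can be evaluated directly from the constants listed above. This sketch takes the formula at face value and uses seconds for age. The function names are hypothetical, and how the resulting product maps onto the verified / stale_risk / needs_refresh verdicts is not specified in these notes.

```python
import math

# Values quoted in the release note; half-lives are in seconds.
HALF_LIFE_S = {"stable": 7 * 86400, "medium": 12 * 3600, "hot": 15 * 60}
AUTHORITY = {"primary": 1.0, "derived": 0.75, "inferred": 0.45}
SPECIFICITY = {"literal": 1.0, "span": 0.9, "doc": 0.75, "assertion": 0.45}


def freshness_score(age_s, volatility_class):
    """exp(-age / half_life[volatility_class])."""
    return math.exp(-age_s / HALF_LIFE_S[volatility_class])


def act_confidence(coord_conf, age_s, volatility_class, authority, specificity):
    """coord_conf x (freshness x authority x specificity)."""
    return (coord_conf
            * freshness_score(age_s, volatility_class)
            * AUTHORITY[authority]
            * SPECIFICITY[specificity])
```

With the formula as written, a gene that is exactly one half_life old scores exp(-1), about 0.37, rather than 0.5. A fresh primary literal with full coordinate coverage scores 1.0.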

Validated via Phase 5 packet bench — 10/10 scenarios pass across 5
families (stale_by_age, coordinate_mismatch, task_sensitivity,
authority_downgrade, clean_verified). See
`benchmarks/bench_packet.py`.

Ingest-time provenance

New columns on `Gene` (auto-populated from file extension at
ingest): `source_kind`, `volatility_class`, `observed_at`,
`last_verified_at`. No backfill needed for new ingests. Existing
genomes can run the one-time `scripts/backfill_gene_provenance.py`
sweep.

New endpoints

  • `POST /context/packet` — the agent-safe index surface
  • `POST /context/refresh-plan` — just the reread plan
  • `POST /fingerprint` — navigation-first retrieval with `score_floor`
    plus honest accounting (`evaluated_total`, `above_floor_total`,
    `filtered_by_floor`, `truncated_by_cap`)

New MCP tools

  • `helix_context_packet`
  • `helix_refresh_targets`

Plus the full existing suite (`helix_context`, `helix_stats`,
`helix_ingest`, `helix_resonance`, session/HITL toolkit).

Dep additions

  • `[mcp]` extra — required for `python -m helix_context.mcp_server`
    (closes an import-error gap)
  • `[nli]` extra — standalone torch + transformers for DeBERTa/NLI
    backends
  • `[all]` extra now genuinely complete

Docs v2

  • README v2 — lead with pathway identity, two-surface layout,
    launch modes table, LLM-free pipeline as load-bearing framing
  • docs/architecture/PIPELINE_LANES.md v2 — adds /context/packet,
    /context/refresh-plan, /fingerprint lanes; weighing layer as
    first-class concept
  • New docs/specs/2026-04-17-agent-context-index-build-spec.md
    (657-line authoritative packet-mode spec)

Migration notes

  • Existing `/context` callers: no action needed. The new
    confidence fields appear on `ContextHealth` additively.
  • MCP hosts: run `pip install helix-context[mcp]` if you use
    `python -m helix_context.mcp_server`.
  • Existing genomes: optional one-time run of
    `python scripts/backfill_gene_provenance.py` to populate
    provenance fields on legacy rows (so packet mode returns
    `verified` instead of `stale_risk`).

Validation

  • 680+ tests pass (one pre-existing test fixed in this release)
  • Phase 5 packet bench: 10/10 across 5 families
  • Dry-run install verified from clean resolution: `.[all]` pulls
    mcp, torch, transformers, sentence-transformers, spacy, tree-sitter,
    opentelemetry stack, headroom-ai[proxy,code].

Powered by Agentome.

v0.3.0b3 — Ribosome pause + learn() timeout

10 Apr 08:39
@mbachaud mbachaud

Patch release — VRAM contention survival

Two fixes born from a real incident: an external benchmark run (1000-needle test against qwen3:4b) crashed the Helix server when it competed with the ribosome model (gemma4:e4b) for GPU VRAM and triggered a cascade of httpx timeouts.

What's New

/admin/ribosome/pause — unload the ribosome without killing Helix

New admin endpoints let you disable the ribosome's LLM calls at runtime without restarting the server or losing state:

| Endpoint | Purpose |
| --- | --- |
| POST /admin/ribosome/pause | Monkey-patch backend.complete() to raise |
| POST /admin/ribosome/resume | Restore the original backend method |
| GET /admin/ribosome/status | Check if currently paused |

How it works: Ribosome.replicate() already has a fallback path that synthesizes a minimal gene from the raw exchange if the LLM call fails. Pausing forces that fallback to engage every time. The ribosome instance stays in memory, the backend connection stays alive — only the complete() method is swapped.
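
The swap described above can be sketched as a small switch object. The class names and the exception type are hypothetical. Only the idea of replacing `complete()` with a function that raises, and restoring it later, comes from the note.

```python
class RibosomePaused(RuntimeError):
    """Raised by the patched complete() while the ribosome is paused."""


class PauseSwitch:
    """Swap backend.complete() for a stub that raises, and restore it later."""

    def __init__(self, backend):
        self._backend = backend
        self._original = None

    @property
    def paused(self):
        return self._original is not None

    def pause(self):
        if self.paused:
            return  # already paused; keep the saved original intact
        self._original = self._backend.complete

        def _raise(*args, **kwargs):
            raise RibosomePaused("ribosome is paused")

        self._backend.complete = _raise

    def resume(self):
        if self.paused:
            self._backend.complete = self._original
            self._original = None
```

Because only one attribute on the backend instance changes, the backend object and any connection it holds stay alive across pause and resume.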

Workflow for benchmark VRAM rescue:
```bash
# Free the ribosome model from Ollama
curl -X POST localhost:11437/admin/ribosome/pause
curl -X POST localhost:11434/api/generate \
  -d '{"model": "gemma4:e4b", "keep_alive": 0, "prompt": ""}'

# Run your benchmark against qwen3:4b (or any other model)
python benchmarks/bench_needle_1000.py

# Restore normal operation
curl -X POST localhost:11437/admin/ribosome/resume
```

Why this matters: Before this release, pausing the ribosome required either a config change + restart, or manually killing the Python process and relaunching. Both strategies drop in-flight /context requests and break the Continue integration. The pause endpoint is instantaneous and non-disruptive — /context queries continue to work unchanged because they don't currently route through the ribosome (ingestion uses CpuTagger, rerank is disabled, splice is a no-op in the current code path).

learn() timeout wrapper

HelixContextManager.learn() now wraps the ribosome.replicate() call in a ThreadPoolExecutor with a 15-second timeout.

Root cause of the original crash: During the n1000 benchmark, a background learn() task fired for each completed proxy request. learn() called ribosome.replicate() which went through httpx to Ollama, which was busy serving the benchmark's qwen3:4b inference requests. The learn() call sat in Ollama's request queue for over 120 seconds and eventually hit httpx's ReadTimeout, which propagated up and crashed the server.

Fix: If replicate() doesn't return within 15 seconds, learn() cancels the future, synthesizes the same minimal gene that Ribosome.replicate()'s existing fallback path produces, and moves on. The background task drops cleanly instead of blocking indefinitely.

Signature change: learn(query, response, timeout_s: float = 15.0). Backward-compatible — old two-arg callers still work with the default timeout.
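
A minimal sketch of the timeout wrapper follows. Here `replicate` and `fallback` are injected stand-ins for `Ribosome.replicate()` and its minimal-gene synthesis, and the module-level pool is an assumption. The real method takes only `(query, response, timeout_s)`.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=2)


def learn(query, response, replicate, fallback, timeout_s=15.0):
    """Run replicate() with a deadline; fall back to a minimal gene on timeout.

    `replicate` and `fallback` stand in for Ribosome.replicate() and its
    minimal-gene synthesis; both take (query, response).
    """
    future = _pool.submit(replicate, query, response)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # cancel() only helps if the call has not started; a running thread
        # keeps going in the background, but the caller is no longer blocked.
        future.cancel()
        return fallback(query, response)
```

One design point: a thread that is already running cannot be interrupted, so the timeout bounds how long the caller waits, not how long the worker stays busy.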

Validation

  • 179 tests passing (full suite)
  • Server restarted cleanly on new code during live session
  • /admin/ribosome/pause confirmed working against the live genome (7,380 genes)
  • gemma4:e4b unloaded from Ollama mid-session, VRAM freed
  • /context queries continued returning correct results with the ribosome paused (coverage=1.0, ellipticity=0.645)
  • /admin/ribosome/resume not yet tested end-to-end (will be on next benchmark completion)

Migration notes

No migration needed. The new endpoints are additive, the learn() signature change is backward-compatible, and the default behavior is unchanged unless you explicitly call /admin/ribosome/pause.

🤖 Generated with Claude Code
