Releases: mbachaud/helix-context
v0.7.1
23c65f5 Patch release: the v0.7.0 headless tray failed to bind the dashboard port under a truly detached pythonw start (sys.stdout/stderr = None kill uvicorn's logging before bind). _ensure_streams() routes stdio into --log-file / devnull (#199). Caught and verified during a live v0.7.0 install. Full details in CHANGELOG.md.
v0.7.0
e79d16d Dashboard + UX release. The tray-hosted web dashboard graduates from a read-only status page to the primary control surface, and the launcher grows a dev/configuration dual-port mode.
Highlights:
- Dev mode dual ports (#197): [server] bench_enabled/bench_port/bench_genome_path runs a second supervised helix on :11439 pinned to the bench genome - chat stays on main while a subagent drives the bench-harness. Default off; --bench / HELIX_BENCH_ENABLED=1; final deployments get exactly one server.
- Startup UX (#197): loading spinner while encoders warm up; first-boot select-or-create database dialog when no genome exists; start-helix-tray.bat goes fully headless (pythonw + --log-file).
- Dashboard wiring sweep (#195): genome select/create from the web UI; Observability panel with live sidecar service dots + direct links to all six Grafana dashboards and Prometheus.
- Fresh-checkout boot fix (#195): KnowledgeStore mkdirs the genome parent chain (clean clones no longer die with 'unable to open database file').
- Grafana sidecar hardening (#195): 127.0.0.1-pinned, anonymous local access (links land on dashboards, not a login wall), phone-home disabled; 'Helix - Overview' is now a real entry-point dashboard.
Full details in CHANGELOG.md.
v0.6.5
b3d0c5e Eleven PRs landed on top of 0.6.4: the open-PR merge train (#182-#190), the full-suite QA de-flake (#191), and the #165 fingerprint-index audit fixes (#192 MAX_PATH guard, #193 path_key_index Option-B compaction — ~21% corpus storage back, score-invariant).
Highlights:
- path_key_index: WITHOUT ROWID for new DBs, dead idx_pki_lookup dropped, POST /admin/compact-pki converts existing DBs (then /admin/vacuum). Confirmed on a real v2 Onyx shard: 3.92 GB -> 2.99 GB, 25/25 queries score-identical.
- Fixture builder: file-level resume + SIGINT pause-then-resume (#183), auto-subshard of oversized roots (#186), BenchServer import-identity guard (#184).
- Retrieval/bench: SPLADE size-aware auto-toggle knobs (#189), dense_additive_weight sweep harness (#188), GENE src source-type prefix fix (#185).
- Hardware: GB10/Grace+Blackwell launch-blocking shim, default-off (#190, @addiplus). Docs: <=12 GB VRAM dense-ingest runbook (#182).
- Test suite de-flaked on Windows dev rigs: GPU-livelock parity test, 200K-write metrics test, README-v3 doc contracts, WSL bash probe (#191).
Full details in CHANGELOG.md.
v0.6.4
27fe5fc Master-bound sibling of the v0.6.3 Onyx external-validation snapshot. v0.6.3 stays a tag-only artifact (sibling branch perf/dense-prefilter-via-splade-candidates, not on PyPI); 0.6.4 ships the master-bound subset to the wheel.
Highlights
- #177 perf(dense): bound CUDA VRAM during batch ingest via periodic `empty_cache`. `HELIX_DENSE_VRAM_RELEASE_EVERY=256` default; holds dense ingest at ~6 GB plateau on a 12 GB 3080 Ti (was climbing to 11.7 GB and spilling to shared-mem).
- #180 fix(config): wire `semantic_dense_additive_weight` + `semantic_broaden_routing`, env-gated by `HELIX_SEMANTIC_ARM=1` + `query_type == "semantic"`, default-off so stock retrieval is byte-identical to v0.6.2. Lets downstream consumers running the v0.6.3 fixed-pipeline TOML load on a master-derived build without their config keys silently vanishing.
- #179 feat(launcher): Manage Database tray + Switchboard + Pipeline viewer. Tray submenu discovers genome `.db` files, switches via `HELIX_GENOME_PATH` + restart, with confirmation via Win32 `MessageBoxW` (avoids tkinter-on-pystray deadlock). Dashboard adds an 11-knob Switchboard panel and a per-request Pipeline viewer (`HELIX_PIPELINE_RING=0` to disable). New `GET /debug/pipeline/recent` + `GET /admin/genome`.
Full per-PR notes in CHANGELOG.md.
v0.6.2 — dynamic RAM scaling
3173c47 RAM-aware SQLite budget — makes the v0.6.1 memory posture host-aware instead of unconditionally conservative.
v0.6.1 hard-coded mmap_size=0 + 2/4 MB page caches on every host (a 100-shard fan-out guard); the BGE-M3 model singleton was the actual 120 GB → 7 GB fix, so that I/O posture only over-throttled RAM-rich hosts. v0.6.2 scales per-shard mmap_size/cache_size from available RAM: budget = (available − 25% reserve) / n_shards.
- `HELIX_MEM_PROFILE`: `auto` (default) · `aggressive` · `conservative` (byte-identical to v0.6.1, the escape hatch) · `<N>gb`.
- Budget is a fraction of free RAM → can never OOM; self-throttles at high shard count / low RAM.
- Largest win on RAM-rich hosts (128 GB unified → near-whole 46 GB corpus mmap-resident).
- 212 unit tests green. End-to-end 100-shard RAM/latency delta validated separately on the bench fixture.
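The budget formula quoted above, `budget = (available − 25% reserve) / n_shards`, can be sketched in a few lines. This is an illustration only: the function name `per_shard_budget_bytes` and its parameters are hypothetical, and the notes do not say how available RAM is measured or how the budget is split between `mmap_size` and `cache_size`.

```python
def per_shard_budget_bytes(
    available_bytes: int,
    n_shards: int,
    reserve_fraction: float = 0.25,  # the 25% reserve named in the release notes
) -> int:
    """Hypothetical helper: split the non-reserved part of free RAM across shards."""
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    usable = available_bytes * (1.0 - reserve_fraction)
    # Floor division keeps the sum of all shard budgets at or below `usable`.
    return int(usable // n_shards)
```

Because the budget is derived from free RAM and divided by the shard count, it shrinks automatically as shards are added or memory gets tight, which is the self-throttling behavior the notes describe.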
See CHANGELOG.md and docs/prds/2026-05-30-dynamic-ram-scaling.md.
v0.6.1 — parallel fan-out + daemon RAM collapse
b7af5b0 Performance release. Verified end-to-end on the EnterpriseRAG-Bench v2 fixture (850K genes / 100 shards, HELIX_SHARD_WORKERS=8): resident RAM ~120 GB → 7.1 GB peak, per-query 125s → 57.6s median, 0 daemon deaths, ranked output byte-identical to the serial path.
Highlights
- perf(memory) #173: share ONE BGE-M3 model process-wide (`get_shared_codec`) instead of ~100 per-shard copies; the dominant 100-shard RAM driver. Default on; `HELIX_SHARE_DENSE_CODEC=0` reverts. Plus opt-in fp16 dense matrix (`HELIX_DENSE_MATRIX_DTYPE=float16`), SQLite `cache_size` caps, explicit `mmap_size=0`.
- perf(retrieval) #172: concurrent shard fan-out via ThreadPoolExecutor (BLAS-pinned), gated by `HELIX_SHARD_WORKERS` (default 1 = serial). Byte-identical ranked output. Plus a WAL-checkpoint fix on shard close.
Operator notes
- Both default-safe: 0.6.0 behavior is unchanged until you set `HELIX_SHARD_WORKERS>1`.
- Enable the two together: A1 (model singleton) is the prerequisite that lets the fan-out reach full speedup on a memory-bound rig (without it, per-shard model duplication thrashes the pagefile and caps the win at ~1.5x).
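A minimal sketch of how a thread-pool fan-out can keep ranked output identical to the serial path. The function `query_shards`, the `(gene_id, score)` hit shape, and the tie-break rule are assumptions made for illustration; the actual merge logic and BLAS pinning in helix-context are not shown in these notes.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def query_shards(shards, query_fn, workers=None):
    """Hypothetical fan-out: run query_fn on every shard, then merge deterministically."""
    if workers is None:
        # Same env gate as the release notes; default 1 means serial.
        workers = int(os.environ.get("HELIX_SHARD_WORKERS", "1"))
    if workers <= 1:
        per_shard = [query_fn(shard) for shard in shards]
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Executor.map returns results in input order, not completion order.
            per_shard = list(pool.map(query_fn, shards))
    merged = [hit for hits in per_shard for hit in hits]
    # Sort by score descending, then gene_id, so ties never depend on timing.
    merged.sort(key=lambda hit: (-hit[1], hit[0]))
    return merged
```

Collecting results in input order and sorting with an explicit tie-break is one way to make the parallel path independent of thread scheduling.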
Full notes in CHANGELOG.md.
v0.6.0
0.6.0 (2026-05-28)
Substantial release covering corpus-scale retrieval, bench rebuilding,
storage audits, and a stack of stability + portability fixes. Headline
work: EnterpriseRAG-Bench (Layer 3) shipped with 100q variant-A
results (recall@10 = 28% on the 850K-gene v2 fixture); two scaling-wall
bugs (regex ReDoS at ingest, SQL-variable cap at retrieval) fixed and
cross-validated on x86 + ARM64 hardware; ~400 lines of new bench docs.
- fix(tagger): eliminate catastrophic backtracking in `_KV_PAIR_PATTERN`
  (PR #155 / PR #162). Pre-fix, `(\w+(?:_\w+)*)` had redundant
  nested-quantifier ambiguity that triggered catastrophic backtracking on
  underscore-heavy content. A single worker spinning on
  `tagger.py:439`'s `_KV_PAIR_PATTERN.finditer(content[:5000])` hung the
  EnterpriseRAG-Bench-Onyx-full corpus build for 60+ minutes on a single
  google-drive shared-drives file (underscore-rich JSON keys like
  `expected_doc_ids`, `data_source_id`). The fix `(\w+)` is functionally
  identical (same match set, since `\w` includes `_`) but has no nested
  quantifier. Verified on the 3 worst-offender files from the bench corpus:
  0.40-0.52 ms each, down from >60 min hung. 200-underscore stress test:
  0.02 ms. Cross-validated under two independent py-spy investigations
  on different hardware classes (x86 Ryzen + RTX 3080 Ti on 2026-05-19,
  ARM64 Grace + GB10 on 2026-05-27): same line, same root cause, same fix.
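The equivalence claim in the tagger fix can be checked on the key sub-pattern alone. The full `_KV_PAIR_PATTERN` is not reproduced in these notes, so the sketch below compares only the two capture groups quoted above; the sample string is made up for illustration.

```python
import re

# The two key sub-patterns quoted in the release note.
OLD_KEY = re.compile(r"(\w+(?:_\w+)*)")  # nested quantifier, ambiguous on underscores
NEW_KEY = re.compile(r"(\w+)")           # same match set, since \w already includes "_"

# Illustrative sample with underscore-rich keys like those named in the note.
sample = "expected_doc_ids: 1, data_source_id=2, a__b___c"

old_matches = OLD_KEY.findall(sample)
new_matches = NEW_KEY.findall(sample)
```

Both lists come out identical because the greedy `\w+` already consumes every underscore, leaving the inner group nothing to match. The ambiguity only becomes expensive when the rest of a larger pattern fails and the engine retries every way of splitting the key between the two quantifiers.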
- perf(ddl): skip FTS5 orphan cleanup when delta is < 5% of gene count
  (PR #156 / PR #162). The previous cleanup ran
  `DELETE FROM genes_fts WHERE gene_id NOT IN (SELECT gene_id FROM genes)`,
  an O(N·M) correlated subquery that hung the daemon's first-query
  response for hours on the 850K-gene / 105-shard EnterpriseRAG-Bench
  fixture. On a single 18K-gene shard with ~40 orphans (0.2% noise) the
  `NOT IN` scan pegged a core for 5-10 minutes against a cold OS-cache
  page set. Orphan FTS5 entries are harmless at query time (downstream
  `gene_id` joins return NULL and filter out before delivery), so
  skipping cleanup for trivial deltas costs nothing in retrieval quality
  and unblocks first-query latency entirely. For the rare significant-drift
  case (delta ≥ 5%), the rewritten query uses indexed `NOT EXISTS` instead
  of `NOT IN`, turning O(N²) into O(N log N). Daemon `/health` response
  on the 850K-gene fixture went from "hangs forever" to milliseconds.
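A rough sketch of the skip-then-`NOT EXISTS` cleanup described above. Assumptions are loud here: the function name `cleanup_fts_orphans` is hypothetical, "delta" is taken to mean the row-count difference between `genes_fts` and `genes`, and plain tables stand in for the real FTS5 virtual table so the example runs on any SQLite build.

```python
import sqlite3


def cleanup_fts_orphans(conn: sqlite3.Connection, threshold: float = 0.05) -> int:
    """Hypothetical cleanup: delete orphan FTS rows only when drift is significant."""
    n_genes = conn.execute("SELECT COUNT(*) FROM genes").fetchone()[0]
    n_fts = conn.execute("SELECT COUNT(*) FROM genes_fts").fetchone()[0]
    delta = abs(n_fts - n_genes)  # assumed definition of "delta"
    if delta < threshold * n_genes:
        return 0  # trivial drift: orphans are harmless, skip the scan
    # NOT EXISTS can probe the primary-key index on genes.gene_id row by row.
    cur = conn.execute(
        "DELETE FROM genes_fts WHERE NOT EXISTS ("
        "SELECT 1 FROM genes WHERE genes.gene_id = genes_fts.gene_id)"
    )
    conn.commit()
    return cur.rowcount
```

The return value is the number of rows deleted, so a caller can log whether the cleanup was skipped or actually ran.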
- feat(build): salvage already-complete shards on rebuild (PR #157 /
  PR #162). Adds `_try_salvage_complete_shard()` which opens an existing
  `shard.db` read-only, verifies the `genes` table has 100% dense
  backfill coverage and no live WAL sidecar, and returns the same
  result-dict shape that `_build_one_shard` would normally produce,
  letting the parent's `_commit_shard_result` re-register the shard via
  `INSERT OR REPLACE`. Designed for the kill+restart cycle: if
  `build_fixture_matrix.py` is interrupted (Ctrl+C, OOM, planned restart),
  fully-complete shards on disk are re-registered in seconds instead of
  rebuilt from scratch. Verified at scale: 21 of 22 already-complete
  shards re-registered into a fresh `main.genome.db` in 2 min 19 sec
  (vs ~13 hours to rebuild from scratch) during a mid-build restart of
  the EnterpriseRAG-Bench-Onyx-full 850K-gene build.
- fix(knowledge_store): batch IN-clause queries to stay under SQLite cap
  (PR #163). SQLite caps `WHERE col IN (?, ?, ...)` placeholders at
  `SQLITE_LIMIT_VARIABLE_NUMBER` (999 legacy, 2000 on the Python 3.12 /
  SQLite 3.50 builds we ship to, 32766 on newer compile defaults). Four
  call sites on the `gene_scores` fan-out path in `query_docs`
  (`_apply_authority_boosts`, sema-boost embedding lookup, party-attribution
  lookup, access-rate epigenetics lookup) build the IN clause from a
  caller-determined candidate set that can exceed the cap in production.
  Observed in the 2026-05-28 v2-fixture 100q bench: 3 of 29 queries had a
  per-shard query raise `OperationalError: too many SQL variables`, which
  the daemon's per-shard try/except swallowed as "shard X query failed;
  skipping", biasing recall@K by silently dropping shards. Variants where
  SPLADE or the prefilter narrows `gene_scores` don't hit this; only the
  no-filter SPLADE-off path produces sets large enough to blow up. Adds
  `_iter_in_batches(items, batch_size=500)` helper and refactors the four
  hot sites. Includes TDD'd regression test at
  `tests/test_knowledge_store_batched_in.py` that probes the runtime cap
  via `conn.getlimit(SQLITE_LIMIT_VARIABLE_NUMBER)` and exercises at
  `cap`, `cap + 1`, and `4*cap + 7` boundaries.
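The batching pattern can be sketched as below. Only the `(items, batch_size=500)` signature comes from the release note; the body of `iter_in_batches`, the `fetch_scores` caller, and the `demo_scores` table are assumptions for illustration, not the project's code.

```python
import sqlite3
from typing import Dict, Iterator, List, Sequence


def iter_in_batches(items: Sequence, batch_size: int = 500) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fetch_scores(conn: sqlite3.Connection, gene_ids: List[str]) -> Dict[str, float]:
    """Hypothetical lookup against a made-up demo_scores(gene_id, score) table."""
    scores: Dict[str, float] = {}
    for batch in iter_in_batches(gene_ids):
        # One "?" per id, so each statement stays well under the variable cap.
        placeholders = ",".join("?" for _ in batch)
        rows = conn.execute(
            f"SELECT gene_id, score FROM demo_scores WHERE gene_id IN ({placeholders})",
            list(batch),
        ).fetchall()
        scores.update(rows)
    return scores
```

A batch size of 500 sits below even the legacy 999 limit, so the same code works regardless of which SQLite build is linked.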
- fix(mcp): registry handshake is best-effort, don't kill subprocess on
  failure (PR #169). On Windows, `claude -p` MCP attempts were failing
  with "Connection closed" after ~2 s even when helix was alive on
  http://127.0.0.1:11437. Root cause: `_register_with_registry()` was
  called synchronously before `mcp.run()` entered the stdio handshake; an
  exception from `register_participant()` (auto-heartbeat thread init,
  etc.) propagated out of `main()` and killed the MCP subprocess before
  the host could complete its handshake. The registry is not load-bearing
  for tool calls: tool calls proxy directly to the helix HTTP API.
  Registry is only used by `helix_announce` + dashboards. This patch
  wraps `_register_with_registry()` in a try/except inside `main()`:
  happy path unchanged, failure path logs the exception and continues
  to `mcp.run()` rather than exiting. Closes #167.
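The best-effort handshake amounts to the pattern below. This is a sketch under assumptions: `register` and `run` are injected stand-ins for `_register_with_registry()` and `mcp.run()`, whose real signatures are not given in these notes.

```python
import logging

log = logging.getLogger("mcp_sketch")  # hypothetical logger name


def main(register, run):
    """Try the optional registry handshake, then always enter the serve loop."""
    try:
        register()
    except Exception:
        # Registry is not needed for tool calls, so log and carry on.
        log.exception("registry handshake failed; continuing without registry")
    return run()
```

The key design choice is that only the optional step sits inside the `try`; an exception from `run()` itself still propagates normally.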
- feat(bench): add `--isolated` flag to `bench_claude_matrix` for
  leak-free measurement (PR #170). When set, the `claude -p` sub-agent
  is launched with `--tools ""` (all built-in tools disabled),
  `--strict-mcp-config`, and `--mcp-config '{"mcpServers":{}}'` (no MCP
  servers). Pair with a sterile `--cwd` (e.g. `F:/tmp/bench_sandbox`) to
  also block CLAUDE.md auto-discovery. Isolates retrieval-driven answer
  quality from filesystem-tool access. Records `isolated` + `claude_cwd`
  in the per-run JSON so post-hoc analysis can distinguish leak-free runs
  from contaminated runs. Brings shipped code into agreement with
  shipped docs (`docs/benchmarks/BENCHMARKS.md` §"Layer 3 —
  EnterpriseRAG-Bench" and `BENCHMARK_RATIONALE.md` addendum already
  described this isolation mode). Closes #168.
- docs(benchmarks): add Layer 3 (EnterpriseRAG-Bench) + EnterpriseRAG
  fixtures (PR #166). ~400 lines across four files. `BENCHMARKS.md`
  gets a new "Layer 3 — EnterpriseRAG-Bench" section covering the
  2026-05-20→21 bench investigation rebuild (`isolated=True` mode,
  +32.4 pp helix lift, 65% hallucination reduction), cross-corpus results
  (60% recall@10 @ 10K → 71% @ 50K → 28% @ 850K), the expression-budget
  clamp fix (4%→43% correctness), Wall-1 / Wall-2 scaling-wall framing,
  the v2 100q variant-A result table, and cross-host validation of the
  tagger fix. `GENOME_FIXTURE_MATRIX.md` gets a new EnterpriseRAG-Bench
  fixtures section (5-row fixture table, shared 9-source-root scope,
  excluded-from-ingest list, auto-subsharding behavior, path-portability
  gotcha, branch/PR routing). `BENCHMARK_RATIONALE.md` gets an addendum
  on how Layer 3 answered the rationale's NIAH-doesn't-fit problems.
  `MULTI_VALID_GOLD.md` gets an EnterpriseRAG-Bench gold-path matching
  section (schema diff, `_rel_after_sources` normalization, prefix-tolerant
  match fix).
v0.5.0 — Retrieval Stack Upgrade + README v2
What's new in v0.5.0
All new retrieval features are dark-shipped — feature-flagged off by default. Enable them individually as you need them.
Retrieval improvements
- BM25 pre-filter (tier-0): restricts the scoring corpus to FTS5 top-200 before the 9-tier scorer runs. Cheaper SEMA cosine scan, better noise rejection. Enable: `bm25_prefilter_enabled = true` in `helix.toml`
- Sub-query decomposition: broad queries (`multi_hop` / `default`) are decomposed into 3 point-fact sub-queries, run in parallel, merged with cross-query hit weighting. Converts 12-gene diluted BROAD results into targeted TIGHT/FOCUSED results. Enable: `query_decomposition_enabled = true`
- D8 complete (intent taxonomy + entity graph):
  - `IntentClass` enum on `PromoterTags` with heuristic `_classify_intent()` at ingest
  - `intent_router.py` for LLM-free template decomposition
  - Entity graph wired as Tier 5b (+0.5 score boost per matched entity node)
  - SR gate-benched at N=50, confirmed live
  - Enable entity graph: `entity_graph_retrieval_enabled = true`
- BGE-M3 dense vectors + ANN threshold: `bgem3_codec.py` with asymmetric query/passage encoding and Matryoshka 256D truncation; `query_genes_ann()` replaces TIGHT/FOCUSED/BROAD step function with cosine similarity threshold gate. Enable: `dense_embedding_enabled = true` (run `scripts/backfill_bgem3.py` first)
Infrastructure
- `helix_context/_asgi.py` entry point: `server.py` is now importable without opening a database connection (fixes pytest collection failures in worktrees and fresh clones)
Documentation
- README v2: benchmark-led structure, hardware-specified bench data, full 9-tier Mermaid pipeline diagram, collapsible terminal walkthrough
- `docs/api/`: HTTP endpoint and MCP tool reference
- `docs/archive/`: internal research/sprint/design docs reorganised
Benchmarks (Ryzen 7 5800x · 48 GB DDR4 · RTX 3080 Ti 12 GB · gemma4:e4b)
- Token savings: × on WAL checkpoint queries · × on port lookups · × median across 15 query types
- GPQA diamond: +4 pp accuracy (Helix ON 26% vs OFF 22%, N=100)
- Dim-lock axis-4: 34% recall@1 vs 8% single-axis
Breaking changes: none. All new flags default to their previous values.
🤖 Generated with Claude Code
v0.4.0b1 — pathway identity + packet mode
Pathway-layer reframe — Helix weighs, doesn't retrieve
0.4.0b1 is a meaningful identity shift. Helix is no longer framed as a
knowledge store that competes with vector DBs; it's a coordinate
index layer that emits confidence so agents can decide know-vs-go
without being coaxed into it. Composes on top of the bundled SQLite
genome today, stacks on any content store tomorrow.
Two product surfaces
- `/context/packet`: agent-safe index. Returns pointers +
  `verified / stale_risk / needs_refresh` verdict + refresh plan.
  Caller fetches content. Task-sensitive (plan / explain / review /
  edit / debug / ops / quote).
- `/context`: decoder path. Helix assembles + compresses the
  context window. Downstream LLM consumes directly. Unchanged
  behavior.
Weighing layer (the conceptual center of gravity)
`coord_conf × (freshness × authority × specificity) = is-it-safe-to-act`
- freshness_score: `exp(-age / half_life[volatility_class])` with
  stable=7d / medium=12h / hot=15min half-lives
- authority_score: primary=1.0, derived=0.75, inferred=0.45
- specificity_score: literal=1.0, span=0.9, doc=0.75,
  assertion=0.45
- coord confidence: path_token_coverage between query signals
  and delivered gene source paths (hit mean 1.00 vs miss mean 0.52 on
  the 10-needle bench)
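The weighing formula and the constants listed above can be combined in a short sketch. The function names and dictionary layout are hypothetical; only the formula and the numeric values come from these notes, and how the resulting score maps to the `verified / stale_risk / needs_refresh` verdicts is not specified here.

```python
import math

# Constants as listed in the release notes (half-lives converted to seconds).
HALF_LIFE_S = {"stable": 7 * 24 * 3600, "medium": 12 * 3600, "hot": 15 * 60}
AUTHORITY = {"primary": 1.0, "derived": 0.75, "inferred": 0.45}
SPECIFICITY = {"literal": 1.0, "span": 0.9, "doc": 0.75, "assertion": 0.45}


def freshness_score(age_s: float, volatility_class: str) -> float:
    """exp(-age / half_life[volatility_class]), exactly as written in the notes."""
    return math.exp(-age_s / HALF_LIFE_S[volatility_class])


def safe_to_act_score(coord_conf: float, age_s: float, volatility_class: str,
                      authority: str, specificity: str) -> float:
    """Hypothetical composition: coord_conf x (freshness x authority x specificity)."""
    return (coord_conf
            * freshness_score(age_s, volatility_class)
            * AUTHORITY[authority]
            * SPECIFICITY[specificity])
```

With the formula as written, a gene whose age equals its class's half-life value scores `exp(-1)`, roughly 0.37, rather than 0.5, so the listed durations behave as decay time constants.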
Validated via Phase 5 packet bench — 10/10 scenarios pass across 5
families (stale_by_age, coordinate_mismatch, task_sensitivity,
authority_downgrade, clean_verified). See
`benchmarks/bench_packet.py`.
Ingest-time provenance
New columns on `Gene` (auto-populated from file extension at
ingest): `source_kind`, `volatility_class`, `observed_at`,
`last_verified_at`. No backfill needed for new ingests. Existing
genomes can run the one-time `scripts/backfill_gene_provenance.py`
sweep.
New endpoints
- `POST /context/packet` — the agent-safe index surface
- `POST /context/refresh-plan` — just the reread plan
- `POST /fingerprint` — navigation-first retrieval with `score_floor`
- honest accounting (`evaluated_total`, `above_floor_total`,
`filtered_by_floor`, `truncated_by_cap`)
New MCP tools
- `helix_context_packet`
- `helix_refresh_targets`
Plus the full existing suite (`helix_context`, `helix_stats`,
`helix_ingest`, `helix_resonance`, session/HITL toolkit).
Dep additions
- `[mcp]` extra: required for `python -m helix_context.mcp_server`
  (closes an import-error gap)
- `[nli]` extra: standalone torch + transformers for DeBERTa/NLI
  backends
- `[all]` extra now genuinely complete
Docs v2
- README v2: lead with pathway identity, two-surface layout,
  launch modes table, LLM-free pipeline as load-bearing framing
- `docs/architecture/PIPELINE_LANES.md` v2: adds `/context/packet`,
  `/context/refresh-plan`, `/fingerprint` lanes; weighing layer as
  first-class concept
- New `docs/specs/2026-04-17-agent-context-index-build-spec.md`:
  657-line authoritative packet-mode spec
Migration notes
- Existing `/context` callers: no action needed. The new
  confidence fields appear on `ContextHealth` additively.
- MCP hosts: run `pip install helix-context[mcp]` if you use
  `python -m helix_context.mcp_server`.
- Existing genomes: optional one-time run of
  `python scripts/backfill_gene_provenance.py` to populate
  provenance fields on legacy rows (so packet mode returns
  `verified` instead of `stale_risk`).
Validation
- 680+ tests pass (one pre-existing test fixed in this release)
- Phase 5 packet bench: 10/10 across 5 families
- Dry-run install verified from clean resolution: `.[all]` pulls
mcp, torch, transformers, sentence-transformers, spacy, tree-sitter,
opentelemetry stack, headroom-ai[proxy,code].
Powered by Agentome.
v0.3.0b3 — Ribosome pause + learn() timeout
Patch release — VRAM contention survival
Two fixes born from a real incident: an external benchmark run (1000-needle test against qwen3:4b) crashed the Helix server when it competed with the ribosome model (gemma4:e4b) for GPU VRAM and triggered a cascade of httpx timeouts.
What's New
/admin/ribosome/pause — unload the ribosome without killing Helix
New admin endpoints let you disable the ribosome's LLM calls at runtime without restarting the server or losing state:
| Endpoint | Purpose |
|---|---|
| `POST /admin/ribosome/pause` | Monkey-patch `backend.complete()` to raise |
| `POST /admin/ribosome/resume` | Restore the original backend method |
| `GET /admin/ribosome/status` | Check if currently paused |
How it works: Ribosome.replicate() already has a fallback path that synthesizes a minimal gene from the raw exchange if the LLM call fails. Pausing forces that fallback to engage every time. The ribosome instance stays in memory, the backend connection stays alive — only the complete() method is swapped.
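The swap described above can be sketched as a small switch object. Everything here is hypothetical except the idea of replacing `complete()` with a function that raises: `PauseSwitch`, `RibosomePaused`, and `EchoBackend` are illustration names, not helix-context classes.

```python
class RibosomePaused(RuntimeError):
    """Raised by the patched complete() while the ribosome is paused."""


class EchoBackend:
    """Stand-in backend used only for this sketch."""

    def complete(self, prompt: str) -> str:
        return f"echo:{prompt}"


class PauseSwitch:
    """Swap backend.complete for a raising stub, and restore it on resume."""

    def __init__(self, backend):
        self._backend = backend
        self._original = None

    @property
    def paused(self) -> bool:
        return self._original is not None

    def pause(self) -> None:
        if self.paused:
            return  # idempotent: never overwrite the saved original
        self._original = self._backend.complete

        def _raise(*args, **kwargs):
            raise RibosomePaused("ribosome paused")

        self._backend.complete = _raise

    def resume(self) -> None:
        if self.paused:
            self._backend.complete = self._original
            self._original = None
```

Because only the method is replaced, the backend object and its connection stay alive, and any caller with a fallback path for a failed `complete()` simply takes that path while paused.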
Workflow for benchmark VRAM rescue:
```bash
# Free the ribosome model from Ollama
curl -X POST localhost:11437/admin/ribosome/pause
curl -X POST localhost:11434/api/generate \
  -d '{"model": "gemma4:e4b", "keep_alive": 0, "prompt": ""}'
# Run your benchmark against qwen3:4b (or any other model)
python benchmarks/bench_needle_1000.py
# Restore normal operation
curl -X POST localhost:11437/admin/ribosome/resume
```
Why this matters: Before this release, pausing the ribosome required either a config change + restart, or manually killing the Python process and relaunching. Both strategies drop in-flight /context requests and break the Continue integration. The pause endpoint is instantaneous and non-disruptive — /context queries continue to work unchanged because they don't currently route through the ribosome (ingestion uses CpuTagger, rerank is disabled, splice is a no-op in the current code path).
learn() timeout wrapper
HelixContextManager.learn() now wraps the ribosome.replicate() call in a ThreadPoolExecutor with a 15-second timeout.
Root cause of the original crash: During the n1000 benchmark, a background learn() task fired for each completed proxy request. learn() called ribosome.replicate() which went through httpx to Ollama, which was busy serving the benchmark's qwen3:4b inference requests. The learn() call sat in Ollama's request queue for over 120 seconds and eventually hit httpx's ReadTimeout, which propagated up and crashed the server.
Fix: If replicate() doesn't return within 15 seconds, learn() cancels the future, synthesizes the same minimal gene that Ribosome.replicate()'s existing fallback path produces, and moves on. The background task drops cleanly instead of blocking indefinitely.
Signature change: learn(query, response, timeout_s: float = 15.0). Backward-compatible — old two-arg callers still work with the default timeout.
Validation
- 179 tests passing (full suite)
- Server restarted cleanly on new code during live session
- `/admin/ribosome/pause` confirmed working against the live genome (7,380 genes)
- gemma4:e4b unloaded from Ollama mid-session, VRAM freed
- `/context` queries continued returning correct results with the ribosome paused (coverage=1.0, ellipticity=0.645)
- `/admin/ribosome/resume` not yet tested end-to-end (will be on next benchmark completion)
Migration notes
No migration needed. The new endpoints are additive, the learn() signature change is backward-compatible, and the default behavior is unchanged unless you explicitly call /admin/ribosome/pause.