-
Notifications
You must be signed in to change notification settings - Fork 0
Memory and Knowledge
blumi is built for long-running and distributed agentic work, which needs three things most assistants lack: memory that survives a crash, memory that learns and travels across machines, and a real understanding of your codebase. blumi ships all of it local-first — a bundled embedding model means no API key and nothing leaves your machines.
This page covers durable execution, semantic long-term memory, SEDM cross-peer diffusion, and the native code knowledge base — plus how to configure, drive, and test each.
| Capability | What it does | Drive it with |
|---|---|---|
| Durable execution | Checkpoints each tool step; resumes mid-turn after a crash/restart | automatic |
| Semantic memory (RAG) | Recalls relevant memories each turn (vectors, FTS5 fallback) | automatic + memory tool |
| SEDM governance | Dedup on write, utility scoring, consolidation/eviction | automatic |
| Cross-peer diffusion | Spreads worth-sharing memories across the grid | automatic (grid + diffuse) |
| Code knowledge base | Index a repo; hybrid vector/FTS5 code search |
blumi knowledge, code_search/code_retrieve, blugo Code tab |
| GPU acceleration | Runs the bundled embedder on Apple CoreML / NVIDIA CUDA when present | automatic · blumi accel doctor · /accel
|
| Self-healing | Classifies tool failures → budgeted recovery; learns failure→fix episodes | automatic · heal config · /heal
|
Everything is on by default. The first memory/knowledge operation downloads the embedding model
(~130 MB, bge-small-en-v1.5) once into ~/.blumi/models; after that it runs fully offline. If a
node can't reach the network, memory/knowledge transparently fall back to keyword (FTS5) search.
Every tool step within a turn is checkpointed to SQLite (checkpoints table in blumi.db). If the
process crashes or the gateway is restarted mid-turn, the next run resumes from the last completed
step instead of replaying the whole turn — the backbone for long autonomous runs (blumi loop,
grid-dispatched work). Resume is at-least-once: idempotent tools + the doom-loop guard keep it safe.
Nothing to configure. To see it: start a multi-step task, restart the gateway
(launchctl kickstart -k gui/$(id -u)/com.blumi.serve) mid-run, reconnect — it picks up where it
left off.
A vector store (memories + memory_vectors + memories_fts in blumi.db) lets blumi recall
knowledge across sessions. Each turn, blumi embeds your latest message, finds the most relevant
memories, and injects them as background context — appended as a trailing user message so the
cached system prompt stays byte-identical (prompt caching keeps working). With embeddings off, it
degrades to FTS5 keyword recall.
Namespaces
-
user— durable facts/preferences about you. Private — never diffuses off the node. -
agent— the agent's own notes/conventions (the default for "remember this..."). Diffusable. -
project:<hash>— project-scoped knowledge. Diffusable.
Using it from chat — the memory tool:
-
add— remember a fact (mirrored toMEMORY.md/USER.mdand embedded, dedup-gated). -
query— semantic search over everything remembered. -
read/replace/remove— manage the file-backed entries.
Tip: phrasing decides the namespace. "Remember about me: I prefer dark mode" →
user(private). "Remember this project convention: ..." →agent/project(shareable across the grid).
Memory that only grows becomes noise. blumi applies the governance from the SEDM paper:
-
Write-admission (dedup). Before inserting, blumi checks the closest existing memory; if it's a
near-duplicate (cosine ≥
dedup_threshold, default 0.92) it merges and bumps utility instead of piling up. -
Utility scoring. Every time a memory is recalled or merged, its
hits/utilityrise — so what actually helps surfaces more. -
Consolidation & eviction. A periodic sweep folds near-duplicate clusters into the
highest-utility member and soft-evicts the weakest once a namespace exceeds
max_per_namespace. -
Cross-peer diffusion. The sweep pushes high-value, non-
usermemories to live grid peers (POST /api/grid/memory, authenticated by the shared grid secret). Each receiver re-admits the memory through its own dedup gate and tags it with the origin node, so memories never loop (A→B→A). What one machine learns, the others pick up. Yourusernamespace never leaves the node.
Diffusion needs the grid enabled on the participating nodes and memory.diffuse = true
(default). The sweep runs every memory.sweep_secs (default 60s), so a new shareable memory appears
on peers within ~a minute.
See diffusion happen (deterministic check on each machine):
sqlite3 ~/.blumi/blumi.db \ "SELECT namespace, origin, substr(text,1,60) FROM memories WHERE status='active';"
On the node that learned it, the row has origin = ''; on a peer that received it, origin is the
sender's node name.
Index a repo into a sibling ~/.blumi/knowledge.db and search it by meaning or keyword (hybrid:
FTS5 first for exact/symbol precision, vector fill for semantic recall). Indexing is gitignore-aware
and diff-aware — re-running only touches changed files (by SHA). Symbol extraction is lightweight
(per-language, no external grammars), and asset/binary files (svg, images, lockfiles...) are skipped.
blumi knowledge ingest ~/code/my-repo # index a repo/dir (incremental) blumi knowledge search "where is auth handled" blumi knowledge status # files · symbols · vectors blumi knowledge list # indexed sources blumi knowledge remove <source> # drop an indexed source
-
code_search— find relevant functions/types/code by meaning or keywords; returns file:line + snippet. Use it to locate where something is implemented before reading/editing. -
code_retrieve— fetch a file's indexed symbols (with snippets), or one symbol by name.
Control center → Code: index a repo (path on the gateway machine, with live progress), list/remove sources, and search code with file:line + snippet result cards.
The index is local to each node (the orchestrator's
knowledge.db). Re-ingest is cheap, so keep it fresh as the code changes.
All in-process numeric work is the bundled embedder (the ONNX bge-small-en-v1.5 model behind
memory and code search) — blumi never runs an LLM in-process. blumi runs that embedder on the GPU
when one is detected and falls back to CPU otherwise. blumi accel doctor prints exactly what's
detected and how to wire more.
| Accelerator | How | Notes |
|---|---|---|
| Apple CoreML / Metal | On by default on Apple Silicon — no flag | baked into the ONNX Runtime blumi already downloads (≈zero extra binary size) |
| NVIDIA CUDA | Opt-in: curl ... | BLUMI_CUDA=1 sh, or cargo install --features gpu-cuda
|
ships libonnxruntime.so next to the binary; installer verifies it loads, else falls back to a lean build |
| CPU | Automatic fallback everywhere | the lean default on Linux/Windows/CI (FTS5 keyword search) |
blumi accel detect # the chosen accelerator on one line (apple-coreml | cuda | cpu) blumi accel status # detection + the embedder's active execution provider + backends blumi accel doctor # full report + copy-paste setup hints (local GPU server / CUDA)
-
The bigger GPU win is the LLM, not the embedder. Since blumi is BYOK and runs no model
in-process, point a provider at a local GPU server — the
ollama,local-mlx, andlocal-cudapresets ship in config. On Linux/NVIDIA this is the recommended path: a lean blumi + Ollama for the LLM and GPU embeddings, with no fragile native linking. -
Grid-aware offload. Every node reports its accelerator in
/api/grid/metrics(strongest = CUDA > Apple CoreML > CPU). A CPU node can setembeddings.backend = "grid"to offload embedding to the strongest GPU peer (secret-authedPOST /api/grid/embed) with a local/FTS5 fallback — so a lean Linux box rides a Mac/CUDA peer for vectors. See Grid → GPU-aware embedding offload. The active accelerator shows in the TUI (/accel),/api/status, and the blugo Status/Grid panels.
The heavy embedder build is Apple-default and opt-in elsewhere, so a plain Linux/CI install stays lean and never does a multi-GB native link unless you ask for GPU (
BLUMI_CUDA=1/--features gpu-cuda). Mismatch (e.g.acceleration.mode = "cuda"on a CPU build) degrades to CPU with a loud warning —blumi accel doctorshows the compiled features.
In ~/.blumi/settings.json (every field shown is the default — omit to keep it):
{
"embeddings": {
"enabled": true,
"backend": "local",
"model": "bge-small-en-v1.5",
"dim": 384
},
"memory": {
"enabled": true,
"recall_k": 5,
"dedup_threshold": 0.92,
"max_per_namespace": 2000,
"diffuse": true,
"sweep_secs": 60
},
"knowledge": {
"enabled": true,
"max_file_kb": 256,
"exclude": []
},
"acceleration": {
"mode": "auto"
},
"heal": {
"enabled": true,
"recovery_budget": 2,
"verify": false,
"learn": true,
"evolve": "auto",
"redact_paths": true
}
}-
embeddings.backend—local(bundled ONNX model, fully offline),openai(a configured OpenAI-compatible/embeddingsendpoint, e.g. Ollama/llama.cpp/cloud), orgrid(offload to the strongest GPU peer viaPOST /api/grid/embed, with a local/FTS5 fallback — see GPU acceleration). -
acceleration.mode—auto(detect),cpu,apple(CoreML/Metal), orcudafor the bundled embedder. See GPU acceleration. -
heal— self-healing reflex recovery + failure→fix learning + evolution; learned fixes are stored asagent-namespace memories (so they diffuse). See Self-Management → Self-healing. -
Lean / constrained install: set
embeddings.enabled = falseto skip the model entirely — memory and code search then run on FTS5 keyword matching, no download.
The bundled-embeddings binary is the default; a --no-default-features build of blumi-llm omits the
ONNX runtime for a leaner artifact.
-
First memory/code op is slow, then fast. That's the one-time ~130 MB model download
(
~/.blumi/models). It runs in the background; recall falls back to keyword until it's ready, and a later sweep backfills any vectors. Confirm withdu -sh ~/.blumi/models(~130 MB when complete). -
Search returns irrelevant results. Make sure the repo is indexed (
blumi knowledge status). Symbol/keyword queries are answered FTS-first, so exact names land precisely; very broad queries lean on vectors. -
Diffusion didn't reach a peer. Both nodes must run a build with
/api/grid/memory, share the grid secret, and have the grid up; the memory must be in a non-usernamespace; allow one sweep (~sweep_secs). Theusernamespace never diffuses by design.
See also: Grid (distributed) · Configuration · Troubleshooting.