-
Notifications
You must be signed in to change notification settings - Fork 0
Memory and Knowledge
blumi is built for long-running and distributed agentic work, which needs three things most assistants lack: memory that survives a crash, memory that learns and travels across machines, and a real understanding of your codebase. blumi ships all of it local-first — a bundled embedding model means no API key and nothing leaves your machines.
This page covers durable execution, semantic long-term memory, SEDM cross-peer diffusion, and the native code knowledge base — plus how to configure, drive, and test each.
| Capability | What it does | Drive it with |
|---|---|---|
| Durable execution | Checkpoints each tool step; resumes mid-turn after a crash/restart | automatic |
| Semantic memory (RAG) | Recalls relevant memories each turn (vectors, FTS5 fallback) | automatic + memory tool |
| SEDM governance | Dedup on write, utility scoring, consolidation/eviction | automatic |
| Cross-peer diffusion | Spreads worth-sharing memories across the grid | automatic (grid + diffuse) |
| Code knowledge base | Index a repo; hybrid vector/FTS5 code search |
blumi knowledge, code_search/code_retrieve, blugo Code tab |
Everything is on by default. The first memory/knowledge operation downloads the embedding model
(~130 MB, bge-small-en-v1.5) once into ~/.blumi/models; after that it runs fully offline. If a
node can't reach the network, memory/knowledge transparently fall back to keyword (FTS5) search.
Every tool step within a turn is checkpointed to SQLite (checkpoints table in blumi.db). If the
process crashes or the gateway is restarted mid-turn, the next run resumes from the last completed
step instead of replaying the whole turn — the backbone for long autonomous runs (blumi loop,
grid-dispatched work). Resume is at-least-once: idempotent tools + the doom-loop guard keep it safe.
Nothing to configure. To see it: start a multi-step task, restart the gateway
(launchctl kickstart -k gui/$(id -u)/com.blumi.serve) mid-run, reconnect — it picks up where it
left off.
A vector store (memories + memory_vectors + memories_fts in blumi.db) lets blumi recall
knowledge across sessions. Each turn, blumi embeds your latest message, finds the most relevant
memories, and injects them as background context — appended as a trailing user message so the
cached system prompt stays byte-identical (prompt caching keeps working). With embeddings off, it
degrades to FTS5 keyword recall.
Namespaces
-
user— durable facts/preferences about you. Private — never diffuses off the node. -
agent— the agent's own notes/conventions (the default for "remember this..."). Diffusable. -
project:<hash>— project-scoped knowledge. Diffusable.
Using it from chat — the memory tool:
-
add— remember a fact (mirrored toMEMORY.md/USER.mdand embedded, dedup-gated). -
query— semantic search over everything remembered. -
read/replace/remove— manage the file-backed entries.
Tip: phrasing decides the namespace. "Remember about me: I prefer dark mode" →
user(private). "Remember this project convention: ..." →agent/project(shareable across the grid).
Memory that only grows becomes noise. blumi applies the governance from the SEDM paper:
-
Write-admission (dedup). Before inserting, blumi checks the closest existing memory; if it's a
near-duplicate (cosine ≥
dedup_threshold, default 0.92) it merges and bumps utility instead of piling up. -
Utility scoring. Every time a memory is recalled or merged, its
hits/utilityrise — so what actually helps surfaces more. -
Consolidation & eviction. A periodic sweep folds near-duplicate clusters into the
highest-utility member and soft-evicts the weakest once a namespace exceeds
max_per_namespace. -
Cross-peer diffusion. The sweep pushes high-value, non-
usermemories to live grid peers (POST /api/grid/memory, authenticated by the shared grid secret). Each receiver re-admits the memory through its own dedup gate and tags it with the origin node, so memories never loop (A→B→A). What one machine learns, the others pick up. Yourusernamespace never leaves the node.
Diffusion needs the grid enabled on the participating nodes and memory.diffuse = true
(default). The sweep runs every memory.sweep_secs (default 60s), so a new shareable memory appears
on peers within ~a minute.
See diffusion happen (deterministic check on each machine):
sqlite3 ~/.blumi/blumi.db \ "SELECT namespace, origin, substr(text,1,60) FROM memories WHERE status='active';"
On the node that learned it, the row has origin = ''; on a peer that received it, origin is the
sender's node name.
Index a repo into a sibling ~/.blumi/knowledge.db and search it by meaning or keyword (hybrid:
FTS5 first for exact/symbol precision, vector fill for semantic recall). Indexing is gitignore-aware
and diff-aware — re-running only touches changed files (by SHA). Symbol extraction is lightweight
(per-language, no external grammars), and asset/binary files (svg, images, lockfiles...) are skipped.
blumi knowledge ingest ~/code/my-repo # index a repo/dir (incremental) blumi knowledge search "where is auth handled" blumi knowledge status # files · symbols · vectors blumi knowledge list # indexed sources blumi knowledge remove <source> # drop an indexed source
-
code_search— find relevant functions/types/code by meaning or keywords; returns file:line + snippet. Use it to locate where something is implemented before reading/editing. -
code_retrieve— fetch a file's indexed symbols (with snippets), or one symbol by name.
Control center → Code: index a repo (path on the gateway machine, with live progress), list/remove sources, and search code with file:line + snippet result cards.
The index is local to each node (the orchestrator's
knowledge.db). Re-ingest is cheap, so keep it fresh as the code changes.
In ~/.blumi/settings.json (every field shown is the default — omit to keep it):
{
"embeddings": {
"enabled": true,
"backend": "local",
"model": "bge-small-en-v1.5",
"dim": 384
},
"memory": {
"enabled": true,
"recall_k": 5,
"dedup_threshold": 0.92,
"max_per_namespace": 2000,
"diffuse": true,
"sweep_secs": 60
},
"knowledge": {
"enabled": true,
"max_file_kb": 256,
"exclude": []
}
}-
embeddings.backend—local(bundled ONNX model, fully offline),openai(a configured OpenAI-compatible/embeddingsendpoint, e.g. Ollama/llama.cpp/cloud), orgrid(offload to a peer). -
Lean / constrained install: set
embeddings.enabled = falseto skip the model entirely — memory and code search then run on FTS5 keyword matching, no download.
The bundled-embeddings binary is the default; a --no-default-features build of blumi-llm omits the
ONNX runtime for a leaner artifact.
-
First memory/code op is slow, then fast. That's the one-time ~130 MB model download
(
~/.blumi/models). It runs in the background; recall falls back to keyword until it's ready, and a later sweep backfills any vectors. Confirm withdu -sh ~/.blumi/models(~130 MB when complete). -
Search returns irrelevant results. Make sure the repo is indexed (
blumi knowledge status). Symbol/keyword queries are answered FTS-first, so exact names land precisely; very broad queries lean on vectors. -
Diffusion didn't reach a peer. Both nodes must run a build with
/api/grid/memory, share the grid secret, and have the grid up; the memory must be in a non-usernamespace; allow one sweep (~sweep_secs). Theusernamespace never diffuses by design.
See also: Grid (distributed) · Configuration · Troubleshooting.