-
Notifications
You must be signed in to change notification settings - Fork 0
Memory and Knowledge
Memory & Knowledge is how blumi — a local-first, bring-your-own-key (BYOK) AI coding agent — remembers facts across sessions, survives crashes mid-task, learns which memories actually work, shares knowledge across machines, and understands your codebase. It needs three things most assistants lack: memory that survives a crash, memory that learns and travels across machines, and a real understanding of your code. blumi ships all of it local-first — a bundled embedding model means no API key for memory or code search, and nothing leaves your machines.
This page covers durable execution, semantic long-term memory (RAG), SEDM cross-peer diffusion, and the native code knowledge base — plus how to configure, drive, and test each.
TL;DR — Memory & Knowledge key facts
-
Local-first by default. Memory and code search run on a bundled on-device embedding model (
bge-small-en-v1.5), so they need no API key and send nothing to the cloud. - Semantic long-term memory (RAG). Each turn, blumi recalls the most relevant past memories and injects them as background context; RAG = retrieval-augmented generation.
- Durable execution. Every tool step is checkpointed to SQLite, so a crash or restart resumes mid-turn instead of replaying the whole turn.
- SEDM governance. SEDM = self-evolving distributed memory: dedup on write, value-based fitness, consolidation/eviction, and probationary failure→fix learning keep memory useful, not just large.
-
Cross-peer diffusion. High-value, non-private memories spread across the grid so what one machine learns, the others pick up — your
user(private) namespace never leaves the node. - Native code knowledge base. Index any repo and search it by meaning or keyword (hybrid vector + FTS5) from the CLI, agent tools, or the blugo Code tab.
- Graceful offline fallback. If embeddings are off or the network is unreachable, memory and knowledge fall back to FTS5 keyword search — they keep working.
| Capability | What it does | Drive it with |
|---|---|---|
| Durable execution | Checkpoints each tool step; resumes mid-turn after a crash/restart | automatic |
| Semantic memory (RAG) | Recalls relevant memories each turn (vectors, FTS5 fallback) | automatic + memory tool |
| SEDM governance | Dedup on write, value-based fitness, consolidation/eviction, probationary learning | automatic |
| Cross-peer diffusion | Spreads worth-sharing memories across the grid | automatic (grid + diffuse) |
| Code knowledge base | Index a repo; hybrid vector/FTS5 code search |
blumi knowledge, code_search/code_retrieve, blugo Code tab |
| GPU acceleration | Runs the bundled embedder on Apple CoreML / NVIDIA CUDA when present | automatic · blumi accel doctor · /accel
|
| Self-healing | Classifies tool failures → budgeted recovery; learns failure→fix episodes | automatic · heal config · /heal
|
Everything is on by default. The first memory/knowledge operation downloads the embedding model
(~130 MB, bge-small-en-v1.5) once into ~/.blumi/models; after that it runs fully offline. If a
node can't reach the network, memory/knowledge transparently fall back to keyword (FTS5) search.
Durable execution means a crashed or restarted run resumes mid-turn instead of replaying the whole turn. Every tool step within a turn is checkpointed to SQLite (checkpoints table in blumi.db). If the
process crashes or the gateway is restarted mid-turn, the next run resumes from the last completed
step — the backbone for long autonomous runs (blumi loop,
grid-dispatched work). Resume is at-least-once: idempotent tools + the doom-loop guard keep it safe.
Nothing to configure. To see it: start a multi-step task, restart the gateway
(launchctl kickstart -k gui/$(id -u)/com.blumi.serve) mid-run, reconnect — it picks up where it
left off.
blumi's long-term memory uses retrieval-augmented generation (RAG): each turn it embeds your latest message, finds the most relevant past memories, and injects them as context. A vector store (memories + memory_vectors + memories_fts in blumi.db) lets blumi recall
knowledge across sessions. The recalled memories are added as background context — appended as a trailing user message so the
cached system prompt stays byte-identical (prompt caching keeps working). With embeddings off, it
degrades to FTS5 keyword recall.
Namespaces
-
user— durable facts/preferences about you. Private — never diffuses off the node. -
agent— the agent's own notes/conventions (the default for "remember this..."). Diffusable. -
project:<hash>— project-scoped knowledge. Diffusable.
Using it from chat — the memory tool:
-
add— remember a fact (mirrored toMEMORY.md/USER.mdand embedded, dedup-gated). -
query— semantic search over everything remembered. -
read/replace/remove— manage the file-backed entries.
Tip: phrasing decides the namespace. "Remember about me: I prefer dark mode" →
user(private). "Remember this project convention: ..." →agent/project(shareable across the grid).
Memory isn't a black box: you can inspect and curate individual entries, not just the MEMORY.md/USER.md files. Each entry shows its namespace, kind, utility, hit count, origin, and timestamps.
- Pin an entry to protect it — pinned entries are exempt from eviction + consolidation, so a critical decision never drifts or gets merged away.
- Edit an entry's text — blumi re-embeds it (and keeps the FTS5 index in sync), so search stays correct.
- Delete an entry outright (its vector + graph edges go too).
Drive it from the Web UI's Memory tab, or the API (behind the gateway password):
POST /api/memory/list {"namespace":"agent","status":"active","limit":200}
POST /api/memory/pin {"id":123,"pinned":true}
POST /api/memory/update {"id":123,"text":"corrected fact"}
POST /api/memory/delete {"id":123}
Editing or pinning a user-namespace entry stays local — it never changes what crosses the grid
(diffusion still excludes user% and ignores pinned).
SEDM (self-evolving distributed memory) is the governance layer that keeps blumi's memory useful instead of just large. Memory that only grows becomes noise, so blumi applies the governance from the SEDM paper:
-
Write-admission (dedup). Before inserting, blumi checks the closest existing memory; if it's a
near-duplicate (cosine ≥
dedup_threshold, default 0.92) it merges and bumps utility instead of piling up. -
Utility scoring. Every time a memory is recalled or merged, its
hits/utilityrise — so what actually helps surfaces more. -
Consolidation & eviction. A periodic sweep folds near-duplicate clusters into the
highest-utility member and soft-evicts the weakest once a namespace exceeds
max_per_namespace. -
Cross-peer diffusion. The sweep pushes high-value, non-
usermemories to live grid peers (POST /api/grid/memory, authenticated by the shared grid secret). Each receiver re-admits the memory through its own dedup gate and tags it with the origin node, so memories never loop (A→B→A). What one machine learns, the others pick up. Yourusernamespace never leaves the node.
Beyond dedup + utility, memory now learns from what actually works — outcome, not just retrieval:
- Probationary failure→fix. A guided recovery is stored as a pending hypothesis — physically present but invisible to recall/mining/diffusion — and only promoted to a real fix once the same tool is observed to succeed on a later step (cross-step verify, on by default), with provenance. Unverified guesses never get recalled as "fixes"; un-promoted hypotheses are reaped by the sweep.
-
Value-based fitness. Each memory carries a learned
valuedistinct from retrievalutility: rewarded when it was in context for a productive step, decayed on failures, and bumped when a peer independently holds it (cross-node consensus). Eviction drops the lowest-valuememory first — so genuinely-useful memories survive, not merely frequently-retrieved ones. (With RPL-Judgement on, its "regret" Error-Delta feeds this same signal.) - Structure-aware recall. Recall re-ranks a wider candidate pool by memory-graph degree (hub-suppression — generic "matches-everything" notes stop dominating) and is seeded from the last couple of user turns, not just the latest line.
-
Conflict resolution. Reversible supersede + conflict detection settle contradictions: with
memory.resolve_conflictson, the sweep asks the LLM to classify same-topic pairs and supersedes the outdated side (conservative — ambiguous pairs are left untouched). Diffusion shares the highest-value memories first. -
Retrospection (daily). A differential replay of session transcripts (
memory.retrospect, on by default) consolidates each session's new messages — read from a watermark, never the whole history — into durable learnings once permemory.retrospect_hours(default 24). So memory gets steadily more solid without re-iterating the chat each turn. Learnings are stored in theagentnamespace, so they diffuse across the grid like any other shared memory (the privateusernamespace never leaves the node). -
Curation everywhere. The sweep below now runs for standalone
blumi run/blumi tuitoo (not only the gateway), so memory is curated and the recall graph is built on every install.
Diffusion needs the grid enabled on the participating nodes and memory.diffuse = true
(default). The sweep runs every memory.sweep_secs (default 60s), so a new shareable memory appears
on peers within ~a minute.
See diffusion happen (deterministic check on each machine):
sqlite3 ~/.blumi/blumi.db \ "SELECT namespace, origin, substr(text,1,60) FROM memories WHERE status='active';"
On the node that learned it, the row has origin = ''; on a peer that received it, origin is the
sender's node name.
blumi's native code knowledge base indexes a repo into a sibling ~/.blumi/knowledge.db and searches it by meaning or keyword (hybrid:
FTS5 first for exact/symbol precision, vector fill for semantic recall). Indexing is gitignore-aware
and diff-aware — re-running only touches changed files (by SHA). Symbol extraction is lightweight
(per-language, no external grammars), and asset/binary files (svg, images, lockfiles...) are skipped.
By default the graph is Tier-0: edges are name co-occurrence (a symbol's body mentions another
symbol's name). Build with --features code-graph and set knowledge.graph.mode = "structural"
to parse with tree-sitter instead — declarations gain a scope-qualified name / parent / signature,
and references resolve into typed edges (call / type / implements / contains, marked
resolved when unambiguous). That makes code_graph impact <symbol> a precise change blast
radius, and (when knowledge.graph.rpl_impact is on) feeds the RPL blast radius so
editing a heavily-referenced file faces tighter review. Languages: Rust, Python, Go, JavaScript,
TypeScript. The default build stays native-lite (regex symbols, no C grammars); code_graph still
works over the lite edges, just less precisely.
blumi knowledge ingest ~/code/my-repo # index a repo/dir (incremental) blumi knowledge search "where is auth handled" blumi knowledge status # files · symbols · vectors blumi knowledge list # indexed sources blumi knowledge remove <source> # drop an indexed source blumi knowledge impact <symbol> # transitive callers — the change blast radius blumi knowledge callers <symbol> # who calls/uses it (callees = what it calls) blumi knowledge implementers <trait> # types implementing a trait
The same code-graph queries are also one tap away in the TUI (/knowledge lists the
most-connected "hot-spot" symbols), the gateway (POST /api/knowledge/graph), and blugo
(each Code-tab search result has an Impact action showing its change blast radius).
-
code_search— find relevant functions/types/code by meaning or keywords; returns file:line + snippet. Use it to locate where something is implemented before reading/editing. -
code_retrieve— fetch a file's indexed symbols (with snippets), or one symbol by name. -
code_graph— query the typed code graph around a symbol:callers(who calls/uses it),callees(what it calls),impact(transitive callers — the change blast radius of editing it), orimplementers(types implementing a trait). Use it before an edit to see what depends on something. (code_neighbors/code_pathcover raw graph traversal.)
Control center → Code: index a repo (path on the gateway machine, with live progress), list/remove sources, and search code with file:line + snippet result cards.
The index is local to each node (the orchestrator's
knowledge.db). Re-ingest is cheap, so keep it fresh as the code changes.
Does blumi use the GPU? Yes — blumi runs its bundled embedder on the GPU when one is detected and falls back to CPU otherwise, and no LLM ever runs in-process. All in-process numeric work is the bundled embedder (the ONNX bge-small-en-v1.5 model behind
memory and code search). blumi accel doctor prints exactly what's
detected and how to wire more.
| Accelerator | How | Notes |
|---|---|---|
| Apple CoreML / Metal | On by default on Apple Silicon — no flag | baked into the ONNX Runtime blumi already downloads (≈zero extra binary size) |
| NVIDIA CUDA | Opt-in: curl ... | BLUMI_CUDA=1 sh, or cargo install --features gpu-cuda
|
ships libonnxruntime.so next to the binary; installer verifies it loads, else falls back to a lean build |
| CPU | Automatic fallback everywhere | the lean default on Linux/Windows/CI (FTS5 keyword search) |
blumi accel detect # the chosen accelerator on one line (apple-coreml | cuda | cpu) blumi accel status # detection + the embedder's active execution provider + backends blumi accel doctor # full report + copy-paste setup hints (local GPU server / CUDA)
-
The bigger GPU win is the LLM, not the embedder. Since blumi is BYOK and runs no model
in-process, point a provider at a local GPU server — the
ollama,local-mlx, andlocal-cudapresets ship in config. On Linux/NVIDIA this is the recommended path: a lean blumi + Ollama for the LLM and GPU embeddings, with no fragile native linking. -
Grid-aware offload. Every node reports its accelerator in
/api/grid/metrics(strongest = CUDA > Apple CoreML > CPU). A CPU node can setembeddings.backend = "grid"to offload embedding to the strongest GPU peer (secret-authedPOST /api/grid/embed) with a local/FTS5 fallback — so a lean Linux box rides a Mac/CUDA peer for vectors. See Grid → GPU-aware embedding offload. The active accelerator shows in the TUI (/accel),/api/status, and the blugo Status/Grid panels.
The heavy embedder build is Apple-default and opt-in elsewhere, so a plain Linux/CI install stays lean and never does a multi-GB native link unless you ask for GPU (
BLUMI_CUDA=1/--features gpu-cuda). Mismatch (e.g.acceleration.mode = "cuda"on a CPU build) degrades to CPU with a loud warning —blumi accel doctorshows the compiled features.
In ~/.blumi/settings.json (every field shown is the default — omit to keep it):
{
"embeddings": {
"enabled": true,
"backend": "local",
"model": "bge-small-en-v1.5",
"dim": 384
},
"memory": {
"enabled": true,
"recall_k": 5,
"dedup_threshold": 0.92,
"max_per_namespace": 2000,
"diffuse": true,
"sweep_secs": 60
},
"knowledge": {
"enabled": true,
"max_file_kb": 256,
"exclude": []
},
"acceleration": {
"mode": "auto"
},
"heal": {
"enabled": true,
"recovery_budget": 2,
"verify": true,
"learn": true,
"evolve": "auto",
"redact_paths": true
}
}-
embeddings.backend—local(bundled ONNX model, fully offline),openai(a configured OpenAI-compatible/embeddingsendpoint, e.g. Ollama/llama.cpp/cloud), orgrid(offload to the strongest GPU peer viaPOST /api/grid/embed, with a local/FTS5 fallback — see GPU acceleration). -
acceleration.mode—auto(detect),cpu,apple(CoreML/Metal), orcudafor the bundled embedder. See GPU acceleration. -
heal— self-healing reflex recovery + failure→fix learning + evolution; learned fixes are stored asagent-namespace memories (so they diffuse). See Self-Management → Self-healing. -
Lean / constrained install: set
embeddings.enabled = falseto skip the model entirely — memory and code search then run on FTS5 keyword matching, no download.
The bundled-embeddings binary is the default; a --no-default-features build of blumi-llm omits the
ONNX runtime for a leaner artifact.
-
First memory/code op is slow, then fast. That's the one-time ~130 MB model download
(
~/.blumi/models). It runs in the background; recall falls back to keyword until it's ready, and a later sweep backfills any vectors. Confirm withdu -sh ~/.blumi/models(~130 MB when complete). -
Search returns irrelevant results. Make sure the repo is indexed (
blumi knowledge status). Symbol/keyword queries are answered FTS-first, so exact names land precisely; very broad queries lean on vectors. -
Diffusion didn't reach a peer. Both nodes must run a build with
/api/grid/memory, share the grid secret, and have the grid up; the memory must be in a non-usernamespace; allow one sweep (~sweep_secs). Theusernamespace never diffuses by design.
No. blumi's memory and code knowledge base run on a bundled, on-device embedding model (bge-small-en-v1.5), so they need no API key. You only bring your own key (BYOK) for the LLM that drives the agent and, optionally, for voice — memory and search work without one.
No — memory is local-first and stored in SQLite (~/.blumi/blumi.db). The bundled embedder runs on your machine, so nothing leaves it for memory or code search. With cross-peer diffusion on, high-value memories spread only to your own grid peers over an authenticated channel — never to a third party — and your private user namespace never leaves the originating node at all.
A plain vector database only stores and retrieves embeddings; blumi adds SEDM governance on top. blumi dedups on write (merging near-duplicates instead of piling up), scores each memory by retrieval utility and a learned value, consolidates and evicts the weakest entries, learns verified failure→fix episodes, and diffuses worthwhile memories across machines. Memory that only grows becomes noise — SEDM keeps it useful.
Through durable execution: every tool step is checkpointed to SQLite, so a crashed or restarted run resumes from the last completed step instead of replaying the whole turn. This is the backbone for long autonomous runs like blumi loop and grid-dispatched work.
blumi learns from outcomes, not just retrieval (v0.5.0). A guided recovery is stored as a probationary pending hypothesis and only promoted to a real fix once the same tool later succeeds (cross-step verify). Each memory also carries a learned value — rewarded when it helped a productive step, decayed on failures, and bumped when a peer independently holds it — and eviction drops the lowest-value memory first, so genuinely useful memories survive.
Yes. After the one-time ~130 MB model download into ~/.blumi/models, memory and code search run fully offline. If a node can't reach the network — or if you set embeddings.enabled = false — memory and knowledge transparently fall back to FTS5 keyword search and keep working.
See also: Grid (distributed) · Configuration · Troubleshooting.