Memory and Knowledge

ankurCES edited this page Jun 6, 2026 · 10 revisions

Memory & Knowledge

blumi is built for long-running and distributed agentic work, which needs three things most assistants lack: memory that survives a crash, memory that learns and travels across machines, and a real understanding of your codebase. blumi ships all of it local-first — a bundled embedding model means no API key and nothing leaves your machines.

This page covers durable execution, semantic long-term memory, SEDM cross-peer diffusion, and the native code knowledge base — plus how to configure, drive, and test each.

At a glance

Capability	What it does	Drive it with
Durable execution	Checkpoints each tool step; resumes mid-turn after a crash/restart	automatic
Semantic memory (RAG)	Recalls relevant memories each turn (vectors, FTS5 fallback)	automatic + `memory` tool
SEDM governance	Dedup on write, utility scoring, consolidation/eviction	automatic
Cross-peer diffusion	Spreads worth-sharing memories across the grid	automatic (grid + `diffuse`)
Code knowledge base	Index a repo; hybrid vector/FTS5 code search	`blumi knowledge`, `code_search`/`code_retrieve`, blugo Code tab
GPU acceleration	Runs the bundled embedder on Apple CoreML / NVIDIA CUDA when present	automatic · `blumi accel doctor` · `/accel`
Self-healing	Classifies tool failures → budgeted recovery; learns failure→fix episodes	automatic · `heal` config · `/heal`

Everything is on by default. The first memory/knowledge operation downloads the embedding model (~130 MB, bge-small-en-v1.5) once into ~/.blumi/models; after that it runs fully offline. If a node can't reach the network, memory/knowledge transparently fall back to keyword (FTS5) search.

Durable execution

Every tool step within a turn is checkpointed to SQLite (checkpoints table in blumi.db). If the process crashes or the gateway is restarted mid-turn, the next run resumes from the last completed step instead of replaying the whole turn — the backbone for long autonomous runs (blumi loop, grid-dispatched work). Resume is at-least-once: idempotent tools + the doom-loop guard keep it safe.

Nothing to configure. To see it: start a multi-step task, restart the gateway (launchctl kickstart -k gui/$(id -u)/com.blumi.serve) mid-run, reconnect — it picks up where it left off.

Semantic long-term memory (RAG)

A vector store (memories + memory_vectors + memories_fts in blumi.db) lets blumi recall knowledge across sessions. Each turn, blumi embeds your latest message, finds the most relevant memories, and injects them as background context — appended as a trailing user message so the cached system prompt stays byte-identical (prompt caching keeps working). With embeddings off, it degrades to FTS5 keyword recall.

Namespaces

user — durable facts/preferences about you. Private — never diffuses off the node.
agent — the agent's own notes/conventions (the default for "remember this..."). Diffusable.
project:<hash> — project-scoped knowledge. Diffusable.

Using it from chat — the memory tool:

add — remember a fact (mirrored to MEMORY.md/USER.md and embedded, dedup-gated).
query — semantic search over everything remembered.
read / replace / remove — manage the file-backed entries.

Tip: phrasing decides the namespace. "Remember about me: I prefer dark mode" → user (private). "Remember this project convention: ..." → agent/project (shareable across the grid).

SEDM — self-evolving, distributed memory

Memory that only grows becomes noise. blumi applies the governance from the SEDM paper:

Write-admission (dedup). Before inserting, blumi checks the closest existing memory; if it's a near-duplicate (cosine ≥ dedup_threshold, default 0.92) it merges and bumps utility instead of piling up.
Utility scoring. Every time a memory is recalled or merged, its hits/utility rise — so what actually helps surfaces more.
Consolidation & eviction. A periodic sweep folds near-duplicate clusters into the highest-utility member and soft-evicts the weakest once a namespace exceeds max_per_namespace.
Cross-peer diffusion. The sweep pushes high-value, non-user memories to live grid peers (POST /api/grid/memory, authenticated by the shared grid secret). Each receiver re-admits the memory through its own dedup gate and tags it with the origin node, so memories never loop (A→B→A). What one machine learns, the others pick up. Your user namespace never leaves the node.

Diffusion needs the grid enabled on the participating nodes and memory.diffuse = true (default). The sweep runs every memory.sweep_secs (default 60s), so a new shareable memory appears on peers within ~a minute.

See diffusion happen (deterministic check on each machine):

sqlite3 ~/.blumi/blumi.db \
 "SELECT namespace, origin, substr(text,1,60) FROM memories WHERE status='active';"

On the node that learned it, the row has origin = ''; on a peer that received it, origin is the sender's node name.

Native code knowledge base

Index a repo into a sibling ~/.blumi/knowledge.db and search it by meaning or keyword (hybrid: FTS5 first for exact/symbol precision, vector fill for semantic recall). Indexing is gitignore-aware and diff-aware — re-running only touches changed files (by SHA). Symbol extraction is lightweight (per-language, no external grammars), and asset/binary files (svg, images, lockfiles...) are skipped.

CLI — `blumi knowledge`

blumi knowledge ingest ~/code/my-repo # index a repo/dir (incremental)
blumi knowledge search "where is auth handled"
blumi knowledge status # files · symbols · vectors
blumi knowledge list # indexed sources
blumi knowledge remove <source> # drop an indexed source

From chat — agent tools

code_search — find relevant functions/types/code by meaning or keywords; returns file:line + snippet. Use it to locate where something is implemented before reading/editing.
code_retrieve — fetch a file's indexed symbols (with snippets), or one symbol by name.

From the phone — blugo Code tab

Control center → Code: index a repo (path on the gateway machine, with live progress), list/remove sources, and search code with file:line + snippet result cards.

The index is local to each node (the orchestrator's knowledge.db). Re-ingest is cheap, so keep it fresh as the code changes.

GPU acceleration

All in-process numeric work is the bundled embedder (the ONNX bge-small-en-v1.5 model behind memory and code search) — blumi never runs an LLM in-process. blumi runs that embedder on the GPU when one is detected and falls back to CPU otherwise. blumi accel doctor prints exactly what's detected and how to wire more.

Accelerator	How	Notes
Apple CoreML / Metal	On by default on Apple Silicon — no flag	baked into the ONNX Runtime blumi already downloads (≈zero extra binary size)
NVIDIA CUDA	Opt-in: `curl ... \| BLUMI_CUDA=1 sh`, or `cargo install --features gpu-cuda`	ships `libonnxruntime.so` next to the binary; installer verifies it loads, else falls back to a lean build
CPU	Automatic fallback everywhere	the lean default on Linux/Windows/CI (FTS5 keyword search)

blumi accel detect # the chosen accelerator on one line (apple-coreml | cuda | cpu)
blumi accel status # detection + the embedder's active execution provider + backends
blumi accel doctor # full report + copy-paste setup hints (local GPU server / CUDA)

The bigger GPU win is the LLM, not the embedder. Since blumi is BYOK and runs no model in-process, point a provider at a local GPU server — the ollama, local-mlx, and local-cuda presets ship in config. On Linux/NVIDIA this is the recommended path: a lean blumi + Ollama for the LLM and GPU embeddings, with no fragile native linking.
Grid-aware offload. Every node reports its accelerator in /api/grid/metrics (strongest = CUDA > Apple CoreML > CPU). A CPU node can set embeddings.backend = "grid" to offload embedding to the strongest GPU peer (secret-authed POST /api/grid/embed) with a local/FTS5 fallback — so a lean Linux box rides a Mac/CUDA peer for vectors. See Grid → GPU-aware embedding offload. The active accelerator shows in the TUI (/accel), /api/status, and the blugo Status/Grid panels.

The heavy embedder build is Apple-default and opt-in elsewhere, so a plain Linux/CI install stays lean and never does a multi-GB native link unless you ask for GPU (BLUMI_CUDA=1 / --features gpu-cuda). Mismatch (e.g. acceleration.mode = "cuda" on a CPU build) degrades to CPU with a loud warning — blumi accel doctor shows the compiled features.

Configuration

In ~/.blumi/settings.json (every field shown is the default — omit to keep it):

{
 "embeddings": {
 "enabled": true,
 "backend": "local",
 "model": "bge-small-en-v1.5",
 "dim": 384
 },
 "memory": {
 "enabled": true,
 "recall_k": 5,
 "dedup_threshold": 0.92,
 "max_per_namespace": 2000,
 "diffuse": true,
 "sweep_secs": 60
 },
 "knowledge": {
 "enabled": true,
 "max_file_kb": 256,
 "exclude": []
 },
 "acceleration": {
 "mode": "auto"
 },
 "heal": {
 "enabled": true,
 "recovery_budget": 2,
 "verify": false,
 "learn": true,
 "evolve": "auto",
 "redact_paths": true
 }
}

embeddings.backend — local (bundled ONNX model, fully offline), openai (a configured OpenAI-compatible /embeddings endpoint, e.g. Ollama/llama.cpp/cloud), or grid (offload to the strongest GPU peer via POST /api/grid/embed, with a local/FTS5 fallback — see GPU acceleration).
acceleration.mode — auto (detect), cpu, apple (CoreML/Metal), or cuda for the bundled embedder. See GPU acceleration.
heal — self-healing reflex recovery + failure→fix learning + evolution; learned fixes are stored as agent-namespace memories (so they diffuse). See Self-Management → Self-healing.
Lean / constrained install: set embeddings.enabled = false to skip the model entirely — memory and code search then run on FTS5 keyword matching, no download.

The bundled-embeddings binary is the default; a --no-default-features build of blumi-llm omits the ONNX runtime for a leaner artifact.

Troubleshooting

First memory/code op is slow, then fast. That's the one-time ~130 MB model download (~/.blumi/models). It runs in the background; recall falls back to keyword until it's ready, and a later sweep backfills any vectors. Confirm with du -sh ~/.blumi/models (~130 MB when complete).
Search returns irrelevant results. Make sure the repo is indexed (blumi knowledge status). Symbol/keyword queries are answered FTS-first, so exact names land precisely; very broad queries lean on vectors.
Diffusion didn't reach a peer. Both nodes must run a build with /api/grid/memory, share the grid secret, and have the grid up; the memory must be in a non-user namespace; allow one sweep (~sweep_secs). The user namespace never diffuses by design.

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory and Knowledge

Memory & Knowledge

At a glance

Durable execution

Semantic long-term memory (RAG)

SEDM — self-evolving, distributed memory

Native code knowledge base

CLI — `blumi knowledge`

From chat — agent tools

From the phone — blugo Code tab

GPU acceleration

Configuration

Troubleshooting

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

blumi wiki

Clone this wiki locally

Memory and Knowledge

Memory & Knowledge

At a glance

Durable execution

Semantic long-term memory (RAG)

SEDM — self-evolving, distributed memory

Native code knowledge base

CLI — blumi knowledge

From chat — agent tools

From the phone — blugo Code tab

GPU acceleration

Configuration

Troubleshooting

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

blumi wiki

Clone this wiki locally

CLI — `blumi knowledge`