Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Memory and Knowledge

ankurCES edited this page Jun 6, 2026 · 10 revisions

Memory & Knowledge

blumi is built for long-running and distributed agentic work, which needs three things most assistants lack: memory that survives a crash, memory that learns and travels across machines, and a real understanding of your codebase. blumi ships all of it local-first — a bundled embedding model means no API key and nothing leaves your machines.

This page covers durable execution, semantic long-term memory, SEDM cross-peer diffusion, and the native code knowledge base — plus how to configure, drive, and test each.


At a glance

Capability What it does Drive it with
Durable execution Checkpoints each tool step; resumes mid-turn after a crash/restart automatic
Semantic memory (RAG) Recalls relevant memories each turn (vectors, FTS5 fallback) automatic + memory tool
SEDM governance Dedup on write, utility scoring, consolidation/eviction automatic
Cross-peer diffusion Spreads worth-sharing memories across the grid automatic (grid + diffuse)
Code knowledge base Index a repo; hybrid vector/FTS5 code search blumi knowledge, code_search/code_retrieve, blugo Code tab

Everything is on by default. The first memory/knowledge operation downloads the embedding model (~130 MB, bge-small-en-v1.5) once into ~/.blumi/models; after that it runs fully offline. If a node can't reach the network, memory/knowledge transparently fall back to keyword (FTS5) search.


Durable execution

Every tool step within a turn is checkpointed to SQLite (checkpoints table in blumi.db). If the process crashes or the gateway is restarted mid-turn, the next run resumes from the last completed step instead of replaying the whole turn — the backbone for long autonomous runs (blumi loop, grid-dispatched work). Resume is at-least-once: idempotent tools + the doom-loop guard keep it safe.

Nothing to configure. To see it: start a multi-step task, restart the gateway (launchctl kickstart -k gui/$(id -u)/com.blumi.serve) mid-run, reconnect — it picks up where it left off.


Semantic long-term memory (RAG)

A vector store (memories + memory_vectors + memories_fts in blumi.db) lets blumi recall knowledge across sessions. Each turn, blumi embeds your latest message, finds the most relevant memories, and injects them as background context — appended as a trailing user message so the cached system prompt stays byte-identical (prompt caching keeps working). With embeddings off, it degrades to FTS5 keyword recall.

Namespaces

  • user — durable facts/preferences about you. Private — never diffuses off the node.
  • agent — the agent's own notes/conventions (the default for "remember this..."). Diffusable.
  • project:<hash> — project-scoped knowledge. Diffusable.

Using it from chat — the memory tool:

  • add — remember a fact (mirrored to MEMORY.md/USER.md and embedded, dedup-gated).
  • query — semantic search over everything remembered.
  • read / replace / remove — manage the file-backed entries.

Tip: phrasing decides the namespace. "Remember about me: I prefer dark mode" → user (private). "Remember this project convention: ..." → agent/project (shareable across the grid).


SEDM — self-evolving, distributed memory

Memory that only grows becomes noise. blumi applies the governance from the SEDM paper:

  • Write-admission (dedup). Before inserting, blumi checks the closest existing memory; if it's a near-duplicate (cosine ≥ dedup_threshold, default 0.92) it merges and bumps utility instead of piling up.
  • Utility scoring. Every time a memory is recalled or merged, its hits/utility rise — so what actually helps surfaces more.
  • Consolidation & eviction. A periodic sweep folds near-duplicate clusters into the highest-utility member and soft-evicts the weakest once a namespace exceeds max_per_namespace.
  • Cross-peer diffusion. The sweep pushes high-value, non-user memories to live grid peers (POST /api/grid/memory, authenticated by the shared grid secret). Each receiver re-admits the memory through its own dedup gate and tags it with the origin node, so memories never loop (A→B→A). What one machine learns, the others pick up. Your user namespace never leaves the node.

Diffusion needs the grid enabled on the participating nodes and memory.diffuse = true (default). The sweep runs every memory.sweep_secs (default 60s), so a new shareable memory appears on peers within ~a minute.

See diffusion happen (deterministic check on each machine):

sqlite3 ~/.blumi/blumi.db \
 "SELECT namespace, origin, substr(text,1,60) FROM memories WHERE status='active';"

On the node that learned it, the row has origin = ''; on a peer that received it, origin is the sender's node name.


Native code knowledge base

Index a repo into a sibling ~/.blumi/knowledge.db and search it by meaning or keyword (hybrid: FTS5 first for exact/symbol precision, vector fill for semantic recall). Indexing is gitignore-aware and diff-aware — re-running only touches changed files (by SHA). Symbol extraction is lightweight (per-language, no external grammars), and asset/binary files (svg, images, lockfiles...) are skipped.

CLI — blumi knowledge

blumi knowledge ingest ~/code/my-repo # index a repo/dir (incremental)
blumi knowledge search "where is auth handled"
blumi knowledge status # files · symbols · vectors
blumi knowledge list # indexed sources
blumi knowledge remove <source> # drop an indexed source

From chat — agent tools

  • code_search — find relevant functions/types/code by meaning or keywords; returns file:line + snippet. Use it to locate where something is implemented before reading/editing.
  • code_retrieve — fetch a file's indexed symbols (with snippets), or one symbol by name.

From the phone — blugo Code tab

Control center → Code: index a repo (path on the gateway machine, with live progress), list/remove sources, and search code with file:line + snippet result cards.

The index is local to each node (the orchestrator's knowledge.db). Re-ingest is cheap, so keep it fresh as the code changes.


Configuration

In ~/.blumi/settings.json (every field shown is the default — omit to keep it):

{
 "embeddings": {
 "enabled": true,
 "backend": "local",
 "model": "bge-small-en-v1.5",
 "dim": 384
 },
 "memory": {
 "enabled": true,
 "recall_k": 5,
 "dedup_threshold": 0.92,
 "max_per_namespace": 2000,
 "diffuse": true,
 "sweep_secs": 60
 },
 "knowledge": {
 "enabled": true,
 "max_file_kb": 256,
 "exclude": []
 }
}
  • embeddings.backendlocal (bundled ONNX model, fully offline), openai (a configured OpenAI-compatible /embeddings endpoint, e.g. Ollama/llama.cpp/cloud), or grid (offload to a peer).
  • Lean / constrained install: set embeddings.enabled = false to skip the model entirely — memory and code search then run on FTS5 keyword matching, no download.

The bundled-embeddings binary is the default; a --no-default-features build of blumi-llm omits the ONNX runtime for a leaner artifact.


Troubleshooting

  • First memory/code op is slow, then fast. That's the one-time ~130 MB model download (~/.blumi/models). It runs in the background; recall falls back to keyword until it's ready, and a later sweep backfills any vectors. Confirm with du -sh ~/.blumi/models (~130 MB when complete).
  • Search returns irrelevant results. Make sure the repo is indexed (blumi knowledge status). Symbol/keyword queries are answered FTS-first, so exact names land precisely; very broad queries lean on vectors.
  • Diffusion didn't reach a peer. Both nodes must run a build with /api/grid/memory, share the grid secret, and have the grid up; the memory must be in a non-user namespace; allow one sweep (~sweep_secs). The user namespace never diffuses by design.

See also: Grid (distributed) · Configuration · Troubleshooting.

Clone this wiki locally

AltStyle によって変換されたページ (->オリジナル) /