Configuration

ankurCES edited this page Jun 7, 2026 · 14 revisions

Everything blumi does is driven by one JSON file — ~/.blumi/settings.json. This page is the complete reference: every section, its fields, its defaults, and copy-pasteable examples. All sections are optional — blumi ships with sensible defaults, so an empty file is valid and a one-line provider key is usually all you need to start.

The annotated JSON blocks below use // comments for clarity. Strip them if your editor is strict — settings.json is plain JSON.

How config is loaded (layering)

blumi merges these layers in order, via figment; later layers win:

1. Built-in defaults
2. ~/.blumi/settings.json: global, your main file
3. ./.blumi/settings.json: per-project overrides (commit-safe project tweaks; secrets stay global)
4. Environment variables: prefix BLUMI_, nest with __ (e.g. BLUMI_LLM__MODEL=claude-opus-4-5, BLUMI_GRID__SECRET=..., BLUMI_PROVIDERS__OPENAI__API_KEY=...)

settings.json is written mode 0600 and holds secrets — never commit it (the repo's .blumi/ is gitignored). Per-invocation flags (--provider, --model, --persona, --yolo) beat all of the above.
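The layering above can be pictured as a chain of deep merges. The sketch below is an illustration in Python, not blumi's implementation (blumi uses figment, in Rust); the function names, the lowercasing of environment keys, and the sample values are assumptions made for the example.

```python
import json
from typing import Any, Mapping


def deep_merge(base: dict, override: Mapping[str, Any]) -> dict:
    """Return a new dict where `override` wins; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def env_layer(environ: Mapping[str, str], prefix: str = "BLUMI_") -> dict:
    """Turn BLUMI_A__B=value into {"a": {"b": "value"}}."""
    layer: dict = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        # Assumption: keys are lowercased so they line up with the JSON keys.
        path = name[len(prefix):].lower().split("__")
        node = layer
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return layer


def load_config(defaults: dict, global_file: dict, project_file: dict,
                environ: Mapping[str, str]) -> dict:
    """Apply the four layers in order; each later layer overrides earlier ones."""
    config: dict = {}
    for layer in (defaults, global_file, project_file, env_layer(environ)):
        config = deep_merge(config, layer)
    return config


# Illustrative values only.
defaults = {"llm": {"provider": "anthropic", "model": "", "temperature": 0.7}}
global_file = {"llm": {"model": "claude-sonnet-4-5"}}
project_file = {"llm": {"temperature": 0.2}}
environ = {"BLUMI_LLM__MODEL": "claude-opus-4-5", "HOME": "/home/me"}

config = load_config(defaults, global_file, project_file, environ)
print(json.dumps(config, indent=2))
```

Here the environment variable wins for llm.model, the project file wins for llm.temperature, and llm.provider falls through from the defaults. Per-invocation flags would be one more layer applied last.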

Fastest path: the login wizard

blumi login

Pick a provider, paste a key (or endpoint), choose a model — it writes the right bits into settings.json. Re-run any time to switch. That's all most people need; the rest of this page is for tuning.

Applying changes

TUI / gateway session: ask the agent to reload_self, or use the web/phone "reload" — it re-reads settings.json without losing the conversation.
Process-level settings (bind host/port, the web password, the grid identity) are read once at startup, so they need a restart: launchctl kickstart -k gui/$(id -u)/com.blumi.serve (macOS) / systemctl --user restart blumi-serve (Linux), or just relaunch blumi tui.

Providers & models

Providers & keys

blumi is provider-agnostic. Built-in presets exist for the common ones (so you usually set only a key), keyed by name in providers:

| Preset name | `kind` | Notes |
| --- | --- | --- |
| `anthropic` | `anthropic` | Claude; API-key auth only |
| `openai` | `openai_compat` | OpenAI and any OpenAI-compatible endpoint (set `base_url`) |
| `gemini` | `gemini` | Google Gemini (native client) |
| `azure` | `anthropic_foundry` | Azure AI Foundry (Anthropic models) |
| `local` | `openai_compat` | a local server (llama.cpp / Ollama-compatible); no key |
| `mock` | (none) | deterministic, for tests/demos |

Each provider entry:

"providers": {
 "anthropic": { "api_key_env": "ANTHROPIC_API_KEY" }, // read key from env (preferred)
 "openai": { "api_key": "sk-...", "base_url": "https://api.openai.com/v1" },
 "local": { "kind": "openai_compat", "base_url": "http://localhost:11434/v1" }
}

api_key_env (read the key from an env var) is preferred over a literal api_key so the key never sits in the file.
base_url points OpenAI-compatible clients at any endpoint (Ollama, llama.cpp, OpenRouter, ...).
kind is only needed for a fully custom provider name; the presets above set it for you.
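To make the key fields concrete, here is a minimal Python sketch of how a provider entry could resolve to a key. It is not blumi's code: the function name is hypothetical, and the order (environment variable first, literal api_key as fallback, no key for keyless local servers) is an assumption based on the "preferred" wording above.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(entry: Mapping[str, str],
                    environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick the API key for one provider entry, or None if it needs no key."""
    env_name = entry.get("api_key_env")
    if env_name:
        value = environ.get(env_name)
        if value:
            return value  # the key never has to sit in settings.json
    # Assumed fallback: a literal key stored in the file.
    return entry.get("api_key") or None
```

With this shape, an entry such as the local one (only kind and base_url) resolves to no key at all, which matches the "no key" note in the preset table.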

`llm` — the active model + turn limits

"llm": {
 "provider": "anthropic", // which entry in "providers"
 "model": "claude-sonnet-4-5", // "" = let the provider pick/probe
 "context_size": 131072,
 "max_output_tokens": 16384,
 "temperature": 0.7,
 "top_p": 0.8,
 "top_k": 20,
 "max_iterations": 25, // tool steps allowed per turn
 "max_auto_continue": 12, // self-continue rounds when a turn hits the step cap (0 = off)
 "max_auto_continue_tokens": 400000, // token ceiling for one auto-continue run (0 = no cap)
 "max_local_agents": 4 // max concurrent local sub-agents (overflow → grid or queue)
}

Override per run without editing: blumi --provider openai --model gpt-4o run "...".

Agent behavior

`permissions` — what the agent may do unattended

"permissions": {
  "yolo": false, // true = auto-approve EVERYTHING (use only sandboxed)
  "tools": {
    "Bash": { "deny": ["rm -rf*", "sudo*"], "ask": ["git push*"] },
    "FileWrite": { "allow": ["src/**"] }
  }
}

Per-tool allow / deny / ask pattern lists. Interactive by default (mutating tools surface an approval card); toggle YOLO live with ctrl+y / /yolo / --yolo. For real guardrails, pair with hooks.pre_tool_use.
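The pattern lists can be read as a small decision function. The following Python sketch shows one plausible evaluation; it is not blumi's matcher. The glob semantics (Python's fnmatch), the precedence (deny, then ask, then allow), and the fallback to "ask" when nothing matches are all assumptions.

```python
from fnmatch import fnmatchcase

# Same rules as the example block above.
RULES = {
    "Bash": {"deny": ["rm -rf*", "sudo*"], "ask": ["git push*"]},
    "FileWrite": {"allow": ["src/**"]},
}


def decide(rules: dict, tool: str, target: str) -> str:
    """Return "deny", "ask", or "allow" for one tool invocation.

    `target` is the command line for Bash or the path for FileWrite.
    """
    tool_rules = rules.get(tool, {})
    for verdict in ("deny", "ask", "allow"):  # assumed precedence: strictest first
        patterns = tool_rules.get(verdict, [])
        if any(fnmatchcase(target, pattern) for pattern in patterns):
            return verdict
    return "ask"  # assumed fallback: surface an approval card


print(decide(RULES, "Bash", "git push origin main"))
print(decide(RULES, "FileWrite", "src/lib/mod.rs"))
```

Checking deny first means a broad allow pattern can never override a narrower deny, which is the safer reading when the two overlap.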

`persona` + `personas` — agent style

"persona": "default", // active persona (built-in or custom)
"personas": {
  "reviewer": {
    "description": "Careful code reviewer",
    "instructions": "Be terse. Prioritise correctness + security. Propose diffs.",
    "model": "claude-opus-4-5", // optional: switch model when active
    "temperature": 0.2 // optional override
  }
}

Built-ins include architect, pair, reviewer, team. Switch with --persona <name> or /persona.

`executor` — where tools run

"executor": {
 "backend": "local", // "local" (host) | "docker" (sandbox) | "ssh" (remote)
 "docker_image": "debian:stable-slim",
 "ssh_host": "user@box", // for backend = "ssh"
 "ssh_workdir": "/home/user/proj"
}

Use docker (or ssh to a throwaway box) when you want YOLO/gateway autonomy without risking the host.

`brain` — local-LLM approval reviewer

"brain": {
 "mode": "off", // "off" | "advisory" (annotate) | "auto" (decide)
 "provider": "local", // "" = reuse the main agent's provider
 "model": "qwen2.5:3b" // a small/cheap/local model is ideal here
}

A cheap model that vets approval prompts so the flagship isn't interrupted — escalates to you on uncertainty or danger.

`router` — cost-aware model routing

"router": {
  "mode": "off", // "off" | "heuristic" | "hybrid" | "judge"
  "light": { "provider": "", "model": "claude-haiku-4-5" },
  "heavy": { "provider": "", "model": "claude-opus-4-5" },
  "judge": { "provider": "", "model": "" }, // "" = reuse brain.* then llm.*
  "subagent_tier": "light", // "light" | "heavy" | "inherit"
  "prefer_grid_light": false, // run the light tier on a grid peer's local model (free)
  "heuristics": {
    "heavy_chars": 1500, "light_chars": 280, "heavy_tool_count": 12,
    "escalate_iteration": 6, "heavy_keywords": [], "light_keywords": []
  }
}

On each turn, the router picks the light tier or the flagship to cut cost. Off by default. An empty tier provider or model reuses the active llm.* value. See Self-Management → Cost-aware routing.
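This page does not spell out how the heuristics fields combine, so the Python sketch below is only a guess at their intent, inferred from the field names. It assumes that long prompts, many tool calls, late iterations, or a heavy keyword push a turn to the heavy tier, that short prompts or a light keyword pull it to the light tier, and that anything in between is left undecided (for example, for a judge model). The function name and the "undecided" result are hypothetical.

```python
from typing import Mapping


def heuristic_tier(prompt: str, tool_count: int, iteration: int,
                   h: Mapping) -> str:
    """Return "heavy", "light", or "undecided" from cheap signals only."""
    text = prompt.lower()
    if any(word.lower() in text for word in h.get("heavy_keywords", [])):
        return "heavy"
    if (len(prompt) >= h["heavy_chars"]
            or tool_count >= h["heavy_tool_count"]
            or iteration >= h["escalate_iteration"]):
        return "heavy"
    if any(word.lower() in text for word in h.get("light_keywords", [])):
        return "light"
    if len(prompt) <= h["light_chars"]:
        return "light"
    return "undecided"  # assumption: the ambiguous middle is decided elsewhere


# Values copied from the example block above.
heuristics = {
    "heavy_chars": 1500, "light_chars": 280, "heavy_tool_count": 12,
    "escalate_iteration": 6, "heavy_keywords": [], "light_keywords": [],
}

print(heuristic_tier("rename this variable", 0, 1, heuristics))
```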

`heal` — self-healing & evolution

"heal": {
 "enabled": true,
 "recovery_budget": 2, // recovery attempts per turn
 "verify": false, // only mark a fix "verified" if a later step succeeds
 "learn": true, // store failure→fix episodes in memory
 "evolve": "auto", // "auto" | "propose" | "off" (mine recurring fixes → skills)
 "redact_paths": true
}

Auto-recovers failed tool calls, learns fixes, and mines recurring ones into recovery skills. On by default. See Self-Management → Self-healing.

Memory & knowledge

The embeddings, memory, and knowledge sections power recall, semantic memory, and code search. All three are on by default; the bundled local embedder downloads a ~90 MB model on first use, then works fully offline. See Memory & Knowledge.

"embeddings": {
 "enabled": true,
 "backend": "local", // "local" (bundled ONNX) | "openai" (endpoint) | "grid" (peer)
 "provider": "", // for backend = "openai": a name from "providers"
 "model": "bge-small-en-v1.5",
 "dim": 384
},
"memory": {
 "enabled": true,
 "recall_k": 5, // memories injected per turn (RAG)
 "dedup_threshold": 0.92, // admission gate: near-duplicates are merged
 "max_per_namespace": 2000,
 "diffuse": true, // share non-`user` learnings across the grid
 "sweep_secs": 60 // governance cadence (eviction/consolidation)
},
"knowledge": {
 "enabled": true,
 "max_file_kb": 256, // skip files larger than this when indexing
 "exclude": ["target", "node_modules", ".git", "dist"]
}

Semantic memory accrues automatically (failure→fix episodes, the agent's memory tool); browse it with /memories (TUI) or the web/phone Control Center (where you can pin/edit/delete). The user namespace never leaves your node. Tell the agent "remember that ..." to force-store; pin an entry to exempt it from eviction.
Knowledge is empty until you index a repo: blumi knowledge ingest . then blumi knowledge status (or /knowledge in the TUI). Powers code_search / code_retrieve.
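The dedup_threshold admission gate can be illustrated with a few lines of Python. This is a sketch under assumptions, not blumi's memory code: it assumes similarity is cosine similarity between embedding vectors, and it models "merged" simply as returning the index of the existing entry instead of storing a new one.

```python
import math
from typing import List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def admit(store: List[Sequence[float]], candidate: Sequence[float],
          threshold: float = 0.92) -> Optional[int]:
    """Store `candidate` unless it is a near-duplicate.

    Returns the index of the entry it would merge into, or None if it was
    appended as a new memory.
    """
    for index, existing in enumerate(store):
        if cosine(existing, candidate) >= threshold:
            return index
    store.append(candidate)
    return None


store = [[1.0, 0.0]]
print(admit(store, [0.99, 0.05]))  # near-duplicate of entry 0
print(admit(store, [0.0, 1.0]))    # unrelated, stored as new
```

Raising the threshold toward 1.0 admits more near-identical entries; lowering it merges more aggressively.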

`acceleration` — embedder execution provider

"acceleration": {
 "mode": "auto", // "auto" | "cpu" | "apple" (CoreML) | "cuda"
 "embeddings_accel": "auto" // override just the bundled embedder
}

blumi accel doctor shows what was detected. See Memory & Knowledge → GPU acceleration.

Surfaces (web, phone, bots, grid)

`web` — the gateway / web UI auth

"web": { "password_hash": "" } // set via `blumi serve pair --password <pw>` (argon2; never plaintext)

A password is required when binding to a non-loopback address. The same server serves the browser UI and the blugo phone app. See Gateway.
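The non-loopback rule can be expressed as a small startup check. The Python sketch below is illustrative only; the function names are hypothetical, and treating unparseable hostnames as non-loopback is a conservative assumption.

```python
import ipaddress


def is_loopback(host: str) -> bool:
    """True for localhost and loopback IPs such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # assumption: other hostnames count as non-loopback


def check_bind(host: str, password_hash: str) -> None:
    """Refuse to expose the gateway beyond loopback without a password."""
    if not is_loopback(host) and not password_hash:
        raise ValueError(
            f"binding to {host} requires web.password_hash to be set"
        )


check_bind("127.0.0.1", "")  # fine: loopback needs no password
```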

`voice` — speech-to-text + text-to-speech

"voice": {
 "enabled": false,
 "api_key": "",
 "stt_base_url": "https://api.openai.com/v1",
 "stt_model": "whisper-1",
 "tts_provider": "openai", // "openai" | "elevenlabs"
 "tts_base_url": "", // blank = provider default
 "tts_model": "tts-1",
 "tts_voice": "alloy", // OpenAI voice name, or an ElevenLabs voice id
 "tts_api_key": "" // separate TTS key (falls back to api_key)
}

See Voice.

`gateway` — run blumi as a chat bot

"gateway": {
  "yolo": false, // auto-approve in bot sessions (only with a sandboxed executor!)
  "telegram": {
    "token": "123456:AA...", // @BotFather
    "allowed_chats": [], // [] = anyone who messages it; or [<your-chat-id>]
    "voice": false // transcribe voice notes + speak replies (needs voice.* too)
  },
  "discord": { "token": "", "allowed_channels": [] },
  "slack": { "bot_token": "xoxb-...", "app_token": "xapp-..." },
  "whatsapp": { "token": "", "phone_number_id": "", "verify_token": "", "webhook_port": 8080 }
}

Run one transport in the foreground (blumi gateway telegram) or all configured ones as a service (blumi gateway install). One bot per token — see Gateway → Messaging-bot gateway.

`grid` — distributed multi-node

"grid": {
 "enabled": false,
 "secret": "", // same value on every node = same grid (or BLUMI_GRID__SECRET)
 "grid_id": "", // blank = derived from the secret digest
 "node_name": "", // blank = hostname
 "peers": ["10.0.0.150:7777"] // static peers (in addition to mDNS auto-discovery)
}

Several blumi serve nodes sharing a secret form a grid that hands tasks to peers. The secret is never advertised (only a non-sensitive digest). See Grid.

`remote` + `workspaces`

"remote": { "instances": [] }, // remote blumi instances the TUI can attach to as tabs
"workspaces": { "roots": ["~/code"] } // dirs scanned for sibling git repos in the TUI sidebar

`git` — commit identity

"git": { "author_name": "Blumi", "author_email": "you@example.com" }

Stamped on commits the agent makes (GIT_AUTHOR_* / GIT_COMMITTER_*). Empty author_name disables the override.
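As an illustration of what "stamped" means, the Python sketch below builds the environment a git subprocess would receive. GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_COMMITTER_NAME, and GIT_COMMITTER_EMAIL are standard git variables; the function itself is hypothetical and not taken from blumi.

```python
from typing import Dict, Mapping


def git_identity_env(git_cfg: Mapping[str, str],
                     base_env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `base_env` with the commit identity applied."""
    env = dict(base_env)
    name = git_cfg.get("author_name", "")
    if not name:
        return env  # empty author_name disables the override
    email = git_cfg.get("author_email", "")
    for role in ("AUTHOR", "COMMITTER"):
        env[f"GIT_{role}_NAME"] = name
        if email:
            env[f"GIT_{role}_EMAIL"] = email
    return env


env = git_identity_env(
    {"author_name": "Blumi", "author_email": "you@example.com"},
    {"PATH": "/usr/bin"},
)
print(env["GIT_AUTHOR_NAME"], env["GIT_COMMITTER_EMAIL"])
```

A dict like this could be passed as env= to subprocess.run(["git", "commit", ...]) so that only the agent's commits carry the identity.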

Autonomy & notifications

`always_on` — proactive discovery

"always_on": {
 "enabled": false,
 "autonomy": "propose", // "off" | "propose" (add tasks) | "auto" (reserved)
 "cadence_secs": 900,
 "min_interval_secs": 300,
 "skip_if_todos": 1, // skip while the board already has todos
 "max_open_discoveries": 5,
 "max_per_pass": 3
}

When idle, the gateway runs a read-only discovery pass and proposes tasks + a report. Off by default. See Self-Management → Always-on discovery.

`notify` — completion notifications

"notify": {
 "enabled": false,
 "on": ["loop", "discovery"], // which completions fire (also "turn"); [] = loop+discovery
 "desktop": true, // OS notification on the host
 "bot": { "transport": "telegram", "target": "<chat-id>" }, // proactive bot message
 "web_push": false // browser Web Push (VAPID; secure-context only)
}

Pings you when blumi loop / discovery finishes. Off by default. See Self-Management → Completion notifications.
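The on list behaves like a filter over completion events. A minimal Python sketch of that filter follows, using the documented rule that an empty list means loop plus discovery; the function name and the event strings passed in are assumptions for illustration.

```python
from typing import Mapping

DEFAULT_EVENTS = ("loop", "discovery")  # what an empty "on" list stands for


def should_notify(notify_cfg: Mapping, event: str) -> bool:
    """Decide whether a finished `event` ("loop", "discovery", "turn") pings you."""
    if not notify_cfg.get("enabled", False):
        return False
    events = notify_cfg.get("on") or DEFAULT_EVENTS
    return event in events


print(should_notify({"enabled": True, "on": []}, "loop"))
print(should_notify({"enabled": True, "on": []}, "turn"))
```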

Push notifications (FCM)

The blugo phone app's Dispatch feature gets a push when a node finishes a turn. By default this needs no entry in settings.json: it is enabled by file presence alone, so it is zero-config and independent of the notify settings above (it never adds desktop/bot noise).

To turn push on, drop a Firebase service account on the gateway machine:

cp your-firebase-adminsdk.json ~/.blumi/fcm-service-account.json
chmod 600 ~/.blumi/fcm-service-account.json

The gateway auto-detects the file and sends turn-complete pushes via FCM HTTP v1 (a short-lived OAuth token is minted from the service account and cached in-process). The project_id is read from the file — nothing else to configure.
Device tokens are stored at ~/.blumi/fcm.json (the app registers them on connect) and are pruned automatically when FCM reports them stale (HTTP 404 / UNREGISTERED).
No file = silent no-op. Dispatch still works in-app; you just don't get backgrounded pushes.
Never commit the service account (or the app's google-services.json) — both are gitignored. The private key stays at ~/.blumi/fcm-service-account.json (chmod 600) and never enters git.

Optional override: set notify.fcm.service_account_path (and notify.fcm.project_id) if you keep the file somewhere other than ~/.blumi/fcm-service-account.json. Push is Android-only today.
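The detection logic described above (default path, optional override, silent no-op when the file is missing) can be sketched in Python as follows. This is not the gateway's code; the function name and return shape are hypothetical. The only external fact relied on is that a Firebase service account JSON file contains a project_id field.

```python
import json
from pathlib import Path
from typing import Mapping, Optional, Union

DEFAULT_PATH = Path.home() / ".blumi" / "fcm-service-account.json"


def fcm_settings(notify_cfg: Mapping,
                 default_path: Union[str, Path] = DEFAULT_PATH) -> Optional[dict]:
    """Return {"path", "project_id"} if push can be enabled, else None."""
    fcm = notify_cfg.get("fcm", {})
    path = Path(fcm.get("service_account_path") or default_path).expanduser()
    if not path.is_file():
        return None  # no file = silent no-op
    account = json.loads(path.read_text())
    # An explicit notify.fcm.project_id wins; otherwise read it from the file.
    project_id = fcm.get("project_id") or account["project_id"]
    return {"path": path, "project_id": project_id}
```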

`hooks` — lifecycle hooks

"hooks": {
  "user_prompt_submit": [
    { "command": "git branch --show-current", "timeout_secs": 5 }
  ],
  "pre_tool_use": [
    { "command": "jq -e '.input.command|test(\"rm -rf\")|not' >/dev/null", "matcher": "Bash" }
  ]
}

user_prompt_submit injects each command's stdout as turn context; pre_tool_use can block a tool (non-zero exit). Off by default. See Self-Management → Lifecycle hooks.
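For checks that outgrow a jq one-liner, a pre_tool_use hook could be a small script. The Python sketch below roughly mirrors the jq example; it assumes, as that example implies, that the hook receives a JSON payload on stdin with the tool arguments under input.command. The payload shape beyond that field, and the choice to block on unparseable input, are assumptions.

```python
import json
import sys


def is_allowed(payload) -> bool:
    """Allow the call unless the Bash command contains "rm -rf"."""
    tool_input = payload.get("input") if isinstance(payload, dict) else None
    command = tool_input.get("command", "") if isinstance(tool_input, dict) else ""
    return "rm -rf" not in str(command)


def main(stdin=sys.stdin) -> int:
    """Return the exit code: 0 lets the tool run, non-zero blocks it."""
    try:
        payload = json.load(stdin)
    except json.JSONDecodeError:
        return 1  # assumption: block when the payload cannot be parsed
    return 0 if is_allowed(payload) else 1


# As a standalone hook script, finish with: sys.exit(main())
```

Saved under a path of your choosing, it would be referenced from the hook's command field with "matcher": "Bash", the same way the jq command is.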

Tools & integrations

`mcp_servers` — Model Context Protocol tools

"mcp_servers": {
  "github": {
    "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"],
    "env": { "GITHUB_TOKEN": "ghp_..." }, "enabled": true
  }
}

External MCP servers launched over stdio; their tools appear to the agent. {workspace} in args is substituted with the project path. See blumi mcp and CLI Usage.
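The {workspace} substitution amounts to a string replacement applied to each argument before launch. A minimal Python sketch follows; the function name is hypothetical, and both the default of enabled = true and the merging of env over the parent environment are assumptions.

```python
from typing import Dict, List, Mapping, Optional, Tuple


def launch_spec(server: Mapping, workspace: str,
                base_env: Mapping[str, str]) -> Optional[Tuple[List[str], Dict[str, str]]]:
    """Build (argv, env) for one mcp_servers entry, or None if it is disabled."""
    if not server.get("enabled", True):
        return None
    args = [arg.replace("{workspace}", workspace) for arg in server.get("args", [])]
    argv = [server["command"], *args]
    env = {**base_env, **server.get("env", {})}
    return argv, env


# "some-mcp-server" is a placeholder package name, not a real recommendation.
spec = launch_spec(
    {"command": "npx", "args": ["-y", "some-mcp-server", "{workspace}"]},
    "/home/me/proj",
    {"PATH": "/usr/bin"},
)
print(spec[0])
```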

`lsp_servers` — language servers (code intel)

"lsp_servers": {
 "rust": { "command": "rust-analyzer", "args": [], "extensions": ["rs"], "language_id": "rust" }
}

Power the Lsp code-intelligence tool (definitions, references, diagnostics).

Putting it together

A realistic everyday settings.json — Claude for the flagship, a cheap local brain, memory on, notifications to Telegram:

{
  "llm": { "provider": "anthropic", "model": "claude-sonnet-4-5" },
  "providers": {
    "anthropic": { "api_key_env": "ANTHROPIC_API_KEY" },
    "local": { "kind": "openai_compat", "base_url": "http://localhost:11434/v1" }
  },
  "brain": { "mode": "advisory", "provider": "local", "model": "qwen2.5:3b" },
  "router": {
    "mode": "hybrid",
    "light": { "model": "claude-haiku-4-5" },
    "heavy": { "model": "claude-opus-4-5" }
  },
  "memory": { "enabled": true },
  "knowledge": { "enabled": true },
  "git": { "author_name": "Blumi", "author_email": "you@example.com" },
  "notify": { "enabled": true, "bot": { "transport": "telegram", "target": "123456789" } },
  "gateway": { "telegram": { "token": "123456:AA...", "allowed_chats": [123456789] } }
}

Auto-continue (token budget)

When a turn stops only because it hit the per-turn step cap, blumi auto-continues the same session and narrates each step — bounded by both llm.max_auto_continue (default 12) and llm.max_auto_continue_tokens (default 400k), whichever hits first. Tune live with /autocontinue <n> (0 disables).
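The two budgets interact as "whichever hits first". The Python sketch below simulates that with a list of per-round token costs. It is an illustration, not blumi's loop; it assumes the token ceiling is checked before each round starts, so the last admitted round may overshoot the ceiling.

```python
from typing import Iterable


def auto_continue_rounds(round_tokens: Iterable[int],
                         max_rounds: int = 12,
                         max_tokens: int = 400_000) -> int:
    """Count the auto-continue rounds that run before either budget is hit.

    max_rounds == 0 turns auto-continue off; max_tokens == 0 means no token cap.
    """
    if max_rounds == 0:
        return 0
    rounds = 0
    spent = 0
    for tokens in round_tokens:
        if rounds >= max_rounds:
            break  # round budget exhausted
        if max_tokens and spent >= max_tokens:
            break  # token ceiling reached
        spent += tokens
        rounds += 1
    return rounds


# Expensive rounds stop on the token ceiling, cheap ones on the round cap.
print(auto_continue_rounds([50_000] * 20))
print(auto_continue_rounds([10_000] * 20))
```

With the defaults, 50k-token rounds stop after 8 rounds (400k tokens), while 10k-token rounds run all 12.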

The memory files (`MEMORY.md` / `USER.md`)

Beyond semantic memory, two markdown files in ~/.blumi/ are read as a frozen snapshot each session: project MEMORY.md (agent notes) and USER.md (about you). View/edit them in the TUI (/memory), the web/phone Control Center, or on disk.
