Self Management
blumi can evolve itself: author its own skills, edit its own config (validated), and reload in place to apply both — no restart, conversation preserved. These are agent tools the model calls during a turn; just ask it in chat ("add a skill for ...", "set llm.temperature to 0.3 and reload").
Status: the self-management tools below are live. Triggering them directly from the phone app (buttons for reload / restart / edit config / build skills) and a restart-the-gateway capability are rolling out incrementally.
self_config reads and writes ~/.blumi/settings.json by dotted key, and can add personas. Every change is
validated (it must deserialize as a valid config) before an atomic write, so a bad edit is
rejected rather than corrupting your config.
Process-level settings (bind host/port, the gateway password, the grid identity) are read once at startup, so changing them needs a restart, not just a reload.
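The validate-then-atomic-write behavior described above can be sketched in a few lines of Python. This is an illustration only, not blumi's implementation: `set_dotted`, `write_atomically`, and `validate_example` are hypothetical names, and the 0 to 2 temperature range is an assumed rule standing in for "must deserialize as a valid config".

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Callable


def set_dotted(config: dict, dotted_key: str, value: Any) -> dict:
    """Return a copy of config with a dotted key (e.g. "llm.temperature") set."""
    updated = json.loads(json.dumps(config))  # deep copy; fine for JSON data
    node = updated
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        node = node.setdefault(part, {})
        if not isinstance(node, dict):
            raise ValueError(f"{part!r} is not an object in {dotted_key!r}")
    node[leaf] = value
    return updated


def validate_example(config: dict) -> None:
    """Hypothetical validator; the accepted range is an assumption."""
    temperature = config.get("llm", {}).get("temperature", 0.0)
    is_number = isinstance(temperature, (int, float)) and not isinstance(temperature, bool)
    if not is_number or not 0.0 <= temperature <= 2.0:
        raise ValueError("llm.temperature must be a number between 0 and 2")


def write_atomically(path: Path, config: dict, validate: Callable[[dict], None]) -> None:
    """Validate first, then write a temp file and rename it over the target."""
    validate(config)  # raises before anything touches disk
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(config, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.chmod(tmp_name, 0o600)    # owner read/write only
        os.replace(tmp_name, path)   # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Because validation runs before the temp file is created and the rename is the last step, a rejected edit leaves the existing settings.json byte-for-byte unchanged.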
manage_skill creates, updates, or deletes a SKILL.md under ~/.blumi/skills/<name>/. Skills are
progressive-disclosure instructions: their name + description sit in the system prompt; the agent
loads the full body on demand. See blumi skills and CLI Usage.
reload_self emits a reload: blumi re-reads settings.json, re-scans skills, and rebuilds the session seeded
from the current conversation (messages, todos, token counts preserved). Use it after
self_config / manage_skill.
The always-on gateway is supervised: launchd KeepAlive (macOS) and systemd Restart=always
(Linux) auto-restart it on crash. That's crash recovery for free — see Gateway.
For a wedged-but-alive session, a reload (above) rebuilds it without losing the conversation.
Beyond crash recovery, blumi treats reliability as a bounded control problem (after the self-healing-orchestrators paper) and turns repeated failures into durable improvements — wired into the failure taxonomy and the semantic memory.
- Reflex recovery. A failed tool result is classified (bad args, state conflict, crash, empty) and gets a budgeted, targeted recovery action: re-read-then-retry, an argument fix from the tool's hint, narrow-the-query, or escalate. Only idempotent (read-only) tools auto-retry; mutating tools (Bash, FileWrite, ...) escalate rather than blind-retry. It composes with the doom-loop guard, and each attempt emits an observability trace (⚕ self-heal ... inline in the TUI).
- Learns from failures, but only verified ones. A guided recovery is stored as a pending failure→fix hypothesis and promoted to a real agent-namespace episode only once the same tool is observed to succeed on a later step (see Confirmed below). Promoted fixes diffuse across the grid, and a similar future failure recalls the known fix as trailing guidance. Paths/secrets are redacted first; un-promoted guesses are reaped by the sweep, never recalled.
- Evolves. Recurring failure clusters are mined (on the gateway sweep) into auto-written recovery skills (low-risk, with a notice, via the same manage_skill actuator above); anything risky (config / providers / secrets / deletes) raises an approval instead. The audit trail is kept as evolution memories.
- Confirmed (the credit-assignment keystone). With heal.verify (now on by default), a recovery is promoted from pending to a recallable fix only when the retried tool actually succeeds on a later step (ground truth, not just "a fix was suggested"), with provenance. So the agent can tell a fix that worked from one that didn't, and only proven fixes are ever recalled, mined into skills, or diffused.
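As a rough illustration of the reflex-recovery rules above, the sketch below maps a classified failure to a recovery action under a per-turn budget. Everything here is assumed for the example: the names `Failure`, `Action`, and `choose_recovery`, the specific failure-to-action pairing, and the read-only tool names. It also takes the conservative reading that every failure of a mutating tool escalates.

```python
from enum import Enum


class Failure(Enum):
    BAD_ARGS = "bad args"
    STATE_CONFLICT = "state conflict"
    CRASH = "crash"
    EMPTY = "empty"


class Action(Enum):
    FIX_ARGS = "argument fix from the tool's hint"
    REREAD_RETRY = "re-read then retry"
    NARROW_QUERY = "narrow the query"
    ESCALATE = "escalate"


# Assumed pairing of failure class to recovery action (not taken from blumi).
RECOVERY = {
    Failure.BAD_ARGS: Action.FIX_ARGS,
    Failure.STATE_CONFLICT: Action.REREAD_RETRY,
    Failure.EMPTY: Action.NARROW_QUERY,
    Failure.CRASH: Action.ESCALATE,
}

# Hypothetical names for idempotent (read-only) tools.
READ_ONLY_TOOLS = {"FileRead", "Grep"}


def choose_recovery(tool: str, failure: Failure, attempts_used: int,
                    recovery_budget: int = 2) -> Action:
    """Pick a recovery action for one failed tool result."""
    if tool not in READ_ONLY_TOOLS:
        return Action.ESCALATE          # mutating tools never blind-retry
    if attempts_used >= recovery_budget:
        return Action.ESCALATE          # per-turn budget exhausted
    return RECOVERY[failure]
```

For example, `choose_recovery("Bash", Failure.BAD_ARGS, 0)` escalates immediately, while a read-only tool that returned nothing gets a narrowed query until the budget of two attempts runs out.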
Configure it (~/.blumi/settings.json; defaults shown):
    "heal": {
      "enabled": true,
      "recovery_budget": 2,
      "verify": true,
      "learn": true,
      "evolve": "auto",
      "redact_paths": true
    }
- evolve: "auto" (apply low-risk skills automatically, with a notice) · "propose" (mine + always ask) · "off" (kill switch: still recover & learn, but never self-modify).
- recovery_budget: max recovery attempts per turn · verify: require cross-step success · learn: write failure→fix episodes · redact_paths: scrub paths/secrets before storage · enabled: master switch.
See it: the TUI /heal overlay (recovery / evolution / proposal counts + recent items), the
inline ⚕ self-heal traces, the blugo Heal tab, or GET /api/heal on the gateway.
An opt-in, adversarial, regret-minimizing reasoning loop that wraps the agent's tool calls —
"Raskolnikov's Psychological Loop". A standard agent maximizes success; an RPL agent minimizes
regret: it debates a risky plan with an internal prosecutor before anything touches the live
system. Off by default (rpl.enabled); it trades extra LLM calls for far fewer catastrophic
actions, so turn it on for high-stakes work.
When enabled, before a materialized tool batch executes, blumi runs five phases:
- The Hypothesis (blast radius). Map what the batch would touch — files written, commands, network egress, VCS mutations, whether it's destructive / reversible — into a 0–100 severity. Read-only / low-blast batches skip the loop entirely (the cheap path), so cost stays bounded.
- The Fever Dream (simulation). For a batch over blast_threshold, predict the worst-case outcome + a paranoia score. (MVP: a dry prediction; real sandboxed branch execution in throwaway git worktrees is a planned follow-up.)
- The Porfiry Node (adversarial judge). The plan goes to an adversarial "Porfiry" LLM judge whose only job is to find the flaw: the ignored edge case, the un-re-read state, the irreversible step. It must approve, or the plan is bounced (the flaw is injected as guidance and the model re-plans), bounded by max_defend_rounds before proceeding under caution. A flaky / unreachable judge fails open, so it never deadlocks the agent.
- The Strike (actuation). The surviving plan flows through the normal typed tool pipeline, unchanged: same permission engine, same approvals.
- The Confession (regret → memory). After execution, blumi compares the predicted risk to the actual outcome and writes the Error Delta ("regret") back to memory as an rpl_delta episode, feeding value-based memory fitness and priming future blast-radius judgments. It evolves from the consequence.
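The control flow of the phases above (cheap-path skip, bounded judge rounds, fail-open) can be sketched as follows. This is a simplified model under stated assumptions, not blumi's code: `RplConfig`, `review`, and the `judge` / `replan` callables are hypothetical, a batch is reduced to a plain string, severity is taken as given rather than recomputed after a re-plan, and "over blast_threshold" is read as severity at or above the threshold.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RplConfig:
    enabled: bool = False
    blast_threshold: int = 40
    max_defend_rounds: int = 2


# judge(plan) returns None to approve, or a string describing the flaw it found.
Judge = Callable[[str], Optional[str]]
# replan(plan, flaw) returns a revised plan that takes the flaw into account.
Replan = Callable[[str, str], str]


def review(plan: str, severity: int, cfg: RplConfig,
           judge: Judge, replan: Replan) -> Tuple[str, str]:
    """Return (plan_to_execute, verdict) for one tool batch."""
    if not cfg.enabled or severity < cfg.blast_threshold:
        return plan, "skipped"            # cheap path: low blast radius
    for _ in range(cfg.max_defend_rounds):
        try:
            flaw = judge(plan)
        except Exception:
            return plan, "fail-open"      # unreachable judge never deadlocks
        if flaw is None:
            return plan, "approved"
        plan = replan(plan, flaw)         # flaw injected as guidance
    return plan, "caution"                # rounds exhausted, proceed carefully
```

Whatever verdict comes back, the returned plan would still pass through the normal permission engine and approvals; the sketch only covers the upstream review.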
Configure it (~/.blumi/settings.json; defaults shown — enabled is false):
    "rpl": {
      "enabled": false,
      "blast_threshold": 40,     // 0–100 severity a mutating batch must hit to be reviewed
      "branches": 3,             // simulated branches per review (1–5)
      "max_defend_rounds": 2,    // Porfiry reject → re-plan rounds before proceeding
      "judge_model": "",         // empty = reuse the main model
      "sandbox": "dry"           // dry (predict) | worktree (real sandboxed sim, planned)
    }
RPL composes with (it doesn't replace) your permissions and
pre_tool_use hooks: those still gate the Strike. RPL adds an upstream adversarial review so a risky plan is caught and re-thought before it ever reaches them.
Per turn, blumi can pick a difficulty tier and route to a light vs flagship model — simple, mechanical work doesn't have to burn the flagship's price. Off by default. The mechanism is hybrid: a fast, zero-cost heuristic (prompt length, tool count, iteration depth, keyword hints) decides most turns, and only ambiguous turns consult a small local judge model (which fails safe to the cheap tier — an unreachable judge never silently upgrades you).
- Delegated sub-agents default to the cheap tier (router.subagent_tier), so investigation/fan-out is cheap while the main agent stays sharp.
- Model swaps are prompt-cache-safe: blumi only changes the model when the tier actually changes, and on a long turn it escalates Light→Heavy on deep iterations but never demotes mid-turn.
- On a grid, router.prefer_grid_light can run the light tier on a peer's local model (free).
    "router": {
      "mode": "hybrid",                                        // off | heuristic | hybrid | judge
      "light": { "provider": "", "model": "claude-haiku-4-5" },
      "heavy": { "provider": "", "model": "claude-opus-4-5" },
      "judge": { "provider": "", "model": "" },                // empty = reuse brain.*, then llm.*
      "subagent_tier": "light",                                // light | heavy | inherit
      "prefer_grid_light": false
    }
See it: the TUI /route overlay (per-tier turns/tokens + $ saved vs all-heavy) or
GET /api/route. /route off|heuristic|hybrid|judge switches the mode live. Empty tier
provider/model reuse the active llm.*. (Cost is estimated from each model's list price; local /
unpriced models show as free.)
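The hybrid routing described in this section (heuristic first, judge only when ambiguous, fail safe to the cheap tier, no mid-turn demotion) might look roughly like the sketch below. The thresholds, keyword hints, and the names `Tier`, `heuristic_tier`, and `pick_tier` are invented for illustration and are not blumi's actual values.

```python
from enum import IntEnum
from typing import Callable, Optional


class Tier(IntEnum):
    LIGHT = 0
    HEAVY = 1


def heuristic_tier(prompt: str, tool_count: int, iteration: int) -> Optional[Tier]:
    """Return a tier when the signal is clear, or None when it is ambiguous."""
    hard_hints = ("refactor", "architecture", "debug")   # assumed keyword hints
    if iteration >= 8 or any(hint in prompt.lower() for hint in hard_hints):
        return Tier.HEAVY
    if len(prompt) < 200 and tool_count <= 2:
        return Tier.LIGHT
    return None


def pick_tier(prompt: str, tool_count: int, iteration: int,
              current: Optional[Tier],
              judge: Optional[Callable[[str], Tier]] = None) -> Tier:
    """Choose the tier for this step; `current` is the tier already in use this turn."""
    tier = heuristic_tier(prompt, tool_count, iteration)
    if tier is None:
        try:
            tier = judge(prompt) if judge else Tier.LIGHT
        except Exception:
            tier = Tier.LIGHT             # unreachable judge never upgrades you
    if current is not None:
        tier = max(tier, current)         # escalate if needed, never demote mid-turn
    return tier
```

A caller would swap models only when the returned tier differs from `current`, which keeps the prompt cache warm for the common case where nothing changes.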
When you step away, blumi can keep finding what's worth doing. With it enabled, the always-on
gateway periodically runs a read-only discovery pass that surfaces candidate tasks for the
workspace, adds them to the task board as Discovered: todos,
and lands a markdown report in ~/.blumi/reports/ plus an agent-namespace discovery memory.
Off by default.
    "always_on": {
      "enabled": true,
      "autonomy": "propose",         // off | propose | auto
      "cadence_secs": 900,
      "min_interval_secs": 300,      // rate-limit floor
      "skip_if_todos": 1,            // skip while the board already has todos
      "max_open_discoveries": 5,
      "max_per_pass": 3
    }
- Safe by design: the pass runs one bounded turn with yolo = false, so approval-requiring (mutating) tools are denied; it can read + reason but not change anything. It's gated by cadence + rate-limit + a board-busy check + an open-discovery cap, and stored text is path/secret-redacted.
- propose adds Todo tasks + a report (you run them). auto is reserved for autonomously running low-risk discoveries in an isolated git worktree/snapshot, a planned follow-up; today it behaves like propose.
See it: blumi serve status (an always-on: line), GET /api/always-on (recent + reports), or
the TUI /discoveries overlay.
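To make the gating concrete, here is one way the four checks (cadence, rate-limit floor, board-busy, open-discovery cap) could combine. It is a guess at the semantics, not the gateway's real logic: `AlwaysOn` and `should_run_pass` are hypothetical names, `skip_if_todos` is assumed to be a count threshold, and the effective wait is assumed to be the larger of `cadence_secs` and `min_interval_secs`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlwaysOn:
    enabled: bool = True
    autonomy: str = "propose"         # off | propose | auto
    cadence_secs: int = 900
    min_interval_secs: int = 300      # rate-limit floor
    skip_if_todos: int = 1
    max_open_discoveries: int = 5


def should_run_pass(cfg: AlwaysOn, now: float, last_pass_at: Optional[float],
                    open_todos: int, open_discoveries: int) -> bool:
    """Decide whether the gateway should start a discovery pass right now."""
    if not cfg.enabled or cfg.autonomy == "off":
        return False
    wait = max(cfg.cadence_secs, cfg.min_interval_secs)
    if last_pass_at is not None and now - last_pass_at < wait:
        return False                  # too soon since the previous pass
    if cfg.skip_if_todos > 0 and open_todos >= cfg.skip_if_todos:
        return False                  # board already has work on it
    if open_discoveries >= cfg.max_open_discoveries:
        return False                  # enough unreviewed discoveries already
    return True
```

With the defaults shown, a pass runs at most every 15 minutes and only while the board is empty and fewer than five discoveries are still open.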
Claude-Code-style extension points. Two events are wired: user_prompt_submit (inject context)
and pre_tool_use (block a tool call).
When you submit a prompt, blumi runs your configured shell commands and injects each command's stdout as background context for that turn. The prompt is piped to the hook's stdin, the command runs in the workspace, and its output lands as a cache-safe trailing message (never the cached system prefix). Use it to stamp every turn with live context — current git branch, active ticket, deploy status, a lint summary, etc.
Before a tool runs, matching hooks receive a {"tool": "...", "input": {...}} JSON payload on stdin.
Exit non-zero to block the call (the hook's stderr/stdout becomes the denial reason the agent
sees); exit zero to fall through to your normal permissions policy. A hook runs
ahead of policy, so it can veto even an auto-allowed tool. Use it for guardrails — forbid
rm -rf, block writes outside a path, require a clean working tree before a deploy, etc.
- matcher filters by tool name (substring match; empty = every tool). E.g. "matcher": "Bash" only gates shell commands.
- Fail-open on infra errors: if the hook can't be spawned or times out, the call is allowed, so a broken guardrail can't brick the agent. Only an explicit non-zero exit blocks.
    "hooks": {
      "user_prompt_submit": [
        { "command": "git branch --show-current", "timeout_secs": 5 },
        { "command": "cat .blumi/context.md" }
      ],
      "pre_tool_use": [
        { "command": "jq -e '.input.command | test(\"rm -rf\")|not' >/dev/null", "matcher": "Bash" }
      ]
    }
Hooks are trusted — they're your own commands from settings.json (the same trust model as cron),
so blumi runs them as written. Each is bounded by timeout_secs (default 10s) so a hung hook can't
stall a turn. A user_prompt_submit hook that times out or exits non-zero is simply skipped; a
pre_tool_use hook fails open (allows) on a spawn error or timeout, and blocks only on a clean
non-zero exit.
Hooks are read when the session is built, so a freshly added hook takes effect on the next
reload_self / restart. The agent can add its own hooks via self_config + reload_self.
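For guardrails that outgrow a jq one-liner, a pre_tool_use hook can be a small script. The sketch below follows the contract described above (JSON payload on stdin, non-zero exit to block, the message becomes the denial reason). The `input.command` field for Bash comes from the jq example; the `path` field for FileWrite, the workspace root, and the function names are assumptions made for illustration.

```python
import json
import posixpath
import sys
from typing import Tuple

ALLOWED_ROOT = "/home/me/project"   # hypothetical workspace root


def decide(payload: dict) -> Tuple[int, str]:
    """Return (exit_code, reason); a non-zero exit code blocks the tool call."""
    tool = payload.get("tool", "")
    tool_input = payload.get("input", {})
    if tool == "Bash" and "rm -rf" in str(tool_input.get("command", "")):
        return 1, "blocked: rm -rf is not allowed"
    if tool == "FileWrite":
        # "path" is an assumed field name; relative paths are rejected here.
        target = posixpath.normpath(str(tool_input.get("path", "")))
        if not target.startswith(ALLOWED_ROOT + "/"):
            return 1, f"blocked: write outside {ALLOWED_ROOT}"
    return 0, ""                     # fall through to the normal permissions policy


def main() -> int:
    """Read the payload from stdin and report the decision on stderr."""
    code, reason = decide(json.load(sys.stdin))
    if reason:
        print(reason, file=sys.stderr)
    return code
```

To use it as a hook, end the script with `raise SystemExit(main())` and point a pre_tool_use entry's command at it. Since it inspects more than one tool, leave matcher empty so every call reaches it.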
When an autonomous run finishes — blumi loop or an always-on discovery pass — blumi can fan out a
short alert so you don't have to babysit it. Off by default. It's split into two kinds of channel:
Server-side push (reaches you with no app open), configured under notify:
- Desktop: an OS notification on the box blumi runs on (macOS osascript, Linux notify-send).
- Gateway bot: a proactive message to one of your configured gateway bots (Telegram / Discord / Slack / WhatsApp), reusing that transport's credentials. You just pick the transport + destination.
- Browser Web Push (web_push: true): push to subscribed browsers via VAPID. Each browser opts in with the Enable button in the web Control Center (Discovery tab); blumi keeps the VAPID keypair + subscriptions in ~/.blumi/push.json.

  ⚠️ Browser Web Push only works from a secure context (HTTPS or http://localhost), so a browser loading the gateway over a plain-HTTP LAN IP can't subscribe, and the Enable button explains why. It lights up once you put the gateway behind TLS (or open it on localhost).
    "notify": {
      "enabled": true,
      "on": ["loop", "discovery"],    // which completions fire (also: "turn"); empty = loop+discovery
      "desktop": true,
      "bot": { "transport": "telegram", "target": "123456789" },
      "web_push": true
    }
- target is a Telegram chat id, Discord channel id, Slack channel, or WhatsApp recipient phone.
- A half-configured bot (missing token/target) is a silent no-op, never an error. Every channel is best-effort: a failure is logged, never fatal to the run.
- blumi loop --notify still fires a one-off desktop notification for that run even when notify is off, handy for a quick foreground run.
Live-stream surfaces (no config — they ride the event stream a client is already watching, so they cover the interactive turn you started in that client):
- Browser in-tab alert — when a turn finishes while the web tab is backgrounded, blumi flashes the page title, badges the favicon, plays a short ping, and drops a click-to-focus toast. It stays silent while the tab is focused, so it never interrupts active use.
- blugo phone notification — the blugo app fires a heads-up local notification when a turn finishes while the app is backgrounded. Android 13+ asks for the notification permission on first launch; deny it and this is simply a no-op.
The server-side channels (desktop / bot / Web Push) are what reach you for background runs (blumi loop, always-on discovery), which execute off-session; the in-tab/phone surfaces cover the turn you kicked off and then walked away from.
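The server-side fan-out rules (event filter, silent no-op for a half-configured bot, best-effort delivery) can be modeled in a few lines. This is a sketch under assumptions: `channels_for` and `fan_out` are hypothetical names, the `senders` callables stand in for the real transports, and the bot check looks only at transport and target because the token lives with the gateway bot's own configuration.

```python
from typing import Callable, Dict, List

DEFAULT_EVENTS = ["loop", "discovery"]   # used when "on" is empty or missing


def channels_for(event: str, notify: dict) -> List[str]:
    """List the server-side channels that should fire for a completion event."""
    if not notify.get("enabled", False):
        return []
    events = notify.get("on") or DEFAULT_EVENTS
    if event not in events:
        return []
    channels = []
    if notify.get("desktop"):
        channels.append("desktop")
    bot = notify.get("bot") or {}
    if bot.get("transport") and bot.get("target"):
        channels.append("bot")           # half-configured bot: silently skipped
    if notify.get("web_push"):
        channels.append("web_push")
    return channels


def fan_out(event: str, notify: dict,
            senders: Dict[str, Callable[[str], None]]) -> List[str]:
    """Send on every selected channel; log failures instead of raising."""
    delivered = []
    for name in channels_for(event, notify):
        try:
            senders[name](event)
            delivered.append(name)
        except Exception as exc:
            print(f"notify: {name} failed: {exc}")   # logged, never fatal
    return delivered
```

One failing channel does not stop the others: `fan_out` returns only the channels that actually delivered, and the run that triggered the alert carries on either way.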
- Config writes are validated + atomic (temp file → rename), mode 0600.
- Skill names are slug-jailed (no path traversal).
- These tools are powerful; under non-YOLO sessions, mutating actions still surface an approval card. Keep destructive operations behind "ask" in your permissions.