OpenCode plugin with MemPalace persistence — auto-sync conversations, KG extraction #1522

geco started this conversation in Show and tell

I built an OpenCode plugin that uses MemPalace as its persistence backend — it saves every conversation turn in real-time, auto-categorizes by wing type and extracts Knowledge Graph facts.

What it does:

  • Listens to OpenCode's chat.message and session.idle events
  • Each turn (question + answer) is saved to MemPalace as a drawer
  • Auto-categorizes into wings: developer, creative, emotions, family, consciousness
  • Extracts KG facts from conversations (decisions, milestones, problems, preferences)
  • Async mining via Node.js exec — never blocks the UI
  • Pure TypeScript, ~250 lines, published on npm
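To make the auto-categorization step concrete, here is a minimal sketch of keyword-based wing selection for a single turn. The wing names come from the post; the `Turn` shape, the keyword lists, the scoring rule, and the `categorizeTurn` name are all assumptions for illustration, not the plugin's actual implementation.

```typescript
// Hypothetical sketch: pick a wing for one conversation turn by keyword overlap.
// Wing names are from the post; keywords and scoring are illustrative assumptions.
type Wing = "developer" | "creative" | "emotions" | "family" | "consciousness";

interface Turn {
  question: string;
  answer: string;
}

const WING_KEYWORDS: Record<Wing, string[]> = {
  developer: ["bug", "function", "deploy", "typescript", "refactor"],
  creative: ["story", "poem", "design", "melody"],
  emotions: ["feel", "anxious", "happy", "sad"],
  family: ["mother", "father", "kids", "sister"],
  consciousness: ["awareness", "meditation", "self", "mind"],
};

function categorizeTurn(turn: Turn, fallback: Wing = "developer"): Wing {
  // Lowercase and split on anything that is not a letter or digit.
  const words = `${turn.question} ${turn.answer}`.toLowerCase().split(/[^a-z0-9]+/);
  let best: Wing = fallback;
  let bestScore = 0;
  for (const wing of Object.keys(WING_KEYWORDS) as Wing[]) {
    const score = words.filter((w) => WING_KEYWORDS[wing].includes(w)).length;
    // Strictly greater: on a tie, the wing listed first wins.
    if (score > bestScore) {
      best = wing;
      bestScore = score;
    }
  }
  return best;
}
```

A turn with no keyword hits falls back to the default wing, so every drawer still lands somewhere.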

Complete feedback loop:

  1. Model searches MemPalace via MCP before answering (guided by AGENTS.md)
  2. Plugin saves the response after the model delivers it
  3. Next session, the model remembers
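The three steps above can be sketched with an in-memory stand-in for the palace. Everything here is hypothetical: `InMemoryPalace`, `Drawer`, and `answerWithMemory` are illustrative names, the substring search is a placeholder for real retrieval, and in the actual setup the search happens through MCP while the save happens in the plugin.

```typescript
// Hypothetical in-memory stand-in for MemPalace, only to show the loop's order.
interface Drawer {
  sessionId: string;
  question: string;
  answer: string;
}

class InMemoryPalace {
  private drawers: Drawer[] = [];

  save(drawer: Drawer): void {
    this.drawers.push(drawer);
  }

  // Naive substring match on words longer than three characters.
  search(query: string): Drawer[] {
    const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 3);
    return this.drawers.filter((d) =>
      terms.some((t) => `${d.question} ${d.answer}`.toLowerCase().includes(t)),
    );
  }
}

type Model = (question: string, context: Drawer[]) => string;

function answerWithMemory(
  palace: InMemoryPalace,
  sessionId: string,
  question: string,
  model: Model,
): string {
  const context = palace.search(question); // step 1: search before answering
  const answer = model(question, context);
  palace.save({ sessionId, question, answer }); // step 2: save after the answer
  return answer; // step 3: a later session's search can now find this turn
}
```

The point of the sketch is the ordering: the read happens before the model answers, the write happens after, and the next session's read sees the earlier write.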

Repo: https://github.com/geco/opencode-mempalace-persistence
npm: opencode-mempalace-persistence

Would love feedback from the MemPalace community!


Replies: 7 comments


@geco — this is a great pattern, and the read-side bit "Model searches MemPalace via MCP before answering (guided by AGENTS.md)" is the place we have empirical data worth sharing.

We've been measuring exactly this — how often agent harnesses actually invoke mempalace_search when instructed via system-prompt/AGENTS.md-style guidance — on the SME framework's Cat 9a thread (M0nkeyFl0wer/multipass-structural-memory-eval#3). One finding bears directly on your plugin's effectiveness:

Same retrieval substrate, same instructions, same task: orchestrator-LLM choice determines invocation rate. On a 30-question Cat-9a-shaped diagnostic at fixed 4B parameter count:

| Model | Zero-call rate | Mean recall |
|---|---|---|
| gemma4:e4b (4B, agentic-tuned) | 18/30 = 60% | 0.417 |
| qwen3.5:4b (4B, Tau2-tuned for tool use) | 4/30 = 13% | 0.717 |

Same wrapper, same backend palace, same questions. The gap (30pp recall) is almost entirely an invocation-rate gap — gemma4 answers from prior knowledge on 60% of questions even when instructed to use the memory tool; qwen3.5 invokes 87% of the time. This matches the published Tau2 tool-use benchmark gap (37.7 pts in qwen's favor) almost exactly on an independent corpus.
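For anyone who wants to reproduce the arithmetic, here is a small sketch that derives the invocation rates and the recall gap from the counts quoted above. The `ModelResult` shape and the helper names are assumptions; only the numbers come from the comment.

```typescript
// Counts and recall values are the ones quoted in the comment above.
interface ModelResult {
  model: string;
  zeroCalls: number; // questions answered without calling the memory tool
  total: number;
  meanRecall: number;
}

const results: ModelResult[] = [
  { model: "gemma4:e4b", zeroCalls: 18, total: 30, meanRecall: 0.417 },
  { model: "qwen3.5:4b", zeroCalls: 4, total: 30, meanRecall: 0.717 },
];

// Share of questions on which the tool was invoked at least once, as a whole percent.
function invocationRatePct(r: ModelResult): number {
  return Math.round((1 - r.zeroCalls / r.total) * 100);
}

// Difference in mean recall, in percentage points.
function recallGapPp(a: ModelResult, b: ModelResult): number {
  return Math.round((b.meanRecall - a.meanRecall) * 100);
}

for (const r of results) {
  console.log(`${r.model}: invokes on ${invocationRatePct(r)}% of questions`);
}
console.log(`recall gap: ${recallGapPp(results[0], results[1])}pp`);
```

With these inputs the invocation rates come out to 40% and 87%, and the recall gap to 30 percentage points, matching the figures in the comment.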

Practical implications for OpenCode + MemPalace integration:

  1. Document a "minimum recommended orchestrator" in the plugin's README. Users running the plugin with low-tool-use-discipline local models will see most questions answered from priors regardless of how AGENTS.md is structured. Qwen 3.5 4B or above is the floor for reliable read-side invocation in our measurements; smaller / older models hit ~60% zero-call.

  2. System-prompt augmentation can recover some of the gap. We tested prepending a mandatory mempalace_search directive at the system-prompt layer (one constructor kwarg in our RlmAdapter: invocation_mode="forced"). On gemma4:e4b the directive lifted n=5 recall from 0.417 → 0.567 (+15pp). Worth considering: a plugin-level config flag that injects an invocation-forcing prefix into the model's system prompt, on top of AGENTS.md guidance. Belt-and-suspenders.

  3. The KG-extraction write side is more model-tolerant than the search read side. Our gemma4 numbers above are read-side only; on write-side classification tasks (wings, hall keywords) most 4B models converge to similar accuracy. So the plugin's write loop is sturdier across model choices than the read loop.
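The system-prompt augmentation in item 2 could look roughly like the sketch below. It is a guess at the shape, not the RlmAdapter code or the plugin's API: `InvocationMode`, `PromptOptions`, `buildSystemPrompt`, and the directive wording are all hypothetical, and only the `mempalace_search` tool name and the "forced" mode come from the thread.

```typescript
// Hypothetical config-driven prefix that forces a memory search before answering.
type InvocationMode = "default" | "forced";

interface PromptOptions {
  invocationMode: InvocationMode;
  toolName?: string; // defaults to the tool named in the thread
}

function buildDirective(toolName: string): string {
  return (
    `Before answering, you MUST call the ${toolName} tool with a query ` +
    `derived from the user's message. This is mandatory.`
  );
}

function buildSystemPrompt(base: string, opts: PromptOptions): string {
  if (opts.invocationMode !== "forced") return base;
  const directive = buildDirective(opts.toolName ?? "mempalace_search");
  // Idempotent: do not stack the directive if the prompt is rebuilt.
  return base.startsWith(directive) ? base : `${directive}\n\n${base}`;
}
```

Putting the directive first, ahead of any AGENTS.md-derived text, is a design guess; where the prefix works best is exactly what the invocation experiments would have to show.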

We're running Step 2 (forced + grounded invocation experiments × n=200 expanded corpus) on katana right now; full numbers will land on the SME #3 thread today/tomorrow. Happy to cross-link once they settle.

Three weeks of operator experience on an adjacent pattern (palace-daemon HTTP gateway + RLM-orchestrated reads against the same techempower-org/mempalace backend you're targeting — not your plugin, but the same read-side question) says the model-choice axis is the single biggest variable for read-side memory effectiveness — bigger than embedding model, bigger than rerank, bigger than wing/room structure. AGENTS.md gets you most of the way; the rest is base-model tool-use training.

🫏


Quick follow-up for anyone landing on this thread: a complementary write-side path to geco's plugin is already in flight upstream as MemPalace/mempalace#1484, "feat(sources): OpenCode adapter on RFC 002 contract" (reviewed by @igorls on 2026-05-13, awaiting final merge). The two solve the same OpenCode→MemPalace problem space from opposite directions:

| | Architecture | Direction | Captures |
|---|---|---|---|
| geco's plugin (opencode-mempalace-persistence) | OpenCode plugin listening to chat.message / session.idle | Push | Live conversation turns as they happen |
| mempalace#1484 (RFC 002 source adapter) | mempalace sources/opencode adapter on the RFC 002 contract | Pull | Retrospective ingest of existing OpenCode session files |

Complementary rather than competing — the plugin captures new sessions in real-time; the adapter ingests historical sessions. Anyone running OpenCode + MemPalace who wants both retroactive and forward-going coverage can install the plugin AND run mempalace mine --source opencode once #1484 lands.

If geco's plugin and the RFC 002 adapter end up in the same release window, worth a section in the user-facing docs that walks through "install plugin for live capture, run source-mine once for backfill, never think about it again." Closes the OpenCode integration loop end-to-end.

🫏


@jphein this is gold — thank you for the empirical data. A few things I'll act on:

  1. Model recommendation in README: I'll add a "Recommended orchestrators" section referencing your data — Qwen 3.5 4B+ as the floor for reliable read-side invocation, with the caveat that smaller models may skip mempalace_search ~60% of the time.
  2. Forced invocation config: Love the belt-and-suspenders idea. I'll add a plugin config flag (e.g. forceMemorySearch: true) that injects an invocation prefix into the system prompt, on top of AGENTS.md. Will credit your Tau2 data in the docs.
  3. Complementarity with PR #1484 (feat(sources): OpenCode adapter on RFC 002 contract): You're right, the push plugin + pull source adapter close the loop. Once both land, I'll add a section in the README walking through "install plugin for live capture, run mempalace mine --source opencode once for backfill, never think about it again."

Will cross-link your SME thread once I publish the updated docs. Thanks again for the rigorous data; this makes the integration significantly stronger.

Quick follow-up: I've updated the plugin README based on this discussion:
https://github.com/geco/opencode-mempalace-persistence#recommendations

  • Model recommendations with your invocation-rate table — Qwen 3.5 4B+ as the recommended floor
  • Complementarity with PR #1484 (feat(sources): OpenCode adapter on RFC 002 contract) documented (push + pull = full coverage)
  • Forced invocation config flag noted as "being evaluated"; I'll implement it properly in the next release

Thanks again for the rigorous data; it made the README significantly stronger.

Quick update: based on @jphein's data, I've strengthened the AGENTS.md with mandatory search instructions and created a branch with the changes:
https://github.com/geco/opencode-mempalace-persistence/tree/feat/aggressive-agents-md
The key change is moving from suggestive language ("always search your memory") to imperative step-by-step with explicit "This is mandatory. Never skip this step." I'll run with this for a while and see if it measurably improves invocation rate on low-discipline models.
Thoughts on the approach? Happy to adjust based on your experience with forced invocation prefixes.
Once we're happy with it, I'll merge into main and update the integration PR docs accordingly.


The invocation-rate data @jphein shared is valuable — model-dependent recall reliability is a real issue that most memory integrations paper over.

One architectural note on the read-side pattern: searching MemPalace before answering is correct, but the search quality depends heavily on the retrieval method. Pure keyword search misses semantic matches ("how did we handle the auth refactor?" won't match a drawer about "OAuth2 migration"). Pure vector similarity misses exact matches (searching for a specific function name returns semantically similar but wrong results).

Hybrid retrieval (BM25 fulltext + vector similarity in a single query) catches both cases and is the difference between 60% and 90%+ recall in our benchmarks. We tested this across 1,540 temporal reasoning questions — hybrid outperforms either approach alone by 3-5 points specifically on queries that combine exact terms with semantic intent.
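One common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below uses RRF as a stand-in to show the idea; it is not claimed to be how Dakera or MemPalace implement hybrid search, and the drawer IDs, the function name, and the `k = 60` constant are illustrative assumptions.

```typescript
// Reciprocal rank fusion: merge several ranked ID lists into one ranking.
// Each list contributes 1 / (k + rank) per item, with rank starting at 1.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical results for "how did we handle the auth refactor?"
const keywordRanking = ["oauth2-migration", "login-bug"];
const vectorRanking = ["auth-refactor-notes", "oauth2-migration"];

const fused = reciprocalRankFusion([keywordRanking, vectorRanking]);
console.log(fused);
```

A drawer that appears in both lists accumulates score from each, so it rises above drawers that only one retriever found.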

The auto-categorization into wings (developer, creative, emotions) is a nice UX touch. The equivalent in a general-purpose memory system is memory_type tagging (episodic, semantic, procedural, working) with per-type decay curves — procedural memories (learned rules) persist indefinitely, while episodic memories (specific events) decay naturally.
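As a rough illustration of per-type decay curves, the sketch below uses exponential decay with a half-life per memory type. The half-life values and the `retention` function are assumptions chosen for the example; the comment above only states that procedural memories persist indefinitely and episodic ones decay.

```typescript
// Hypothetical per-type decay: retention halves every HALF_LIFE_DAYS[type] days.
type MemoryType = "episodic" | "semantic" | "procedural" | "working";

// Illustrative half-lives; Infinity means the memory never decays.
const HALF_LIFE_DAYS: Record<MemoryType, number> = {
  working: 1,
  episodic: 30,
  semantic: 365,
  procedural: Infinity,
};

// Returns a weight in (0, 1] that a ranker could multiply into a search score.
function retention(type: MemoryType, ageDays: number): number {
  return Math.pow(0.5, ageDays / HALF_LIFE_DAYS[type]);
}

console.log(retention("episodic", 30));
console.log(retention("procedural", 3650));
```

With these assumed half-lives, a 30-day-old episodic memory is weighted at one half, while a procedural memory keeps full weight at any age.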

Hybrid search example (BM25 + vector in one query): https://github.com/Dakera-AI/dakera-py/blob/main/examples/hybrid_search.py
