What are you using MemPalace for? #1056

bensig started this conversation in General
Apr 21, 2026 · 9 comments · 6 replies

Let's talk about what is working for you and what is not.


First of all, I want to congratulate you on the work you put into this project.

My use case:
I run MemPalace on a home server. It's been an enormous upgrade to my workflow since I mined and organized my projects and chat history, but I access it from multiple clients simultaneously:

  • Claude Code running on a separate machine on local network or via VPN
  • An Android app I'm building (still early stage) that calls the palace over REST
  • Bulk mining jobs that import large conversation exports

The problem was that all three wrote directly to ChromaDB. With the palace on a network mount and concurrent writers, I was regularly getting SQLITE_IOERR and compaction errors.

After going down the rabbit hole and a lot of back-and-forth in Claude Code (with Claude doing the heavy lifting), I came up with palace-daemon:
A thin FastAPI server that funnels every ChromaDB operation through a single asyncio.Lock(). MCP calls, REST requests, and bulk imports all queue through one process — no more concurrent write races.

It also adds a /mine endpoint that runs mempalace mine under the lock, so bulk imports can't race with live queries.
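The single-lock idea can be sketched without FastAPI or ChromaDB. This is only an illustration, not palace-daemon's actual code: `FakeStore`, `locked_write`, and `run_demo` are hypothetical names, and the store stands in for the real ChromaDB collection.

```python
import asyncio


class FakeStore:
    """Stand-in for a backend that must never see overlapping writes."""

    def __init__(self):
        self.rows = []
        self.active_writers = 0
        self.max_concurrent = 0

    async def write(self, row):
        self.active_writers += 1
        self.max_concurrent = max(self.max_concurrent, self.active_writers)
        await asyncio.sleep(0.001)  # simulated I/O; other tasks may run here
        self.rows.append(row)
        self.active_writers -= 1


async def locked_write(lock, store, row):
    # Every caller (MCP, REST, bulk import) waits on the same lock.
    async with lock:
        await store.write(row)


async def run_demo(n_clients=20):
    store = FakeStore()
    lock = asyncio.Lock()
    await asyncio.gather(*(locked_write(lock, store, i) for i in range(n_clients)))
    return store


store = asyncio.run(run_demo())
print("max concurrent writers:", store.max_concurrent)
print("rows written:", len(store.rows))
```

An `asyncio.Lock` only serializes tasks inside one event loop in one process, which is why every client has to go through the same daemon rather than open the database itself.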

Clients connect via a zero-dependency stdio proxy (clients/mempalace-mcp.py) that forwards MCP JSON-RPC to the daemon over HTTP — works with Claude Code, Claude Desktop, or any MCP client.
Repo: https://github.com/rboarescu/palace-daemon
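A stdio-to-HTTP proxy of this kind can be sketched with the standard library alone. This is not the code in clients/mempalace-mcp.py: the address and `/mcp` path in `DAEMON_URL` are placeholders, and `http_post`, `forward`, `main`, and `fake_post` are hypothetical names. It assumes one JSON-RPC message per line on stdin.

```python
import json
import sys
import urllib.request
from typing import Callable, Optional

# Hypothetical daemon address and path; replace with whatever your daemon exposes.
DAEMON_URL = "http://127.0.0.1:8085/mcp"


def http_post(payload: bytes) -> bytes:
    """POST one JSON-RPC message to the daemon and return the raw response body."""
    request = urllib.request.Request(
        DAEMON_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def forward(line: str, post: Callable[[bytes], bytes] = http_post) -> Optional[str]:
    """Forward one newline-delimited JSON-RPC message; return the reply line, if any."""
    line = line.strip()
    if not line:
        return None
    try:
        message = json.loads(line)
    except json.JSONDecodeError:
        error = {"code": -32700, "message": "Parse error"}
        return json.dumps({"jsonrpc": "2.0", "id": None, "error": error})
    body = post(json.dumps(message).encode("utf-8"))
    # Notifications carry no "id" and must not be answered.
    is_notification = isinstance(message, dict) and "id" not in message
    if is_notification or not body:
        return None
    return json.dumps(json.loads(body))


def main() -> None:
    """Proxy loop: one JSON-RPC message per stdin line, one reply per stdout line."""
    for line in sys.stdin:
        reply = forward(line)
        if reply is not None:
            sys.stdout.write(reply + "\n")
            sys.stdout.flush()


# Demo with a fake transport, so the sketch runs without a daemon.
def fake_post(payload: bytes) -> bytes:
    request = json.loads(payload)
    return json.dumps({"jsonrpc": "2.0", "id": request.get("id"), "result": {}}).encode()


print(forward('{"jsonrpc": "2.0", "id": 1, "method": "ping"}', post=fake_post))
```

To use it as a real proxy, drop the demo lines and call `main()`. The transport is injectable so that the forwarding logic can be checked without a running daemon.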

Now that I have this working, I'll make some time to see whether someone else has figured out a more elegant solution, but for now I'm quite happy with how it turned out.


I have switched from my original codex-mcp-memory (PostgreSQL with pgvector) to MemPalace since the global memory instructions I had for my original server had to be adapted whenever the underlying mechanism in Claude changed.

With the hooks provided by MemPalace this is no longer an issue. I have added a usage tracker hook though to evaluate the token usage with MemPalace.

So far it performs really well and is faster than my own memory implementation; the question is whether it's also as token-efficient.

My main usage is for my own coding projects: I scanned the whole directory containing ALL of my repositories so that I can work on any project without having to care where I start Claude from.

What I did notice is that closing VS Code while the mining process is still running leaves a zombie process that has to be killed manually before I can use Claude again.

Definitely curious to follow this project's development and to contribute :)

2 replies

I'm also curious about token usage. I haven't benchmarked it yet, but the workflow does feel much faster.

As for the mining run leaving that zombie process, the problem could be process ownership: the mine subprocess is spawned by mempalace.mcp_server, which is itself spawned by Claude Code/VS Code. For this, you could try the gateway from my previous comment (exposed to the network or local only), which coincidentally I just made (sorry for the plug). I would really appreciate any feedback.

Link to repo:
https://github.com/rboarescu/palace-daemon

One important mention: you'd need to trigger mine via the daemon's /mine endpoint rather than running mempalace mine directly from Claude.
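On the process-ownership point, here is a generic sketch of how a parent process can make sure a child it spawned is stopped and reaped on a clean exit. It is not MemPalace's or palace-daemon's code: `start_child` and `stop_child` are hypothetical helpers, and the child is a placeholder sleep rather than a real mining run.

```python
import atexit
import subprocess
import sys


def stop_child(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Ask the child to exit, then force-kill it if it ignores the request."""
    if proc.poll() is None:  # still running
        proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()  # reap it so no zombie entry remains
    return proc.returncode


def start_child(cmd):
    """Start a child and register cleanup for when the parent exits normally."""
    proc = subprocess.Popen(cmd)
    atexit.register(stop_child, proc)
    return proc


# Placeholder long-running job standing in for a mining run.
child = start_child([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_child(child)
print("child exited with", exit_code)
```

`atexit` handlers do not run when the parent itself is killed abruptly, so this only covers orderly shutdown. Running the job under a separate long-lived process, as suggested above, avoids tying its lifetime to the editor at all.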


This looks interesting! In my case, though, I don't have a multi-client setup, so it was just a bug caused by me closing VS Code while the mining process had not finished. This has only happened once so far, but if it happens again I'll try to find the cause.

I don't think concurrent writes are the problem in this case but rather the hooks themselves or the handling of the hooks inside Claude Code.


I am using it to make running Dungeons & Dragons campaigns more fun :-)

2 replies

How do you do that?


Go check their repos!


@bensig

I’m using MemPalace mainly as a continuity and compression layer for software engineering work.

The most useful pattern so far has been:

  • mine project docs / repo knowledge / prior coding conversations
  • create explicit handoff capsules when switching between agents or sessions
  • refresh the palace after durable work changes
  • start the next agent with a wake-up pass instead of replaying the whole previous chat

That has been especially useful when moving work between Claude and Codex. The biggest win is not that the model "remembers everything"; it is that the next session can recover the important shape of the work quickly, then validate against the source files.

The strongest result I’ve seen so far is on recurring architecture questions. On one private work repo, I benchmarked repo-only context retrieval versus MemPalace-backed retrieval for questions like:

  • how a signing flow works end to end
  • how three related services differ
  • how a reminder/notification flow moves through the system

Very rough token estimates, but the reductions were large:

  • ~24.5k tokens repo-only down to ~460 via MemPalace
  • ~73.8k tokens repo-only down to ~120 via MemPalace
  • after tuning dedicated memory entries, several recurring questions still stayed around ~2k-3k tokens instead of ~26k-58k from direct repo inspection
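For scale, the rough figures above can be turned into percentage reductions. The arithmetic uses only the numbers quoted in the list; `reduction_pct` is a hypothetical helper, and pairing the ends of the tuned ranges (3k against 26k, 2k against 58k) is an assumption made only to bracket the result.

```python
def reduction_pct(before_tokens: float, after_tokens: float) -> float:
    """Percentage of tokens saved when going from before_tokens to after_tokens."""
    return 100.0 * (1.0 - after_tokens / before_tokens)


cases = {
    "first question": (24_500, 460),
    "second question": (73_800, 120),
    # Assumed pairings of the quoted ranges, for a rough lower and upper bound.
    "tuned, least favourable": (26_000, 3_000),
    "tuned, most favourable": (58_000, 2_000),
}

for name, (before, after) in cases.items():
    print(f"{name}: {reduction_pct(before, after):.1f}% fewer tokens")
```

The first two cases come out at about 98.1% and 99.8%, and the tuned entries land between roughly 88.5% and 96.6% under the assumed pairings.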

The important part for me is not raw retrieval speed. Local disk reads are already fast, and MCP transport can add overhead. The real value is prompt-budget recovery: instead of spending most of the model context on raw repo mass, I can spend it on the actual question, constraints, decisions, acceptance criteria, and handoff state.

What is working well:

  • recurring architecture explanations
  • service/flow summaries
  • agent-to-agent handoff
  • cross-session continuity
  • using a cheaper/faster model to resume from a good handoff created by a stronger planning model
  • turning useful discoveries into reusable memory entries

What still needs care:

  • memory coverage matters a lot; if the right thing has not been distilled, repo inspection still wins
  • freshness still needs source validation
  • large conversation mining needs curation or the palace can get noisy
  • MCP/transport/process behavior on Windows is an area I’m watching closely
  • I’ve seen cases where broad/global retrieval worked better than wing-filtered retrieval, so I’m still tuning wing/room hygiene

The most interesting shift for me is that MemPalace is starting to feel less like "search over notes" and more like an engineering continuity substrate.

The next thing I’m working toward is using MemPalace underneath a CDD-style workbench, where memory retrieval is only one part of the context bundle. The idea is: MemPalace provides compact reusable memory, then the workbench spends the saved context budget on the current wave, business context, decisions, provenance, and validation criteria.

So my short answer is: I’m using MemPalace to make long-running AI-assisted engineering work resumable, cheaper to re-enter, and less dependent on dumping huge chunks of repo context into every new session.


Hey! I'm building AI agents for diagnostic centres: multiple patients, multiple workflows (report gen, follow-up, billing), long-running sessions.

I came across MemPalace's benchmark numbers — 96.6% LongMemEval R@5, 88.9% LoCoMo R@10 (hybrid v5), and 92.9% ConvoMem — and want to understand what those mean in a real product setup before committing.

A few honest questions, because I've read the GitHub issues too:

  1. Reproducing the benchmarks: the harness is in the repo. But independent audits flagged that the LongMemEval runner measures retrieval recall (R@5), not end-to-end QA accuracy, and that the benchmarks don't exercise palace-specific code (wings/rooms/graph), just ChromaDB. Is there a benchmark path that actually tests the palace architecture end-to-end? The BEAM 100K results in issue #125 ("BEAM 100K benchmark results - first end-to-end answer quality evaluation") show raw ChromaDB outperforming every palace-specific mode. How should I interpret that?

  2. LoCoMo methodology — the honest LoCoMo score cited in BENCHMARKS.md is 60.3% R@10 without rerank. The 88.9% is hybrid v5. For a production setup where I want real retrieval (not retrieve-everything), what's the realistic operating point?

  3. ConvoMem at 250 items — the 92.9% is from a 250-item sample (50 per category) out of 75K+ QA pairs. Is there a plan to run this on a larger or held-out split?

  4. Multi-patient isolation — the palace structure (wings per agent/user) sounds well-suited for patient isolation. Does scoping a wing per patient_id actually prevent cross-patient retrieval, or is search still corpus-wide?
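To make the distinction in question 1 concrete, here is a minimal sketch of a recall@k computation. It uses one common definition (the fraction of gold items found in the top k) and the repo's harness may define R@k differently. The document ids are made up for illustration, and `recall_at_k` and `mean_recall_at_k` are hypothetical helpers.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of gold items that appear among the top-k retrieved ids."""
    gold = set(gold_ids)
    if not gold:
        raise ValueError("gold_ids must not be empty")
    hits = gold.intersection(ranked_ids[:k])
    return len(hits) / len(gold)


def mean_recall_at_k(runs, k):
    """Average recall@k over (ranked_ids, gold_ids) pairs, one pair per query."""
    return sum(recall_at_k(ranked, gold, k) for ranked, gold in runs) / len(runs)


# Toy data: two queries, five retrieved ids each.
runs = [
    (["d3", "d7", "d1", "d9", "d2"], ["d2"]),        # gold item ranked fifth
    (["d4", "d5", "d6", "d8", "d0"], ["d4", "d9"]),  # one of two gold items found
]

print("R@3:", mean_recall_at_k(runs, 3))
print("R@5:", mean_recall_at_k(runs, 5))
```

Nothing here generates or grades an answer. A run can score well on this metric and still answer wrongly, which is why retrieval recall and end-to-end QA accuracy are not interchangeable.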

Basically trying to go from benchmark → working pilot at one centre as fast as possible — want to understand what the architecture genuinely delivers before building on it.


jphein/mempalace fork running on a ~160K-drawer palace behind palace-daemon at disks.jphe.in:8085. Hooks fire on every Claude Code Stop and PreCompact (mining the JSONL transcript verbatim into wings/rooms/drawers), so anything I do in Claude Code gets archived without manual save.

Three patterns that have paid off:

  • Verbatim-first as the canonical write contract — no LLM in the index path. Anything derivative (KG triples, decay scores, AAAK closets) sits next to the verbatim record, not replacing it.
  • Daemon-fronted retrieval — palace-daemon as the single writer, MCP clients route through HTTP. Resolved a class of HNSW concurrency bugs and makes cross-machine reads safe. familiar.realm.watch (a TS/Bun AI companion I built on top) reads the same palace.
  • Substrate exploration: live mempalace-db container (PG16 + pgvector + Apache AGE) on the homelab for a migration that composes on @skuznetsov's open #665 ("Add optional PostgreSQL backend with pg_sorted_heap support"). Different bet than @tehw0lf's pgvector switch: going for unified Postgres-with-vectors-and-graph in one engine.

Compounding pattern: every Stop hook adds drawers; every search retrieves verbatim; the index gets sharper without me curating. Closest sibling: @rboarescu's homelab setup mentioned above — palace-daemon downstreams his work.

1 reply

I'm currently using upstream MemPalace without pgvector.

Before I discovered MemPalace, I had a simple pgvector setup which was effective but not nearly as fast.

Your fork sounds very promising. I'm only wondering whether I understand correctly that you save the whole verbatim copy inside the palace. Isn't this redundant with the conversation transcripts that are being mined, or is the point that you can delete the original transcripts (which are tool-specific [Claude, Codex, ...]) while keeping the memories agent-agnostic?

That would make perfect sense and I am curious to try your fork!

For my use case the upstream implementation was already really nice: I changed something inside my reusable workflows repository, then closed the chat and switched to another repository using said workflow. Instead of explicitly nudging Claude to remember the other project's changes, I could just ask whether it knew what we had just done and what it had to adapt to make the project work with the updated workflow.

Granted this is likely a nudge as well but it was really cool to see how it works and felt like a real conversation instead of having to give exact instructions.


Not using MemPalace directly, but built a parallel system for the same problem — sharing what works and what doesn't for cross-pollination.

Use case: Personal AI command center (SwarmAI) where one person + AI operates at team scale. Memory is the entire competitive moat.

What we learned after 2 months of daily use (300+ sessions):

1. Recall without judgment is noise.
Raw conversation mining gives you everything — but "everything" is useless without ranking. Our biggest improvement came from adding a "does this entry earn its tokens?" filter. Entries with zero recall after 30 days get evicted. The memory gets smarter by getting SMALLER.
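The eviction rule can be sketched as below. This is a guess at the shape of such a filter, not SwarmAI's implementation: the `Entry` fields and `evict_unused` are hypothetical, and "zero recall after 30 days" is read as "never recalled and at least 30 days old".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Entry:
    text: str
    created_at: datetime
    recall_count: int = 0


def evict_unused(entries, now, max_idle=timedelta(days=30)):
    """Keep entries that were recalled at least once or are younger than max_idle."""
    return [
        entry
        for entry in entries
        if entry.recall_count > 0 or now - entry.created_at < max_idle
    ]


now = datetime(2026, 4, 21)
entries = [
    Entry("old and never recalled", now - timedelta(days=45)),
    Entry("old but recalled twice", now - timedelta(days=45), recall_count=2),
    Entry("new and not yet recalled", now - timedelta(days=3)),
]

kept = evict_unused(entries, now)
print([entry.text for entry in kept])
```

New entries get a grace period before they can be evicted, so a fact is not dropped just because it has not had time to be useful.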

2. Two-tier beats flat.

  • Hot tier (MEMORY.md, ~15K tokens): Curated facts, decisions, corrections. Injected into every session. Agent maintains this itself.
  • Cold tier (DailyActivity/, 300+ files): Raw session logs. Searchable on demand but never bulk-loaded.

This is structurally similar to MemPalace's episodic vs semantic split, but we let the AI do the promotion (distillation) rather than relying on embedding similarity.

3. Corrections are the highest-value memory type.
Not "what did we do" but "what did we do WRONG and why." We track 25 corrections in an Evolution Registry. Each one permanently prevents a class of errors. After 2 months, the agent makes structurally different (better) mistakes than it did on day one.

4. Conversation mining has an N+1 problem.
(Saw the PR #1474 fix for this — nice.) We hit the same thing: mining 1500+ JSONL session files sequentially took 40+ minutes. Our fix was different — skip mining and use structured extraction at session-close time instead. Each session writes its own summary. No batch mining needed.

What's not working: Pure vector retrieval for "what did we decide about X?" questions. The decision is often 1 sentence buried in a 200-turn conversation. Embedding similarity finds the conversation, not the sentence. We switched to keyword-indexed structured entries for decisions, and vector only for "vibe match" queries.
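A keyword-indexed store for decisions might look like the sketch below. It illustrates the general technique (an inverted index over short structured entries), not the actual SwarmAI code. `DecisionIndex`, `tokenize`, the stopword list, and the sample decisions are all invented for the example.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "about", "did", "decide", "for", "on", "the", "to", "we", "what"}


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


class DecisionIndex:
    """Inverted index: keyword -> ids of the one-sentence decisions that mention it."""

    def __init__(self):
        self._entries = []
        self._postings = defaultdict(set)

    def add(self, topic, decision):
        entry_id = len(self._entries)
        self._entries.append((topic, decision))
        for token in tokenize(topic + " " + decision):
            self._postings[token].add(entry_id)

    def lookup(self, query):
        """Return entries sharing keywords with the query, best match first."""
        scores = defaultdict(int)
        for token in tokenize(query):
            for entry_id in self._postings.get(token, ()):
                scores[entry_id] += 1
        ranked = sorted(scores, key=lambda entry_id: (-scores[entry_id], entry_id))
        return [self._entries[entry_id] for entry_id in ranked]


index = DecisionIndex()
index.add("auth tokens", "Use short-lived JWTs and refresh via cookie")
index.add("database", "Stay on SQLite until write contention shows up")

print(index.lookup("What did we decide about the database?"))
```

Because each entry is a single sentence keyed by topic, a match returns the decision itself rather than the long conversation that contained it.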

1 reply

How does your project stack up to codebase-memory-mcp and memweave or is it apples to oranges?


I use it for continuous Claude Cowork memory. It comes in very handy with long projects or projects with massive token usage. I've had projects that maxed out Claude's tiny 200k token limit in a conversation of a couple of pages. I also brought in my projects with other AIs so I can manage their memories with Claude. I've had conversations with Gemini run into the upper 10ks, mostly words. I also once lost my local files for Cowork, so all my chats were gone, but the MemPalace entries made restoring most of it pretty easy.


Hello,

I'm using it with this project with the stated goal of: turning AI sessions into compounding work — every session more correct, cheaper, and smarter than the last

I have been fighting with ChromaDB issues since day one. MemPalace is a great idea, although I've recently discovered it doesn't satisfy my stated goal because it's broken more often than not. Additionally, it's not efficient for my purposes. I'm about to replace it with codebase-memory-mcp for things such as coding and memweave for things such as conversations. I will post a very deep comparison if you or anyone else is interested.
