Let's talk about what is working for you and what is not.
-
First of all, I want to congratulate you on the work you put into this project.
My use case:
I run MemPalace on a home server. It has been an enormous upgrade to my workflow since I mined and organized my projects and chat history, but I access it from multiple clients simultaneously:
- Claude Code running on a separate machine on local network or via VPN
- An Android app I'm building (still early stage) that calls the palace over REST
- Bulk mining jobs that import large conversation exports
The problem was that all three wrote directly to ChromaDB. With the palace on a network mount and concurrent writers, I was regularly getting SQLITE_IOERR and compaction errors.
After going down the rabbit hole and a lot of back-and-forth in Claude Code (with Claude doing the heavy lifting), I came up with palace-daemon:
A thin FastAPI server that funnels every ChromaDB operation through a single asyncio.Lock(). MCP calls, REST requests, and bulk imports all queue through one process — no more concurrent write races.
It also adds a /mine endpoint that runs mempalace mine under the lock, so bulk imports can't race with live queries.
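The single-lock idea can be illustrated without FastAPI or ChromaDB. The following is a minimal sketch, not palace-daemon's actual code: `SerializedStore` and `_blocking_op` are hypothetical names, and the blocking call is a stand-in for a real ChromaDB read or write. It only shows that operations arriving concurrently are executed one at a time.

```python
import asyncio
import time


class SerializedStore:
    """Hypothetical wrapper: every store operation runs under one asyncio.Lock."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.active = 0       # operations currently inside the lock
        self.max_active = 0   # highest concurrency ever observed

    def _blocking_op(self, name: str) -> str:
        # Stand-in for a blocking ChromaDB call (assumption, not the real API).
        time.sleep(0.01)
        return f"done:{name}"

    async def run(self, name: str) -> str:
        async with self._lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            try:
                # Run the blocking call in a worker thread so the event loop stays free.
                return await asyncio.to_thread(self._blocking_op, name)
            finally:
                self.active -= 1


async def main():
    store = SerializedStore()
    # Simulate an MCP call, a REST request, and a bulk import arriving together.
    results = await asyncio.gather(
        store.run("mcp-search"),
        store.run("rest-add"),
        store.run("bulk-mine"),
    )
    return results, store.max_active


results, max_active = asyncio.run(main())
```

Because each caller waits on the same lock, `max_active` never exceeds 1 even though all three requests were submitted at once. The trade-off is throughput: a long bulk import holds up live queries until it releases the lock.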
Clients connect via a zero-dependency stdio proxy (clients/mempalace-mcp.py) that forwards MCP JSON-RPC to the daemon over HTTP — works with Claude Code, Claude Desktop, or any MCP client.
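A stdio-to-HTTP proxy of this kind can be quite small. The sketch below is an illustration under assumptions, not the contents of clients/mempalace-mcp.py: the `DAEMON_URL` value and its `/mcp` path are hypothetical, and it assumes one newline-delimited JSON-RPC message per line on stdin. It uses only the standard library.

```python
import json
import urllib.request
from typing import IO, Callable

# Hypothetical daemon endpoint; the real host, port, and path may differ.
DAEMON_URL = "http://localhost:8085/mcp"


def post_json(url: str, payload: bytes) -> bytes:
    """POST a JSON body and return the raw response body."""
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def proxy(
    stdin: IO[str],
    stdout: IO[str],
    send: Callable[[str, bytes], bytes] = post_json,
) -> None:
    """Forward each JSON-RPC line from stdin to the daemon and echo the reply."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)  # fail early on malformed input
        try:
            text = send(DAEMON_URL, line.encode("utf-8")).decode("utf-8").strip()
        except OSError as exc:  # urllib's URLError is an OSError subclass
            if not isinstance(message, dict) or "id" not in message:
                continue  # notifications carry no id and expect no reply
            text = json.dumps({
                "jsonrpc": "2.0",
                "id": message["id"],
                "error": {"code": -32000, "message": str(exc)},
            })
        if text:
            stdout.write(text + "\n")
            stdout.flush()
```

An MCP client would launch a script that calls `proxy(sys.stdin, sys.stdout)`. The `send` parameter exists only so the transport can be swapped out, for example in tests.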
Repo: https://github.com/rboarescu/palace-daemon
Now that I have this working, I'll make some time to see whether someone else has found a more elegant solution, but for now I'm quite happy with how it turned out.
-
I have switched from my original codex-mcp-memory (PostgreSQL with pgvector) to MemPalace since the global memory instructions I had for my original server had to be adapted whenever the underlying mechanism in Claude changed.
With the hooks provided by MemPalace this is no longer an issue. I have added a usage tracker hook though to evaluate the token usage with MemPalace.
So far it performs really well and faster than my own memory implementation; the question will be whether it's also as token efficient.
My main usage is for my own coding projects, i.e. I scanned the whole directory containing ALL of my repositories so I can work on all projects without having to care where I start Claude from.
What I did notice is that closing VS Code while the mining process is still running leaves a zombie process that has to be killed manually before Claude can be used again.
Definitely curious to follow this project's development and to contribute :)
-
I'm also curious about token usage. I haven't benchmarked yet, but the workflow does feel much faster.
As for the mining job leaving a zombie process, the problem could be process ownership: the mine subprocess is spawned by mempalace.mcp_server, which is itself spawned by Claude Code/VS Code. For this, you could try the gateway from my previous comment (exposed to the network or local only), which coincidentally I just made (sorry for the self-promotion). I would really appreciate any feedback.
Link to repo:
https://github.com/rboarescu/palace-daemon
One important mention: you'd need to trigger mine via the daemon's /mine endpoint rather than running mempalace mine directly from Claude.
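To make the process-ownership point concrete, here is a minimal POSIX-only sketch of a supervisor that starts a long job in its own session and can stop the whole process group afterwards. It is not how MemPalace, Claude Code, or palace-daemon manage the mine subprocess; `start_job` and `stop_job` are hypothetical helpers, and a sleeping Python process stands in for the mining job.

```python
import os
import signal
import subprocess
import sys


def start_job(argv):
    """Start a job as leader of a new session, detached from the caller's group."""
    # start_new_session=True calls setsid() in the child (POSIX only),
    # so the child's process-group id equals its pid.
    return subprocess.Popen(argv, start_new_session=True)


def stop_job(proc, timeout=5.0):
    """Terminate the job's whole process group and reap it."""
    if proc.poll() is None:
        try:
            os.killpg(proc.pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the group already exited
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Escalate if the job ignores SIGTERM.
        os.killpg(proc.pid, signal.SIGKILL)
        return proc.wait()


# Stand-in for a long-running mining job (assumption for illustration).
job = start_job([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_job(job)
```

The important step is `proc.wait()`: a child that has exited but was never waited on by its parent is exactly what shows up as a zombie. A long-lived daemon that owns the job can always reap it, whereas an editor that is closed mid-run cannot.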
-
This looks interesting! In my case, though, I don't have a multi-client setup, so it was just a bug caused by me closing VS Code while the mining process had not finished. This has only happened once so far, but if it happens again I'll try to find the cause.
I don't think concurrent writes are the problem in this case but rather the hooks themselves or the handling of the hooks inside Claude Code.
-
I am using it to make running Dungeons & Dragons campaigns more fun :-)
-
How do you do that?
-
Go check out his/her/their repos!
-
I’m using MemPalace mainly as a continuity and compression layer for software engineering work.
The most useful pattern so far has been:
- mine project docs / repo knowledge / prior coding conversations
- create explicit handoff capsules when switching between agents or sessions
- refresh the palace after durable work changes
- start the next agent with a wake-up pass instead of replaying the whole previous chat
That has been especially useful when moving work between Claude and Codex. The biggest win is not that the model "remembers everything"; it is that the next session can recover the important shape of the work quickly, then validate against the source files.
The strongest result I’ve seen so far is on recurring architecture questions. On one private work repo, I benchmarked repo-only context retrieval versus MemPalace-backed retrieval for questions like:
- how a signing flow works end to end
- how three related services differ
- how a reminder/notification flow moves through the system
Very rough token estimates, but the reductions were large:
- ~24.5k tokens repo-only down to ~460 via MemPalace
- ~73.8k tokens repo-only down to ~120 via MemPalace
- after tuning dedicated memory entries, several recurring questions still stayed around ~2k-3k tokens instead of ~26k-58k from direct repo inspection
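To put the first two estimates in proportion, the arithmetic can be worked through directly. This is only a calculation on the rough figures quoted above (24.5k to 460 and 73.8k to 120); `reduction` is a helper introduced here for illustration.

```python
def reduction(before: int, after: int) -> tuple[float, float]:
    """Return (shrink factor, percent of tokens saved)."""
    return before / after, 100.0 * (1 - after / before)


# Rough token estimates quoted in the list above.
cases = {
    "first question": (24_500, 460),
    "second question": (73_800, 120),
}

for label, (before, after) in cases.items():
    factor, saved = reduction(before, after)
    print(f"{label}: about {factor:.0f}x smaller, {saved:.1f}% fewer tokens")
```

That works out to roughly 53x (98.1% fewer tokens) and 615x (99.8% fewer tokens). Since the inputs are very rough estimates, the factors are best read as orders of magnitude rather than precise measurements.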
The important part for me is not raw retrieval speed. Local disk reads are already fast, and MCP transport can add overhead. The real value is prompt-budget recovery: instead of spending most of the model context on raw repo mass, I can spend it on the actual question, constraints, decisions, acceptance criteria, and handoff state.
What is working well:
- recurring architecture explanations
- service/flow summaries
- agent-to-agent handoff
- cross-session continuity
- using a cheaper/faster model to resume from a good handoff created by a stronger planning model
- turning useful discoveries into reusable memory entries
What still needs care:
- memory coverage matters a lot; if the right thing has not been distilled, repo inspection still wins
- freshness still needs source validation
- large conversation mining needs curation or the palace can get noisy
- MCP/transport/process behavior on Windows is an area I’m watching closely
- I’ve seen cases where broad/global retrieval worked better than wing-filtered retrieval, so I’m still tuning wing/room hygiene
The most interesting shift for me is that MemPalace is starting to feel less like "search over notes" and more like an engineering continuity substrate.
The next thing I’m working toward is using MemPalace underneath a CDD-style workbench, where memory retrieval is only one part of the context bundle. The idea is: MemPalace provides compact reusable memory, then the workbench spends the saved context budget on the current wave, business context, decisions, provenance, and validation criteria.
So my short answer is: I’m using MemPalace to make long-running AI-assisted engineering work resumable, cheaper to re-enter, and less dependent on dumping huge chunks of repo context into every new session.
-
Hey! I'm building AI agents for diagnostic centres: multiple patients, multiple workflows (report gen, follow-up, billing), long-running sessions.
I came across MemPalace's benchmark numbers — 96.6% LongMemEval R@5, 88.9% LoCoMo R@10 (hybrid v5), and 92.9% ConvoMem — and want to understand what those mean in a real product setup before committing.
A few honest questions, because I've read the GitHub issues too:
- Reproducing the benchmarks: the harness is in the repo. But independent audits flagged that the LongMemEval runner measures retrieval recall (R@5), not end-to-end QA accuracy, and that the benchmarks don't exercise palace-specific code (wings/rooms/graph), just ChromaDB. Is there a benchmark path that actually tests the palace architecture end-to-end? The BEAM 100K results in issue #125 ("BEAM 100K benchmark results - first end-to-end answer quality evaluation") show raw ChromaDB outperforming every palace-specific mode. How should I interpret that?
- LoCoMo methodology: the honest LoCoMo score cited in BENCHMARKS.md is 60.3% R@10 without rerank. The 88.9% is hybrid v5. For a production setup where I want real retrieval (not retrieve-everything), what's the realistic operating point?
- ConvoMem at 250 items: the 92.9% is from a 250-item sample (50 per category) out of 75K+ QA pairs. Is there a plan to run this on a larger or held-out split?
- Multi-patient isolation: the palace structure (wings per agent/user) sounds well-suited for patient isolation. Does scoping a wing per patient_id actually prevent cross-patient retrieval, or is search still corpus-wide?
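For readers less familiar with the R@k figures in these questions, here is a minimal sketch of a retrieval-recall metric. It assumes the common "any hit" definition (a query counts as a success if at least one relevant item appears in the top k results); the benchmark harness in the repo may define it differently, and the document ids below are toy data.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant id is among the top-k ranked results, else 0.0."""
    return 1.0 if set(ranked_ids[:k]) & set(relevant_ids) else 0.0


def mean_recall_at_k(runs, k):
    """Average recall@k over (ranked_ids, relevant_ids) pairs."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Toy retrieval runs: (ranked results, set of relevant ids).
runs = [
    (["d3", "d7", "d1", "d9", "d2"], {"d1"}),          # hit at rank 3
    (["d4", "d5", "d6", "d8", "d0"], {"d2"}),          # miss
    (["d2", "d1", "d3", "d4", "d5"], {"d2", "d9"}),    # hit at rank 1
    (["d5", "d6", "d7", "d8", "d9", "d1"], {"d1"}),    # hit only at rank 6
]

r_at_5 = mean_recall_at_k(runs, 5)
r_at_10 = mean_recall_at_k(runs, 10)
```

On this toy data R@5 is 0.5 and R@10 is 0.75. The metric only says whether the right item was retrieved; it says nothing about whether a model then answers correctly from it, which is the gap between retrieval recall and end-to-end QA accuracy raised in the first question.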
Basically trying to go from benchmark → working pilot at one centre as fast as possible — want to understand what the architecture genuinely delivers before building on it.
-
jphein/mempalace fork running on a ~160K-drawer palace behind palace-daemon at disks.jphe.in:8085. Hooks fire on every Claude Code Stop and PreCompact (mining the JSONL transcript verbatim into wings/rooms/drawers), so anything I do in Claude Code gets archived without manual save.
Three patterns that have paid off:
- Verbatim-first as the canonical write contract — no LLM in the index path. Anything derivative (KG triples, decay scores, AAAK closets) sits next to the verbatim record, not replacing it.
- Daemon-fronted retrieval — palace-daemon as the single writer, MCP clients route through HTTP. Resolved a class of HNSW concurrency bugs and makes cross-machine reads safe. familiar.realm.watch (a TS/Bun AI companion I built on top) reads the same palace.
- Substrate exploration: live mempalace-db container (PG16 + pgvector + Apache AGE) on the homelab for a migration that composes on @skuznetsov's open #665 ("Add optional PostgreSQL backend with pg_sorted_heap support"). Different bet than @tehw0lf's pgvector switch: going for unified Postgres-with-vectors-and-graph in one engine.
Compounding pattern: every Stop hook adds drawers; every search retrieves verbatim; the index gets sharper without me curating. Closest sibling: @rboarescu's homelab setup mentioned above — palace-daemon downstreams his work.
-
I'm currently using upstream MemPalace without pgvector.
Before I discovered MemPalace, I had a simple pgvector setup which was effective but not nearly as fast.
Your fork sounds very promising, I'm only wondering if I understand correctly that you save the whole verbatim copy inside the palace. Isn't this redundant with the conversation transcripts that are being mined, or is it so that you can delete the original conversation transcripts (which are tool specific [Claude, Codex, ...]) while keeping the memories agent agnostic?
That would make perfect sense and I am curious to try your fork!
For my use case the upstream implementation was already really nice since I changed something inside my reusable workflows repository, then closed the chat and switched to another repository using said workflow. Now, instead of explicitly nudging Claude to remember the other projects' changes I could just ask whether it knew what we just did and what it had to adapt to make the project work with the updated workflow.
Granted this is likely a nudge as well but it was really cool to see how it works and felt like a real conversation instead of having to give exact instructions.
-
Not using MemPalace directly, but built a parallel system for the same problem — sharing what works and what doesn't for cross-pollination.
Use case: Personal AI command center (SwarmAI) where one person + AI operates at team scale. Memory is the entire competitive moat.
What we learned after 2 months of daily use (300+ sessions):
1. Recall without judgment is noise.
Raw conversation mining gives you everything — but "everything" is useless without ranking. Our biggest improvement came from adding a "does this entry earn its tokens?" filter. Entries with zero recall after 30 days get evicted. The memory gets smarter by getting SMALLER.
2. Two-tier beats flat.
- Hot tier (MEMORY.md, ~15K tokens): Curated facts, decisions, corrections. Injected into every session. Agent maintains this itself.
- Cold tier (DailyActivity/, 300+ files): Raw session logs. Searchable on demand but never bulk-loaded.
This is structurally similar to MemPalace's episodic vs semantic split, but we let the AI do the promotion (distillation) rather than relying on embedding similarity.
3. Corrections are the highest-value memory type.
Not "what did we do" but "what did we do WRONG and why." We track 25 corrections in an Evolution Registry. Each one permanently prevents a class of errors. After 2 months, the agent makes structurally different (better) mistakes than it did on day one.
4. Conversation mining has an N+1 problem.
(Saw the PR #1474 fix for this — nice.) We hit the same thing: mining 1500+ JSONL session files sequentially took 40+ minutes. Our fix was different — skip mining and use structured extraction at session-close time instead. Each session writes its own summary. No batch mining needed.
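The eviction rule from point 1 (entries with zero recall after 30 days get evicted) can be sketched in a few lines. This is an illustration of the rule as described, not SwarmAI's implementation; `Entry`, `recall_count`, and `evict_unused` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Entry:
    text: str
    created: datetime
    recall_count: int = 0  # how often retrieval actually surfaced this entry


def evict_unused(entries, now, max_idle=timedelta(days=30)):
    """Keep entries that were recalled at least once or are younger than max_idle."""
    return [
        e for e in entries
        if e.recall_count > 0 or now - e.created < max_idle
    ]


now = datetime(2025, 6, 1)
entries = [
    Entry("old, never recalled", datetime(2025, 1, 1)),
    Entry("old, recalled often", datetime(2025, 1, 1), recall_count=3),
    Entry("new, not yet recalled", datetime(2025, 5, 20)),
]
kept = [e.text for e in evict_unused(entries, now)]
```

Only the first entry is dropped here: it is older than 30 days and was never recalled. New entries get a grace period so they are not evicted before they have had a chance to prove useful.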
What's not working: Pure vector retrieval for "what did we decide about X?" questions. The decision is often 1 sentence buried in a 200-turn conversation. Embedding similarity finds the conversation, not the sentence. We switched to keyword-indexed structured entries for decisions, and vector only for "vibe match" queries.
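A keyword-indexed store for decisions, as opposed to vector search over whole conversations, could look roughly like this. It is a minimal sketch under assumptions: `DecisionIndex` is a hypothetical class, the example decisions are invented, and matching is plain AND over lowercase word tokens.

```python
import re
from collections import defaultdict


class DecisionIndex:
    """Hypothetical inverted index: keyword -> ids of decision entries."""

    def __init__(self):
        self._entries = []              # (topic, decision) tuples
        self._index = defaultdict(set)  # token -> set of entry ids

    @staticmethod
    def _tokens(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, topic, decision):
        entry_id = len(self._entries)
        self._entries.append((topic, decision))
        for token in self._tokens(topic + " " + decision):
            self._index[token].add(entry_id)
        return entry_id

    def lookup(self, query):
        """Return decisions that contain every keyword in the query."""
        tokens = self._tokens(query)
        if not tokens:
            return []
        ids = set.intersection(*(self._index.get(t, set()) for t in tokens))
        return [self._entries[i][1] for i in sorted(ids)]


index = DecisionIndex()
index.add("database", "Use Postgres with pgvector for the vector store.")
index.add("auth", "Keep session tokens in HTTP-only cookies.")
```

Here `index.lookup("pgvector")` returns the one-sentence database decision directly, rather than a pointer to the long conversation it came from. The cost is that each decision has to be extracted and written as its own entry, which matches the structured-extraction step described in point 4.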
-
How does your project stack up to codebase-memory-mcp and memweave or is it apples to oranges?
-
I use it for continuous Claude Cowork memory. It comes in very handy with long projects, or projects with massive token usage. I've had projects that maxed out Claude's tiny 200k-token limit in a conversation of a couple of pages. I also brought in my projects with other AIs so I can manage their memories with Claude. I've had conversations with Gemini run into the upper tens of thousands, mostly words. I also once lost my local files for Cowork, so all my chats were gone, but the MemPalace entries made restoring most of it pretty easy.
-
Hello,
I'm using it with this project with the stated goal of turning AI sessions into compounding work: every session more correct, cheaper, and smarter than the last.
I have been fighting with ChromaDB issues since day one. MemPalace is a great idea, although I've recently discovered that it doesn't satisfy my stated goal because it's broken more often than not. Additionally, it's not efficient for my purposes. I'm about to replace it with codebase-memory-mcp for things such as coding and memweave for things such as conversations. I will post a very deep comparison if you or anyone else is interested.