I love mempalace. Thank you, milla! It's made my large multi-project work almost effortless. Yet I just had my second ChromaDB crash 🥺 and have to rebuild a large database, which means downtime. Has anybody migrated from ChromaDB to PostgreSQL? Is it possible? Does it work? Gotchas?
Funny, after the rebuild I checked with mempalace search: semantic search is back and the ranking is sensible (the top hit is even a prior repair episode, which is apt).
Replies: 1 comment
Second ChromaDB crash plus a full-rebuild-and-downtime is exactly the pain that pushed us off ChromaDB too — so you have a lot of company here, and the short answer is yes, it's possible, it's done, and the recall is unchanged.
Our fork (techempower-org/mempalace) runs PostgreSQL + pgvector as the substrate, with BM25 via Postgres tsvector/pg_trgm for the lexical half of hybrid search, and Apache AGE as an optional graph layer on top (you do not need AGE just to escape ChromaDB — it's additive, not required). Full writeup with the architecture and where the real bottleneck turned out to be: #1659
On "does it work / what does it cost in recall" — no recall cost. We verified substrate-parity directly: Postgres+pgvector vs upstream ChromaDB, same MiniLM embedding, agree to four decimal places on LongMemEval-S — R@5 = 0.9660 overall, identical per-category across all six question types (knowledge-update, multi-session, single-session-assistant/preference/user, temporal-reasoning), with byte-identical per-question rankings including the questions that miss top-5 in both. The reason it's exact rather than just close: the migration recomputes embeddings from the original document text with the same embedding function ChromaDB uses, which is deterministic per (model, text), so the vectors match. Numbers + method: docs/benchmarks/2026-05-17-longmemeval-substrate-parity.md in the SME repo.
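For anyone who wants to run the same kind of parity check on their own data, here is a minimal sketch of the two comparisons described above: R@5 per backend, and a per-question diff of the top-5 rankings. It assumes R@k is scored as "at least one gold item in the top k", which may differ from the benchmark's own scorer, and the function names and toy data are made up for illustration.

```python
from typing import Dict, List, Set


def recall_at_k(rankings: Dict[str, List[str]],
                gold: Dict[str, Set[str]],
                k: int = 5) -> float:
    """Fraction of questions with at least one gold id in the top k.

    Assumption: this "any hit" definition of R@k; the real benchmark
    scorer may define it differently.
    """
    if not gold:
        raise ValueError("no questions to score")
    hits = sum(
        1 for qid, relevant in gold.items()
        if relevant & set(rankings.get(qid, [])[:k])
    )
    return hits / len(gold)


def ranking_mismatches(a: Dict[str, List[str]],
                       b: Dict[str, List[str]],
                       k: int = 5) -> List[str]:
    """Question ids whose top-k lists differ between two backends."""
    return sorted(
        qid for qid in set(a) | set(b)
        if a.get(qid, [])[:k] != b.get(qid, [])[:k]
    )


# Hypothetical toy results standing in for the two backends' outputs.
chroma = {"q1": ["d3", "d1", "d9"], "q2": ["d4", "d7"]}
postgres = {"q1": ["d3", "d1", "d9"], "q2": ["d4", "d7"]}
gold = {"q1": {"d1"}, "q2": {"d8"}}

print(recall_at_k(chroma, gold), recall_at_k(postgres, gold))
print(ranking_mismatches(chroma, postgres))
```

An empty mismatch list is the stronger of the two signals: equal R@5 only says the backends hit equally often, while identical rankings say they return the same results in the same order, misses included.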
For your specific pain — repeated corruption forcing a re-embed-the-world rebuild — this is the headline benefit: Postgres gives you durability, transactions, and pg_dump/replication, so crash recovery becomes a DB restore (or a replica failover) instead of rebuilding the index from scratch. The class of "my vector store segfaulted and now I'm re-ingesting everything" failure mostly goes away, because the data of record lives in WAL-backed tables you can snapshot.
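As a rough illustration of what "recovery becomes a DB restore" looks like, here is a small sketch that builds pg_dump and pg_restore command lines from Python. The database name and dump path are placeholders, connection settings are assumed to come from the usual PGHOST/PGUSER environment variables, and this is not taken from the fork's tooling.

```python
import subprocess
from typing import List


def pg_dump_cmd(dbname: str, out_path: str) -> List[str]:
    # Custom-format archive, which pg_restore can read back.
    return ["pg_dump", "--format=custom", f"--file={out_path}", dbname]


def pg_restore_cmd(dbname: str, dump_path: str) -> List[str]:
    # --clean --if-exists drops existing objects before recreating them.
    return ["pg_restore", "--clean", "--if-exists",
            f"--dbname={dbname}", dump_path]


def run(cmd: List[str]) -> None:
    # Raises CalledProcessError if the tool exits non-zero.
    subprocess.run(cmd, check=True)


# Hypothetical names; printed rather than executed here.
print(" ".join(pg_dump_cmd("mempalace", "/backups/mempalace.dump")))
print(" ".join(pg_restore_cmd("mempalace", "/backups/mempalace.dump")))
```

Calling run(pg_dump_cmd(...)) on a schedule gives you a restore point that includes the embeddings, so a restore does not involve re-embedding anything.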
Honest gotchas, because that's the useful part:
- It's a re-ingest / re-embed, not a drop-in file swap. The first migration recomputes every embedding from source text, so plan for it taking minutes-to-longer at six-figure drawer counts. After that you're on durable storage.
- Embedding dimension has to match (MiniLM is 384-dim here). If you ever change the embedding model you re-embed anyway — same as ChromaDB — but keep the dimension consistent within a migration.
- pgvector uses an ANN index (we build HNSW with vector_cosine_ops), so there's the usual recall/latency knob: HNSW recall is excellent at sane build params but it's still approximate, same as ChromaDB's HNSW. Our parity run above was on identical retrieval, so in practice this hasn't cost us anything, but it's the dial to know about.
- Postgres is more operational surface than a single ChromaDB file: you're running a database now (extensions: pgvector, plus age only if you want the graph layer). For most people that's a net win given what it buys, but it's not zero ops.
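To make the dimension and HNSW points concrete, here is a minimal sketch of the pgvector side, written as Python that builds the SQL. The table name drawers and its columns are hypothetical (the fork's real schema may look different), and the m / ef_construction values shown are pgvector's documented defaults rather than anything tuned.

```python
from typing import List, Sequence

EMBEDDING_DIM = 384  # MiniLM output size, per the post


def schema_statements(table: str = "drawers", dim: int = EMBEDDING_DIM,
                      m: int = 16, ef_construction: int = 64) -> List[str]:
    """DDL for a hypothetical table with an HNSW cosine index."""
    return [
        "CREATE EXTENSION IF NOT EXISTS vector",
        f"CREATE TABLE IF NOT EXISTS {table} ("
        f"id text PRIMARY KEY, document text NOT NULL, "
        f"embedding vector({dim}) NOT NULL)",
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_hnsw ON {table} "
        f"USING hnsw (embedding vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction})",
    ]


def to_vector_literal(vec: Sequence[float], dim: int = EMBEDDING_DIM) -> str:
    """Format a vector as pgvector text input, rejecting wrong sizes early."""
    if len(vec) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vec)}")
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


# Query-time recall/latency dial, then a cosine-distance search (<=>).
TUNE_SQL = "SET hnsw.ef_search = 100"
SEARCH_SQL = ("SELECT id, document FROM drawers "
              "ORDER BY embedding <=> %s::vector LIMIT %s")

for stmt in schema_statements():
    print(stmt + ";")
```

Raising hnsw.ef_search trades latency for recall at query time, while m and ef_construction are fixed when the index is built. The length check in to_vector_literal is there so a model mismatch fails in your code before it ever reaches the database.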
One thing that matters for your crashed-ChromaDB situation specifically: our migration tool reads drawers straight from chroma.sqlite3 rather than through the chromadb Python client, precisely because chromadb 1.5.x SIGSEGVs on stale HNSW state. The on-disk sqlite tables are stable even when the HNSW binary layout is wrecked — so the migration runs even when ChromaDB itself is too broken to open. It's a restartable, checkpointed, idempotent run (preflight → schema → copy drawers → copy closets → build indexes → optional KG→AGE → verify), safe to re-invoke against the same source/target.
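The restartable, checkpointed shape of a run like that can be sketched in a few lines. This is an illustration of the pattern only, not the actual migration script: the step bodies are placeholders, the checkpoint file format is invented, and the optional KG→AGE step is left out for brevity.

```python
import json
import os
import tempfile
from typing import Callable, List, Tuple

# Step order follows the post; the bodies below are placeholders.
STEP_NAMES = ["preflight", "schema", "copy_drawers",
              "copy_closets", "build_indexes", "verify"]


def load_done(path: str) -> List[str]:
    if not os.path.exists(path):
        return []
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def save_done(path: str, done: List[str]) -> None:
    # Write to a temp file, then rename, so a crash mid-write
    # cannot leave a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(done, fh)
    os.replace(tmp, path)


def run_migration(steps: List[Tuple[str, Callable[[], None]]],
                  checkpoint_path: str) -> List[str]:
    """Run steps in order, skipping recorded ones; return the names run now."""
    done = load_done(checkpoint_path)
    ran = []
    for name, fn in steps:
        if name in done:
            continue
        fn()  # an exception here leaves the checkpoint at the last good step
        done.append(name)
        save_done(checkpoint_path, done)
        ran.append(name)
    return ran


# Demo: simulate one crash in copy_closets, then re-invoke.
calls: List[str] = []
state = {"fail_once": True}


def make_step(name: str) -> Callable[[], None]:
    def step() -> None:
        if name == "copy_closets" and state["fail_once"]:
            state["fail_once"] = False
            raise RuntimeError("simulated crash")
        calls.append(name)
    return step


steps = [(n, make_step(n)) for n in STEP_NAMES]
ckpt = os.path.join(tempfile.mkdtemp(), "migration.json")

try:
    run_migration(steps, ckpt)
except RuntimeError:
    pass
resumed = run_migration(steps, ckpt)
print(resumed)
```

Step-level checkpoints only cover whole steps. For a long copy step to be safe to re-run after dying halfway, the step itself also has to be idempotent, for example by inserting with ON CONFLICT DO NOTHING keyed on the drawer id.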
Happy to share the migration approach and the script if it's useful — glad mempalace has been good to your multi-project work, and this is a very fixable kind of pain. 🫏