CodeNinjaSarthak/eidetic-memory

🧠 Eidetic Memory

Long-term memory for AI agents that extracts, evolves, and retrieves facts across conversations.


✨ Why Eidetic Memory

  • 🧠 Remembers what matters: extracts atomic facts from every conversation turn, not raw chat logs
  • ⚡ Evolves over time: ADD, UPDATE, DELETE, NOOP decisions keep memory fresh and conflict-free
  • 🎯 Retrieves the right context: semantic search + neural reranking surfaces the most recent, relevant facts at query time
  • 🗣️ Multi-party ready: per-speaker memory isolation prevents cross-speaker contamination in group conversations
  • 🔌 Any LLM, any time: swap Claude, Gemini, Azure OpenAI, or Groq with a single env var
  • 📊 Benchmark-validated: 66.6% QA accuracy on LoCoMo, +50.2 pp on temporal questions over RAG, at just 1.02 LLM calls per query

How It Works

Every conversation turn passes through a four-stage pipeline:

flowchart TD
 A[User Message] --> CTX["Context Assembly\ncurrent speaker · rolling summary (every 15 turns) · recent msgs (last 10)"]
 CTX --> B[ExtractionPipeline]
 B -->|candidate facts + speaker attribution| C[EvolutionEngine]
 C -->|ADD / UPDATE / DELETE / NOOP| D[QdrantMemoryStore]
 D -->|stored embeddings + payloads| E[(Qdrant Vector DB)]
 F[Query] --> G[MemoryRetriever]
 G -->|embed query| H["Vector Search\ntop-k × 3 candidates\n(per-speaker namespace)"]
 H --> E
 E -->|candidates| I[Local Cross-Encoder\ncross-encoder/ms-marco-MiniLM-L-6-v2]
 I -->|top-k reranked facts| J[Answer Generation LLM]
 J --> K[Response]
 J -->|"don't know + non-OD (1.7%)"| L[Two-pass Rephrase]
 L --> G
 style I fill:#8e44ad,color:#fff
 style E fill:#e74c3c,color:#fff
 style J fill:#3498db,color:#fff
 style CTX fill:#e67e22,color:#fff
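The evolution step in the diagram can be sketched as applying one of four decisions to a per-speaker fact store. This is a minimal in-memory illustration under stated assumptions, not the project's EvolutionEngine or QdrantMemoryStore: `Decision`, `apply_decision`, and the dict-backed store are hypothetical names, and in the real pipeline the decision would come from an LLM rather than being written by hand.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Decision:
    """Hypothetical container for one evolution decision."""
    op: str                     # "ADD", "UPDATE", "DELETE", or "NOOP"
    fact_id: str
    text: Optional[str] = None  # new fact text, required for ADD and UPDATE


def apply_decision(store: Dict[str, Dict[str, str]], speaker: str, d: Decision) -> None:
    """Apply one decision inside the speaker's own namespace only."""
    namespace = store.setdefault(speaker, {})
    if d.op in ("ADD", "UPDATE"):
        if d.text is None:
            raise ValueError(f"{d.op} requires fact text")
        if d.op == "UPDATE" and d.fact_id not in namespace:
            raise KeyError(f"cannot update unknown fact {d.fact_id!r}")
        namespace[d.fact_id] = d.text
    elif d.op == "DELETE":
        namespace.pop(d.fact_id, None)
    elif d.op != "NOOP":
        raise ValueError(f"unknown op: {d.op}")


store: Dict[str, Dict[str, str]] = {}
apply_decision(store, "alice", Decision("ADD", "f1", "Alice lives in Paris"))
apply_decision(store, "alice", Decision("UPDATE", "f1", "Alice lives in Berlin"))
apply_decision(store, "bob", Decision("ADD", "f1", "Bob owns a dog"))
apply_decision(store, "bob", Decision("NOOP", "f1"))
apply_decision(store, "bob", Decision("DELETE", "f1"))
print(store)
```

Because every write goes through the speaker's namespace, the same `fact_id` under two speakers never collides, which is the property the per-speaker isolation in the diagram relies on.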

📊 Evaluation

Evaluated on the LoCoMo benchmark across 10 conversations (n=1540 QA pairs) with per-speaker memory isolation.

Component accuracy

| Metric | Score | Details |
|---|---|---|
| Fact extraction recall | 95.0% | n=100, QA pairs with evidence |
| Fact extraction precision | 83.0% | Wilson 95% CI [74.5, 89.1] |
| Speaker attribution accuracy | 97.0% | Wilson 95% CI [91.5, 99.0] |
| Conflict resolution accuracy | 100% | 26 test cases (ADD/UPDATE/DELETE/NOOP) |
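For readers who want to check the Wilson intervals, here is a small sketch of the standard Wilson score interval. It assumes n=100 for the precision and attribution rows (the table states n=100 only on the recall row) and uses z=1.96; `wilson_interval` is an illustrative helper, not a function from this repository.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumed sample size: n=100 for both rows.
precision_ci = wilson_interval(83, 100)
attribution_ci = wilson_interval(97, 100)

for name, (lo, hi) in [("precision", precision_ci), ("attribution", attribution_ci)]:
    print(f"{name}: [{lo * 100:.1f}, {hi * 100:.1f}]")
```

Under that assumption the computed bounds agree with the table to one decimal place.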

Retrieval quality

In 25.5% of incorrect answers, the correct fact was present in Qdrant but did not reach the top-30 reranked context window. This is the primary remaining retrieval bottleneck, and it is addressable with sparse vector (BM25) retrieval on re-ingestion.

End-to-end QA accuracy

| Category | Accuracy |
|---|---|
| Temporal | 75.1% |
| Open-domain | 70.5% |
| Single-hop | 50.4% |
| Multi-hop | 51.0% |
| Overall | 66.6% |

Bootstrap 95% CIs: Overall [64.4%, 68.8%], Temporal [69.9%, 80.1%]
Average LLM calls per query: 1.02 (two-pass fires on 1.7% of non-open-domain queries)
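A percentile bootstrap over per-question outcomes is one common way to obtain intervals like the ones above. The sketch below is illustrative only: the resampling scheme used by the evaluation harness is not documented here, `bootstrap_ci` is a hypothetical helper, and the outcome vector is synthetic (1026 correct out of 1540, back-computed from 66.6%), so it is not expected to reproduce the reported bounds exactly.

```python
import random
from typing import List, Tuple


def bootstrap_ci(outcomes: List[int], n_resamples: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample with replacement and record the accuracy of each resample.
    means = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Synthetic stand-in for per-question correctness (assumption, not real data).
outcomes = [1] * 1026 + [0] * 514
accuracy = sum(outcomes) / len(outcomes)
ci_low, ci_high = bootstrap_ci(outcomes)
print(f"accuracy={accuracy:.3f}, 95% CI=[{ci_low:.3f}, {ci_high:.3f}]")
```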

SOTA Comparison (LoCoMo, LLM-as-judge)

| System | Overall | Temporal | Notes |
|---|---|---|---|
| RAG baseline (ours) | 44.4% | 24.9% | Direct retrieval over raw turns |
| Pipeline v2 (ours) | 46.6% | 57.3% | Per-speaker isolation + round-robin, no reranker |
| Eidetic Memory v1 (ours) | 56.3% | 64.2% | + local cross-encoder |
| Eidetic Memory v2 (ours) | 66.6% | 75.1% | + extraction context + prompt opt; 1.02 LLM calls/query |
| Mem0 | ~66.9% | - | No speaker isolation; more LLM calls per query |
| Memobase | 75.78% | 85.05% | - |
| Hindsight (OSS-20B) | 83.18% | 76.32% | - |
| Hindsight (OSS-120B) | 85.67% | 79.44% | - |

Eidetic Memory v2 is the only system in this comparison with per-speaker namespace isolation and a neural reranking step. Its temporal score (75.1%) is within noise of Hindsight-20B (76.3%) at a fraction of the model scale.

Progress

| Run | Score | Details |
|---|---|---|
| Baseline | 14.8% | Gemini, conv-30 only, top-k=10 |
| + Per-speaker isolation | 31.8% | Multi-namespace retrieval |
| + Fixed merge | 52.4% | Round-robin interleaving |
| + Named entities + two-pass | 53.2% | n=233 |
| + Isolation + RR, no rerank | 25.8% | n=233; reranker is load-bearing |
| + Isolation + RR + local cross-encoder | 55.8% | n=233 |
| + Isolation + score-based + local cross-encoder | 55.8% | n=233; merge strategy irrelevant |
| + NER + two-pass + local cross-encoder | 56.6% | n=233 |
| + Local cross-encoder (full run, v1) | 56.3% | Full n=1540, all 10 convs |
| + Extraction context (rolling summary + speaker attribution) | 64.6% | Full n=1540 (+8.3 pp) |
| + Generation prompt optimization (v2, canonical) | 66.6% | Full n=1540 (+2.0 pp) |

Retrieval Architecture

flowchart LR
 Q[Question] --> VA["Vector Search\nSpeaker A namespace"]
 Q --> VB["Vector Search\nSpeaker B namespace"]
 VA -->|top-k × 3 facts| RR["Round-Robin Merge\nzip_longest interleave"]
 VB -->|top-k × 3 facts| RR
 RR -->|180 candidates| JR["Local Cross-Encoder\n(ms-marco-MiniLM-L-6-v2)"]
 JR -->|top-30 reranked| LLM["Answer Generation\n1 LLM call"]
 LLM --> ANS[Answer]
 style JR fill:#8e44ad,color:#fff
 style LLM fill:#3498db,color:#fff
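The merge-then-rerank flow in the diagram can be sketched as follows. This is an approximation under stated assumptions: `round_robin_merge` and `rerank` are hypothetical helpers, and a toy word-overlap scorer stands in for the local cross-encoder (ms-marco-MiniLM-L-6-v2) so the example runs without downloading a model.

```python
from itertools import zip_longest
from typing import Callable, List, Sequence

_MISSING = object()  # sentinel so shorter lists do not contribute padding


def round_robin_merge(*ranked_lists: Sequence[str]) -> List[str]:
    """Interleave per-speaker result lists: A1, B1, A2, B2, ..."""
    merged: List[str] = []
    for group in zip_longest(*ranked_lists, fillvalue=_MISSING):
        merged.extend(fact for fact in group if fact is not _MISSING)
    return merged


def rerank(query: str, candidates: Sequence[str],
           score: Callable[[str, str], float], top_k: int) -> List[str]:
    """Keep the top_k candidates by score; ties keep their merged order."""
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)[:top_k]


def overlap_score(query: str, fact: str) -> float:
    """Toy stand-in scorer: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(fact.lower().split())))


speaker_a = ["Alice adopted a cat in May", "Alice likes hiking", "Alice moved to Berlin"]
speaker_b = ["Bob moved to Oslo"]

merged = round_robin_merge(speaker_a, speaker_b)
top = rerank("who moved to Berlin", merged, overlap_score, top_k=2)
print(merged)
print(top)
```

Interleaving gives each speaker's namespace an equal share of the candidate pool before the reranker makes the final cut, so one speaker with many stored facts cannot crowd out the other.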

🚀 Reproduce Results

# 1. Clone and install
git clone https://github.com/CodeNinjaSarthak/eidetic-memory.git
cd eidetic-memory
uv sync --all-packages
# 2. Configure: copy and fill in your API keys
cp .env.development.example .env.development
# 3. Run the benchmark (requires ingested memories)
# Uses ~/eval-venv (sentence-transformers + ONNX); cross-encoder model
# (~66MB) downloads to ~/.cache/huggingface/ on first run.
~/eval-venv/bin/python eval/eval_qa_accuracy.py \
 --conv-ids conv-26 conv-30 conv-41 conv-42 conv-43 conv-44 \
 conv-47 conv-48 conv-49 conv-50 \
 --local-rerank \
 --output eval/results/my_results.json \
 --concurrency 1

See eval/README.md for ingestion instructions and full evaluation documentation.


Project Structure

eidetic-memory/
├── backend/
│   ├── apps/api/          # FastAPI routes
│   ├── services/          # llm · memory · retrieval · storage
│   └── packages/config/   # Shared settings
├── frontend/              # Next.js chat UI
└── eval/                  # LoCoMo evaluation harness

Dependency graph (no circular deps):

config → storage → llm → retrieval → memory → api
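One way to check a chain like this for cycles is a topological sort. The sketch below reads each arrow as "is depended on by" (an assumption about the notation) and encodes only the direct edges of the chain; `DEPENDS_ON` and `build_order` are illustrative names, not part of the repository.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Each package maps to the packages it depends on directly (assumed simple chain).
DEPENDS_ON: Dict[str, Set[str]] = {
    "config": set(),
    "storage": {"config"},
    "llm": {"storage"},
    "retrieval": {"llm"},
    "memory": {"retrieval"},
    "api": {"memory"},
}


def build_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return a valid build order; raises CycleError if a cycle exists."""
    return list(TopologicalSorter(graph).static_order())


order = build_order(DEPENDS_ON)
print(" -> ".join(order))

# Introducing a back edge (config depending on api) makes the graph cyclic.
cyclic = {**DEPENDS_ON, "config": {"api"}}
try:
    build_order(cyclic)
    has_cycle = False
except CycleError:
    has_cycle = True
print("cycle detected:", has_cycle)
```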

Development

make check # lint + 152 tests
make run # start API with hot reload

License

Apache License 2.0
