
🧠 Eidetic Memory

Long-term memory for AI agents — extracts, evolves, and retrieves facts across conversations.

✨ Why Eidetic Memory

🧠 Remembers what matters — extracts atomic facts from every conversation turn, not raw chat logs
⚡ Evolves over time — ADD, UPDATE, DELETE, NOOP decisions keep memory fresh and conflict-free
🎯 Retrieves the right context — semantic search + neural reranking surfaces the most recent, relevant facts at query time
🗣️ Multi-party ready — per-speaker memory isolation prevents cross-speaker contamination in group conversations
🔌 Any LLM, any time — swap Claude, Gemini, Azure OpenAI, or Groq with a single env var
📊 Benchmark-validated — 66.6% QA accuracy on LoCoMo, +50.2 pp on temporal questions over RAG, at just 1.02 LLM calls per query

How It Works

Every conversation turn passes through a four-stage pipeline:


flowchart TD
 A[User Message] --> CTX["Context Assembly\ncurrent speaker · rolling summary (every 15 turns) · recent msgs (last 10)"]
 CTX --> B[ExtractionPipeline]
 B -->|candidate facts + speaker attribution| C[EvolutionEngine]
 C -->|ADD / UPDATE / DELETE / NOOP| D[QdrantMemoryStore]
 D -->|stored embeddings + payloads| E[(Qdrant Vector DB)]
 F[Query] --> G[MemoryRetriever]
 G -->|embed query| H["Vector Search\ntop-k × 3 candidates\n(per-speaker namespace)"]
 H --> E
 E -->|candidates| I[Local Cross-Encoder\ncross-encoder/ms-marco-MiniLM-L-6-v2]
 I -->|top-k reranked facts| J[Answer Generation LLM]
 J --> K[Response]
 J -->|"don't know + non-OD (1.7%)"| L[Two-pass Rephrase]
 L --> G
 style I fill:#8e44ad,color:#fff
 style E fill:#e74c3c,color:#fff
 style J fill:#3498db,color:#fff
 style CTX fill:#e67e22,color:#fff
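
As a minimal sketch of the evolution step in the diagram above, the four decisions can be applied to a plain dictionary standing in for the memory store. The `Decision` dataclass, `apply_decision`, and the fact ids are hypothetical names chosen for illustration; the real `EvolutionEngine` and `QdrantMemoryStore` interfaces are not shown in this README and may differ.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Op = Literal["ADD", "UPDATE", "DELETE", "NOOP"]


@dataclass
class Decision:
    """Hypothetical container for one evolution decision."""
    op: Op
    fact_id: Optional[str] = None  # target fact for UPDATE / DELETE
    text: Optional[str] = None     # fact text for ADD / UPDATE


def apply_decision(store: dict, decision: Decision, new_id: str) -> dict:
    """Return a copy of the store with one decision applied."""
    updated = dict(store)
    if decision.op == "ADD":
        updated[new_id] = decision.text
    elif decision.op == "UPDATE":
        updated[decision.fact_id] = decision.text
    elif decision.op == "DELETE":
        updated.pop(decision.fact_id, None)
    # NOOP: the candidate adds nothing new, so the store is unchanged.
    return updated


store = {"f1": "Alice lives in Paris"}
store = apply_decision(store, Decision("UPDATE", "f1", "Alice lives in Berlin"), "f2")
print(store)  # {'f1': 'Alice lives in Berlin'}
```

Returning a copy instead of mutating in place keeps each decision easy to test in isolation; a store backed by a vector database would issue the matching upsert or delete call instead.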

📊 Evaluation

Evaluated on the LoCoMo benchmark across 10 conversations (n=1540 QA pairs) with per-speaker memory isolation.

Component accuracy

Metric	Score	Details
Fact extraction recall	95.0%	n=100, QA pairs with evidence
Fact extraction precision	83.0%	Wilson 95% CI [74.5, 89.1]
Speaker attribution accuracy	97.0%	Wilson 95% CI [91.5, 99.0]
Conflict resolution accuracy	100%	26 test cases (ADD/UPDATE/DELETE/NOOP)
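
The Wilson intervals in the table can be checked with the standard Wilson score formula. The sketch below assumes n=100 for the precision and speaker-attribution rows (the table states n=100 only for recall); under that assumption it reproduces the reported bounds. The function name `wilson_interval` is illustrative, not part of the eval harness.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


for label, successes in [("precision", 83), ("speaker attribution", 97)]:
    lo, hi = wilson_interval(successes, 100)  # n=100 is an assumption
    print(f"{label}: [{lo * 100:.1f}, {hi * 100:.1f}]")
# precision: [74.5, 89.1]
# speaker attribution: [91.5, 99.0]
```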

Retrieval quality

In 25.5% of incorrect answers, the correct fact was present in Qdrant but did not reach the top-30 reranked context window. This is the primary remaining retrieval bottleneck, and it could be addressed by adding sparse-vector (BM25) retrieval on re-ingestion.

End-to-end QA accuracy

Category	Accuracy
Temporal	75.1%
Open-domain	70.5%
Single-hop	50.4%
Multi-hop	51.0%
Overall	66.6%

Bootstrap 95% CIs: Overall [64.4%, 68.8%], Temporal [69.9%, 80.1%]
Average LLM calls per query: 1.02 (two-pass fires on 1.7% of non-open-domain queries)
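
For readers who want to compute a comparable interval on their own results file, here is a minimal percentile-bootstrap sketch. The README does not say which bootstrap variant produced the reported CIs, so the percentile method here is an assumption, and the outcome list is synthetic (1026 of 1540 marked correct to approximate the 66.6% overall rate), not the actual per-question results.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of 0/1 correctness flags."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample with replacement and record the accuracy of each resample.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Synthetic flags only; replace with the per-question judge verdicts.
outcomes = [1] * 1026 + [0] * 514
lo, hi = bootstrap_ci(outcomes)
print(f"overall accuracy 95% CI: [{lo:.3f}, {hi:.3f}]")
```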

SOTA Comparison (LoCoMo, LLM-as-judge)

System	Overall	Temporal	Notes
RAG baseline (ours)	44.4%	24.9%	Direct retrieval over raw turns
Pipeline v2 (ours)	46.6%	57.3%	Per-speaker isolation + round-robin, no reranker
Eidetic Memory v1 (ours)	56.3%	64.2%	+ local cross-encoder
Eidetic Memory v2 (ours)	66.6%	75.1%	+ extraction context + prompt opt; 1.02 LLM calls/query
Mem0	~66.9%	—	No speaker isolation; × more LLM calls per query
Memobase	75.78%	85.05%	—
Hindsight (OSS-20B)	83.18%	76.32%	—
Hindsight (OSS-120B)	85.67%	79.44%	—

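Eidetic Memory v2 is the only system in this comparison with per-speaker namespace isolation and a neural reranking step. Its temporal score is 75.1% with a bootstrap 95% CI of [69.9%, 80.1%]; the 76.32% reported for Hindsight (OSS-20B) falls inside that interval, so the two are within noise of each other, at a fraction of the model scale.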

Progress

Run	Score	Details
Baseline	14.8%	Gemini, conv-30 only, top-k=10
+ Per-speaker isolation	31.8%	Multi-namespace retrieval
+ Fixed merge	52.4%	Round-robin interleaving
+ Named entities + two-pass	53.2%	n=233
+ Isolation + RR, no rerank	25.8%	n=233 — reranker is load-bearing
+ Isolation + RR + local cross-encoder	55.8%	n=233
+ Isolation + score-based + local cross-encoder	55.8%	n=233 — merge strategy irrelevant
+ NER + two-pass + local cross-encoder	56.6%	n=233
+ Local cross-encoder (full run, v1)	56.3%	Full n=1540, all 10 convs
+ Extraction context (rolling summary + speaker attribution)	64.6%	Full n=1540 (+8.3 pp)
+ Generation prompt optimization (v2, canonical)	66.6%	Full n=1540 (+2.0 pp)

Retrieval Architecture


flowchart LR
 Q[Question] --> VA["Vector Search\nSpeaker A namespace"]
 Q --> VB["Vector Search\nSpeaker B namespace"]
 VA -->|top-k×3 facts| RR["Round-Robin Merge\nzip_longest interleave"]
 VB -->|top-k×3 facts| RR
 RR -->|180 candidates| JR["Local Cross-Encoder\n(ms-marco-MiniLM-L-6-v2)"]
 JR -->|top-30 reranked| LLM["Answer Generation\n1 LLM call"]
 LLM --> ANS[Answer]
 style JR fill:#8e44ad,color:#fff
 style LLM fill:#3498db,color:#fff
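
The merge-then-rerank flow above can be sketched with the standard library alone. `round_robin_merge` uses `itertools.zip_longest` as the diagram indicates; `rerank` takes any scoring callable, and `overlap_score` is a toy lexical stand-in so the example runs without downloading a model. In the real pipeline the scorer would be the `cross-encoder/ms-marco-MiniLM-L-6-v2` model. All three function names are hypothetical. The 180-candidate figure is consistent with top-k = 30 and two speaker namespaces (30 × 3 × 2), though that reading is an inference from the diagram.

```python
from itertools import zip_longest
from typing import Callable, Sequence


def round_robin_merge(*ranked_lists: Sequence[str]) -> list:
    """Interleave per-speaker result lists: a1, b1, a2, b2, ..."""
    merged = []
    for group in zip_longest(*ranked_lists):
        # zip_longest pads exhausted lists with None; skip the padding.
        merged.extend(fact for fact in group if fact is not None)
    return merged


def rerank(query: str, candidates: Sequence[str],
           score: Callable[[str, str], float], top_k: int = 30) -> list:
    """Keep the top_k candidates by score, highest first."""
    ranked = sorted(candidates, key=lambda fact: score(query, fact), reverse=True)
    return ranked[:top_k]


def overlap_score(query: str, fact: str) -> float:
    """Toy scorer (shared lowercase tokens); not the cross-encoder."""
    return float(len(set(query.lower().split()) & set(fact.lower().split())))


speaker_a = ["Alice does live in Paris", "Alice owns a cat"]
speaker_b = ["Bob likes tea"]
candidates = round_robin_merge(speaker_a, speaker_b)
print(rerank("where does alice live", candidates, overlap_score, top_k=1))
# ['Alice does live in Paris']
```

Interleaving before reranking gives each speaker's namespace equal representation in the candidate pool; the Progress table above suggests the merge strategy matters little once the reranker is in place.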

🚀 Reproduce Results

# 1. Clone and install
git clone https://github.com/CodeNinjaSarthak/eidetic-memory.git
cd eidetic-memory
uv sync --all-packages
# 2. Configure — copy and fill in your API keys
cp .env.development.example .env.development
# 3. Run the benchmark (requires ingested memories)
# Uses ~/eval-venv (sentence-transformers + ONNX); cross-encoder model
# (~66MB) downloads to ~/.cache/huggingface/ on first run.
~/eval-venv/bin/python eval/eval_qa_accuracy.py \
 --conv-ids conv-26 conv-30 conv-41 conv-42 conv-43 conv-44 \
 conv-47 conv-48 conv-49 conv-50 \
 --local-rerank \
 --output eval/results/my_results.json \
 --concurrency 1

See eval/README.md for ingestion instructions and full evaluation documentation.

Project Structure

eidetic-memory/
├── backend/
│ ├── apps/api/ # FastAPI routes
│ ├── services/ # llm · memory · retrieval · storage
│ └── packages/config/ # Shared settings
├── frontend/ # Next.js chat UI
└── eval/ # LoCoMo evaluation harness

Dependency graph (no circular deps):

config → storage → llm → retrieval → memory → api
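
One way to check the no-circular-deps property is a topological sort. The sketch below reads each arrow `a → b` as "`b` may import `a`", which is an assumption about the notation, and encodes the chain by hand instead of parsing the actual packages.

```python
from graphlib import CycleError, TopologicalSorter

# Maps each package to the packages it depends on (hand-written, illustrative).
DEPENDS_ON = {
    "config": set(),
    "storage": {"config"},
    "llm": {"storage"},
    "retrieval": {"llm"},
    "memory": {"retrieval"},
    "api": {"memory"},
}


def build_order(depends_on: dict) -> list:
    """Return a valid build order; raises CycleError if a cycle exists."""
    return list(TopologicalSorter(depends_on).static_order())


print(build_order(DEPENDS_ON))
# ['config', 'storage', 'llm', 'retrieval', 'memory', 'api']
```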

Development

make check # lint + 152 tests
make run # start API with hot reload

License

Apache License 2.0
