Lossless structured memory for LLM agents. Summaries index; raw is preserved.
Scattered conversations, tool outputs, and documents flow through a dual-speed sensor + compacter pipeline and land as navigable MemoryNodes — each holding a short summary, a retrieval trigger, lazy-generated detail, and the full raw content. Agents read at the shallowest tier that answers the question and only pay the token cost for depth they actually need.
CoMeT is the memory substrate for CoBrA and is consumable standalone via Python API, LangChain tools, or an MCP server.
Looking for a drop-in memory layer for Claude Code specifically? CoMeT-CC runs as a local TLS proxy that Claude Code routes through, so every outgoing `/v1/messages` request gets trimming and retrieval injection without modifying Claude Code itself. It is lighter than full CoMeT (single-session scope, no consolidation) and needs no API key: it drives the sensor and compacter via `claude -p` using your existing Claude subscription.
- 3-tier progressive retrieval: summary → lazy detailed summary → raw
- Dual-path RAG: separate embeddings for `summary` (WHAT) and `trigger` (WHEN), fused with RRF + graph-aware 2-hop link traversal
- LanceDB triple-table index: disk-resident (mmap) vector store over summary / trigger / raw
- Multi-modality compacting: dialog, code artifacts, images, execution traces, and external content each get tailored summary/trigger rules
- Snapshot-protected consolidation: dedup + cross-link + tag normalization + synthesis, all reversible on failure
- Per-session briefs: rule-hint-rationale guidance layer rewritten each compaction
- Handoff / inherited memory: curated IMPORTANCE:HIGH carry-over + per-chunk synthesis on session transition
- Provider-agnostic LLMs: OpenAI · Anthropic · Google · Ollama · vLLM, with short aliases
- MCP server: expose memory tools to any MCP-capable client via `comet-mcp`
CoMeT and CoBrA are deployed in production at Dimension as the self-evolving memory substrate behind the company's autonomous manufacturing intelligence platform.
Manufacturing is a $16T industry, but the design-to-production process still depends on human engineers operating decades-old CAD workflows. Dimension is building autonomous manufacturing intelligence: AI agents that can design, engineer, and eventually help produce physical products end-to-end. We start with CAD (drawings, part search, modeling, reconstruction, and engineering documentation), where our agents already deliver multi-fold productivity gains. In two months, we generated $1.5M with six Tier-1 manufacturers. CAD is the entry point; the long-term vision is zero-human design, engineering, and production with a physical factory.
Where CoMeT sits in that loop:
- Text-to-CAD / drawing-to-CAD / CAD-to-drawing: manufacturing intelligence (geometric conventions, tolerancing rules, prior designs) is persisted as long-lived memory nodes. Compacter policies for `artifact_code` and `artifact_image` keep specs/views indexable without losing the raw drawing as tier-3 evidence.
- Parametric design iteration: successive natural-language edits to the same part (e.g. flange dimensions, bolt-hole pattern, material swap) replay against the linked node graph rather than the dialogue scrollback, so iterative refinement keeps a stable working memory.
- Assembly + parts-database workflows: component-level memories (e.g. `PartDB_Machine_Arm`) link into assembly-level nodes so the agent can drill from sub-part rationale up to the assembly context via the standard 1-hop graph expansion.

    User / tool / document
             │
             ▼
    ┌─────────────────┐                ┌───────────┐
    │ CognitiveSensor │ ─────────────▶ │ L1 Buffer │
    │ (SLM, fast)     │  load + flow   └─────┬─────┘
    └─────────────────┘                      │ topic_shift | high_load | buffer_overflow
                                             ▼
                                       ┌───────────┐
                                       │ Compacter │  Main LLM (modality-aware policy)
                                       └─────┬─────┘
                 summary + trigger + recall + importance + tags (+ brief)
                                             ▼
           ┌─────────────────────────────────┴────────────┐
           ▼                                              ▼
    ┌──────────────┐                             ┌─────────────────┐
    │ MemoryStore  │ JSON KV + sessions +        │ VectorIndex     │  LanceDB
    │ (atomic)     │ briefs + inherited memory   │ summary/trig/   │  triple-table
    └──────┬───────┘ + snapshots                 │ raw             │
           │                                     └────────┬────────┘
           │                                              │
           │   ┌──────────────────────────────────────────┘
           ▼   ▼
    ┌────────────────┐    ┌──────────────┐  ┌──────────────┐
    │ Retriever      │◀──▶│ ScoreFusion  │  │ Consolidator │
    │ QueryAnalyzer  │    │ RRF + sim    │  │ dedup+link+  │
    │ + 2-hop link   │    │ fusion_α     │  │ synthesize   │
    └────────────────┘    └──────────────┘  └──────────────┘

| Layer | Speed | Model | Output |
|---|---|---|---|
| Sensor | fast | `slm_model` | `L1Memory` (raw pass-through) + `CognitiveLoad{logic_flow, load_level, redundancy_detected}` |
| Compacter | slow | `main_model` | `MemoryNode` with `summary`, `trigger`, `recall_mode`, `importance`, `topic_tags`, optional `session_brief` |
Compaction triggers when buffer_size ≥ min_l1_buffer and one of: logic_flow == BROKEN (topic shift) / load_level ≥ load_threshold / buffer_size ≥ max_l1_buffer. The originating reason is stored on the node (compaction_reason).
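The trigger rule above can be read as a small predicate. The following is a minimal sketch, not CoMeT's actual code: the `LogicFlow` enum, the `MAINTAINED` state name, the `CompactingConfig` dataclass, and the order in which competing reasons are checked are all assumptions made for illustration. Only the thresholds and reason strings come from this README.

```python
from dataclasses import dataclass
from enum import Enum


class LogicFlow(str, Enum):
    MAINTAINED = "MAINTAINED"  # hypothetical name for the non-broken state
    BROKEN = "BROKEN"          # topic shift detected by the sensor


@dataclass(frozen=True)
class CompactingConfig:
    min_l1_buffer: int = 3    # documented default
    max_l1_buffer: int = 10   # documented default
    load_threshold: int = 4   # documented default


def compaction_reason(buffer_size, logic_flow, load_level, cfg=CompactingConfig()):
    """Return the reason compaction should fire, or None to keep buffering."""
    if buffer_size < cfg.min_l1_buffer:
        return None  # too little buffered content, whatever the sensor says
    # The precedence below is an assumption; the README only lists the conditions.
    if logic_flow is LogicFlow.BROKEN:
        return "topic_shift"
    if load_level >= cfg.load_threshold:
        return "high_load"
    if buffer_size >= cfg.max_l1_buffer:
        return "buffer_overflow"
    return None
```

With the default thresholds, a broken logic flow on a two-item buffer does nothing, while the same signal on a three-item buffer yields `topic_shift`.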
| Tier | API | Content | Token Cost |
|---|---|---|---|
| 1 | `retrieve()` / `retrieve_dual()` | summary + trigger + tags + links | minimal |
| 2 | `get_detailed_summary(node_id)` | 3–8-sentence detailed summary (lazy-generated via SLM, cached on node) | medium, one-time |
| 3 | `get_raw_content(node_id)` / `read_memory(node_id, depth=2)` | full original text | full |
Tier 2 is generated on first request from raw content and stored back on the node. Subsequent reads are free.
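The generate-once, read-many behaviour of Tier 2 might look roughly like this. It is a sketch under stated assumptions: `Node`, `fake_slm`, and this standalone `get_detailed_summary` are hypothetical stand-ins, and the real method takes a `node_id` and calls the configured SLM.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:  # hypothetical, much smaller than the real MemoryNode
    raw: str
    detailed_summary: Optional[str] = None  # Tier 2 cache slot


def get_detailed_summary(node: Node, summarize: Callable[[str], str]) -> str:
    """Generate the Tier 2 summary on first request, then serve it from the node."""
    if node.detailed_summary is None:
        node.detailed_summary = summarize(node.raw)  # the only paid call
    return node.detailed_summary


calls = []


def fake_slm(text: str) -> str:
    """Stand-in for the SLM; records each invocation."""
    calls.append(text)
    return text[:20]


node = Node(raw="JWT access tokens expire after 15 minutes; refresh tokens after 7 days.")
first = get_detailed_summary(node, fake_slm)
second = get_detailed_summary(node, fake_slm)  # served from the cache
```

Here the stand-in model is invoked once even though the summary is read twice.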

    from datetime import datetime
    from typing import Literal

    from pydantic import BaseModel


    class MemoryNode(BaseModel):
        node_id: str                    # mem_YYYYMMDD_HHMMSS_<6hex>
        session_id: str | None
        depth_level: int                # 0=summary, 1=detail, 2=raw/virtual
        recall_mode: Literal['passive', 'active', 'both']
        topic_tags: list[str]           # content tags + ORIGIN:/FLAG:/IMPORTANCE: meta
        summary: str                    # factual index, multi-topic separated by "; "
        detailed_summary: str | None    # Tier 2 cache
        trigger: str                    # "When I ..."; 2–4 anchors, retrieval-oriented
        content_key: str                # pointer into raw store
        raw_location: str               # path to raw .txt
        links: list[str]                # cross-references to other node_ids
        source_links: list[str]         # file/image paths referenced by this node
        capsule: str                    # action-capsule prefix (e.g. "[ACT_FETCH] web_read (ok)")
        created_at: datetime
        compaction_reason: str | None   # topic_shift | high_load | buffer_overflow | forced | external

| Mode | Behavior |
|---|---|
| `passive` | Always in the rendered context window |
| `active` | Retrieved on-demand via semantic search (default) |
| `both` | Always in context and searchable |
- QueryAnalyzer (SLM) decomposes the query → `semantic_query` (WHAT) + `search_intent` (WHEN) + `urgency` + `risk_level`
- Triple-table vector search (LanceDB, cosine distance):
  - summary table ← `semantic_query`
  - trigger table ← `search_intent`
  - raw table ← both (fallback path)
- ScoreFusion: `combined = 0.6 · RRF + 0.4 · max_similarity`; path weights split by `fusion_alpha` with `raw_search_weight` carved out
- Graph-aware re-ranking: for each top-K result, follow outgoing `links` (2-hop, decay 0.5 / 0.25). Nodes referenced by multiple top-K results get a refcount bonus (a multiple of the sum)
- `risk_level=high` attaches an inline warning to the retrieval tool's response, telling the agent that summaries are likely insufficient and that it should open the raw content
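The fusion and link-expansion steps can be sketched as below. This is an illustration rather than the shipped `ScoreFusion`: the way `fusion_alpha` and `raw_search_weight` are turned into per-path weights, the normalization of the RRF term to a 0–1 range, and both function names are assumptions. The refcount bonus is left out because its multiplier is not stated here.

```python
def fuse(rankings, fusion_alpha=0.5, raw_search_weight=0.2, rrf_k=5):
    """rankings maps path -> best-first list of (node_id, similarity)."""
    rest = 1.0 - raw_search_weight
    weights = {  # assumed split: raw carved out first, alpha divides the rest
        "summary": fusion_alpha * rest,
        "trigger": (1.0 - fusion_alpha) * rest,
        "raw": raw_search_weight,
    }
    rrf, best_sim = {}, {}
    for path, hits in rankings.items():
        for rank, (node_id, sim) in enumerate(hits, start=1):
            rrf[node_id] = rrf.get(node_id, 0.0) + weights[path] / (rrf_k + rank)
            best_sim[node_id] = max(best_sim.get(node_id, 0.0), sim)
    top = 1.0 / (rrf_k + 1)  # RRF score of a node ranked first on every path
    combined = {n: 0.6 * (rrf[n] / top) + 0.4 * best_sim[n] for n in rrf}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


def expand_links(scored, links, decays=(0.5, 0.25)):
    """Pull in linked nodes up to two hops away with decayed scores."""
    out = dict(scored)
    for node_id, score in scored:
        frontier, seen = [node_id], {node_id}
        for decay in decays:  # hop 1 uses 0.5, hop 2 uses 0.25
            nxt = []
            for src in frontier:
                for dst in links.get(src, []):
                    if dst in seen:
                        continue
                    seen.add(dst)
                    nxt.append(dst)
                    # never lower a score the node already earned directly
                    out[dst] = max(out.get(dst, 0.0), score * decay)
            frontier = nxt
    return sorted(out.items(), key=lambda kv: kv[1], reverse=True)


rankings = {
    "summary": [("a", 0.9), ("b", 0.7)],
    "trigger": [("b", 0.8), ("c", 0.6)],
    "raw": [("a", 0.5)],
}
fused = fuse(rankings)
expanded = expand_links(fused, {"b": ["d"], "d": ["e"]})
```

In this toy run node `b` outranks `a` because it appears on both the summary and trigger paths, and `d` and `e` enter the result set only through `b`'s links.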
Triggers are written from the agent's perspective ("When I need to verify ..."), not user-centric, so retrieval matches even without an explicit user query rephrase.
MemoryCompacter.compact(...) accepts either a template name or a policy object with modality ∈ {dialog, artifact_code, artifact_image, execution_trace, external}. Each modality gets tailored summary / trigger / recall_mode instructions:
- dialog: factual index; retrieval scenarios; may emit a full-rewrite `session_brief`
- artifact_code: start with language/type; trigger anchors on exports to modify
- artifact_image: visual content, dimensions, format; trigger = visual-verification need
- execution_trace: tool name, success/failure, key values; trigger = verify exact results
- external: web search / doc ingest; active recall; URL/path preservation
Policy-driven compaction is used by CoBrA to label tool outputs correctly; standalone users can stick with the compacting.txt default template.
Alongside content tags, nodes carry structured meta-tags with fixed prefixes, priority-ranked for terse rendering:
| Axis | Prefix | Examples | Purpose |
|---|---|---|---|
| Origin | `ORIGIN:` | `USER`, `WEB_SEARCH`, `FILE_EDIT`, `SUBAGENT_RESULT`, `PROJECT_GOAL` | Where the content came from |
| Action | `FLAG:ACT_` | `FAIL`, `EDIT`, `EXECUTE`, `DIAGNOSE`, `FETCH`, `PLAN`, `DECIDE`, `NONE` | What kind of action this turn represents |
| Kind | `FLAG:` | `SKILL`, `USER_REJECT`, `USER_FEEDBACK`, `PASSIVE` | Semantic flags |
| Importance | `IMPORTANCE:` | `HIGH`, `MED`, `LOW` | Prior on raw-reopen likelihood; drives handoff curation |
Only the highest-priority tag per axis is rendered in context windows, e.g. `(O:USER A:EDIT F:FEEDBACK I:H)`.
`consolidate()` runs the following passes (snapshot-protected: failure → full restore + index rebuild):
- Dedup: if `1 − cosine ≥ 0.32`, merge newer into older (keep older `node_id`, bounded by `MAX_MERGE_PER_KEEPER=5`)
- Summary/trigger regeneration: SLM rewrites the keeper's summary+trigger to cover both sources (`merge_summary.txt`)
- Cross-link: if `1 − cosine ≥ 0.45` (and not dupes), add bidirectional `links`
- Tag normalization: merge variants by case-insensitive / substring match (meta-prefixed tags excluded)
- Prune: drop dangling link ids
synthesize() is a separate cross-session pass that forms a chandelier of virtual parent nodes:
- Union-Find over pairwise summary similarity (`≥ 0.22`, cluster size 2–8)
- SLM validates each cluster is a coherent knowledge unit (`synthesis_validate.txt`)
- SLM generates a unified summary + trigger (`synthesis_create.txt`)
- New `MemoryNode` (depth 2) with bidirectional links to all sources, indexed into the vector store, carrying its source sessions' ids
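The clustering step above can be sketched with a plain Union-Find. Assumptions in this example: the `cluster` function name, the Jaccard tag overlap used as a stand-in for embedding similarity, and the choice to drop clusters outside the 2–8 size window (the README does not say how oversized clusters are handled).

```python
from itertools import combinations


def cluster(ids, similarity, threshold=0.22, min_size=2, max_size=8):
    """Group ids whose pairwise similarity reaches the threshold."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)  # union the two components

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    # Assumed policy: clusters outside the size window are simply skipped.
    return [g for g in groups.values() if min_size <= len(g) <= max_size]


# Toy similarity: Jaccard overlap of tags, standing in for summary embeddings.
tags = {
    "n1": {"jwt", "auth"},
    "n2": {"jwt", "token"},
    "n3": {"cad", "flange"},
    "n4": {"auth", "session"},
}


def jaccard(a, b):
    return len(tags[a] & tags[b]) / len(tags[a] | tags[b])


clusters = cluster(list(tags), jaccard)
```

`n2` and `n4` share no tags, yet they land in the same cluster because both connect to `n1`. That transitive merging is what Union-Find provides, and it is why a later coherence check by the SLM is useful.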
- Sessions: every node is tagged with a `session_id`. `sessions.json` tracks status (active/closed), timestamps, and `node_ids`. `list_sessions()` / `list_session_memories(session_id)` / `close_session()` are public.
- Session briefs: at most one per session, stored as `{base}/session_briefs/{session_id}.md`, fully rewritten on each DIALOG compaction (never appended). Fixed skeleton: `## Active Work Context`, `## Hints`. ≤ 1500 chars. Out-of-band rewrites go through `regenerate_brief(reason=...)`.
- Inherited memory: `save_inherited_memory(new_sid, source_sid, node_ids, synthesis_node_ids)` persists a curated IMPORTANCE:HIGH carry-over + per-chunk synthesis nodes at `{base}/inherited_memory/{sid}.json`. The successor renders each in its own harness block.
- Pinned external nodes: `pin_node(node_id)` surfaces cross-session nodes in the current session's `get_session_context()` output under an `[External Nodes]` header.
MemoryStore.create_snapshot(label) copies index.json, sessions.json, and all node files into {base}/.snapshot/{label}/. consolidate() and synthesize() wrap their mutations in a snapshot and restore-on-failure. On startup, CoMeT._recover_pending_snapshots() detects any leftover snapshots and rolls the store back + rebuilds the vector index.
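The snapshot-then-restore pattern can be illustrated with a context manager over a directory. This is a generic sketch, not `MemoryStore`'s implementation: `snapshot_protected` is a hypothetical helper, it copies everything under the base directory rather than the specific files listed above, and it does not rebuild a vector index.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


def _copy(src: Path, dst: Path) -> None:
    """Copy a file or a directory tree to dst."""
    if src.is_dir():
        shutil.copytree(src, dst)
    else:
        shutil.copy2(src, dst)


@contextmanager
def snapshot_protected(base: Path, label: str):
    snap = base / ".snapshot" / label
    snap.mkdir(parents=True)
    for item in base.iterdir():
        if item.name != ".snapshot":
            _copy(item, snap / item.name)
    try:
        yield snap
    except Exception:
        # Roll back: discard the mutated state and copy the snapshot back.
        for item in base.iterdir():
            if item.name == ".snapshot":
                continue
            if item.is_dir():
                shutil.rmtree(item)
            else:
                item.unlink()
        for item in snap.iterdir():
            _copy(item, base / item.name)
        raise
    finally:
        # A snapshot left on disk would signal an interrupted run.
        shutil.rmtree(snap, ignore_errors=True)


base = Path(tempfile.mkdtemp())
(base / "index.json").write_text('{"nodes": 1}')
try:
    with snapshot_protected(base, "consolidate"):
        (base / "index.json").write_text("corrupted")
        raise RuntimeError("merge failed")
except RuntimeError:
    pass
restored = (base / "index.json").read_text()
```

After the simulated failure, `index.json` holds its pre-mutation content and the snapshot directory is gone. A process killed mid-mutation would leave the snapshot behind, which is the situation the startup recovery described above is meant to catch.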

    from comet import CoMeT, scope

    @scope
    def main(config):
        memo = CoMeT(config)

        memo.add('Can we train a world model with four B200s?')
        memo.add('2B is enough; 8B at the very most')
        memo.add('For the DPO data, I built the negatives from syntax errors')
        memo.force_compact()

        for entry in memo.list_memories():
            print(memo.read_memory(entry['node_id'], depth=0))

        tools = memo.get_tools()
        # → get_memory_index, read_memory_node, search_memory, retrieve_memory (if retrieval enabled)

    main()


    from comet import CoMeT, scope

    @scope
    def main(config):
        memo = CoMeT(config)
        memo.add('JWT access token: 15 minutes, refresh token: 7 days')
        memo.force_compact()

        # Tier 1: summaries
        results = memo.retrieve('What are the token expiry settings?')
        for r in results:
            print(f'[{r.node.node_id}] score={r.relevance_score:.3f} - {r.node.summary}')

        nid = results[0].node.node_id

        # Tier 2: lazy detailed summary
        print(memo.get_detailed_summary(nid))

        # Tier 3: raw
        print(memo.get_raw_content(nid))

    main()


    # Sync
    nodes = memo.add_document(web_page_text, source='https://example.com/article')

    # Background (non-blocking) with completion callback
    memo.add_document(
        long_pdf_text,
        source='paper.pdf',
        background=True,
        on_complete=lambda created: print(f'ingested {len(created)} nodes'),
    )
    memo.drain()  # wait for all queued jobs

Chunks split at paragraph → line → sentence boundaries with overlap; duplicates are detected via content hash.
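A boundary-aware splitter with hash-based duplicate detection could look like the sketch below. It is not CoMeT's chunker: `split_units`, `chunk`, `ingest`, the `max_chars` limit, unit-level overlap, and the use of SHA-256 are all illustrative assumptions.

```python
import hashlib
import re


def split_units(text, max_chars):
    """Split on paragraphs, then lines, then sentences, only where still too long."""
    separators = [r"\n\s*\n", r"\n", r"(?<=[.!?])\s+"]
    units = [text]
    for sep in separators:
        nxt = []
        for u in units:
            nxt.extend(re.split(sep, u) if len(u) > max_chars else [u])
        units = nxt
    return [u.strip() for u in units if u.strip()]


def chunk(text, max_chars=200, overlap=1):
    """Pack units into chunks, carrying the last `overlap` units forward."""
    chunks, current = [], []
    for u in split_units(text, max_chars):
        if current and sum(len(x) for x in current) + len(u) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap else []
        current.append(u)
    if current:
        chunks.append(" ".join(current))
    return chunks


def ingest(text, seen_hashes, **kwargs):
    """Return only chunks whose content hash has not been seen before."""
    fresh = []
    for c in chunk(text, **kwargs):
        digest = hashlib.sha256(c.encode("utf-8")).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            fresh.append(c)
    return fresh


doc = "Alpha one. Alpha two.\n\nBeta one.\nBeta two.\n\nGamma."
seen = set()
first = ingest(doc, seen, max_chars=25, overlap=0)
second = ingest(doc, seen, max_chars=25, overlap=0)  # same text again
```

Ingesting the same document a second time yields no new chunks, since every chunk hash is already in `seen`.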

    # Bypasses the L1 buffer; becomes an L2 node, linked to the next turn compaction
    node = memo.add_external(
        tool_result_text,
        source_tag='WEB_SEARCH',
        template_name='compacting_external',
        source_links=['https://example.com/page'],
    )


    # Curated carry-over for a successor session
    high = memo.list_high_importance_nodes(limit=10)

    # Close + consolidate session-scoped dedup / links
    result = memo.close_session()
    # {'status': 'done', 'merged': ..., 'linked': ..., 'session_id': ...}

    # Pin external nodes into another session's rendered context
    other = CoMeT(config, session_id='successor')
    for n in high:
        other.pin_node(n['node_id'])
    print(other.get_session_context())  # includes [External Nodes] header

virtuals = memo.synthesize() # cluster + SLM validate + virtual-node creation
memo.get_tools() returns LangChain-compatible BaseTools:
| Tool | Returns |
|---|---|
| `get_memory_index()` | Rendered context window (passive/both first, then recent active) |
| `read_memory_node(node_id)` | Full raw data for a node |
| `search_memory(tag)` | Node ids matching a topic tag |
| `retrieve_memory(summary_query, trigger_query)` | Dual-path RAG with `risk_level` warning; returns summary/trigger/tags/links only (read the node for raw) |
The retrieval tool annotates the response with a ⚠️ HIGH RISK prefix when risk_level=high, steering the agent toward read_memory_node when exact values / wording matter.
CoMeT ships an MCP server for external agent integration:

    pip install "comet-memory[mcp]"
    comet-mcp                        # from pyproject scripts
    # or: python -m comet.mcp_server

Store location is configurable via COMET_STORE_PATH.
Resources
- `memory://nodes`: rendered memory index
- `memory://sessions`: session registry with metadata

Tools: `get_memory_index`, `read_memory_node`, `search_memory`, `retrieve_memory` (same semantics as the LangChain tools).
Configuration (ato)

    # comet/config.py: defaults
    @scope.observe(default=True)
    def default(comet: ADict):
        comet.slm_model = 'gpt-4o-mini'   # Sensor, QueryAnalyzer, detail, synthesis, merge
        comet.main_model = 'gpt-4o'       # Compacter
        comet.llm = ADict(provider='openai')
        comet.language = 'the same language as the user'
        comet.compacting = ADict(
            load_threshold=4,     # CognitiveLoad >= N triggers compaction
            max_l1_buffer=10,     # hard buffer cap
            # min_l1_buffer defaults to 3
        )
        comet.storage = ADict(
            type='json',
            base_path='./memory_store',
            raw_path='./memory_store/raw',
        )
        comet.consolidation = ADict(
            min_tag_overlap=2,
            cross_link_threshold=0.45,
            # cross_session_min_tag_overlap=1, cross_session_link_threshold=0.40
            # merge_threshold=0.32, cluster_threshold=0.22
        )
        comet.retrieval = ADict(
            embedding_model='text-embedding-3-small',
            vector_backend='lance',
            vector_db_path='./memory_store/vectors',
            fusion_alpha=0.5,         # summary (α) vs trigger (1-α) weight
            rrf_k=5,
            raw_search_weight=0.2,    # weight of raw-path signal in fusion
            top_k=5,
            rerank=False,
        )


    @scope.observe()
    def anthropic(comet: ADict):
        comet.llm.provider = 'anthropic'
        comet.slm_model = 'claude-3-5-haiku-latest'
        comet.main_model = 'claude-3-5-sonnet-latest'

    @scope.observe()
    def google(comet: ADict):
        comet.llm.provider = 'google'
        comet.slm_model = comet.main_model = 'gemini-2.0-flash'

    @scope.observe()
    def local_slm(comet: ADict):  # Ollama
        comet.llm.provider = 'ollama'
        comet.llm.base_url = 'http://localhost:11434/v1'
        comet.slm_model = 'gemma2:9b'
        comet.retrieval.embedding_provider = 'ollama'
        comet.retrieval.embedding_model = 'nomic-embed-text'

    @scope.observe()
    def vllm(comet: ADict):
        comet.llm.provider = 'vllm'
        comet.llm.base_url = 'http://localhost:8000/v1'

    @scope.observe()
    def aggressive(comet: ADict):
        comet.compacting.load_threshold = 3
        comet.compacting.max_l1_buffer = 5

Providers resolve from (1) short alias (sonnet, opus, haiku, flash, pro, gpt, mini, codex), (2) explicit provider/model prefix, (3) config.llm.provider, (4) openai fallback. Supported: openai, anthropic, google, ollama, vllm.
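The four-step resolution order can be sketched as a single function. The alias-to-provider mapping below is inferred from the model family names and is an assumption, as is the `resolve_provider` name; the real factory in `llm_factory.py` also maps aliases to concrete model ids, which this sketch does not attempt.

```python
# Assumed mapping: which provider each short alias belongs to.
ALIAS_PROVIDER = {
    "sonnet": "anthropic", "opus": "anthropic", "haiku": "anthropic",
    "flash": "google", "pro": "google",
    "gpt": "openai", "mini": "openai", "codex": "openai",
}
SUPPORTED = {"openai", "anthropic", "google", "ollama", "vllm"}


def resolve_provider(model, config_provider=None):
    """Apply the documented order: alias, explicit prefix, config, openai fallback."""
    if model in ALIAS_PROVIDER:                  # (1) short alias
        return ALIAS_PROVIDER[model]
    prefix, sep, _ = model.partition("/")
    if sep and prefix in SUPPORTED:              # (2) explicit provider/model prefix
        return prefix
    if config_provider in SUPPORTED:             # (3) config.llm.provider
        return config_provider
    return "openai"                              # (4) fallback
```

Under these assumptions, `resolve_provider("ollama/gemma2:9b")` picks `ollama` from the prefix regardless of the configured provider, while an unprefixed, non-alias name falls through to the config value and then to `openai`.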

    # Default
    python main.py

    # Anthropic + aggressive compaction
    python main.py anthropic aggressive

Two suites under benchmark/ compare CoMeT against a naive full-conversation-summary baseline, both using the judging-LLM pattern:
| Suite | File | Shape |
|---|---|---|
| Standard | `benchmark/run_benchmark.py` | 10 topics · short turns · one-hop recall |
| Hard | `benchmark/run_benchmark_hard.py` | 20 topics · needle-in-haystack · numerically adjacent distractors |
Both report CoMeT vs baseline accuracy and character-level compression. Run with:

    python -m benchmark.run_benchmark
    python -m benchmark.run_benchmark_hard

Results are written to benchmark_results.json / benchmark_results_hard.json.
comet/
├── orchestrator.py # CoMeT — pipeline facade, sessions, pin, briefs, handoff, tools
├── sensor.py # CognitiveSensor — L1 extract + load/flow/redundancy (SLM)
├── compacter.py # MemoryCompacter — policy-aware L1→MemoryNode (Main LLM)
├── consolidator.py # dedup + cross-link + tag-norm + synthesize (snapshot-protected)
├── retriever.py # QueryAnalyzer + ScoreFusion + 2-hop link traversal
├── vector_index.py # LanceDB triple-table (summary / trigger / raw) + rerank
├── storage.py # JSON KV + sessions.json + session_briefs/ + inherited_memory/ + snapshots
├── schemas.py # MemoryNode, CognitiveLoad, L1Memory, RetrievalResult
├── llm_factory.py # Multi-provider chat/embed factory + model aliases
├── mcp_server.py # FastMCP server — resources + tools
├── config.py # ato scope presets (default / anthropic / google / local_slm / vllm / aggressive)
└── templates/
├── compacting.txt # default dialog compacter
├── compacting_base.txt # policy-injected base (modality-aware)
├── compacting_code.txt # code-artifact modality
├── compacting_external.txt # external content modality
├── compacting_file.txt # file read/write modality
├── cognitive_load.txt # sensor: logic_flow / load_level / redundancy
├── l1_extraction.txt # L1 extraction (legacy — sensor now pass-through)
├── query_analysis.txt # QueryAnalyzer decomposition
├── merge_summary.txt # post-dedup summary+trigger regen
├── merge_trigger.txt # (merge helper)
├── synthesis_validate.txt # cluster coherence check
├── synthesis_create.txt # virtual node generation
└── consolidation_assessment.txt # redundancy trigger for auto-consolidation
benchmark/
├── run_benchmark.py / run_benchmark_hard.py
└── synthetic_data{,_hard,_ultra}.py
tests/ # test_comet, test_rag, test_comparison, test_extended,
# test_agent_retrieval, test_snapshot, test_storage_bugs

    pip install comet-memory                    # core (OpenAI + LanceDB)
    pip install "comet-memory[providers]"       # + Anthropic, Google
    pip install "comet-memory[mcp]"             # + FastMCP server
    pip install "comet-memory[providers,mcp]"   # everything

Python ≥ 3.11. License: PolyForm Noncommercial 1.0.0 — free for personal, research, and nonprofit use; commercial use requires a separate license from the author.