LLM Wiki vs RAG: a different approach to team-chat memory

DEV Community

Same fact, said dozens of times. Someone announces "we're moving to Postgres on March 15" in #engineering. Twelve people ack. Four people quote it in later threads. Three retrospectives cite it. The same fact now exists as 20+ near-duplicate chunks. Chunk-based retrieval surfaces one somewhat arbitrarily, or several near-duplicates that crowd out unrelated relevant context.

Time-ordered, not topic-grouped. In a document, related content sits adjacent. In chat, a feature decision, a bug report, and a lunch plan can be interleaved across five minutes.

Structure lives outside the text. Meaningful metadata is not in the message body: author, thread parent, reactions, platform, mentions, attachments. Embedding the body only throws away half the signal.

Answer instability. Ask the same question a week apart. Different chunks rank highest depending on recent message volume and minor embedding perturbations. You get subtly different answers — corrosive to user trust.

What an LLM Wiki can answer that RAG can't

This is the part of the design that most people miss when they first see the architecture. A good vector retriever can handle "what did we decide about onboarding?" just fine. The capabilities that separate an LLM Wiki from RAG show up when the question is relational, temporal, or multi-hop:

"Who decided we're moving to Postgres, and did anything supersede that decision?"
"Which projects does the billing service block, and who owns them?"
"Show me every decision Alan announced in Q1 that was later reversed."
"Which technologies are members of team A working with that are still flagged experimental?"
"What got decided in threads that mention the SOC2 audit?"

Each of these requires walking a relationship — decision supersession, team membership, thread mentions, status attributes — that chunk-based retrieval fundamentally can't express. In Beever Atlas the knowledge graph in Neo4j holds those relationships. The Cypher traversal used in the codebase for decision-chain queries looks like this:

// Decisions made in this channel, or mentioned in an event from this channel
MATCH (d:Entity {type: 'Decision'})
WHERE d.channel_id = $channel_id
   OR EXISTS {
     MATCH (d)-[:MENTIONED_IN]->(ev:Event)
     WHERE ev.channel_id = $channel_id
   }
// Who decided it, what it replaced, and what later replaced it
OPTIONAL MATCH (person:Entity)-[:DECIDED]->(d)
OPTIONAL MATCH (d)-[:SUPERSEDES]->(old:Entity)
OPTIONAL MATCH (newer:Entity)-[:SUPERSEDES]->(d)
// DISTINCT collapses the row fan-out caused by stacking OPTIONAL MATCH clauses
WITH d,
     collect(DISTINCT person.name) AS decided_by,
     collect(DISTINCT old.name) AS supersedes,
     collect(DISTINCT newer.name) AS superseded_by
RETURN d, decided_by, supersedes, superseded_by
LIMIT $limit

This is the kind of query a Q&A agent's router hands off to the graph path when a question's shape demands it.
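
To make the shape of that result concrete, here is a minimal in-memory Python sketch of the same walk. It is not the project's code: the `Decision` dataclass, the `decision_chain` helper, and the sample data (including the name "Dana") are hypothetical and exist only to illustrate how `decided_by`, `supersedes`, and `superseded_by` fall out of the relationships.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    name: str
    decided_by: list[str] = field(default_factory=list)
    supersedes: list[str] = field(default_factory=list)  # names of older decisions


def decision_chain(decisions: list[Decision], name: str) -> dict:
    """Return who decided `name`, what it supersedes, and what superseded it."""
    by_name = {d.name: d for d in decisions}
    target = by_name[name]
    # Reverse lookup: any decision that lists `name` as superseded is "newer".
    superseded_by = sorted(d.name for d in decisions if name in d.supersedes)
    return {
        "decision": target.name,
        "decided_by": sorted(set(target.decided_by)),
        "supersedes": sorted(set(target.supersedes)),
        "superseded_by": superseded_by,
    }


# Made-up sample data for illustration only.
decisions = [
    Decision("Stay on MySQL", decided_by=["Dana"]),
    Decision("Migrate to Postgres", decided_by=["Alan"], supersedes=["Stay on MySQL"]),
]
result = decision_chain(decisions, "Migrate to Postgres")
```

The reverse lookup is the part a chunk retriever has no way to perform: nothing in the text of the older decision says it was later replaced.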

The shape of a fact and an entity

Before showing the pipeline, the data shapes that hold everything together:

Atomic fact — a single self-contained claim with a source message, author, timestamp, and thread context:

{
  "text": "The team is migrating from MySQL to Postgres on March 15",
  "source_message_id": "slack:C08TX:1712500000001100",
  "author": "alan5543",
  "timestamp": "2026-03-01T14:22:00Z",
  "thread_parent": null,
  "entities": [
    {"name": "MySQL", "type": "Technology"},
    {"name": "Postgres", "type": "Technology"}
  ],
  "relationships": [
    {
      "source": "Team",
      "target": "Postgres",
      "type": "MIGRATING_TO",
      "confidence": 0.92,
      "valid_from": "2026-03-01T14:22:00Z",
      "valid_until": null,
      "source_message_id": "slack:C08TX:1712500000001100"
    }
  ]
}

Two details to flag:

Every relationship has temporal props — confidence, valid_from, valid_until, source_message_id. This is how we answer "as of when" questions and how later decisions supersede earlier ones without destroying history.
Entity scope — some entity types (Person, Technology, Project, Team) merge globally across channels. Others (Decision, Meeting, Artifact) merge only within a channel scope. Scope-aware MERGE prevents two different channels' "Q1 planning decisions" from collapsing into each other, while still letting "Alan" mean the same person everywhere.
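
Both details can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's implementation: `valid_as_of`, `merge_key`, and `GLOBAL_TYPES` are hypothetical names, and the lower-casing of entity names is a simplification chosen for the example.

```python
from datetime import datetime, timezone

# Entity types the article says merge globally; everything else is channel-scoped.
GLOBAL_TYPES = {"Person", "Technology", "Project", "Team"}


def parse_ts(value: str) -> datetime:
    # The fact record uses a trailing "Z"; normalise it for fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def valid_as_of(rel: dict, when: datetime) -> bool:
    """True if the relationship was in force at `when`.

    A valid_until of None means the relationship is still open-ended.
    """
    start = parse_ts(rel["valid_from"])
    end = parse_ts(rel["valid_until"]) if rel["valid_until"] else None
    return start <= when and (end is None or when < end)


def merge_key(name: str, entity_type: str, channel_id: str) -> tuple:
    """Globally scoped types ignore the channel; channel-scoped types include it."""
    scope = "global" if entity_type in GLOBAL_TYPES else channel_id
    return (entity_type, scope, name.strip().lower())


rel = {
    "type": "MIGRATING_TO",
    "valid_from": "2026-03-01T14:22:00Z",
    "valid_until": None,
}
in_force = valid_as_of(rel, datetime(2026, 3, 10, tzinfo=timezone.utc))
```

Superseding a decision then amounts to setting `valid_until` on the old relationship rather than deleting it, which is what keeps history queryable.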

String-level dedup is done with Jaro-Winkler similarity (via APOC) inside the scope; semantic dedup is done with embedding cosine on Entity.name_vector.
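
For readers unfamiliar with the two measures, here is a plain-Python sketch of both. The project computes string similarity inside Neo4j via APOC, so this is only an illustration of what is being measured; the function names are hypothetical, and any merge threshold you pick on top of them would be an assumption not stated in this article.

```python
import math


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

The two measures are complementary: Jaro-Winkler catches spelling variants of the same name, while embedding cosine catches different names for the same thing.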

How the wiki gets built — six ADK stages

Ingestion runs as a pipeline of Google ADK LlmAgent steps, using Gemini 2.5 Flash for extraction. Six stages:

Preprocessor — normalises platform-specific message shapes into a single NormalizedMessage with author, timestamp, thread context, attachments, platform metadata.
Fact Extractor — pulls atomic facts with channel + author + timestamp attached. Bounded output via MAX_FACTS_PER_MESSAGE.
Entity Extractor — LLM-driven with a flexible type schema (Person / Decision / Project / Technology / Team / Meeting / Artifact / ...). Type vocabulary isn't frozen; new types can be added without a migration.
Cross-batch Validator — dedupes entities and relationships across batch boundaries. This is where scope-aware MERGE and Jaro-Winkler similarity live.
Relationship Graph — materialises bidirectional relationships (DECIDED ↔ DECIDED_BY, BLOCKED_BY ↔ BLOCKS, and so on) with the temporal props above.
Persister — transactional-outbox write to Weaviate + Neo4j + MongoDB, with media attribution preserved. The outbox pattern means a partial failure doesn't leave the stores inconsistent.
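
The outbox idea in the Persister stage can be sketched as follows. This is a simplified in-memory model under assumptions, not the project's code: `Outbox`, `OutboxEntry`, and the writer callables are hypothetical, and the in-memory list stands in for what would be a durable outbox collection.

```python
from dataclasses import dataclass, field


@dataclass
class OutboxEntry:
    fact_id: str
    payload: dict
    # Stores that have not yet acknowledged this write.
    pending: set = field(default_factory=lambda: {"weaviate", "neo4j"})


class Outbox:
    def __init__(self, writers: dict):
        # Maps store name -> callable(fact_id, payload). Writers are assumed
        # to be idempotent upserts keyed by fact_id, so retries are safe.
        self.writers = writers
        self.entries: list[OutboxEntry] = []

    def enqueue(self, fact_id: str, payload: dict) -> None:
        # In a real deployment this append would be a durable write.
        self.entries.append(OutboxEntry(fact_id, payload))

    def flush(self) -> int:
        """Attempt every pending write; return how many entries remain incomplete."""
        for entry in self.entries:
            for store in sorted(entry.pending):
                try:
                    self.writers[store](entry.fact_id, entry.payload)
                except Exception:
                    continue  # leave it pending for the next flush
                entry.pending.discard(store)
        self.entries = [e for e in self.entries if e.pending]
        return len(self.entries)
```

If one store is down, the entry stays in the outbox with only that store pending, and a later `flush()` completes it without rewriting the stores that already succeeded.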

Periodically, a separate consolidation agent clusters related facts and synthesises them into topic pages: the "wiki" users browse in the dashboard. Consolidation is idempotent and checkpointed, so you can re-run it on a subset without rebuilding from scratch.
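
One way to get idempotent, checkpointed consolidation is to fingerprint each cluster and skip any cluster whose fingerprint has not changed. The sketch below assumes that approach; `cluster_fingerprint`, `consolidate`, and the `synthesise` callback (standing in for the LLM call) are hypothetical names, and the project may checkpoint differently.

```python
import hashlib
import json


def cluster_fingerprint(facts: list[dict]) -> str:
    """Stable hash of a cluster's membership, independent of fact order."""
    ids = sorted(f["fact_id"] for f in facts)
    return hashlib.sha256(json.dumps(ids).encode()).hexdigest()


def consolidate(clusters: dict, checkpoints: dict, synthesise) -> list[str]:
    """Re-synthesise only clusters whose fingerprint changed.

    `clusters` maps topic -> list of fact dicts; `checkpoints` is mutated in
    place and maps topic -> {"fingerprint", "page"}. Returns the topics rebuilt.
    """
    rebuilt = []
    for topic, facts in clusters.items():
        fingerprint = cluster_fingerprint(facts)
        if checkpoints.get(topic, {}).get("fingerprint") == fingerprint:
            continue  # unchanged since the last run, so keep the cached page
        checkpoints[topic] = {
            "fingerprint": fingerprint,
            "page": synthesise(topic, facts),
        }
        rebuilt.append(topic)
    return rebuilt
```

Running `consolidate` twice over the same clusters makes no second LLM call, which is what keeps the consolidation cost proportional to what changed.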

Dual memory: why Weaviate and Neo4j

Facts live in two stores because no single store wins both query shapes:

Weaviate — 3-tier semantic memory (channel-level summaries, topic-level synopses, fact-level detail). Hybrid BM25 + vector retrieval. Good for "what did we decide about X?" — any question whose answer is a semantic lookup over fact text.
Neo4j — entities, decisions, episodic links, media attributions. Good for "who decided this and why?" — any question that requires walking a relationship.

A query router picks semantic, graph, or both per question. The router is a small LLM classifier, not a rule engine. Rule engines fail on novel phrasings; a classifier trades a few milliseconds of routing latency for resilience to queries we haven't seen before.

flowchart LR
 Q[Question] --> R[Query Router<br/>LLM classifier]
 R -->|semantic| W[Weaviate<br/>channel / topic / fact]
 R -->|graph| N[Neo4j<br/>entity + rel traversal]
 R -->|both| W
 R -->|both| N
 W --> M[Merge + dedup by fact_id]
 N --> M
 M --> A[Answer agent<br/>ADK SkillToolset]
 A --> S[SSE stream<br/>response + citations]
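
The routing and merge steps in the flowchart can be sketched like this. The classifier and the two search functions are passed in as plain callables so the example stays self-contained; all names here are hypothetical, and the fallback to `both` on unexpected classifier output is an assumption, not documented behaviour.

```python
from typing import Callable

ROUTES = {"semantic", "graph", "both"}


def merge_by_fact_id(results: list[dict]) -> list[dict]:
    """Drop duplicate facts, keeping the first occurrence and original order."""
    merged: dict[str, dict] = {}
    for fact in results:
        merged.setdefault(fact["fact_id"], fact)
    return list(merged.values())


def gather_context(
    question: str,
    classify: Callable[[str], str],
    search_semantic: Callable[[str], list[dict]],
    search_graph: Callable[[str], list[dict]],
) -> list[dict]:
    """Route the question, query the chosen store(s), and merge the results."""
    route = classify(question)
    if route not in ROUTES:
        route = "both"  # assumption: query both stores if the label is unusable
    results: list[dict] = []
    if route in {"semantic", "both"}:
        results += search_semantic(question)
    if route in {"graph", "both"}:
        results += search_graph(question)
    return merge_by_fact_id(results)
```

Deduplicating on `fact_id` matters most on the `both` path, where the same fact is likely to come back from each store.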

MongoDB holds the wiki page cache, ingestion state, and the transactional outbox. Redis holds sessions. Nothing interesting happens in either — they're operational plumbing.

The MCP surface: same memory, two audiences

The dashboard is one surface. The MCP (Model Context Protocol) server is the other. Sixteen tools are exposed to external agents like Claude Code, Cursor, or any MCP-capable client:

search_by_topic, search_channel_facts
get_decision_timeline, find_supersessions
graph_traverse, resolve_entity, find_mentions
list_channels, read_wiki_section
...and more for ingestion status, media lookup, and policy.

The meaningful shift here is that the team memory becomes reusable context, not a siloed app. An IDE-resident coding agent can ask "what did the team decide about the auth library?" with the same guarantees the dashboard gives a human — citations back to source messages, scope-aware resolution, permission-aware access (roadmap, below).
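
On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `get_decision_timeline`. The envelope follows the MCP specification, but the argument names (`channel_id`, `topic`) are assumptions made for illustration; check the server's published tool schema for the real ones.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialise an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
    )


# Hypothetical argument names; the real schema comes from the server's tool listing.
request = tool_call_request(
    "get_decision_timeline",
    {"channel_id": "C08TX", "topic": "auth library"},
)
```

In practice an MCP client library handles this framing for you; the point is that the coding agent and the dashboard end up calling the same retrieval paths.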

What this costs

Honest accounting: ingestion is slower and more expensive than RAG. You pay the LLM twice — once for extraction, once for consolidation. Using Gemini Flash keeps both bounded. Consolidation runs sparsely (once per topic cluster) and amortises across many messages.

The tradeoff is explicit: you pay more ingestion cost to get a standing artefact instead of ephemeral retrieval results. For team-chat corpora with heavy repetition, high conversational noise, and relational/temporal queries, an LLM Wiki reliably wins on answer quality, citation fidelity, browsability, and the types of question it can even entertain. For already-structured documents, classic RAG is still the right tool.

A concrete query path

Someone asks "who decided we're moving to Postgres, and has anything superseded that decision?":

1. The router classifies this as both: it needs fact retrieval and graph traversal, since there's a decider and a supersession chain.
2. Weaviate retrieves the top facts about Postgres migration from the fact tier.
3. Neo4j runs the decision-chain traversal shown earlier, returning decided_by, supersedes, and superseded_by collections.
4. Results merge, dedup by fact_id, and pass to the answer agent.
5. The agent streams back via SSE:

Alan announced the migration from MySQL to Postgres on March 1, 2026 [1], with cutover scheduled for March 15 [1]. The decision was ratified in the March 3 engineering sync [2] and has not been superseded since.

[1] alan5543 in #engineering, 2026-03-01 14:22
[2] alan5543 in #engineering-sync, 2026-03-03 10:00

The citations come straight from the fact records surfaced in steps 2 and 3. There's no separate "find the source" step that can drift from what the agent actually retrieved.
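
A minimal sketch of how citation lines could be derived directly from retrieved fact records, assuming the record shape shown earlier. `build_citations` and the `channel_names` lookup are hypothetical, and the sketch assumes message ids follow the `platform:channel:timestamp` pattern seen in the example record.

```python
from datetime import datetime


def build_citations(facts: list[dict], channel_names: dict) -> tuple[dict, list[str]]:
    """Assign one citation number per source message, in first-seen order.

    Returns (message id -> citation number, rendered citation lines).
    """
    numbers: dict[str, int] = {}
    lines: list[str] = []
    for fact in facts:
        msg_id = fact["source_message_id"]
        if msg_id in numbers:
            continue  # several facts from one message share a citation
        numbers[msg_id] = len(numbers) + 1
        channel_id = msg_id.split(":")[1]  # assumes "platform:channel:ts" ids
        channel = channel_names.get(channel_id, channel_id)
        when = datetime.fromisoformat(fact["timestamp"].replace("Z", "+00:00"))
        lines.append(
            f"[{numbers[msg_id]}] {fact['author']} in #{channel}, {when:%Y-%m-%d %H:%M}"
        )
    return numbers, lines
```

Because the numbering is keyed on `source_message_id`, two facts extracted from the same message (the announcement and the cutover date above) both resolve to `[1]`.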

Where we sit, and what's defensible

Most team-memory products are vector-only — Glean, Notion AI, various OSS "LLM wiki" clones built on a single embedding store. Beever Atlas is the first OSS product we're aware of putting a real knowledge graph under team-chat memory.

The graph isn't cosmetic. It's load-bearing for a specific class of query — multi-hop, temporal, scope-aware — that vector retrieval fundamentally can't serve. It's also the substrate on which a permission spine can be built: mirroring Slack Enterprise Grid ACLs as graph-level access rules means permissions get enforced at query time, not papered over with app-tier filters. That's on our roadmap and we think it's the right shape for enterprise deployment.

The rest of the stack

The full system runs locally via docker compose:

Python backend — FastAPI + Google ADK agents
TypeScript bot bridge — Slack / Discord / Teams / Mattermost / Telegram webhooks with platform-specific signature verification
React dashboard — wiki view, interactive Cytoscape knowledge graph, streaming Q&A with live citations
Weaviate, Neo4j, MongoDB, Redis

Platform credentials (bot tokens) are encrypted at rest with AES-256-GCM. All data stays in databases you control. The app sends no telemetry anywhere — LLM calls go through API keys you configure in your own .env.

Try it

A make demo target brings up the full stack pre-loaded with a public Wikipedia corpus (Ada Lovelace + Python history, CC-BY-SA 3.0). Pre-computed fixtures ship in the repo, so seeding runs without any API keys. Asking questions via the Q&A agent needs a free-tier Gemini API key — Ollama support for fully-local inference is on the roadmap.

git clone https://github.com/Beever-AI/beever-atlas
cd beever-atlas
make demo

Then:

curl -X POST http://localhost:8000/api/channels/demo-wikipedia/ask \
 -H "Authorization: Bearer dev-key-change-me" \
 -H "Content-Type: application/json" \
 -d '{"question":"Who was Ada Lovelace?"}'

You'll see a streaming SSE response with six citations linking back to the source Wikipedia articles.
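
If you would rather consume the stream programmatically, a small parser for the SSE wire format is enough. The sketch below follows the standard event-stream framing (`event:` and `data:` fields, with a blank line ending each event). The event names in the sample (`token`, `citations`) are assumptions; inspect the actual stream to see what the server emits.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the format strips one leading space
            if name == "event":
                event = value
            elif name == "data":
                data.append(value)


# Hypothetical sample stream, not captured from the real server.
sample = [
    "event: token\n",
    "data: Ada Lovelace was\n",
    "\n",
    ": keep-alive\n",
    "event: citations\n",
    'data: [{"n": 1}]\n',
    "\n",
]
events = list(parse_sse(sample))
```

Feed it the decoded lines of the HTTP response body from whichever client you use; the parser itself has no network dependency.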

Links

Repository: github.com/Beever-AI/beever-atlas
Documentation: docs.beever.ai/atlas
Demo video: youtu.be/VJ81Uxyjxb0

Beever Atlas is developed by Beever AI Limited in Toronto, Ontario, and released under the Apache 2.0 license. The LLM Wiki concept was inspired by Andrej Karpathy's thread on LLM Knowledge Bases from earlier in 2026; we took the idea seriously and built the implementation.

Contributions welcome. Issues especially — we want to hear the edge cases where a vector-only system would have worked and our graph overhead turned out not to pay for itself.