Drop-in memory middleware for LLM agents. Point your agent's OpenAI-compatible API endpoint to MemoryBridge and it automatically retrieves relevant memories, manages conversation context, and stores new memories — all without changing a single line of your agent code.

    Agent ──(OpenAI API + Token)──→ MemoryBridge ──(HTTP)──→ Your LLM Provider
                                      │    ↑
                                      │    ├── Retrieve memories (read)
                                      │    └── Store memories (write)
                                      │
                                  ┌───┴────┐
                                  │ Qdrant │
                                  └───┬────┘
                                      │
                            Mem0 (memory engine)

For a deep dive into the architecture, see docs/ARCHITECTURE.md.
| Problem | Solution |
|---|---|
| Agents forget past conversations | Automatic long-term memory retrieval and injection |
| Every LLM provider has different APIs | OpenAI-compatible single endpoint; provider details per token |
| Memory code mixed with agent logic | Transparent proxy — agent sees only a standard chat API |
| Manual context window management | Automatic session windowing and memory truncation |
| Stateless chat requests lose context | Automatic session history injection — recent conversation history prepended after system prompt |
Key Design Choices:
| Principle | Description |
|---|---|
| Drop-in compatible | Same request/response format as any OpenAI-compatible chat API |
| Transparent proxy | Only intercepts messages for memory injection; all other fields passed through untouched |
| Tool-call safe | LLM tool_calls responses are preserved in session history for correct multi-turn tool chains |
| Hard failure, no fallback | Missing token → 401, invalid config → fail immediately. Never guess, degrade, or default |
| Per-request isolation | Fresh handler instance per HTTP request, no shared mutable state |
| Config externalized | LLM/Embedding keys stored per-token via Admin API; .env holds only infrastructure |
| Minimal dependencies | Zero Docker, zero Redis, zero ORM, zero message queue |
- Python 3.11+
- Linux x86_64 / aarch64, or macOS x86_64 / arm64

    # 1. Install
    git clone https://github.com/51193/MemoryBridge && cd MemoryBridge
    curl -LsSf https://astral.sh/uv/install.sh | sh
    source $HOME/.local/bin/env
    uv sync

    # 2. Initialize (downloads Qdrant, creates databases, generates admin token)
    uv run python src/memory_bridge/host_manager.py --init
    # Output: ADMIN TOKEN (save this): <32-char hex>

    # 3. Start
    uv run python src/memory_bridge/host_manager.py
    # Visit http://localhost:8000/health → {"status":"ok","qdrant":"connected"}

After starting, register a Service Token using the Admin Token from step 2 above. Each token carries its own LLM and Embedding configuration:

    curl -X POST http://localhost:8000/v1/admin/tokens \
      -H "Authorization: Bearer <admin_token>" \
      -H "Content-Type: application/json" \
      -d '{
        "label": "my-llm",
        "main_llm_base_url": "https://api.deepseek.com",
        "main_llm_api_key": "sk-xxx",
        "memory_llm_base_url": "https://api.deepseek.com",
        "memory_llm_api_key": "sk-xxx",
        "memory_llm_provider": "openai",
        "embed_base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
        "embed_api_key": "sk-xxx",
        "embed_dims": 1024,
        "embed_provider": "openai"
      }'
    # Returns: {"token": "<32-char service_token>", ...}

    # Exactly like calling any LLM API, just use the service token
    curl -X POST http://localhost:8000/v1/chat/completions \
      -H "Authorization: Bearer <service_token>" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "deepseek-chat",
        "messages": [{"role": "user", "content": "Hi, my name is Ming"}],
        "stream": false
      }'

MemoryBridge will retrieve any relevant past memories, inject them into the system prompt, forward the enriched request to your LLM provider, and asynchronously store new memories from the conversation — all transparently.

    curl -X POST http://localhost:8000/v1/chat/completions \
      -H "Authorization: Bearer <service_token>" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "deepseek-chat",
        "messages": [
          {"role": "user", "content": "My name is Ming"},
          {"role": "assistant", "content": "Hello Ming!"},
          {"role": "user", "content": "What is my name?"}
        ],
        "stream": true
      }'

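The same call can be made from Python. The sketch below uses only the standard library and assumes MemoryBridge is listening on `http://localhost:8000`, as in the curl examples. The helper names `build_chat_request` and `send_chat` are illustrative and are not part of MemoryBridge.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str,
    service_token: str,
    messages: list[dict],
    model: str = "deepseek-chat",
    stream: bool = False,
) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at MemoryBridge."""
    body = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The service token goes where a provider API key would normally go.
            "Authorization": f"Bearer {service_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat(request: urllib.request.Request, timeout_s: float = 60.0) -> dict:
    """Send a non-streaming request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

An agent that already uses an OpenAI-compatible client should only need its base URL and API key changed. The helper above simply makes the wire format explicit.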
MemoryBridge enforces token authentication. There are two types:
| Type | Purpose | Created By |
|---|---|---|
| Admin Token | Register/manage Service Tokens | --init auto-generates |
| Service Token | Call `/v1/chat/completions` | Admin API |
Each Service Token carries its full LLM/Embedding configuration:
| Field | Required | Description |
|---|---|---|
| `label` | No | Human-readable label |
| `main_llm_base_url` | Yes | Chat LLM base URL (path `/v1/chat/completions` auto-appended) |
| `main_llm_api_key` | Yes | Chat LLM API key |
| `memory_llm_base_url` | Yes | Memory extraction LLM base URL |
| `memory_llm_api_key` | Yes | Memory extraction LLM API key |
| `memory_llm_provider` | Yes | Memory extraction LLM provider (e.g. `deepseek`, `dashscope`) |
| `memory_llm_body` | No | Transparent JSON (model, temperature, etc.) |
| `embed_base_url` | Yes | Embedding service base URL |
| `embed_api_key` | Yes | Embedding API key |
| `embed_dims` | Yes | Vector dimensions (e.g. 1024); must not be 0 |
| `embed_provider` | Yes | Embedding provider name (e.g. `openai`, `dashscope`) |
| `embed_body` | No | Transparent JSON (model, etc.) |
| `memory_enabled` | No | Enable memory (default: `true`) |
| `memory_limit` | No | Memories per retrieval (1–20, default: 5) |
| `session_window_size` | No | Session history window (1–100, default: 10) |
| `memory_prompt` | No | Custom memory extraction prompt |
Hard-failure semantics: if any required field is null, empty, or zero (for numeric fields), the token is treated as invalid and requests that use it are rejected with 401.
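To catch such a configuration before registering a token, a client can apply the same rule locally. This is a minimal sketch of the check as described above, not MemoryBridge's actual validation code. The field list is copied from the table, and `missing_required_fields` is a hypothetical helper name.

```python
# Required fields, as listed in the Service Token configuration table.
REQUIRED_FIELDS = (
    "main_llm_base_url",
    "main_llm_api_key",
    "memory_llm_base_url",
    "memory_llm_api_key",
    "memory_llm_provider",
    "embed_base_url",
    "embed_api_key",
    "embed_dims",
    "embed_provider",
)


def missing_required_fields(config: dict) -> list[str]:
    """Return the required fields that are absent, null, empty, or zero."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = config.get(name)
        if value is None:
            missing.append(name)  # absent or null
        elif isinstance(value, str) and value == "":
            missing.append(name)  # empty string
        elif isinstance(value, (int, float)) and not isinstance(value, bool) and value == 0:
            missing.append(name)  # zero in a numeric field, e.g. embed_dims
    return missing
```

An empty result means the payload satisfies the documented rule. The server may still reject it for other reasons, for example a 422 on request validation.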
| Method | Path | Description |
|---|---|---|
| `POST` | `/v1/chat/completions` | OpenAI-compatible chat, with memory injection |
| Method | Path | Description |
|---|---|---|
| `POST` | `/v1/admin/tokens` | Register a Service Token (201) |
| `GET` | `/v1/admin/tokens/{token}` | View config (API keys masked) |
| `POST` | `/v1/admin/tokens/{token}` | Update config (fields optional) |
| `DELETE` | `/v1/admin/tokens/{token}` | Delete a token (204) |
| Method | Path | Description |
|---|---|---|
| `GET` | `/health` | Health check (bridge + Qdrant status) |
| Status | Meaning |
|---|---|
| `200` | Success |
| `201` | Token registered |
| `204` | Token deleted |
| `401` | Token missing, invalid, or config parse failed |
| `403` | Service token used on admin endpoint |
| `404` | Token not found |
| `422` | Request validation failed |
| `500` | Unexpected internal error |
| `502` | Memory retrieval failed or LLM provider unreachable |
| Other | Passed through from LLM provider (status code + headers + body) |

    Incoming request
      → TokenAuthMiddleware (auth + request-id + read raw JSON body)
      → router (parse messages, validate, create Handler instance)
      → ChatRequestHandler.execute()
          ├─ SessionStore.get()         ← Read recent conversation history
          ├─ MemoryManager.search()     ← Memory retrieval (Qdrant vector search)
          ├─ ContextBuilder.build()     ← Assemble context (memory block + request)
          ├─ _inject_history()          ← Prepend session history after system prompt
          ├─ response_sender.send_*()   ← Forward enriched request to LLM
          └─ _store_memory()            ← Async store (session + memories)

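The context-assembly steps in this pipeline can be illustrated with a small sketch. It is not the project's `ContextBuilder` or `_inject_history` code. The function name `inject_context`, the "Relevant memories" wording, and the choice to count the window in messages rather than turns are all assumptions, and message `content` is assumed to be a plain string.

```python
def inject_context(
    messages: list[dict],
    history: list[dict],
    memories: list[str],
    window_size: int = 10,
) -> list[dict]:
    """Return a new message list with memories and recent history added.

    Memories are appended to the first system message (or placed in a new
    one), and recent history is inserted right after the system messages.
    """
    # Split off the leading system messages from the rest of the request.
    split = 0
    while split < len(messages) and messages[split].get("role") == "system":
        split += 1
    system, rest = list(messages[:split]), list(messages[split:])

    if memories:
        # Hypothetical memory block format; the real template is in templates/.
        block = "Relevant memories:\n" + "\n".join(f"- {m}" for m in memories)
        if system:
            first = dict(system[0])  # copy so the caller's message is untouched
            first["content"] = f"{first['content']}\n\n{block}"
            system[0] = first
        else:
            system = [{"role": "system", "content": block}]

    # Keep only the most recent window_size history messages.
    recent = list(history[-window_size:]) if window_size > 0 else []
    return system + recent + rest
```

Inserting history after the system prompt rather than before it keeps the system instructions first in the message list, which is the position most chat APIs expect.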

    src/memory_bridge/
    ├── main.py                  # FastAPI app factory + lifespan
    ├── host_manager.py          # Process manager (Qdrant + API subprocess)
    ├── host_process.py          # Subprocess lifecycle management
    ├── host_init.py             # First-time initialization
    ├── config.py                # Infrastructure config (pydantic-settings)
    ├── exceptions.py            # Custom exceptions
    ├── api/                     # Routes + middleware + request handling
    │   ├── response_sender.py   # Non-stream / streaming LLM dispatch
    ├── core/                    # Memory / session / context / token / logging
    ├── providers/               # HTTP client (transparent proxy)
    └── models/                  # Request / response models
    templates/
    ├── memory_template.md       # Memory injection template (English)
    └── memory_template_zh.md    # Memory injection template (Chinese)


    uv sync --extra dev                                         # Install dev dependencies
    uv run mypy src/                                            # Type check (strict mode)
    uv run ruff check src/ tests/                               # Lint
    uv run pytest -v                                            # Unit tests
    uv run pytest --cov=src/memory_bridge --cov-fail-under=95   # Coverage gate

The .env template (committed, no secrets) includes MEMORY_SEARCH_TIMEOUT=30
(to prevent memory searches from hanging indefinitely) and MEMORY_STORE_TIMEOUT=120
(to protect background memory store operations that involve LLM fact extraction).
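How a timeout like this bounds an async call can be sketched with `asyncio.wait_for`. This illustrates the general technique only and makes no claim about how MemoryBridge applies the setting internally. The `search_with_timeout` helper and the direct `os.environ` lookup are assumptions (the project itself reads configuration through pydantic-settings).

```python
import asyncio
import os
from typing import Awaitable, Callable

# Stand-in for the real settings object: read the value from the environment.
MEMORY_SEARCH_TIMEOUT = float(os.environ.get("MEMORY_SEARCH_TIMEOUT", "30"))


async def search_with_timeout(
    search: Callable[[str], Awaitable[list]],
    query: str,
    timeout_s: float = MEMORY_SEARCH_TIMEOUT,
) -> list:
    """Await search(query), raising asyncio.TimeoutError after timeout_s seconds."""
    # wait_for cancels the underlying task when the deadline passes,
    # so a stalled vector search cannot hold the request open forever.
    return await asyncio.wait_for(search(query), timeout=timeout_s)
```

The caller decides what a timeout means for the request. The status code table lists 502 for failed memory retrieval, but whether a timeout is reported that way is not stated here.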
Test structure mirrors source:

    tests/
    ├── api/            ← src/memory_bridge/api/
    ├── core/           ← src/memory_bridge/core/
    ├── providers/      ← src/memory_bridge/providers/
    ├── architecture/   # Architecture guard tests (boundary checks)
    └── integration/    # Integration smoke tests

Download the single-file .pyz from GitHub Releases:

    mkdir -p /opt/memorybridge && cd /opt/memorybridge
    wget https://github.com/51193/MemoryBridge/releases/latest/download/memorybridge.pyz
    python3 memorybridge.pyz --init   # First-time setup
    python3 memorybridge.pyz          # Start

Apache 2.0 — see also NOTICE for bundled open source attributions,
and docs/LICENSES.md for a full third-party license summary.