
MemoryBridge

English | 中文

Drop-in memory middleware for LLM agents. Point your agent's OpenAI-compatible base URL at MemoryBridge and it automatically retrieves relevant memories, manages conversation context, and stores new memories. Your agent code stays unchanged; only the endpoint and API key it is configured with change.

Agent ──(OpenAI API + Token)──→ MemoryBridge ──(HTTP)──→ Your LLM Provider
                                      │  ↑
                                      │  ├── Retrieve memories (read)
                                      │  └── Store memories (write)
                                      │
                                  ┌───┴────┐
                                  │ Qdrant │
                                  └───┬────┘
                                      │
                              Mem0 (memory engine)

For a deep dive into the architecture, see docs/ARCHITECTURE.md.

Why MemoryBridge?

Problem	Solution
Agents forget past conversations	Automatic long-term memory retrieval and injection
Every LLM provider has different APIs	A single OpenAI-compatible endpoint; provider details are configured per token
Memory code mixed with agent logic	Transparent proxy — agent sees only a standard chat API
Manual context window management	Automatic session windowing and memory truncation
Stateless chat requests lose context	Automatic session history injection: recent conversation history is inserted right after the system prompt

Key Design Choices:

Principle	Description
Drop-in compatible	Same request/response format as any OpenAI-compatible chat API
Transparent proxy	Only intercepts `messages` for memory injection; all other fields passed through untouched
Tool-call safe	`tool_calls` returned by the LLM are preserved in session history, so multi-turn tool chains stay correct
Hard failure, no fallback	Missing token → 401, invalid config → fail immediately. Never guess, degrade, or default
Per-request isolation	Fresh handler instance per HTTP request, no shared mutable state
Config externalized	LLM/Embedding keys stored per-token via Admin API; `.env` holds only infrastructure
Minimal dependencies	Zero Docker, zero Redis, zero ORM, zero message queue
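The "Transparent proxy" principle can be pictured as a request-body rewrite that touches only `messages`. The sketch below is illustrative: `rewrite_messages` is a hypothetical helper, not MemoryBridge's actual code.

```python
import copy
from typing import Any


def rewrite_messages(body: dict[str, Any], new_messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Return a copy of an OpenAI-style request body with only `messages` replaced.

    Hypothetical helper: every other field (model, stream, temperature, tools, ...)
    is passed through untouched, mirroring the transparent-proxy rule.
    """
    forwarded = copy.deepcopy(body)  # never mutate the caller's request
    forwarded["messages"] = new_messages
    return forwarded
```

For example, `rewrite_messages({"model": "deepseek-chat", "temperature": 0.2, "messages": [...]}, enriched)` keeps `model` and `temperature` exactly as the agent sent them.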

Quick Start

Prerequisites

Python 3.11+
Linux x86_64 / aarch64, or macOS x86_64 / arm64

3 Steps to Run

# 1. Install
git clone https://github.com/51193/MemoryBridge && cd MemoryBridge
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
uv sync
# 2. Initialize (downloads Qdrant, creates databases, generates admin token)
uv run python src/memory_bridge/host_manager.py --init
# Output: ADMIN TOKEN (save this): <32-char hex>
# 3. Start
uv run python src/memory_bridge/host_manager.py
# Visit http://localhost:8000/health → {"status":"ok","qdrant":"connected"}
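To script the health check from step 3, a small stdlib-only sketch is enough. It assumes `/health` returns exactly the `{"status":"ok","qdrant":"connected"}` shape shown above when everything is up; `is_healthy` and `fetch_health` are hypothetical names.

```python
import json
import urllib.request
from typing import Any


def is_healthy(payload: dict[str, Any]) -> bool:
    """True only for the healthy shape shown in the Quick Start (assumed format)."""
    return payload.get("status") == "ok" and payload.get("qdrant") == "connected"


def fetch_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict[str, Any]:
    """GET {base_url}/health and decode the JSON body (requires a running server)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `is_healthy(fetch_health())` should be `True`; anything else (including a Qdrant status other than `connected`) is treated as unhealthy.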

Register a Service Token

After starting, register a Service Token using the Admin Token from step 2 above. Each token carries its own LLM and Embedding configuration:

curl -X POST http://localhost:8000/v1/admin/tokens \
  -H "Authorization: Bearer <admin_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "label": "my-llm",
    "main_llm_base_url": "https://api.deepseek.com",
    "main_llm_api_key": "sk-xxx",
    "memory_llm_base_url": "https://api.deepseek.com",
    "memory_llm_api_key": "sk-xxx",
    "memory_llm_provider": "openai",
    "embed_base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "embed_api_key": "sk-xxx",
    "embed_dims": 1024,
    "embed_provider": "openai"
  }'
# Returns: {"token": "<32-char service_token>", ...}

Make Your First Request

# Exactly like calling any LLM API; just use the service token as the API key
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer <service_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hi, my name is Ming"}],
    "stream": false
  }'

MemoryBridge will retrieve any relevant past memories, inject them into the system prompt, forward the enriched request to your LLM provider, and asynchronously store new memories from the conversation — all transparently.
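The same call can be made from Python with only the standard library. This is a sketch, not an official client: `build_chat_request` and `reply_text` are hypothetical helpers, and `reply_text` assumes the non-stream response follows the standard OpenAI chat-completion shape (`choices[0].message.content`).

```python
import json
import urllib.request
from typing import Any


def build_chat_request(
    service_token: str,
    messages: list[dict[str, Any]],
    model: str = "deepseek-chat",
    base_url: str = "http://localhost:8000",
    stream: bool = False,
) -> urllib.request.Request:
    """Build the same POST /v1/chat/completions call as the curl example above."""
    body = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {service_token}",  # service token, not admin token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def reply_text(completion: dict[str, Any]) -> str:
    """Extract the assistant reply from a standard (non-stream) chat completion."""
    return completion["choices"][0]["message"]["content"]
```

Send it with `urllib.request.urlopen(build_chat_request(token, messages))` and pass the decoded JSON body to `reply_text`.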

Multi-turn Conversation (with Streaming)

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer <service_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [
      {"role": "user", "content": "My name is Ming"},
      {"role": "assistant", "content": "Hello Ming!"},
      {"role": "user", "content": "What is my name?"}
    ],
    "stream": true
  }'
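With `"stream": true`, a client has to stitch the reply together from server-sent events. The sketch below assumes MemoryBridge relays the provider's OpenAI-style `data: {...}` chunks unchanged and ends the stream with `data: [DONE]`; `collect_stream` is a hypothetical helper.

```python
import json
from collections.abc import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate `delta.content` pieces from OpenAI-style SSE lines (assumes n=1)."""
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:  # role-only and empty deltas carry no text
                parts.append(content)
    return "".join(parts)
```

Against a live response from `urlopen`, call it as `collect_stream(line.decode("utf-8") for line in resp)`.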

Token System

MemoryBridge enforces token authentication with two types of token:

Type	Purpose	Created By
Admin Token	Register/manage Service Tokens	`--init` auto-generates
Service Token	Call `/v1/chat/completions`	Admin API

Each Service Token carries its full LLM/Embedding configuration:

Field	Required	Description
`label`	No	Human-readable label
`main_llm_base_url`	Yes	Chat LLM base URL (path `/v1/chat/completions` auto-appended)
`main_llm_api_key`	Yes	Chat LLM API key
`memory_llm_base_url`	Yes	Memory extraction LLM base URL
`memory_llm_api_key`	Yes	Memory extraction LLM API key
`memory_llm_provider`	Yes	Memory extraction LLM provider (e.g. `deepseek`, `dashscope`)
`memory_llm_body`	No	Transparent JSON (model, temperature, etc.)
`embed_base_url`	Yes	Embedding service base URL
`embed_api_key`	Yes	Embedding API key
`embed_dims`	Yes	Vector dimensions (e.g. 1024); must not be 0
`embed_provider`	Yes	Embedding provider name (e.g. `openai`, `dashscope`)
`embed_body`	No	Transparent JSON (model, etc.)
`memory_enabled`	No	Enable memory (default: true)
`memory_limit`	No	Memories per retrieval (1–20, default: 5)
`session_window_size`	No	Session history window (1–100, default: 10)
`memory_prompt`	No	Custom memory extraction prompt

Hard-failure semantics: If any required field is null, empty, or zero (for numeric fields), the token is treated as invalid and returns 401.
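Because a bad config only surfaces as a 401 on the first chat request, it can help to check it before registering. This is a hypothetical client-side pre-flight check built from the table above; the server's own validation remains authoritative, and how it reports out-of-range optional fields is not specified here.

```python
from typing import Any

# Required fields from the token table above.
REQUIRED_FIELDS = (
    "main_llm_base_url", "main_llm_api_key",
    "memory_llm_base_url", "memory_llm_api_key", "memory_llm_provider",
    "embed_base_url", "embed_api_key", "embed_dims", "embed_provider",
)
# Optional numeric fields and their documented inclusive ranges.
RANGES = {"memory_limit": (1, 20), "session_window_size": (1, 100)}


def config_problems(cfg: dict[str, Any]) -> list[str]:
    """Return reasons a token config looks invalid (empty list means it passed)."""
    problems: list[str] = []
    for name in REQUIRED_FIELDS:
        value = cfg.get(name)
        # None, "" and 0 all count as missing under the hard-failure rule.
        if value is None or value == "" or value == 0:
            problems.append(f"{name}: missing, empty, or zero")
    for name, (low, high) in RANGES.items():
        value = cfg.get(name)
        if value is not None and not low <= value <= high:
            problems.append(f"{name}: must be {low}-{high}")
    return problems
```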

API Reference

Chat Endpoint

Method	Path	Description
`POST`	`/v1/chat/completions`	OpenAI-compatible chat, with memory injection

Admin Endpoints

Method	Path	Description
`POST`	`/v1/admin/tokens`	Register a Service Token (201)
`GET`	`/v1/admin/tokens/{token}`	View config (API keys masked)
`POST`	`/v1/admin/tokens/{token}`	Update config (fields optional)
`DELETE`	`/v1/admin/tokens/{token}`	Delete a token (204)

Health

Method	Path	Description
`GET`	`/health`	Health check (bridge + Qdrant status)

Error Codes

Status	Meaning
`200`	Success
`201`	Token registered
`204`	Token deleted
`401`	Token missing, invalid, or config parse failed
`403`	Service token used on admin endpoint
`404`	Token not found
`422`	Request validation failed
`500`	Unexpected internal error
`502`	Memory retrieval failed or LLM provider unreachable
Other	Passed through from LLM provider (status code + headers + body)
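A client can turn the table above into a lookup for log messages. This sketch is hypothetical; note that if the LLM provider itself returns one of the listed codes and it is passed through, the lookup cannot tell the two sources apart.

```python
# Status codes MemoryBridge documents itself (from the table above).
BRIDGE_STATUS = {
    200: "Success",
    201: "Token registered",
    204: "Token deleted",
    401: "Token missing, invalid, or config parse failed",
    403: "Service token used on admin endpoint",
    404: "Token not found",
    422: "Request validation failed",
    500: "Unexpected internal error",
    502: "Memory retrieval failed or LLM provider unreachable",
}


def describe_status(code: int) -> str:
    """Map a status code to its documented meaning; unknown codes came from the provider."""
    return BRIDGE_STATUS.get(code, f"{code} passed through from the LLM provider")
```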

Request Flow

Incoming request
  → TokenAuthMiddleware (auth + request-id + read raw JSON body)
  → router (parse messages, validate, create Handler instance)
  → ChatRequestHandler.execute()
      ├─ SessionStore.get()         ← Read recent conversation history
      ├─ MemoryManager.search()     ← Memory retrieval (Qdrant vector search)
      ├─ ContextBuilder.build()     ← Assemble context (memory block + request)
      ├─ _inject_history()          ← Insert session history after the system prompt
      ├─ response_sender.send_*()   ← Forward enriched request to LLM
      └─ _store_memory()            ← Async store (session + memories)
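The middle three steps can be sketched as plain list operations. Everything here is hypothetical: `SessionWindow` stands in for `SessionStore`, the memory-block wording is invented (the real text comes from `templates/memory_template.md`), message `content` is assumed to be a plain string, and the window is assumed to count messages rather than turns.

```python
from collections import deque
from typing import Any

Message = dict[str, Any]


class SessionWindow:
    """Keep only the most recent `size` messages (stand-in for SessionStore)."""

    def __init__(self, size: int = 10) -> None:
        self._messages: deque[Message] = deque(maxlen=size)  # oldest entries drop off

    def append(self, message: Message) -> None:
        self._messages.append(message)

    def recent(self) -> list[Message]:
        return list(self._messages)


def add_memory_block(messages: list[Message], memories: list[str]) -> list[Message]:
    """Put retrieved memories into the system prompt (illustrative format only)."""
    if not memories:
        return list(messages)
    block = "Relevant memories:\n" + "\n".join(f"- {m}" for m in memories)
    if messages and messages[0]["role"] == "system":
        first = {**messages[0], "content": f"{messages[0]['content']}\n\n{block}"}
        return [first, *messages[1:]]
    return [{"role": "system", "content": block}, *messages]


def inject_history(messages: list[Message], history: list[Message]) -> list[Message]:
    """Insert session history right after the leading system prompt, if any."""
    split = 1 if messages and messages[0]["role"] == "system" else 0
    return [*messages[:split], *history, *messages[split:]]
```

Ordering matters here: the memory block is attached first, so the injected history lands between the (enriched) system prompt and the agent's own messages.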

Project Structure

src/memory_bridge/
├── main.py                 # FastAPI app factory + lifespan
├── host_manager.py         # Process manager (Qdrant + API subprocess)
├── host_process.py         # Subprocess lifecycle management
├── host_init.py            # First-time initialization
├── config.py               # Infrastructure config (pydantic-settings)
├── exceptions.py           # Custom exceptions
├── api/                    # Routes + middleware + request handling
│   ├── response_sender.py  # Non-stream / streaming LLM dispatch
├── core/                   # Memory / session / context / token / logging
├── providers/              # HTTP client (transparent proxy)
└── models/                 # Request / response models
templates/
├── memory_template.md      # Memory injection template (English)
└── memory_template_zh.md   # Memory injection template (Chinese)

Development

uv sync --extra dev # Install dev dependencies
uv run mypy src/ # Type check (strict mode)
uv run ruff check src/ tests/ # Lint
uv run pytest -v # Unit tests
uv run pytest --cov=src/memory_bridge --cov-fail-under=95 # Coverage gate

The committed .env template contains no secrets. It sets MEMORY_SEARCH_TIMEOUT=30, which keeps memory searches from hanging indefinitely, and MEMORY_STORE_TIMEOUT=120, which bounds background memory-store operations that involve LLM fact extraction.
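One common way to enforce such a limit in asyncio is `asyncio.wait_for`. The sketch below assumes the values are seconds (the variable names do not state a unit) and that a timed-out search should fail hard rather than fall back; `bounded` and `MemoryRetrievalError` are hypothetical names, not MemoryBridge's code.

```python
import asyncio
from collections.abc import Awaitable
from typing import TypeVar

T = TypeVar("T")


class MemoryRetrievalError(Exception):
    """Hypothetical error; the error table maps failed memory retrieval to HTTP 502."""


async def bounded(op: Awaitable[T], timeout_s: float) -> T:
    """Await `op`, but fail hard instead of waiting longer than `timeout_s` seconds."""
    try:
        return await asyncio.wait_for(op, timeout=timeout_s)
    except asyncio.TimeoutError as exc:  # the pending operation is cancelled
        raise MemoryRetrievalError(f"memory operation exceeded {timeout_s}s") from exc
```

A search would then be awaited as `await bounded(search(...), 30)`, and a background store as `await bounded(store(...), 120)`.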

Test structure mirrors source:

tests/
├── api/ ← src/memory_bridge/api/
├── core/ ← src/memory_bridge/core/
├── providers/ ← src/memory_bridge/providers/
├── architecture/ # Architecture guard tests (boundary checks)
└── integration/ # Integration smoke tests

Install from Release

Download the single-file .pyz from GitHub Releases:

mkdir -p /opt/memorybridge && cd /opt/memorybridge
wget https://github.com/51193/MemoryBridge/releases/latest/download/memorybridge.pyz
python3 memorybridge.pyz --init # First-time setup
python3 memorybridge.pyz # Start

License

Apache 2.0 — see also NOTICE for bundled open source attributions, and docs/LICENSES.md for a full third-party license summary.
