🚀 RAG API

Multi-tenant Multimodal Document Intelligent Retrieval System

Enterprise-grade RAG service built on RAG-Anything and LightRAG

Features • Quick Start • Architecture • API Documentation • Deployment

📖 Introduction

RAG API is an enterprise-grade Retrieval-Augmented Generation (RAG) service that combines the powerful document parsing capabilities of RAG-Anything with the efficient knowledge graph retrieval technology of LightRAG, providing intelligent Q&A capabilities for your documents.

🎯 Key Highlights

🏢 Multi-tenant Isolation - Complete tenant data isolation for enterprise multi-tenant scenarios
🎨 Multimodal Parsing - Support for PDF, Word, images, and more, covering OCR, tables, and formulas
⚡ High-performance Retrieval - Knowledge graph-based hybrid retrieval with 6-15 second query response
🔄 Flexible Deployment - Support for production and development modes with one-click switching
📦 Ready to Use - One-click Docker deployment, service starts in 3 minutes
🎛️ Multiple Parsing Engines - DeepSeek-OCR (Remote API) + MinerU (Local/Remote API) + Docling (Fast)
🎨 RAG-Anything VLM Enhancement - Three modes (off/selective/full) for deep chart understanding
💾 Task Persistence - Redis storage support, tasks recoverable after container restart/instance rebuild

✨ Features

📄 Document Processing

✅ Multiple Format Support
- PDF, Word, Excel, PPT
- PNG, JPG, WebP images
- TXT, Markdown text
✅ Intelligent Parsing
- Plain text (.txt, .md) → Direct insertion (ultra-fast ~1s, skip parser)
- OCR text recognition
- Structured table extraction
- Mathematical formula recognition
- Layout analysis
✅ RAG-Anything VLM Enhancement 🆕
- off - Markdown only (fastest)
- selective - Selective processing of important charts
- full - Complete context enhancement processing
- Smart filtering: prioritizes charts that have titles, are large, or appear on the first page
- ⚠️ Only supported in remote MinerU mode; local mode uses RAG-Anything's native methods
✅ Batch Processing
- Up to 100 files per batch
- Async task queue
- Real-time progress tracking

🔍 Intelligent Retrieval

✅ Multi-mode Query
- naive - Vector retrieval (fastest)
- local - Local graph
- global - Global graph
- hybrid - Hybrid retrieval
- mix - Full retrieval (most accurate)
✅ Knowledge Graph
- Automatic entity extraction
- Relationship reasoning
- Semantic understanding
- Context enhancement
✅ External Storage
- DragonflyDB (KV storage + task storage)
- Qdrant (vector storage)
- Memgraph (graph database)
- Task persistence (Redis mode)

🏗️ Architecture

System Architecture Diagram

graph TB
 subgraph "Client Layer"
 Client[Client Application]
 WebUI[Web Interface]
 end
 
 subgraph "API Gateway Layer"
 FastAPI[FastAPI Service]
 Auth[Tenant Authentication]
 end
 
 subgraph "Business Logic Layer"
 TenantMgr[Tenant Manager]
 TaskQueue[Task Queue]
 
 subgraph "Document Processing"
 DeepSeekOCR[DeepSeek-OCR<br/>Fast OCR 80% cases]
 MinerU[MinerU Parser<br/>Complex multimodal]
 Docling[Docling Parser<br/>Fast lightweight]
 FileRouter[Smart Router<br/>Complexity scoring]
 end
 
 subgraph "RAG Engine"
 LightRAG[LightRAG Instance Pool<br/>LRU Cache 50]
 KG[Knowledge Graph Engine]
 Vector[Vector Retrieval Engine]
 end
 end
 
 subgraph "Storage Layer"
 DragonflyDB[(DragonflyDB<br/>KV Storage)]
 Qdrant[(Qdrant<br/>Vector Database)]
 Memgraph[(Memgraph<br/>Graph Database)]
 Local[(Local Files<br/>Temp Storage)]
 end
 
 subgraph "External Services"
 LLM[LLM<br/>Entity Extraction/Generation]
 Embedding[Embedding<br/>Vectorization]
 Rerank[Rerank<br/>Reranking]
 end
 
 Client --> FastAPI
 WebUI --> FastAPI
 FastAPI --> Auth
 Auth --> TenantMgr
 TenantMgr --> TaskQueue
 TenantMgr --> LightRAG
 
 TaskQueue --> FileRouter
 FileRouter --> DeepSeekOCR
 FileRouter --> MinerU
 FileRouter --> Docling
 DeepSeekOCR --> LightRAG
 MinerU --> LightRAG
 Docling --> LightRAG
 
 LightRAG --> KG
 LightRAG --> Vector
 
 KG --> DragonflyDB
 KG --> Memgraph
 Vector --> Qdrant
 LightRAG --> Local
 
 LightRAG --> LLM
 LightRAG --> Embedding
 Vector --> Rerank
 
 style FastAPI fill:#00C7B7
 style LightRAG fill:#FF6B6B
 style DeepSeekOCR fill:#5DADE2
 style MinerU fill:#4ECDC4
 style Docling fill:#95E1D3
 style TenantMgr fill:#F38181

Multi-tenant Architecture

graph TB
 subgraph "Tenant A"
 A_Config[Tenant A Config<br/>Independent API Key]
 A_Instance[LightRAG Instance A<br/>Dedicated LLM/Embedding]
 A_Data[(Tenant A Data<br/>Fully Isolated)]
 A_Config --> A_Instance
 A_Instance --> A_Data
 end
 subgraph "Tenant B"
 B_Config[Tenant B Config<br/>Independent API Key]
 B_Instance[LightRAG Instance B<br/>Dedicated LLM/Embedding]
 B_Data[(Tenant B Data<br/>Fully Isolated)]
 B_Config --> B_Instance
 B_Instance --> B_Data
 end
 subgraph "Tenant C"
 C_Config[Using Global Config]
 C_Instance[LightRAG Instance C<br/>Shared LLM/Embedding]
 C_Data[(Tenant C Data<br/>Fully Isolated)]
 C_Config --> C_Instance
 C_Instance --> C_Data
 end
 Pool[Instance Pool Manager<br/>LRU Cache + Config Isolation]
 Global[Global Config<br/>Default API Key]
 Pool --> A_Instance
 Pool --> B_Instance
 Pool --> C_Instance
 C_Config -.fallback.-> Global
 style Pool fill:#F38181
 style Global fill:#95E1D3
 style A_Config fill:#FFD93D
 style B_Config fill:#FFD93D
 style C_Config fill:#E8E8E8

Core Technology Stack

🔧 Frameworks & Runtime

FastAPI 0.115+
Python 3.11+
Uvicorn
Docker & Docker Compose

🧠 AI & RAG

LightRAG 1.4.9.4
RAG-Anything
MinerU (PDF-Extract-Kit)
Docling

💾 Storage & Database

DragonflyDB (Redis compatible)
Qdrant (Vector Database)
Memgraph (Graph Database)
Local filesystem

🚀 Quick Start

Option 1: One-click Deployment (Recommended)

Suitable for production and testing environments:

# 1. Clone the project
git clone https://github.com/BukeLy/rag-api.git
cd rag-api
# 2. Configure environment variables
cp env.example .env
nano .env # Fill in your API keys
# 3. Run deployment script
chmod +x deploy.sh
./deploy.sh
# Select deployment mode:
# 1) Production Mode - Standard container deployment
# 2) Development Mode - Code hot-reload
# 4. Verify service
curl http://localhost:8000/

Access Swagger Documentation: http://localhost:8000/docs

Option 2: Docker Compose

Production Mode

# Configure environment variables
cp env.example .env
nano .env
# Start services
docker compose -f docker-compose.yml up -d
# View logs
docker compose -f docker-compose.yml logs -f

Development Mode (Code Hot-reload)

# Start development environment
docker compose -f docker-compose.dev.yml up -d
# Or use quick script
./scripts/dev.sh
# Code changes will auto-reload without restart

Option 3: Local Development

# Install uv (Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies
uv sync
# Configure environment variables
cp env.example .env
nano .env
# Start services
uv run uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Environment Variable Configuration

Minimum configuration (required):

# LLM Configuration (Function-oriented naming)
LLM_API_KEY=your_llm_api_key
LLM_BASE_URL=https://ark.cn-beijing.volces.com/api/v3
LLM_MODEL=ep-xxx-xxx
# LLM_REQUESTS_PER_MINUTE=800 # Rate limit (optional)
# LLM_TOKENS_PER_MINUTE=40000 # Rate limit (optional)
# LLM_MAX_ASYNC=8 # [Optional, expert mode] Manual concurrency control
# # Auto-calculated when unset: min(RPM, TPM/3500) = 11
# Embedding Configuration (Function-oriented naming)
EMBEDDING_API_KEY=your_embedding_api_key
EMBEDDING_BASE_URL=https://api.siliconflow.cn/v1
EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B
EMBEDDING_DIM=1024
# EMBEDDING_MAX_ASYNC=32 # [Optional, expert mode] Auto-calculated when unset: 800
# MinerU Mode (Remote recommended)
MINERU_MODE=remote
MINERU_API_TOKEN=your_token
MINERU_HTTP_TIMEOUT=60 # MinerU download timeout (seconds, default 60)
FILE_SERVICE_BASE_URL=http://your-ip:8000
# VLM Chart Enhancement Configuration 🆕
# ⚠️ Note: Only effective in MINERU_MODE=remote
RAG_VLM_MODE=off # off / selective / full
RAG_IMPORTANCE_THRESHOLD=0.5 # Importance threshold (selective mode)
RAG_CONTEXT_WINDOW=2 # Context window (full mode)
RAG_CONTEXT_MODE=page # page / chunk
RAG_MAX_CONTEXT_TOKENS=3000 # Max context tokens
# Task Storage Configuration 🆕
TASK_STORE_STORAGE=redis # memory / redis (production recommends redis)
# Document Insert Verification Configuration 🆕
DOC_INSERT_VERIFICATION_TIMEOUT=300 # Verification timeout (seconds, default 5 minutes)
DOC_INSERT_VERIFICATION_POLL_INTERVAL=0.5 # Poll interval (seconds, default 500ms)
# Model Call Timeout Configuration 🆕
MODEL_CALL_TIMEOUT=90 # Model call max timeout (seconds, default 90)

⚡ Auto Concurrency Calculation:

LLM: When LLM_MAX_ASYNC is unset, auto-calculated as min(RPM, TPM/3500); with the sample limits above (RPM 800, TPM 40000) this gives min(800, 11) = 11
Embedding: When EMBEDDING_MAX_ASYNC is unset, auto-calculated as min(RPM, TPM/500) ≈ 800
Rerank: When RERANK_MAX_ASYNC is unset, auto-calculated as min(RPM, TPM/500) ≈ 800

✅ Recommended: Leave *_MAX_ASYNC unset and let the system auto-calculate concurrency from the configured RPM/TPM limits, which is intended to prevent 429 errors
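
The auto-calculation rule can be checked by hand. The following is a minimal sketch of the arithmetic only, not the service's actual code: the function name `auto_max_async` and the lower bound of 1 are assumptions, while the divisor 3500 and the rate limits come from the LLM settings shown above.

```python
def auto_max_async(rpm: int, tpm: int, tokens_per_request: int) -> int:
    """Return min(RPM, TPM / tokens_per_request), rounded down, never below 1."""
    return max(1, min(rpm, tpm // tokens_per_request))


# LLM example using the rate limits from the sample .env above.
llm_concurrency = auto_max_async(rpm=800, tpm=40_000, tokens_per_request=3500)
print(llm_concurrency)  # 11
```

The token budget is usually the binding constraint: 40000 tokens per minute divided by an estimated 3500 tokens per request allows about 11 requests in flight, far below the 800 requests-per-minute limit.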

See env.example for complete configuration.

📚 API Documentation

Core Endpoints

1️⃣ Upload Document

# Single file upload (default mode)
curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc1" \
 -F "file=@document.pdf" \
 -F "parser=auto"
# VLM chart enhancement mode 🆕
# off: Markdown only (fastest, default)
curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc2&vlm_mode=off" \
 -F "file=@document.pdf"
# selective: Selective processing of important charts (balance performance and quality)
curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc3&vlm_mode=selective" \
 -F "file=@document.pdf"
# full: Complete RAG-Anything processing (highest quality, context enhancement enabled)
curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc4&vlm_mode=full" \
 -F "file=@document.pdf"
# Response
{
 "task_id": "task-xxx-xxx",
 "doc_id": "doc1",
 "filename": "document.pdf",
 "vlm_mode": "off",
 "status": "pending"
}

2️⃣ Batch Upload

curl -X POST "http://localhost:8000/batch?tenant_id=your_tenant" \
 -F "files=@doc1.pdf" \
 -F "files=@doc2.docx" \
 -F "files=@image.png"
# Response
{
 "batch_id": "batch-xxx-xxx",
 "total_files": 3,
 "accepted_files": 3,
 "tasks": [...]
}
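
Since a batch accepts at most 100 files, a client with a larger set has to split it before calling `/batch`. A minimal sketch of that split follows; the helper name `chunk_files` is hypothetical and not part of the API.

```python
from typing import Iterator, Sequence

MAX_FILES_PER_BATCH = 100  # limit stated in the Batch Processing feature list


def chunk_files(paths: Sequence[str], size: int = MAX_FILES_PER_BATCH) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


paths = [f"doc{i}.pdf" for i in range(250)]
print([len(batch) for batch in chunk_files(paths)])  # [100, 100, 50]
```

Each yielded list can then be sent as one `/batch` request.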

3️⃣ Intelligent Query (Query API v2.0)

New Advanced Features:

✨ Conversation History: Support for multi-turn conversation context
✨ Custom Prompts: Customize response style
✨ Response Format Control: paragraph/list/json
✨ Keyword Precision Retrieval: hl_keywords/ll_keywords
✨ Streaming Output: View the response in real time as it is generated

# Basic query
curl -X POST "http://localhost:8000/query?tenant_id=your_tenant" \
 -H "Content-Type: application/json" \
 -d '{
 "query": "What are the core viewpoints in the document?",
 "mode": "hybrid"
 }'
# Advanced query (multi-turn dialogue + custom prompt)
curl -X POST "http://localhost:8000/query?tenant_id=your_tenant" \
 -H "Content-Type: application/json" \
 -d '{
 "query": "Can you elaborate on the second point?",
 "mode": "hybrid",
 "conversation_history": [
 {"role": "user", "content": "What are the key points?"},
 {"role": "assistant", "content": "There are mainly three points..."}
 ],
 "user_prompt": "Please answer in professional academic language",
 "response_type": "list"
 }'
# Streaming query (SSE)
curl -N -X POST "http://localhost:8000/query/stream?tenant_id=your_tenant" \
 -H "Content-Type: application/json" \
 -d '{
 "query": "What are the core viewpoints in the document?",
 "mode": "hybrid"
 }'
# Response (real-time streaming output)
data: {"chunk": "Based on", "done": false}
data: {"chunk": "document content", "done": false}
data: {"done": true}
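
On the client side, the streamed lines can be reassembled into text. The sketch below assumes the event shape shown in the sample response (`chunk` and `done` fields); the function name `iter_stream_chunks` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield text chunks from `data: {...}` lines until a `done: true` event."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        event = json.loads(line[len("data:"):])
        if event.get("done"):
            break
        yield event.get("chunk", "")


sample = [
    'data: {"chunk": "Based on", "done": false}',
    'data: {"chunk": "document content", "done": false}',
    'data: {"done": true}',
]
print(" ".join(iter_stream_chunks(sample)))  # Based on document content
```

With `requests`, the same function could be fed from `response.iter_lines(decode_unicode=True)` on a request made with `stream=True`.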

4️⃣ Task Status Query

curl "http://localhost:8000/task/task-xxx-xxx?tenant_id=your_tenant"
# Response
{
 "task_id": "task-xxx-xxx",
 "status": "completed",
 "progress": 100,
 "result": {...}
}
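
Because processing is asynchronous, clients typically poll this endpoint until the task finishes. The following is a minimal polling sketch with a timeout; `wait_for_task` and its parameters are hypothetical, and the status lookup is passed in as a callable so the example runs without a server.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> str:
    """Poll until the task reports `completed` or `failed`, or raise on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout} seconds")
        time.sleep(interval)


# Stand-in for a real GET /task/{task_id} call, so the sketch runs offline.
fake_statuses = iter(["pending", "processing", "completed"])
print(wait_for_task(lambda: next(fake_statuses), interval=0.0))  # completed
```

In real use, `fetch_status` would request `/task/{task_id}?tenant_id=...` and return the `status` field of the JSON response.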

5️⃣ Tenant Management

# Get tenant statistics
curl "http://localhost:8000/tenants/stats?tenant_id=your_tenant"
# Clear tenant cache
curl -X DELETE "http://localhost:8000/tenants/cache?tenant_id=your_tenant"
# View instance pool status (admin)
curl "http://localhost:8000/tenants/pool/stats"

VLM Mode Comparison 🆕

Mode	Speed	Quality	Resource Usage	Use Case
`off`	⚡⚡⚡⚡⚡	⭐⭐⭐	Very Low	Plain text documents, fast batch processing
`selective`	⚡⚡⚡⚡	⭐⭐⭐⭐	Low	Documents with key charts (recommended)
`full`	⚡⚡	⭐⭐⭐⭐⭐	High	Chart-intensive research reports, papers

Processing Time Estimate (20-page PDF example):

off: ~10 seconds (Markdown only)
selective: ~30 seconds (5-10 important charts)
full: ~120 seconds (complete context processing)

Query Mode Comparison

Mode	Speed	Accuracy	Use Case
`naive`	⚡⚡⚡⚡⚡	⭐⭐⭐	Simple Q&A, fast retrieval
`local`	⚡⚡⚡⚡	⭐⭐⭐⭐	Local entity relationship queries
`global`	⚡⚡⚡	⭐⭐⭐⭐	Global knowledge graph reasoning
`hybrid`	⚡⚡⚡	⭐⭐⭐⭐⭐	Hybrid retrieval (recommended)
`mix`	⚡⚡	⭐⭐⭐⭐⭐	Complex questions, deep analysis

Query API v2.0 Advanced Parameters

Parameter	Type	Description	Example
`conversation_history`	List[Dict]	Multi-turn conversation context	`[{"role": "user", "content": "..."}]`
`user_prompt`	str	Custom prompt	"Please answer in professional academic language"
`response_type`	str	Response format	"paragraph", "list", "json"
`hl_keywords`	List[str]	High-level keywords (broad themes) to prioritize in retrieval	`["artificial intelligence", "machine learning"]`
`ll_keywords`	List[str]	Low-level keywords (specific details) to refine retrieval focus	`["application", "case study"]`
`only_need_context`	bool	Return context only (debug)	`true`
`max_entity_tokens`	int	Entity token limit	`6000`
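
The advanced parameters are all optional, so a client can build the request body incrementally. A minimal sketch follows, assuming the field names from the table above; `build_query_payload` is a hypothetical helper, and dropping `None` values is a client-side choice rather than an API requirement.

```python
from typing import Any, Optional

QUERY_MODES = {"naive", "local", "global", "hybrid", "mix"}


def build_query_payload(
    query: str,
    mode: str = "hybrid",
    conversation_history: Optional[list[dict[str, str]]] = None,
    **advanced: Any,
) -> dict[str, Any]:
    """Build a /query request body, dropping optional fields left as None."""
    if mode not in QUERY_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    payload: dict[str, Any] = {"query": query, "mode": mode}
    if conversation_history:
        payload["conversation_history"] = conversation_history
    payload.update({key: value for key, value in advanced.items() if value is not None})
    return payload


payload = build_query_payload(
    "Can you elaborate on the second point?",
    response_type="list",
    hl_keywords=["artificial intelligence"],
    only_need_context=None,  # omitted from the body
)
print(sorted(payload))  # ['hl_keywords', 'mode', 'query', 'response_type']
```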

Complete API documentation: http://localhost:8000/docs

🎯 Usage Examples

Python SDK

import requests
# Configuration
BASE_URL = "http://localhost:8000"
TENANT_ID = "your_tenant"
# Upload document (processing is asynchronous; the response carries a task ID)
with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/insert",
        params={"tenant_id": TENANT_ID, "doc_id": "doc1"},
        files={"file": f},
    )
response.raise_for_status()  # Fail early on 4xx/5xx responses
task_id = response.json()["task_id"]
print(f"Task ID: {task_id}")
# Query (run this once the upload task has reached "completed")
response = requests.post(
    f"{BASE_URL}/query",
    params={"tenant_id": TENANT_ID},
    json={
        "query": "What is the main content of the document?",
        "mode": "hybrid",
        "top_k": 10,
    },
)
response.raise_for_status()
result = response.json()
print(f"Answer: {result['answer']}")

Complete cURL Example

# 1. Upload PDF document
TASK_ID=$(curl -X POST "http://localhost:8000/insert?tenant_id=demo&doc_id=report" \
 -F "file=@report.pdf" | jq -r '.task_id')
echo "Task ID: $TASK_ID"
# 2. Wait for processing completion
while true; do
 STATUS=$(curl -s "http://localhost:8000/task/$TASK_ID?tenant_id=demo" | jq -r '.status')
 echo "Status: $STATUS"
 if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ]; then
 break
 fi
 sleep 2
done
# 3. Query document content
curl -X POST "http://localhost:8000/query?tenant_id=demo" \
 -H "Content-Type: application/json" \
 -d '{
 "query": "What are the main conclusions of this report?",
 "mode": "hybrid"
 }' | jq '.answer'

🛠️ Deployment

System Requirements

Minimum Configuration:

CPU: 2 cores
RAM: 4GB
Disk: 40GB SSD
OS: Ubuntu 20.04+ / Debian 11+ / CentOS 8+

Recommended Configuration (Production):

CPU: 4 cores
RAM: 8GB
Disk: 100GB SSD
OS: Ubuntu 22.04 LTS

Server Deployment

Quick Deployment on Aliyun/Tencent Cloud

# SSH login to server
ssh root@your-server-ip
# Clone project
git clone https://github.com/BukeLy/rag-api.git
cd rag-api
# Run one-click deployment script
chmod +x deploy.sh
./deploy.sh
# The script will automatically:
# 1. Install Docker and Docker Compose
# 2. Configure environment variables
# 3. Optimize system parameters
# 4. Start services
# 5. Verify health status

External Storage Configuration

Supports DragonflyDB + Qdrant + Memgraph external storage (enabled by default):

# Configure in .env
USE_EXTERNAL_STORAGE=true
# DragonflyDB configuration (KV Storage)
KV_STORAGE=RedisKVStorage
REDIS_URI=redis://dragonflydb:6379/0
# Qdrant configuration (vector storage)
VECTOR_STORAGE=QdrantVectorDBStorage
QDRANT_URL=http://qdrant:6333
# Memgraph configuration (graph storage)
GRAPH_STORAGE=MemgraphStorage
MEMGRAPH_URI=bolt://memgraph:7687
MEMGRAPH_USERNAME=
MEMGRAPH_PASSWORD=

See External Storage Deployment Documentation.

Docker Compose Configuration

The project provides two configuration files:

File	Purpose	Features
`docker-compose.yml`	Production mode	Code packaged in image, optimal performance
`docker-compose.dev.yml`	Development mode	Code mounted externally, supports hot-reload

Select configuration file:

# Production mode
docker compose -f docker-compose.yml up -d
# Development mode
docker compose -f docker-compose.dev.yml up -d

Performance Optimization

Tuning Parameters

Configure in .env:

# ⚡ Concurrency Control (Recommended: use auto-calculation)
# LLM_MAX_ASYNC=8 # [Expert mode] Manually specify LLM concurrency
# # Auto-calculated when unset: min(RPM, TPM/3500) ≈ 11
# EMBEDDING_MAX_ASYNC=32 # [Expert mode] Manually specify Embedding concurrency
# # Auto-calculated when unset: min(RPM, TPM/500) ≈ 800
# RERANK_MAX_ASYNC=16 # [Expert mode] Manually specify Rerank concurrency
# # Auto-calculated when unset: min(RPM, TPM/500) ≈ 800
# Retrieval count (affects query quality and speed)
TOP_K=20 # Entity/relationship retrieval count
CHUNK_TOP_K=10 # Text chunk retrieval count
# Document processing concurrency
DOCUMENT_PROCESSING_CONCURRENCY=10 # Remote mode can be set high, local mode set to 1

🎯 Concurrency Configuration Recommendations:

Recommended: Leave *_MAX_ASYNC unset and let the system auto-calculate based on TPM/RPM
Expert mode: If you need manual control, set LLM_MAX_ASYNC and the related parameters explicitly
Advantage: Auto-calculation keeps concurrency within the configured limits, avoiding 429 errors (TPM limit reached)

Mode Selection

MinerU Remote Mode (Recommended): High concurrency, resource-efficient
MinerU Local Mode: Requires GPU, high memory usage
Docling Mode: Fast and lightweight, suitable for simple documents

🏢 Multi-tenant Usage

Tenant Isolation

Each tenant has:

✅ Independent LightRAG instance
✅ Isolated data storage space
✅ Independent vector index
✅ Dedicated knowledge graph
✅ Independent service configuration (LLM, Embedding, Rerank, DeepSeek-OCR, MinerU) 🆕

Tenant Configuration Management 🆕

Each tenant can independently configure 5 services with hot-reload support:

# 1️⃣ Configure independent DeepSeek-OCR API key for Tenant A
curl -X PUT "http://localhost:8000/tenants/tenant_a/config" \
 -H "Content-Type: application/json" \
 -d '{
 "ds_ocr_config": {
 "api_key": "sk-tenant-a-ds-ocr-key",
 "base_url": "https://api.siliconflow.cn/v1",
 "model": "deepseek-ai/DeepSeek-OCR",
 "timeout": 90
 }
 }'
# 2️⃣ Configure independent MinerU API token for Tenant B
curl -X PUT "http://localhost:8000/tenants/tenant_b/config" \
 -H "Content-Type: application/json" \
 -d '{
 "mineru_config": {
 "api_token": "tenant-b-mineru-token",
 "base_url": "https://mineru.net",
 "model_version": "vlm"
 }
 }'
# 3️⃣ Configure multiple services simultaneously (LLM + Embedding + DeepSeek-OCR)
curl -X PUT "http://localhost:8000/tenants/tenant_c/config" \
 -H "Content-Type: application/json" \
 -d '{
 "llm_config": {
 "api_key": "sk-tenant-c-llm-key",
 "model": "gpt-4"
 },
 "embedding_config": {
 "api_key": "sk-tenant-c-embedding-key",
 "model": "Qwen/Qwen3-Embedding-0.6B",
 "dim": 1024
 },
 "ds_ocr_config": {
 "api_key": "sk-tenant-c-ds-ocr-key"
 }
 }'
# 4️⃣ Query tenant configuration (API key auto-masked)
curl "http://localhost:8000/tenants/tenant_a/config"
# Response example
{
 "tenant_id": "tenant_a",
 "ds_ocr_config": {
 "api_key": "sk-***-key", // Auto-masked
 "timeout": 90
 },
 "merged_config": {
 "llm": {...}, // Using Global Config
 "embedding": {...}, // Using Global Config
 "rerank": {...}, // Using Global Config
 "ds_ocr": {...}, // Using tenant config
 "mineru": {...} // Using Global Config
 }
}
# 5️⃣ Refresh config cache (config hot-reload)
curl -X POST "http://localhost:8000/tenants/tenant_a/config/refresh"
# 6️⃣ Delete tenant config (restore to global config)
curl -X DELETE "http://localhost:8000/tenants/tenant_a/config"

Supported Configuration Items:

Service	Config Field	Description
LLM	`llm_config`	Model, API key, base_url, etc.
Embedding	`embedding_config`	Model, API key, dimension, etc.
Rerank	`rerank_config`	Model, API key, etc.
DeepSeek-OCR	`ds_ocr_config`	API key, timeout, mode, etc.
MinerU	`mineru_config`	API token, version, timeout, etc.

Configuration Priority: Tenant config > Global config
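
The priority rule can be pictured as overlaying tenant values on the global defaults. The sketch below is an illustration, not the service's implementation: `merge_config` is a hypothetical name, and per-field fallback inside each service is an assumption suggested by the partial `ds_ocr_config` example above.

```python
from typing import Any, Optional

Config = dict[str, dict[str, Any]]


def merge_config(global_config: Config, tenant_config: Optional[Config]) -> Config:
    """Overlay tenant values on global defaults, service by service."""
    merged = {service: dict(values) for service, values in global_config.items()}
    for service, overrides in (tenant_config or {}).items():
        merged.setdefault(service, {}).update(overrides)
    return merged


global_config = {
    "llm": {"api_key": "global-llm-key", "model": "ep-xxx-xxx"},
    "ds_ocr": {"api_key": "global-ocr-key", "timeout": 60},
}
tenant_config = {"ds_ocr": {"api_key": "sk-tenant-a-ds-ocr-key", "timeout": 90}}

merged = merge_config(global_config, tenant_config)
print(merged["ds_ocr"]["timeout"])  # 90
print(merged["llm"]["model"])       # ep-xxx-xxx
```

A tenant with no stored config (`None`) simply receives a copy of the global settings, which matches the "restore to global config" behavior of the DELETE call.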

Use Cases:

🔐 Multi-tenant SaaS: Each tenant uses their own API key
💰 Pay-per-use: Track tenant usage through independent API keys
🎯 Differentiated Services: Different tenants use different models (GPT-4 vs GPT-3.5)
🧪 A/B Testing: Compare different models/parameters

Usage

Document and query endpoints take a tenant_id query parameter:

# Tenant A upload document
curl -X POST "http://localhost:8000/insert?tenant_id=tenant_a&doc_id=doc1" \
 -F "file=@doc.pdf"
# Tenant B upload document (fully isolated)
curl -X POST "http://localhost:8000/insert?tenant_id=tenant_b&doc_id=doc1" \
 -F "file=@doc.pdf"
# Tenant A query (can only query own documents)
curl -X POST "http://localhost:8000/query?tenant_id=tenant_a" \
 -H "Content-Type: application/json" \
 -d '{"query": "document content", "mode": "hybrid"}'

Instance Pool Management

Capacity: Cache up to 50 tenant instances
Strategy: LRU (Least Recently Used) automatic cleanup
Config Isolation: Each tenant can use independent LLM, Embedding, parser configuration
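
LRU eviction of this kind is commonly built on an ordered mapping. The following is a minimal sketch of the idea, not the project's actual pool: the class name `LRUInstancePool` and its methods are hypothetical, and plain strings stand in for LightRAG instances.

```python
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class LRUInstancePool(Generic[T]):
    """Keep at most `capacity` instances, evicting the least recently used."""

    def __init__(self, factory: Callable[[str], T], capacity: int = 50) -> None:
        self._factory = factory
        self._capacity = capacity
        self._instances: OrderedDict[str, T] = OrderedDict()

    def get(self, tenant_id: str) -> T:
        if tenant_id in self._instances:
            self._instances.move_to_end(tenant_id)  # mark as most recently used
            return self._instances[tenant_id]
        instance = self._factory(tenant_id)
        self._instances[tenant_id] = instance
        if len(self._instances) > self._capacity:
            self._instances.popitem(last=False)  # drop the oldest entry
        return instance

    def cached_tenants(self) -> list[str]:
        return list(self._instances)


pool = LRUInstancePool(factory=lambda tenant_id: f"instance-for-{tenant_id}", capacity=2)
pool.get("tenant_a")
pool.get("tenant_b")
pool.get("tenant_a")  # tenant_a becomes most recently used
pool.get("tenant_c")  # evicts tenant_b
print(pool.cached_tenants())  # ['tenant_a', 'tenant_c']
```

An evicted tenant loses only its cached instance; its stored data remains, and the instance is rebuilt on the next request.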

📊 Monitoring & Maintenance

Common Commands

# View service status
docker compose ps
# View real-time logs
docker compose logs -f
# Restart services
docker compose restart
# Stop services
docker compose down
# View resource usage
docker stats
# Clean Docker resources
docker system prune -f

Maintenance Scripts

# Monitor service health
./scripts/monitor.sh
# Backup data
./scripts/backup.sh
# Update services
./scripts/update.sh
# Performance testing
./scripts/test_concurrent_perf.sh
# Performance monitoring
./scripts/monitor_performance.sh

Health Checks

# Complete health check (recommended)
./scripts/health_check.sh
./scripts/health_check.sh --verbose # verbose output
# API health check
curl http://localhost:8000/
# Tenant statistics
curl "http://localhost:8000/tenants/stats?tenant_id=your_tenant"
# Instance pool status
curl "http://localhost:8000/tenants/pool/stats"

🗂️ Project Structure

rag-api/
├── main.py # FastAPI application entry
├── api/ # API route modules
│ ├── __init__.py # Route aggregation
│ ├── insert.py # Document upload (single/batch)
│ ├── query.py # Intelligent query
│ ├── task.py # Task status query
│ ├── tenant.py # Tenant management
│ ├── files.py # File service
│ ├── models.py # Pydantic models
│ └── task_store.py # Task storage
├── src/ # Core business logic
│ ├── rag.py # LightRAG lifecycle management
│ ├── multi_tenant.py # Multi-tenant instance manager
│ ├── tenant_deps.py # Tenant dependency injection
│ ├── logger.py # Unified logging
│ ├── metrics.py # Performance metrics
│ ├── file_url_service.py # Temporary file service
│ ├── mineru_client.py # MinerU client
│ └── mineru_result_processor.py # Result processing
├── docs/ # Documentation
│ ├── ARCHITECTURE.md # Architecture design documentation
│ ├── USAGE.md # Detailed usage guide
│ ├── DEPLOY_MODES.md # Deployment mode description
│ ├── PR_WORKFLOW.md # PR workflow
│ └── ...
├── scripts/ # Maintenance scripts
│ ├── dev.sh # Development mode quick start
│ ├── monitor.sh # Service monitoring
│ ├── backup.sh # Data backup
│ ├── update.sh # Service update
│ └── ...
├── deploy.sh # One-click deployment script
├── docker-compose.yml # Production mode configuration
├── docker-compose.dev.yml # Development mode configuration
├── Dockerfile # Production image
├── Dockerfile.dev # Development image
├── pyproject.toml # Project dependencies
├── uv.lock # Dependency lock
├── env.example # Environment variable template
├── CLAUDE.md # Claude AI guide
└── README.md # This documentation

🐛 Troubleshooting

Common Issues

Q1: What to do if service fails to start?

# View detailed logs
docker compose logs
# Check port usage
netstat -tulpn | grep 8000
# Check Docker status
docker ps -a

Q2: multimodal_processed error?

Note: This issue has been fixed in LightRAG 1.4.9.4+. If you encounter this error, your version is outdated.

Solution:

# Option 1: Upgrade to latest version (recommended)
# Modify LightRAG version in pyproject.toml
# lightrag = "^1.4.9.4"
# Rebuild image
docker compose down
docker compose up -d --build
# Option 2: Clean old data (temporary solution)
rm -rf ./rag_local_storage
docker compose restart

Q3: File upload returns 400 error?

Check:

Whether the file format is supported (PDF, DOCX, PNG, JPG, etc.)
Whether the file size exceeds 100MB
Whether the file is empty

# View supported formats
curl http://localhost:8000/docs
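
The same three checks can be run on the client before uploading. This is a minimal sketch under stated assumptions: the extension set is inferred from the formats listed in this README and may not match the server's list, 100MB is taken as 100 × 1024 × 1024 bytes, and `precheck_upload` is a hypothetical helper.

```python
import os
import tempfile

MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # assumed meaning of the 100MB limit
# Assumed extensions for the formats this README lists; the server's list may differ.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".xlsx", ".pptx",
    ".png", ".jpg", ".jpeg", ".webp",
    ".txt", ".md",
}


def precheck_upload(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file looks fine."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    size = os.path.getsize(path)
    if size == 0:
        problems.append("file is empty")
    elif size > MAX_UPLOAD_BYTES:
        problems.append(f"file is larger than {MAX_UPLOAD_BYTES} bytes")
    return problems


# Demonstrate on a throwaway empty file with an unsupported extension.
with tempfile.NamedTemporaryFile(suffix=".exe", delete=False) as tmp:
    empty_path = tmp.name
print(precheck_upload(empty_path))  # ['unsupported extension: .exe', 'file is empty']
os.remove(empty_path)
```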

Q3.5: Embedding dimension error?

If you encounter dimension-related errors, you need to clear the stored data and rebuild:

# Stop services
docker compose down
# Delete all volumes (clear database)
docker volume rm rag-api_dragonflydb_data rag-api_qdrant_data rag-api_memgraph_data
# Modify EMBEDDING_DIM in .env
EMBEDDING_DIM=1024 # or 4096, must match the model
# Restart
docker compose up -d

Q4: Query is very slow (>30 seconds)?

Optimization suggestions:

Use naive or hybrid mode instead of mix
Increase MAX_ASYNC parameter (in .env)
Reduce TOP_K and CHUNK_TOP_K
Enable Reranker

# Modify .env
MAX_ASYNC=8
TOP_K=20
CHUNK_TOP_K=10

Q5: Out of memory (OOM)?

If using local MinerU:

# Switch to remote mode
# Modify in .env
MINERU_MODE=remote
MINERU_API_TOKEN=your_token
# Or limit concurrency
DOCUMENT_PROCESSING_CONCURRENCY=1

Q6: Tasks lost after container restart?

Problem Symptoms:

Cannot query previous task status after container restart
Tasks disappear after tenant instance evicted by LRU

Solution: Enable Redis task storage

# Modify .env
TASK_STORE_STORAGE=redis
# Restart services
docker compose restart
# Verify
docker compose logs api | grep TaskStore
# Should see: ✅ TaskStore: Redis connection successful

Configuration Description:

memory mode: In-memory storage, data lost after restart (default, suitable for development)
redis mode: Persistent storage, supports container restart and instance rebuild (production recommended)

TTL Strategy (Redis mode auto-cleanup):

completed tasks: 24 hours
failed tasks: 24 hours
pending/processing tasks: 6 hours
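
Expressed in seconds, the TTL strategy is a small lookup table. The sketch below only restates the values listed above; the names `TASK_TTL_SECONDS` and `ttl_for_status` are hypothetical and not taken from the project's code.

```python
TASK_TTL_SECONDS = {
    "completed": 24 * 3600,
    "failed": 24 * 3600,
    "pending": 6 * 3600,
    "processing": 6 * 3600,
}


def ttl_for_status(status: str) -> int:
    """Look up how long a task record in the given state is kept."""
    try:
        return TASK_TTL_SECONDS[status]
    except KeyError:
        raise ValueError(f"unknown task status: {status!r}") from None


print(ttl_for_status("completed"))   # 86400
print(ttl_for_status("processing"))  # 21600
```

A value like this would typically be passed as the expiry when the task record is written to Redis, so stale records disappear without a separate cleanup job.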

Q7: VLM mode processing failed?

Check Items:

vision_model_func not configured
- Check logs: vision_model_func not found, fallback to off mode
- Ensure LLM API is configured in .env
Image file does not exist
- Check logs: Image file not found: xxx
- Possibly a corrupted MinerU ZIP or a failed extraction
Timeout error
- full mode may time out on large files
- Suggestion: Use selective mode first, or increase VLM_TIMEOUT

# Modify .env
VLM_TIMEOUT=300 # Increase to 5 minutes
RAG_VLM_MODE=selective # downgrade to selective

Debugging Tips:

# View detailed logs
docker compose logs -f | grep VLM
# Test single file
curl -X POST 'http://localhost:8000/insert?tenant_id=test&doc_id=test&vlm_mode=off' \
 -F 'file=@test.pdf'

Performance Tuning Recommendations

Scenario	MAX_ASYNC	TOP_K	CHUNK_TOP_K	MINERU_MODE
Fast response	8	10	5	remote
Balanced mode	8	20	10	remote
High accuracy	4	60	20	remote
Resource limited	4	20	10	remote
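
The scenarios in the table can be kept as presets and rendered into `.env` lines. This is a minimal sketch that copies the values from the table above; the preset keys and the `render_env` helper are hypothetical.

```python
PRESETS = {
    "fast_response": {"MAX_ASYNC": 8, "TOP_K": 10, "CHUNK_TOP_K": 5, "MINERU_MODE": "remote"},
    "balanced": {"MAX_ASYNC": 8, "TOP_K": 20, "CHUNK_TOP_K": 10, "MINERU_MODE": "remote"},
    "high_accuracy": {"MAX_ASYNC": 4, "TOP_K": 60, "CHUNK_TOP_K": 20, "MINERU_MODE": "remote"},
    "resource_limited": {"MAX_ASYNC": 4, "TOP_K": 20, "CHUNK_TOP_K": 10, "MINERU_MODE": "remote"},
}


def render_env(preset: str) -> str:
    """Return the preset as KEY=value lines ready to paste into .env."""
    return "\n".join(f"{key}={value}" for key, value in PRESETS[preset].items())


print(render_env("high_accuracy"))
```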

📖 Documentation

📘 Architecture Design Documentation - Detailed system architecture and design concepts
📗 Usage Guide - Complete API usage documentation and examples
📙 Deployment Mode Description - Production mode vs Development mode
📕 PR Workflow - Process guide for code contribution
📔 External Storage Deployment - Redis/PostgreSQL/Neo4j configuration
📊 API Comparison Analysis - rag-api vs LightRAG official API comparison
🌐 WebUI Integration Guide - Knowledge graph visualization integration

🤝 Contributing

We welcome all forms of contribution!

How to Contribute

Fork the project

git clone https://github.com/BukeLy/rag-api.git
cd rag-api

Create feature branch

git checkout -b feature/your-feature-name

Development and Testing

# Install dependencies
uv sync
# Run tests
uv run pytest
# Code formatting
uv run black .
uv run isort .

Submit code

git add .
git commit -m "feat: Add new feature"
git push origin feature/your-feature-name

Create Pull Request

Create a PR on GitHub with detailed description of your changes.

Commit Conventions

Use semantic commit messages:

feat: New feature
fix: Bug fix
docs: Documentation update
style: Code formatting
refactor: Code refactoring
perf: Performance optimization
test: Testing
chore: Build/tools

See PR Workflow Documentation.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

🙏 Acknowledgments

This project is built on the following excellent open source projects:

LightRAG - Efficient knowledge graph RAG framework
RAG-Anything - Multimodal document parsing
MinerU - Powerful PDF parsing tool
Docling - Lightweight document parsing
FastAPI - Modern Python web framework

Special thanks to all contributors and users for their support! 🎉

📬 Contact Us

GitHub: @BukeLy
Email: buledream233@gmail.com
Issues: Submit Issue
Discussions: Join Discussion

⭐ If this project helps you, please give it a Star!

Made with ❤️ by BukeLy
