Multi-tenant Multimodal Document Intelligent Retrieval System
Enterprise-grade RAG service built on RAG-Anything and LightRAG
CI Python FastAPI LightRAG Docker License
Features • Quick Start • Architecture • API Documentation • Deployment
RAG API is an enterprise-grade Retrieval-Augmented Generation (RAG) service that combines the powerful document parsing capabilities of RAG-Anything with the efficient knowledge graph retrieval technology of LightRAG, providing intelligent Q&A capabilities for your documents.
- 🏢 Multi-tenant Isolation - Complete tenant data isolation for enterprise multi-tenant scenarios
- 🎨 Multimodal Parsing - Support for PDF, Word, images and more, with full OCR, tables, and formulas coverage
- ⚡ High-performance Retrieval - Knowledge graph-based hybrid retrieval with 6-15 second query response
- 🔄 Flexible Deployment - Support for production and development modes with one-click switching
- 📦 Ready to Use - One-click Docker deployment, service starts in 3 minutes
- 🎛️ Multiple Parsing Engines - DeepSeek-OCR (Remote API) + MinerU (Local/Remote API) + Docling (Fast)
- 🎨 RAG-Anything VLM Enhancement - Three modes (off/selective/full) for deep chart understanding
- 💾 Task Persistence - Redis storage support, tasks recoverable after container restart/instance rebuild
-
✅ Multiple Format Support
- PDF, Word, Excel, PPT
- PNG, JPG, WebP images
- TXT, Markdown text
-
✅ Intelligent Parsing
- Plain text (.txt, .md) → Direct insertion (ultra-fast ~1s, skip parser)
- OCR text recognition
- Structured table extraction
- Mathematical formula recognition
- Layout analysis
-
✅ RAG-Anything VLM Enhancement 🆕
off- Markdown only (fastest)selective- Selective processing of important chartsfull- Complete context enhancement processing- Smart filtering: with titles, large size, first page content
⚠️ Only supports remote MinerU mode, local mode uses RAG-Anything native methods
-
✅ Batch Processing
- Up to 100 files per batch
- Async task queue
- Real-time progress tracking
-
✅ Multi-mode Query
naive- Vector retrieval (fastest)local- Local graphglobal- Global graphhybrid- Hybrid retrievalmix- Full retrieval (most accurate)
-
✅ Knowledge Graph
- Automatic entity extraction
- Relationship reasoning
- Semantic understanding
- Context enhancement
-
✅ External Storage
- DragonflyDB (KV storage + task storage)
- Qdrant (vector storage)
- Memgraph (graph database)
- Task persistence (Redis mode)
graph TB
subgraph "Client Layer"
Client[Client Application]
WebUI[Web Interface]
end
subgraph "API Gateway Layer"
FastAPI[FastAPI Service]
Auth[Tenant Authentication]
end
subgraph "Business Logic Layer"
TenantMgr[Tenant Manager]
TaskQueue[Task Queue]
subgraph "Document Processing"
DeepSeekOCR[DeepSeek-OCR<br/>Fast OCR 80% cases]
MinerU[MinerU Parser<br/>Complex multimodal]
Docling[Docling Parser<br/>Fast lightweight]
FileRouter[Smart Router<br/>Complexity scoring]
end
subgraph "RAG Engine"
LightRAG[LightRAG Instance Pool<br/>LRU Cache 50]
KG[Knowledge Graph Engine]
Vector[Vector Retrieval Engine]
end
end
subgraph "Storage Layer"
DragonflyDB[(DragonflyDB<br/>KV Storage)]
Qdrant[(Qdrant<br/>Vector Database)]
Memgraph[(Memgraph<br/>Graph Database)]
Local[(Local Files<br/>Temp Storage)]
end
subgraph "External Services"
LLM[LLM<br/>Entity Extraction/Generation]
Embedding[Embedding<br/>Vectorization]
Rerank[Rerank<br/>Reranking]
end
Client --> FastAPI
WebUI --> FastAPI
FastAPI --> Auth
Auth --> TenantMgr
TenantMgr --> TaskQueue
TenantMgr --> LightRAG
TaskQueue --> FileRouter
FileRouter --> DeepSeekOCR
FileRouter --> MinerU
FileRouter --> Docling
DeepSeekOCR --> LightRAG
MinerU --> LightRAG
Docling --> LightRAG
LightRAG --> KG
LightRAG --> Vector
KG --> DragonflyDB
KG --> Memgraph
Vector --> Qdrant
LightRAG --> Local
LightRAG --> LLM
LightRAG --> Embedding
Vector --> Rerank
style FastAPI fill:#00C7B7
style LightRAG fill:#FF6B6B
style DeepSeekOCR fill:#5DADE2
style MinerU fill:#4ECDC4
style Docling fill:#95E1D3
style TenantMgr fill:#F38181
graph TB
subgraph "Tenant A"
A_Config[Tenant A Config<br/>Independent API Key]
A_Instance[LightRAG Instance A<br/>Dedicated LLM/Embedding]
A_Data[(Tenant A Data<br/>Fully Isolated)]
A_Config --> A_Instance
A_Instance --> A_Data
end
subgraph "Tenant B"
B_Config[Tenant B Config<br/>Independent API Key]
B_Instance[LightRAG Instance B<br/>Dedicated LLM/Embedding]
B_Data[(Tenant B Data<br/>Fully Isolated)]
B_Config --> B_Instance
B_Instance --> B_Data
end
subgraph "Tenant C"
C_Config[Using Global Config]
C_Instance[LightRAG Instance C<br/>Shared LLM/Embedding]
C_Data[(Tenant C Data<br/>Fully Isolated)]
C_Config --> C_Instance
C_Instance --> C_Data
end
Pool[Instance Pool Manager<br/>LRU Cache + Config Isolation]
Global[Global Config<br/>Default API Key]
Pool --> A_Instance
Pool --> B_Instance
Pool --> C_Instance
C_Config -.fallback.-> Global
style Pool fill:#F38181
style Global fill:#95E1D3
style A_Config fill:#FFD93D
style B_Config fill:#FFD93D
style C_Config fill:#E8E8E8
🔧 Frameworks & Runtime
- FastAPI 0.115+
- Python 3.11+
- Uvicorn
- Docker & Docker Compose
🧠 AI & RAG
- LightRAG 1.4.9.4
- RAG-Anything
- MinerU (PDF-Extract-Kit)
- Docling
💾 Storage & Database
- DragonflyDB(Redis compatible)
- Qdrant(Vector Database)
- Memgraph(Graph Database)
- Local filesystem
Suitable for production and testing environments:
# 1. Clone the project git clone https://github.com/BukeLy/rag-api.git cd rag-api # 2. Configure environment variables cp env.example .env nano .env # Fill in your API keys # 3. Run deployment script chmod +x deploy.sh ./deploy.sh # Select deployment mode: # 1) Production Mode - Standard container deployment # 2) Development Mode - Code hot-reload # 4. Verify service curl http://localhost:8000/
Access Swagger Documentation: http://localhost:8000/docs
# Configure environment variables cp env.example .env nano .env # Start services docker compose -f docker-compose.yml up -d # View logs docker compose -f docker-compose.yml logs -f
# Start development environment docker compose -f docker-compose.dev.yml up -d # Or use quick script ./scripts/dev.sh # Code changes will auto-reload without restart
# Install uv (Python package manager) curl -LsSf https://astral.sh/uv/install.sh | sh # Install dependencies uv sync # Configure environment variables cp env.example .env nano .env # Start services uv run uvicorn main:app --host 0.0.0.0 --port 8000 --reload
Minimum configuration (required):
# LLM Configuration (Function-oriented naming) LLM_API_KEY=your_llm_api_key LLM_BASE_URL=https://ark.cn-beijing.volces.com/api/v3 LLM_MODEL=ep-xxx-xxx # LLM_REQUESTS_PER_MINUTE=800 # Rate limit (optional) # LLM_TOKENS_PER_MINUTE=40000 # Rate limit (optional) # LLM_MAX_ASYNC=8 # [Optional, expert mode] Manual concurrency control # # Auto-calculated when unset: min(RPM, TPM/3500) = 11 # Embedding Configuration (Function-oriented naming) EMBEDDING_API_KEY=your_embedding_api_key EMBEDDING_BASE_URL=https://api.siliconflow.cn/v1 EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B EMBEDDING_DIM=1024 # EMBEDDING_MAX_ASYNC=32 # [Optional, expert mode] Auto-calculated when unset: 800 # MinerU Mode (Remote recommended) MINERU_MODE=remote MINERU_API_TOKEN=your_token MINERU_HTTP_TIMEOUT=60 # MinerU download timeout (seconds, default 60) FILE_SERVICE_BASE_URL=http://your-ip:8000 # VLM Chart Enhancement Configuration 🆕 # ⚠️ Note: Only effective in MINERU_MODE=remote RAG_VLM_MODE=off # off / selective / full RAG_IMPORTANCE_THRESHOLD=0.5 # Importance threshold (selective mode) RAG_CONTEXT_WINDOW=2 # Context window (full mode) RAG_CONTEXT_MODE=page # page / chunk RAG_MAX_CONTEXT_TOKENS=3000 # Max context tokens # Task Storage Configuration 🆕 TASK_STORE_STORAGE=redis # memory / redis (production recommends redis) # Document Insert Verification Configuration 🆕 DOC_INSERT_VERIFICATION_TIMEOUT=300 # Verification timeout (seconds, default 5 minutes) DOC_INSERT_VERIFICATION_POLL_INTERVAL=0.5 # Poll interval (seconds, default 500ms) # Model Call Timeout Configuration 🆕 MODEL_CALL_TIMEOUT=90 # Model call max timeout (seconds, default 90)
⚡ Auto Concurrency Calculation:
- LLM: When
LLM_MAX_ASYNCis unset, auto-calculated asmin(RPM, TPM/3500)≈ 11 - Embedding: When
EMBEDDING_MAX_ASYNCis unset, auto-calculated asmin(RPM, TPM/500)≈ 800 - Rerank: When
RERANK_MAX_ASYNCis unset, auto-calculated asmin(RPM, TPM/500)≈ 800
✅ Recommended: Don't set *_MAX_ASYNC, let the system auto-calculate to completely avoid 429 errors
See env.example for complete configuration.
# Single file upload (default mode) curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc1" \ -F "file=@document.pdf" \ -F "parser=auto" # VLM chart enhancement mode 🆕 # off: Markdown only (fastest, default) curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc2&vlm_mode=off" \ -F "file=@document.pdf" # selective: Selective processing of important charts (balance performance and quality) curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc3&vlm_mode=selective" \ -F "file=@document.pdf" # full: Complete RAG-Anything processing (highest quality, context enhancement enabled) curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc4&vlm_mode=full" \ -F "file=@document.pdf" # Response { "task_id": "task-xxx-xxx", "doc_id": "doc1", "filename": "document.pdf", "vlm_mode": "off", "status": "pending" }
curl -X POST "http://localhost:8000/batch?tenant_id=your_tenant" \ -F "files=@doc1.pdf" \ -F "files=@doc2.docx" \ -F "files=@image.png" # Response { "batch_id": "batch-xxx-xxx", "total_files": 3, "accepted_files": 3, "tasks": [...] }
New Advanced Features:
- ✨ Conversation History: Support for multi-turn conversation context
- ✨ Custom Prompts: Customize response style
- ✨ Response Format Control: paragraph/list/json
- ✨ Keyword Precision Retrieval: hl_keywords/ll_keywords
- ✨ Streaming Output: Real-time generation viewing
# Basic query curl -X POST "http://localhost:8000/query?tenant_id=your_tenant" \ -H "Content-Type: application/json" \ -d '{ "query": "What are the core viewpoints in the document?", "mode": "hybrid" }' # Advanced query (multi-turn dialogue + custom prompt) curl -X POST "http://localhost:8000/query?tenant_id=your_tenant" \ -H "Content-Type: application/json" \ -d '{ "query": "Can you elaborate on the second point?", "mode": "hybrid", "conversation_history": [ {"role": "user", "content": "What are the key points?"}, {"role": "assistant", "content": "There are mainly three points..."} ], "user_prompt": "Please answer in professional academic language", "response_type": "list" }' # Streaming query (SSE) curl -N -X POST "http://localhost:8000/query/stream?tenant_id=your_tenant" \ -H "Content-Type: application/json" \ -d '{ "query": "What are the core viewpoints in the document?", "mode": "hybrid" }' # Response (real-time streaming output) data: {"chunk": "Based on", "done": false} data: {"chunk": "document content", "done": false} data: {"done": true}
curl "http://localhost:8000/task/task-xxx-xxx?tenant_id=your_tenant" # Response { "task_id": "task-xxx-xxx", "status": "completed", "progress": 100, "result": {...} }
# Get tenant statistics curl "http://localhost:8000/tenants/stats?tenant_id=your_tenant" # Clear tenant cache curl -X DELETE "http://localhost:8000/tenants/cache?tenant_id=your_tenant" # View instance pool status (admin) curl "http://localhost:8000/tenants/pool/stats"
| Mode | Speed | Quality | Resource Usage | Use Case |
|---|---|---|---|---|
off |
⚡⚡⚡⚡⚡ | ⭐⭐⭐ | Very Low | Plain text documents, fast batch processing |
selective |
⚡⚡⚡⚡ | ⭐⭐⭐⭐ | Low | Documents with key charts (recommended) |
full |
⚡⚡ | ⭐⭐⭐⭐⭐ | High | Chart-intensive research reports, papers |
Processing Time Estimate (20-page PDF example):
off: ~10 seconds(Markdown only)selective: ~30 seconds(5-10 important charts)full: ~120 seconds(complete context processing)
| Mode | Speed | Accuracy | Use Case |
|---|---|---|---|
naive |
⚡⚡⚡⚡⚡ | ⭐⭐⭐ | Simple Q&A, fast retrieval |
local |
⚡⚡⚡⚡ | ⭐⭐⭐⭐ | Local entity relationship queries |
global |
⚡⚡⚡ | ⭐⭐⭐⭐ | Global knowledge graph reasoning |
hybrid |
⚡⚡⚡ | ⭐⭐⭐⭐⭐ | Hybrid retrieval (recommended) |
mix |
⚡⚡ | ⭐⭐⭐⭐⭐ | Complex questions, deep analysis |
| Parameter | Type | Description | Example |
|---|---|---|---|
conversation_history |
List[Dict] | Multi-turn conversation context | [{"role": "user", "content": "..."}] |
user_prompt |
str | Custom prompt | "Please answer in professional academic language" |
response_type |
str | Response format | "paragraph", "list", "json" |
hl_keywords |
List[str] | High priority keywords | ["artificial intelligence", "machine learning"] |
ll_keywords |
List[str] | Low priority keywords | ["application", "case study"] |
only_need_context |
bool | Return context only (debug) | true |
max_entity_tokens |
int | Entity token limit | 6000 |
Complete API documentation:http://localhost:8000/docs
import requests # Configuration BASE_URL = "http://localhost:8000" TENANT_ID = "your_tenant" # Upload document with open("document.pdf", "rb") as f: response = requests.post( f"{BASE_URL}/insert", params={"tenant_id": TENANT_ID, "doc_id": "doc1"}, files={"file": f} ) task_id = response.json()["task_id"] print(f"Task ID: {task_id}") # Query response = requests.post( f"{BASE_URL}/query", params={"tenant_id": TENANT_ID}, json={ "query": "What is the main content of the document?", "mode": "hybrid", "top_k": 10 } ) result = response.json() print(f"Answer: {result['answer']}")
# 1. Upload PDF document TASK_ID=$(curl -X POST "http://localhost:8000/insert?tenant_id=demo&doc_id=report" \ -F "file=@report.pdf" | jq -r '.task_id') echo "Task ID: $TASK_ID" # 2. Wait for processing completion while true; do STATUS=$(curl -s "http://localhost:8000/task/$TASK_ID?tenant_id=demo" | jq -r '.status') echo "Status: $STATUS" if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ]; then break fi sleep 2 done # 3. Query document content curl -X POST "http://localhost:8000/query?tenant_id=demo" \ -H "Content-Type: application/json" \ -d '{ "query": "What are the main conclusions of this report?", "mode": "hybrid" }' | jq '.answer'
Minimum Configuration:
- CPU: 2 cores
- RAM: 4GB
- Disk: 40GB SSD
- OS: Ubuntu 20.04+ / Debian 11+ / CentOS 8+
Recommended Configuration (Production):
- CPU: 4 cores
- RAM: 8GB
- Disk: 100GB SSD
- OS: Ubuntu 22.04 LTS
# SSH login to server ssh root@your-server-ip # Clone project git clone https://github.com/BukeLy/rag-api.git cd rag-api # Run one-click deployment script chmod +x deploy.sh ./deploy.sh # The script will automatically: # 1. Install Docker and Docker Compose # 2. Configure environment variables # 3. Optimize system parameters # 4. Start services # 5. Verify health status
Supports DragonflyDB + Qdrant + Memgraph external storage (enabled by default):
# Configure in .env USE_EXTERNAL_STORAGE=true # DragonflyDB configuration (KV Storage) KV_STORAGE=RedisKVStorage REDIS_URI=redis://dragonflydb:6379/0 # Qdrant configuration (vector storage) VECTOR_STORAGE=QdrantVectorDBStorage QDRANT_URL=http://qdrant:6333 # Memgraph configuration (graph storage) GRAPH_STORAGE=MemgraphStorage MEMGRAPH_URI=bolt://memgraph:7687 MEMGRAPH_USERNAME= MEMGRAPH_PASSWORD=
See External Storage Deployment Documentation。
The project provides two configuration files:
| File | Purpose | Features |
|---|---|---|
docker-compose.yml |
Production mode | Code packaged in image, optimal performance |
docker-compose.dev.yml |
Development mode | Code mounted externally, supports hot-reload |
Select configuration file:
# Production mode docker compose -f docker-compose.yml up -d # Development mode docker compose -f docker-compose.dev.yml up -d
Configure in .env:
# ⚡ Concurrency Control (Recommended: use auto-calculation) # LLM_MAX_ASYNC=8 # [Expert mode] Manually specify LLM concurrency # # Auto-calculated when unset: min(RPM, TPM/3500) ≈ 11 # EMBEDDING_MAX_ASYNC=32 # [Expert mode] Manually specify Embedding concurrency # # Auto-calculated when unset: min(RPM, TPM/500) ≈ 800 # RERANK_MAX_ASYNC=16 # [Expert mode] Manually specify Rerank concurrency # # Auto-calculated when unset: min(RPM, TPM/500) ≈ 800 # Retrieval count (affects query quality and speed) TOP_K=20 # Entity/relationship retrieval count CHUNK_TOP_K=10 # Text chunk retrieval count # Document processing concurrency DOCUMENT_PROCESSING_CONCURRENCY=10 # Remote mode can be set high, local mode set to 1
🎯 Concurrency Configuration Recommendations:
- Recommended: Don't set
*_MAX_ASYNC, let the system auto-calculate based on TPM/RPM - Expert mode: If manual control needed, can set
LLM_MAX_ASYNCand other parameters - Advantage: Auto-calculation completely avoids 429 errors (TPM limit reached)
- MinerU Remote Mode (Recommended): High concurrency, resource-efficient
- MinerU Local Mode: Requires GPU, high memory usage
- Docling Mode: Fast and lightweight, suitable for simple documents
Each tenant has:
- ✅ Independent LightRAG instance
- ✅ Isolated data storage space
- ✅ Independent vector index
- ✅ Dedicated knowledge graph
- ✅ Independent service configuration (LLM, Embedding, Rerank, DeepSeek-OCR, MinerU)🆕
Each tenant can independently configure 5 services with hot-reload support:
# 1️⃣ Configure independent DeepSeek-OCR API key for Tenant A curl -X PUT "http://localhost:8000/tenants/tenant_a/config" \ -H "Content-Type: application/json" \ -d '{ "ds_ocr_config": { "api_key": "sk-tenant-a-ds-ocr-key", "base_url": "https://api.siliconflow.cn/v1", "model": "deepseek-ai/DeepSeek-OCR", "timeout": 90 } }' # 2️⃣ Configure independent MinerU API token for Tenant B curl -X PUT "http://localhost:8000/tenants/tenant_b/config" \ -H "Content-Type: application/json" \ -d '{ "mineru_config": { "api_token": "tenant-b-mineru-token", "base_url": "https://mineru.net", "model_version": "vlm" } }' # 3️⃣ Configure multiple services simultaneously (LLM + Embedding + DeepSeek-OCR) curl -X PUT "http://localhost:8000/tenants/tenant_c/config" \ -H "Content-Type: application/json" \ -d '{ "llm_config": { "api_key": "sk-tenant-c-llm-key", "model": "gpt-4" }, "embedding_config": { "api_key": "sk-tenant-c-embedding-key", "model": "Qwen/Qwen3-Embedding-0.6B", "dim": 1024 }, "ds_ocr_config": { "api_key": "sk-tenant-c-ds-ocr-key" } }' # 4️⃣ Query tenant configuration (API key auto-masked) curl "http://localhost:8000/tenants/tenant_a/config" # Response example { "tenant_id": "tenant_a", "ds_ocr_config": { "api_key": "sk-***-key", // Auto-masked "timeout": 90 }, "merged_config": { "llm": {...}, // Using Global Config "embedding": {...}, // Using Global Config "rerank": {...}, // Using Global Config "ds_ocr": {...}, // Using tenant config "mineru": {...} // Using Global Config } } # 5️⃣ Refresh config cache (config hot-reload) curl -X POST "http://localhost:8000/tenants/tenant_a/config/refresh" # 6️⃣ Delete tenant config (restore to global config) curl -X DELETE "http://localhost:8000/tenants/tenant_a/config"
Supported Configuration Items:
| Service | Config Field | Description |
|---|---|---|
| LLM | llm_config |
Model, API key, base_url, etc. |
| Embedding | embedding_config |
Model, API key, dimension, etc. |
| Rerank | rerank_config |
Model, API key, etc. |
| DeepSeek-OCR | ds_ocr_config |
API key, timeout, mode, etc. |
| MinerU | mineru_config |
API token, version, timeout, etc. |
Configuration Priority: Tenant config > Global config
Use Cases:
- 🔐 Multi-tenant SaaS: Each tenant uses their own API key
- 💰 Pay-per-use: Track tenant usage through independent API keys
- 🎯 Differentiated Services: Different tenants use different models (GPT-4 vs GPT-3.5)
- 🧪 A/B Testing: Compare different models/parameters
All APIs require tenant_id parameter:
# Tenant A upload document curl -X POST "http://localhost:8000/insert?tenant_id=tenant_a&doc_id=doc1" \ -F "file=@doc.pdf" # Tenant B upload document (fully isolated) curl -X POST "http://localhost:8000/insert?tenant_id=tenant_b&doc_id=doc1" \ -F "file=@doc.pdf" # Tenant A query (can only query own documents) curl -X POST "http://localhost:8000/query?tenant_id=tenant_a" \ -H "Content-Type: application/json" \ -d '{"query": "document content", "mode": "hybrid"}'
- Capacity: Cache up to 50 tenant instances
- Strategy: LRU (Least Recently Used) automatic cleanup
- Config Isolation: Each tenant can use independent LLM, Embedding, parser configuration
# View service status docker compose ps # View real-time logs docker compose logs -f # Restart services docker compose restart # Stop services docker compose down # View resource usage docker stats # Clean Docker resources docker system prune -f
# Monitor service health ./scripts/monitor.sh # Backup data ./scripts/backup.sh # Update services ./scripts/update.sh # Performance testing ./scripts/test_concurrent_perf.sh # Performance monitoring ./scripts/monitor_performance.sh
# Complete health check (recommended) ./scripts/health_check.sh ./scripts/health_check.sh --verbose # verbose output # API health check curl http://localhost:8000/ # Tenant statistics curl "http://localhost:8000/tenants/stats?tenant_id=your_tenant" # Instance pool status curl "http://localhost:8000/tenants/pool/stats"
rag-api/
├── main.py # FastAPI application entry
├── api/ # API route modules
│ ├── __init__.py # Route aggregation
│ ├── insert.py # Document upload (single/batch)
│ ├── query.py # Intelligent query
│ ├── task.py # Task status query
│ ├── tenant.py # Tenant management
│ ├── files.py # File service
│ ├── models.py # Pydantic models
│ └── task_store.py # Task storage
├── src/ # Core business logic
│ ├── rag.py # LightRAG lifecycle management
│ ├── multi_tenant.py # Multi-tenant instance manager
│ ├── tenant_deps.py # Tenant dependency injection
│ ├── logger.py # Unified logging
│ ├── metrics.py # Performance metrics
│ ├── file_url_service.py # Temporary file service
│ ├── mineru_client.py # MinerU client
│ └── mineru_result_processor.py # Result processing
├── docs/ # Documentation
│ ├── ARCHITECTURE.md # Architecture design documentation
│ ├── USAGE.md # Detailed usage guide
│ ├── DEPLOY_MODES.md # Deployment mode description
│ ├── PR_WORKFLOW.md # PR workflow
│ └── ...
├── scripts/ # Maintenance scripts
│ ├── dev.sh # Development mode quick start
│ ├── monitor.sh # Service monitoring
│ ├── backup.sh # Data backup
│ ├── update.sh # Service update
│ └── ...
├── deploy.sh # One-click deployment script
├── docker-compose.yml # Production mode configuration
├── docker-compose.dev.yml # Development mode configuration
├── Dockerfile # Production image
├── Dockerfile.dev # Development image
├── pyproject.toml # Project dependencies
├── uv.lock # Dependency lock
├── env.example # Environment variable template
├── CLAUDE.md # Claude AI guide
└── README.md # This documentation
Q1: What to do if service fails to start?
# View detailed logs docker compose logs # Check port usage netstat -tulpn | grep 8000 # Check Docker status docker ps -a
Q2: multimodal_processed error?
Note: This issue has been fixed in LightRAG 1.4.9.4+. If you encounter this error, your version is outdated.
Solution:
# Option 1: Upgrade to latest version (recommended) # Modify LightRAG version in pyproject.toml # lightrag = "^1.4.9.4" # Rebuild image docker compose down docker compose up -d --build # Option 2: Clean old data (temporary solution) rm -rf ./rag_local_storage docker compose restart
Q3: File upload returns 400 error?
Check:
- File format supported (PDF, DOCX, PNG, JPG, etc.)
- File size exceeds 100MB
- File is empty
# View supported formats
curl http://localhost:8000/docsQ3.5: Embedding dimension error?
If you encounter dimension-related errors, need to clean data and rebuild:
# Stop services docker compose down # Delete all volumes (clear database) docker volume rm rag-api_dragonflydb_data rag-api_qdrant_data rag-api_memgraph_data # Modify EMBEDDING_DIM in .env EMBEDDING_DIM=1024 # or 4096, must match the model # Restart docker compose up -d
Q4: Query is very slow (>30 seconds)?
Optimization suggestions:
- Use
naiveorhybridmode instead ofmix - Increase
MAX_ASYNCparameter (in.env) - Reduce
TOP_KandCHUNK_TOP_K - Enable Reranker
# Modify .env
MAX_ASYNC=8
TOP_K=20
CHUNK_TOP_K=10Q5: Out of memory (OOM)?
If using local MinerU:
# Switch to remote mode # Modify in .env MINERU_MODE=remote MINERU_API_TOKEN=your_token # Or limit concurrency DOCUMENT_PROCESSING_CONCURRENCY=1
Q6: Tasks lost after container restart?
Problem Symptoms:
- Cannot query previous task status after container restart
- Tasks disappear after tenant instance evicted by LRU
Solution: Enable Redis task storage
# Modify .env TASK_STORE_STORAGE=redis # Restart services docker compose restart # Verify docker compose logs api | grep TaskStore # Should see: ✅ TaskStore: Redis connection successful
Configuration Description:
memorymode: In-memory storage, data lost after restart (default, suitable for development)redismode: Persistent storage, supports container restart and instance rebuild (production recommended)
TTL Strategy (Redis mode auto-cleanup):
- completed tasks: 24 hours
- failed tasks: 24 hours
- pending/processing tasks: 6 hours
Q7: VLM mode processing failed?
Check Items:
-
vision_model_func not configured
- Check logs:
vision_model_func not found, fallback to off mode - Ensure LLM API is configured in
.env
- Check logs:
-
Image file does not exist
- Check logs:
Image file not found: xxx - Possibly corrupted MinerU ZIP or extraction failed
- Check logs:
-
Timeout error
fullmode may timeout on large files- Suggestion: Use
selectivemode first, or increaseVLM_TIMEOUT
# Modify .env VLM_TIMEOUT=300 # Increase to 5 minutes RAG_VLM_MODE=selective # downgrade to selective
Debugging Tips:
# View detailed logs docker compose logs -f | grep VLM # Test single file curl -X POST 'http://localhost:8000/insert?tenant_id=test&doc_id=test&vlm_mode=off' \ -F 'file=@test.pdf'
| Scenario | MAX_ASYNC | TOP_K | CHUNK_TOP_K | MINERU_MODE |
|---|---|---|---|---|
| Fast response | 8 | 10 | 5 | remote |
| Balanced mode | 8 | 20 | 10 | remote |
| High accuracy | 4 | 60 | 20 | remote |
| Resource limited | 4 | 20 | 10 | remote |
- 📘 Architecture Design Documentation - Detailed system architecture and design concepts
- 📗 Usage Guide - Complete API usage documentation and examples
- 📙 Deployment Mode Description - Production mode vs Development mode
- 📕 PR Workflow - Process guide for code contribution
- 📔 External Storage Deployment - Redis/PostgreSQL/Neo4j configuration
- 📊 API Comparison Analysis - rag-api vs LightRAG official API comparison
- 🌐 WebUI Integration Guide - Knowledge graph visualization integration
We welcome all forms of contribution!
- Fork the project
git clone https://github.com/BukeLy/rag-api.git
cd rag-api- Create feature branch
git checkout -b feature/your-feature-name
- Development and Testing
# Install dependencies uv sync # Run tests uv run pytest # Code formatting uv run black . uv run isort .
- Submit code
git add . git commit -m "feat: Add new feature" git push origin feature/your-feature-name
- Create Pull Request
Create a PR on GitHub with detailed description of your changes.
Use semantic commit messages:
feat:New featurefix:Bug fixdocs:Documentation updatestyle:Code formattingrefactor:Code refactoringperf:Performance optimizationtest:Testingchore:Build/tools
See PR Workflow Documentation。
This project is licensed under the MIT License. See the LICENSE file for details.
This project is built on the following excellent open source projects:
- LightRAG - Efficient knowledge graph RAG framework
- RAG-Anything - Multimodal document parsing
- MinerU - Powerful PDF parsing tool
- Docling - Lightweight document parsing
- FastAPI - Modern Python web framework
Special thanks to all contributors and users for their support! 🎉
- GitHub: @BukeLy
- Email: buledream233@gmail.com
- Issues: Submit Issue
- Discussions: Join Discussion
⭐ If this project helps you, please give it a Star!
Made with ❤️ by BukeLy
© 2025 RAG API. All rights reserved.