Complete knowledge base system with semantic search for Claude Code CLI
Give your Claude Code CLI semantic search superpowers over your local documents. This project integrates CocoIndex with Claude Code via the Model Context Protocol (MCP), enabling intelligent document search, LLM-powered metadata extraction, and a complete local knowledge base.
- Semantic Search - Find documents by meaning, not just keywords
- MCP Integration - Direct access from Claude Code CLI via MCP
- LLM Extraction - Automatically extract titles, summaries, and key points using Claude
- Vector Database - PostgreSQL with pgvector for fast similarity search
- Live Updates - Documents are re-indexed automatically when changed
- FastAPI Server - REST API endpoints for programmatic access
- CocoInsight - Web UI for flow visualization
graph TB
subgraph "Claude Code CLI"
CC[Claude Code] -->|MCP Protocol| MCP[CocoIndex MCP Server]
end
subgraph "CocoIndex Engine"
MCP --> Search[Semantic Search]
MCP --> Index[Document Indexing]
MCP --> Extract[LLM Extraction]
end
subgraph "Data Layer"
Search --> PG[(PostgreSQL + pgvector)]
Index --> PG
Extract --> PG
end
subgraph "External APIs"
Index -->|Embeddings| OpenAI[OpenAI API]
Extract -->|Extraction| Anthropic[Anthropic Claude]
end
Docs[Your Documents] -->|Watch| Index
- Docker Desktop - For PostgreSQL with pgvector
- Python 3.11+ - Required by CocoIndex
- OpenAI API Key - For text embeddings
- Anthropic API Key - For LLM extraction (optional)
# Clone the repository
git clone https://github.com/puneet8800/cocoindex-claude-code.git
cd cocoindex-claude-code

# Run the setup script
./scripts/setup.sh
This will:
- Start PostgreSQL with pgvector via Docker
- Create a Python virtual environment
- Install all dependencies
- Set up the CocoIndex backend tables
# Copy the example environment file
cp .env.example .env

# Edit .env and add your API keys
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
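Before running the flows, it can help to confirm which keys are actually set. The following is a minimal sketch, not part of the project: `check_api_keys` is a hypothetical helper, and it assumes the variables from `.env` have already been loaded into the process environment (for example by exporting them in your shell).

```python
import os

# OPENAI_API_KEY is needed for embeddings; ANTHROPIC_API_KEY only for LLM extraction.
REQUIRED_KEYS = ("OPENAI_API_KEY",)
OPTIONAL_KEYS = ("ANTHROPIC_API_KEY",)


def check_api_keys(env=None):
    """Return (missing_required, missing_optional) for an environment mapping.

    Hypothetical helper for illustration; defaults to os.environ.
    """
    env = os.environ if env is None else env
    missing_required = [key for key in REQUIRED_KEYS if not env.get(key)]
    missing_optional = [key for key in OPTIONAL_KEYS if not env.get(key)]
    return missing_required, missing_optional


missing_required, missing_optional = check_api_keys()
if missing_required:
    print("Missing required keys:", ", ".join(missing_required))
if missing_optional:
    print("LLM extraction will be unavailable without:", ", ".join(missing_optional))
```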
# Add markdown, text, or Python files to be indexed
cp your-docs/*.md data/documents/
source .venv/bin/activate
cocoindex update main.py

Add to your ~/.claude.json:
{
"mcpServers": {
"cocoindex": {
"command": "/path/to/cocoindex-claude-code/.venv/bin/python",
"args": ["/path/to/cocoindex-claude-code/mcp_server.py"],
"env": {
"COCOINDEX_DATABASE_URL": "postgres://cocoindex:cocoindex@localhost/cocoindex"
}
}
}
}

Now in Claude Code, you can use semantic search:
> Search my documents for authentication best practices
> What documents mention API rate limiting?
> Find all content related to database migrations
sequenceDiagram
participant U as User
participant CC as Claude Code
participant MCP as MCP Server
participant CI as CocoIndex
participant DB as PostgreSQL
participant OAI as OpenAI
U->>CC: "Search for auth docs"
CC->>MCP: cocoindex_search(query)
MCP->>CI: search(query)
CI->>OAI: Generate query embedding
OAI-->>CI: Vector [1536 dims]
CI->>DB: Vector similarity search
DB-->>CI: Top K results
CI-->>MCP: Formatted results
MCP-->>CC: JSON response
CC-->>U: Relevant document chunks
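To make the "vector similarity search" step in the sequence above concrete, here is a small pure-Python sketch of ranking chunks against a query embedding. It assumes cosine similarity as the metric, which this README does not confirm; in the real system the ranking happens inside PostgreSQL via pgvector, and `cosine_similarity` and `top_k` are illustrative names only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=5):
    """Return the k (score, text) pairs most similar to query_vec.

    chunks is a list of (text, embedding) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 2-dimensional vectors; real embeddings have 1536 dimensions.
chunks = [
    ("auth guide", [1.0, 0.0]),
    ("db migrations", [0.0, 1.0]),
    ("auth and db", [1.0, 1.0]),
]
for score, text in top_k([1.0, 0.0], chunks, k=2):
    print(f"{score:.3f}  {text}")
```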
flowchart LR
A[📄 Documents] -->|LocalFile Source| B[Split into Chunks]
B -->|2000 chars, 500 overlap| C[Generate Embeddings]
C -->|text-embedding-3-small| D[Store Vectors]
D -->|pgvector| E[(PostgreSQL)]
When connected to Claude Code, these tools become available:
| Tool | Description |
|---|---|
| `cocoindex_search` | Semantic search across indexed documents |
| `cocoindex_index` | Re-index all documents |
| `cocoindex_list` | List all indexed documents |
| `cocoindex_metadata` | Get LLM-extracted metadata for a document |
| `cocoindex_add` | Add a new document to the index |
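Claude Code invokes these tools for you, but it can be useful to see roughly what travels over the wire. MCP is built on JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The sketch below builds such a request for `cocoindex_search`; the argument names `query` and `limit` are assumptions, since this README does not list each tool's input schema.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message (plain dict)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names are assumed for illustration; check the server's tool schema.
request = build_tool_call(
    1,
    "cocoindex_search",
    {"query": "authentication best practices", "limit": 5},
)
print(json.dumps(request, indent=2))
```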
# Activate virtual environment
source .venv/bin/activate

# Interactive search
python scripts/query.py

# Single query
python scripts/query.py "machine learning best practices"

# With result limit
python scripts/query.py -l 10 "python async patterns"
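The three invocation styles above map naturally onto an optional positional argument plus a `-l` flag. The following is a sketch of how such a command line could be parsed with `argparse`; it is not the actual contents of `scripts/query.py`, and the `--limit` long form and the default of 5 are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse a query.py-style command line (illustrative, not the project's code)."""
    parser = argparse.ArgumentParser(description="Search indexed documents")
    # With no query given, a script like this would drop into interactive mode.
    parser.add_argument("query", nargs="?", default=None,
                        help="search text; omit for interactive mode")
    parser.add_argument("-l", "--limit", type=int, default=5,
                        help="maximum number of results (default is an assumption)")
    return parser.parse_args(argv)


args = parse_args(["-l", "10", "python async patterns"])
print(args.query, args.limit)
```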
# Start the API server
python main.py

# Or with live reloading
uvicorn main:app --reload
API Endpoints:
- `GET /` - API information
- `GET /health` - Health check
- `GET /flows` - List registered flows
- `GET /search?q=query&limit=5` - Search documents
- `GET /docs` - Swagger UI
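For programmatic access, the search endpoint can be called from any HTTP client. Here is a minimal sketch that builds a properly encoded `/search` URL using only the standard library. The base URL `http://localhost:8000` is an assumption (uvicorn's default port); adjust it to wherever your server actually listens.

```python
from urllib.parse import urlencode


def build_search_url(query, limit=5, base_url="http://localhost:8000"):
    """Return the /search URL for a query, with parameters URL-encoded.

    base_url is an assumed default, not taken from the project.
    """
    return f"{base_url}/search?{urlencode({'q': query, 'limit': limit})}"


url = build_search_url("python async patterns", limit=10)
print(url)

# To actually send the request (requires the API server to be running):
#   import json, urllib.request
#   with urllib.request.urlopen(url) as response:
#       results = json.load(response)
```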
# Start server with CocoInsight support
cocoindex server main.py -ci

# Open https://cocoindex.io and connect to localhost:49344
cocoindex-claude-code/
├── flows/ # CocoIndex flow definitions
│ ├── text_embedding.py # Vector search flow (OpenAI embeddings)
│ └── llm_extraction.py # LLM metadata extraction (Claude)
├── scripts/ # Utility scripts
│ ├── setup.sh # Initial setup
│ ├── start.sh # Start services
│ ├── stop.sh # Stop services
│ └── query.py # Interactive search CLI
├── docker/ # Docker configuration
│ └── compose.yaml # PostgreSQL with pgvector
├── data/documents/ # Your documents go here
├── docs/ # Documentation
├── main.py # FastAPI entry point
├── mcp_server.py # MCP server for Claude Code
├── pyproject.toml # Python dependencies
└── .env.example # Environment template
- Getting Started Guide - Detailed setup instructions
- Architecture - System design and components
- MCP Integration - Claude Code configuration
- Flows Explained - CocoIndex flows documentation
- Troubleshooting - Common issues and solutions
Indexes documents for semantic search:
- Reads files from `data/documents/`
- Splits into chunks (2000 chars, 500 overlap)
- Generates embeddings with OpenAI `text-embedding-3-small`
- Stores in PostgreSQL with vector index
Supported file types: .md, .txt, .py
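To illustrate what "2000 chars, 500 overlap" means in practice, here is a simplified sliding-window splitter. It is only a sketch: the flow's real splitter may break on document structure rather than at fixed character offsets, and `split_into_chunks` is a hypothetical name.

```python
def split_into_chunks(text, chunk_size=2000, overlap=500):
    """Split text into fixed-size chunks where consecutive chunks share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts this many characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(5000))
chunks = split_into_chunks(sample)
print(len(chunks), [len(chunk) for chunk in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing and embedding some text twice.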
Extracts structured metadata using Claude:
- Title
- Summary
- Key points
- Topics
- Document type
Requires: ANTHROPIC_API_KEY in .env
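The extracted fields listed above can be pictured as a simple record per document. The dataclass below is an illustrative shape only: the field names and types are assumptions, not the schema defined in `flows/llm_extraction.py`.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class DocumentMetadata:
    """Illustrative container for LLM-extracted metadata (field names are assumed)."""
    title: str
    summary: str
    key_points: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)
    document_type: str = "unknown"


example = DocumentMetadata(
    title="API Rate Limiting",
    summary="How the service throttles requests.",
    key_points=["Limits are per API key", "Clients should back off on HTTP 429"],
    topics=["api", "rate limiting"],
    document_type="guide",
)
print(asdict(example))
```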
# List all flows
cocoindex ls main.py

# Show flow details
cocoindex show main.py:TextEmbedding

# Setup backend tables
cocoindex setup main.py -f

# Update index (one-time)
cocoindex update main.py

# Update index (continuous/live)
cocoindex update main.py -L

# Evaluate without exporting
cocoindex evaluate main.py -o ./output

# Start server with CocoInsight
cocoindex server main.py -ci -L --reload

# Drop all backend tables
cocoindex drop main.py
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
MIT License - see LICENSE for details.
- CocoIndex - The indexing engine powering this project
- Claude Code - Anthropic's CLI for Claude
- pgvector - Vector similarity search for PostgreSQL
Made with ❤️ for the Claude Code community