No embeddings. No vector DB. Pure LLM reasoning.
Traditional Vector RAG
PDF → chunks → embeddings → cosine similarity → hope it's relevant → answer
Vectorless RAG (this project)
PDF → document TREE → LLM reasons over tree → picks exact sections → answer
Step 1: PARSE
PyMuPDF extracts text page by page.
Font-size analysis detects headings → builds a hierarchical Document Tree.
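The heading detection in Step 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in rag_engine.py: the names `Section`, `extract_spans`, and `build_tree` are hypothetical, and so is the rule that any font size larger than the most common one marks a heading. The PyMuPDF part relies only on `page.get_text("dict")`, which returns blocks, lines, and spans with a `size` field.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Section:
    section_id: str
    title: str
    level: int
    page: int
    text: str = ""
    children: list["Section"] = field(default_factory=list)


def extract_spans(pdf_path):
    """Yield (text, font_size, page_number) for every non-empty text span."""
    import fitz  # PyMuPDF; imported lazily so build_tree() works without it

    with fitz.open(pdf_path) as doc:
        for page_number, page in enumerate(doc, start=1):
            for block in page.get_text("dict")["blocks"]:
                for line in block.get("lines", []):  # image blocks have no lines
                    for span in line["spans"]:
                        text = span["text"].strip()
                        if text:
                            yield text, round(span["size"], 1), page_number


def build_tree(spans):
    """Group spans into a tree; larger-than-body font sizes become headings."""
    spans = list(spans)
    root = Section("root", "Document", level=0, page=1)
    if not spans:
        return root

    # Assumption: the font size covering the most characters is body text.
    weights = Counter()
    for text, size, _ in spans:
        weights[size] += len(text)
    body_size = weights.most_common(1)[0][0]

    # Biggest heading size becomes level 1, the next one level 2, and so on.
    heading_sizes = sorted({s for _, s, _ in spans if s > body_size}, reverse=True)
    level_of = {size: i + 1 for i, size in enumerate(heading_sizes)}

    stack = [root]  # path from the root to the section currently being filled
    counter = 0
    for text, size, page in spans:
        if size in level_of:
            level = level_of[size]
            while stack[-1].level >= level:  # climb back up to the parent
                stack.pop()
            counter += 1
            node = Section(f"s{counter}", text, level, page)
            stack[-1].children.append(node)
            stack.append(node)
        else:
            stack[-1].text += text + "\n"
    return root
```

With PyMuPDF installed, `build_tree(extract_spans("report.pdf"))` would produce the tree for a file; the file name here is only a placeholder.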
Step 2: TREE SEARCH (the key insight)
LLM receives the compact tree (like a Table of Contents with section IDs).
LLM reasons: "Which sections most likely contain the answer?"
Returns a JSON array of section_ids.
→ No similarity math. No embeddings. Just reasoning.
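As a rough sketch of Step 2, the snippet below renders a compact outline, builds a prompt, and parses the model's reply into section IDs. The function names, the outline layout, and the prompt wording are assumptions made for illustration; the actual prompt in rag_engine.py may differ. Only the idea (outline in, JSON array of section IDs out) comes from the description above.

```python
import json
import re

TOP_K_SECTIONS = 4  # mirrors the default listed in the configuration


def render_outline(sections):
    """Render (section_id, title, level, page) tuples as an indented outline."""
    lines = []
    for section_id, title, level, page in sections:
        indent = "  " * (level - 1)
        lines.append(f"{indent}[{section_id}] {title} (p. {page})")
    return "\n".join(lines)


def build_search_prompt(outline, question, top_k=TOP_K_SECTIONS):
    """Ask the model for section IDs only (illustrative wording)."""
    return (
        "You are given the outline of a document.\n\n"
        f"{outline}\n\n"
        f"Question: {question}\n\n"
        f"Reply with a JSON array of at most {top_k} section IDs that most "
        'likely contain the answer, for example ["s2", "s5"]. No other text.'
    )


def parse_section_ids(reply, valid_ids, top_k=TOP_K_SECTIONS):
    """Return known section IDs from the first parseable JSON array in the reply."""
    for match in re.finditer(r"\[.*?\]", reply, flags=re.DOTALL):
        try:
            candidates = json.loads(match.group(0))
        except json.JSONDecodeError:
            continue  # bracketed text that is not JSON, try the next one
        chosen = []
        for item in candidates:
            # Drop hallucinated IDs and duplicates, keep the model's ordering.
            if isinstance(item, str) and item in valid_ids and item not in chosen:
                chosen.append(item)
        return chosen[:top_k]
    return []
```

Validating the reply against the known IDs matters because a model can return IDs that do not exist in the tree or wrap the array in extra prose.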
Step 3: ANSWER
Full text of the chosen sections is retrieved (with page numbers).
LLM synthesises a cited, markdown-formatted answer.
Response is streamed token-by-token to the UI.
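Step 3 might look something like the sketch below. `build_context` and `stream_answer` are hypothetical names, and the prompt text is invented for illustration. The streaming part uses Ollama's `/api/generate` endpoint, which returns one JSON object per line when `stream` is true; whether the project calls `/api/generate` or `/api/chat` is not stated here, so treat that choice as an assumption.

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434"
MAX_SECTION_CHARS = 3000


def build_context(sections, max_chars=MAX_SECTION_CHARS):
    """Join (title, page, text) tuples into one context string with page labels."""
    parts = []
    for title, page, text in sections:
        # Truncate each section so a few long sections cannot flood the prompt.
        parts.append(f"### {title} (page {page})\n{text[:max_chars]}")
    return "\n\n".join(parts)


def stream_answer(context, question, model="llama3.2"):
    """Yield answer fragments as Ollama produces them."""
    prompt = (
        "Answer the question using only the context below. "
        "Cite section titles and page numbers, and format the answer as markdown.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    payload = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    request = urllib.request.Request(
        f"{OLLAMA_BASE}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        for raw_line in response:  # newline-delimited JSON objects
            if not raw_line.strip():
                continue
            chunk = json.loads(raw_line)
            if chunk.get("response"):
                yield chunk["response"]
            if chunk.get("done"):
                break
```

With the defaults above, the retrieved context is bounded by roughly TOP_K_SECTIONS × MAX_SECTION_CHARS = 4 × 3000 = 12,000 characters plus the section labels.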
- Python 3.10+
- Ollama installed and running
- At least one model pulled:

      ollama pull llama3.2   # recommended (fast, good reasoning)
      # or
      ollama pull mistral
      ollama pull phi3
      ollama pull gemma2

    git clone https://github.com/Sdinzsh/RAG.git
    cd RAG
    pip install -r requirements.txt
    ollama serve   # if not already running as a service
    python app.py

Open http://localhost:5000 in your browser.
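If the page loads but no models appear, it can help to check that Ollama is reachable and has at least one model pulled. The sketch below queries Ollama's `GET /api/tags` endpoint, which lists local models. The helper names are hypothetical and this script is not part of the repository.

```python
import json
import urllib.error
import urllib.request


def model_names(tags_payload):
    """Pull model names out of the JSON returned by Ollama's GET /api/tags."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def list_local_models(base_url="http://localhost:11434", timeout=3):
    """Return pulled model names, or None if the Ollama server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return model_names(json.load(response))
    except (urllib.error.URLError, OSError):
        return None
```

Calling `list_local_models()` returns `None` when the server is not running, an empty list when it runs but no model has been pulled, and otherwise the tagged names such as `llama3.2:latest`.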
| Feature | Detail |
|---|---|
| PDF Upload | Drag-and-drop or click to browse (up to 50 MB) |
| Document Tree | Sidebar shows all sections/pages with page numbers |
| Section Highlight | Sections used for each answer are highlighted in the tree |
| Streaming Chat | Answers stream token-by-token like ChatGPT |
| Source Badges | Each answer shows which sections/pages were used |
| Multi-turn Chat | Conversation history maintained per session |
| Model Selector | Switch between available Ollama models from the selector at the top |
| Clear History | Reset conversation context without re-uploading |

    RAG/
    ├── app.py                        ← Flask backend + REST API
    ├── rag_engine.py                 ← Core RAG logic
    │   ├── build_document_tree()     ← PDF → Section tree
    │   ├── find_relevant_sections()  ← LLM tree search
    │   └── answer_query_stream()     ← Full RAG pipeline
    ├── templates/
    │   └── index.html                ← Web UI (dark industrial theme)
    ├── uploads/                      ← Uploaded PDFs (auto-created)
    └── requirements.txt

Edit the top of rag_engine.py:

    OLLAMA_BASE = "http://localhost:11434"   # Ollama server URL
    DEFAULT_MODEL = "llama3.2"               # Default model
    TOP_K_SECTIONS = 4                       # Max sections per query
    MAX_SECTION_CHARS = 3000                 # Chars per section in context

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Web UI |
| GET | `/api/models` | List available Ollama models |
| POST | `/api/upload` | Upload PDF + build document tree |
| POST | `/api/chat` | Chat query → SSE stream |
| GET | `/api/session/<id>` | Session info |
| POST | `/api/session/<id>/clear` | Clear chat history |
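
Since `/api/chat` responds with a Server-Sent Events stream, a non-browser client needs to split that stream into events. The helper below is a minimal sketch of generic SSE parsing. Its name is hypothetical, and because the payload schema of this API is not documented here, each event's data is treated as an opaque string.

```python
def iter_sse_data(lines):
    """Yield the data payload of each Server-Sent Event from an iterable of text lines."""
    buffer = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":  # a blank line terminates the current event
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):  # one optional space after the colon
                value = value[1:]
            buffer.append(value)
        # Other fields (event:, id:, retry:) and ":" comment lines are ignored.
    if buffer:  # lenient: flush an event that was not terminated by a blank line
        yield "\n".join(buffer)
```

The function accepts any iterable of decoded lines, so it can be fed from a streaming HTTP response or from a list of strings when testing.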
- Best models for reasoning: `llama3.2`, `mistral`, `gemma2`, `phi3`
- Large PDFs: The tree search is O(sections), not O(pages), so it works well on big docs
- Structured PDFs (reports, textbooks): heading detection works best
- Scanned PDFs: Won't work; they need OCR pre-processing first
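
One way to catch scanned PDFs early is to look at how much text the parser actually extracted per page. The heuristic below is only a sketch: the function name and both thresholds are arbitrary assumptions, not something the project implements.

```python
def looks_scanned(page_texts, min_chars_per_page=20, max_empty_ratio=0.8):
    """Guess whether a PDF is image-only from the text extracted for each page."""
    if not page_texts:
        return True  # nothing extracted at all
    # Count pages whose extracted text is too short to be real content.
    empty = sum(1 for text in page_texts if len(text.strip()) < min_chars_per_page)
    return empty / len(page_texts) >= max_empty_ratio
```

A document flagged this way could be rejected at upload time with a message asking for an OCR-processed copy.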