A fully local Retrieval-Augmented Generation (RAG) system built using:
- Ollama
- Phi3
- Nomic Embeddings
- ChromaDB
- Streamlit
This project demonstrates the core architecture behind modern AI systems:
- Local LLM inference
- Embedding generation
- Semantic search
- Vector databases
- Retrieval pipelines
- External memory systems
- Context-augmented generation

Key features:
- Fully local AI stack
- Semantic document retrieval
- Vector similarity search
- Local embedding generation
- Retrieval-Augmented Generation (RAG)
- Streamlit-based UI
- ChromaDB vector storage
- Ollama local inference runtime
Documents
↓
Embedding Model (nomic-embed-text)
↓
Vector Embeddings
↓
ChromaDB Vector Store
↓
User Query
↓
Query Embedding
↓
Semantic Similarity Search
↓
Retrieved Context
↓
Phi3 LLM
↓
Generated Response
| Component | Technology |
|---|---|
| LLM Runtime | Ollama |
| Generation Model | phi3 |
| Embedding Model | nomic-embed-text |
| Vector Database | ChromaDB |
| Frontend | Streamlit |
| Language | Python |
local-rag-system/
│
├── app.py
├── knowledge.txt
├── requirements.txt
├── README.md
└── .gitignore
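Clone the repository:
git clone https://github.com/IMRANDIL/local-rag-system-phi
cd local-rag-system-phi
Create a virtual environment:
python -m venv venv
Activate it (Windows Command Prompt or PowerShell):
venv\Scripts\activate
Activate it (Git Bash on Windows):
source venv/Scripts/activate
Install the dependencies:
pip install -r requirements.txt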
Pull the models:
ollama pull phi3
ollama pull nomic-embed-text
Run the application:
streamlit run app.py
The application runs on:
http://localhost:8501
Documents from knowledge.txt are loaded and converted into embeddings using the nomic-embed-text model.
Embeddings are stored in ChromaDB for semantic retrieval.
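
The indexing step can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the helper names (`load_documents`, `ollama_embed`, `index_documents`), the collection name `knowledge`, and the one-document-per-line split are all assumptions made for the example. It uses the `ollama` and `chromadb` Python packages.

```python
from typing import Callable, List, Sequence


def load_documents(raw_text: str) -> List[str]:
    """Split raw text into documents, one per non-empty line (assumed layout)."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


def ollama_embed(text: str) -> List[float]:
    """Embed one text with nomic-embed-text through a local Ollama server."""
    import ollama  # imported lazily so the other helpers work without it

    return list(ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"])


def index_documents(
    collection,
    documents: Sequence[str],
    embed: Callable[[str], List[float]] = ollama_embed,
) -> List[str]:
    """Embed each document and add it to a ChromaDB collection."""
    if not documents:
        return []
    ids = [f"doc-{i}" for i in range(len(documents))]  # hypothetical id scheme
    collection.add(
        ids=ids,
        documents=list(documents),
        embeddings=[embed(doc) for doc in documents],
    )
    return ids


# Example wiring (needs the chromadb and ollama packages and a running Ollama server):
#   import chromadb
#   collection = chromadb.Client().get_or_create_collection("knowledge")
#   with open("knowledge.txt", encoding="utf-8") as fh:
#       index_documents(collection, load_documents(fh.read()))
```

Passing the embedding function as a parameter keeps the indexing logic testable without a running model.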
The user's question is converted into an embedding vector.
ChromaDB performs vector similarity search to retrieve the most relevant context.
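
A hedged sketch of the query step: the question is embedded with the same model used for the documents, then passed to the collection's `query` method. The function name `retrieve` and the default of three results are assumptions for illustration; the actual values in app.py may differ.

```python
from typing import Callable, List


def ollama_embed(text: str) -> List[float]:
    """Embed one text with nomic-embed-text through a local Ollama server."""
    import ollama  # imported lazily so retrieve() can be tested without it

    return list(ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"])


def retrieve(
    collection,
    question: str,
    embed: Callable[[str], List[float]] = ollama_embed,
    k: int = 3,  # assumed number of context documents
) -> List[str]:
    """Return the k stored documents closest to the question embedding."""
    result = collection.query(query_embeddings=[embed(question)], n_results=k)
    # ChromaDB returns one list of documents per query embedding; we sent one.
    return result["documents"][0]
```

The query and the documents need to be embedded by the same model, otherwise their vectors are not comparable.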
The retrieved documents are injected into the final prompt.
Phi3 generates the final response using the retrieved context.
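
Context injection and generation might look like the sketch below. The prompt wording and the function names `build_prompt` and `generate_answer` are illustrative assumptions, not the project's actual template; the model call uses `ollama.generate` from the `ollama` Python package.

```python
from typing import Sequence


def build_prompt(context_docs: Sequence[str], question: str) -> str:
    """Place the retrieved documents ahead of the question in a single prompt."""
    context = "\n\n".join(context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def generate_answer(prompt: str, model: str = "phi3") -> str:
    """Send the prompt to a local Ollama server and return the generated text."""
    import ollama  # imported lazily so build_prompt() works without it

    return ollama.generate(model=model, prompt=prompt)["response"]


# Example (needs a running Ollama server with phi3 pulled):
#   prompt = build_prompt(["RAG combines retrieval with generation."], "What is RAG?")
#   print(generate_answer(prompt))
```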
This project demonstrates:
- Transformer inference
- Embeddings
- Semantic search
- Vector similarity
- Retrieval-Augmented Generation
- External memory systems
- Local AI infrastructure
- Vector databases
- Context injection
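
To make vector similarity concrete, here is a small pure-Python sketch of cosine similarity, one common way to compare embeddings. ChromaDB lets a collection be configured with different distance functions, and which one this project uses is not stated, so treat this as an illustration of the idea rather than the exact metric in use.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score close to 1, orthogonal vectors score 0, and opposite vectors score close to -1, regardless of their lengths.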

Example questions to try:
- What is semantic search?
- Explain RAG architecture
- What is vector similarity?
- How does Redis caching work?

Possible future improvements:
- PDF ingestion
- Chunking pipeline
- Persistent vector database
- Streaming responses
- Chat memory
- Hybrid retrieval
- Metadata filtering
- Reranking
- Multi-document support
- Agent workflows
This project is designed to teach:
- AI systems engineering
- Retrieval pipelines
- Semantic memory architecture
- Local inference systems
- Vector retrieval
- Modern RAG architecture
MIT License