Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

IMRANDIL/local-rag-system-phi

Folders and files

NameName
Last commit message
Last commit date

Latest commit

History

2 Commits

Repository files navigation

🧠 Local RAG System with ChromaDB + Ollama

A fully local Retrieval-Augmented Generation (RAG) system built using:

  • Ollama
  • Phi3
  • Nomic Embeddings
  • ChromaDB
  • Streamlit

This project demonstrates the core architecture behind modern AI systems:

  • Local LLM inference
  • Embedding generation
  • Semantic search
  • Vector databases
  • Retrieval pipelines
  • External memory systems
  • Context-augmented generation

πŸš€ Features

  • Fully local AI stack
  • Semantic document retrieval
  • Vector similarity search
  • Local embedding generation
  • Retrieval-Augmented Generation (RAG)
  • Streamlit-based UI
  • ChromaDB vector storage
  • Ollama local inference runtime

🧠 Architecture

Documents
↓
Embedding Model (nomic-embed-text)
↓
Vector Embeddings
↓
ChromaDB Vector Store
↓
User Query
↓
Query Embedding
↓
Semantic Similarity Search
↓
Retrieved Context
↓
Phi3 LLM
↓
Generated Response

πŸ“¦ Tech Stack

Component Technology
LLM Runtime Ollama
Generation Model phi3
Embedding Model nomic-embed-text
Vector Database ChromaDB
Frontend Streamlit
Language Python

πŸ“ Project Structure

local-rag-system/
β”‚
β”œβ”€β”€ app.py
β”œβ”€β”€ knowledge.txt
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ README.md
└── .gitignore

βš™οΈ Installation

1. Clone Repository

git clone https://github.com/IMRANDIL/local-rag-system-phi
cd local-rag-system

2. Create Virtual Environment

Windows

python -m venv venv
venv\Scripts\activate

Git Bash

source venv/Scripts/activate

3. Install Dependencies

pip install -r requirements.txt

πŸ€– Install Ollama Models

Pull Phi3

ollama pull phi3

Pull Embedding Model

ollama pull nomic-embed-text

▢️ Run Application

streamlit run app.py

Application runs on:

http://localhost:8501

🧩 How It Works

Step 1 β€” Document Ingestion

Documents from knowledge.txt are loaded and converted into embeddings using:

nomic-embed-text

Step 2 β€” Vector Storage

Embeddings are stored in ChromaDB for semantic retrieval.


Step 3 β€” Query Embedding

User question is converted into embedding vector.


Step 4 β€” Semantic Retrieval

ChromaDB performs vector similarity search to retrieve the most relevant context.


Step 5 β€” Context Injection

Retrieved documents are injected into the final prompt.


Step 6 β€” Local LLM Generation

Phi3 generates the final response using retrieved context.


🧠 Core AI Concepts Demonstrated

This project demonstrates:

  • Transformer inference
  • Embeddings
  • Semantic search
  • Vector similarity
  • Retrieval-Augmented Generation
  • External memory systems
  • Local AI infrastructure
  • Vector databases
  • Context injection

πŸ“Œ Example Questions

What is semantic search?
Explain RAG architecture
What is vector similarity?
How does Redis caching work?

πŸ”₯ Future Improvements

  • PDF ingestion
  • Chunking pipeline
  • Persistent vector database
  • Streaming responses
  • Chat memory
  • Hybrid retrieval
  • Metadata filtering
  • Reranking
  • Multi-document support
  • Agent workflows

🧠 Learning Goals

This project is designed to teach:

  • AI systems engineering
  • Retrieval pipelines
  • Semantic memory architecture
  • Local inference systems
  • Vector retrieval
  • Modern RAG architecture

πŸ“œ License

MIT License

About

local semantic retrieval + RAG system.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

Contributors

Languages

AltStyle γ«γ‚ˆγ£γ¦ε€‰ζ›γ•γ‚ŒγŸγƒšγƒΌγ‚Έ (->γ‚ͺγƒͺγ‚ΈγƒŠγƒ«) /