Lightweight, private, and customizable retrieval-augmented chatbot running entirely on your Mac.
Based on the excellent work by pruthvirajcyn and his Medium article.
This is my personal implementation of a local RAG (Retrieval-Augmented Generation) chatbot using:
- Ollama for running open-source LLMs and embedding models locally.
- Streamlit for a clean and interactive chat UI.
- ChromaDB for storing and querying vector embeddings.
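The pieces fit together in a simple loop: embed the question, retrieve the most similar chunks from ChromaDB, and pass them to the LLM as context. A minimal sketch of that flow, assuming the `ollama` and `chromadb` Python packages and the model tags pulled below; the store path (`chroma`) and collection name (`docs`) are placeholders, not necessarily what this repo uses:

```python
import chromadb
import ollama

# Open the persisted vector store (path and collection name are assumptions;
# see load_docs.py for the values this repo actually uses).
client = chromadb.PersistentClient(path="chroma")
collection = client.get_or_create_collection(name="docs")

question = "What does the report say about Q3 revenue?"

# 1. Embed the question locally with the embedding model.
emb = ollama.embeddings(
    model="toshk0/nomic-embed-text-v2-moe:Q6_K", prompt=question
)["embedding"]

# 2. Retrieve the top-K most similar document chunks from ChromaDB.
docs = collection.query(query_embeddings=[emb], n_results=5)["documents"][0]
context = "\n\n".join(docs)

# 3. Generate an answer grounded in the retrieved chunks.
response = ollama.chat(
    model="gemma3n",
    messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
)
print(response["message"]["content"])
```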
As of 17 July 2025, I'm using:
- Embedding model: `nomic-embed-text-v2-moe`
- LLM: `gemma3n`
- Privacy: No data is sent to the cloud. Upload and query your documents entirely offline.
- Cost-effective: No API tokens or cloud GPU costs; you only pay for electricity.
- Better than summarizing: With long PDFs or multiple documents, even a summary may not contain the context you need. A RAG chatbot can drill deeper and provide contextual answers.
Recommended: At least 16GB of RAM on your Mac, preferably 24GB+ for a smoother experience.
```bash
git clone https://github.com/eplt/RAG_Ollama_Mac.git
cd RAG_Ollama_Mac
python3 -m venv venv
source venv/bin/activate
pip install -r ./src/requirements.txt
```

Start the Ollama server (keep it running in its own terminal tab), then pull the models:

```bash
ollama serve
ollama pull gemma3n
ollama pull toshk0/nomic-embed-text-v2-moe:Q6_K
```
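Before loading documents, it can be worth confirming that both models respond. A quick smoke test using the `ollama` Python package (exact response fields vary slightly between package versions):

```python
import ollama

# Both models should appear in the local model list after the pulls above.
print(ollama.list())

# Smoke-test the embedding model: prints the embedding's dimensionality.
emb = ollama.embeddings(model="toshk0/nomic-embed-text-v2-moe:Q6_K", prompt="hello")
print(len(emb["embedding"]))
```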
Place your `.pdf` files in the `data/` directory, then build the vector database:

```bash
python ./src/load_docs.py
```

To reset and reload the vector database:

```bash
python ./src/load_docs.py --reset
```
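Under the hood, `load_docs.py` reads each PDF, splits it into overlapping chunks, embeds each chunk, and writes the results to ChromaDB. A simplified sketch of that loading step, assuming the `pypdf`, `chromadb`, and `ollama` Python packages; the chunk sizes, store path (`chroma`), and collection name (`docs`) are illustrative, not necessarily the script's exact values:

```python
from pathlib import Path

import chromadb
import ollama
from pypdf import PdfReader

CHUNK_SIZE, OVERLAP = 800, 100  # characters; illustrative defaults

client = chromadb.PersistentClient(path="chroma")
collection = client.get_or_create_collection(name="docs")

for pdf_path in Path("data").glob("*.pdf"):
    # Extract plain text from every page of the PDF.
    text = "".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)

    # Split into overlapping character chunks so answers spanning
    # chunk borders are still retrievable.
    step = CHUNK_SIZE - OVERLAP
    chunks = [text[i : i + CHUNK_SIZE] for i in range(0, len(text), step)]

    for i, chunk in enumerate(chunks):
        embedding = ollama.embeddings(
            model="toshk0/nomic-embed-text-v2-moe:Q6_K", prompt=chunk
        )["embedding"]
        collection.add(
            ids=[f"{pdf_path.stem}-{i}"],
            documents=[chunk],
            embeddings=[embedding],
        )
```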
Launch the chat UI:

```bash
streamlit run ./src/UI.py
```
Ask questions and the chatbot will respond using relevant context retrieved from your documents.
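For orientation, here is a stripped-down version of what such a Streamlit chat loop can look like. This is a sketch in the spirit of `UI.py`, not its exact code, and it reuses the retrieval flow shown earlier (same assumed store path and collection name):

```python
import chromadb
import ollama
import streamlit as st

client = chromadb.PersistentClient(path="chroma")
collection = client.get_or_create_collection(name="docs")

st.title("Local RAG Chatbot")

if question := st.chat_input("Ask about your documents"):
    st.chat_message("user").write(question)

    # Embed the question and fetch the most relevant chunks.
    emb = ollama.embeddings(
        model="toshk0/nomic-embed-text-v2-moe:Q6_K", prompt=question
    )["embedding"]
    docs = collection.query(query_embeddings=[emb], n_results=5)["documents"][0]

    # Ask the LLM to answer from the retrieved context only.
    prompt = (
        "Answer using only this context:\n"
        + "\n\n".join(docs)
        + f"\n\nQuestion: {question}"
    )
    answer = ollama.chat(model="gemma3n", messages=[{"role": "user", "content": prompt}])
    st.chat_message("assistant").write(answer["message"]["content"])
```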
- **Modify Prompts**: Update the prompt templates in `UI.py` to guide the chatbot's tone or behavior.
- **Try Different Models**: Ollama supports various LLMs and embedding models. Run `ollama list` to see what's available, or try pulling new ones.
- **Tune Retrieval Parameters**: Adjust chunk size, overlaps, or top-K retrieval values in `load_docs.py` for improved performance (see the sketch after this list).
- **Extend the Interface**: Add features like file upload, chat history, user authentication, or export options using Streamlit's powerful features.
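As an example of those knobs, here is a hypothetical prompt template and a set of retrieval parameters; the names and values below are placeholders to experiment with, not the ones shipped in `load_docs.py` or `UI.py`:

```python
# Hypothetical tuning knobs; the real ones live in load_docs.py / UI.py.
CHUNK_SIZE = 800      # characters per chunk: larger = more context per hit
CHUNK_OVERLAP = 100   # overlap between neighboring chunks
TOP_K = 5             # chunks retrieved per question

# Hypothetical prompt template to steer tone and keep answers grounded.
PROMPT_TEMPLATE = """You are a careful assistant. Answer strictly from the
context below. If the answer is not in the context, say you don't know.

Context:
{context}

Question: {question}
"""
```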
- **Ollama not running?** Make sure `ollama serve` is active in a terminal tab (see the check after this list).
- **Missing models?** Run `ollama list` to verify the models downloaded correctly.
- **Dependency issues?** Double-check your Python version (3.7+) and re-create the virtual environment.
- **Streamlit errors?** Ensure you're running the app from the correct path and have activated your virtual environment.
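A quick way to check that the Ollama server is reachable; by default it listens on port 11434 and replies with a plain status string:

```python
import urllib.request

try:
    # Ollama's default address; it answers "Ollama is running" when up.
    with urllib.request.urlopen("http://localhost:11434", timeout=3) as resp:
        print(resp.read().decode())
except OSError:
    print("Ollama is not reachable; start it with `ollama serve`.")
```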
- Planning to support non-PDF formats (Markdown, `.txt`, maybe HTML).
- Will experiment with additional LLMs like `phi-3`, `mistral`, and `llama3`.
- Might integrate chat history persistence and better document management.
Local RAG is now more accessible than ever. With powerful small models and tools like Ollama, anyone can build a private, intelligent assistant: no cloud needed.
If you found this useful or have ideas to improve it, feel free to open a PR or drop a star ⭐️