Commit 9adb41d


Create RAG boilerplate using langchain

1 parent eb60836, commit 9adb41d


2 files changed, +82 -0 lines changed

R/RAG_with_langchain_boilerplate
- RAG_boilerplate.py
- README.md


`R/RAG_with_langchain_boilerplate/RAG_boilerplate.py`

Lines changed: 50 additions & 0 deletions

```python
from langchain_community.llms import Ollama
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain


def create_RAG_model(input_file, model_name="dolphin-phi"):
    """Build a retrieval chain that answers questions about one PDF.

    input_file: path to the PDF to index
    model_name: name of the Ollama chat model, e.g. "dolphin-phi" or "mistral"
    """
    # Create the LLM (Large Language Model) served by the local Ollama instance
    llm = Ollama(model=model_name)
    # Define the model used to embed the text chunks (and, later, each question)
    embeddings = OllamaEmbeddings(model="nomic-embed-text")
    # Load the PDF (PyPDFLoader returns one Document per page)
    loader = PyPDFLoader(input_file)
    docs = loader.load()
    # Split the text into chunks and embed them into the vector DB
    text_splitter = RecursiveCharacterTextSplitter()
    splits = text_splitter.split_documents(docs)
    vector_store = FAISS.from_documents(splits, embeddings)

    # Prompt generation: giving the LLM its character and purpose.
    # {context} receives the retrieved chunks, {input} the user's question.
    prompt = ChatPromptTemplate.from_template(
        """
Answer the following question based only on the given context.

<context>
{context}
</context>

Question: {input}
"""
    )
    # Link the LLM, the vector DB (wrapped as a retriever) and the prompt
    docs_chain = create_stuff_documents_chain(llm, prompt)
    retriever = vector_store.as_retriever()
    retrieval_chain = create_retrieval_chain(retriever, docs_chain)
    return retrieval_chain


# Using the retrieval chain
# Example (remove the surrounding ''' lines to run it):

'''
chain = create_RAG_model("your_file_here.pdf", "mistral")
output = chain.invoke({"input": "What is the purpose of RAG?"})
print(output["answer"])
'''
```
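`create_stuff_documents_chain` follows the "stuff" approach: all retrieved chunks are placed into the prompt's `{context}` slot and sent to the LLM in a single call. The plain-Python sketch below illustrates only that idea, without LangChain or Ollama. `stuff_prompt` is a hypothetical helper, and joining chunks with a blank line is an assumption made for illustration, not a description of LangChain's internals.

```python
# Illustrative sketch of the "stuff" strategy (not LangChain's actual implementation).
# Assumption: retrieved chunks are joined with a blank line before filling {context}.

PROMPT_TEMPLATE = """
Answer the following question based only on the given context.

<context>
{context}
</context>

Question: {input}
"""


def stuff_prompt(chunks: list[str], question: str, separator: str = "\n\n") -> str:
    """Fill the prompt template with every retrieved chunk at once (hypothetical helper)."""
    context = separator.join(chunks)
    return PROMPT_TEMPLATE.format(context=context, input=question)


if __name__ == "__main__":
    retrieved = [
        "RAG retrieves relevant text chunks from a vector store.",
        "The retrieved chunks are given to the LLM as context.",
    ]
    print(stuff_prompt(retrieved, "What is the purpose of RAG?"))
```

Because every chunk ends up in one prompt, the combined text has to fit in the model's context window, which is one reason the retriever hands over only the top few matching chunks rather than the whole PDF.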

`R/RAG_with_langchain_boilerplate/README.md`

Lines changed: 32 additions & 0 deletions

# RAG boilerplate in Python using LangChain #

### About RAG ###
Hi there! This is a RAG (Retrieval-Augmented Generation) boilerplate/template in Python. RAG is a technique that connects external input sources (PDFs in this case) to LLMs (Large Language Models): relevant passages from the sources are handed to the LLM along with each question, so it can answer questions about information it was never trained on.

### How RAG works ###
RAG works by splitting the input file(s) into smaller chunks of text (here along paragraph, line and word boundaries) and converting each chunk into an embedding, a vector of numbers that captures its meaning, using an embedding model (`nomic-embed-text` in this boilerplate). The embeddings are stored in a vector database (FAISS here), which is built for fast similarity search over vectors. When the user asks a question, the question is embedded with the same model and compared against the stored vectors; it is not added to the database. The chunks whose embeddings are closest to the question's are the most semantically related, so they are retrieved and passed to the LLM as context together with the question, and the LLM generates its answer from that context.
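The retrieval steps above can be sketched in plain Python. This is a toy model under stated assumptions: `chunk_text` uses fixed-size character windows with overlap (a simplification of `RecursiveCharacterTextSplitter`), `embed` uses word counts as a stand-in for a real embedding model such as `nomic-embed-text`, and a plain list stands in for FAISS. All three helpers are hypothetical; only the shape of the pipeline (chunk, embed, embed the question, rank by cosine similarity) carries over to the real script.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping fixed-size character windows (simplified splitter)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag of lowercase words (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Embed the query and return the k chunks whose vectors are most similar to it."""
    index = [(chunk, embed(chunk)) for chunk in chunks]  # the toy "vector DB"
    q = embed(query)  # the query is embedded for comparison, not stored
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


if __name__ == "__main__":
    docs = [
        "The vector database stores one embedding vector per chunk.",
        "Bananas are a good source of potassium.",
    ]
    print(retrieve("Where is each chunk embedding stored?", docs, k=1))
```

A real embedding model produces dense vectors that reflect meaning rather than exact words, so it can match a question to a chunk that uses different wording; this word-count toy can only match shared words.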
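
### Installation and use ###
1) Install dependencies

```
pip3 install langchain-community
pip3 install langchain-core
pip3 install langchain-text-splitters
pip3 install langchain
pip3 install pypdf faiss-cpu
```

`PyPDFLoader` needs `pypdf`, and the FAISS vector store needs `faiss-cpu`. The script also expects a local Ollama server to be running, with the embedding model and the chat model you plan to use already pulled (for example `ollama pull nomic-embed-text` and `ollama pull dolphin-phi`).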
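To find out which Python dependencies are still missing before running the script, a small standard-library check like the one below can help. The module list is an assumption based on the imports in `RAG_boilerplate.py` plus the packages that `PyPDFLoader` (`pypdf`) and the FAISS store (`faiss`, installed by `faiss-cpu`) rely on; `missing_modules` is a hypothetical helper, not part of the boilerplate.

```python
import importlib.util

# Import names (not pip package names) the boilerplate relies on.
# Assumption: pypdf backs PyPDFLoader and faiss (from faiss-cpu) backs the FAISS store.
REQUIRED_MODULES = [
    "langchain",
    "langchain_community",
    "langchain_core",
    "langchain_text_splitters",
    "pypdf",
    "faiss",
]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

This only checks that the packages are importable; it does not check that the Ollama server is running or that the models have been pulled.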

2) Use the template

Refer to the example in the triple-quoted block at the end of `RAG_boilerplate.py` and modify it to suit your needs (remember to uncomment it by removing the surrounding `'''` lines).

3) Download a PDF input file and run the script

Download a PDF to run RAG on, put its path in the code in place of `your_file_here.pdf`, and run the script with `python3 RAG_boilerplate.py`.

4) Happy learning!

0 commit comments
