RAGFlowChainR


Overview

RAGFlowChainR is an R package that brings Retrieval-Augmented Generation (RAG) capabilities to R, inspired by LangChain. It enables intelligent retrieval of documents from a local vector store (DuckDB), optional web search, and seamless integration with Large Language Models (LLMs).

Features include:

- Ingestion of local documents (PDF, PPTX, TXT) and crawled web pages
- A DuckDB-backed vector store with vector-similarity and full-text indexes
- Optional web search (via Tavily)
- RAG chain querying with pluggable LLM providers

Related projects:

- Python version: RAGFlowChain (PyPI)
- GitHub (Python): RAGFlowChain


Installation

Install the released version from CRAN:

 install.packages("RAGFlowChainR")

Development version

To get the latest features or bug fixes, you can install the development version of RAGFlowChainR from GitHub:

 # If needed
 install.packages("remotes")

 remotes::install_github("knowusuboaky/RAGFlowChainR")

See the full function reference or the package website for more details.


πŸ” Environment Setup

 Sys.setenv(TAVILY_API_KEY = "your-tavily-api-key")
 Sys.setenv(OPENAI_API_KEY = "your-openai-api-key")
 Sys.setenv(GROQ_API_KEY = "your-groq-api-key")
 Sys.setenv(ANTHROPIC_API_KEY = "your-anthropic-api-key")

To persist across sessions, add these to your ~/.Renviron file.
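A `~/.Renviron` entry is a plain `KEY=value` line per variable (no `Sys.setenv()` call, no quotes required); restart R or run `readRenviron("~/.Renviron")` to pick the values up. For example:

```
TAVILY_API_KEY=your-tavily-api-key
OPENAI_API_KEY=your-openai-api-key
GROQ_API_KEY=your-groq-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
```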


Usage

1. Data Ingestion

 library(RAGFlowChainR)
 
local_files <- c(
  "tests/testthat/test-data/sprint.pdf",
  "tests/testthat/test-data/introduction.pptx",
  "tests/testthat/test-data/overview.txt"
)
website_urls <- c("https://www.r-project.org")
crawl_depth <- 1
 
response <- fetch_data(
  local_paths = local_files,
  website_urls = website_urls,
  crawl_depth = crawl_depth
)
response
 #>                        source                                       title ...
 #> 1        documents/sprint.pdf                                        <NA> ...
 #> 2 documents/introduction.pptx                                        <NA> ...
 #> 3      documents/overview.txt                                        <NA> ...
 #> 4   https://www.r-project.org R: The R Project for Statistical Computing ...
 #> ...
 
 cat(response$content[1])
 #> Getting Started with Scrum\nCodeWithPraveen.com ...

2. Vector Store Creation and Search

con <- create_vectorstore("tests/testthat/test-data/my_vectors.duckdb", overwrite = TRUE)
 
docs <- data.frame(head(response)) # reuse from fetch_data()
 
insert_vectors(
  con = con,
  df = docs,
  embed_fun = embed_openai(),
  chunk_chars = 12000
)
 
 build_vector_index(con, type = c("vss", "fts"))
 
response <- search_vectors(con, query_text = "Tell me about R?", top_k = 5)
response
 #> id page_content dist
 #> 1 5 [Home]\nDownload\nCRAN\nR Project...\n... 0.2183
 #> 2 6 [Home]\nDownload\nCRAN\nR Project...\n... 0.2183
 #> ...
 
 cat(response$page_content[1])
 #> [Home]\nDownload\nCRAN\nR Project\nAbout R\nLogo\n...
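When you are finished with the store, it is good practice to close the connection. This sketch assumes the object returned by `create_vectorstore()` is a standard DBI/duckdb connection, which may not hold in every version of the package:

```r
# Hypothetical cleanup, assuming `con` is a DBI connection to DuckDB;
# `shutdown = TRUE` is duckdb-specific and releases the database file.
DBI::dbDisconnect(con, shutdown = TRUE)
```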

3. RAG Chain Querying

rag_chain <- create_rag_chain(
 llm = call_llm,
 vector_database_directory = "tests/testthat/test-data/my_vectors.duckdb",
 method = "DuckDB",
 embedding_function = embed_openai(),
 use_web_search = FALSE
)
 
response <- rag_chain$invoke("Tell me about R")
response
 #> $input
 #> [1] "Tell me about R"
 #>
 #> $chat_history
 #> [[1]] $role: "human", $content: "Tell me about R"
 #> [[2]] $role: "assistant", $content: "R is a programming language..."
 #>
 #> $answer
 #> [1] "R is a programming language and software environment commonly used for statistical computing and graphics..."
 
 cat(response$answer)
 #> R is a programming language and software environment commonly used for statistical computing and graphics...
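Because each response carries `$chat_history`, follow-up questions can build on earlier turns in the same session. A sketch of a second call (the question text is illustrative):

```r
# Follow-up in the same session; the chain supplies prior turns as context
followup <- rag_chain$invoke("What is it commonly used for?")
cat(followup$answer)
```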

LLM Support

The chatLLM package (now available on CRAN πŸŽ‰) offers a modular interface for interacting with LLM providers, including OpenAI, Groq, Anthropic, DeepSeek, DashScope, and GitHub Models:

 install.packages("chatLLM")

A single call_llm() invocation works across providers:

 call_llm(
   prompt = "Summarize the capital of France.",
   provider = "groq",
   model = "llama3-8b",
   temperature = 0.7,
   max_tokens = 200
 )


License

MIT Β© Kwadwo Daddy Nyame Owusu Boakye
