VuBacktracking/bert-faiss-qa-system

Folders and files

app
assets
cfg
utils
.gitignore
README.md
app.py
faiss_index.py
qa-system.py
requirements.txt
trainer.py

Q&A System using BERT and Faiss Vector Database

Overview

This project is a question-answering system that uses DistilBERT for text representation and Faiss (Facebook AI Similarity Search) for efficient similarity search over a vector database. Given a user query, it retrieves the most relevant documents from a large collection and extracts an answer from them.

[Image: workflow]

Features

- DistilBERT-based Text Representation: Uses the DistilBERT model to convert questions and documents into dense vector representations.
- Faiss Vector Database: Stores the vector representations of the documents for fast similarity search.
- Efficient Retrieval: Finds the documents most relevant to a given question by running a similarity search against the Faiss vector database.

Installation

Requirements

- Python 3.x
- PyTorch
- Transformers
- Faiss
- Streamlit (for the web-based interface)

Setup

Clone the repository:

git clone https://github.com/VuBacktracking/bert-faiss-qa-system.git

Install the dependencies:

pip install -r requirements.txt

Train and Download the DistilBERT model:

python3 trainer.py

Note: The fine-tuned model is available on Hugging Face: https://huggingface.co/vubacktracking/distilbert-base-uncased-finetuned-squad2
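
As a rough illustration of how the published checkpoint could be used for extractive question answering, the sketch below loads it through the Transformers `pipeline` API. The helper name `answer_question` is hypothetical and is not taken from this repository. Calling it downloads the model from the Hugging Face Hub, so it needs network access on first use.

```python
# Model ID taken from the Hugging Face link above.
MODEL_ID = "vubacktracking/distilbert-base-uncased-finetuned-squad2"


def answer_question(question, context, model_id=MODEL_ID):
    """Run extractive QA on one context passage.

    Hypothetical helper: returns the pipeline's result dict, which
    contains the keys "answer", "score", "start" and "end".
    """
    # Imported lazily so that defining this helper does not require
    # transformers (and PyTorch) to be installed.
    from transformers import pipeline

    qa = pipeline("question-answering", model=model_id)
    return qa(question=question, context=context)
```

For example, `answer_question("What is Faiss used for?", "Faiss is a library for efficient similarity search of dense vectors.")` would return a dict whose `"answer"` field is a span copied from the context string.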

Build the Faiss vector database:

python3 faiss_index.py
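
The contents of `faiss_index.py` are not shown here, so the following is only a sketch of what an index-building step might look like, assuming an exact (flat) L2 index over precomputed document embeddings. The function name `build_index` and the optional `path` argument are hypothetical.

```python
def build_index(embeddings, path=None):
    """Build an exact L2 Faiss index over document embeddings.

    `embeddings` is an (n_docs, dim) array-like, e.g. dim = 768 for
    DistilBERT base. Hypothetical helper, not the repository's code.
    """
    # Lazy imports: requires numpy and faiss-cpu (or faiss-gpu).
    import numpy as np
    import faiss

    # Faiss expects a C-contiguous float32 matrix.
    vectors = np.ascontiguousarray(embeddings, dtype=np.float32)

    index = faiss.IndexFlatL2(vectors.shape[1])  # brute-force L2 search
    index.add(vectors)                           # doc i gets id i

    if path is not None:
        # Persist to disk; reload later with faiss.read_index(path).
        faiss.write_index(index, path)
    return index
```

A flat index compares the query against every stored vector, which is exact but scales linearly with the collection size. For larger collections, Faiss also offers approximate index types.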

[Image: workflow]

Usage

Streamlit Web App Interface

streamlit run app.py

Open your web browser and navigate to http://localhost:8501/ to use the web-based Q&A system.

How it Works

BERT Embeddings:
- The preprocessed text is converted into vector embeddings using the DistilBERT model.
Faiss Indexing:
- The DistilBERT embeddings of the documents are indexed in the Faiss vector database.
Query Processing:
- When a user inputs a question, the question is converted into a DistilBERT embedding.
- Faiss is used to find the most similar embeddings (i.e., the most relevant documents) to the question embedding.
Answer Extraction:
- The relevant documents are ranked, and the most relevant answer passages are extracted and presented to the user.
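
The steps above can be sketched in plain Python, without the model or Faiss installed. In this illustration, `mean_pool` stands in for turning per-token DistilBERT vectors into one embedding (mean pooling is an assumption; the repository may pool differently), `search` performs the exhaustive nearest-neighbour lookup that a flat Faiss index computes, and `best_answer` keeps the highest-scoring answer among per-document candidates. All function names and the toy vectors are hypothetical.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask is 1 into one embedding."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def search(query, doc_embeddings, k=2):
    """Return (distance, doc_id) pairs for the k closest documents.

    Uses Euclidean distance. Faiss's IndexFlatL2 reports squared
    distances instead, which yields the same ranking.
    """
    scored = [
        (math.dist(query, emb), doc_id)
        for doc_id, emb in enumerate(doc_embeddings)
    ]
    return sorted(scored)[:k]


def best_answer(candidates):
    """Pick the candidate answer with the highest reader score."""
    return max(candidates, key=lambda c: c["score"])


# Toy data: three "token" vectors, the last one being padding.
token_vectors = [[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]]
query = mean_pool(token_vectors, [1, 1, 0])  # [2.0, 1.0]

# Toy document embeddings; list position is the document id.
docs = [[0.0, 0.0], [2.0, 1.5], [5.0, 5.0]]
hits = search(query, docs, k=2)  # document 1 first, then document 0

# Candidate answers, e.g. one per retrieved document.
candidates = [
    {"answer": "from doc 1", "score": 0.9},
    {"answer": "from doc 0", "score": 0.2},
]
print(best_answer(candidates)["answer"])  # from doc 1
```

In the real system, the embeddings would come from DistilBERT and the lookup from the Faiss index, but the data flow (embed, search, rank, extract) follows the same shape.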

Demo

Extractive Q&A

[Image: workflow]

Closed Generative Q&A

[Image: workflow]

Acknowledgments
