VuBacktracking/bert-faiss-qa-system


Q&A System using BERT and Faiss Vector Database




Overview

This project is a Question & Answer system implemented using DistilBERT for text representation and Faiss (Facebook AI Similarity Search) for efficient similarity search in a vector database. The system is designed to provide accurate and relevant answers to user queries by searching through a large collection of documents.

[Image: workflow diagram]

Features

  • DistilBERT-based Text Representation: Utilizes the DistilBERT model to convert questions and documents into dense vector representations.

  • Faiss Vector Database: Stores the vector representations of the documents for fast similarity search.

  • Efficient Retrieval: Finds the most relevant documents to a given question by performing efficient similarity searches in the Faiss vector database.
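To make the text-representation feature concrete, here is a minimal sketch of one common way to turn DistilBERT token outputs into a single dense vector: masked mean pooling. This is an assumption for illustration; the repository may pool differently (for example, by taking the first token's vector). The names `masked_mean_pool` and `embed`, and the `distilbert-base-uncased` checkpoint, are not taken from the project.

```python
from typing import List, Sequence


def masked_mean_pool(
    token_vectors: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def embed(text: str, model_name: str = "distilbert-base-uncased") -> List[float]:
    """Embed one text with DistilBERT (needs torch and transformers installed).

    The checkpoint name is an assumption, not the project's configuration.
    """
    # Imported lazily so the pooling helper above works without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # Shape (tokens, hidden_size) after dropping the batch dimension.
        hidden = model(**inputs).last_hidden_state[0]
    return masked_mean_pool(hidden.tolist(), inputs["attention_mask"][0].tolist())
```

Calling `embed("What is Faiss?")` would download the checkpoint on first use and return one vector per input text; the same function would be applied to documents and questions so that both live in the same vector space.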


Installation

Requirements

  • Python 3.x
  • PyTorch
  • Transformers
  • Faiss
  • Streamlit (for the web-based interface)

Setup

  1. Clone the repository:
git clone https://github.com/VuBacktracking/bert-faiss-qa-system.git
  2. Install the dependencies:
pip install -r requirements.txt
  3. Train and download the DistilBERT model:
python3 trainer.py

Note: You can find my fine-tuned model at https://huggingface.co/vubacktracking/distilbert-base-uncased-finetuned-squad2

  4. Build the Faiss vector database:
python3 faiss_index.py



Usage

Streamlit Web App Interface

streamlit run app.py

Open your web browser and navigate to http://localhost:8501/ to use the web-based Q&A system.

How it Works

  1. DistilBERT Embeddings:

    • The preprocessed document text is converted into dense vector embeddings using the DistilBERT model.
  2. Faiss Indexing:

    • The DistilBERT embeddings of the documents are indexed in the Faiss vector database.
  3. Query Processing:

    • When a user inputs a question, the question is converted into a DistilBERT embedding.
    • Faiss is used to find the most similar embeddings (i.e., the most relevant documents) to the question embedding.
  4. Answer Extraction:

    • The relevant documents are ranked, and the most relevant answer passages are extracted and presented to the user.
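The query-processing step above can be sketched as an exact nearest-neighbour search. The pure-Python function below is a stand-in that computes what a flat L2 index computes (squared Euclidean distance to every stored vector), and `search_with_faiss` shows the same search with `faiss.IndexFlatL2`. The index type and both function names are assumptions for illustration; the project's `faiss_index.py` may build its index differently.

```python
from typing import List, Sequence, Tuple


def search_flat_l2(
    doc_vectors: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (doc_id, squared L2 distance) for the k closest documents."""
    scored = [
        (doc_id, sum((d - q) ** 2 for d, q in zip(vec, query)))
        for doc_id, vec in enumerate(doc_vectors)
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


def search_with_faiss(
    doc_vectors: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
) -> List[Tuple[int, float]]:
    """Same search using Faiss (needs faiss and numpy installed)."""
    # Imported lazily so the pure-Python version runs without these packages.
    import faiss
    import numpy as np

    docs = np.asarray(doc_vectors, dtype="float32")
    index = faiss.IndexFlatL2(docs.shape[1])  # exact search, no training step
    index.add(docs)
    distances, ids = index.search(np.asarray([query], dtype="float32"), k)
    return list(zip(ids[0].tolist(), distances[0].tolist()))


if __name__ == "__main__":
    docs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    print(search_flat_l2(docs, [0.9, 1.2], k=2))
```

In a real run the document vectors would be the DistilBERT embeddings from step 1, and the returned ids would be used to look up the passages handed to the answer-extraction step.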

Demo

Extractive Q&A

[Image: Extractive Q&A demo]
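In extractive Q&A, a reader model fine-tuned on SQuAD-style data typically scores every token as a possible answer start and as a possible answer end, and the answer is the span with the best combined score. Below is a minimal sketch of that span selection, assuming the scores are already available as plain lists. `best_span` and `max_len` are illustrative names, not the project's code.

```python
from typing import Sequence, Tuple


def best_span(
    start_scores: Sequence[float],
    end_scores: Sequence[float],
    max_len: int = 30,
) -> Tuple[int, int]:
    """Return (start, end) token indices of the highest-scoring answer span.

    A span is valid when start <= end and it covers at most max_len tokens.
    """
    best = (0, 0)
    best_score = float("-inf")
    for start, start_score in enumerate(start_scores):
        last = min(start + max_len, len(end_scores))
        for end in range(start, last):
            score = start_score + end_scores[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best
```

The returned indices are inclusive token positions in the retrieved passage; mapping them back to characters gives the answer text shown to the user.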

Closed Generative Q&A

[Image: Closed Generative Q&A demo]


Acknowledgments
