vector-embedding-api
provides a Flask API server and client for generating text embeddings with either OpenAI's embedding model or the SentenceTransformers library. The API server now supports in-memory LRU caching for faster repeated retrievals, batch processing for handling multiple texts in a single request, and a health status endpoint for monitoring the server. SentenceTransformers supports over 500 models via the HuggingFace Hub.
- POST endpoint to create text embeddings
- Supports SentenceTransformers models
- Supports OpenAI's text-embedding-ada-002 model
- In-memory LRU cache for quick retrieval of embeddings
- Batch processing to handle multiple texts in a single request
- Easy setup with configuration file
- Health status endpoint
- Python client utility for submitting text or files
To run this server locally, follow the steps below:
Clone the repository:
git clone https://github.com/deadbits/vector-embedding-api.git
cd vector-embedding-api
Set up a virtual environment (optional but recommended):
virtualenv -p /usr/bin/python3.10 venv
source venv/bin/activate
Install the required dependencies:
pip install -r requirements.txt
Modify the server.conf configuration file:
[main]
openai_api_key = YOUR_OPENAI_API_KEY
sent_transformers_model = sentence-transformers/all-MiniLM-L6-v2
use_cache = true/false
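For reference, here is a minimal sketch of how these settings could be loaded with Python's standard configparser; the section and option names mirror the file above, but the actual loading code in server.py is not shown here:

```python
import configparser

# Parse server.conf; option names follow the [main] section shown above.
config = configparser.ConfigParser()
config.read('server.conf')

openai_api_key = config.get('main', 'openai_api_key')
model_name = config.get('main', 'sent_transformers_model')
use_cache = config.getboolean('main', 'use_cache')  # accepts true/false

print(f'model={model_name}, cache enabled={use_cache}')
```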
Start the server:
python server.py
The server should now be running on http://127.0.0.1:5000/.
A small Python client is provided to assist with submitting text strings or files.
Usage
python3 client.py -t "Your text here" -m local
python3 client.py -f /path/to/yourfile.txt -m openai
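The source of client.py is not reproduced here, but a rough sketch of what the file-submission path plausibly does, using the requests library (the payload shape follows the /submit documentation below; the internals of client.py are assumptions):

```python
import requests

# Hypothetical equivalent of: python3 client.py -f /path/to/yourfile.txt -m openai
API_URL = 'http://127.0.0.1:5000/submit'  # default address from the setup steps

# Read the file and submit its contents for embedding generation
with open('/path/to/yourfile.txt', 'r') as fp:
    text = fp.read()

resp = requests.post(API_URL, json={'text': text, 'model': 'openai'})
resp.raise_for_status()
print(resp.json())
```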
The POST /submit endpoint accepts an individual text string or a list of text strings for embedding generation.
Request Parameters
- text: The text string or list of text strings to generate the embedding for. (Required)
- model: Type of model to be used, either local for SentenceTransformer models or openai for OpenAI's model. Default is local.
Response
- embedding: The generated embedding array.
- status: Status of the request, either success or error.
- elapsed: Time taken to generate the embedding, in milliseconds.
- model: The model used to generate the embedding.
- cache: Boolean indicating if the result was retrieved from cache. (Optional)
- message: Error message if the status is error. (Optional)
Reports the server's health status.
Response
- cache.enabled: Boolean indicating whether the cache is enabled
- cache.max_size: Maximum cache size
- cache.size: Current cache size
- models.openai: Boolean indicating if OpenAI embeddings are enabled. (Optional)
- models.sentence-transformers: Name of the sentence-transformers model in use.
{ "cache": { "enabled": true, "max_size": 500, "size": 0 }, "models": { "openai": true, "sentence-transformers": "sentence-transformers/all-MiniLM-L6-v2" } }
Send a POST request to the /submit endpoint with JSON payload:
{ "text": "Your text here", "model": "local" } // multi text submission { "text": ["Text1 goes here", "Text2 goes here"], "model": "openai" }
You'll receive a response containing the embedding and additional information:
[ { "embedding": [...], "status": "success", "elapsed": 123, "model": "sentence-transformers/all-MiniLM-L6-v2" } ] [ { "embedding": [...], "status": "success", "elapsed": 123, "model": "openai" }, { "embedding": [...], "status": "success", "elapsed": 123, "model": "openai" }, ]