carsonpo/haystackdb

HaystackDB

Minimal but performant Vector DB

Features

  • Binary embeddings by default (int8 reranking coming soon)
  • JSON filtering for queries
  • Scalable, distributed architecture for multi-replica deployments
  • Durable (write-ahead log), persistent data, memory-mapped for fast access in the client
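
To make the first two features concrete, here is a minimal sketch of a brute-force search over binary embeddings with an equality filter on JSON-style metadata. It is not HaystackDB's API: the function names (`binarize`, `hamming`, `search`), the record layout, and the choice to threshold at zero are all assumptions made for illustration.

```python
from typing import Any


def binarize(vector: list[float]) -> int:
    """Pack a float vector into an int: bit i is set when vector[i] > 0.

    Thresholding at zero is an assumption for this sketch.
    """
    bits = 0
    for i, value in enumerate(vector):
        if value > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")


def search(
    records: list[tuple[str, int, dict[str, Any]]],
    query: list[float],
    where: dict[str, Any],
    k: int = 3,
) -> list[tuple[int, str]]:
    """Scan every record, keep those whose metadata matches `where`,
    and return the k closest as (distance, id) pairs."""
    q = binarize(query)
    candidates = [
        (hamming(q, code), record_id)
        for record_id, code, metadata in records
        if all(metadata.get(key) == value for key, value in where.items())
    ]
    return sorted(candidates)[:k]


# Hypothetical records: (id, binary code, metadata).
records = [
    ("a", binarize([0.9, -0.2, 0.4, -0.7]), {"lang": "en"}),
    ("b", binarize([-0.5, 0.3, -0.1, 0.8]), {"lang": "en"}),
    ("c", binarize([0.8, -0.1, 0.5, -0.6]), {"lang": "de"}),
]
results = search(records, [1.0, -1.0, 1.0, -1.0], {"lang": "en"}, k=2)
print(results)
```

Record "c" is as close to the query as "a", but the filter excludes it because its `lang` is not "en".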

Benchmarks

Measured on an M2 MacBook with 1024-dimensional, binary-quantized vectors.

FAISS uses a flat index, so it is a brute-force search, but over data held in memory. Haystack also brute-forces, but stores its data on disk.

TL;DR: Haystack is roughly 10x faster (about 8.6x to 13x across the sizes tested) despite being stored on disk.

Vectors      Haystack    FAISS
100,000      3.44 ms     29.67 ms
500,000      11.98 ms    146.50 ms
1,000,000    22.65 ms    293.91 ms
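
The speedup at each size follows directly from the timings reported above. The short script below only redoes that arithmetic; the variable names are illustrative.

```python
# Query times in milliseconds as reported above: (Haystack, FAISS).
timings_ms = {
    100_000: (3.44, 29.67),
    500_000: (11.98, 146.50),
    1_000_000: (22.65, 293.91),
}

# Speedup = FAISS time divided by Haystack time.
speedups = {
    count: faiss / haystack
    for count, (haystack, faiss) in timings_ms.items()
}

for count, speedup in speedups.items():
    print(f"{count:>9,} vectors: {speedup:.1f}x")
```

The ratio grows from about 8.6x at 100,000 vectors to about 13x at 1,000,000.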

Roadmap

  • Quickstart Guide
  • Quality benchmarks (this is in progress)
  • Int8 reranking
  • Better queries with more than simple equality (done)
  • Full text search
  • Better insertion performance with batch B+Tree insertion (done; could probably be further improved, but good for now)
  • Point in time backups/rollback (done)
    • currently this is destructive (i.e. you cannot roll forward again after rolling back), so a nondestructive version is next on the todo list.
  • Cursor based pagination
  • Schema migrations
  • Vector Kmeans clustering with centroid similarity for improved search perf
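
The last roadmap item describes a general technique: group vectors into clusters, compare the query against the cluster centroids first, and scan only the closest clusters instead of every vector. The sketch below illustrates that idea on packed binary codes. It is not HaystackDB's planned design; the centroids are hand-picked rather than learned with k-means, and `probe_search` and `nprobe` are hypothetical names.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")


def probe_search(
    clusters: list[tuple[int, list[tuple[str, int]]]],
    query: int,
    nprobe: int = 1,
    k: int = 2,
) -> list[tuple[int, str]]:
    """clusters is a list of (centroid_code, [(id, code), ...]).

    Rank clusters by centroid distance to the query, then scan only
    the members of the nprobe nearest clusters.
    """
    nearest = sorted(clusters, key=lambda c: hamming(query, c[0]))[:nprobe]
    hits = [
        (hamming(query, code), record_id)
        for _, members in nearest
        for record_id, code in members
    ]
    return sorted(hits)[:k]


# Hypothetical, hand-built clusters (centroids assumed precomputed).
clusters = [
    (0b00001111, [("a", 0b00001110), ("b", 0b00011111)]),
    (0b11110000, [("c", 0b11110001), ("d", 0b01110000)]),
]
hits = probe_search(clusters, 0b00001111, nprobe=1, k=2)
print(hits)
```

With `nprobe=1` only the first cluster is scanned, which halves the work here at the cost of possibly missing near neighbors that landed in another cluster.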
