
Can I do Vector Embeddings #5318

Unanswered
damobrisbane asked this question in General
Oct 22, 2024 · 1 comment · 3 replies

I'm interested in the current state of doing "vector embeddings" for documents in CouchDB. I'm not an expert and not really trying to be smart or anything like that; it's just that these types of embeddings, and the need for them, seem to be more common nowadays. So I'm seeking a comment, or perhaps a "roadmap" indication, so that I can make some choices about going down a path of experimenting with vector embeddings side by side with typical CouchDB documents.

[update] I can see a Lucene vector search example (SearchScale/lucene-examples on GitHub); maybe that's an approach?

Thank you!


I'm not at all familiar with "vector embeddings", but after a cursory glance at an article or two, they do not sound anything like Lucene's vector support.

Safe to say there's nothing like this on our roadmap.


janl Oct 25, 2024
Collaborator

@damobrisbane could you do a pretend HTTP+JSON API example to demonstrate what you mean?


Thanks; this is concept-only at the moment. I have to preface that I am not a CouchDB, machine learning, or LLM practitioner, so my terminology may be a bit off, but I hope the concept is not:

- The assumption is that a model for a database exists, which has a learnt vector matrix representing all the docs in that database.
- Each document includes two items: a reference to the model and a JSON array of floats that represents the vector for that doc.
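The two items above could be sketched as plain fields on an ordinary CouchDB document. This is purely illustrative: the field names (`embedding_model`, `embedding`) and the model-reference convention are my own invention, not anything CouchDB defines.

```python
import json

# A hypothetical CouchDB document carrying its own embedding.
# "embedding_model" is a made-up reference to the learnt model;
# "embedding" is the JSON array of floats for this doc. The vector
# length would have to match the model's dimension.
doc = {
    "_id": "article:42",
    "title": "Introducing vector search",
    "body": "...",
    "embedding_model": "model:my-corpus-v1",
    "embedding": [0.12, -0.48, 0.33, 0.91],
}

# It would be stored via the normal CouchDB HTTP+JSON API,
# e.g. PUT /mydb/article%3A42 with this JSON body.
print(json.dumps(doc, indent=2))
```

Nothing here requires CouchDB to understand the vector; it just rides along as ordinary JSON data.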

I believe that if the learnt model is (corpus size × dimension), e.g. 3000×50, then the embedded vector in each doc will be 50 floats. Learning the model is independent of any vector embedding, and adding an embedding vector to a doc is also independent. But I understand that if I add any new doc, the doc's vector will get similarity scoring against the other docs in that database, using the referenced model. Crucially, exactly what determines e.g. higher matching scores is hidden in the model, which is where all the power comes from: the model for the database, with all the learnt vector embeddings for the corpus of documents, matches new documents or queries using an inferencing process over that corpus.
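Separately from how the model produces the vectors, the similarity scoring itself is usually something simple computed over the stored float arrays. One common choice (my assumption; the thread above doesn't name a metric) is cosine similarity, which a minimal pure-Python sketch can show without any CouchDB involvement:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional embeddings; a real model would use e.g. 50 dimensions.
doc_a = [0.9, 0.1, 0.0, 0.2]
doc_b = [0.8, 0.2, 0.1, 0.3]    # points in a similar direction to doc_a
doc_c = [-0.7, 0.9, 0.0, -0.4]  # points elsewhere

print(cosine_similarity(doc_a, doc_b))  # near 1.0: similar docs
print(cosine_similarity(doc_a, doc_c))  # negative: dissimilar docs
```

A naive implementation like this scans every stored vector per query; dedicated vector indexes (such as Lucene's HNSW-based KNN search) exist precisely to avoid that linear scan at scale.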

All this would then allow similarity matching between documents in a database. From what I can see in Lucene/CouchDB, this may be doable. To what end? I'm not exactly sure at the moment ;) I'm more interested in the idea and how this type of thing might be done in CouchDB, and, from there, any possibilities that might open up.


Is it distinct from an n-dimensional geo-index?
