Can I do Vector Embeddings #5318
-
Interested in the current state of doing "vector embeddings" for documents in CouchDB. I'm not an expert and not trying to be clever; it's just that these kinds of embeddings, and the need for them, seem to be increasingly common nowadays. So I'm seeking a comment, or perhaps a roadmap indication, so that I can decide whether to go down the path of experimenting with vector embeddings stored side by side with typical CouchDB documents.
[update] I can see a Lucene example re vector search (SearchScale/lucene-examples on GitHub) - may be an approach?
Thank you!
Replies: 1 comment 3 replies
-
I'm not at all familiar with "vector embeddings", but after a cursory glance at an article or two they do not sound anything like Lucene's vector support.
Safe to say there's nothing like this on our roadmap.
-
@damobrisbane could you do a pretend HTTP+JSON API example to demonstrate what you mean?
-
Thanks; this is concept-only at the moment. I should preface that I am not a CouchDB, machine learning, or LLM practitioner, so my terminology may be a bit off, though I hope the concept isn't:
- The assumption is that a model exists for a database, holding a learnt vector matrix that represents all the documents in that database.
- Each document includes two items: a reference to the model, and a JSON array of floats that represents the document's vector.
- If the learnt model is (corpus size x dimension), e.g. 3000 x 50, then the embedded vector in each doc will be 50 floats. Learning the model is independent of any single embedding, and adding an embedding vector to a doc is also independent, but as I understand it, whenever I add a new doc, its vector gets similarity-scored against the other docs in that database using the referenced model. Crucially, exactly what produces a higher matching score is hidden inside the model, which is where all the power comes from: the model for the database, with its learnt embeddings for the corpus of documents, matches new documents or queries using an inference process over that corpus.

All this would then allow similarity matching between documents in a database. From what I can see in Lucene/CouchDB this may be doable. To what end? I'm not exactly sure at the moment ;) I'm more interested in the idea, how this sort of thing might be done in CouchDB, and what possibilities might open up from there.
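To make the concept concrete, here is a minimal sketch of what such documents and a similarity query could look like. Everything here is illustrative assumption, not an existing CouchDB feature: the field names (`model_ref`, `embedding`), the 4-dimensional vectors, and the brute-force scan are all made up for demonstration; a real deployment would use a proper vector index rather than scoring every document.

```python
# Hypothetical CouchDB documents carrying an embedding vector alongside
# normal fields, plus brute-force cosine similarity in plain Python.
# "model_ref" and "embedding" are assumed field names, not CouchDB APIs.
import math

docs = [
    {"_id": "doc-1", "model_ref": "corpus-model-v1",
     "embedding": [0.1, 0.9, 0.0, 0.2], "title": "first doc"},
    {"_id": "doc-2", "model_ref": "corpus-model-v1",
     "embedding": [0.8, 0.1, 0.1, 0.0], "title": "second doc"},
]

def cosine(a, b):
    # Cosine similarity: dot product of the vectors over the
    # product of their Euclidean norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def most_similar(query_vec, documents):
    # Score every stored document against the query vector and
    # return (score, _id) pairs, best match first. A real index
    # (e.g. Lucene's HNSW-based kNN search) avoids this linear scan.
    scored = [(cosine(query_vec, d["embedding"]), d["_id"])
              for d in documents]
    return sorted(scored, reverse=True)

print(most_similar([0.2, 0.8, 0.1, 0.1], docs))
```

The query vector would come from the same referenced model that produced the stored embeddings; mixing vectors from different models would make the scores meaningless, which is presumably why each doc carries a model reference.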
-
Is it distinct from an n-dimensional geo-index?