Can I do Vector Embeddings #5318
-
Interested in the current state of doing "vector embeddings" for documents in CouchDB. I'm not an expert and not trying to be clever; it's just that these kinds of embeddings, and the need for them, seem to be increasingly common nowadays. So I'm seeking a comment, or perhaps a roadmap indication, so that I can decide whether to go down the path of experimenting with vector embeddings stored side by side with typical CouchDB documents.
[update] I can see a Lucene example re vector search (SearchScale/lucene-examples on GitHub) - may be an approach?
Thank you!
Replies: 1 comment 3 replies
-
I'm not at all familiar with "vector embeddings", but after a cursory glance at an article or two they do not sound anything like Lucene's vector support.
Safe to say there's nothing like this on our roadmap.
-
@damobrisbane could you do a pretend HTTP+JSON API example to demonstrate what you mean?
-
Thanks; this is concept-only at the moment. I should preface that I am not a CouchDB, machine learning, or LLM practitioner, so my terminology may be a bit off, though I hope the concept isn't:
- The assumption is that a model exists for a database, holding a learnt vector matrix that represents all the documents in that database.
- Each document includes two items: a reference to the model, and a JSON array of floats that represents the document's vector.
- If the learnt model is (corpus size x dimension), e.g. 3000 x 50, then the embedded vector in each doc will be 50 floats. Learning the model is independent of any single embedding, and adding an embedding vector to a doc is also independent, but as I understand it, whenever I add a new doc, its vector gets similarity-scored against the other docs in that database using the referenced model. Crucially, exactly what produces a higher matching score is hidden inside the model, which is where all the power comes from: the model for the database, with its learnt embeddings for the corpus of documents, matches new documents or queries using an inference process over that corpus.

All this would then allow similarity matching between documents in a database. From what I can see in Lucene/CouchDB this may be doable. To what end? I'm not exactly sure at the moment ;) I'm more interested in the idea, how this sort of thing might be done in CouchDB, and what possibilities might open up from there.
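To make the concept concrete, here is a minimal sketch of what such documents and a similarity query could look like. Everything here is illustrative assumption, not an existing CouchDB feature: the field names (`model_ref`, `embedding`), the 4-dimensional vectors, and the brute-force scan are all made up for demonstration; a real deployment would use a proper vector index rather than scoring every document.

```python
# Hypothetical CouchDB documents carrying an embedding vector alongside
# normal fields, plus brute-force cosine similarity in plain Python.
# "model_ref" and "embedding" are assumed field names, not CouchDB APIs.
import math

docs = [
    {"_id": "doc-1", "model_ref": "corpus-model-v1",
     "embedding": [0.1, 0.9, 0.0, 0.2], "title": "first doc"},
    {"_id": "doc-2", "model_ref": "corpus-model-v1",
     "embedding": [0.8, 0.1, 0.1, 0.0], "title": "second doc"},
]

def cosine(a, b):
    # Cosine similarity: dot product of the vectors over the
    # product of their Euclidean norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def most_similar(query_vec, documents):
    # Score every stored document against the query vector and
    # return (score, _id) pairs, best match first. A real index
    # (e.g. Lucene's HNSW-based kNN search) avoids this linear scan.
    scored = [(cosine(query_vec, d["embedding"]), d["_id"])
              for d in documents]
    return sorted(scored, reverse=True)

print(most_similar([0.2, 0.8, 0.1, 0.1], docs))
```

The query vector would come from the same referenced model that produced the stored embeddings; mixing vectors from different models would make the scores meaningless, which is presumably why each doc carries a model reference.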
-
Is it distinct from an n-dimensional geo-index?