Is there any guide or documentation on how to use pgvecto.rs with Citus? #571

Unanswered
cho-thinkfree-com asked this question in Q&A

First of all, thank you for developing such a great open-source project.

I am currently developing a service using pgvecto.rs.

In the documentation provided by the team managing this project, I found a guide on how to install and operate pgvecto.rs in a Kubernetes environment. Due to the size of the data, I am planning to use Citus for sharding.

Is there any guide or documentation on how to use pgvecto.rs with Citus?

As far as I know, pgvecto.rs has some separately managed files, unlike pgvector. (Please correct me if I am mistaken.) Even if Citus can be used, I am curious about how to handle backups in this setup.


Replies: 2 comments 3 replies


We haven't tested with Citus yet. Have you tested pgvector with Citus? Also, can you share the scale of your vector data: how many vectors you plan to store, and of what dimension? Thanks.


Thank you for your prompt response.

I encountered an issue with incorrect behavior in the WHERE clause while using pgvector in SQL. Therefore, I am preparing to use pgvecto.rs.
Each row has two 1024-dimensional dense vector columns and two 25,000-dimensional sparse vector columns. The total number of rows is expected to exceed 100 million.
I believe this might be too large to store in a single database. (I lack expertise in databases, so please feel free to correct me if I’m wrong.)
Therefore, I am considering applying sharding.
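
To put the scale described above in rough numbers, here is a back-of-the-envelope sketch. It assumes 4-byte (float32) components for the dense vectors and, for the sparse vectors, one 4-byte index plus one 4-byte value per non-zero entry. The average of 100 non-zeros per sparse vector is a purely hypothetical placeholder, since the thread does not state it. Index size, row overhead, and replication are not included.

```python
# Rough raw-storage estimate for the vector columns only.
# Assumptions (not from the thread): float32 components, and a sparse
# layout of one 4-byte index + one 4-byte value per non-zero entry.
BYTES_PER_F32 = 4


def dense_bytes(rows, columns, dims, bytes_per_component=BYTES_PER_F32):
    """Raw bytes for `columns` dense vector columns of `dims` dimensions."""
    return rows * columns * dims * bytes_per_component


def sparse_bytes(rows, columns, avg_nonzeros, bytes_per_index=4, bytes_per_value=4):
    """Raw bytes for sparse vectors storing only their non-zero entries."""
    return rows * columns * avg_nonzeros * (bytes_per_index + bytes_per_value)


rows = 100_000_000  # "expected to exceed 100 million"
dense = dense_bytes(rows, columns=2, dims=1024)
# avg_nonzeros=100 is a hypothetical value chosen only for illustration.
sparse = sparse_bytes(rows, columns=2, avg_nonzeros=100)

print(f"dense vectors:  {dense / 1e9:.1f} GB")
print(f"sparse vectors: {sparse / 1e9:.1f} GB (with 100 non-zeros per vector)")
```

Under these assumptions the dense columns alone come to roughly 0.8 TB of raw data before any index is built, which gives a concrete starting point for deciding whether a single node is enough.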

For your reference, it seems that Citus supports pgvector.
(pgvector is mentioned on the Citus website - https://www.citusdata.com/product/community )


Hi @cho-thinkfree-com, I'm a maintainer with @tensorchord/pgvecto-rs-maintainers. Would you be interested in a meeting with us? I’d love to learn more about your use case and see how we can assist you.


Thank you for your response.

First of all, please note that I am using a translation tool to ask questions and provide answers as I am not good at English. :-) Even if we proceed with a meeting, real-time conversation might be very challenging due to the "language" barrier. :-)

We are literally trying to build a document search service.
The basic search methods we have in mind are very similar to those described in the following links: Sparse Vector Use Case and Adaptive Retrieval Use Case.

The details of what we are using are documented below:
GitHub Issue Comment

The scale of data we are considering involves creating 160,000 tables. We are deliberating whether to create 160,000 separate tables or partition a single table into multiple parts. We are unsure which option would be better if we use pgvecto.rs. (The reason we inquired about Citus was also to handle large volumes of data.)
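
For reference, the "partition a single table" option could look roughly like the sketch below, which generates PostgreSQL declarative hash-partitioning DDL. The table name `documents`, the columns `group_id`, `doc_id`, and `dense_a`, and the partition count are all hypothetical, and the `vector(1024)` column type is assumed to come from the installed vector extension. Whether pgvecto.rs indexes behave well on partitioned tables, or at this number of tables, is not established anywhere in this thread.

```python
# Generate DDL for one hash-partitioned table instead of many separate tables.
# All identifiers here are hypothetical placeholders, not names from the thread.
def partition_ddl(parent="documents", partitions=4):
    """Return CREATE TABLE statements for a hash-partitioned parent table."""
    statements = [
        f"CREATE TABLE {parent} (\n"
        "    group_id bigint NOT NULL,\n"
        "    doc_id   bigint NOT NULL,\n"
        "    dense_a  vector(1024),\n"  # assumes the vector extension provides this type
        # On a partitioned table the primary key must include the partition key.
        "    PRIMARY KEY (group_id, doc_id)\n"
        ") PARTITION BY HASH (group_id);"
    ]
    for remainder in range(partitions):
        statements.append(
            f"CREATE TABLE {parent}_p{remainder} PARTITION OF {parent} "
            f"FOR VALUES WITH (MODULUS {partitions}, REMAINDER {remainder});"
        )
    return statements


for statement in partition_ddl(partitions=4):
    print(statement)
```

The partition key (`group_id` here) would stand in for whatever currently distinguishes the 160,000 planned tables, so queries that filter on it touch only one partition.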
