I've got a schema for customer records a bit like:

account_id - UUID
name - text

I currently have a GIN index that looks like:

... USING GIN (account_id, name gin_trgm_ops)

We have a mix of accounts. Some have a lot of customers (10m+) and some only have a few (10k+). We have a lot of queries that look like:

SELECT * FROM customers WHERE account_id = <account_id> AND
(
 name LIKE 'Bob%'
 OR name LIKE 'Alice%'
 OR name LIKE 'Dave%'
 OR name LIKE 'Carol%'
 OR name LIKE 'Edward%'
 OR name LIKE 'Fay%'
)

For large accounts the index performance is good (queries take ~2s). For small accounts the index performance is comparatively poor (also ~2s, despite far fewer rows). This matters because of how often we run these queries.

Is there a way to improve this for smaller customers? We've noticed that replacing the index with a simple btree index on account_id is faster - scanning all the records for a single small account is faster than doing the index bitmap work across all accounts. Obviously, this is a lot slower on the large accounts.

I think partitioning the table is the only way forward here. However, I'm hoping someone has a bright idea :)

asked Jun 29, 2024 at 10:56
  • replacing the index with a simple btree index on account_id is faster: why replace? Two indexes seem to make sense here: a btree index on account_id and the gin index on name alone. Commented Jun 29, 2024 at 13:21
  • What happens if one of the search terms is for 'Cy%'? Commented Jun 29, 2024 at 17:52

1 Answer

Your index doesn't look very sensible to me. A multicolumn GIN index is pretty much identical to multiple single-column GIN indexes. It isn't like a Btree index, where selectivity on multiple columns can be multiplied by being in the same index.

Since all your LIKE queries are fixed prefix, you could get a benefit from a multicolumn btree index.

create index on customers (account_id, name text_pattern_ops);
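
To check whether the planner picks up that index for the OR'd prefix searches, something like the following could be run. This is only a sketch: the UUID is a placeholder, and the plan you get depends on your data and statistics. Ideally each LIKE branch turns into its own index condition on (account_id, name), possibly combined through a BitmapOr.

```sql
-- Placeholder account_id; substitute a real small account and a real large one.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM customers
WHERE account_id = '00000000-0000-0000-0000-000000000001'
  AND (
       name LIKE 'Bob%'
    OR name LIKE 'Alice%'
  );
```

Comparing the output for one small and one large account should show whether the timings now scale with the size of the account rather than the size of the table.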

If they aren't really all fixed prefix, you might want to try the multicolumn GiST index. I'm not a fan of gist_trgm_ops, but this might be one case where it would work well.
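
A minimal sketch of that multicolumn GiST index, assuming the btree_gist and pg_trgm extensions can be installed and a PostgreSQL version whose btree_gist supports uuid (10 or later, as far as I know). The index name is made up for illustration.

```sql
-- btree_gist provides GiST operator classes for scalar types such as uuid;
-- pg_trgm provides gist_trgm_ops for trigram matching on text.
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index name. account_id is placed first so the equality
-- condition can narrow the search before the trigram match on name.
CREATE INDEX customers_account_name_gist_idx
    ON customers USING gist (account_id, name gist_trgm_ops);
```

Unlike the btree variant, this one can also serve patterns that are not anchored at the start, such as LIKE '%search%'.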

As you noted, partitioning could also work well. But if you have a large number of accounts, this could become hard to manage.
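
One way to keep the partition count manageable is to give only the largest accounts their own partitions and put everything else in a default partition. The following is a sketch on a fresh schema, not a migration script: the partition names and the UUID are placeholders, and it assumes PostgreSQL 11 or later for the DEFAULT partition.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE TABLE customers (
    account_id uuid NOT NULL,
    name       text NOT NULL
) PARTITION BY LIST (account_id);

-- One partition per very large account (placeholder UUID).
CREATE TABLE customers_big_0001 PARTITION OF customers
    FOR VALUES IN ('00000000-0000-0000-0000-000000000001');

-- All remaining (small) accounts share the default partition.
CREATE TABLE customers_small PARTITION OF customers DEFAULT;

-- Large account: account_id is constant within the partition,
-- so a trigram GIN index on name alone is enough.
CREATE INDEX ON customers_big_0001 USING gin (name gin_trgm_ops);

-- Small accounts: a btree lets the planner jump straight to one account.
CREATE INDEX ON customers_small (account_id, name text_pattern_ops);
```

Each partition gets the index type that suits it, which is roughly the "btree of gin indexes" idea, at the cost of having to create a partition whenever an account grows large.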

answered Jun 29, 2024 at 18:06
  • This looks like it might work. Downside that %search% won't work, but I think we can live without that. Thank you! Ultimately, what I really want is a btree of gin indexes. e.g. a btree on account_id and then each leaf is a pointer to a gin index on name :) Commented Jun 29, 2024 at 19:34
