For a system we're building, we store discount codes as strings in a Postgres table. The system supports multiple workspaces that share the same database, and a special workspace value ('*') is used as a wildcard.
For discount codes, we store the following information:
CREATE TABLE discount_codes (
  id uuid PRIMARY KEY
, workspace_id character varying
, code character varying
, case_sensitive bool
);
Sometimes we have to generate thousands of codes to be exported to external systems, where they can be sent out in e-mails and such. When generating these codes, we need to check that none of them collide with existing codes.
Currently, when inserting a code, we query existing codes like this:
The new discount code is case sensitive:
SELECT
COUNT(*)
FROM
discount_codes
WHERE
(
workspace_id = '*' OR
workspace_id = :workspaceId
) AND
(
(
LOWER(code) = LOWER(:code) AND
case_sensitive = false
) OR (
code = :code AND
case_sensitive = true
)
)
The new discount code is not case sensitive:
SELECT
COUNT(*)
FROM
discount_codes
WHERE
(workspace_id = '*' OR workspace_id = :workspaceId) AND
LOWER(code) = LOWER(:code)
If this query returns a count of more than 0, we know that there is a collision.
I would like to know if it would be useful to create an index on the length of the code, so that we can filter out all rows where code has a different length. We're talking about hundreds of thousands of codes being present in the database. If it would help, how would I create an index like this?
During bulk generation, I would like to generate 1000 codes at a time, and query the database to see if there is overlap with any of these codes. Would it be better to do this per 100 codes, or per 10000 codes?
2 Answers
This set of indices and queries should give you the best overall performance:
Queries
New discount code is case sensitive:
SELECT EXISTS (
SELECT FROM discount_codes
WHERE case_sensitive
AND code = :code
AND workspace_id IN ('*', :workspaceId)
)
OR EXISTS (
SELECT FROM discount_codes
WHERE NOT case_sensitive
AND lower(code) = lower(:code)
AND workspace_id IN ('*', :workspaceId)
);
Counting is generally more expensive than EXISTS. And both subqueries would always be executed to get a count. You only need to know if there is any conflict at all. This query will not even execute the second subquery if the first one returns true.
New discount code is not case sensitive:
SELECT EXISTS (
SELECT FROM discount_codes
WHERE lower(code) = lower(:code)
AND workspace_id IN ('*', :workspaceId)
);
Indices
Note that the UNIQUE aspect in the indices below enforces your requirements only in part and is hence optional. I would still throw it in as a very cheap second layer of defense.
CREATE INDEX discount_codes_idx1 ON discount_codes (lower(code), workspace_id); -- can't be unique
CREATE UNIQUE INDEX discount_codes_idx2 ON discount_codes (code, workspace_id)
WHERE case_sensitive;
Theoretically, you might add another one:
CREATE UNIQUE INDEX discount_codes_idx3 ON discount_codes (lower(code), workspace_id)
WHERE NOT case_sensitive;
But, assuming the combination (lower(code), workspace_id) is already hugely selective, discount_codes_idx1 should cover the job of discount_codes_idx3 pretty well, and you don't have to maintain another index in your write-heavy table.
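For the bulk-generation case from the question, the same two EXISTS checks could be applied to a whole batch in one round trip. This is only a sketch under assumptions: :codes is a hypothetical text[] parameter holding the generated batch, and every code in the batch is taken to be case sensitive.

```sql
-- Sketch: return the candidate codes that collide with existing rows.
-- :codes is a hypothetical text[] parameter (the generated batch);
-- :workspaceId is the same parameter as in the queries above.
-- Assumes all codes in the batch are case sensitive.
SELECT c.code
FROM   unnest(CAST(:codes AS text[])) AS c(code)
WHERE  EXISTS (
   SELECT FROM discount_codes d
   WHERE  d.case_sensitive
   AND    d.code = c.code
   AND    d.workspace_id IN ('*', :workspaceId)
   )
OR     EXISTS (
   SELECT FROM discount_codes d
   WHERE  NOT d.case_sensitive
   AND    lower(d.code) = lower(c.code)
   AND    d.workspace_id IN ('*', :workspaceId)
   );
```

An empty result means no candidate collides with a stored code. The query does not detect duplicates inside the batch itself, so those would have to be checked separately. Which batch size works best depends on your data and driver, so it is worth measuring rather than guessing.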
Hi, the only issue I see with this is that workspace_id = '*', code = 'CHRISTMAS10' and workspace_id = 'acme', code = 'CHRISTMAS10' can't co-exist, so that's why that unique index might not be right. I might be optimizing for a case I shouldn't, but if I know that all my codes generated in a batch are of length x, I can already filter out all codes that are not of length x, and that's the index I'd be looking for. – Ruben, Oct 24, 2024 at 12:17
@Ruben: ('*', 'CHRISTMAS10') and ('acme', 'CHRISTMAS10') can co-exist with either of my multicolumn indices, which only enforce your requirements in part. (That's why we still need the sophisticated queries.) UNIQUE is really optional. But I would keep it as second layer of defense. I updated to clarify. – Erwin Brandstetter, Oct 24, 2024 at 14:57
If you want good performance for your first query, rewrite it to avoid OR:
SELECT (SELECT count(*)
        FROM discount_codes
        WHERE workspace_id = '*'
          AND lower(code) = lower(:code)
          AND NOT case_sensitive)
     + (SELECT count(*)
        FROM discount_codes
        WHERE workspace_id = '*'
          AND code = :code
          AND case_sensitive)
     + (SELECT count(*)
        FROM discount_codes
        WHERE workspace_id = :workspaceId
          AND lower(code) = lower(:code)
          AND NOT case_sensitive)
     + (SELECT count(*)
        FROM discount_codes
        WHERE workspace_id = :workspaceId
          AND code = :code
          AND case_sensitive);
For the best performance, use two partial indexes:
CREATE INDEX ON discount_codes (workspace_id, lower(code))
WHERE NOT case_sensitive;
CREATE INDEX ON discount_codes (workspace_id, code)
WHERE case_sensitive;
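As for the length-based index the question asks about, an expression index is the usual way to index a computed value in Postgres. The following is a minimal sketch, not a recommendation: the index name is made up, and an equality match on code or lower(code) is normally far more selective than a match on length, so this index may well go unused.

```sql
-- Hypothetical index name. length(code) is an immutable expression,
-- so Postgres allows it as an index expression.
CREATE INDEX discount_codes_code_length_idx
    ON discount_codes (length(code));

-- The planner can only consider the index if the query repeats
-- the same expression in its WHERE clause:
SELECT EXISTS (
   SELECT FROM discount_codes
   WHERE  length(code) = length(:code)
   AND    lower(code) = lower(:code)
   AND    workspace_id IN ('*', :workspaceId)
);
```

If most stored codes share the length of the generated batch, the length filter removes few rows, so compare query plans with and without the index before keeping it.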
Use explain(analyze, verbose, buffers, settings) to get the query plan and see where the time is spent. You can also create an index on LOWER(code): CREATE INDEX idx_discount_codes_lower_code ON discount_codes(LOWER(code));
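Applied to the case-insensitive check, the explain call mentioned above could look roughly like this. The literal values 'CHRISTMAS10' and 'acme' are placeholders borrowed from the comments, and the SETTINGS option assumes PostgreSQL 12 or later.

```sql
-- ANALYZE actually executes the query, so the reported timings are real.
-- BUFFERS adds shared-buffer hit/read counts for each plan node.
EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SETTINGS)
SELECT EXISTS (
   SELECT FROM discount_codes
   WHERE  lower(code) = lower('CHRISTMAS10')
   AND    workspace_id IN ('*', 'acme')
);
```

An index scan on the lower(code) index in the output suggests the index is being used. A sequential scan on discount_codes suggests it is not.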