I have a PostgreSQL table named orders, which I have partitioned on the user_id column using LIST partitioning on an expression:
PARTITION BY LIST (lower(right(user_id, 2)))
The table has columns customer_id and order_id, among others. Before partitioning, I frequently queried the table using the following query patterns:
SELECT *
FROM orders
WHERE customer_id = 234234 AND order_id = 234234;
SELECT *
FROM orders
WHERE order_id = 234234;
Now, because of the partitioning, I query it like this:
SELECT *
FROM orders
WHERE user_id = '234234' AND customer_id = 234234 AND order_id = 234234 AND lower(right(user_id, 2)) = '34';
I've noticed that partition pruning only happens when I include lower(right(user_id, 2)) = '34' in the WHERE clause; filtering on user_id alone does not trigger it.
However, I don't have a specific need to filter on user_id directly in this query.
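A minimal sketch that reproduces the pruning behaviour described above. The table name orders_list_demo, the column types (text for user_id, bigint for the IDs) and the two partitions are assumptions for illustration; the real table presumably has one partition per suffix.

```sql
-- Hypothetical demo table using the same scheme as the question:
-- rows are routed by the last two characters of user_id, lowercased.
DROP TABLE IF EXISTS orders_list_demo;

CREATE TABLE orders_list_demo (
    order_id    bigint NOT NULL,
    customer_id bigint NOT NULL,
    user_id     text   NOT NULL
) PARTITION BY LIST (lower(right(user_id, 2)));

-- One partition for the suffix '34' and a catch-all for every other suffix.
CREATE TABLE orders_list_demo_34 PARTITION OF orders_list_demo FOR VALUES IN ('34');
CREATE TABLE orders_list_demo_default PARTITION OF orders_list_demo DEFAULT;

INSERT INTO orders_list_demo VALUES (234234, 234234, '234234');

-- Filter on user_id alone: the planner does not derive the partition key
-- value from user_id, so both partitions appear in the plan.
EXPLAIN SELECT * FROM orders_list_demo WHERE user_id = '234234';

-- Repeating the partition key expression lets the planner prune
-- everything except orders_list_demo_34.
EXPLAIN SELECT * FROM orders_list_demo
WHERE user_id = '234234' AND lower(right(user_id, 2)) = '34';
```

Pruning matches predicates against the partition key expression itself, which is why the extra lower(right(user_id, 2)) condition is needed with this scheme.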
My questions are:
- Should I stop passing user_id in the WHERE clause since I have already partitioned the table based on it?
- Would it be beneficial to create an index on (user_id, customer_id, order_id) to optimize the query performance?
- Alternatively, should I create an index on (lower(right(user_id, 2)), customer_id, order_id) and omit passing user_id in the WHERE clause for better pruning?
- Should I also have an index on (order_id, lower(right(user_id, 2))) for the second query? Selectivity of order_id will be higher than user_id.
I want to ensure that the partition pruning is utilized optimally while maintaining good query performance. Any advice or best practices regarding indexing and partitioning in this scenario would be greatly appreciated.
Partitioning is a data management tool (for example, when you want to DROP a whole partition of data at one time). It is not a tool intended for improving lookup performance for SELECT queries, and there are cases where it actually hurts performance a little.
Indexes are meant to improve lookup performance, and they scale far better than partitioning: pruning divides the data linearly (by the number of partitions), whereas a B-tree index narrows the search logarithmically.
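A rough back-of-the-envelope comparison, written as a query so it can be run directly in psql. The figures are assumptions, not measurements: 100 million rows, 100 partitions (one per two-digit suffix), and roughly 300 index entries per B-tree page; real fan-out depends on key width and fill factor.

```sql
-- Hypothetical sizes: 100M rows split across 100 list partitions.
-- Pruning to one partition still leaves 1% of the rows (a constant factor).
-- A B-tree with ~300 entries per page needs about log_300(rows) levels,
-- so each extra level absorbs a ~300x larger table.
SELECT 100000000 / 100                 AS rows_left_after_pruning,
       ceil(log(300, 100000000))       AS btree_levels_whole_table,
       ceil(log(300, 100000000 / 100)) AS btree_levels_one_partition;
```

With these assumed numbers the query returns 1000000, 4 and 3: pruning away 99% of the table saves only about one page read during the index descent.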
- Should I stop passing user_id in the WHERE clause since I have already partitioned the table based on it?
Yes. It sounds like you don't need partitioning at all, and user_id doesn't appear to be needed for your use cases: assuming order_id is close to unique, user_id doesn't reduce the data any further.
- Would it be beneficial to create an index on (user_id, customer_id, order_id) to optimize the query performance?
No. Since user_id wouldn't filter the data down any further (again, given the cardinality of order_id relative to it), adding user_id to your queries and indexes would be redundant. Instead, create an index on (order_id) or on (order_id, customer_id), which will support both of your example queries. (It is important to lead with order_id so that the index serves both queries.)
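A minimal sketch of the suggested index on a plain, unpartitioned table, since the answer suggests partitioning may be unnecessary. The table name orders_plain_demo, the index name and the column types are hypothetical. Because order_id is the leading column, one B-tree index can serve both the two-column lookup and the order_id-only lookup.

```sql
-- Hypothetical unpartitioned version of the orders table.
DROP TABLE IF EXISTS orders_plain_demo;

CREATE TABLE orders_plain_demo (
    order_id    bigint NOT NULL,
    customer_id bigint NOT NULL,
    user_id     text   NOT NULL
);

-- order_id leads, so the index is usable for both query patterns below.
CREATE INDEX orders_plain_demo_order_customer_idx
    ON orders_plain_demo (order_id, customer_id);

INSERT INTO orders_plain_demo VALUES (234234, 234234, '234234');
ANALYZE orders_plain_demo;

-- Query 1: equality on both indexed columns.
EXPLAIN SELECT * FROM orders_plain_demo
WHERE customer_id = 234234 AND order_id = 234234;

-- Query 2: equality on the leading column only.
EXPLAIN SELECT * FROM orders_plain_demo
WHERE order_id = 234234;
```

On a table this small the planner will likely prefer a sequential scan; the index pays off at realistic data volumes. If the table stays partitioned, creating the index on the parent (PostgreSQL 11 and later) creates a matching index on every partition.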
- Alternatively, should I create an index on (lower(right(user_id, 2)), customer_id, order_id) and omit passing user_id in the WHERE clause for better pruning?
No. I'm not even sure what looking at only the last two digits of user_id is intended to achieve.
- Should I also have an index on (order_id, lower(right(user_id, 2))) for the second query? Selectivity of order_id will be higher than user_id.
No. Stick to the simple (order_id, customer_id) index mentioned above. It supports both of your queries and can't really be made much more efficient unless another column would reduce the returned data further. Also, ordering index columns by selectivity matters little when they are compared with equality, which is what both of your example queries do.
… user_id, you still have to do it with a partitioned table. Please try to be specific: do you want to tune the query you are showing? Through all my confusion one thing emerges: your partitioning strategy is quite weird, and I suspect that it is wrong for your case. It looks like a home-grown kind of hash partitioning. What benefits do you expect from partitioning? Why did you choose this strange kind of list partitioning?