PostgreSQL Partitioning and Indexing - Optimal WHERE Clause for Partition Pruning

Question 1

I have a PostgreSQL table named orders, which I have partitioned on the user_id column using LIST partitioning:

PARTITION BY LIST (lower(right(user_id, 2)))

The table has columns customer_id and order_id, among others. Before partitioning, I frequently queried the table using the following patterns:

SELECT *
FROM orders
WHERE customer_id = 234234 AND order_id = 234234;

SELECT *
FROM orders
WHERE order_id = 234234;

Now, because of the partitioning, I query the table as follows:

SELECT *
FROM orders
WHERE user_id = 234234 AND customer_id = 234234 AND order_id = 234234 AND lower(right(user_id, 2)) = '34';

I've noticed that partition pruning only happens when I explicitly include lower(right(user_id, 2)) = '34' in the WHERE clause.

However, I don't have a specific need to filter on the user_id directly in this query.
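Whether pruning happens can be checked with EXPLAIN. The following is only a minimal sketch: the actual schema is not shown here, so the column types and the partition names orders_p34 and orders_p35 are assumptions made for illustration.

```sql
-- Hypothetical schema: column types and partition names are assumed.
CREATE TABLE IF NOT EXISTS orders (
    user_id     text   NOT NULL,
    customer_id bigint NOT NULL,
    order_id    bigint NOT NULL
) PARTITION BY LIST (lower(right(user_id, 2)));

CREATE TABLE IF NOT EXISTS orders_p34 PARTITION OF orders FOR VALUES IN ('34');
CREATE TABLE IF NOT EXISTS orders_p35 PARTITION OF orders FOR VALUES IN ('35');

-- The predicate repeats the partition key expression, so the planner
-- can prune to the matching partition.
EXPLAIN
SELECT * FROM orders
WHERE order_id = 234234
  AND lower(right(user_id, 2)) = '34';

-- No predicate on the partition key expression: all partitions are candidates.
EXPLAIN
SELECT * FROM orders
WHERE order_id = 234234;
```

With pruning in effect, the first plan should reference only orders_p34, while the second should reference both partitions.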

My questions are:

Should I stop passing user_id in the WHERE clause since I have already partitioned the table based on it?
Would it be beneficial to create an index on (user_id, customer_id, order_id) to optimize the query performance?
Alternatively, should I create an index on (lower(right(user_id, 2)), customer_id, order_id) and omit passing user_id in the WHERE clause for better pruning?
Should I also have an index on (order_id, lower(right(user_id,2))) for the second query? Selectivity of order_id will be higher than user_id.

I want to ensure that the partition pruning is utilized optimally while maintaining good query performance. Any advice or best practices regarding indexing and partitioning in this scenario would be greatly appreciated.

Comment

That's a lot of questions in one. I don't understand your problem: partitioning does not influence query results. So if you need to filter by user_id, you still have to do it with a partitioned table. Please try to be specific: do you want to tune the query you are showing? Through all my confusion one thing emerges: your partitioning strategy is quite weird, and I suspect that it is wrong for your case. It looks like a home-grown kind of hash partitioning. What benefits do you expect from partitioning? Why did you choose this strange kind of list partitioning?

Answer

Partitioning is primarily a data management tool (for example, when you want to DROP a whole partition of data at once). It is not intended to improve lookup performance for queries such as SELECT, and there are cases where it actually hurts performance a little.
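As a rough illustration of that data management use, removing one partition's data does not require deleting its rows one by one. The sketch below assumes a hypothetical schema and a hypothetical partition name (orders_p34); neither comes from the question.

```sql
-- Hypothetical schema: column types and the partition name are assumed.
CREATE TABLE IF NOT EXISTS orders (
    user_id     text   NOT NULL,
    customer_id bigint NOT NULL,
    order_id    bigint NOT NULL
) PARTITION BY LIST (lower(right(user_id, 2)));

CREATE TABLE IF NOT EXISTS orders_p34 PARTITION OF orders FOR VALUES IN ('34');

-- Detach the partition so it becomes a standalone table, then drop it.
ALTER TABLE orders DETACH PARTITION orders_p34;
DROP TABLE orders_p34;
```

Detaching first leaves room to archive or inspect the standalone table before dropping it; dropping the partition directly would also work.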

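Indexes are meant to improve lookup performance, and they scale far better than partitioning: pruning to one of N partitions cuts the data to scan by at most a factor of N (linear), whereas a B-tree index locates a row in a number of steps that grows only logarithmically with the size of the table.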

Should I stop passing user_id in the WHERE clause since I have already partitioned the table based on it?

Yes, because it sounds like you don't need partitioning at all, and user_id doesn't appear to be needed for your use cases: assuming order_id is close to unique, filtering on user_id as well doesn't reduce the data any further.

Would it be beneficial to create an index on (user_id, customer_id, order_id) to optimize the query performance?

No. Since user_id wouldn't filter the data down any further (again, based on the cardinality of order_id relative to it), adding user_id to your queries and indexes would be redundant. Instead, create an index on (order_id) or on (order_id, customer_id); either one can serve both of your example queries. (It is important to lead with order_id, so that the index serves both queries.)
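A minimal sketch of that index follows. The index name, column types, and partition name are assumptions for illustration. On PostgreSQL 11 or later, creating an index on a partitioned table also creates it on each partition.

```sql
-- Hypothetical schema: column types and the partition name are assumed.
CREATE TABLE IF NOT EXISTS orders (
    user_id     text   NOT NULL,
    customer_id bigint NOT NULL,
    order_id    bigint NOT NULL
) PARTITION BY LIST (lower(right(user_id, 2)));

CREATE TABLE IF NOT EXISTS orders_p34 PARTITION OF orders FOR VALUES IN ('34');

-- order_id leads, so the index is usable with or without customer_id.
CREATE INDEX IF NOT EXISTS orders_order_id_customer_id_idx
    ON orders (order_id, customer_id);

-- Both query patterns from the question filter on the leading column.
EXPLAIN SELECT * FROM orders WHERE order_id = 234234;
EXPLAIN SELECT * FROM orders WHERE customer_id = 234234 AND order_id = 234234;
```

On an empty or very small table the planner may still choose a sequential scan, so the plans are best checked against realistic data volumes.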

Alternatively, should I create an index on (lower(right(user_id, 2)), customer_id, order_id) and omit passing user_id in the WHERE clause for better pruning?

No. I'm not even sure what looking at only the last 2 characters of user_id is intended to do.

Should I also have an index on (order_id, lower(right(user_id,2))) for the second query? Selectivity of order_id will be higher than user_id.

No. Stick to the simple index on (order_id, customer_id) mentioned above. It serves both of your queries and can't really be made much more efficient unless there's another field that reduces the returned data further. Also, the relative selectivity of the columns doesn't matter when every predicate is an equality comparison, which is the case in both of your example queries.

J.D. · Accepted Answer · 2023-07-20 12:26:27Z
