How to optimize multiple ORDER BYs?

Question 1

I'm getting sluggish performance on a table with ~300,000 rows and B-Trees on each column.

This is for a dynamic pagination page where the query is constructed on demand, and the application caches the primary key in the query's specified order.

For this query

explain analyze 
 SELECT supplier_management.buyer_purchase_order_id 
 FROM supplier_management 
 ORDER BY item_description DESC,
 item_number DESC,
 order_type ASC,
 possession_date DESC,
 shipment_type DESC,
 store_type DESC

I get these results:

Sort (cost=51026.98..51750.35 rows=289348 width=72) (actual time=8229.280..12349.596 rows=289348 loops=1)

Sort Key: item_description, item_number, order_type, possession_date, shipment_type, store_type

Sort Method: external merge Disk: 24744kB

-> Seq Scan on supplier_management (cost=0.00..10876.48 rows=289348 width=72) (actual time=0.015..187.426 rows=289348 loops=1)

Total runtime: 12407.064 ms

How can the performance of a multi-column sort be improved? Or should I just do it in C++?

Table structure

buyer_purchase_order_id bigint
supplier_number bigint
supplier_name character varying
purchase_order_number bigint
store_number integer
item_number bigint
item_description character varying
project_type character varying
order_date integer
requested_arrival_date integer
department character varying
store_type character varying
shipment_type character varying
order_type character varying
quantity_ordered integer
quantity_allocation integer
quantity_staged integer
quantity_shipped integer
quantity_received integer
in_stock_date integer
in_stock_date_visible_on integer
show_red boolean
requested_arrival_date_plus_four_business_days integer
supplier_status character varying
notes_comments character varying
requested_arrival_date_color character varying
grand_opening_date integer
possession_date integer
consolidator_name character varying
real_requested_arrival_date_plus_four_business_days integer

No character varying exceeds the length limit for indexing; however, there is some duplication. Would it be more efficient to put the actual values in another table, normalize, and join on the related integer columns?

Question 2

This related answer has an extensive benchmark for ORDER BY with long strings with and without collation: stackoverflow.com/questions/9888096/…

Question 3

@ErwinBrandstetter Yup, that & work_mem tamed it. Thanks geniuses!!!

Question 4

A few things you can do:

Use enums or lookups keyed by integer values, or a simple "char" field, instead of varchar sort keys where possible. I'd use an enum because you can control the sort order easily.

The only serious downside with an enum is that you can't currently drop values from an enum type. You can add them (including inserting them in the middle of the sort order) but not remove them. If that's a problem, you'll want to use a lookup table, or just a field declared "char" that has single character codes.

Also, if you don't need proper language collation, specify COLLATE "C" for character fields, e.g.

CREATE INDEX itemdesc_c ON supplier_management (item_description ASC COLLATE "C");

and then:

ORDER BY ...
 itemdesc COLLATE "C",
 ...

Important things to note:

Pg can combine indexes for predicates (WHERE clauses etc) but not sorting. You can't use a bitmap index scan for a sort. So it can use at most one of the candidate indexes, then it has to sort the rows within each group.
Low-selectivity indexes are a waste of time. If the values aren't widely distributed, don't index the column.
Pg's doing an on-disk sort. Throw more memory at the problem - try SET work_mem = '20MB' to start with. But see my comments below re thrashing with high max_connections. Use a connection pool.
Use a connection pool.
Indexes have a cost - they slow down insert/update/delete and increase vacuum work. So if the index isn't being used lots, get rid of it.
pg_catalog.pg_stat_user_indexes will help you tell which indexes are used.
pg_stat_statements (in contrib) and pg_stat_plans (the latter is an external module) are very useful for capturing data about query patterns, slow queries, etc.
Learn to love the auto_explain module.

Also, if you always do this sort, creating a composite index to match it will help.

CREATE INDEX bigindex ON supplier_management (
 item_description DESC,
 item_number DESC,
 order_type ASC,
 possession_date DESC,
 shipment_type DESC,
 store_type DESC
);

... but be aware that it's only useful for this particular sort, and it'll be a big index so it's only worth having if you do this a lot. In fact, you might as well add supplier_management.buyer_purchase_order_id too, so it can do an index-only scan:

CREATE INDEX bigindex ON supplier_management (
 item_description DESC,
 item_number DESC,
 order_type ASC,
 possession_date DESC,
 shipment_type DESC,
 store_type DESC,
 buyer_purchase_order_id
);

Question 5

Thank you Craig Ringer! Would you mind detailing the enem/char suggestion or linking to a page with details? I can't do composite indexes because I'd be up to my eyeballs in them. I edited the question to give the reason why. Thank you so much in advance!

Question 6

@Cincinnatus You might at least want to decide whether it's worth having a DESC, ASC, or both indexes for each target column, depending on the dominant search order.

Question 7

@Cincinnatus That wasn't what I was saying, but if the client wants all the data and it has lots of spare CPU to throw around, and if the DB is busy then sure, it's a useful piece of work to offload from the DB. It might also let you maintain fewer indexes (which are expensive for update/insert/delete and vacuum) too; that'd be a real win.

Question 8

@Cincinnatus Looking at your plan, it's not using any indexes anyway. The first thing I'd do is throw lots of work_mem at the problem - if I had the resources anyway. See what SET work_mem = '20MB' does, before the query. Do beware that work_mem is per-sort, so high work_mem + lots of max_connections = boom.

Question 9

@Cincinnatus No amount of work_mem will cause index use. It'll just let Pg do the sort in memory instead of using an on-disk merge sort. Playing with the cost parameters and enable_ params might permit use of one index - though it'll likely be slower than the seqscan and sort.

Craig Ringer Craig Ringer 57.9k6 gold badges162 silver badges193 bronze badges · Accepted Answer · 2014-08-14 14:10:57Z

A few things you can do:

Use enums or lookups keyed by integer values, or a simple "char" field, instead of varchar sort keys where possible. I'd use an enum because you can control the sort order easily.

The only serious downside with an enum is that you can't currently drop values from an enum type. You can add them (including inserting them in the middle of the sort order) but not remove them. If that's a problem, you'll want to use a lookup table, or just a field declared "char" that has single character codes.

Also, if you don't need proper language collation, specify COLLATE "C" for character fields, e.g.

CREATE INDEX itemdesc_c ON supplier_management (item_description ASC COLLATE "C");

and then:

ORDER BY ...
 itemdesc COLLATE "C",
 ...

Important things to note:

Pg can combine indexes for predicates (WHERE clauses etc) but not sorting. You can't use a bitmap index scan for a sort. So it can use at most one of the candidate indexes, then it has to sort the rows within each group.
Low-selectivity indexes are a waste of time. If the values aren't widely distributed, don't index the column.
Pg's doing an on-disk sort. Throw more memory at the problem - try SET work_mem = '20MB' to start with. But see my comments below re thrashing with high max_connections. Use a connection pool.
Use a connection pool.
Indexes have a cost - they slow down insert/update/delete and increase vacuum work. So if the index isn't being used lots, get rid of it.
pg_catalog.pg_stat_user_indexes will help you tell which indexes are used.
pg_stat_statements (in contrib) and pg_stat_plans (the latter is an external module) are very useful for capturing data about query patterns, slow queries, etc.
Learn to love the auto_explain module.

Also, if you always do this sort, creating a composite index to match it will help.

CREATE INDEX bigindex ON supplier_management (
 item_description DESC,
 item_number DESC,
 order_type ASC,
 possession_date DESC,
 shipment_type DESC,
 store_type DESC
);

... but be aware that it's only useful for this particular sort, and it'll be a big index so it's only worth having if you do this a lot. In fact, you might as well add supplier_management.buyer_purchase_order_id too, so it can do an index-only scan:

CREATE INDEX bigindex ON supplier_management (
 item_description DESC,
 item_number DESC,
 order_type ASC,
 possession_date DESC,
 shipment_type DESC,
 store_type DESC,
 buyer_purchase_order_id
);

Thank you Craig Ringer! Would you mind detailing the enem/char suggestion or linking to a page with details? I can't do composite indexes because I'd be up to my eyeballs in them. I edited the question to give the reason why. Thank you so much in advance!
@Cincinnatus You might at least want to decide whether it's worth having a DESC, ASC, or both indexes for each target column, depending on the dominant search order.
@Cincinnatus That wasn't what I was saying, but if the client wants all the data and it has lots of spare CPU to throw around, and if the DB is busy then sure, it's a useful piece of work to offload from the DB. It might also let you maintain fewer indexes (which are expensive for update/insert/delete and vacuum) too; that'd be a real win.
@Cincinnatus Looking at your plan, it's not using any indexes anyway. The first thing I'd do is throw lots of work_mem at the problem - if I had the resources anyway. See what SET work_mem = '20MB' does, before the query. Do beware that work_mem is per-sort, so high work_mem + lots of max_connections = boom.
@Cincinnatus No amount of work_mem will cause index use. It'll just let Pg do the sort in memory instead of using an on-disk merge sort. Playing with the cost parameters and enable_ params might permit use of one index - though it'll likely be slower than the seqscan and sort.

Stack Exchange Network

How to optimize multiple ORDER BYs?

Table structure

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How to optimize multiple ORDER BYs?

Table structure

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions