I'm getting sluggish performance on a table with ~300,000 rows and a B-tree index on each column.
This is for a dynamic pagination page where the query is constructed on demand, and the application caches the primary key in the query's specified order.
For this query
explain analyze
SELECT supplier_management.buyer_purchase_order_id
FROM supplier_management
ORDER BY item_description DESC,
item_number DESC,
order_type ASC,
possession_date DESC,
shipment_type DESC,
store_type DESC
I get these results:
Sort (cost=51026.98..51750.35 rows=289348 width=72) (actual time=8229.280..12349.596 rows=289348 loops=1)
Sort Key: item_description, item_number, order_type, possession_date, shipment_type, store_type
Sort Method: external merge Disk: 24744kB
-> Seq Scan on supplier_management (cost=0.00..10876.48 rows=289348 width=72) (actual time=0.015..187.426 rows=289348 loops=1)
Total runtime: 12407.064 ms
How can the performance of a multi-column sort be improved? Or should I just do it in C++?
Table structure
buyer_purchase_order_id bigint
supplier_number bigint
supplier_name character varying
purchase_order_number bigint
store_number integer
item_number bigint
item_description character varying
project_type character varying
order_date integer
requested_arrival_date integer
department character varying
store_type character varying
shipment_type character varying
order_type character varying
quantity_ordered integer
quantity_allocation integer
quantity_staged integer
quantity_shipped integer
quantity_received integer
in_stock_date integer
in_stock_date_visible_on integer
show_red boolean
requested_arrival_date_plus_four_business_days integer
supplier_status character varying
notes_comments character varying
requested_arrival_date_color character varying
grand_opening_date integer
possession_date integer
consolidator_name character varying
real_requested_arrival_date_plus_four_business_days integer
No character varying value exceeds the length limit for indexing; however, there is some duplication. Would it be more efficient to put the actual values in another table, normalize, and join on the related integer columns?
1 Answer
A few things you can do:
Use enums, or lookups keyed by integer values, or a simple "char" field, instead of varchar sort keys where possible. I'd use an enum because you can control the sort order easily.
The only serious downside with an enum is that you can't currently drop values from an enum type. You can add them (including inserting them in the middle of the sort order) but not remove them. If that's a problem, you'll want to use a lookup table, or just a field declared "char" that has single character codes.
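To make the enum suggestion concrete, here is a minimal sketch for one of the sort keys, order_type. The type name order_type_enum and the three labels are hypothetical placeholders (the question doesn't show the real values), so substitute the distinct values actually stored in the column. An enum sorts in the order its labels are declared, not alphabetically.

```sql
-- Hypothetical type name and labels; replace with the real order_type values.
-- Declaration order defines the sort order.
CREATE TYPE order_type_enum AS ENUM ('standard', 'rush', 'backorder');

-- Convert the existing varchar column. This rewrites the table and fails if
-- any stored value is not one of the enum labels.
ALTER TABLE supplier_management
    ALTER COLUMN order_type TYPE order_type_enum
    USING order_type::text::order_type_enum;

-- Values can be added later, including in the middle of the sort order.
-- On older PostgreSQL versions this cannot run inside a transaction block.
ALTER TYPE order_type_enum ADD VALUE 'expedited' BEFORE 'rush';

-- Sorting now compares small fixed-size values instead of strings.
SELECT buyer_purchase_order_id
FROM supplier_management
ORDER BY order_type ASC;
```

The same pattern would apply to store_type and shipment_type if they also hold a small, stable set of values; a column like item_description with many distinct strings is a poorer fit.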
Also, if you don't need proper language collation, specify COLLATE "C" for character fields, e.g.
CREATE INDEX itemdesc_c ON supplier_management (item_description ASC COLLATE "C");
and then:
ORDER BY ...
item_description COLLATE "C",
...
Important things to note:
Pg can combine indexes for predicates (WHERE clauses etc) but not sorting. You can't use a bitmap index scan for a sort. So it can use at most one of the candidate indexes, then it has to sort the rows within each group.
Low-selectivity indexes are a waste of time. If the values aren't widely distributed, don't index the column.
Pg's doing an on-disk sort. Throw more memory at the problem - try SET work_mem = '20MB' to start with. But see my comments below re thrashing with high max_connections. Use a connection pool.
Indexes have a cost - they slow down insert/update/delete and increase vacuum work. So if the index isn't being used lots, get rid of it.
pg_catalog.pg_stat_user_indexes will help you tell which indexes are used.
pg_stat_statements (in contrib) and pg_stat_plans (the latter is an external module) are very useful for capturing data about query patterns, slow queries, etc.
Learn to love the auto_explain module.
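As a rough sketch of the work_mem and index-usage points above: the first part raises work_mem for the current session only and reruns the query from the question, and the second part lists how often each index on the table has been scanned. The 20MB figure is just the starting value suggested above; the plan in the question spilled about 24 MB to disk, and an in-memory sort generally needs more than the on-disk figure, so it may have to go higher.

```sql
-- Session-level setting only; it does not change postgresql.conf.
SET work_mem = '20MB';

EXPLAIN ANALYZE
SELECT buyer_purchase_order_id
FROM supplier_management
ORDER BY item_description DESC, item_number DESC, order_type ASC,
         possession_date DESC, shipment_type DESC, store_type DESC;
-- If the plan still reports "Sort Method: external merge  Disk: ...",
-- the sort did not fit in memory; try a larger value.

RESET work_mem;

-- Scan counts per index since the statistics were last reset.
-- Indexes with idx_scan at or near zero are candidates for removal.
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_catalog.pg_stat_user_indexes
WHERE relname = 'supplier_management'
ORDER BY idx_scan ASC;
```

Setting work_mem per session (or per transaction with SET LOCAL) keeps the larger allowance limited to the pagination query rather than every connection.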
Also, if you always do this sort, creating a composite index to match it will help.
CREATE INDEX bigindex ON supplier_management (
item_description DESC,
item_number DESC,
order_type ASC,
possession_date DESC,
shipment_type DESC,
store_type DESC
);
... but be aware that it's only useful for this particular sort, and it'll be a big index so it's only worth having if you do this a lot. In fact, you might as well add supplier_management.buyer_purchase_order_id too, so it can do an index-only scan:
CREATE INDEX bigindex ON supplier_management (
item_description DESC,
item_number DESC,
order_type ASC,
possession_date DESC,
shipment_type DESC,
store_type DESC,
buyer_purchase_order_id
);
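Whether the planner actually picks an index-only scan can be checked along these lines. This sketch assumes PostgreSQL 9.2 or later (index-only scans don't exist before that) and a recently vacuumed table, since index-only scans rely on the visibility map being up to date.

```sql
-- Refresh the visibility map and planner statistics.
VACUUM ANALYZE supplier_management;

EXPLAIN ANALYZE
SELECT buyer_purchase_order_id
FROM supplier_management
ORDER BY item_description DESC, item_number DESC, order_type ASC,
         possession_date DESC, shipment_type DESC, store_type DESC;
-- Hoped-for plan node: "Index Only Scan using bigindex on supplier_management".
-- If the plan still shows a Seq Scan followed by a Sort, the planner
-- estimated that path to be cheaper.
```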
Thank you Craig Ringer! Would you mind detailing the enum/char suggestion or linking to a page with details? I can't do composite indexes because I'd be up to my eyeballs in them. I edited the question to give the reason why. Thank you so much in advance! – user32234, Aug 14, 2014 at 14:15
@Cincinnatus You might at least want to decide whether it's worth having a DESC, ASC, or both indexes for each target column, depending on the dominant search order. – Craig Ringer, Aug 14, 2014 at 14:16
@Cincinnatus That wasn't what I was saying, but if the client wants all the data and it has lots of spare CPU to throw around, and if the DB is busy then sure, it's a useful piece of work to offload from the DB. It might also let you maintain fewer indexes (which are expensive for update/insert/delete and vacuum) too; that'd be a real win. – Craig Ringer, Aug 14, 2014 at 14:23
@Cincinnatus Looking at your plan, it's not using any indexes anyway. The first thing I'd do is throw lots of work_mem at the problem - if I had the resources anyway. See what SET work_mem = '20MB' does, before the query. Do beware that work_mem is per-sort, so high work_mem + lots of max_connections = boom. – Craig Ringer, Aug 14, 2014 at 14:25
@Cincinnatus No amount of work_mem will cause index use. It'll just let Pg do the sort in memory instead of using an on-disk merge sort. Playing with the cost parameters and enable_ params might permit use of one index - though it'll likely be slower than the seqscan and sort. – Craig Ringer, Aug 14, 2014 at 15:13
ORDER BY with long strings with and without collation: stackoverflow.com/questions/9888096/…