Poor performance on query with LIMIT when I add an ORDER BY?

Question 1

Given the table:

Column	Type	Modifiers	Storage
id	bigint	not null default nextval('items_id_seq'::regclass)	plain
data	text	not null	extended
object_id	bigint	not null	plain

and indexes:

"items_pkey" PRIMARY KEY, btree (id)
"items_object_id_idx" btree (object_id)

When I execute:

SELECT *
FROM items
WHERE object_id = 123
LIMIT 1;

It returns 0 rows very quickly. However, when I execute this query with ORDER BY, it hangs for a very long time:

SELECT *
FROM items
WHERE object_id = 123
ORDER BY id DESC -- I added the ORDER BY
LIMIT 1;

What explains this discrepancy?

Query Plans

Fast Query (without `ORDER BY`)

 QUERY PLAN 
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=0.56..3.34 rows=1 width=63) (actual time=0.014..0.014 rows=0 loops=1)
 -> Index Scan using items_object_id_operation_idx on items (cost=0.56..2579.16 rows=929 width=63) (actual time=0.013..0.013 rows=0 loops=1)
 Index Cond: (object_id = 123::bigint)
 Total runtime: 0.029 ms

Slow query (with the `ORDER BY`)

 QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=0.44..1269.14 rows=1 width=63) (actual time=873796.061..873796.061 rows=0 loops=1)
 -> Index Scan Backward using items_pkey on items (cost=0.44..1164670.11 rows=918 width=63) (actual time=873796.059..873796.059 rows=0 loops=1)
 Filter: (object_id = 123::bigint)
 Rows Removed by Filter: 27942522
 Total runtime: 873796.113 ms

Question 2

Trying to explain why there is a difference in performance between the two queries.

This one: SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1 is satisfied by any one row with the matching object_id, so the index on object_id is a natural choice. The query requires minimal I/O: an index scan to find the first matching value, plus one heap read to fetch the entire row.

The alternative: SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1 requires all rows with the matching object_id to be sorted by another column, id, then the row with the maximum value of id to be returned. If you were to use the index on object_id, you would need to perform the following operations: scan the index to find every matching object_id; for every match go fetch the actual row; then sort all fetched rows by id and return the one with the largest id.

The alternative chosen by the optimizer, presumably based on the object_id histogram, is: scan the index on id backwards, in its entirety; for each value go fetch the row and check if the value of object_id matches; return the first matching row, which will have the maximum possible id value. This alternative avoids sorting the rows, so I guess the optimizer prefers it to using the index on object_id.

The presence of an index on (object_id asc, id desc) allows for yet another alternative: scan this new index for the first entry matching the supplied object_id value, which by definition will have the highest id value; go fetch one matching row and return. Obviously, this is the most efficient approach.

Question 3

There are two methods I found to make this faster,

Add a better index.
Optimize the query, with an optimization fence.

Index

One method is to add a better index, as found in mustaccio's answer. This has the advantage of resulting in the fastest query.

Optimization Fence

Another method is to fence the query, by wrapping it in a subselect. Notice the inner query does NOT have the LIMIT. This solution can be very slow. You can see there are 4239 rows that match object_id = 123. This means that while you can get those rows back near instantly (since it's an index scan and very fast), you STILL have to sort them afterward. Mustaccio solution involves having them sorted on an index (making it obviously much faster).

SELECT *
FROM (
 SELECT *
 FROM items 
 WHERE object_id = 123
 ORDER BY id DESC
) AS items
ORDER BY id DESC
LIMIT 1;
 QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Limit (cost=16629.84..16629.86 rows=1 width=59)
 -> Sort (cost=16629.84..16640.44 rows=4239 width=59)
 Sort Key: items.id
 -> Bitmap Heap Scan on items (cost=125.42..16374.45 rows=4239 width=59)
 Recheck Cond: (object_id = 123::bigint)
 -> Bitmap Index Scan on items_object_id_idx (cost=0.00..124.36 rows=4239 width=0)
 Index Cond: (object_id = 123::bigint)

mustaccio mustaccio 28.6k24 gold badges60 silver badges77 bronze badges · Answer 1 · 2015-08-13 13:50:54Z

Trying to explain why there is a difference in performance between the two queries.

This one: SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1 is satisfied by any one row with the matching object_id, so the index on object_id is a natural choice. The query requires minimal I/O: an index scan to find the first matching value, plus one heap read to fetch the entire row.

The alternative: SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1 requires all rows with the matching object_id to be sorted by another column, id, then the row with the maximum value of id to be returned. If you were to use the index on object_id, you would need to perform the following operations: scan the index to find every matching object_id; for every match go fetch the actual row; then sort all fetched rows by id and return the one with the largest id.

The alternative chosen by the optimizer, presumably based on the object_id histogram, is: scan the index on id backwards, in its entirety; for each value go fetch the row and check if the value of object_id matches; return the first matching row, which will have the maximum possible id value. This alternative avoids sorting the rows, so I guess the optimizer prefers it to using the index on object_id.

The presence of an index on (object_id asc, id desc) allows for yet another alternative: scan this new index for the first entry matching the supplied object_id value, which by definition will have the highest id value; go fetch one matching row and return. Obviously, this is the most efficient approach.

happy_marmoset happy_marmoset 5392 gold badges6 silver badges12 bronze badges · Answer 2 · 2015-11-16 12:55:18Z

There are two methods I found to make this faster,

Add a better index.
Optimize the query, with an optimization fence.

Index

One method is to add a better index, as found in mustaccio's answer. This has the advantage of resulting in the fastest query.

Optimization Fence

Another method is to fence the query, by wrapping it in a subselect. Notice the inner query does NOT have the LIMIT. This solution can be very slow. You can see there are 4239 rows that match object_id = 123. This means that while you can get those rows back near instantly (since it's an index scan and very fast), you STILL have to sort them afterward. Mustaccio solution involves having them sorted on an index (making it obviously much faster).

SELECT *
FROM (
 SELECT *
 FROM items 
 WHERE object_id = 123
 ORDER BY id DESC
) AS items
ORDER BY id DESC
LIMIT 1;
 QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Limit (cost=16629.84..16629.86 rows=1 width=59)
 -> Sort (cost=16629.84..16640.44 rows=4239 width=59)
 Sort Key: items.id
 -> Bitmap Heap Scan on items (cost=125.42..16374.45 rows=4239 width=59)
 Recheck Cond: (object_id = 123::bigint)
 -> Bitmap Index Scan on items_object_id_idx (cost=0.00..124.36 rows=4239 width=0)
 Index Cond: (object_id = 123::bigint)

Stack Exchange Network

Poor performance on query with LIMIT when I add an ORDER BY?

Query Plans

Fast Query (without `ORDER BY`)

Slow query (with the `ORDER BY`)

2 Answers 2

Index

Optimization Fence

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Poor performance on query with LIMIT when I add an ORDER BY?

Query Plans

Fast Query (without ORDER BY)

Slow query (with the ORDER BY)

2 Answers 2

Index

Optimization Fence

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions

Fast Query (without `ORDER BY`)

Slow query (with the `ORDER BY`)