19

Given the table:

Column Type Modifiers Storage
id bigint not null default nextval('items_id_seq'::regclass) plain
data text not null extended
object_id bigint not null plain

and indexes:

  • "items_pkey" PRIMARY KEY, btree (id)
  • "items_object_id_idx" btree (object_id)

When I execute:

SELECT *
FROM items
WHERE object_id = 123
LIMIT 1;

It returns 0 rows very quickly. However, when I execute this query with ORDER BY, it hangs for a very long time:

SELECT *
FROM items
WHERE object_id = 123
ORDER BY id DESC -- I added the ORDER BY
LIMIT 1;

What explains this discrepancy?

Query Plans

Fast Query (without ORDER BY)

 QUERY PLAN 
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=0.56..3.34 rows=1 width=63) (actual time=0.014..0.014 rows=0 loops=1)
 -> Index Scan using items_object_id_operation_idx on items (cost=0.56..2579.16 rows=929 width=63) (actual time=0.013..0.013 rows=0 loops=1)
 Index Cond: (object_id = 123::bigint)
 Total runtime: 0.029 ms

Slow query (with the ORDER BY)

 QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=0.44..1269.14 rows=1 width=63) (actual time=873796.061..873796.061 rows=0 loops=1)
 -> Index Scan Backward using items_pkey on items (cost=0.44..1164670.11 rows=918 width=63) (actual time=873796.059..873796.059 rows=0 loops=1)
 Filter: (object_id = 123::bigint)
 Rows Removed by Filter: 27942522
 Total runtime: 873796.113 ms
Paul White
95.3k30 gold badges439 silver badges689 bronze badges
asked Aug 11, 2015 at 7:54
0

2 Answers 2

23

Trying to explain why there is a difference in performance between the two queries.

This one: SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1 is satisfied by any one row with the matching object_id, so the index on object_id is a natural choice. The query requires minimal I/O: an index scan to find the first matching value, plus one heap read to fetch the entire row.

The alternative: SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1 requires all rows with the matching object_id to be sorted by another column, id, then the row with the maximum value of id to be returned. If you were to use the index on object_id, you would need to perform the following operations: scan the index to find every matching object_id; for every match go fetch the actual row; then sort all fetched rows by id and return the one with the largest id.

The alternative chosen by the optimizer, presumably based on the object_id histogram, is: scan the index on id backwards, in its entirety; for each value go fetch the row and check if the value of object_id matches; return the first matching row, which will have the maximum possible id value. This alternative avoids sorting the rows, so I guess the optimizer prefers it to using the index on object_id.

The presence of an index on (object_id asc, id desc) allows for yet another alternative: scan this new index for the first entry matching the supplied object_id value, which by definition will have the highest id value; go fetch one matching row and return. Obviously, this is the most efficient approach.

answered Aug 13, 2015 at 13:50
0
10

There are two methods I found to make this faster,

  • Add a better index.
  • Optimize the query, with an optimization fence.

Index

One method is to add a better index, as found in mustaccio's answer. This has the advantage of resulting in the fastest query.

Optimization Fence

Another method is to fence the query, by wrapping it in a subselect. Notice the inner query does NOT have the LIMIT. This solution can be very slow. You can see there are 4239 rows that match object_id = 123. This means that while you can get those rows back near instantly (since it's an index scan and very fast), you STILL have to sort them afterward. Mustaccio solution involves having them sorted on an index (making it obviously much faster).

SELECT *
FROM (
 SELECT *
 FROM items 
 WHERE object_id = 123
 ORDER BY id DESC
) AS items
ORDER BY id DESC
LIMIT 1;
 QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Limit (cost=16629.84..16629.86 rows=1 width=59)
 -> Sort (cost=16629.84..16640.44 rows=4239 width=59)
 Sort Key: items.id
 -> Bitmap Heap Scan on items (cost=125.42..16374.45 rows=4239 width=59)
 Recheck Cond: (object_id = 123::bigint)
 -> Bitmap Index Scan on items_object_id_idx (cost=0.00..124.36 rows=4239 width=0)
 Index Cond: (object_id = 123::bigint)
Paul White
95.3k30 gold badges439 silver badges689 bronze badges
answered Nov 16, 2015 at 12:55
0

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.