I need to query the latest entry from a table with more than 130 million entries using PostgreSQL 9.5. There are two columns of interest: row_id (integer, the primary key) and row_time (timestamp, not null, with a btree index).
This query works very fast:
EXPLAIN SELECT * FROM tbl ORDER BY row_time DESC LIMIT 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Limit (cost=0.57..0.88 rows=1 width=423)
-> Index Scan Backward using tbl_row_time on tbl (cost=0.57..40503877.59 rows=131390688 width=423)
In case there are several rows with the same row_time I would like to sort by row_id as a tie-breaker. Unfortunately, the query plan then regresses to a sequential scan:
EXPLAIN SELECT * FROM tbl ORDER BY row_time DESC, row_id DESC LIMIT 1;
QUERY PLAN
---------------------------------------------------------------------------------
Limit (cost=9602337.32..9602337.32 rows=1 width=423)
-> Sort (cost=9602337.32..9930814.04 rows=131390688 width=423)
Sort Key: row_time DESC, row_id DESC
-> Seq Scan on tbl (cost=0.00..8945383.88 rows=131390688 width=423)
I don't understand why this is happening. It would be easy to retrieve the rows with the maximum row_time using the index (typically less than 10 rows) and then sort just these. Instead, the full table is scanned, which takes minutes instead of the sub-millisecond time of the first query.
Creating a multicolumn index on (row_time, row_id) makes PostgreSQL choose an index scan again, but it increases the total index size considerably. A subquery to get MAX(row_time), then filtering by it and sorting the result by row_id, is also fast but seems overly complicated for such a simple goal. Am I missing something here?
1 Answer
If you create an index on (row_time DESC, row_id DESC), then the second query will behave just like the first one.
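A minimal sketch of such an index (the index name is illustrative):

```sql
-- Matches ORDER BY row_time DESC, row_id DESC exactly:
CREATE INDEX tbl_row_time_row_id ON tbl (row_time DESC, row_id DESC);
```

Note that a plain ascending index on (row_time, row_id) serves the same ORDER BY equally well, since PostgreSQL can scan a btree index backward, as the Index Scan Backward in the first query plan already shows.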
LIMIT always operates after sorting, so when a sort without an index is necessary, PostgreSQL sorts the entire record set before applying LIMIT.
What you seem to want is: apply LIMIT over the indexed order, include all rows matching the sorted field(s) in that limited result, then sort that small result set by the remaining fields and limit again. This is the way to go in this case:
SELECT *
FROM tbl
WHERE row_time = (
SELECT row_time
FROM tbl
ORDER BY row_time DESC
LIMIT 1)
ORDER BY row_id DESC
LIMIT 1;
Whereas for LIMIT n where n > 1:
SELECT *
FROM tbl
WHERE row_time IN (
SELECT row_time
FROM tbl
ORDER BY row_time DESC
LIMIT n)
ORDER BY row_id DESC
LIMIT n;
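As a quick sanity check of the tie-breaking semantics of the first query, here is a self-contained demo using SQLite in Python (purely to exercise the SQL; the planner behavior discussed above is PostgreSQL-specific, and the table contents are made up):

```python
import sqlite3

# In-memory toy table standing in for tbl (column names match the question;
# the data is illustrative).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (row_id INTEGER PRIMARY KEY, row_time TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO tbl (row_id, row_time) VALUES (?, ?)",
    [
        (1, "2016-06-14 10:00:00"),
        (2, "2016-06-14 12:00:00"),  # tied for the latest row_time...
        (3, "2016-06-14 12:00:00"),  # ...so row_id breaks the tie
    ],
)

# First narrow down to the latest row_time, then sort only those rows by row_id.
row = conn.execute(
    """
    SELECT *
    FROM tbl
    WHERE row_time = (
        SELECT row_time
        FROM tbl
        ORDER BY row_time DESC
        LIMIT 1)
    ORDER BY row_id DESC
    LIMIT 1
    """
).fetchone()

print(row)  # (3, '2016-06-14 12:00:00')
```

The inner subquery only touches the top of the index, and the outer sort then operates on the handful of tied rows rather than the whole table.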
- @ypercubeTM not if row_time is not unique (which in this case is expected because there's an extra ORDER BY element) – Ezequiel Tolnay, Jun 14, 2016 at 22:08
- @ypercubeTM ah, gotcha, I get what you mean. I fixed the code :) – Ezequiel Tolnay, Jun 15, 2016 at 7:07
- And the IN could be = ;) – ypercubeᵀᴹ, Jun 15, 2016 at 7:10
- rows=131390688 rows. Something does not add up here, or your table statistics are seriously off track. Is autovacuum on? Have you tried a manual ANALYZE tbl?