Why seq-scan can be much faster than index-scan and index-only-scan in this simple query?

Question 1

I am using PostgreSQL 9.4.4. I have a query like this:

 SELECT COUNT(*) FROM A,B WHERE A.a = B.b

a and b are the Primary Keys of tables A and B, so there are B-indexes on a & b

By default, PostgreSQL will use seq-scan on AB and use hash join, I force it to do the index-scan and index-only-scan.

The result showed that, seq scan is much faster than the other two, it takes more time to do the full scan on a,b for index-scan and index-only-scan.

EXPLAIN ANALYZE SELECT COUNT(*) FROM journal,paper WHERE journal.paper_id = paper.paper_id;

enter image description here

Can someone explain it?

Thank you so much!

Question 2

BTW the size of the two tables are not small, at least 5k tuples

Question 3

5k is small

Question 4

Post the output of explain (analyze, buffers, verbose) for both executions

Question 5

I post the explain result, hope for your answer, thank you so much !

Question 6

Gah, screenshot? I can work with it, but in future please just post the text. It's searchable that way, and easier to reformat, read, paste to explain.depesz.com, etc.

Question 7

This is quite a common query (pardon the pun! :-) ) from people running queries which perform full table scans (FTS), when the poster feels that the system should make use of the index(es).

Basically, it boils down to the explanation given here. If the tables are so small, the optimiser will say that "it's not worth the bother of going to the index, doing a lookup and then fetching the data, instead, I'll just slurp in all the data and pick out what I need", i.e. perform an FTS.

[EDIT in answer to @txsing's comment]

For an MVCC (multi-version concurrency control) database, you have to traverse every record for a count at a given moment - that's why, for example, a COUNT(*) is much more expensive for MySQL's InnoDB rather than MyISAM.

An excellent explantion (for PostgreSQL) is available here. The guy who wrote this post is a "major contributor" to PostgreSQL (thanks to @dezso for leading me to that post).

Question 8

since it is COUNT(*) in select clause, so there is no need to fetch data from the table, right? so I think index-only-scan should be the fastest one, but the result is not :-(

Question 9

@ChrisTien An index-only scan should usually be fastest for a simple count, if the visibility map is up to date so not many heap pages have to be fetched. This is not a simple count though, there's a join involved.

Question 10

I know the reason now.

I need to vacuum the tables before using index-only-scan, otherwise, if a sufficiently high number of heap pages where modified since the last vacuum, the planner will not choose to use index-only scan. When only a small amount of pages have been changed, an index-only scan may happen, which then involves heap fetches.

If I force it to use index-only-scan, it will fetch data from table for each tuple scanned, which can cause great cost.

Question 11

@dezso I know the index-only-scan is designed to not fetch data from heaps. In my experiments, I let index-only-scan be the only one on (turn off seq-scan and index-scan), then I explain analyze my query, it shows that it indeed use index-only-scan, but the heap fetches is quite high, equal to the number of tuples in scanned table.

Question 12

@dezso However, after i vacuum all tables involved in the query, the heap fetches become zero, which means index-only-scan doesn't touch heaps this time. So we can draw the conclusion, index-only-scan will still try to fetch data from table in some case, it depends on the visibility map, when u vacuum your tables, the visibility map is set to visible to all users which can result in best performance of index-only-scan.

Question 13

maybe u need do experiments to verify your thoughts, use "explain analyze verbose YOUR QUERY"

Question 14

Hm, now I found an excellent post from Robert Haas, where he describes these in detail. I have to admit you are right - index-only scans are not always index-only, but they try as long as it is possible. At the same time, an index-only scan with additional heap fetches is possible when a sufficient portion of the table pages remains unchanged since the last vacuum.

Question 15

@ypercube an index-only scan does not touch the table. False. It might not touch the table, but only if the visibility map shows that all heap pages are 100% visible for the scanned index leaves.

Vérace Vérace 31k9 gold badges73 silver badges86 bronze badges · Answer 1 · 2015-07-26 09:14:57Z

This is quite a common query (pardon the pun! :-) ) from people running queries which perform full table scans (FTS), when the poster feels that the system should make use of the index(es).

Basically, it boils down to the explanation given here. If the tables are so small, the optimiser will say that "it's not worth the bother of going to the index, doing a lookup and then fetching the data, instead, I'll just slurp in all the data and pick out what I need", i.e. perform an FTS.

[EDIT in answer to @txsing's comment]

For an MVCC (multi-version concurrency control) database, you have to traverse every record for a count at a given moment - that's why, for example, a COUNT(*) is much more expensive for MySQL's InnoDB rather than MyISAM.

An excellent explantion (for PostgreSQL) is available here. The guy who wrote this post is a "major contributor" to PostgreSQL (thanks to @dezso for leading me to that post).

since it is COUNT(*) in select clause, so there is no need to fetch data from the table, right? so I think index-only-scan should be the fastest one, but the result is not :-(
@ChrisTien An index-only scan should usually be fastest for a simple count, if the visibility map is up to date so not many heap pages have to be fetched. This is not a simple count though, there's a join involved.

Chris Tien Chris Tien 1711 silver badge7 bronze badges · Answer 2 · 2015-08-04 05:48:26Z

4

I know the reason now.

I need to vacuum the tables before using index-only-scan, otherwise, if a sufficiently high number of heap pages where modified since the last vacuum, the planner will not choose to use index-only scan. When only a small amount of pages have been changed, an index-only scan may happen, which then involves heap fetches.

If I force it to use index-only-scan, it will fetch data from table for each tuple scanned, which can cause great cost.

Share

Improve this answer

edited Aug 4, 2015 at 16:18

András Váczi's user avatar

András Váczi

31.8k13 gold badges103 silver badges152 bronze badges

answered Aug 4, 2015 at 5:48

Chris Tien's user avatar

Chris Tien Chris Tien

1711 silver badge7 bronze badges

9

@dezso I know the index-only-scan is designed to not fetch data from heaps. In my experiments, I let index-only-scan be the only one on (turn off seq-scan and index-scan), then I explain analyze my query, it shows that it indeed use index-only-scan, but the heap fetches is quite high, equal to the number of tuples in scanned table.

Chris Tien
– Chris Tien

2015年08月04日 14:14:33 +00:00
Commented Aug 4, 2015 at 14:14
@dezso However, after i vacuum all tables involved in the query, the heap fetches become zero, which means index-only-scan doesn't touch heaps this time. So we can draw the conclusion, index-only-scan will still try to fetch data from table in some case, it depends on the visibility map, when u vacuum your tables, the visibility map is set to visible to all users which can result in best performance of index-only-scan.

Chris Tien
– Chris Tien

2015年08月04日 14:20:45 +00:00
Commented Aug 4, 2015 at 14:20
maybe u need do experiments to verify your thoughts, use "explain analyze verbose YOUR QUERY"

Chris Tien
– Chris Tien

2015年08月04日 15:31:26 +00:00
Commented Aug 4, 2015 at 15:31
1

Hm, now I found an excellent post from Robert Haas, where he describes these in detail. I have to admit you are right - index-only scans are not always index-only, but they try as long as it is possible. At the same time, an index-only scan with additional heap fetches is possible when a sufficient portion of the table pages remains unchanged since the last vacuum.

András Váczi
– András Váczi

2015年08月04日 16:14:49 +00:00
Commented Aug 4, 2015 at 16:14
2

@ypercube an index-only scan does not touch the table. False. It might not touch the table, but only if the visibility map shows that all heap pages are 100% visible for the scanned index leaves.

Craig Ringer
– Craig Ringer

2015年08月06日 12:18:40 +00:00
Commented Aug 6, 2015 at 12:18

| Show 4 more comments

Stack Exchange Network

Why seq-scan can be much faster than index-scan and index-only-scan in this simple query?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Why seq-scan can be much faster than index-scan and index-only-scan in this simple query?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions