3

I am using PostgreSQL 9.4.4. I have a query like this:

 SELECT COUNT(*) FROM A,B WHERE A.a = B.b

a and b are the Primary Keys of tables A and B, so there are B-indexes on a & b

By default, PostgreSQL will use seq-scan on AB and use hash join, I force it to do the index-scan and index-only-scan.

The result showed that, seq scan is much faster than the other two, it takes more time to do the full scan on a,b for index-scan and index-only-scan.

EXPLAIN ANALYZE SELECT COUNT(*) FROM journal,paper WHERE journal.paper_id = paper.paper_id;

enter image description here

Can someone explain it?

Thank you so much!

asked Jul 26, 2015 at 8:42
5
  • BTW the size of the two tables are not small, at least 5k tuples Commented Jul 26, 2015 at 9:28
  • 3
    5k is small Commented Jul 26, 2015 at 9:33
  • Post the output of explain (analyze, buffers, verbose) for both executions Commented Jul 26, 2015 at 9:52
  • I post the explain result, hope for your answer, thank you so much ! Commented Jul 26, 2015 at 15:46
  • 2
    Gah, screenshot? I can work with it, but in future please just post the text. It's searchable that way, and easier to reformat, read, paste to explain.depesz.com, etc. Commented Aug 6, 2015 at 12:13

2 Answers 2

6

This is quite a common query (pardon the pun! :-) ) from people running queries which perform full table scans (FTS), when the poster feels that the system should make use of the index(es).

Basically, it boils down to the explanation given here. If the tables are so small, the optimiser will say that "it's not worth the bother of going to the index, doing a lookup and then fetching the data, instead, I'll just slurp in all the data and pick out what I need", i.e. perform an FTS.

[EDIT in answer to @txsing's comment]

For an MVCC (multi-version concurrency control) database, you have to traverse every record for a count at a given moment - that's why, for example, a COUNT(*) is much more expensive for MySQL's InnoDB rather than MyISAM.

An excellent explantion (for PostgreSQL) is available here. The guy who wrote this post is a "major contributor" to PostgreSQL (thanks to @dezso for leading me to that post).

answered Jul 26, 2015 at 9:14
2
  • since it is COUNT(*) in select clause, so there is no need to fetch data from the table, right? so I think index-only-scan should be the fastest one, but the result is not :-( Commented Jul 26, 2015 at 9:23
  • @ChrisTien An index-only scan should usually be fastest for a simple count, if the visibility map is up to date so not many heap pages have to be fetched. This is not a simple count though, there's a join involved. Commented Aug 6, 2015 at 12:13
4

I know the reason now.

I need to vacuum the tables before using index-only-scan, otherwise, if a sufficiently high number of heap pages where modified since the last vacuum, the planner will not choose to use index-only scan. When only a small amount of pages have been changed, an index-only scan may happen, which then involves heap fetches.

If I force it to use index-only-scan, it will fetch data from table for each tuple scanned, which can cause great cost.

András Váczi
31.8k13 gold badges103 silver badges152 bronze badges
answered Aug 4, 2015 at 5:48
9
  • @dezso I know the index-only-scan is designed to not fetch data from heaps. In my experiments, I let index-only-scan be the only one on (turn off seq-scan and index-scan), then I explain analyze my query, it shows that it indeed use index-only-scan, but the heap fetches is quite high, equal to the number of tuples in scanned table. Commented Aug 4, 2015 at 14:14
  • @dezso However, after i vacuum all tables involved in the query, the heap fetches become zero, which means index-only-scan doesn't touch heaps this time. So we can draw the conclusion, index-only-scan will still try to fetch data from table in some case, it depends on the visibility map, when u vacuum your tables, the visibility map is set to visible to all users which can result in best performance of index-only-scan. Commented Aug 4, 2015 at 14:20
  • maybe u need do experiments to verify your thoughts, use "explain analyze verbose YOUR QUERY" Commented Aug 4, 2015 at 15:31
  • 1
    Hm, now I found an excellent post from Robert Haas, where he describes these in detail. I have to admit you are right - index-only scans are not always index-only, but they try as long as it is possible. At the same time, an index-only scan with additional heap fetches is possible when a sufficient portion of the table pages remains unchanged since the last vacuum. Commented Aug 4, 2015 at 16:14
  • 2
    @ypercube an index-only scan does not touch the table. False. It might not touch the table, but only if the visibility map shows that all heap pages are 100% visible for the scanned index leaves. Commented Aug 6, 2015 at 12:18

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.