I have a very simple table in PostgreSQL and need some guidance on speeding up my queries. A recent large expansion of the data in the table has made everything grind to a halt.
Table body_feature:
  Column  |  Type   | Modifiers
----------+---------+-----------
 feature  | text    | not null
 body     | text    | not null
 start    | integer | not null
 stop     | integer | not null
Indexes:
"t_feature_ind" btree (feature)
In the old version of the database, the table held ~32 million rows.
Here is a simple query:
explain analyze select * from body_feature where feature='ABC';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Index Scan using t_feature_ind on body_feature (cost=0.00..49544.92 rows=15986 width=69) (actual time=175.967..261.175 rows=7285 loops=1)
Index Cond: (feature = 'ABC'::text)
Total runtime: 261.555 ms
With the recent expansion, the number of rows has grown to more than 110 million, and the same query gives:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on body_feature (cost=944.33..155543.90 rows=42307 width=73) (actual time=1118.598..204052.203 rows=23132 loops=1)
Recheck Cond: (feature = 'ABC'::text)
-> Bitmap Index Scan on t_feature_ind (cost=0.00..933.75 rows=42307 width=0) (actual time=952.948..952.948 rows=23132 loops=1)
Index Cond: (feature = 'ABC'::text)
Total runtime: 204065.405 ms
(5 rows)
The original machine running the server is quite old and has 8 GB of RAM, but I have access to a machine with 1 TB of RAM for testing.
What would you suggest I change in the server configuration to make such a simple query revert to the plain index scan?
I tried extracting the feature and body values (which are text) into separate lookup tables and using simple integer IDs in the body_feature table (which cuts the raw size down significantly), but I still get the same results.
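Roughly, the split-out version looked like this (names simplified):

create table feature_lookup (
    feature_id serial primary key,
    feature    text not null unique
);

create table body_lookup (
    body_id serial primary key,
    body    text not null unique
);

create table body_feature (
    feature_id integer not null references feature_lookup,
    body_id    integer not null references body_lookup,
    start      integer not null,
    stop       integer not null
);
create index t_feature_ind on body_feature (feature_id);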
Thank you for any suggestions/help.
1 Answer
You are assuming that the problem is the switch to the bitmap scan. That is very unlikely.
The likely problem is that your data had to be read from disk, either because it has grown too large to fit in cache (in which case more RAM would help), or because the cache was cold when you happened to run the test.
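One way to check (on PostgreSQL 9.0 or later) is to ask EXPLAIN for buffer statistics; the counts below are illustrative, not from your system:

explain (analyze, buffers) select * from body_feature where feature = 'ABC';
-- The output then includes a line per plan node such as:
--   Buffers: shared hit=120 read=22900
-- "hit" blocks were found in shared_buffers; "read" blocks had to be
-- requested from the kernel (its page cache, or ultimately the disk).

-- It is also worth checking how much memory the server is configured
-- to use; on an 8 GB box the defaults are often far too small:
show shared_buffers;
show effective_cache_size;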
The switch to a bitmap scan is likely a response to the same thing that caused the problem: the growth in data.
You could CLUSTER the table on the index t_feature_ind, which would make it more efficient to pull out all of the rows for any given feature.
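For example (note that CLUSTER rewrites the table and takes an exclusive lock while it runs, which should be fine here since the data is static; the ordering is one-time, so it would need to be repeated after any bulk reload):

cluster body_feature using t_feature_ind;
analyze body_feature;  -- refresh planner statistics after the rewrite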
Is there a way of checking whether the data is being read from disk? Which parameters do I need to change to try to fit the table/index into memory? BTW, the database is completely static, i.e. no user-initiated writes ever happen. – Zathar, Feb 26, 2014 at 23:08
@Zathar The best way to monitor this is at the operating system level, for example with sar and vmstat. That only works well if the system is otherwise fairly idle, though. You can also get "block read" information from PostgreSQL, but if the blocks came from the kernel's cache rather than from disk, PostgreSQL has no way of knowing that, so it still counts them as reads. In newer versions of PostgreSQL, another strategy is to turn on track_io_timing. If not much time is spent on I/O, it doesn't matter that there was a lot of it. – jjanes, Feb 27, 2014 at 17:02
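For reference, on 9.2 and later that would look something like this (changing track_io_timing on the fly requires superuser):

set track_io_timing = on;
explain (analyze, buffers) select * from body_feature where feature = 'ABC';
-- plan nodes then carry lines such as:
--   I/O Timings: read=1234.567
-- showing milliseconds actually spent waiting on reads.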