3

I have a very simple table in Postgresql, and need some guidance to speed up my queries. Recent large expansion of the data in the table has made everything grind to halt.

Table body_feature:

 Column | Type | Modifiers
 -----------+------------------+-----------
 feature | text | not null
 body | text | not null
 start | integer | not null
 stop | integer | not null
Indexes:
 "t_feature_ind" btree (feature)

In the old version of the database, there were ~32 million rows in the database.

Here is a simple query:

explain analyze select * from body_feature where feature='ABC'; 
 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_feature_ind on body_feature (cost=0.00..49544.92 rows=15986 width=69) (actual time=175.967..261.175 rows=7285 loops=1)
 Index Cond: (feature = 'ABC'::text)
 Total runtime: 261.555 ms

With the recent expansion, number of rows has grown to more than 110 million, and the same query gives:

 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on body_feature (cost=944.33..155543.90 rows=42307 width=73) (actual time=1118.598..204052.203 rows=23132 loops=1)
 Recheck Cond: (feature = 'ABC'::text)
 -> Bitmap Index Scan on t_feature_ind (cost=0.00..933.75 rows=42307 width=0) (actual time=952.948..952.948 rows=23132 loops=1)
 Index Cond: (feature = 'ABC'::text)
 Total runtime: 204065.405 ms
(5 rows)

The original machine running the server is quite old and has 8 Gb of RAM, but I have access to a 1Tb RAM machine for testing.

What would you suggest I change in the server configuration to make such simple query revert back to the simple index scan?

I tried extracting feature and body values (which are text) into separate tables, and using simple integer IDs in the body_feature table (which cuts the raw size down significantly), but I still get the same results.

Thank you for any suggestions/help.

asked Feb 26, 2014 at 10:56
7
  • What's the runtime difference? Also, read stackoverflow.com/tags/postgresql-performance/info and add the appropriate extra information. Commented Feb 26, 2014 at 11:29
  • Runtime difference is huge, from instant to more than 1 minute. Commented Feb 26, 2014 at 11:45
  • EXPLAIN ANALYZE please. Commented Feb 26, 2014 at 12:09
  • @jishi Please don't just ask people to post elsewhere; they'll copy & paste the question. Flag for migration instead please. Commented Feb 26, 2014 at 12:15
  • Updated to EXPLAIN ANALYZE Commented Feb 26, 2014 at 12:34

1 Answer 1

1

You are assuming that the problem is the switch to the bitmap scan. That is very unlikely.

The likely problem is that your data had to be read from disk, either because it has grown too large to fit in cache (in which case more RAM would help), or the cache was cold when you happened to run the test.

The switch to bitmap scan is likely a response to the same thing that caused the problem, the growth in data.

You could cluster the table on the index t_feature_ind, which would make it more efficient to pull out all of the data with any given feature.

answered Feb 26, 2014 at 17:30
2
  • Is there a way of checking if the data is being read from the disk? Which parameters do I need to change to try and fit the table/index into memory? BTW, the database is completely static, ie. no user initiated writes ever happen. Commented Feb 26, 2014 at 23:08
  • @Zathar The best way to monitor this is at the operating system level, for example with sar and vmstat. This only works well if the system is rather idle other than your work, though. You can also get "block read" information from PostgreSQL, but if the blocks came from the kernel's cache rather than from disk, PostgreSQL has no way of knowing that so it still counts it as a read. In newer versions of PostgreSQL, another strategy is to turn on track_io_timing. If not much time is spent on IO, it doesn't matter that there was a lot of it. Commented Feb 27, 2014 at 17:02

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.