I have a very simple table in PostgreSQL and need some guidance on speeding up my queries. A recent large expansion of the data in the table has made everything grind to a halt.
Table body_feature:
  Column  |  Type   | Modifiers
----------+---------+-----------
 feature  | text    | not null
 body     | text    | not null
 start    | integer | not null
 stop     | integer | not null
Indexes:
"t_feature_ind" btree (feature)
In the old version of the database, the table held ~32 million rows.
Here is a simple query:
explain analyze select * from body_feature where feature='ABC';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Index Scan using t_feature_ind on body_feature (cost=0.00..49544.92 rows=15986 width=69) (actual time=175.967..261.175 rows=7285 loops=1)
Index Cond: (feature = 'ABC'::text)
Total runtime: 261.555 ms
With the recent expansion, the number of rows has grown to more than 110 million, and the same query gives:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on body_feature (cost=944.33..155543.90 rows=42307 width=73) (actual time=1118.598..204052.203 rows=23132 loops=1)
Recheck Cond: (feature = 'ABC'::text)
-> Bitmap Index Scan on t_feature_ind (cost=0.00..933.75 rows=42307 width=0) (actual time=952.948..952.948 rows=23132 loops=1)
Index Cond: (feature = 'ABC'::text)
Total runtime: 204065.405 ms
(5 rows)
The original machine running the server is quite old and has 8 GB of RAM, but I have access to a machine with 1 TB of RAM for testing.
What would you suggest I change in the server configuration to make such a simple query revert to the plain index scan?
I tried extracting the feature and body values (which are text) into separate lookup tables and using simple integer IDs in the body_feature table (which cuts the raw size down significantly), but I still get the same results.
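Roughly, the split-out version looked like this (names simplified):

create table feature_lookup (
    feature_id serial primary key,
    feature    text not null unique
);

create table body_lookup (
    body_id serial primary key,
    body    text not null unique
);

create table body_feature (
    feature_id integer not null references feature_lookup,
    body_id    integer not null references body_lookup,
    start      integer not null,
    stop       integer not null
);
create index t_feature_ind on body_feature (feature_id);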
Thank you for any suggestions/help.
1 Answer
You are assuming that the problem is the switch to the bitmap scan. That is very unlikely.
The likely problem is that your data had to be read from disk, either because it has grown too large to fit in cache (in which case more RAM would help), or because the cache was cold when you happened to run the test.
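One way to check (on PostgreSQL 9.0 or later) is to ask EXPLAIN for buffer statistics; the counts below are illustrative, not from your system:

explain (analyze, buffers) select * from body_feature where feature = 'ABC';
-- The output then includes a line per plan node such as:
--   Buffers: shared hit=120 read=22900
-- "hit" blocks were found in shared_buffers; "read" blocks had to be
-- requested from the kernel (its page cache, or ultimately the disk).

-- It is also worth checking how much memory the server is configured
-- to use; on an 8 GB box the defaults are often far too small:
show shared_buffers;
show effective_cache_size;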
The switch to a bitmap scan is likely a response to the same thing that caused the problem: the growth in data.
You could CLUSTER the table on the index t_feature_ind, which would make it more efficient to pull out all of the rows for any given feature.
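For example (note that CLUSTER rewrites the table and takes an exclusive lock while it runs, which should be fine here since the data is static; the ordering is one-time, so it would need to be repeated after any bulk reload):

cluster body_feature using t_feature_ind;
analyze body_feature;  -- refresh planner statistics after the rewrite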
Is there a way of checking whether the data is being read from disk? Which parameters do I need to change to try to fit the table/index into memory? BTW, the database is completely static, i.e. no user-initiated writes ever happen. – Zathar, Feb 26, 2014 at 23:08
@Zathar The best way to monitor this is at the operating system level, for example with sar and vmstat. That only works well if the system is otherwise fairly idle, though. You can also get "block read" information from PostgreSQL, but if the blocks came from the kernel's cache rather than from disk, PostgreSQL has no way of knowing that, so it still counts them as reads. In newer versions of PostgreSQL, another strategy is to turn on track_io_timing. If not much time is spent on I/O, it doesn't matter that there was a lot of it. – jjanes, Feb 27, 2014 at 17:02
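For reference, on 9.2 and later that would look something like this (changing track_io_timing on the fly requires superuser):

set track_io_timing = on;
explain (analyze, buffers) select * from body_feature where feature = 'ABC';
-- plan nodes then carry lines such as:
--   I/O Timings: read=1234.567
-- showing milliseconds actually spent waiting on reads.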