I have a table with 4 columns:
id SERIAL PRIMARY KEY
dt DATE
val bigint -- Already has a btree index
txt VARCHAR(64)
I am running a query:
EXPLAIN ANALYSE SELECT dt, val FROM temp_test ORDER BY dt;
This was taking ~13s, and below is the output of EXPLAIN ANALYZE:
Sort  (cost=2862269.49..2912269.54 rows=20000020 width=12) (actual time=10561.801..12214.518 rows=20000020 loops=1)
  Sort Key: dt
  Sort Method: external merge  Disk: 508816kB
  ->  Seq Scan on temp_test  (cost=0.00..436917.25 rows=20000020 width=12) (actual time=0.016..2451.364 rows=20000020 loops=1)
Planning time: 0.127 ms
Execution time: 12874.978 ms
I presumed that creating an index on dt would speed things up, so I ran:
CREATE INDEX temp_test_dt_btree ON temp_test(dt);
Once the index was created, I executed the same query again:
EXPLAIN ANALYSE SELECT dt, val FROM temp_test ORDER BY dt;
This one took ~54s, around 4x longer than without the index, with the following output for EXPLAIN ANALYSE:
Index Scan using temp_test_dt_btree on temp_test (cost=0.44..1317015.07 rows=20000020 width=12) (actual time=0.049..53027.137 rows=20000020 loops=1)
Planning time: 0.241 ms
Execution time: 53742.120 ms
I am a little confounded, as I expected the index to improve performance at best and to have little (or no) impact at worst.
I am running:
PostgreSQL 10.11 (Ubuntu 10.11-1.pgdg18.04+1)
2 Answers
If your table is well-vacuumed, you can get an index-only scan by creating this index:
CREATE INDEX ON temp_test(dt,val);
This might be quite a bit faster as it doesn't need to visit the table at all. (In my test, it was about 2.5 times faster than the sort.)
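A minimal sketch of how that might look, assuming autovacuum has kept the visibility map current (Heap Fetches: 0 in the output confirms the table was never visited; temp_test_dt_val_idx is just the name PostgreSQL generates by default):

VACUUM ANALYZE temp_test;              -- keeps the visibility map current
CREATE INDEX ON temp_test (dt, val);   -- covering index for this query
EXPLAIN (ANALYZE, BUFFERS) SELECT dt, val FROM temp_test ORDER BY dt;
-- Expect: Index Only Scan using temp_test_dt_val_idx on temp_test ...
--         Heap Fetches: 0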
As to why it chooses the wrong plan with the index you already have, it could just be that your effective_cache_size is set too high for the amount of RAM actually available. You would have to tell us more about your settings and your hardware.
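For reference, a sketch of how to inspect the relevant settings; compare them against the RAM the server actually has:

SHOW shared_buffers;
SHOW effective_cache_size;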
I've replicated your finding in the absence of the index-only-scan index, with the planner choosing the much slower ordered index scan over the seq scan. I replicated it on an AWS m5.large machine running Ubuntu 18.04, with PostgreSQL 10 from the Ubuntu repository, using default settings for both the OS and PostgreSQL other than turning track_io_timing on and setting max_parallel_workers_per_gather = 0. I set it up like this:
create table temp_test (
    id SERIAL PRIMARY KEY,
    dt DATE,
    val bigint,
    txt VARCHAR(64)
);

insert into temp_test
select x, now() + interval '10 years' * random(), random() * 1000000000, random()::text
from generate_series(1, 20000000) f(x);

vacuum analyze temp_test;
With the default shared_buffers of 128MB, the ordered index scan is preferred by the planner, but is slow as you indicate, even if all data is in memory (vmstat shows no read/write activity during the query). But if shared_buffers is increased so that it holds all the data, then the ordered index scan becomes much faster (still slower than the sort, but not by nearly as much).
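If you want to try reproducing that effect, a sketch of the change (the '2GB' value is illustrative, just something large enough to hold this test table; shared_buffers only takes effect after a server restart):

ALTER SYSTEM SET shared_buffers = '2GB';  -- illustrative size, not tuning advice
-- then restart the server, e.g.:
-- sudo systemctl restart postgresql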
So the main problem is that it is slow to read data from the filesystem cache into shared_buffers, and this is not accounted for by the planner. The code that implements effective_cache_size assumes that reading a page which is already in memory is absolutely free. The reality is that while it is much cheaper to read a page from the filesystem cache than from disk, it is still quite expensive relative to, say, a cpu_tuple_cost. So "free" is not nearly accurate.
The planner does not expose any knobs you can twist to make it account for the residual cost of reading a page from the filesystem cache, so that it does not appear to be free. And it would not be easy to add such a knob, because to do so accurately you would need to distinguish how many pages will be re-found in shared_buffers from how many will be re-found in the filesystem cache but not in shared_buffers. The current 'Mackert and Lohman formula' doesn't make such a distinction, and it is not clear to me how to make it do so.
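There is no knob for that residual cost specifically, but for completeness, a sketch of the levers that do exist and indirectly shift the balance between the two plans (values are illustrative, not tuning advice):

SET random_page_cost = 4.0;        -- the default; raising it penalizes index scans
SET effective_cache_size = '1GB';  -- lowering it also makes index scans look more expensive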
- Amazing! Thanks for the deep dive, it's very informative. Should I report this to the Postgres team? – halfusser, Feb 8, 2020 at 7:00
- There are a lot of minor tweaks that could be made to the planner, and not many people to make them (and test and review them), so without a concrete proposal like a functioning patch, I wouldn't expect much more than sympathy. A big concern would be that anything we do is likely to slightly slow down planning for everyone, only to benefit a tiny number of queries. So unless you are really burning to take on a new time-consuming hobby, I'd probably focus on work-arounds within the existing system. – jjanes, Feb 8, 2020 at 18:21
Using an index is only fast if you select a small set of records. Since you want the entire table, the DB is wasting its time switching back and forth between the index and the data.
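If the planner nonetheless keeps picking the index scan for a full-table read, a session-local sketch of steering it back, using the query from the question:

BEGIN;
SET LOCAL enable_indexscan = off;  -- planner falls back to Seq Scan + Sort
EXPLAIN ANALYZE SELECT dt, val FROM temp_test ORDER BY dt;
COMMIT;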
- Thanks! So any alternative way to speed up the query further? – halfusser, Feb 6, 2020 at 14:59
- That doesn't explain why PostgreSQL chooses the wrong plan. – Laurenz Albe, Feb 6, 2020 at 16:05
- After index creation, did you recreate the statistics on the table? – Gerard H. Pille, Feb 7, 2020 at 23:41
- @GerardH.Pille No, would that help? – halfusser, Feb 8, 2020 at 7:01
- Do you find it too exhausting to try it? I didn't find any mention in the docs that the optimizer uses the statistics, but what other use could they have? – Gerard H. Pille, Feb 8, 2020 at 9:33
- It gets dt from the index, then gets val from the table. Create an index which includes val too: CREATE INDEX temp_test_dt_btree ON temp_test(dt, val);
- It's still the same running with the index only on dt.