i read the doc about postgres bloom, but can't reproduce the same results, please help me to understand what i missed. my server is:
SHOW server_version;
server_version
-------------------------------
10.6 (Debian 10.6-1.pgdg90+1)
dev=# show random_page_cost;
random_page_cost
------------------
4
first create the table with the same command as in the docs:
dev=# CREATE TABLE tbloom AS
SELECT
(random() * 1000000)::int as i1,
(random() * 1000000)::int as i2,
(random() * 1000000)::int as i3,
(random() * 1000000)::int as i4,
(random() * 1000000)::int as i5,
(random() * 1000000)::int as i6
FROM
generate_series(1,10000000);
next i create the btree index
dev=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
and get next plan:
dev=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..127195.10 rows=1 width=24) (actual time=258.963..260.900 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbloom (cost=0.00..126195.00 rows=1 width=24) (actual time=255.446..255.446 rows=0 loops=3)
Filter: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Filter: 3333333
Planning time: 0.412 ms
Execution time: 260.939 ms
Execution time: 260.939 ms
and now drop the btree index and create bloom:
dev=# DROP INDEX btreeidx;
DROP INDEX
dev=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
get new plan:
dev=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..127195.10 rows=1 width=24) (actual time=260.278..261.989 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbloom (cost=0.00..126195.00 rows=1 width=24) (actual time=256.224..256.224 rows=0 loops=3)
Filter: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Filter: 3333333
Planning time: 0.165 ms
Execution time: 262.053 ms
Execution time: 262.053 ms in the docs
Bloom is better than btree
but not in my test. I tried several options with Length but not find good result.
-
Your execution plan says both the indexes aren't getting used. It is going with the sequential scan. If you want to perform a real test- just try by creating btree index on any of 2 columns run analyze and then check execution plan by filtering result for those 2 column. Then, disable that index -create bloom index for those 2 same column -analyze and then run the same query- you might see the difference nowShiwangini– Shiwangini2020年07月19日 18:27:52 +00:00Commented Jul 19, 2020 at 18:27
3 Answers 3
When bloom was introduced in 9.6, parallel query had just been introduced and is was turned off by default. Bloom appeared to be better than a non-parallel sequential scan in the example given. But when you can do a parallel seq scan, it appears better than using a bloom index does. It is not actually better as can be verified by turning off parallel query with set max_parallel_workers_per_gather TO 0 and looking at the actual executions speeds, but the planner thinks the parallel seq scan will be better. It looks like maybe the cost estimation part of bloom could use some work.
The example code was not updated for when parallel query was turned on by default, in v10, so it no longer works as advertised.
Note that your example never achieved any index usage at all, so you can't really draw any conclusions about which index is better for that scenario.
Update
This seems to be a bug with estimates.
Bloom is using internally genericcostestimates. That's defeated if the seqscan goes parallel.
Old attempt at answer
You're not even using the index you're creating (bloom or btree)
-> Parallel Seq Scan on tbloom (cost=0.00..126195.00 rows=1 width=24) (actual time=256.224..256.224 rows=0 loops=3)
That's showing that you're scanning the entire table with parallel workers. Your indexing is totally irrelevant neither index is used (thus the <1% difference). Did you ANALYZE the table after you created the index? If so, try
set enable_seqscan = 0;
And run the EXPLAIN ANALYZE for the query again. I would expect the bloom index speed things up by massively reducing the size of the table you have to visit.
Answer left in comments by a-horse-with-no-name
It seems to depend on the value for random_page_cost. On my laptop where I have a SSD, this is set to 1 and in that case the bloom index is used. On a server that has random_page_cost higher than 1, the seq scan is used:
https://explain.depesz.com/s/6Ynx
If you lower it to 1 (at least on Postgres 11) the bloom index is more efficient to the optimizer and thus it chooses the index scan:
screenshot
However, setting that value to 1 only makes sense on SSDs, it's not a good idea for spinning hard disks.
Analyzing the table does not change things (tested in Postgres 11). It seems the bloom costing could do with some adjustments.