postgres bloom index

Question 1

i read the doc about postgres bloom, but can't reproduce the same results, please help me to understand what i missed. my server is:

SHOW server_version;
 server_version 
-------------------------------
 10.6 (Debian 10.6-1.pgdg90+1)
dev=# show random_page_cost;
 random_page_cost 
------------------
 4

first create the table with the same command as in the docs:

dev=# CREATE TABLE tbloom AS 
 SELECT
 (random() * 1000000)::int as i1,
 (random() * 1000000)::int as i2,
 (random() * 1000000)::int as i3,
 (random() * 1000000)::int as i4,
 (random() * 1000000)::int as i5,
 (random() * 1000000)::int as i6
 FROM
 generate_series(1,10000000);

next i create the btree index

dev=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX

and get next plan:

dev=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
 QUERY PLAN 
-------------------------------------------------------------------------------------------------------------------------
 Gather (cost=1000.00..127195.10 rows=1 width=24) (actual time=258.963..260.900 rows=0 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 -> Parallel Seq Scan on tbloom (cost=0.00..126195.00 rows=1 width=24) (actual time=255.446..255.446 rows=0 loops=3)
 Filter: ((i2 = 898732) AND (i5 = 123451))
 Rows Removed by Filter: 3333333
 Planning time: 0.412 ms
 Execution time: 260.939 ms

Execution time: 260.939 ms

and now drop the btree index and create bloom:

dev=# DROP INDEX btreeidx;
DROP INDEX
dev=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX

get new plan:

dev=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
 QUERY PLAN 
-------------------------------------------------------------------------------------------------------------------------
 Gather (cost=1000.00..127195.10 rows=1 width=24) (actual time=260.278..261.989 rows=0 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 -> Parallel Seq Scan on tbloom (cost=0.00..126195.00 rows=1 width=24) (actual time=256.224..256.224 rows=0 loops=3)
 Filter: ((i2 = 898732) AND (i5 = 123451))
 Rows Removed by Filter: 3333333
 Planning time: 0.165 ms
 Execution time: 262.053 ms

Execution time: 262.053 ms in the docs

Bloom is better than btree

but not in my test. I tried several options with Length but not find good result.

Question 2

Your execution plan says both the indexes aren't getting used. It is going with the sequential scan. If you want to perform a real test- just try by creating btree index on any of 2 columns run analyze and then check execution plan by filtering result for those 2 column. Then, disable that index -create bloom index for those 2 same column -analyze and then run the same query- you might see the difference now

Question 3

When bloom was introduced in 9.6, parallel query had just been introduced and is was turned off by default. Bloom appeared to be better than a non-parallel sequential scan in the example given. But when you can do a parallel seq scan, it appears better than using a bloom index does. It is not actually better as can be verified by turning off parallel query with set max_parallel_workers_per_gather TO 0 and looking at the actual executions speeds, but the planner thinks the parallel seq scan will be better. It looks like maybe the cost estimation part of bloom could use some work.

The example code was not updated for when parallel query was turned on by default, in v10, so it no longer works as advertised.

Note that your example never achieved any index usage at all, so you can't really draw any conclusions about which index is better for that scenario.

Question 4

Update

This seems to be a bug with estimates.

Bloom is using internally genericcostestimates. That's defeated if the seqscan goes parallel.

Old attempt at answer

You're not even using the index you're creating (bloom or btree)

-> Parallel Seq Scan on tbloom (cost=0.00..126195.00 rows=1 width=24) (actual time=256.224..256.224 rows=0 loops=3)

That's showing that you're scanning the entire table with parallel workers. Your indexing is totally irrelevant neither index is used (thus the <1% difference). Did you ANALYZE the table after you created the index? If so, try

set enable_seqscan = 0;

And run the EXPLAIN ANALYZE for the query again. I would expect the bloom index speed things up by massively reducing the size of the table you have to visit.

Question 5

_{Answer left in comments by a-horse-with-no-name}

It seems to depend on the value for random_page_cost. On my laptop where I have a SSD, this is set to 1 and in that case the bloom index is used. On a server that has random_page_cost higher than 1, the seq scan is used:

https://explain.depesz.com/s/6Ynx

If you lower it to 1 (at least on Postgres 11) the bloom index is more efficient to the optimizer and thus it chooses the index scan:

screenshot

However, setting that value to 1 only makes sense on SSDs, it's not a good idea for spinning hard disks.

Analyzing the table does not change things (tested in Postgres 11). It seems the bloom costing could do with some adjustments.

jjanes 42.6k3 gold badges44 silver badges54 bronze badges · Accepted Answer · 2019-02-12 00:53:24Z

When bloom was introduced in 9.6, parallel query had just been introduced and is was turned off by default. Bloom appeared to be better than a non-parallel sequential scan in the example given. But when you can do a parallel seq scan, it appears better than using a bloom index does. It is not actually better as can be verified by turning off parallel query with set max_parallel_workers_per_gather TO 0 and looking at the actual executions speeds, but the planner thinks the parallel seq scan will be better. It looks like maybe the cost estimation part of bloom could use some work.

The example code was not updated for when parallel query was turned on by default, in v10, so it no longer works as advertised.

Note that your example never achieved any index usage at all, so you can't really draw any conclusions about which index is better for that scenario.

Stack Exchange Network

postgres bloom index

3 Answers 3

Update

Old attempt at answer

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

postgres bloom index

3 Answers 3

Update

Old attempt at answer

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions