I have a table of people, each person has a color and they seek various other colors and require various skills.
So I can query this I've created a table like so:
person id, color_is_red, color_is_green, color_is_blue, color_is_yellow, seeks_red, seeks_green, seeks_blue, seeks_yellow, seeks_technical, seeks_creative, seeks_leadership
I can then query the table like so:
-- Searching for green & yellow people that looking for red people with skills creative or technical
select id
from person WHERE
color_is_red = false
AND color_is_blue = false
AND seeks_red = true
AND
(
seeks_creative = true
OR seeks_technical = true
)
limit 200
I'm unclear if this is the best way to structure this data. I also don't know how I can index this. I had hoped to use a BLOOM index but bloom doesn't support booleans. I think maybe I need a GIN index as I'll be querying different combinations of the colors depending on whats required however I've tried creating a gin_btree index but the query planner isn't using it. Another option would be a to use a bitset but I'm not sure how I would do this or index it.
Any help would be greatly appreciated! Thanks very much
update: I have created a single btree index for each column but the query planner doesn't seem to be choosing to use them. Here is the explain output: https://explain.depesz.com/s/TZLG
update 2
I've tried to use intarray
to index int[]
but I've not had much look with basic queries taking over a minute to execute: https://explain.depesz.com/s/RyBn
1 Answer 1
I've found a solution by casting the bools to ints and indexing using a bloom
index. The query now executes in 27ms using a bitmap index scan compared to the previous 100ms of the sequential scan.
The index:
CREATE INDEX bloom_idx ON person USING bloom (
CAST(color_is_red as int), CAST(color_is_green as int), CAST(color_is_blue as int), CAST(color_is_yellow as int),
CAST(seeks_red as int), CAST(seeks_green as int), CAST(seeks_blue as int), CAST(seeks_yellow as int)
);
The query:
explain (analyze, buffers, format text)
SELECT id
FROM person
WHERE CAST(color_is_red as int) = 0
AND CAST(color_is_blue as int) = 0
AND CAST(seeks_red as int) = 1
AND
(
CAST(seeks_creative as int) = 1
OR CAST(seeks_technical as int) = 1
)
limit 200;
The explain output:
https://explain.depesz.com/s/piUf
"Limit (cost=20348.00..20352.04 rows=1 width=4) (actual time=27.660..27.824 rows=200 loops=1)"
" Buffers: shared hit=1998"
" -> Bitmap Heap Scan on person (cost=20348.00..20352.04 rows=1 width=4) (actual time=27.658..27.803 rows=200 loops=1)"
" Recheck Cond: (((color_is_red)::integer = 0) AND ((color_is_blue)::integer = 0) AND ((seeks_red)::integer = 1))"
" Rows Removed by Index Recheck: 58"
" Filter: (((seeks_creative)::integer = 1) OR ((seeks_technical)::integer = 1))"
" Rows Removed by Filter: 72"
" Heap Blocks: exact=37"
" Buffers: shared hit=1998"
" -> Bitmap Index Scan on bloom_idx_no_skills (cost=0.00..20348.00 rows=1 width=0) (actual time=24.281..24.281 rows=156588 loops=1)"
" Index Cond: (((color_is_red)::integer = 0) AND ((color_is_blue)::integer = 0) AND ((seeks_red)::integer = 1))"
" Buffers: shared hit=1961"
"Planning Time: 0.458 ms"
"Execution Time: 27.905 ms"
(Querying a table of 1 million randomly generated rows)
OR
in theWHERE
condition you won't get far.jsonb
and using a GIN index?