I have the following table structure:
CREATE TABLE items (
id SERIAL PRIMARY KEY,
labels json
);
INSERT INTO items (labels) VALUES
('{"labels":[{"value":"apple","score":0.95},{"value":"fruit","score":0.94},{"value":"red","score":0.93}]}'),
('{"labels":[{"value":"apple","score":0.92},{"value":"fruit","score":0.92},{"value":"green","score":0.93}]}'),
('{"labels":[{"value":"orange","score":0.92},{"value":"fruit","score":0.92},{"value":"orange","score":0.90}]}'),
('{"labels":[{"value":"tomato","score":0.98},{"value":"vegetable","score":0.96},{"value":"red","score":0.95}]}'),
('{"labels":[{"value":"carrot","score":0.94},{"value":"vegetable","score":0.93},{"value":"orange","score":0.92}]}'),
('{"labels":[{"value":"peach","score":0.92},{"value":"fruit","score":0.92},{"value":"yellow","score":0.91}]}');
I'm trying to find a way to query this table by the labels column using an "intersection" criterion: a row should match when at least 2 of the input values appear among its label values.
E.g. if I pass the array ["peach", "fruit", "orange"] as the WHERE clause input, then the result should be:
id|
--+
 3|
 6|
1 Answer
You can use exists combined with json_array_elements:
select id from items where exists
(select 1 from json_array_elements(labels->'labels') f(x)
where x->>'value' in ('peach', 'fruit', 'orange')
having count(*)>=2
);
It gives the answer you want, but good luck making it fast if the table is large: every row's array has to be unnested and checked, so the query cannot use an index. (Changing the column from json to jsonb might speed it up slightly.)
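One caveat with the query above: count(*) counts matching array elements, not distinct values, so a row whose array repeats a single matching value (say "orange" twice and nothing else from the list) would also pass. If what you want is at least two different input values, a variant along these lines might be closer. Treat it as a sketch, not something tested against your real data; the array literal stands in for whatever parameter you bind, and the order by is only there for stable output:

```sql
-- Variant of the query above that counts distinct matching values.
select id
from items
where exists (
    select 1
    from json_array_elements(labels->'labels') as f(x)
    -- compare each element's "value" against the input array
    where x->>'value' = any (array['peach', 'fruit', 'orange'])
    -- require at least two different input values to be present
    having count(distinct x->>'value') >= 2
)
order by id;
```

On the sample data both versions return ids 3 and 6, since row 3 contains both "orange" and "fruit".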