Based on this related question on SO.
This works fine if you only want simple matches. Suppose tracks
have both Artist
and Title
field in the JSON data. So we have something like:
INSERT INTO tracks (id, data)
VALUES (1, '[{"artist": "Simple Plan", "title": "Welcome to My Life"}]');
We create the index like in the original question:
CREATE INDEX tracks_artists_gin_idx ON tracks
USING GIN (json2arr(data, 'artist'));
CREATE INDEX tracks_title_gin_idx ON tracks
USING GIN (json2arr(data, 'title'));
So now we have two fields to match. As you can see, if we perform the original query (with very naive modifications) of:
SELECT *
FROM tracks
WHERE '{"ARTIST NAME"}'::text[] <@ (json2arr(data, 'artist'))
AND '{"TITLE"}'::text[] <@ (json2arr(data, 'title'))
This will give the wrong answer because the indices of the array of artist and title in the JSON array do not have to match for this query to match something in the JSON. What is the proper way of doing this query so we can get the exact match we need? Does json2arr()
need to be changed?
Edit: Why this is wrong
Suppose our table has records like:
INSERT INTO tracks (id, data) VALUES
(1, '[{"artist": "Simple Plan", "title": "Welcome to My Life"}]')
, (2, '[{"artist": "Another Artist", "title": "Welcome to My Life"},
{"artist": "Simple Plan", "title": "Perfect"}]');
If you query like:
SELECT *
FROM tracks
WHERE '{"Simple Plan"}'::text[] <@ (json2arr(data, 'artist'))
AND '{"Welcome to my Life"}'::text[] <@ (json2arr(data, 'title'))
Both records will be matched (both record 1 and 2), even though you really only wanted the first record.
-
Could you actually provide an example where you get back the wrong row?András Váczi– András Váczi2014年07月15日 11:50:59 +00:00Commented Jul 15, 2014 at 11:50
-
@dezso updated with exampleKen Li– Ken Li2014年07月15日 12:12:35 +00:00Commented Jul 15, 2014 at 12:12
1 Answer 1
Why?
The expressed requirement that artist
and title
must match in the same element of the JSON array is not reflected in your query, which finds all rows where at least one element matches the artist
and another (possibly a different one) matches the title
.
The example data for your first case was inconclusive, since the query cannot fail this way for a single array element. Your second example demonstrates the case well, though.
Solution in Postgres 9.3
There are various ways to fix this. One way would be to translate each json array element to an SQL array of composite type consisting of artist
and title
and match the whole type as one.
Another way would be to keep indexes like you have now (or even a single composite spanning both expressions: (json2arr(data, 'artist'), json2arr(data, 'title'))
. Your current query identifies possible matches. Unnest the json array data
for all identified candidates and check whether both artist
and title
match on the same element. May or may not be efficient enough.
Better use jsonb
in n Postgres 9.4
I am not going into detail for json
/ pg 9.3, because the release of jsonb
in Postgres 9.4 with advanced indexing capabilities mostly obsoleted the problem. This can be implemented much simpler and more efficient with a native GIN index on a jsonb
column. There are various options. To optimize for the presented case:
CREATE TABLE tracks (id serial, data jsonb);
INSERT INTO tracks (id, data) VALUES
(1, '[{"artist": "Simple Plan", "title": "Welcome to My Life"}]')
, (2, '[{"artist": "Another Artist", "title": "Welcome to My Life"},
{"artist": "Simple Plan", "title": "Perfect"}]');
Index:
CREATE INDEX tracks_data_gin_idx ON tracks USING gin (data jsonb_path_ops);
Query:
SELECT * FROM tracks
WHERE data @> '[{"artist": "Simple Plan", "title": "Welcome to My Life"}]';
Sequence of attributes in the JSON value and insignificant whitespace don't matter for jsonb
. I added detailed information for jsonb to the referenced answer on SO: