I have a table that stores a conversation between two people.

The data will look something like this:

CREATE TABLE foo
AS
 SELECT $$[
 { "user": 1, "timestamp": 1, "message": "First message" },
 { "user": 2, "timestamp": 2, "message": "Second message" },
 { "user": 2, "timestamp": 3, "message": "Debounced message from same user" },
 { "user": 1, "timestamp": 4, "message": "Last message" }
 ]$$::jsonb AS jsondata;

I never need to look up each message individually, so I just want to store the whole conversation in a single jsonb field. I need to perform a full-text search across all of the messages.

My first thought was to create a new text column, concat all of the messages into one long string, and create a trigram GIN index on that column.

That seems like a hack that wastes a lot of space, so I would like to avoid the intermediate column. How can I create the index directly from the jsonb column?

Evan Carroll
asked Apr 12, 2017 at 19:53

2 Answers

As I read the question, you only care about the message field. The difficulty is that you need to:

  1. map over a json array returning the message element
  2. reduce/fold the array of message element strings to an aggregate string.

This is easy in functional programming. It's not as easy with the stock functions in PostgreSQL, and it would be difficult to express in a purely declarative language. Maybe one day there will be a jsonb_array_elements(jsonb [, path]) that gets you by, but until then we can create a function in the database.

Creating a function with plpgsql

Note that this probably isn't as fast or as clean as a plv8 function would be. It returns a string; in the next revision we'll return a tsvector instead.

Here we use jsonb_array_elements to expand the json, and then aggregate back the 'message' elements into a string.

CREATE OR REPLACE FUNCTION jsonb_message_to_string( jsondata jsonb, out string text )
AS $func$
 BEGIN
 SELECT INTO string
 string_agg(d->>'message', ' ')
 FROM jsonb_array_elements(jsondata) AS d;
 RETURN;
 END;
$func$ LANGUAGE plpgsql
IMMUTABLE;
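Since the question mentions a trigram GIN index, here is a minimal sketch of how the string-returning function could back one directly, without an intermediate column. It assumes the foo table and jsonb_message_to_string() defined above, that the pg_trgm extension is available, and the index name foo_messages_trgm_idx is made up for illustration.

```sql
-- Assumes foo and jsonb_message_to_string() from above already exist.

-- Inspect the concatenated text for each conversation.
SELECT jsonb_message_to_string(jsondata) FROM foo;

-- pg_trgm provides the gin_trgm_ops operator class.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index name; the expression is indexed, not a stored column.
CREATE INDEX foo_messages_trgm_idx ON foo
  USING gin (jsonb_message_to_string(jsondata) gin_trgm_ops);

-- The predicate must repeat the indexed expression for the index to apply.
SELECT *
FROM foo
WHERE jsonb_message_to_string(jsondata) ILIKE '%debounced%';
```

This works only because the function is declared IMMUTABLE; PostgreSQL rejects non-immutable functions in index expressions.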

Creating tsvector_agg and improving our function.

This function is not yet optimal, though, because it returns a string. There is a second difficulty: as of 9.6, PostgreSQL does not ship with a tsvector_agg. But it's PostgreSQL, so we can make one:

CREATE AGGREGATE tsvector_agg (tsvector) (
 SFUNC = tsvector_concat,
 STYPE = tsvector
);

This permits us to now return an aggregate tsvector which is faster and retains positional information. Now we can improve our function. Here we create a new jsonb_message_to_tsvector.
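As a small self-contained illustration of the positional behaviour, the `||` operator (backed by tsvector_concat, the state function used in the aggregate) shifts the positions of its right-hand argument past those of the left. The literal vectors below are made up for the example.

```sql
-- Positions in the second vector are offset by the largest position
-- in the first, so the combined vector keeps a consistent ordering.
SELECT 'a:1 b:2'::tsvector || 'c:1 d:2 b:3'::tsvector;
-- → 'a':1 'b':2,5 'c':3 'd':4

-- The same thing through the function the aggregate uses as its SFUNC.
SELECT tsvector_concat('a:1 b:2'::tsvector, 'c:1 d:2 b:3'::tsvector);
```

Aggregating per-message vectors this way is what lets phrase and ranking functions still see word positions across the whole conversation.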

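CREATE OR REPLACE FUNCTION jsonb_message_to_tsvector( jsondata jsonb, out tsv tsvector )
AS $func$
 BEGIN
 -- Pin the text search configuration: the one-argument to_tsvector is only
 -- STABLE, so it does not belong in a function declared IMMUTABLE.
 SELECT INTO tsv
 tsvector_agg(to_tsvector('english', d->>'message'))
 FROM jsonb_array_elements(jsondata) AS d;
 RETURN;
 END;
$func$ LANGUAGE plpgsql
IMMUTABLE;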

Now we can create our index:

CREATE INDEX ON foo
 USING gin (jsonb_message_to_tsvector(jsondata));

And we would query it like so, with the indexed expression in the WHERE clause so the planner can use the index:

SELECT *
FROM foo
WHERE jsonb_message_to_tsvector(jsondata) @@ 'first';
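To check whether the planner can use the expression index, one option is to look at the plan for a query whose predicate repeats the indexed expression. This is only a sketch: it assumes foo, tsvector_agg, jsonb_message_to_tsvector() and the index from above, and it disables sequential scans for the session because a one-row table would otherwise almost always be scanned directly.

```sql
-- Assumes foo, jsonb_message_to_tsvector() and the GIN index from above.

-- Session-local setting; used here only so a tiny table prefers the index.
SET enable_seqscan = off;

EXPLAIN
SELECT *
FROM foo
WHERE jsonb_message_to_tsvector(jsondata) @@ 'first'::tsquery;

-- Restore the default planner behaviour.
RESET enable_seqscan;
```

Note that a bare 'first'::tsquery is not stemmed, so for arbitrary user input to_tsquery or plainto_tsquery with the same configuration as the index is usually the safer choice.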
answered Apr 12, 2017 at 20:47

Here is an example:

t=# create table so59(j jsonb);
CREATE TABLE
t=# insert into so59 select '[
 { "user": 1, "timestamp": 1, "message": "First message" },
 { "user": 2, "timestamp": 2, "message": "Second message" },
 { "user": 2, "timestamp": 3, "message": "Debounced message from same user" },
 { "user": 1, "timestamp": 4, "message": "Last message" }
]
';
INSERT 0 1
t=# create index so60 on so59 using gin(to_tsvector('english',j::text));
CREATE INDEX

Update: you can create a simple function to reduce the jsonb array to text, e.g.:

t=# create or replace function so61(j jsonb) returns text as
$$
with a as (select jsonb_array_elements(j)->>'message' m) select string_agg(m,',') from a;
$$ language sql immutable; -- must be immutable to be usable in an index expression
CREATE FUNCTION
t=# select so61(j) from so59;
 so61
----------------------------------------------------------------------------
 First message,Second message,Debounced message from same user,Last message
(1 row)
t=# create index so61 on so59 using gin(to_tsvector('english',so61(j)));
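A query against that index might look like the following sketch. It assumes the so59 table and so61() function from the session above; the search term is just an example. The WHERE clause repeats the indexed expression, including the 'english' configuration, so the planner can match it to the index.

```sql
-- Assumes so59 and so61() from the session above.
select j
from so59
where to_tsvector('english', so61(j)) @@ to_tsquery('english', 'debounced');
```

Using to_tsquery with the same configuration means the search term is stemmed the same way as the indexed text.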
answered Apr 12, 2017 at 20:03