Gather values from different related rows with JSONB fields

Question 1

Say I have the following table structure:

jobs
+----+------------+
| id | some_field |
+----+------------+
|  1 | some_val   |
|  2 | some_val   |
|  3 | some_val   |
+----+------------+

events
+----+--------+----------------------------------------+
| id | job_id | payload (JSONB)                        |
+----+--------+----------------------------------------+
|  1 |      1 | {'type':1, 'a':'some_val', ...}        |
|  2 |      1 | {'type':2, 'c':'some_other_val1', ...} |
|  3 |      2 | {'type':1, 'a':'some_other_val2', ...} |
|  4 |      2 | {'type':1, 'a':'some_other_val3', ...} |
|  5 |      3 | {'type':1, 'a':'some_other_val4', ...} |
|  6 |      3 | {'type':1, 'a':'some_other_val5', ...} |
|  7 |      3 | {'type':2, 'c':'some_other_val6', ...} |
|  8 |      3 | {'type':3, 'd':'some_other_val7', ...} |
+----+--------+----------------------------------------+

I want to create a query/view that ...

... selects one row for each job IF it has an event with payload->>'type' = '2'
... in that row, shows values from other events that are associated with this job.
... in that row, shows an aggregate of values from other events that are associated with this job if that event type occurs multiple times.

For example, with the above data I'd like to select all jobs that have an event of type 2, and find out the value of the d field from events with type 3 and some aggregate (say, COUNT()) of the a field from events with type 1. The result would be:

Results
+--------+------------------+----------+
| job_id | d                | COUNT(a) |
+--------+------------------+----------+
|      1 | NULL             |        1 |
|      3 | some_other_val7  |        2 |
+--------+------------------+----------+

One simplification: each job can have at most one event of type 2, so the basic query can be structured like this:

SELECT
 job.id
FROM
 event
LEFT JOIN 
 job ON event.job_id = job.id
WHERE
 event.payload ->> 'type' = '2';

How can I get the result table? Do I need to do sub queries for each field/event type, which would result in relatively bad performance, or can I use some GROUP BY job_id on the event table and maybe join the JSONs? (Note that my data is more complex and has many more fields in the payload that I would like to select).

Here is a DB Fiddle with the example data and my test query:

CREATE TABLE job (
 id SERIAL PRIMARY KEY,
 some_field TEXT
);
CREATE TABLE event (
 id SERIAL PRIMARY KEY,
 job_id INTEGER,
 payload JSONB
);
INSERT INTO job (some_field) VALUES ('val1'), ('val2'), ('val3');
INSERT INTO event (job_id, payload) VALUES 
 (1, '{"type":1, "a":"some_val"}'::json),
 (1, '{"type":2, "c":"some_other_val1"}'::json),
 (2, '{"type":1, "a":"some_other_val2"}'::json),
 (2, '{"type":1, "a":"some_other_val3"}'::json),
 (3, '{"type":1, "a":"some_other_val4"}'::json),
 (3, '{"type":1, "a":"some_other_val5"}'::json),
 (3, '{"type":2, "c":"some_other_val6"}'::json),
 (3, '{"type":3, "d":"some_other_val7"}'::json);
SELECT job.id, job.some_field 
FROM event 
LEFT JOIN job 
 ON event.job_id = job.id WHERE event.payload ->> 'type' = '2';

id | some_field
-: | :---------
 1 | val1 
 3 | val3


Question 2

Hi @McNets, thank you for the suggestion. I'm going to fiddle something together.

Question 3

Do you need only to COUNT events of a type?

Question 4

I might need more complicated information from the related events. In the real data each event has a timestamp, and I might need the latest event time for each job, or the time difference between events of two different types, and so on. If that turns out to be too complicated or has really bad performance, I'd consider preprocessing the data in code and designing non-json tables. For the sake of learning, COUNT(a) is sufficient here.
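The timestamp requirements described here can be sketched with filtered aggregates. This is only an illustration: it assumes each payload carries a hypothetical `ts` key holding a timestamp string that PostgreSQL can cast, which the example data in the question does not include.

```sql
-- Sketch only: assumes a hypothetical "ts" key in every payload.
SELECT
    e.job_id,
    -- latest event time for the job, across all event types
    MAX((e.payload->>'ts')::timestamp) AS latest_ts,
    -- interval between the first type-1 event and the first type-2 event;
    -- NULL when the job lacks either type
    MIN((e.payload->>'ts')::timestamp) FILTER (WHERE e.payload->>'type' = '2')
      - MIN((e.payload->>'ts')::timestamp) FILTER (WHERE e.payload->>'type' = '1')
      AS type1_to_type2
FROM event AS e
GROUP BY e.job_id
ORDER BY e.job_id;
```

Subtracting two `timestamp` values yields an `interval`, so no extra conversion is needed for the difference.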

Question 5

Could this be useful?

Question 6

It is useful! Starting from your code I managed to get what I want fairly quickly. See here for the result. If you add your comment as an answer I'll happily accept it. Thanks!

Question 7

As per the comments, you need a timestamp field, so I've added one to your sample data. IMHO you should first flatten your JSONB data into columns. I've used SUM(CASE ...) for this purpose, and the query also calculates the minimum timestamp value for each group.

SELECT 
 event.job_id, 
 job.some_field,
 event.payload->>'type' AS type, 
 SUM(CASE WHEN event.payload->>'a' IS NOT NULL THEN 1 ELSE 0 END) AS a,
 SUM(CASE WHEN event.payload->>'b' IS NOT NULL THEN 1 ELSE 0 END) AS b,
 SUM(CASE WHEN event.payload->>'c' IS NOT NULL THEN 1 ELSE 0 END) AS c,
 SUM(CASE WHEN event.payload->>'d' IS NOT NULL THEN 1 ELSE 0 END) AS d,
 MIN((event.payload->>'ts')::timestamp) AS min_ts
FROM 
 event
JOIN
 job
 ON event.job_id = job.id
GROUP BY 
 event.job_id, 
 job.some_field,
 event.payload->>'type'
ORDER BY
 1, 2;

This is the result:

job_id | some_field | type | a | b | c | d | min_ts 
-----: | :--------- | :--- | -: | -: | -: | -: | :------------------
 1 | val1 | 1 | 1 | 0 | 0 | 0 | 2004-10-19 10:23:54
 1 | val1 | 2 | 0 | 0 | 1 | 0 | 2002-10-19 10:23:54
 2 | val2 | 1 | 2 | 0 | 0 | 0 | 2003-10-19 10:23:54
 3 | val3 | 1 | 2 | 0 | 0 | 0 | 2004-10-19 10:23:54
 3 | val3 | 2 | 0 | 0 | 1 | 0 | 2005-10-19 10:23:54
 3 | val3 | 3 | 0 | 0 | 0 | 1 | 2006-10-19 10:23:54


Starting at this point you can easily filter it by type and get the desired result.
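One possible way to take that last step in a single pass is sketched below. It is not necessarily the query the asker ended up with: it groups by job only, uses the aggregate FILTER clause (PostgreSQL 9.4 or later) to pick values per event type, and keeps only jobs that have a type-2 event. The column aliases `d` and `count_a` are chosen here for illustration.

```sql
-- Sketch only: one row per job that has at least one type-2 event.
SELECT
    e.job_id,
    -- value of "d" taken from type-3 events (NULL if the job has none)
    MAX(e.payload->>'d') FILTER (WHERE e.payload->>'type' = '3') AS d,
    -- number of type-1 events that carry an "a" key
    COUNT(e.payload->>'a') FILTER (WHERE e.payload->>'type' = '1') AS count_a
FROM event AS e
GROUP BY e.job_id
-- keep the job only if any of its events is of type 2
HAVING bool_or(e.payload->>'type' = '2')
ORDER BY e.job_id;
```

With the example data from the question this should return job 1 with a NULL `d` and a count of 1, and job 3 with `some_other_val7` and a count of 2; job 2 is dropped because it has no type-2 event. Since the table is scanned once, this avoids a separate subquery per field or event type.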

McNets · Accepted Answer · 2019-07-31 10:07:02Z

