Say I have the following table structure:

job
+----+------------+
| id | some_field |
+----+------------+
|  1 | some_val   |
|  2 | some_val   |
|  3 | some_val   |
+----+------------+

event
+----+--------+----------------------------------------+
| id | job_id | payload (JSONB)                        |
+----+--------+----------------------------------------+
|  1 |      1 | {"type":1, "a":"some_val", ...}        |
|  2 |      1 | {"type":2, "c":"some_other_val1", ...} |
|  3 |      2 | {"type":1, "a":"some_other_val2", ...} |
|  4 |      2 | {"type":1, "a":"some_other_val3", ...} |
|  5 |      3 | {"type":1, "a":"some_other_val4", ...} |
|  6 |      3 | {"type":1, "a":"some_other_val5", ...} |
|  7 |      3 | {"type":2, "c":"some_other_val6", ...} |
|  8 |      3 | {"type":3, "d":"some_other_val7", ...} |
+----+--------+----------------------------------------+

I want to create a query/view that ...

  • ... selects one row for each job IF it has an event with payload->>'type' = 2
  • ... in that row, shows values from other events that are associated with this job.
  • ... in that row, shows aggregated values from other events that are associated with this job if that event type occurs multiple times.

For example, with the above data I'd like to select all jobs that have an event of type 2, and find out the value of the d field from events with type 3 and some aggregate (say, COUNT()) of the a field from events with type 1. The result would be:

Results
+--------+-----------------+----------+
| job_id | d               | COUNT(a) |
+--------+-----------------+----------+
|      1 | NULL            |        1 |
|      3 | some_other_val7 |        2 |
+--------+-----------------+----------+

As a bonus: Each job can have only one event of type 2, so the basic query can be structured like:

SELECT
 job.id
FROM
 event
LEFT JOIN 
 job ON event.job_id = job.id
WHERE
 event.payload ->> 'type' = '2';  -- ->> returns text, so compare against '2'

How can I get the result table? Do I need a subquery for each field/event type, which would probably perform relatively badly, or can I use a GROUP BY job_id on the event table and maybe combine the JSON payloads? (Note that my real data is more complex and has many more fields in the payload that I would like to select.)


Here is a DB Fiddle with the example data and my test query:

CREATE TABLE job (
 id SERIAL PRIMARY KEY,
 some_field TEXT
);
CREATE TABLE event (
 id SERIAL PRIMARY KEY,
 job_id INTEGER,
 payload JSONB
);
INSERT INTO job (some_field) VALUES ('val1'), ('val2'), ('val3');
INSERT INTO event (job_id, payload) VALUES 
 (1, '{"type":1, "a":"some_val"}'::json),
 (1, '{"type":2, "c":"some_other_val1"}'::json),
 (2, '{"type":1, "a":"some_other_val2"}'::json),
 (2, '{"type":1, "a":"some_other_val3"}'::json),
 (3, '{"type":1, "a":"some_other_val4"}'::json),
 (3, '{"type":1, "a":"some_other_val5"}'::json),
 (3, '{"type":2, "c":"some_other_val6"}'::json),
 (3, '{"type":3, "d":"some_other_val7"}'::json);
SELECT job.id, job.some_field
FROM event
LEFT JOIN job
 ON event.job_id = job.id
WHERE event.payload ->> 'type' = '2';
id | some_field
-: | :---------
 1 | val1 
 3 | val3 

db<>fiddle here

asked Jul 31, 2019 at 7:02
  • Hi @McNets, thank you for the suggestion. I'm going to fiddle something together. Commented Jul 31, 2019 at 7:15
  • Do you need only to COUNT events of a type? Commented Jul 31, 2019 at 7:35
  • I might need more complicated information from the related events. In the real data each event has a timestamp, and I might need the latest event time for each job, or the time difference between events of two different types, and so on. If that turns out to be too complicated or has really bad performance, I'd consider preprocessing the data in code and designing non-json tables. For the sake of learning, COUNT(a) is sufficient here. Commented Jul 31, 2019 at 7:39
  • Could this be useful? Commented Jul 31, 2019 at 8:59
  • It is useful! Starting from your code I managed to get what I want fairly quickly. See here for the result. If you add your comment as an answer I'll happily accept it. Thanks! Commented Jul 31, 2019 at 9:35

1 Answer

As per the comments, you need a timestamp field, so I've added one (ts) to your sample data. IMHO you should first flatten your jsonb data into one row per job and event type. I've used SUM(CASE ...) to count how often each key occurs, and the query also calculates the minimum timestamp value for each group.

SELECT 
 event.job_id, 
 job.some_field,
 event.payload->>'type' AS type, 
 SUM(CASE WHEN event.payload->>'a' IS NOT NULL THEN 1 ELSE 0 END) AS a,
 SUM(CASE WHEN event.payload->>'b' IS NOT NULL THEN 1 ELSE 0 END) AS b,
 SUM(CASE WHEN event.payload->>'c' IS NOT NULL THEN 1 ELSE 0 END) AS c,
 SUM(CASE WHEN event.payload->>'d' IS NOT NULL THEN 1 ELSE 0 END) AS d,
 MIN((event.payload->>'ts')::timestamp) AS min_ts
FROM 
 event
JOIN
 job
 ON event.job_id = job.id
GROUP BY 
 event.job_id, 
 job.some_field,
 event.payload->>'type'
ORDER BY
 1, 2;

This is the result:

job_id | some_field | type |  a |  b |  c |  d | min_ts
-----: | :--------- | :--- | -: | -: | -: | -: | :------------------
     1 | val1       | 1    |  1 |  0 |  0 |  0 | 2004-10-19 10:23:54
     1 | val1       | 2    |  0 |  0 |  1 |  0 | 2002-10-19 10:23:54
     2 | val2       | 1    |  2 |  0 |  0 |  0 | 2003-10-19 10:23:54
     3 | val3       | 1    |  2 |  0 |  0 |  0 | 2004-10-19 10:23:54
     3 | val3       | 2    |  0 |  0 |  1 |  0 | 2005-10-19 10:23:54
     3 | val3       | 3    |  0 |  0 |  0 |  1 | 2006-10-19 10:23:54

db<>fiddle here

From this point you can filter the rows by type to get the desired result.
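
One possible way to do that filtering in a single pass over event is conditional aggregation with FILTER. The query below is only a minimal sketch against the job/event sample tables from the question, not the query from the linked fiddle. The alias count_a and the use of MAX to pick a single d value are my own assumptions, and it needs PostgreSQL 9.4 or later for FILTER.

```sql
-- Sketch: one row per job that has at least one type-2 event.
-- Assumes the event table and sample data from the question.
SELECT
    e.job_id,
    -- Value of "d" taken from type-3 events; NULL when the job has none.
    -- MAX is an arbitrary choice to reduce several values to one.
    MAX(e.payload->>'d') FILTER (WHERE e.payload->>'type' = '3') AS d,
    -- Number of type-1 events that carry an "a" key.
    COUNT(e.payload->>'a') FILTER (WHERE e.payload->>'type' = '1') AS count_a
FROM event AS e
GROUP BY e.job_id
-- Keep only jobs that have a type-2 event.
HAVING bool_or(e.payload->>'type' = '2')
ORDER BY e.job_id;
```

With the sample data this should return job 1 with a NULL d and a count of 1, and job 3 with some_other_val7 and a count of 2, which matches the result table in the question. Further payload fields can be added as extra aggregates with their own FILTER clauses, so the table is still read only once.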

answered Jul 31, 2019 at 10:07
