I have the following notifications table in postgresql:
- notification_id, bigint
- group_key, character varying(24)
- associated, jsonb object
- timestamp
The table currently looks like this:
+------------------------------------------------------+
| notification_id | group_key | associated | timestamp |
+------------------------------------------------------+
| 1 | key1 | {123} | ... |
| 2 | key1 | {456} | ... |
| 3 | key2 | {789} | ... |
+------------------------------------------------------+
The following query gives the following results:
SELECT
a.group_key,
MAX(a.notification_id) as max,
COUNT(a.notification_id) as total
FROM
data.notifications a
GROUP BY
a.group_key
ORDER BY
MAX(a.notification_id) DESC
+-------------------------+
| group_key | max | total |
+-------------------------+
| key2 | 3 | 1 |
| key1 | 2 | 2 |
+-------------------------+
Explanation: the notifications with IDs 1 and 2 belong to the same notification group (key1). I want to fetch the max notification ID for the group (in this case 2) and the number of notifications in the group (in this case also 2).
What I want now is also the content of the column 'associated' of the last notification, so in this case from notification with ID 2 (so in this case '{456}').
This column contains a jsonb object with various keys. I want to do joins from other tables with those keys, but I don't know how to get the last JSONB object for each notification group.
I tried using LAST_VALUE, but I constantly get errors in pgAdmin saying that the column "associated" should be in the GROUP BY clause. When I add it there, I don't get the correct result set. This is the query I tried:
SELECT
a.group_key,
MAX(a.notification_id) as max,
COUNT(a.notification_id) as total,
LAST_VALUE(a.associated) OVER (PARTITION BY a.group_key ORDER BY MAX(a.notification_id) DESC) as associated
FROM
data.notifications a
GROUP BY
a.group_key
ORDER BY
MAX(a.notification_id) DESC
1 Answer
This is a classic case where the ROW_NUMBER() window function can be used to get the latest row within a grouping (PARTITION), like so:
WITH CTE_Notifications_Sorted AS
(
SELECT notification_id, group_key, associated, timestamp,
COUNT(notification_id) OVER (PARTITION BY group_key) AS notification_id_count,
ROW_NUMBER() OVER (PARTITION BY group_key ORDER BY notification_id DESC) AS PartitionSortId -- Generates a unique ID for each row within the grouping (partition) of group_key ordered by the notification_id descending
FROM data.notifications
)
SELECT group_key, notification_id AS notification_id_max, notification_id_count AS total, associated, timestamp
FROM CTE_Notifications_Sorted
WHERE PartitionSortId = 1 -- Filter out everything but the latest row of each group_key partition
By also using COUNT() as a window function in the above CTE, you can drop the GROUP BY clause from the final SELECT.
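As a possible alternative to the ROW_NUMBER() filter, PostgreSQL's DISTINCT ON can pick the latest row per group in a single SELECT. The following is only a sketch under assumptions: the temporary table sample_notifications stands in for data.notifications, the timestamp column type is assumed, and the JSON key "user_id" is invented for illustration because the question does not show the real keys.

```sql
-- Temporary stand-in for data.notifications, loaded with the sample rows
-- from the question. The "user_id" key is hypothetical.
CREATE TEMP TABLE sample_notifications (
    notification_id bigint PRIMARY KEY,
    group_key       varchar(24) NOT NULL,
    associated      jsonb,
    "timestamp"     timestamptz DEFAULT now()  -- column type is an assumption
);

INSERT INTO sample_notifications (notification_id, group_key, associated) VALUES
    (1, 'key1', '{"user_id": 123}'),
    (2, 'key1', '{"user_id": 456}'),
    (3, 'key2', '{"user_id": 789}');

-- DISTINCT ON keeps the first row per group_key according to ORDER BY,
-- which here is the row with the highest notification_id.
-- Window functions are evaluated before DISTINCT ON is applied,
-- so total still counts every row in the group.
SELECT DISTINCT ON (group_key)
       group_key,
       notification_id AS notification_id_max,
       COUNT(*) OVER (PARTITION BY group_key) AS total,
       associated
FROM sample_notifications
ORDER BY group_key, notification_id DESC;
```

With the sample rows this should return key1 with notification_id_max 2, total 2 and the associated value of notification 2, and key2 with notification_id_max 3 and total 1. Note that DISTINCT ON requires the ORDER BY to start with group_key, so sorting the final result by notification_id_max would need an outer query.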
Now that you have the associated column for the correct row, if you need to parse out a specific value you can follow this StackOverflow answer or leverage one of these JSON Functions and Operators.
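To illustrate joining on a key from the latest associated object, here is a minimal sketch. Everything specific in it is an assumption made for the example: the tables demo_notifications and demo_users, the JSON key "user_id", and the name column are hypothetical, since the question does not say which keys the JSONB object holds or which tables should be joined.

```sql
-- Hypothetical stand-ins: demo_notifications mirrors data.notifications,
-- demo_users is an invented lookup table keyed by the invented "user_id".
CREATE TEMP TABLE demo_notifications (
    notification_id bigint PRIMARY KEY,
    group_key       varchar(24) NOT NULL,
    associated      jsonb
);

CREATE TEMP TABLE demo_users (
    user_id bigint PRIMARY KEY,
    name    text NOT NULL
);

INSERT INTO demo_notifications VALUES
    (1, 'key1', '{"user_id": 123}'),
    (2, 'key1', '{"user_id": 456}'),
    (3, 'key2', '{"user_id": 789}');

INSERT INTO demo_users VALUES
    (123, 'alice'), (456, 'bob'), (789, 'carol');

WITH latest AS (
    SELECT group_key,
           notification_id,
           associated,
           COUNT(*) OVER (PARTITION BY group_key) AS total,
           ROW_NUMBER() OVER (PARTITION BY group_key
                              ORDER BY notification_id DESC) AS rn
    FROM demo_notifications
)
SELECT l.group_key,
       l.notification_id AS notification_id_max,
       l.total,
       u.name
FROM latest l
-- ->> extracts the key as text, so it is cast to match demo_users.user_id.
LEFT JOIN demo_users u
       ON u.user_id = (l.associated ->> 'user_id')::bigint
WHERE l.rn = 1              -- keep only the latest row of each group
ORDER BY l.notification_id DESC;
```

The LEFT JOIN keeps a group in the result even when its latest associated object has no matching row; switch to an inner join if such groups should be dropped.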
Thank you for your quick answer. I'll give it a try! – Sam Leurs, Feb 12, 2021 at 14:16
@yesterday Sorry, I got pulled away while writing it up. I'm going to add more details, because the answer obviously doesn't have the aggregate functions you were using, but those can be added as well. Also, I think you'll want to use the LAST_VALUE function on top of the JSONB column in the final select. – J.D., Feb 12, 2021 at 14:19
@Yesterday OK, I updated my answer. I looked a little closer and saw what you were trying to do. I believe my answer should give you what you need. I used COUNT() as a window function as well, so that you don't need to do any grouping in your final select. (Also note: instead of the column name max (which is a reserved keyword) for the max notification_id, I called it notification_id_max.) – J.D., Feb 12, 2021 at 15:02