SELECT message."id" AS "id"
, message."transactionId" AS "transactionId"
, "ids"
, message"createdAt" AS "createdAt"
FROM message as message
JOIN (
SELECT
"transactionId",
"createdAt",
array_agg(thisMessage.id) as "ids",
min(thisMessage.id::text)::uuid as "id"
FROM message as thisMessage
WHERE thisMessage."groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
GROUP BY "transactionId", "createdAt"
ORDER BY createdAt DESC
OFFSET 0 LIMIT 9
) as groupM
ON message."id" = groupM."id"
WHERE
message."groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
ORDER BY message."createdAt" DESC
OFFSET 0 LIMIT 3;
Here is the table
"id" | "groupId" | "transactionId" | "createdAt" |
---|---|---|---|
1 | a5086f93-7e62-47ea-9419-6e3800b2a15e | null | 2024年05月03日 14:45:46.881317 |
2 | a5086f93-7e62-47ea-9419-6e3800b2a15e | 20d1e6a7-5cca-48bf-8ac2-ba8fb9f38772 | 2024年05月03日 15:45:46.881317 |
3 | a5086f93-7e62-47ea-9419-6e3800b2a15e | 20d1e6a7-5cca-48bf-8ac2-ba8fb9f38772 | 2024年05月03日 15:45:46.881317 |
4 | a5086f93-7e62-47ea-9419-6e3800b2a15e | null | 2024年05月03日 17:45:46.881317 |
5 | a5086f93-7e62-47ea-9419-6e3800b2a15e | null | 2024年05月03日 18:45:46.881317 |
Desired result:
"id" | "transactionId" | "ids" | "createdAt" |
---|---|---|---|
5 | null | null | 2024年05月03日 18:45:46.881317 |
4 | null | null | 2024年05月03日 17:45:46.881317 |
2 | a5086f93-7e62-47ea-9419-6e3800b2a15e | {2, 3} | 2024年05月03日 15:45:46.881317 |
The main idea is to group rows with transactionId
and count groups as one row, but don't aggregate rows with transactionId IS NULL
. So from 2 rows with transactionId
and 2 rows with null, LIMIT 3
should return one grouped by transactionId
, and 2 with null.
But on large tables, it works very slow. How can I improve it?
1 Answer 1
Null values are treated as equal by GROUP BY
, so those rows would be aggregated, too. See:
- How do I (or can I) SELECT DISTINCT on multiple columns? - last chapter
Break out null cases to keep them separate, and only aggregate not-null cases. UNION ALL
the full 3 latest rows from each to be sure - meaning the sum of LIMIT
+ OFFSET
. (Doesn't add much cost for a small number.) Apply ORDER BY
, OFFSET
& LIMIT
in the outer SELECT
again:
SELECT * FROM
(
SELECT "transactionId", "createdAt"
, null::uuid[] AS ids
, id
FROM message
WHERE "groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
AND "transactionId" IS NULL
ORDER BY "createdAt" DESC, id -- added id tiebreaker
LIMIT 3 -- no OFFSET
)
UNION ALL
(
SELECT "transactionId", "createdAt"
, array_agg(id) AS ids
, any_value(id) AS id -- pg 16+, see below!
FROM message
WHERE "groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
AND "transactionId" IS NOT NULL
GROUP BY "transactionId", "createdAt"
ORDER BY "createdAt" DESC, id -- added id tiebreaker
LIMIT 3 -- no OFFSET
)
ORDER BY "createdAt" DESC, id -- added id tiebreaker
OFFSET 0
LIMIT 3;
To make it fast, have at least a multicolumn index on ("groupId", "createdAt") - with fields in this order.
Possibly another partial index for null cases if those are rare.
The odd construct min(id::text)::uuid AS id
indicates id
is actually type uuid
(which has no >
/ <
operators, hence doesn't allow min()
), and this is your way to pick one arbitrary value from the set.
Postgres 16 added the convenient aggregate function any_value()
for that task precisely. Fall back to your construct in older versions, or use first()
from the additional module first_last_agg. See:
"createdAt"
may not be unique, making the sort order non-deterministic and results arbitrary (possible alternating between calls). Add tiebreaker(s) to get deterministic results.
Aside:
Use legal, lower-case, unquoted identifiers if at all possible. See:
Explore related questions
See similar questions with these tags.
GROUP BY
springs to mind, but I can't be sure. Please go to dbfiddle.uk and provide a simplified fiddle with some sample data and your desired result.