0
SELECT message."id" AS "id"
 , message."transactionId" AS "transactionId"
 , "ids"
 , message"createdAt" AS "createdAt"
FROM message as message
 JOIN (
 SELECT
 "transactionId",
 "createdAt",
 array_agg(thisMessage.id) as "ids",
 min(thisMessage.id::text)::uuid as "id"
 FROM message as thisMessage
 WHERE thisMessage."groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
 GROUP BY "transactionId", "createdAt"
 ORDER BY createdAt DESC
 OFFSET 0 LIMIT 9
 ) as groupM
 ON message."id" = groupM."id"
WHERE
 message."groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
ORDER BY message."createdAt" DESC
OFFSET 0 LIMIT 3;

Here is the table

"id" "groupId" "transactionId" "createdAt"
1 a5086f93-7e62-47ea-9419-6e3800b2a15e null 2024年05月03日 14:45:46.881317
2 a5086f93-7e62-47ea-9419-6e3800b2a15e 20d1e6a7-5cca-48bf-8ac2-ba8fb9f38772 2024年05月03日 15:45:46.881317
3 a5086f93-7e62-47ea-9419-6e3800b2a15e 20d1e6a7-5cca-48bf-8ac2-ba8fb9f38772 2024年05月03日 15:45:46.881317
4 a5086f93-7e62-47ea-9419-6e3800b2a15e null 2024年05月03日 17:45:46.881317
5 a5086f93-7e62-47ea-9419-6e3800b2a15e null 2024年05月03日 18:45:46.881317

Desired result:

"id" "transactionId" "ids" "createdAt"
5 null null 2024年05月03日 18:45:46.881317
4 null null 2024年05月03日 17:45:46.881317
2 a5086f93-7e62-47ea-9419-6e3800b2a15e {2, 3} 2024年05月03日 15:45:46.881317

The main idea is to group rows with transactionId and count groups as one row, but don't aggregate rows with transactionId IS NULL. So from 2 rows with transactionId and 2 rows with null, LIMIT 3 should return one grouped by transactionId, and 2 with null.

But on large tables, it works very slow. How can I improve it?

Erwin Brandstetter
186k28 gold badges464 silver badges636 bronze badges
asked May 4, 2024 at 13:09
4
  • i can not see any column that is called batch id Commented May 4, 2024 at 21:44
  • Sorry, I got the name wrong. I meant transactionId Commented May 4, 2024 at 23:13
  • The query is too complex, even with sample data ot will be difficult, you should rewrite the subquery so that it groups only by transactionId but that is only a suggestion, without seeing the table and the data it is not really possible to do a suggestion Commented May 4, 2024 at 23:24
  • Hi, and welcome to dba.se! Your query is quite complex and I'm not sure what you mean by "handle rows with same batchid as a single row"? GROUP BY springs to mind, but I can't be sure. Please go to dbfiddle.uk and provide a simplified fiddle with some sample data and your desired result. Commented May 5, 2024 at 9:40

1 Answer 1

0

Null values are treated as equal by GROUP BY, so those rows would be aggregated, too. See:

Break out null cases to keep them separate, and only aggregate not-null cases. UNION ALL the full 3 latest rows from each to be sure - meaning the sum of LIMIT + OFFSET. (Doesn't add much cost for a small number.) Apply ORDER BY, OFFSET & LIMIT in the outer SELECT again:

SELECT * FROM
(
SELECT "transactionId", "createdAt"
 , null::uuid[] AS ids
 , id
FROM message
WHERE "groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
AND "transactionId" IS NULL
ORDER BY "createdAt" DESC, id -- added id tiebreaker
LIMIT 3 -- no OFFSET
)
UNION ALL 
(
SELECT "transactionId", "createdAt"
 , array_agg(id) AS ids
 , any_value(id) AS id -- pg 16+, see below!
FROM message
WHERE "groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
AND "transactionId" IS NOT NULL
GROUP BY "transactionId", "createdAt"
ORDER BY "createdAt" DESC, id -- added id tiebreaker
LIMIT 3 -- no OFFSET
)
ORDER BY "createdAt" DESC, id -- added id tiebreaker
OFFSET 0
LIMIT 3;

To make it fast, have at least a multicolumn index on ("groupId", "createdAt") - with fields in this order.
Possibly another partial index for null cases if those are rare.

The odd construct min(id::text)::uuid AS id indicates id is actually type uuid (which has no > / < operators, hence doesn't allow min()), and this is your way to pick one arbitrary value from the set.
Postgres 16 added the convenient aggregate function any_value() for that task precisely. Fall back to your construct in older versions, or use first() from the additional module first_last_agg. See:

"createdAt" may not be unique, making the sort order non-deterministic and results arbitrary (possible alternating between calls). Add tiebreaker(s) to get deterministic results.

Aside:
Use legal, lower-case, unquoted identifiers if at all possible. See:

answered May 6, 2024 at 3:31

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.