How to select 3 rows, using offset & limit, but merge rows with same ID

Question 1

SELECT message."id" AS "id"
 , message."transactionId" AS "transactionId"
 , "ids"
 , message"createdAt" AS "createdAt"
FROM message as message
 JOIN (
 SELECT
 "transactionId",
 "createdAt",
 array_agg(thisMessage.id) as "ids",
 min(thisMessage.id::text)::uuid as "id"
 FROM message as thisMessage
 WHERE thisMessage."groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
 GROUP BY "transactionId", "createdAt"
 ORDER BY createdAt DESC
 OFFSET 0 LIMIT 9
 ) as groupM
 ON message."id" = groupM."id"
WHERE
 message."groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
ORDER BY message."createdAt" DESC
OFFSET 0 LIMIT 3;

Here is the table

"id"	"groupId"	"transactionId"	"createdAt"
1	a5086f93-7e62-47ea-9419-6e3800b2a15e	null	2024年05月03日 14:45:46.881317
2	a5086f93-7e62-47ea-9419-6e3800b2a15e	20d1e6a7-5cca-48bf-8ac2-ba8fb9f38772	2024年05月03日 15:45:46.881317
3	a5086f93-7e62-47ea-9419-6e3800b2a15e	20d1e6a7-5cca-48bf-8ac2-ba8fb9f38772	2024年05月03日 15:45:46.881317
4	a5086f93-7e62-47ea-9419-6e3800b2a15e	null	2024年05月03日 17:45:46.881317
5	a5086f93-7e62-47ea-9419-6e3800b2a15e	null	2024年05月03日 18:45:46.881317

Desired result:

"id"	"transactionId"	"ids"	"createdAt"
5	null	null	2024年05月03日 18:45:46.881317
4	null	null	2024年05月03日 17:45:46.881317
2	a5086f93-7e62-47ea-9419-6e3800b2a15e	{2, 3}	2024年05月03日 15:45:46.881317

The main idea is to group rows with transactionId and count groups as one row, but don't aggregate rows with transactionId IS NULL. So from 2 rows with transactionId and 2 rows with null, LIMIT 3 should return one grouped by transactionId, and 2 with null.

But on large tables, it works very slow. How can I improve it?

Question 2

i can not see any column that is called batch id

Question 3

Sorry, I got the name wrong. I meant transactionId

Question 4

The query is too complex, even with sample data ot will be difficult, you should rewrite the subquery so that it groups only by transactionId but that is only a suggestion, without seeing the table and the data it is not really possible to do a suggestion

Question 5

Hi, and welcome to dba.se! Your query is quite complex and I'm not sure what you mean by "handle rows with same batchid as a single row"? GROUP BY springs to mind, but I can't be sure. Please go to dbfiddle.uk and provide a simplified fiddle with some sample data and your desired result.

Question 6

Null values are treated as equal by GROUP BY, so those rows would be aggregated, too. See:

How do I (or can I) SELECT DISTINCT on multiple columns? - last chapter

Break out null cases to keep them separate, and only aggregate not-null cases. UNION ALL the full 3 latest rows from each to be sure - meaning the sum of LIMIT + OFFSET. (Doesn't add much cost for a small number.) Apply ORDER BY, OFFSET & LIMIT in the outer SELECT again:

SELECT * FROM
(
SELECT "transactionId", "createdAt"
 , null::uuid[] AS ids
 , id
FROM message
WHERE "groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
AND "transactionId" IS NULL
ORDER BY "createdAt" DESC, id -- added id tiebreaker
LIMIT 3 -- no OFFSET
)
UNION ALL 
(
SELECT "transactionId", "createdAt"
 , array_agg(id) AS ids
 , any_value(id) AS id -- pg 16+, see below!
FROM message
WHERE "groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
AND "transactionId" IS NOT NULL
GROUP BY "transactionId", "createdAt"
ORDER BY "createdAt" DESC, id -- added id tiebreaker
LIMIT 3 -- no OFFSET
)
ORDER BY "createdAt" DESC, id -- added id tiebreaker
OFFSET 0
LIMIT 3;

To make it fast, have at least a multicolumn index on ("groupId", "createdAt") - with fields in this order.
Possibly another partial index for null cases if those are rare.

The odd construct min(id::text)::uuid AS id indicates id is actually type uuid (which has no > / < operators, hence doesn't allow min()), and this is your way to pick one arbitrary value from the set.
Postgres 16 added the convenient aggregate function any_value() for that task precisely. Fall back to your construct in older versions, or use first() from the additional module first_last_agg. See:

PostgreSQL equivalent to Oracle's ANY_VALUE(...) KEEP (DENSE_RANK FIRST/LAST ORDER BY ...)

"createdAt" may not be unique, making the sort order non-deterministic and results arbitrary (possible alternating between calls). Add tiebreaker(s) to get deterministic results.

Aside:
Use legal, lower-case, unquoted identifiers if at all possible. See:

Are PostgreSQL column names case-sensitive?

score 0 · Answer 1 · 2024-05-06 03:31:30Z

Null values are treated as equal by GROUP BY, so those rows would be aggregated, too. See:

How do I (or can I) SELECT DISTINCT on multiple columns? - last chapter

Break out null cases to keep them separate, and only aggregate not-null cases. UNION ALL the full 3 latest rows from each to be sure - meaning the sum of LIMIT + OFFSET. (Doesn't add much cost for a small number.) Apply ORDER BY, OFFSET & LIMIT in the outer SELECT again:

SELECT * FROM
(
SELECT "transactionId", "createdAt"
 , null::uuid[] AS ids
 , id
FROM message
WHERE "groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
AND "transactionId" IS NULL
ORDER BY "createdAt" DESC, id -- added id tiebreaker
LIMIT 3 -- no OFFSET
)
UNION ALL 
(
SELECT "transactionId", "createdAt"
 , array_agg(id) AS ids
 , any_value(id) AS id -- pg 16+, see below!
FROM message
WHERE "groupId" = 'a5086f93-7e62-47ea-9419-6e3800b2a15e'
AND "transactionId" IS NOT NULL
GROUP BY "transactionId", "createdAt"
ORDER BY "createdAt" DESC, id -- added id tiebreaker
LIMIT 3 -- no OFFSET
)
ORDER BY "createdAt" DESC, id -- added id tiebreaker
OFFSET 0
LIMIT 3;

To make it fast, have at least a multicolumn index on ("groupId", "createdAt") - with fields in this order.
Possibly another partial index for null cases if those are rare.

The odd construct min(id::text)::uuid AS id indicates id is actually type uuid (which has no > / < operators, hence doesn't allow min()), and this is your way to pick one arbitrary value from the set.
Postgres 16 added the convenient aggregate function any_value() for that task precisely. Fall back to your construct in older versions, or use first() from the additional module first_last_agg. See:

PostgreSQL equivalent to Oracle's ANY_VALUE(...) KEEP (DENSE_RANK FIRST/LAST ORDER BY ...)

"createdAt" may not be unique, making the sort order non-deterministic and results arbitrary (possible alternating between calls). Add tiebreaker(s) to get deterministic results.

Aside:
Use legal, lower-case, unquoted identifiers if at all possible. See:

Are PostgreSQL column names case-sensitive?

Stack Exchange Network

How to select 3 rows, using offset & limit, but merge rows with same ID

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

How to select 3 rows, using offset & limit, but merge rows with same ID

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions