Here is a fiddle for my question.
I have a simple table layout:
- class
- person: belongs to a class
I want to select all classes, and for each class, I want the first two person identifiers of the belonging persons sorted by descending name.
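For readers without access to the fiddle, the layout might look like the sketch below. The column names come from the queries in this thread; the column types, constraints, and sample rows are assumptions made for illustration only.

```sql
-- Hypothetical schema: types, constraints, and sample data are assumptions.
create table class (
    identifier integer primary key
);

create table person (
    identifier       integer primary key,
    class_identifier integer not null references class (identifier),
    name             text    not null
);

insert into class (identifier) values (1), (2), (3);

-- Class 3 deliberately has no persons, to exercise the "all classes" requirement.
insert into person (identifier, class_identifier, name) values
    (1, 1, 'Alice'),
    (2, 1, 'Bob'),
    (3, 1, 'Carol'),
    (4, 2, 'Dave');
```

With this sample data, the requirement means class 1 should list the identifiers of Carol and Bob (in that order), class 2 only Dave, and class 3 should still appear in the result.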
I solved this with the following query:
select c.identifier, array_agg(p.identifier order by p.name desc) as persons
from class as c
left join lateral (
select p.identifier, p.name
from person as p
where p.class_identifier = c.identifier
order by p.name desc
limit 2
) as p
on true
group by c.identifier
order by c.identifier
Note: I could have used a correlated subquery in the SELECT clause, but I am trying to avoid that as part of a learning process.
As you can see, I am applying order by p.name desc in two places:
- in the subquery
- in the aggregate function
Is there a way to avoid that? My train of thought:
First, obviously I cannot remove the order by in the subquery, as that would give a query which does not meet my requirement as stated above.
Second, I think that the order by in the aggregate function cannot be left out, as row order of the subquery is not necessarily preserved in the aggregate function?
Should I rewrite the query?
2 Answers
I am applying order by p.name desc in two places ... Is there a way to avoid that?
Yes. Aggregate with an ARRAY constructor in the lateral subquery directly:
SELECT c.identifier, p.persons
FROM class c
CROSS JOIN LATERAL (
SELECT ARRAY (
SELECT identifier
FROM person
WHERE class_identifier = c.identifier
ORDER BY name DESC
LIMIT 2
) AS persons
) p
ORDER BY c.identifier;
You also don't need GROUP BY in the outer SELECT this way. Shorter, cleaner, faster.
I replaced the LEFT JOIN with a plain CROSS JOIN since the ARRAY constructor always returns exactly 1 row. (Like you pointed out in a comment.)
db<>fiddle here.
Related:
Order of rows in subqueries
To address your comment:
I learned that order of rows in a subquery is never guaranteed to be preserved in the outer query.
Well, no. While the SQL standard does not offer any guarantees, there are limited guarantees in Postgres. The manual:
This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7. Alternatively, supplying the input values from a sorted subquery will usually work. For example:
SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
Beware that this approach can fail if the outer query level contains additional processing, such as a join, because that might cause the subquery's output to be reordered before the aggregate is computed.
If all you do in the next level is to aggregate rows, the order is positively guaranteed. And yes, what we feed to the ARRAY constructor is a subquery, too. That's not the point. It would work with array_agg() as well:
SELECT c.identifier, p.persons
FROM class c
CROSS JOIN LATERAL (
SELECT array_agg(identifier) AS persons
FROM (
SELECT identifier
FROM person
WHERE class_identifier = c.identifier
ORDER BY name DESC
LIMIT 2
) sub
) p
ORDER BY c.identifier;
But I expect the ARRAY constructor to be faster for this case.
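One difference between the two variants matters for classes without persons: the ARRAY constructor yields an empty array, while array_agg() over zero rows yields NULL. A minimal standalone sketch that needs no tables (the column aliases are made up for illustration):

```sql
-- Both inner queries return zero rows; only the way they are collected differs.
SELECT ARRAY(SELECT 1 WHERE false) AS from_constructor            -- {} (empty array)
     , (SELECT array_agg(x)
        FROM  (SELECT 1 AS x WHERE false) sub) AS from_array_agg; -- NULL
```

To list only classes with at least one person, one could add WHERE p.persons <> '{}' to the ARRAY constructor query, or WHERE p.persons IS NOT NULL to the array_agg() query.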
- That's quite interesting. I learned that order of rows in a subquery is never guaranteed to be preserved in the outer query. So in this particular case, why is it correct to assume that the rows in the innermost subquery are fed in the correct order to the ARRAY(...) construction? – Jarius Hebzo, Aug 1, 2018 at 6:59
- Answering my own question: this is not really a subquery (as in SELECT ... FROM (SELECT ... FROM ...)). This is a SELECT on a SELECT: SELECT ARRAY(SELECT ... FROM ...). – Jarius Hebzo, Aug 1, 2018 at 7:06
- Related to this: dba.stackexchange.com/a/159717/157363 – Jarius Hebzo, Aug 1, 2018 at 8:57
- @JariusHebzo: I added a bit to address the issue of row order in subqueries. – Erwin Brandstetter, Aug 1, 2018 at 17:24
- Would it be correct to say that in both queries we can replace the left lateral join by just lateral join? In the absence of persons, the first query returns an empty array, the second one null, right? This contradicts the last sentence of dba.stackexchange.com/questions/173831/…, but I think that information is wrong? I think we need to check on p.persons != '{}' (in case of the first query) or p.persons is not null (in case of the second query) to output only classes with at least one person? – Jarius Hebzo, Aug 1, 2018 at 18:26
Here's an alternative, but it is not any better than what you already have:
with enumeration (class_identifier, identifier, name, n) as (
select p.class_identifier, p.identifier, p.name
, row_number() over (partition by p.class_identifier
order by p.name desc)
from person as p
)
select c.identifier, array_agg(e.identifier order by e.n) as persons
from class as c
left join enumeration e
on c.identifier = e.class_identifier
and e.n <= 2  -- filter in the join condition, not in WHERE, so classes without persons are kept
group by c.identifier
order by c.identifier;
- This is an interesting approach, thanks for that. I now understand your comment above: indeed, we always need two orderings, there is no way around that. I think this answers my question! – Jarius Hebzo, Jul 30, 2018 at 20:05
- I wonder how this will perform with a real database. The CTE enumeration will hold the complete person table, if I am not mistaken (due to CTEs acting as an optimization fence in PostgreSQL). It's basically an in-memory copy of the person table. This might not be ideal, as essentially we only need a few rows from that table (2 for each class, to be precise). Maybe we should add a select * from (...) where n <= 2 around the query in enumeration (instead of in the main query)? This way, the CTE enumeration no longer contains the whole person table. – Jarius Hebzo, Jul 30, 2018 at 20:08
- I hope this makes sense, I have trouble explaining it in this tiny box. I demonstrated it in this fiddle: dbfiddle.uk/… – Jarius Hebzo, Jul 30, 2018 at 20:18
- It will probably be worse than your query. I added it just as food for thought since you seem to be investigating different techniques. – Lennart - Slava Ukraini, Jul 30, 2018 at 20:23
- I really appreciate that! I am indeed trying to expand my SQL knowledge by trying out different things, without really knowing what I am doing all the time. Your answers are very helpful! – Jarius Hebzo, Jul 30, 2018 at 20:26
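The idea from the comments above, filtering on n inside the CTE so that it only holds the top two rows per class, might look like the sketch below. The linked fiddle is not reproduced here, so this is a reconstruction under assumptions, not the commenter's exact query; the subquery alias ranked is made up for illustration.

```sql
with enumeration as (
    -- Keep only the first two persons per class, ranked by descending name.
    select ranked.class_identifier, ranked.identifier, ranked.n
    from (
        select p.class_identifier, p.identifier
             , row_number() over (partition by p.class_identifier
                                  order by p.name desc) as n
        from person as p
    ) as ranked
    where ranked.n <= 2
)
select c.identifier, array_agg(e.identifier order by e.n) as persons
from class as c
left join enumeration as e
  on e.class_identifier = c.identifier
group by c.identifier
order by c.identifier;
```

Because the filter now lives inside the CTE, the outer left join keeps classes without persons; for those, array_agg() receives a single NULL and returns {NULL}, the same as the query in the question.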
(identifier) the primary key of class?