Let's say we have the following query.
select
name,
pos,
rank() over (partition by constructor) r,
format('%s / %s',
row_number() over (partition by constructor),
count(*) over (partition by constructor)
) "pos/global"
from (values
('d1-c1', 1, 'c1'),
('d3-c1', 3, 'c1'),
('d3-c2', 3, 'c2'),
('d2-c1', 2, 'c1'),
('d2-c2', 2, 'c2')
) t(name, pos, constructor);
The output is the following:
name │ pos │ r │ pos/global
═══════╪═════╪═══╪════════════
d1-c1 │ 1 │ 1 │ 1 / 3
d3-c1 │ 3 │ 1 │ 2 / 3
d2-c1 │ 2 │ 1 │ 3 / 3
d3-c2 │ 3 │ 1 │ 1 / 2
d2-c2 │ 2 │ 1 │ 2 / 2
(5 rows)
(Optional question here: Why were all rows in every partition ranked as 1?)
But when I add an ordering to the window specification,
select
name,
pos,
rank() over (partition by constructor order by pos asc) r,
format('%s / %s',
row_number() over (partition by constructor),
count(*) over (partition by constructor)
) "pos/global"
from (values
('d1-c1', 1, 'c1'),
('d3-c1', 3, 'c1'),
('d3-c2', 3, 'c2'),
('d2-c1', 2, 'c1'),
('d2-c2', 2, 'c2')
) t(name, pos, constructor);
I get this:
name │ pos │ r │ pos/global
═══════╪═════╪═══╪════════════
d1-c1 │ 1 │ 1 │ 1 / 3
d2-c1 │ 2 │ 2 │ 2 / 3
d3-c1 │ 3 │ 3 │ 3 / 3
d2-c2 │ 2 │ 1 │ 1 / 2
d3-c2 │ 3 │ 2 │ 2 / 2
(5 rows)
... which is kind of what I want, but I don't understand why the ordering inside the window affects the order of the result rows. My intuition says that the output should've been something like this (only the values in the r column change):
name │ pos │ r │ pos/global
═══════╪═════╪═══╪════════════
d1-c1 │ 1 │ 1 │ 1 / 3
d3-c1 │ 3 │ 3 │ 2 / 3
d2-c1 │ 2 │ 2 │ 3 / 3
d3-c2 │ 3 │ 2 │ 1 / 2
d2-c2 │ 2 │ 1 │ 2 / 2
(5 rows)
1 Answer
The rank() function ranks rows by the ordering given in the window's ORDER BY. Your first query does not specify one, and rows have no inherent order, so all rows in a partition are peers ("equal") and all get 1. In the second query you provide an ordering and it starts working as expected.
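To see the difference side by side, here is a minimal sketch that reuses the data from the question and computes both variants in one query. The column aliases rank_unordered and rank_by_pos are made up for illustration, and the final ORDER BY is only there to make the result order predictable.

```sql
select
  name,
  pos,
  -- no ORDER BY in the window: every row in the partition is a peer,
  -- so rank() returns 1 for all of them
  rank() over (partition by constructor) as rank_unordered,
  -- ORDER BY in the window: rows are ranked by pos within each constructor
  rank() over (partition by constructor order by pos) as rank_by_pos
from (values
  ('d1-c1', 1, 'c1'),
  ('d3-c1', 3, 'c1'),
  ('d3-c2', 3, 'c2'),
  ('d2-c1', 2, 'c1'),
  ('d2-c2', 2, 'c2')
) t(name, pos, constructor)
order by constructor, pos;
```

With this data, rank_unordered should be 1 on every row, while rank_by_pos counts 1, 2, 3 for c1 and 1, 2 for c2.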
For the main question: neither of your queries has an ORDER BY for the query result itself, so the order of the result rows is undefined. The database is allowed to return them in any order it pleases (the order can even differ between two executions of the same query). Usually the optimizer tries to do as little work as possible to satisfy the query:
- In the first case the original order of the values is kept because nothing requires it to change.
- In the second case the rows have to be sorted in some way to compute rank(), and because the query requires no other order, the optimizer sorts the values directly and uses that order for the output (it still satisfies the "undefined" final ordering).

I have no idea whether "the optimizer decides" is an actual optimization or just a side effect of other interactions in the code, and it does not actually matter :)
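If the order of the result rows matters, the safe approach is to state it explicitly instead of relying on whatever the planner happens to do. A sketch of the second query with a final ORDER BY, assuming you want the rows grouped by constructor and sorted by pos inside each group:

```sql
select
  name,
  pos,
  rank() over (partition by constructor order by pos asc) r,
  format('%s / %s',
    -- row_number() is given the same ordering; without it the numbering
    -- inside a partition is just as arbitrary as the output order
    row_number() over (partition by constructor order by pos asc),
    -- count(*) is left without ORDER BY on purpose: adding one would turn
    -- it into a running count under the default window frame
    count(*) over (partition by constructor)
  ) "pos/global"
from (values
  ('d1-c1', 1, 'c1'),
  ('d3-c1', 3, 'c1'),
  ('d3-c2', 3, 'c2'),
  ('d2-c1', 2, 'c1'),
  ('d2-c2', 2, 'c2')
) t(name, pos, constructor)
-- the only clause that guarantees the order of the result rows
order by constructor, pos;
```

This should return the rows in the same order as your second result, but now by contract rather than by accident. Keeping the original input order instead would need an explicit column to sort by, since a VALUES list carries no guaranteed order of its own.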