Why do PostgreSQL WINDOW functions affect the final ordering of all rows in the select?

Question 1

Let's say we have the following query.

select
 name,
 pos,
 rank() over (partition by constructor) r,
 format('%s / %s',
 row_number() over (partition by constructor),
 count(*) over (partition by constructor)
 ) "pos/global"
from (values
 ('d1-c1', 1, 'c1'),
 ('d3-c1', 3, 'c1'),
 ('d3-c2', 3, 'c2'),
 ('d2-c1', 2, 'c1'),
 ('d2-c2', 2, 'c2')
 ) t(name, pos, constructor);

The output is the following:

 name │ pos │ r │ pos/global
═══════╪═════╪═══╪════════════
 d1-c1 │ 1 │ 1 │ 1 / 3
 d3-c1 │ 3 │ 1 │ 2 / 3
 d2-c1 │ 2 │ 1 │ 3 / 3
 d3-c2 │ 3 │ 1 │ 1 / 2
 d2-c2 │ 2 │ 1 │ 2 / 2
(5 rows)

(Optional question here: Why were all rows in every frame ranked as 1?)

But when I add an ordering to the window definition of rank(),

select
 name,
 pos,
 rank() over (partition by constructor order by pos asc) r,
 format('%s / %s',
 row_number() over (partition by constructor),
 count(*) over (partition by constructor)
 ) "pos/global"
from (values
 ('d1-c1', 1, 'c1'),
 ('d3-c1', 3, 'c1'),
 ('d3-c2', 3, 'c2'),
 ('d2-c1', 2, 'c1'),
 ('d2-c2', 2, 'c2')
 ) t(name, pos, constructor);

I get this:

 name │ pos │ r │ pos/global
═══════╪═════╪═══╪════════════
 d1-c1 │ 1 │ 1 │ 1 / 3
 d2-c1 │ 2 │ 2 │ 2 / 3
 d3-c1 │ 3 │ 3 │ 3 / 3
 d2-c2 │ 2 │ 1 │ 1 / 2
 d3-c2 │ 3 │ 2 │ 2 / 2
(5 rows)

... which is kind of what I want, but I don't understand why the ordering of the frame affects the order of all the other rows. My intuition says that the output should've been something like this (only values in the r column change):

 name │ pos │ r │ pos/global
═══════╪═════╪═══╪════════════
 d1-c1 │ 1 │ 1 │ 1 / 3
 d3-c1 │ 3 │ 3 │ 2 / 3
 d2-c1 │ 2 │ 2 │ 3 / 3
 d3-c2 │ 3 │ 2 │ 1 / 2
 d2-c2 │ 2 │ 1 │ 2 / 2
(5 rows)

Answer

The rank() function ranks rows by the ORDER BY of its window. You did not specify one, and a partition has no inherent order, so all rows in it are "equal" (peers) and all get rank 1. In the second query you provide an ordering, and it starts working as expected.
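To make the difference visible, here is a small sketch that puts both variants of rank() side by side on the data from the question. The column aliases rank_unordered and rank_by_pos are made up for illustration, and the top-level ORDER BY is only there so the rows come back in a predictable order.

```sql
select
  name,
  pos,
  -- no ORDER BY in the window: all rows of a partition are peers, so rank() is always 1
  rank() over (partition by constructor) as rank_unordered,
  -- ORDER BY pos in the window: rows are ranked by pos within each constructor
  rank() over (partition by constructor order by pos) as rank_by_pos
from (values
  ('d1-c1', 1, 'c1'),
  ('d3-c1', 3, 'c1'),
  ('d3-c2', 3, 'c2'),
  ('d2-c1', 2, 'c1'),
  ('d2-c2', 2, 'c2')
 ) t(name, pos, constructor)
order by constructor, pos;
```

With this data, rank_unordered is 1 on every row, while rank_by_pos runs 1, 2, 3 for c1 and 1, 2 for c2.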

For the main question: if you do not specify an ORDER BY for the query results (and you did not in either query), the order of the result rows is undefined. The database is allowed to return them in any order it pleases, and the order can differ between two executions of the same query. Usually the optimizer tries to do as little work as possible to satisfy the query.

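In the first case, the rows only have to be grouped by constructor to form the partitions. Nothing requires any order within a partition, so within each constructor the rows happen to come out in the order of the values list.
In the second case, the rows have to be sorted by constructor and pos to resolve rank(). Because the query requires no other order, the sorted rows are passed straight through to the output, which still satisfies the "undefined" final ordering. This is more a side effect of how window functions are evaluated than a deliberate choice: the PostgreSQL documentation notes that window functions currently always require presorted data, so the output follows one of the window definitions. It also recommends not relying on this and using an explicit top-level ORDER BY if the order matters.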
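If the output order matters, one way to pin it down is sketched below. It assumes the intended order is by constructor and then pos. It also gives row_number() the same window ordering as rank(), which the original query leaves unspecified, so that the numbering no longer depends on the plan. count(*) keeps its unordered window on purpose: with an ORDER BY, the default frame would turn it into a running count instead of the partition total.

```sql
select
  name,
  pos,
  rank() over (partition by constructor order by pos) as r,
  format('%s / %s',
    -- same ordering as rank(), so the numbering is deterministic
    row_number() over (partition by constructor order by pos),
    -- no ORDER BY here: counts the whole partition
    count(*) over (partition by constructor)
  ) as "pos/global"
from (values
  ('d1-c1', 1, 'c1'),
  ('d3-c1', 3, 'c1'),
  ('d3-c2', 3, 'c2'),
  ('d2-c1', 2, 'c1'),
  ('d2-c2', 2, 'c2')
 ) t(name, pos, constructor)
-- the only thing that guarantees the order of the result rows
order by constructor, pos;
```

This returns the same rows, in the same order, as the second result in the question, but now the order is guaranteed and not incidental. To see where the incidental order comes from, prefix the original query with explain (costs off); the plan will typically show a Sort step feeding the WindowAgg node, although the exact plan depends on the PostgreSQL version and the data.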

jkavalik · Accepted Answer · 2023-01-22 22:56:17Z

