Let's say we have the following query.

select
  name,
  pos,
  rank() over (partition by constructor) r,
  format('%s / %s',
    row_number() over (partition by constructor),
    count(*) over (partition by constructor)
  ) "pos/global"
from (values
  ('d1-c1', 1, 'c1'),
  ('d3-c1', 3, 'c1'),
  ('d3-c2', 3, 'c2'),
  ('d2-c1', 2, 'c1'),
  ('d2-c2', 2, 'c2')
) t(name, pos, constructor);

The output is the following:

 name  │ pos │ r │ pos/global
═══════╪═════╪═══╪════════════
 d1-c1 │   1 │ 1 │ 1 / 3
 d3-c1 │   3 │ 1 │ 2 / 3
 d2-c1 │   2 │ 1 │ 3 / 3
 d3-c2 │   3 │ 1 │ 1 / 2
 d2-c2 │   2 │ 1 │ 2 / 2
(5 rows)

(Optional side question: why is every row in each partition ranked 1?)

But when I add an ordering to the window definition of rank(),

select
  name,
  pos,
  rank() over (partition by constructor order by pos asc) r,
  format('%s / %s',
    row_number() over (partition by constructor),
    count(*) over (partition by constructor)
  ) "pos/global"
from (values
  ('d1-c1', 1, 'c1'),
  ('d3-c1', 3, 'c1'),
  ('d3-c2', 3, 'c2'),
  ('d2-c1', 2, 'c1'),
  ('d2-c2', 2, 'c2')
) t(name, pos, constructor);

I get this:

 name  │ pos │ r │ pos/global
═══════╪═════╪═══╪════════════
 d1-c1 │   1 │ 1 │ 1 / 3
 d2-c1 │   2 │ 2 │ 2 / 3
 d3-c1 │   3 │ 3 │ 3 / 3
 d2-c2 │   2 │ 1 │ 1 / 2
 d3-c2 │   3 │ 2 │ 2 / 2
(5 rows)

This is roughly what I want, but I don't understand why the ordering inside the window definition changes the order of the result rows. My intuition says the output should have looked something like this (only the values in the r column change):

 name  │ pos │ r │ pos/global
═══════╪═════╪═══╪════════════
 d1-c1 │   1 │ 1 │ 1 / 3
 d3-c1 │   3 │ 3 │ 2 / 3
 d2-c1 │   2 │ 2 │ 3 / 3
 d3-c2 │   3 │ 3 │ 1 / 2
 d2-c2 │   2 │ 2 │ 2 / 2
(5 rows)
asked Jan 22, 2023 at 22:17

1 Answer

The rank() function ranks rows by the order given in its window definition. In the first query you did not specify one, and rows have no inherent order, so all rows in a partition are peers ("equal") and all get rank 1. In the second query you provide an ordering, so rank() works as expected.
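
To see the peer behaviour in isolation, here is a small sketch that puts both variants side by side. The 'dx-c1' row is a made-up addition (it is not in your data) so that two rows tie on pos, and the outer order by is only there to make the listing stable.

```sql
-- Without ORDER BY in the window, every row of a partition is a peer of
-- every other row, so rank() returns 1 for all of them.
-- With ORDER BY, only rows that tie on the sort key share a rank.
select
  name,
  pos,
  rank() over (partition by constructor)              as r_unordered,
  rank() over (partition by constructor order by pos) as r_ordered
from (values
  ('d1-c1', 1, 'c1'),
  ('d2-c1', 2, 'c1'),
  ('dx-c1', 2, 'c1'),  -- hypothetical extra row that ties with d2-c1 on pos
  ('d3-c1', 3, 'c1')
) t(name, pos, constructor)
order by pos, name;
```

r_unordered should be 1 on every row, while r_ordered should come out as 1, 2, 2, 4: the two tied rows share rank 2 and the next rank is skipped.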

For the main question: neither of your queries has an ORDER BY on the query itself, so the order of the result rows is undefined. The database is allowed to return them in any order it pleases, and that order can differ between two executions of the same query. Usually the optimizer tries to do as little work as possible to satisfy the query.

  • in the first case the original order of the values is kept because nothing requires it to change
  • in the second case the rows have to be sorted in order to compute rank(). Because the query requires no other order, the sorted rows are passed straight through to the output, which still satisfies the "undefined" final ordering. I have no idea whether this is an actual optimization or just a side effect of other interactions in the code, and it does not actually matter :)
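
If you want an order you can rely on, say so explicitly. A minimal sketch, assuming you want the rows grouped by constructor and sorted by pos within each group (the window name w is just a label I picked):

```sql
select
  name,
  pos,
  rank() over w as r,
  format('%s / %s',
    row_number() over w,
    -- kept on an unordered window on purpose: with ORDER BY the default
    -- frame ends at the current row, which would give a running count
    count(*) over (partition by constructor)
  ) as "pos/global"
from (values
  ('d1-c1', 1, 'c1'),
  ('d3-c1', 3, 'c1'),
  ('d3-c2', 3, 'c2'),
  ('d2-c1', 2, 'c1'),
  ('d2-c2', 2, 'c2')
) t(name, pos, constructor)
window w as (partition by constructor order by pos)
-- the only clause that determines the order of the result rows
order by constructor, pos;
```

With the sample data this should return the same rows as your second query, but now the order is guaranteed rather than incidental. Giving row_number() the same ordered window also makes the first half of "pos/global" deterministic; in your original queries it simply numbers the rows in whatever order they happen to arrive.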
answered Jan 22, 2023 at 22:56
