
I'm experimenting a bit with PostgreSQL queries and am trying to figure out whether it's possible to write a query that increments a row-number counter whenever the returned value changes.

Example SOURCE (where SOURCE can be a sorted or unsorted subquery or table):

X|Y
---
0|0
0|1
0|0
1|0
1|0
1|1
2|0

Example increment on X: select X, Y, wishful_row_number(X) as Rn from SOURCE;

0,0,1
0,1,1
0,0,1
1,0,2
1,0,2
1,1,2
2,0,3

(row number changes every time X changes)

Example increment on Y: select X, Y, wishful_row_number(Y) as Rn from SOURCE;

0,0,1
0,1,2
0,0,3
1,0,3
1,0,3
1,1,4
2,0,5

(the row number changes every time Y changes, and it moves to a bigger number, or a different string, even if that value was already seen before)

So the Rn increment does not depend on any ORDER BY or on how the source is sorted, only on the previously returned row.
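To pin down the requested semantics, here is a minimal sketch in plain Python rather than SQL. `run_numbers` is a hypothetical name standing in for `wishful_row_number`, and the sketch assumes the rows arrive in one fixed order, which is exactly the guarantee a SQL result set does not give without ORDER BY.

```python
from typing import Iterable, List


def run_numbers(values: Iterable[object]) -> List[int]:
    """Hypothetical stand-in for wishful_row_number(): number the rows in the
    order they are produced, starting at 1 and incrementing whenever the value
    differs from the value in the previous row."""
    numbers: List[int] = []
    previous = object()  # sentinel that compares unequal to any real value
    current = 0
    for value in values:
        if value != previous:  # this row starts a new run
            current += 1
            previous = value
        numbers.append(current)
    return numbers


# SOURCE rows from the question, in the order they are returned.
source = [(0, 0), (0, 1), (0, 0), (1, 0), (1, 0), (1, 1), (2, 0)]

print(run_numbers(x for x, _ in source))  # [1, 1, 1, 2, 2, 2, 3]
print(run_numbers(y for _, y in source))  # [1, 2, 3, 3, 3, 4, 5]
```

Note that the result depends only on adjacency: the Y value 0 gets three different numbers because it appears in three separate runs.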

Can such a query be written (without stored procedures and, if possible, without temporary tables)?

EDIT: And yes, I know tables have no inherent ordering, which is exactly why I am trying to skip the "force me to ORDER BY" step and base the Rn numbering on the returned values (see my reply comment below).

And yes, I understand that something like this might not be possible in SQL on the grounds of "why would anybody in their right mind want to do that???", but as far as I'm concerned that's an SQL limitation and the answer is "no, it can't be done", even if some people have a fetish for downvoting my question.

Charlieface
asked Mar 10, 2024 at 23:54
  • Tables have no inherent ordering, so how do you distinguish one row from the next? Unless you have some date or ID column, it's not going to work. In other words, how do you know whether the rows should be sorted 0,0 0,1 0,0 or 0,1 0,0 0,0? Commented Mar 11, 2024 at 2:44
  • That's exactly the point. All the queries I've tried force me to order first, while I don't care about the order; I just want to add an extra counter column based on "whatever the ordering is". So if no ordering was set, I'd essentially get a random grouping generator. And if the subqueries already specified ORDER BY, the grouping would be the same as adding "INNER JOIN (SELECT Y, ROW_NUMBER() OVER (ORDER BY 0) - 1 AS Rn FROM some_table GROUP BY Y) USING(Y)". Commented Mar 11, 2024 at 5:56

2 Answers

The dense_rank() window function should be what you need:

WITH tab(x, y) AS (VALUES (0, 0), (0, 1), (0, 0), (1, 0), (1, 0), (1, 1), (2, 0))
SELECT x, y,
 dense_rank() OVER (ORDER BY x)
FROM tab;
 x │ y │ dense_rank 
═══╪═══╪════════════
 0 │ 0 │ 1
 0 │ 1 │ 1
 0 │ 0 │ 1
 1 │ 0 │ 2
 1 │ 0 │ 2
 1 │ 1 │ 2
 2 │ 0 │ 3
(7 rows)
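dense_rank() numbers distinct values rather than runs of consecutive rows, so it reproduces the "increment on X" output because X happens to be non-decreasing in the sample. A minimal sketch compares it on both columns; it assumes Python's built-in sqlite3 module with SQLite (3.25+ for window functions) standing in for PostgreSQL, and a hypothetical id column that only fixes the display order.

```python
import sqlite3

# Assumptions: SQLite stands in for PostgreSQL, and "id" is a hypothetical
# column added only so the output order is well defined.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO tab (x, y) VALUES (?, ?)",
    [(0, 0), (0, 1), (0, 0), (1, 0), (1, 0), (1, 1), (2, 0)],
)

rows = conn.execute(
    """
    SELECT x, y,
           dense_rank() OVER (ORDER BY x) AS rank_x,  -- one rank per distinct x
           dense_rank() OVER (ORDER BY y) AS rank_y   -- one rank per distinct y
    FROM tab
    ORDER BY id
    """
).fetchall()

rank_x = [r[2] for r in rows]
rank_y = [r[3] for r in rows]
print(rank_x)  # [1, 1, 1, 2, 2, 2, 3], same as the "increment on X" output
print(rank_y)  # [1, 2, 1, 1, 1, 2, 1], not the requested [1, 2, 3, 3, 3, 4, 5]
```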
answered Mar 11, 2024 at 7:02

This is a classic gaps-and-islands problem.

A typical solution is to use LAG to check for the previous row value, then a running filtered COUNT to create the numbering.

WITH YourTable(x, y) AS
(
 VALUES (0, 0), (0, 1), (0, 0), (1, 0), (1, 0), (1, 1), (2, 0)
),
Lagged AS
(
 SELECT *,
 LAG(Y) OVER (ORDER BY X) AS PrevY -- Y of the previous row (NULL for the first row)
 FROM YourTable
)
SELECT
 X,
 Y,
 -- running count of rows whose Y differs from the previous row's Y
 COUNT(*) FILTER (WHERE PrevY <> Y OR PrevY IS NULL)
 OVER (ORDER BY (SELECT NULL) ROWS UNBOUNDED PRECEDING) AS rn
FROM Lagged;

Your ordering, as you have noted, is non-deterministic, and you may get different results each time.

Use a deterministic sort (you probably need another column, such as ID or date). For example:

 LAG(Y) OVER (ORDER BY X, Id) AS PrevY
 COUNT(*) FILTER (WHERE PrevY <> Y OR PrevY IS NULL)
 OVER (ORDER BY X, Id ROWS UNBOUNDED PRECEDING) AS rn
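Putting the deterministic version together end to end: a minimal sketch using Python's sqlite3 module, with SQLite standing in for PostgreSQL, a hypothetical id column as the tie-breaker, and the same ORDER BY in both windows so that prev_y really refers to the row counted just before. SUM(CASE ...) replaces COUNT(*) FILTER (...) only for compatibility with older SQLite builds; in PostgreSQL the FILTER form works as written.

```python
import sqlite3

# Assumptions: SQLite stands in for PostgreSQL, and "id" is a hypothetical
# column that records the intended row order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO tab (x, y) VALUES (?, ?)",
    [(0, 0), (0, 1), (0, 0), (1, 0), (1, 0), (1, 1), (2, 0)],
)

query = """
WITH lagged AS (
    SELECT id, x, y,
           LAG(y) OVER (ORDER BY x, id) AS prev_y  -- y of the previous row
    FROM tab
)
SELECT x, y, prev_y,
       -- running total of "new run" flags (1 when y differs from prev_y)
       SUM(CASE WHEN prev_y IS NULL OR prev_y <> y THEN 1 ELSE 0 END)
           OVER (ORDER BY x, id ROWS UNBOUNDED PRECEDING) AS rn
FROM lagged
ORDER BY x, id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # (x, y, prev_y, rn)

rn = [r[3] for r in rows]
print(rn)  # [1, 2, 3, 3, 3, 4, 5]
```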
answered Mar 11, 2024 at 10:41
