select row number increment on value change compared to previous return result

Question 1

I'm experimenting with PostgreSQL queries and am trying to figure out whether it's possible to write a query that increments a row number counter whenever the returned value changes.

Example SOURCE (where SOURCE can be a sorted/unsorted sub-query/table):

X|Y
---
0|0
0|1
0|0
1|0
1|0
1|1
2|0

Example increment on X: select X, Y, wishful_row_number(X) as Rn from SOURCE;

0,0,1
0,1,1
0,0,1
1,0,2
1,0,2
1,1,2
2,0,3

(row number changes every time X changes)

Example increment on Y: select X, Y, wishful_row_number(Y) as Rn from SOURCE;

0,0,1
0,1,2
0,0,3
1,0,3
1,0,3
1,1,4
2,0,5

(row number changes every time Y changes - goes to a bigger number/different string even if that value was already seen before)

So the Rn increment is not dependent on any "order by" or source sorting but just on the previously returned row.
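To pin down the intended semantics, here is a minimal Python sketch of the single-pass counter described above. It is only an illustration of the wished-for behavior, not a SQL solution; the function name `change_counter` and the `rows` list (the example SOURCE in the order it is listed) are assumptions made for this sketch.

```python
from typing import Hashable, Iterable, Iterator, Tuple


def change_counter(values: Iterable[Hashable]) -> Iterator[Tuple[Hashable, int]]:
    """Yield (value, rn); rn grows by 1 whenever value differs from the previous one."""
    rn = 0
    previous = object()  # sentinel that never equals a real value
    for value in values:
        if value != previous:
            rn += 1
        previous = value
        yield value, rn


# Example SOURCE rows, in the order the question lists them (assumed arrival order).
rows = [(0, 0), (0, 1), (0, 0), (1, 0), (1, 0), (1, 1), (2, 0)]

rn_on_x = [rn for _, rn in change_counter(x for x, _ in rows)]
rn_on_y = [rn for _, rn in change_counter(y for _, y in rows)]

print(rn_on_x)  # [1, 1, 1, 2, 2, 2, 3]
print(rn_on_y)  # [1, 2, 3, 3, 3, 4, 5]
```

The counter depends only on the order in which rows arrive, which is exactly the property a SQL query cannot rely on unless that order is made explicit.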

Can such a query be written (without stored procedures and, if possible, without temporary tables)?

EDIT: And yes, I know tables have no inherent ordering, which is exactly why I am trying to skip the "force me to order by" step and bind the Rn generation to the returned values (see my reply comment below).

And yes, I understand that something like that might not be possible in SQL on the grounds of "why would somebody in their right mind want to do that???", but as far as I'm concerned that's a SQL limitation and the answer is then "no, it can't be done", even if some people have a fetish for downvoting my question.

Question 2

Tables have no inherent ordering, so how do you differentiate between one row and the next? Unless you have some date or ID column, it's not going to work. In other words, how do you know whether the rows should be sorted 0,0 0,1 0,0 or 0,1 0,0 0,0?

Question 3

That's exactly the point... All the queries I've tried force me to order first, while I don't care about the order; I just want to add an extra counter column based on "whatever the ordering is". So if no ordering was set, I'd essentially get a random grouping generator. And if the sub-query already specified "order by", the grouping is the same as adding "INNER JOIN (SELECT Y, ROW_NUMBER() OVER (ORDER BY 0) - 1 AS Rn FROM some_table GROUP BY Y) USING(Y)".

Laurenz Albe · Answer 1 · 2024-03-11 07:02:28Z

The dense_rank() window function should be what you need:

WITH tab(x, y) AS (VALUES (0, 0), (0, 1), (0, 0), (1, 0), (1, 0), (1, 1), (2, 0))
SELECT x, y,
 dense_rank() OVER (ORDER BY x)
FROM tab;
 x │ y │ dense_rank 
═══╪═══╪════════════
 0 │ 0 │ 1
 0 │ 1 │ 1
 0 │ 0 │ 1
 1 │ 0 │ 2
 1 │ 0 │ 2
 1 │ 1 │ 2
 2 │ 0 │ 3
(7 rows)
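As a quick check of how dense_rank() behaves on each column, the following sketch runs the same idea against both X and Y. It assumes an in-memory SQLite database as a stand-in for PostgreSQL (window functions need SQLite 3.25 or later) and a hypothetical id column that records the order in which the question lists the rows; neither is part of the original answer.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")

# The id column is hypothetical: it only records the listed row order.
query = """
WITH src(id, x, y) AS (
    VALUES (1, 0, 0), (2, 0, 1), (3, 0, 0), (4, 1, 0),
           (5, 1, 0), (6, 1, 1), (7, 2, 0)
)
SELECT dense_rank() OVER (ORDER BY x) AS rank_x,
       dense_rank() OVER (ORDER BY y) AS rank_y
FROM src
ORDER BY id
"""
result = conn.execute(query).fetchall()
rank_x = [row[0] for row in result]
rank_y = [row[1] for row in result]

print(rank_x)  # [1, 1, 1, 2, 2, 2, 3]
print(rank_y)  # [1, 2, 1, 1, 1, 2, 1]
conn.close()
```

The ranks on X reproduce the first example from the question. The ranks on Y do not match the second example (1, 2, 3, 3, 3, 4, 5), because dense_rank() numbers distinct values rather than changes relative to the previous row.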

Charlieface · Answer 2 · 2024-03-11 10:41:57Z

This is a classic gaps-and-islands problem.

A typical solution is to use LAG to check for the previous row value, then a running filtered COUNT to create the numbering.

WITH YourTable(x, y) AS
(
 VALUES (0, 0), (0, 1), (0, 0), (1, 0), (1, 0), (1, 1), (2, 0)
),
Lagged AS
(
 SELECT *,
 LAG(Y) OVER (ORDER BY X) AS PrevY
 FROM YourTable
)
SELECT
 X,
 Y,
 COUNT(*) FILTER (WHERE PrevY <> Y OR PrevY IS NULL)
 OVER (PARTITION BY X ORDER BY (SELECT NULL) ROWS UNBOUNDED PRECEDING) AS rn
FROM Lagged;

Your ordering, as you have noted, is non-deterministic, and you may get different results each time.

Use a deterministic sort (you probably need another column, such as ID or date). For example:

 LAG(Y) OVER (ORDER BY X, Id) AS PrevY

 COUNT(*) FILTER (WHERE PrevY <> Y OR PrevY IS NULL)
 OVER (PARTITION BY X ORDER BY Id ROWS UNBOUNDED PRECEDING) AS rn
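Putting the pieces together, here is a hedged end-to-end sketch of the LAG plus running-count technique with a deterministic sort. It makes several assumptions that are not in the answer: a hypothetical id column holding the listed row order, numbering that runs across the whole result instead of restarting per X (so no PARTITION BY), a SUM over a CASE expression in place of COUNT(*) FILTER, and an in-memory SQLite database (3.25 or later) standing in for PostgreSQL. The SQL itself should also be valid in PostgreSQL.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")

query = """
WITH src(id, x, y) AS (
    -- id is hypothetical: it makes the row order explicit and deterministic
    VALUES (1, 0, 0), (2, 0, 1), (3, 0, 0), (4, 1, 0),
           (5, 1, 0), (6, 1, 1), (7, 2, 0)
),
lagged AS (
    -- previous row's y in id order; NULL for the first row
    SELECT id, x, y, LAG(y) OVER (ORDER BY id) AS prev_y
    FROM src
)
SELECT x, y,
       -- running count of rows where y differs from the previous row
       SUM(CASE WHEN prev_y IS NULL OR prev_y <> y THEN 1 ELSE 0 END)
           OVER (ORDER BY id ROWS UNBOUNDED PRECEDING) AS rn
FROM lagged
ORDER BY id
"""
numbered = conn.execute(query).fetchall()
for row in numbered:
    print(row)
conn.close()
```

With the rows ordered by id, the rn column comes out as 1, 2, 3, 3, 3, 4, 5, which is the "increment on Y" result the question asks for. Adding PARTITION BY X to the running sum, as in the fragments above, would restart the numbering within each X group instead.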
