I'm experimenting with PostgreSQL queries and am trying to figure out whether it's possible to write a query that increments a row number counter whenever a returned value changes.
Example SOURCE (where SOURCE can be a sorted/unsorted sub-query/table):
X|Y
---
0|0
0|1
0|0
1|0
1|0
1|1
2|0
Example increment on X: select X, Y, wishful_row_number(X) as Rn from SOURCE;
0,0,1
0,1,1
0,0,1
1,0,2
1,0,2
1,1,2
2,0,3
(row number changes every time X changes)
Example increment on Y: select X, Y, wishful_row_number(Y) as Rn from SOURCE;
0,0,1
0,1,2
0,0,3
1,0,3
1,0,3
1,1,4
2,0,5
(row number changes every time Y changes - goes to a bigger number/different string even if that value was already seen before)
So the Rn increment is not dependent on any "order by" or source sorting but just on the previously returned row.
Can such a query be written (without stored procedures and, if possible, without temporary tables)?
EDIT: And yes, I know tables have no inherent ordering, which is exactly why I am trying to skip the "force me to order by" step and bind the Rn generation to the returned values (see my reply comment below).
And yes, I understand that something like that might not be possible in SQL based on the "why would somebody in their right mind want to do that???" but as far as I'm concerned that's the SQL limitation and the answer is "no, it can't be done" even if some people have the fetish of down voting my question.
2 Answers
The dense_rank() window function should be what you need:
WITH tab(x, y) AS (VALUES (0, 0), (0, 1), (0, 0), (1, 0), (1, 0), (1, 1), (2, 0))
SELECT x, y,
dense_rank() OVER (ORDER BY x)
FROM tab;
x │ y │ dense_rank
═══╪═══╪════════════
0 │ 0 │ 1
0 │ 1 │ 1
0 │ 0 │ 1
1 │ 0 │ 2
1 │ 0 │ 2
1 │ 1 │ 2
2 │ 0 │ 3
(7 rows)
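One caveat, shown as a small sketch on the same sample data: dense_rank() numbers distinct values in sort order, so it reproduces the X example, but applied to y it would not give the run-based numbering from the question's second example, because a value that reappears gets the same rank again.

```sql
WITH tab(x, y) AS (VALUES (0, 0), (0, 1), (0, 0), (1, 0), (1, 0), (1, 1), (2, 0))
SELECT x, y,
       -- every row with y = 0 gets rank 1 and every row with y = 1 gets rank 2,
       -- no matter where the row appears in the source
       dense_rank() OVER (ORDER BY y)
FROM tab;
```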
This is a classic gaps-and-islands problem.
A typical solution is to use LAG to check the previous row's value, then a running filtered COUNT to create the numbering.
WITH YourTable(x, y) AS
(
    VALUES (0, 0), (0, 1), (0, 0), (1, 0), (1, 0), (1, 1), (2, 0)
),
Lagged AS
(
    SELECT *,
        LAG(Y) OVER (ORDER BY X) AS PrevY
    FROM YourTable
)
SELECT
    X,
    Y,
    COUNT(*) FILTER (WHERE PrevY <> Y OR PrevY IS NULL)
        OVER (PARTITION BY X ORDER BY (SELECT NULL) ROWS UNBOUNDED PRECEDING) AS rn
FROM Lagged;
Your ordering, as you have noted, is non-deterministic, and you may get different results each time.
Use a deterministic sort (you probably need another column, such as ID or date). For example:
LAG(Y) OVER (ORDER BY X, Id) AS PrevY
COUNT(*) FILTER (WHERE PrevY <> Y OR PrevY IS NULL)
OVER (PARTITION BY X ORDER BY Id ROWS UNBOUNDED PRECEDING) AS rn
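Put together, the two fragments might look like the following sketch. It assumes a hypothetical id column that records the intended row order (the sample data in the question has no such column). Unlike the query above, it orders both window functions by id alone and drops PARTITION BY X, so the counter keeps running across the whole result, as in the question's Y example.

```sql
-- Sketch only: id is a hypothetical column that fixes the row order.
WITH YourTable(id, x, y) AS
(
    VALUES (1, 0, 0), (2, 0, 1), (3, 0, 0), (4, 1, 0), (5, 1, 0), (6, 1, 1), (7, 2, 0)
),
Lagged AS
(
    SELECT *,
        -- y of the previous row in id order; NULL for the first row
        LAG(y) OVER (ORDER BY id) AS prev_y
    FROM YourTable
)
SELECT
    x,
    y,
    -- running count of rows whose y differs from the previous row's y
    COUNT(*) FILTER (WHERE prev_y <> y OR prev_y IS NULL)
        OVER (ORDER BY id ROWS UNBOUNDED PRECEDING) AS rn
FROM Lagged
ORDER BY id;
```

With this sample data rn comes out as 1, 2, 3, 3, 3, 4, 5. Replacing y with x in the LAG call and in the FILTER condition gives 1, 1, 1, 2, 2, 2, 3, which matches the X example.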
0,0
0,1
0,0
or
0,1
0,0
0,0
?