Using DISTINCT in window function with OVER

Question 1

I'm trying to migrate a query from Oracle to SQL Server 2014.

Here is my query which works great in Oracle:

select
count(distinct A) over (partition by B) / count(*) over() as A_B
from MyTable

Here is the error I got after trying to run this query in SQL Server 2014:

Use of DISTINCT is not allowed with the OVER clause

Does anyone know what the problem is? Is this kind of query possible in SQL Server? Please advise.

Question 2

Do you actually need one row in the result for every row in MyTable? Or are distinct rows enough? And you don't need to consider the division by zero error if there are no rows in MyTable?

Question 3

Does anyone know what the problem is? Is this kind of query possible in SQL Server?

No, it isn't currently implemented. See the following Connect item request:

OVER clause enhancement request - DISTINCT clause for aggregate functions

Another possible variant would be:

SELECT M.A,
       M.B,
       T.A_B
FROM   MyTable M
       JOIN (SELECT CAST(COUNT(DISTINCT A) AS NUMERIC(18,8)) / SUM(COUNT(*)) OVER() AS A_B,
                    B
             FROM   MyTable
             GROUP  BY B) T
         ON EXISTS (SELECT M.B INTERSECT SELECT T.B)

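The cast to NUMERIC is there to avoid integer division. The reason for the join clause is explained here; in short, M.B = T.B is never true when B is NULL, whereas INTERSECT treats two NULLs as equal, so rows with a NULL B still match their group.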

It can be replaced with ON M.B = T.B OR (M.B IS NULL AND T.B IS NULL) if preferred (or simply ON M.B = T.B if the B column is not nullable).
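A minimal, self-contained sketch of the answer above using the explicit `OR ... IS NULL` join predicate. The temp table `#MyTable`, its sample rows, and the scratch table `#Result` are hypothetical, invented only for illustration; the `OBJECT_ID` checks are used instead of `DROP TABLE IF EXISTS` because the latter needs SQL Server 2016. With these rows a plain `ON M.B = T.B` would drop the two rows whose B is NULL, while the NULL-safe predicate keeps all eight.

```sql
-- Hypothetical sample data (not from the question); A and B are nullable.
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL DROP TABLE #MyTable;
CREATE TABLE #MyTable (A INT NULL, B INT NULL);
INSERT INTO #MyTable (A, B)
VALUES (1, 10), (1, 10), (2, 10), (NULL, 10),
       (3, 20), (3, 20),
       (NULL, NULL), (4, NULL);

-- Same query as the answer, but with the explicit NULL-safe join predicate.
-- Results are kept in #Result so they can be checked afterwards.
IF OBJECT_ID('tempdb..#Result') IS NOT NULL DROP TABLE #Result;
SELECT M.A,
       M.B,
       T.A_B
INTO   #Result
FROM   #MyTable M
       JOIN (SELECT CAST(COUNT(DISTINCT A) AS NUMERIC(18, 8))
                    / SUM(COUNT(*)) OVER () AS A_B,  -- distinct A per B / total rows
                    B
             FROM   #MyTable
             GROUP  BY B) T
         ON M.B = T.B
            OR (M.B IS NULL AND T.B IS NULL);        -- keep the NULL-B group

SELECT A, B, A_B FROM #Result ORDER BY B, A;
```

With these rows the B = 10 group has two distinct non-NULL values of A out of eight rows in total, so its rows get 0.25; the other two groups get 0.125.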

Question 4

This gives the distinct count of A partitioned by B:

dense_rank() over (partition by B order by A) 
+ dense_rank() over (partition by B order by A desc) 
- 1
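A short worked derivation of why this expression has the same value on every row of a partition. This is a sketch; the symbols below (d, k, and dr for dense_rank in each direction) are introduced here for illustration only.

```latex
\begin{aligned}
&\text{Partition } B \text{ holds } d \text{ distinct values } v_1 < v_2 < \dots < v_d \text{ of } A. \\
&\text{For a row with } A = v_k:\quad
  \mathrm{dr}_{\mathrm{asc}} = k, \qquad \mathrm{dr}_{\mathrm{desc}} = d - k + 1 \\
&\mathrm{dr}_{\mathrm{asc}} + \mathrm{dr}_{\mathrm{desc}} - 1 = k + (d - k + 1) - 1 = d
\end{aligned}
```

For example, with A values {1, 2, 5}, the row with A = 2 gets 2 + 2 - 1 = 3. dense_rank ranks NULL like any other value (SQL Server sorts NULLs first in ascending order), so d includes NULL when the partition contains one. By contrast, the absolute difference of the two ranks plus 1 equals |2k - d - 1| + 1, which depends on k: for d = 3 it gives 3, 1, 3 across the three values.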

Question 5

Interesting solution. I suppose it should have a disclaimer that it works only when A is non-nullable (as I think it counts NULLs as well).

Question 6

It should be abs(dense_rank - dense_rank) + 1 I believe.
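A self-contained sketch of the dense_rank expression with a NULL correction, run against a hypothetical `#MyTable` with invented sample rows. Instead of the first_value check used in a later answer, this variant (not taken from the thread) subtracts `MAX(CASE WHEN A IS NULL THEN 1 ELSE 0 END) OVER (PARTITION BY B)`, which is 1 exactly when the partition contains a NULL A.

```sql
-- Hypothetical sample data (not from the question).
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL DROP TABLE #MyTable;
CREATE TABLE #MyTable (A INT NULL, B INT NULL);
INSERT INTO #MyTable (A, B)
VALUES (1, 10), (1, 10), (2, 10), (NULL, 10),
       (3, 20), (3, 20),
       (NULL, NULL), (4, NULL);

IF OBJECT_ID('tempdb..#Result') IS NOT NULL DROP TABLE #Result;
SELECT A,
       B,
       -- number of distinct values of A in the partition, NULL included
       DENSE_RANK() OVER (PARTITION BY B ORDER BY A)
       + DENSE_RANK() OVER (PARTITION BY B ORDER BY A DESC)
       - 1
       -- subtract 1 when the partition contains a NULL A
       - MAX(CASE WHEN A IS NULL THEN 1 ELSE 0 END) OVER (PARTITION BY B)
         AS DistinctA
INTO   #Result
FROM   #MyTable;

SELECT A, B, DistinctA FROM #Result ORDER BY B, A;
```

The result keeps one row per row of the table, and DistinctA should agree with `COUNT(DISTINCT A)` computed per B with an ordinary GROUP BY.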

Question 7

You can take the max value of dense_rank() to get the distinct count of A partitioned by B.

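To handle the case where A can contain NULLs, you can use first_value to check whether a NULL is present in the partition and subtract 1 if it is, as Martin Smith suggested in the comments. This works because SQL Server sorts NULLs first in ascending order, so first_value(A) ordered by A returns NULL whenever the partition contains one.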

select (max(T.DenseRankA) over(partition by T.B) - 
 cast(iif(T.FirstA is null, 1, 0) as numeric(18, 8))) / T.TotalCount as A_B
from (
 select dense_rank() over(partition by T.B order by T.A) DenseRankA,
 first_value(T.A) over(partition by T.B order by T.A) as FirstA,
 count(*) over() as TotalCount,
 T.A,
 T.B
 from MyTable as T
 ) as T

Question 8

Try doing a subquery, grouping by A, B, and including the count. Then in your outer query, your count(distinct) becomes a regular count, and your count(*) becomes a sum(cnt).

select
count(A) over (partition by B) * 1.0 / 
 sum(cnt) over() as A_B
from
(select A, B, count(*) as cnt
 from MyTable
 group by A, B) as partial;
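A self-contained sketch of this approach against a hypothetical `#MyTable` with invented sample rows (the derived-table alias `grouped` is arbitrary). Because the outer query runs over the grouped rows, it returns one row per distinct (A, B) pair rather than one row per row of MyTable; `COUNT(A)` skips the group whose A is NULL, matching how `COUNT(DISTINCT A)` ignores NULLs.

```sql
-- Hypothetical sample data (not from the question).
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL DROP TABLE #MyTable;
CREATE TABLE #MyTable (A INT NULL, B INT NULL);
INSERT INTO #MyTable (A, B)
VALUES (1, 10), (1, 10), (2, 10), (NULL, 10),
       (3, 20), (3, 20),
       (NULL, NULL), (4, NULL);

IF OBJECT_ID('tempdb..#Result') IS NOT NULL DROP TABLE #Result;
SELECT A,
       B,
       COUNT(A) OVER (PARTITION BY B) * 1.0   -- distinct non-NULL A per B
       / SUM(cnt) OVER () AS A_B              -- total rows of #MyTable
INTO   #Result
FROM   (SELECT A, B, COUNT(*) AS cnt
        FROM   #MyTable
        GROUP  BY A, B) AS grouped;

SELECT A, B, A_B FROM #Result ORDER BY B, A;
```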

Question 9

For now, SQL Server does not allow using DISTINCT with windowed functions.

But once you remember how windowed functions work (in simplistic terms: they're applied to the result set of the query, after GROUP BY), you can work around that:

select B,
min(count(distinct A)) over (partition by B) * 1.0
 / sum(count(*)) over() as A_B -- sum (not max) gives the total row count; * 1.0 avoids integer division
from MyTable
group by B

Question 10

It's fascinating that this works, but does it accomplish anything more than a simple count(distinct A), since you are already grouping by B?

Question 11

@ScottEdwards2000 - That wasn't the question. ;-) But I share the sentiment, so to answer: I don't know. I did a lot of work using windowed functions, but not for that. At a glance, I have no idea what the point of this calculation is, so the only way to find out is to test it.
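Taking up the suggestion to test it, here is a sketch against a hypothetical `#MyTable` (invented sample rows) that computes both forms side by side. Because window functions are evaluated after GROUP BY, each B partition here contains exactly one grouped row, so `MIN(...) OVER (PARTITION BY B)` just returns that row's `COUNT(DISTINCT A)` and the two columns should match. The denominator is `SUM(COUNT(*)) OVER ()`, the total number of rows, which is what `count(*) over()` computes in the original Oracle query.

```sql
-- Hypothetical sample data (not from the question).
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL DROP TABLE #MyTable;
CREATE TABLE #MyTable (A INT NULL, B INT NULL);
INSERT INTO #MyTable (A, B)
VALUES (1, 10), (1, 10), (2, 10), (NULL, 10),
       (3, 20), (3, 20),
       (NULL, NULL), (4, NULL);

IF OBJECT_ID('tempdb..#Result') IS NOT NULL DROP TABLE #Result;
SELECT B,
       -- windowed form: each partition holds a single grouped row
       MIN(COUNT(DISTINCT A)) OVER (PARTITION BY B) * 1.0
       / SUM(COUNT(*)) OVER () AS WindowedA_B,
       -- plain aggregate: B is already the grouping key
       COUNT(DISTINCT A) * 1.0
       / SUM(COUNT(*)) OVER () AS PlainA_B
INTO   #Result
FROM   #MyTable
GROUP  BY B;

SELECT B, WindowedA_B, PlainA_B FROM #Result ORDER BY B;
```

Both forms return one row per B (including the NULL group) rather than one row per row of MyTable.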

Accepted answer (reproduced above as Question 3): Martin Smith, 2015-01-11 22:14:22Z
