12

I have a table with two columns. I want to count the distinct values in Col_B over (conditioned by) Col_A.

MyTable

Col_A | Col_B 
A | 1
A | 1
A | 2
A | 2
A | 2
A | 3
b | 4
b | 4
b | 5

Expected Result

Col_A | Col_B | Result
A | 1 | 3
A | 1 | 3
A | 2 | 3
A | 2 | 3
A | 2 | 3
A | 3 | 3
b | 4 | 2
b | 4 | 2
b | 5 | 2

I tried the following code

select *, 
count (distinct col_B) over (partition by col_A) as 'Result'
from MyTable

count (distinct col_B) is not working. How can I rewrite the count function to count distinct values?

Paul White
asked Jun 4, 2019 at 14:29
0

5 Answers

22

This is how I'd do it:

SELECT *
FROM #MyTable AS mt
CROSS APPLY (
    SELECT COUNT(DISTINCT mt2.Col_B) AS dc
    FROM #MyTable AS mt2
    WHERE mt2.Col_A = mt.Col_A
    -- GROUP BY mt2.Col_A
) AS ca;

The GROUP BY clause is redundant given the data provided in the question, but may give you a better execution plan. See the follow-up Q & A "CROSS APPLY produces outer join".
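For reference, here is a sketch of the same query with the GROUP BY uncommented, run against the question's sample rows. The inline VALUES list standing in for #MyTable and the Result alias are assumptions for illustration only. One behavioural difference worth checking on your own data: with GROUP BY, the inner query returns no row at all when nothing matches (for example, when Col_A is NULL), so CROSS APPLY drops that outer row; without it, the scalar aggregate always returns one row with a count of 0.

```sql
-- Sample rows from the question, inlined so the batch is self-contained.
WITH MyTable AS (
    SELECT v.Col_A, v.Col_B
    FROM (VALUES ('A', 1), ('A', 1), ('A', 2), ('A', 2), ('A', 2), ('A', 3),
                 ('b', 4), ('b', 4), ('b', 5)) AS v (Col_A, Col_B)
)
SELECT mt.Col_A, mt.Col_B, ca.dc AS Result
FROM MyTable AS mt
CROSS APPLY (
    SELECT COUNT(DISTINCT mt2.Col_B) AS dc
    FROM MyTable AS mt2
    WHERE mt2.Col_A = mt.Col_A
    GROUP BY mt2.Col_A   -- the clause that is commented out in the answer
) AS ca;
-- Result is 3 on every 'A' row and 2 on every 'b' row.
```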

Consider voting for "OVER clause enhancement request - DISTINCT clause for aggregate functions" on the feedback site if you would like that feature added to SQL Server.

Paul White
answered Jun 4, 2019 at 16:10
0
7

You can emulate it by using dense_rank, and then pick the maximum rank for each partition:

select col_a, col_b, max(rnk) over (partition by col_a) as Result
from (
    select col_a, col_b
         , dense_rank() over (partition by col_a order by col_b) as rnk
    from #mytable
) as t;

You would need to exclude any nulls from col_b to get the same results as COUNT(DISTINCT).
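One way to handle the NULL case is sketched below; it is an illustration, not part of the original answer. It relies on SQL Server sorting NULLs as the lowest values, so ranking in descending order gives the non-NULL values ranks 1..n and puts NULL last. The Demo data, the ranked CTE, and the distinct_count alias are made up for this example.

```sql
-- Hypothetical sample data: partition 'A' contains a NULL in col_b.
WITH Demo AS (
    SELECT v.col_a, v.col_b
    FROM (VALUES ('A', 1), ('A', 1), ('A', 2), ('A', 3), ('A', NULL),
                 ('b', 4), ('b', 5)) AS v (col_a, col_b)
),
ranked AS (
    SELECT col_a, col_b,
           -- DESC: non-NULL values take ranks 1..n, NULL (if any) takes n + 1
           DENSE_RANK() OVER (PARTITION BY col_a ORDER BY col_b DESC) AS rnk
    FROM Demo
)
SELECT col_a, col_b,
       -- ignore the rank of NULL rows; fall back to 0 for an all-NULL partition
       COALESCE(MAX(CASE WHEN col_b IS NOT NULL THEN rnk END)
                    OVER (PARTITION BY col_a), 0) AS distinct_count
FROM ranked;
-- distinct_count is 3 on every 'A' row (including the NULL row) and 2 on every 'b' row.
```

The aggregate skips the NULL produced by the CASE expression, so SQL Server may emit its usual "Null value is eliminated by an aggregate" warning.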

Paul White
answered Jun 4, 2019 at 19:12
0
7

This is, in a way, an extension to Lennart's solution, but it is so ugly that I dare not suggest it as an edit. The goal here is to get the results without a derived table. There may never be the need for that, and combined with the ugliness of the query the whole endeavour may seem like a wasted effort. I still wanted to do this as an exercise, though, and would now like to share my result:

SELECT
  Col_A,
  Col_B,
  DistinctCount = DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B ASC )
                + DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B DESC)
                - 1
                - CASE COUNT(Col_B) OVER (PARTITION BY Col_A)
                    WHEN COUNT( * ) OVER (PARTITION BY Col_A)
                    THEN 0
                    ELSE 1
                  END
FROM
  dbo.MyTable
;

The core part of the calculation is this (and I would first of all like to note that the idea is not mine; I learned about this trick elsewhere):

 DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B ASC )
+ DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B DESC)
- 1

This expression can be used without any change if the values in Col_B are guaranteed to never have nulls. If the column can have nulls, however, you need to account for that, and that is exactly what the CASE expression is there for. It compares the number of rows per partition with the number of Col_B values per partition. If the numbers differ, it means that some rows have a null in Col_B and, therefore, the initial calculation (DENSE_RANK() ... + DENSE_RANK() - 1) needs to be reduced by 1.
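To see why the core formula works, it may help to look at the two ranks side by side. In a partition with n distinct values, the k-th smallest value has ascending rank k and descending rank n - k + 1, so their sum minus 1 is always n. The sketch below uses the question's sample rows inlined as a VALUES list; the RankAsc and RankDesc column names are made up for illustration.

```sql
SELECT
    v.Col_A,
    v.Col_B,
    RankAsc  = DENSE_RANK() OVER (PARTITION BY v.Col_A ORDER BY v.Col_B ASC),
    RankDesc = DENSE_RANK() OVER (PARTITION BY v.Col_A ORDER BY v.Col_B DESC),
    DistinctCount = DENSE_RANK() OVER (PARTITION BY v.Col_A ORDER BY v.Col_B ASC)
                  + DENSE_RANK() OVER (PARTITION BY v.Col_A ORDER BY v.Col_B DESC)
                  - 1
FROM (VALUES ('A', 1), ('A', 1), ('A', 2), ('A', 2), ('A', 2), ('A', 3),
             ('b', 4), ('b', 4), ('b', 5)) AS v (Col_A, Col_B);
-- Col_A  Col_B  RankAsc  RankDesc  DistinctCount
-- A      1      1        3         3
-- A      2      2        2         3
-- A      3      3        1         3
-- b      4      1        2         2
-- b      5      2        1         2
-- (duplicate rows omitted; row order is not guaranteed without ORDER BY)
```

A NULL in Col_B takes part in both rankings like any other value, which is why n comes out one too high in that case and the CASE correction is needed.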

Note that because the - 1 is part of the core formula, I chose to leave it like that. However, it can actually be incorporated into the CASE expression, in a futile attempt to make the entire solution look less ugly:

SELECT
  Col_A,
  Col_B,
  DistinctCount = DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B ASC )
                + DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B DESC)
                - CASE COUNT(Col_B) OVER (PARTITION BY Col_A)
                    WHEN COUNT( * ) OVER (PARTITION BY Col_A)
                    THEN 1
                    ELSE 2
                  END
FROM
  dbo.MyTable
;

This live demo at db<>fiddle.uk can be used to test both variations of the solution.

answered Jun 5, 2019 at 7:28
0
2
create table #MyTable (
    Col_A varchar(5),
    Col_B int
);

insert into #MyTable values ('A',1);
insert into #MyTable values ('A',1);
insert into #MyTable values ('A',2);
insert into #MyTable values ('A',2);
insert into #MyTable values ('A',2);
insert into #MyTable values ('A',3);
insert into #MyTable values ('B',4);
insert into #MyTable values ('B',4);
insert into #MyTable values ('B',5);

-- t1: collapse to one row per (Col_A, Col_B) pair, then count those rows per Col_A.
-- Note: a NULL Col_B forms its own group here, so it is counted as a value.
with t1 as (
    select t.Col_A,
           count(*) as cnt
    from (
        select Col_A,
               Col_B
        from #MyTable
        group by Col_A,
                 Col_B
    ) t
    group by t.Col_A
)
-- attach the per-Col_A count to every original row
select a.*,
       t1.cnt
from #MyTable a
join t1
  on a.Col_A = t1.Col_A;
answered Jun 4, 2019 at 15:22
1

An alternative if, like me, you're mildly allergic to correlated subqueries (Erik Darling's answer) and CTEs (kevinnwhat's answer).

Be aware that when nulls are thrown into the mix, none of these may work the way you would like them to (but it's fairly simple to modify them to taste).

Simple case:

--ignore the existence of nulls
SELECT [mt].*, [Distinct_B].[Distinct_B]
FROM #MyTable AS [mt]
INNER JOIN(
 SELECT [Col_A], COUNT(DISTINCT [Col_B]) AS [Distinct_B]
 FROM #MyTable
 GROUP BY [Col_A]
) AS [Distinct_B] ON
 [mt].[Col_A] = [Distinct_B].[Col_A]
;

Same as above, but with comments on what to change for null handling:

--customizable null handling
SELECT [mt].*, [Distinct_B].[Distinct_B]
FROM #MyTable AS [mt]
INNER JOIN(
    SELECT
        [Col_A],
        (
            COUNT(DISTINCT [Col_B])
            /*
            --uncomment if you also want to count Col_B NULL
            --as a distinct value
            +
            MAX(
                CASE
                    WHEN [Col_B] IS NULL
                    THEN 1
                    ELSE 0
                END
            )
            */
        )
        AS [Distinct_B]
    FROM #MyTable
    GROUP BY [Col_A]
) AS [Distinct_B] ON
    [mt].[Col_A] = [Distinct_B].[Col_A]
/*
--uncomment if you also want to include Col_A when it's NULL
OR
([mt].[Col_A] IS NULL AND [Distinct_B].[Col_A] IS NULL)
*/
answered Jun 5, 2019 at 20:42
