I saw another post that showed how to do this with a CASE statement, but it required defining the buckets in advance. How do I make it dynamic?
I know about the width_bucket and ntile functions, but I am having trouble getting them to do what I want.
I created the following table to get counts of duplicate names:
create table name_dupe as
select name,count(*) mycount
from mynametable
group by name
having count(*) > 1
I now want to find out how many duplicate names fall into certain count ranges. For example: how many names have between 101 and 200 duplicates, between 201 and 300, between 301 and 400, and so on? I may want to change the bucket ranges later.
How do I do this? I think it's with width_bucket and not ntile, but I can't get it to work. This is what I tried:
select distinct mycount,dupes
from (
select mycount,width_bucket(mycount,200,300,30) dupes
from name_dupe
)
where dupes > 0
order by mycount
Sample output:
mycount dupe
------- ----------
887 31
909 31
993 31
1 Answer
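You were getting close. WIDTH_BUCKET does not accept a subset of the ranges; you must specify the start and end of the whole range and the number of buckets that range is divided into. For 1-100, 101-200, 201-300, 301-400, and 401-500, the start and end are 1 and 500, and the range is divided into five buckets. Note that the upper bound is exclusive: a count of exactly 500 or more lands in overflow bucket 6, and a count below the start lands in underflow bucket 0, so use 501 as the end if 500 itself must be counted in the last bucket. This can be done as follows: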
SELECT WIDTH_BUCKET (mycount, 1, 500, 5) Bucket FROM name_dupe;
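To see where the bucket edges fall, here is a small probe. It is only a sketch and not part of the query above: it assumes the built-in collection type sys.odcinumberlist is available, and the probe values are made up to sit on the edges.

```sql
-- Feed a few boundary values through the same WIDTH_BUCKET call.
-- sys.odcinumberlist just supplies literal numbers, so no table is needed;
-- column_value is the pseudo-column that exposes each list element.
SELECT column_value AS mycount,
       WIDTH_BUCKET(column_value, 1, 500, 5) AS bucket
FROM TABLE(sys.odcinumberlist(0, 1, 100, 101, 200, 201, 499, 500))
ORDER BY mycount;
```

With these inputs, 100 and 101 should land in buckets 1 and 2, and 200 and 201 in buckets 2 and 3, which matches the 1-100, 101-200 ranges for whole-number counts. The values 0 and 500 fall outside the range and should come back as buckets 0 and 6.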
With the buckets in hand, we just need to count the hits in each bucket using a GROUP BY. Combined with the above, this looks as follows:
SELECT Bucket*100 - 99 "Start", Bucket*100 "End", Count(Bucket) "Duplicates In Range"
FROM
(
SELECT WIDTH_BUCKET (mycount, 1, 500, 5) Bucket FROM name_dupe
)
GROUP BY Bucket ORDER BY Bucket;
Creating the name_dupe table has some tradeoffs: it takes extra storage, and it is a snapshot that goes stale as soon as mynametable changes. To avoid them you could combine your first query with this one as follows:
SELECT Bucket*100 - 99 "Start", Bucket*100 "End", Count(Bucket) "Duplicates In Range"
FROM
(
SELECT WIDTH_BUCKET (count(name), 1, 500, 5) Bucket FROM mynametable
GROUP BY name
HAVING count(name) > 1
)
GROUP BY Bucket ORDER BY Bucket;
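Since the question asks for ranges that can be changed, one possible way to avoid hard-coding 1, 500, 5 and the 100 in the labels is to use bind variables. This is a sketch for SQL*Plus or SQLcl, not something from the original answer: the variable names lo, width and buckets are hypothetical, and it reads from the name_dupe table for brevity.

```sql
-- Hypothetical bind variables: lowest count, bucket width, number of buckets.
VARIABLE lo NUMBER
VARIABLE width NUMBER
VARIABLE buckets NUMBER

BEGIN
  :lo := 1; :width := 100; :buckets := 5;
END;
/

-- The upper bound is lo + width * buckets (501 here) because WIDTH_BUCKET
-- treats the upper bound as exclusive; every bucket is then exactly
-- :width wide.
SELECT :lo + (bucket - 1) * :width AS "Start",
       :lo + bucket * :width - 1   AS "End",
       COUNT(*)                    AS "Duplicates In Range"
FROM (
  SELECT WIDTH_BUCKET(mycount, :lo, :lo + :width * :buckets, :buckets) AS bucket
  FROM name_dupe
)
GROUP BY bucket
ORDER BY bucket;
```

Changing the three assignments is then enough to get different ranges. Rows for bucket 0 or bucket :buckets + 1, if any appear, are the underflow and overflow buckets, so their "Start" and "End" labels should not be read literally.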
Finally, here is another solution avoiding the use of WIDTH_BUCKET, which was apparently added in 9.2.
-- Labels come from the computed bucket number rather than rownum,
-- so an empty bucket in the middle does not shift the later ranges.
SELECT bucket*100 - 99 "Start", bucket*100 "End", d2 "Duplicates In Range" FROM
(
SELECT trunc((dupes+99)/100) bucket, count(*) d2 FROM
(
SELECT name, count(name) dupes FROM mynametable GROUP BY name
HAVING count(name) > 1
)
GROUP BY trunc((dupes+99)/100)
)
ORDER BY bucket;
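The same arithmetic idea can be made adjustable by keeping the settings in one place. The following is a sketch under stated assumptions, not the answer's own method: the params subquery and its columns lo and width are hypothetical names, and buckets are numbered from 0 instead of 1.

```sql
WITH params AS (
  -- Hypothetical settings: lowest count to bucket and the bucket width.
  SELECT 1 AS lo, 100 AS width FROM dual
),
dupes AS (
  -- Same duplicate count per name as in the queries above.
  SELECT name, COUNT(*) AS cnt
  FROM mynametable
  GROUP BY name
  HAVING COUNT(*) > 1
)
SELECT p.lo + b.bucket * p.width           AS "Start",
       p.lo + (b.bucket + 1) * p.width - 1 AS "End",
       b.hits                              AS "Duplicates In Range"
FROM (
  -- Zero-based bucket number: 1-100 is bucket 0, 101-200 is bucket 1, ...
  SELECT FLOOR((d.cnt - p.lo) / p.width) AS bucket,
         COUNT(*) AS hits
  FROM dupes d CROSS JOIN params p
  GROUP BY FLOOR((d.cnt - p.lo) / p.width)
) b CROSS JOIN params p
ORDER BY b.bucket;
```

Because the labels are computed from the bucket number itself, a range with no names simply does not appear in the output, and there is no fixed upper limit on the counts that can be bucketed. With the test data below, this should report one name in 1-100, three in 101-200, two in 201-300 and one in 301-400.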
Test Data:
--drop table mynametable;
create table mynametable as (
select 'a' name from dual connect by level <= 1
UNION ALL
select 'b' name from dual connect by level <= 100
UNION ALL
select 'c' name from dual connect by level <= 101
UNION ALL
select 'd' name from dual connect by level <= 120
UNION ALL
select 'e' name from dual connect by level <= 200
UNION ALL
select 'f' name from dual connect by level <= 250
UNION ALL
select 'g' name from dual connect by level <= 300
UNION ALL
select 'h' name from dual connect by level <= 400
);
create table name_dupe as (
select name,count(*) mycount from mynametable
group by name having count(*) > 1
);