I saw another post that showed how to do this with a CASE statement, with the buckets defined in advance. How do I make it dynamic?

I know about the width_bucket and ntile functions. I am having trouble getting them to do what I want.

I created the following table to get counts of duplicate names:

create table name_dupe as
select name,count(*) mycount
from mynametable
group by name
having count(*) > 1

I now want to find out how many duplicate names fall into certain count ranges. For example: how many names have between 101 and 200 duplicates, between 201 and 300, between 301 and 400, and so on? I may want to change the bucket ranges later.

How do I do this? I think it's with width_bucket and not ntile, but I can't get it to work. This is what I tried:

select distinct mycount,dupes 
from ( 
select mycount,width_bucket(mycount,200,300,30) dupes 
from name_dupe 
) 
where dupes > 0 
order by mycount

Sample output:

mycount dupes
------- -----
    887    31
    909    31
    993    31
asked Oct 6, 2011 at 18:21

1 Answer

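You were getting close. WIDTH_BUCKET does not let you specify only a subset of the ranges; you must give the start and end of the whole range and the number of equal-width buckets to divide it into. For 1-100, 101-200, 201-300, 301-400 and 401-500, the start and end are 1 and 500, divided into five buckets. Note that the upper bound is exclusive: a count of exactly 500 (or anything above it) lands in an overflow bucket numbered 6, so pass 501 as the upper bound if 500 must fall in the fifth bucket. That is also why your output shows 31: with a range of 200 to 300 split into 30 buckets, every count of 300 or more goes to overflow bucket 31. The basic call looks like this: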

SELECT WIDTH_BUCKET (mycount, 1, 500, 5) Bucket FROM name_dupe;
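
To make the bucket boundaries concrete, here is a rough Python sketch of the arithmetic that WIDTH_BUCKET performs, as I understand Oracle's documented behaviour (equal-width, closed-open intervals, bucket 0 for underflow and num_buckets + 1 for overflow). The function name width_bucket here is only an illustrative stand-in, not an Oracle API, and it handles only an ascending range.

```python
import math

def width_bucket(value, low, high, num_buckets):
    """Illustrative emulation of WIDTH_BUCKET for an ascending range (low < high)."""
    if value < low:
        return 0                    # underflow bucket
    if value >= high:
        return num_buckets + 1      # overflow bucket: the upper bound is exclusive
    # Equal-width, closed-open intervals: [low + k*w, low + (k+1)*w)
    return math.floor((value - low) * num_buckets / (high - low)) + 1

print(width_bucket(100, 1, 500, 5))     # 1
print(width_bucket(101, 1, 500, 5))     # 2
print(width_bucket(400, 1, 500, 5))     # 4
print(width_bucket(500, 1, 500, 5))     # 6 (overflow)
print(width_bucket(887, 200, 300, 30))  # 31 (overflow, as in the question's output)
```

The last call mirrors the question's width_bucket(mycount,200,300,30) attempt, where 887 is far above the upper bound of 300.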

With the bucket numbers in hand, we just need to count how many names fall into each bucket using GROUP BY. Combined with the query above, that gives:

SELECT Bucket*100 - 99 "Start", Bucket*100 "End", Count(Bucket) "Duplicates In Range" 
FROM
(
 SELECT WIDTH_BUCKET (mycount, 1, 500, 5) Bucket FROM name_dupe
)
GROUP BY Bucket ORDER BY Bucket;

Creating the name_dupe table has trade-offs: it takes extra storage, and it is a snapshot that goes stale as mynametable changes. To avoid that, you could combine your first query with this one as follows:

SELECT Bucket*100 - 99 "Start", Bucket*100 "End", Count(Bucket) "Duplicates In Range" 
FROM
(
 SELECT WIDTH_BUCKET (count(name), 1, 500, 5) Bucket FROM mynametable 
 GROUP BY name 
 HAVING count(name) > 1
)
GROUP BY Bucket ORDER BY Bucket;

Finally, here is another solution that avoids WIDTH_BUCKET, which was apparently added in 9.2. It computes the bucket number directly as trunc((dupes+99)/100), which maps 1-100 to 1, 101-200 to 2, and so on. To change the bucket size, change the 100 (and the 99, which is the width minus one):

SELECT bucket*100 - 99 "Start", bucket*100 "End", d2 "Duplicates In Range" FROM
( 
SELECT trunc((dupes+99)/100) bucket, count(*) d2 FROM 
 (
 SELECT name, count(name) dupes FROM mynametable GROUP BY name 
 HAVING count(name) > 1
 )
GROUP BY trunc((dupes+99)/100)
)
ORDER BY bucket;
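
Since the question asks how to make the ranges dynamic, here is a minimal sketch of the same ceiling-division idea with the bucket width passed in as a parameter. It uses Python's sqlite3 module purely so that it can be run anywhere; that is my substitution, not part of the answer. SQLite's `/` on integers is integer division, whereas in Oracle you would keep TRUNC (or CEIL) and supply the width as a bind variable. The function name bucket_histogram and the column aliases are hypothetical.

```python
import sqlite3

def bucket_histogram(conn, width):
    """Return (range_start, range_end, names_in_range) rows for the given bucket width."""
    sql = """
        SELECT (bucket - 1) * :w + 1 AS range_start,
               bucket * :w           AS range_end,
               COUNT(*)              AS names_in_range
        FROM (
            SELECT (COUNT(*) + :w - 1) / :w AS bucket  -- ceiling division on integers
            FROM mynametable
            GROUP BY name
            HAVING COUNT(*) > 1
        )
        GROUP BY bucket
        ORDER BY bucket
    """
    return conn.execute(sql, {"w": width}).fetchall()

# Rebuild the answer's test data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mynametable (name TEXT)")
rows_per_name = {"a": 1, "b": 100, "c": 101, "d": 120,
                 "e": 200, "f": 250, "g": 300, "h": 400}
for name, n in rows_per_name.items():
    conn.executemany("INSERT INTO mynametable VALUES (?)", [(name,)] * n)

print(bucket_histogram(conn, 100))  # [(1, 100, 1), (101, 200, 3), (201, 300, 2), (301, 400, 1)]
print(bucket_histogram(conn, 200))  # [(1, 200, 4), (201, 400, 3)]
```

Changing the ranges is then a matter of calling the function with a different width, with no bucket boundaries written into the query itself.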

Test Data:

--drop table mynametable;
create table mynametable as (
 select 'a' name from dual connect by level <= 1
 UNION ALL
 select 'b' name from dual connect by level <= 100
 UNION ALL
 select 'c' name from dual connect by level <= 101
 UNION ALL
 select 'd' name from dual connect by level <= 120
 UNION ALL
 select 'e' name from dual connect by level <= 200
 UNION ALL
 select 'f' name from dual connect by level <= 250
 UNION ALL
 select 'g' name from dual connect by level <= 300
 UNION ALL
 select 'h' name from dual connect by level <= 400
);
create table name_dupe as (
 select name,count(*) mycount from mynametable
 group by name having count(*) > 1
 );
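
As a cross-check, this short Python sketch works out by hand what the bucketing queries should return for the test data above. It is only a sanity check under the assumption that the buckets are 100 wide; the variable names are mine.

```python
from collections import Counter

# Rows per name in the test data; 'a' occurs once, so HAVING count(*) > 1 excludes it.
rows_per_name = {"a": 1, "b": 100, "c": 101, "d": 120,
                 "e": 200, "f": 250, "g": 300, "h": 400}
name_dupe = {name: n for name, n in rows_per_name.items() if n > 1}

# Ceiling division by 100: counts 1-100 -> bucket 1, 101-200 -> bucket 2, ...
histogram = Counter((n + 99) // 100 for n in name_dupe.values())

# (Start, End, Duplicates In Range), ordered by bucket
expected_rows = [(b * 100 - 99, b * 100, histogram[b]) for b in sorted(histogram)]
for row in expected_rows:
    print(row)
# (1, 100, 1)
# (101, 200, 3)
# (201, 300, 2)
# (301, 400, 1)
```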
answered Oct 6, 2011 at 21:21
