Creating buckets in Oracle SQL

Question 1

I saw another post that showed how to do this with a CASE statement, with the buckets defined in advance. How do I make it dynamic?

I know about the WIDTH_BUCKET and NTILE functions, but I am having trouble getting them to do what I want.

I created the following table to get counts of duplicate names:

create table name_dupe as
select name,count(*) mycount
from mynametable
group by name
having count(*) > 1

I now want to find out how many duplicate names fall into certain count ranges. For example: how many names have between 101 and 200 duplicates, between 201 and 300, between 301 and 400, and so on. I may want to change the bucket ranges later.

How do I do this? I think it's done with WIDTH_BUCKET and not NTILE, but I can't get it to work. This is what I tried:

select distinct mycount,dupes 
from ( 
select mycount,width_bucket(mycount,200,300,30) dupes 
from name_dupe 
) 
where dupes > 0 
order by mycount

Sample output:

mycount dupes
------- -----
    887    31
    909    31
    993    31

Answer 1

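You were getting close. WIDTH_BUCKET does not let you specify a subset of the ranges; you must give the start and end of the whole range and the number of buckets to divide it into. Your attempt asked for 30 buckets between 200 and 300, so every count of 300 or more fell into the overflow bucket, 31, which is what your sample output shows. For ranges of 1-100, 101-200, 201-300, 301-400, and 401-500, the start and end are 1 and 500 and the range is divided into five buckets. Note that the upper bound is exclusive: a count of exactly 500 lands in overflow bucket 6, so pass 501 as the upper bound if 500 must fall in the fifth bucket. The basic call looks like this: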

SELECT WIDTH_BUCKET (mycount, 1, 500, 5) Bucket FROM name_dupe;
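To see where the bucket boundaries fall, it can help to probe WIDTH_BUCKET with a few hand-picked values. The following is a minimal sketch, assuming Oracle 10g or later so that the built-in collection type sys.odcinumberlist is available; the probe values are arbitrary and not part of the original question.

```sql
-- Probe WIDTH_BUCKET(x, 1, 500, 5) at values around the bucket edges.
-- Bucket 0 is the underflow bucket; bucket 6 (num_buckets + 1) is the overflow bucket.
SELECT column_value AS mycount,
       WIDTH_BUCKET(column_value, 1, 500, 5) AS bucket
FROM TABLE(sys.odcinumberlist(0, 1, 100, 101, 200, 201, 499, 500))
ORDER BY column_value;
-- Resulting (mycount, bucket) pairs:
--   0 -> 0, 1 -> 1, 100 -> 1, 101 -> 2, 200 -> 2, 201 -> 3, 499 -> 5, 500 -> 6
```

The last row shows the exclusive upper bound: 500 falls outside the fifth bucket.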

With a bucket number assigned to each row, we just need to count the rows in each bucket using a GROUP BY. Combined with the query above, this gives:

SELECT Bucket*100 - 99 "Start", Bucket*100 "End", Count(Bucket) "Duplicates In Range" 
FROM
(
 SELECT WIDTH_BUCKET (mycount, 1, 500, 5) Bucket FROM name_dupe
)
GROUP BY Bucket ORDER BY Bucket;
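The grouped query only returns buckets that contain at least one row. If empty ranges should appear with a count of zero, one option is to generate the bucket numbers and outer-join the bucketed rows to them. This is a sketch rather than part of the original answer; it assumes five buckets of width 100 and uses 501 as the exclusive upper bound so that a count of 500 stays in bucket 5. The aliases b and d are illustrative.

```sql
-- Generate bucket numbers 1..5, then left-join the bucketed counts so that
-- empty buckets still produce a row.
SELECT b.bucket * 100 - 99 AS "Start",
       b.bucket * 100      AS "End",
       COUNT(d.bucket)     AS "Duplicates In Range"  -- counts only matched rows
FROM (SELECT LEVEL AS bucket FROM dual CONNECT BY LEVEL <= 5) b
LEFT JOIN (
  SELECT WIDTH_BUCKET(mycount, 1, 501, 5) AS bucket FROM name_dupe
) d ON d.bucket = b.bucket
GROUP BY b.bucket
ORDER BY b.bucket;
```

Counts below 1 or above 500 fall into the underflow and overflow buckets (0 and 6) and are dropped by the join.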

Creating the name_dupe table has tradeoffs: it takes up space and goes stale as mynametable changes. To avoid them, you could combine your first query with this one as follows:

SELECT Bucket*100 - 99 "Start", Bucket*100 "End", Count(Bucket) "Duplicates In Range" 
FROM
(
 SELECT WIDTH_BUCKET (count(name), 1, 500, 5) Bucket FROM mynametable 
 GROUP BY name 
 HAVING count(name) > 1
)
GROUP BY Bucket ORDER BY Bucket;

Finally, here is another solution that avoids WIDTH_BUCKET (which was apparently added in 9.2) by computing the bucket number arithmetically:

-- Label each range from the computed bucket number rather than rownum,
-- so the labels stay correct when a bucket in the middle is empty.
SELECT bucket*100 - 99 "Start", bucket*100 "End", d2 "Duplicates In Range" FROM
( 
SELECT trunc((dupes+99)/100) bucket, count(*) d2 FROM 
 (
 SELECT name, count(name) dupes FROM mynametable GROUP BY name 
 HAVING count(name) > 1
 )
GROUP BY trunc((dupes+99)/100)
)
ORDER BY bucket;
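The question also asks how to change the bucket ranges without rewriting the query. One way to do that, sketched here under assumptions and not taken from the original answer, is to keep the lower bound and bucket width in a single-row subquery and derive the labels from them. The names params, bucketed, lo, width, and idx are hypothetical.

```sql
-- Change lo and width in one place to re-bucket the counts.
WITH params AS (
  SELECT 1 AS lo, 100 AS width FROM dual
),
bucketed AS (
  -- idx is the zero-based bucket index of each duplicate count
  SELECT p.lo, p.width, FLOOR((d.mycount - p.lo) / p.width) AS idx
  FROM name_dupe d CROSS JOIN params p
)
SELECT lo + idx * width           AS "Start",
       lo + (idx + 1) * width - 1 AS "End",
       COUNT(*)                   AS "Duplicates In Range"
FROM bucketed
GROUP BY lo, width, idx
ORDER BY idx;
```

Because no upper bound is specified, there is no overflow bucket: large counts simply get a higher range. Setting width to 50, for example, yields ranges 1-50, 51-100, and so on.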

Test Data:

--drop table mynametable;
create table mynametable as (
 select 'a' name from dual connect by level <= 1
 UNION ALL
 select 'b' name from dual connect by level <= 100
 UNION ALL
 select 'c' name from dual connect by level <= 101
 UNION ALL
 select 'd' name from dual connect by level <= 120
 UNION ALL
 select 'e' name from dual connect by level <= 200
 UNION ALL
 select 'f' name from dual connect by level <= 250
 UNION ALL
 select 'g' name from dual connect by level <= 300
 UNION ALL
 select 'h' name from dual connect by level <= 400
);
create table name_dupe as (
 select name,count(*) mycount from mynametable
 group by name having count(*) > 1
 );
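As a quick check of the test data, the bucket assigned to each name can be listed directly. This is a sketch that assumes the two tables were created exactly as above; name 'a' is absent from name_dupe because it occurs only once.

```sql
-- Show the duplicate count and bucket number for every name in the test data.
SELECT name, mycount, WIDTH_BUCKET(mycount, 1, 500, 5) AS bucket
FROM name_dupe
ORDER BY mycount;
-- Resulting (name, mycount, bucket) rows:
--   b 100 1
--   c 101 2
--   d 120 2
--   e 200 2
--   f 250 3
--   g 300 3
--   h 400 4
```

Grouping these rows by bucket gives one name in 1-100, three in 101-200, two in 201-300, and one in 301-400.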

Leigh Riffel · Answer 1 · 2011-10-06 21:21:37Z
