0

I am trying to group multiple rows; aggregate distinct values in a column (for the group); and then check if that aggregate set is a subset of some other set. I spent several hours researching and testing different queries, but in the end I am still stuck!

I have put together a test example that I hope illustrates the setup:

create table testtable (
 BusinessDate date
 , GroupId varchar(10)
 , TypeCode varchar(10)
); 
insert into testtable values ('2019-10-15','ABC','T1');
insert into testtable values ('2019-10-15','ABC','T1'); -- TypeCode is not unique for a GroupId
insert into testtable values ('2019-10-15','ABC','T2');
insert into testtable values ('2019-10-15','DEF','T1');
insert into testtable values ('2019-10-15','GHI','T1');
insert into testtable values ('2019-10-15','GHI','T2');
insert into testtable values ('2019-10-15','GHI','T3');
insert into testtable values ('2019-10-15','JKL','T2');
insert into testtable values ('2019-10-15','JKL','T2');
insert into testtable values ('2019-10-15','JKL','T4');

The SuperSets that I want to test against are defined like this (I would prefer to encode these details in the query itself, but could put them in a table if absolutely necessary:

 select 'Type1' as ClassificationType, t.TypeCode from ( values ('T1'),('T2'),('T3') ) t(TypeCode)
union select 'Type2' as ClassificationType, t.TypeCode from ( values ('T1'),('T3') ) t(TypeCode)
union select 'Type3' as ClassificationType, t.TypeCode from ( values ('T2'),('T4'),('T5'),('T6') ) t(TypeCode)

I would like to write a query that extracts unique BusinessDate, GroupId, and TypeCode. It then checks the set of TypeCode (for each BusinessDate, GroupId) against each of the SuperSets defined above (Type1, Type2, Type3). If all the TypeId are found in a SuperSet, it returns that Type.

The SuperSet can have more values than in the set - that's OK. But all the values in the set must be contained in that SuperSet for it to be a match.

The desired output, manually generated for illustration:

 select '2019-10-15' as BusinessDate, 'ABC' as GroupId, 'Type1' as ClassificationType
union select '2019-10-15' as BusinessDate, 'DEF' as GroupId, 'Type1' as ClassificationType
union select '2019-10-15' as BusinessDate, 'GHI' as GroupId, 'Type1' as ClassificationType
union select '2019-10-15' as BusinessDate, 'JKL' as GroupId, 'Type3' as ClassificationType

I expect I would need to use INTERSECT to check the subset contains, but I am not sure how exactly. And I am not sure how to generalize the query to check for all GroupId at the same time.

I'd be grateful for some direction, please!

Here's a horrible query I wrote with all kinds of duplicate code and just checking for one GroupId:

-- variables to allow easy change of inputs for testing
declare @businessdate date = '2019-10-15';
declare @groupid varchar(10) = 'JKL' --'GHI' --'DEF' --'ABC'
select
 @businessDate BusinessDate
, @groupId GroupId
, case
 when (
 select count(*) from
 (
 select distinct TypeCode
 from testtable
 where BusinessDate = @businessdate
 and GroupId = @groupid 
 ) a
 )
 =
 (
 select count(*) from (
 -- duplicate from above
 select distinct TypeCode
 from testtable
 where BusinessDate = @businessdate
 and GroupId = @groupid 
 INTERSECT
 -- DEFINITION OF TYPE 1
 select t.TypeCode
 from ( values ('T1'),('T2'),('T3') ) t(TypeCode)
 ) b
 ) then 'Type1'
 when (
 select count(*) from
 (
 select distinct TypeCode
 from testtable
 where BusinessDate = @businessdate
 and GroupId = @groupid 
 ) a
 )
 =
 (
 select count(*) from (
 -- duplicate from above
 select distinct TypeCode
 from testtable
 where BusinessDate = @businessdate
 and GroupId = @groupid 
 INTERSECT
 -- DEFINITION OF TYPE 2
 select t.TypeCode
 from ( values ('T1'),('T3') ) t(TypeCode)
 ) b
 ) then 'Type2'
 when (
 select count(*) from
 (
 select distinct TypeCode
 from testtable
 where BusinessDate = @businessdate
 and GroupId = @groupid 
 ) a
 )
 =
 (
 select count(*) from (
 -- duplicate from above
 select distinct TypeCode
 from testtable
 where BusinessDate = @businessdate
 and GroupId = @groupid 
 INTERSECT
 -- DEFINITION OF TYPE 3
 select t.TypeCode
 from ( values ('T2'),('T4'),('T5'),('T6') ) t(TypeCode)
 ) b
 ) then 'Type3'
end as ClassificationType
asked Oct 16, 2019 at 19:04

1 Answer 1

1

I think you want to pursue it like this:

 with part1 as (
 select groupid, count(distinct typecode) as typecount from testtable
 group by groupid),
 part2 as (
 select groupid, count(distinct typecode) as typecount
 from yourdata
 group by groupid)
 select p1.groupid, p1.typecount
 from part1 p1
 inner join part2 p2 on p1.groupid = p2.groupid and p1.typecount = p2.typecount

Try rewriting your query as 2 datasets, then see if they match.

answered Oct 16, 2019 at 19:21
2
  • 1
    Thanks, I will have a look at this approach. But I am not clear how this checks for subset status? Not all the testtable TypeCode need to match. It's just that whatever is there needs to be contained in the comparison sets. So I can't just compare typecount? But I may be misunderstanding you, and maybe just need to think on this more. Commented Oct 16, 2019 at 20:21
  • yea, the final solution will probably be bigger - or maybe completely different. you can group by groupid, typecode and that might help. Its hard to follow along because the data, as it's presented, has to be unpivoted. another useful trick might be joining the table to itself. Commented Oct 17, 2019 at 13:09

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.