Grouping Subsets of Rows with Null Values within an Ordered Set

Question 1

Lets say we have an table where each row is a day, and it is ordered by this day column. Then we have left joined a membership data set showing which day the members were active (and not).

Lets say our current data set looks like this... Membership was active from day 3-5, inactive from 5-8, and active from day 9 onward etc.

DAY DATE MEMBER ACTIVE
 1 2017年01月01日 123 null
 2 2017年01月02日 123 null
 3 2017年01月03日 123 2017年01月03日
 4 2017年01月04日 123 2017年01月04日
 5 2017年01月05日 123 2017年01月05日
 6 2017年01月06日 123 null
 7 2017年01月07日 123 null
 8 2017年01月08日 123 null
 9 2017年01月09日 123 2017年01月09日
10 2017年01月10日 123 2017年01月10日

...so ACTIVE=null means membership was not active on those days.

With this data structure, I would like to get to a "collapsed" set, showing "spans" of time inactive/active:

MEMBER MIN(DATE) MAX(DATE) STATUS
123, 2017年01月01日, 2017年01月02日 INACTIVE
123, 2017年01月03日, 2017年01月05日 ACTIVE
123, 2017年01月06日, 2017年01月08日 INACTIVE
123, 2017年01月09日, 2017年01月10日 ACTIVE

I have tried using row_number() to somehow partition out the subsets of a certain status, but in this case, using min()/max() over the rows where ACTIVE is null, treats those as a single group, when in reality, there are several distinct spans of "inactive membership".

How can I distinguish the spans of inactive membership from one-another for grouping purposes? What technique can I use to achieve that output above?

Here is the script to generate the dummy source data:

CREATE TABLE ##SRC (ID INT, D DATE, MEMBER INT, ACTIVE DATE);
INSERT INTO ##SRC (ID, D, MEMBER, ACTIVE)
SELECT 1, '2017-01-01', 123, NULL UNION 
SELECT 2, '2017-01-02', 123, NULL UNION 
SELECT 3, '2017-01-03', 123, '2017-01-03' UNION 
SELECT 4, '2017-01-04', 123, '2017-01-04' UNION 
SELECT 5, '2017-01-05', 123, '2017-01-05' UNION 
SELECT 6, '2017-01-06', 123, NULL UNION 
SELECT 7, '2017-01-07', 123, NULL UNION 
SELECT 8, '2017-01-08', 123, NULL UNION 
SELECT 9, '2017-01-09', 123, '2017-01-09' UNION 
SELECT 10, '2017-01-10', 123, '2017-01-10' 
;

Question 2

This seems like a pretty straightforward gaps and islands problem, no?

Question 3

What version of SQL Server are you using? 2008, 2012, etc.?

Question 4

2008 r2. I will update the topic.

Question 5

Your sample data does not match your description and confused me at first. As sp_BlitzErik points out this is an island-and-gap problem. The solution is pretty straightforward if you have access to window functions. First, we can enumerate the table per member alone, let's call this full_order (this happens to be the same as day, but I'll ad it for generality). Second, we can enumerate the table per member and whether they were active on that day, let's call this partial_order

 select day, active, date, member
 , row_number() over (partition by member 
 order by day) as fullorder
 , row_number() over (partition by member
 ,case when active is null then 0 else 1 end
 order by day) as partialorder
 from src
DAY ACTIVE MEMBER FULLORDER PARTIALORDER 
----------- ---------- ----------- -------------------- --------------------
 1 - 123 1 1
 2 - 123 2 2
 3 01/03/2017 123 3 1
 4 01/04/2017 123 4 2
 5 01/05/2017 123 5 3
 6 - 123 6 3
 7 - 123 7 4
 8 - 123 8 5
 9 01/09/2017 123 9 4
 10 01/10/2017 123 10 5

If the difference between full_order and partial_order changes, it means that active has changed from null to a value, or vice versa. Therefore we can form a group with this difference. Within each such group we can pick the min(active) and max(active) to form an interval:

select member, grp, min(date), max(active) 
from (
 select day, active, date, member
 , row_number() over (partition by member order by day) 
 - row_number() over (partition by member
 ,case when active is null then 0 else 1 end 
 order by day) as grp 
 from src
) 
group by member, grp
MEMBER GRP 3 4 
----------- -------------------- ---------- ----------
 123 0 01/01/2017 - 
 123 2 01/03/2017 01/05/2017
 123 3 01/05/2017 - 
 123 5 01/08/2017 01/10/2017

It's probably easiest to add another level of nesting to get the desired result:

select member, min_active
 , coalesce(max_active, min_active) as max_active
 , case when max_active is null then 'INACTIVE' else 'ACTIVE' end as status 
from (
 select member, grp, min(date) as min_active, max(active) as max_active 
 from (
 select day, active, date, member
 , row_number() over (partition by member order by day) 
 - row_number() over (partition by member
 ,case when active is null then 0 else 1 end 
 order by day) as grp 
 from src
 ) 
 group by member, grp)
MEMBER MIN_ACTIVE MAX_ACTIVE STATUS 
----------- ---------- ---------- --------
 123 01/01/2017 01/01/2017 INACTIVE
 123 01/03/2017 01/05/2017 ACTIVE 
 123 01/05/2017 01/05/2017 INACTIVE
 123 01/08/2017 01/10/2017 ACTIVE

Question 6

this is great, it was grouping on the difference where I was coming up short... Thanks for pointing me in the right direction!

Question 7

Also I corrected the sample data...

Question 8

Then we have left joined a membership data set showing which day the members were active (and not).

Sorry If understood it wrong.you should have posted both table data then mention your problem.This way it is guaranteed to get best query.

I am using Recursive CTE in my query which can be avoided if you would have both the table.

anyway script is very short

CREATE TABLE #SRC (ID INT, D DATE, MEMBER INT, ACTIVE DATE);
INSERT INTO #SRC (ID, D, MEMBER, ACTIVE)
SELECT 1, '2017-01-01', 123, NULL UNION 
SELECT 2, '2017-01-02', 123, NULL UNION 
SELECT 3, '2017-01-03', 123, '2017-01-03' UNION 
SELECT 4, '2017-01-04', 123, '2017-01-04' UNION 
SELECT 5, '2017-01-05', 123, '2017-01-05' UNION 
SELECT 6, '2017-01-06', 123, NULL UNION 
SELECT 7, '2017-01-07', 123, NULL UNION 
SELECT 8, '2017-01-08', 123, NULL UNION 
SELECT 9, '2017-01-09', 123, '2017-01-09' UNION 
SELECT 10, '2017-01-10', 123, '2017-01-10' 
;with CTE as
(
select *, 1 flg from #SRC where id=1
union ALL
select s.*
,case when s.active is null and c.active is null then flg 
when s.active is not null and c.active is not null then flg 
else flg+1 end 
from #SRC S 
inner JOIN cte c on s.member=c.member
and s.id=c.id +1
)
select member,flg,MIN(D) MinD ,max(D)MaxD
from cte
group by member,flg
drop table #SRC

score 3 · Accepted Answer · 2017-11-15 21:04:18Z

Your sample data does not match your description and confused me at first. As sp_BlitzErik points out this is an island-and-gap problem. The solution is pretty straightforward if you have access to window functions. First, we can enumerate the table per member alone, let's call this full_order (this happens to be the same as day, but I'll ad it for generality). Second, we can enumerate the table per member and whether they were active on that day, let's call this partial_order

 select day, active, date, member
 , row_number() over (partition by member 
 order by day) as fullorder
 , row_number() over (partition by member
 ,case when active is null then 0 else 1 end
 order by day) as partialorder
 from src
DAY ACTIVE MEMBER FULLORDER PARTIALORDER 
----------- ---------- ----------- -------------------- --------------------
 1 - 123 1 1
 2 - 123 2 2
 3 01/03/2017 123 3 1
 4 01/04/2017 123 4 2
 5 01/05/2017 123 5 3
 6 - 123 6 3
 7 - 123 7 4
 8 - 123 8 5
 9 01/09/2017 123 9 4
 10 01/10/2017 123 10 5

If the difference between full_order and partial_order changes, it means that active has changed from null to a value, or vice versa. Therefore we can form a group with this difference. Within each such group we can pick the min(active) and max(active) to form an interval:

select member, grp, min(date), max(active) 
from (
 select day, active, date, member
 , row_number() over (partition by member order by day) 
 - row_number() over (partition by member
 ,case when active is null then 0 else 1 end 
 order by day) as grp 
 from src
) 
group by member, grp
MEMBER GRP 3 4 
----------- -------------------- ---------- ----------
 123 0 01/01/2017 - 
 123 2 01/03/2017 01/05/2017
 123 3 01/05/2017 - 
 123 5 01/08/2017 01/10/2017

It's probably easiest to add another level of nesting to get the desired result:

select member, min_active
 , coalesce(max_active, min_active) as max_active
 , case when max_active is null then 'INACTIVE' else 'ACTIVE' end as status 
from (
 select member, grp, min(date) as min_active, max(active) as max_active 
 from (
 select day, active, date, member
 , row_number() over (partition by member order by day) 
 - row_number() over (partition by member
 ,case when active is null then 0 else 1 end 
 order by day) as grp 
 from src
 ) 
 group by member, grp)
MEMBER MIN_ACTIVE MAX_ACTIVE STATUS 
----------- ---------- ---------- --------
 123 01/01/2017 01/01/2017 INACTIVE
 123 01/03/2017 01/05/2017 ACTIVE 
 123 01/05/2017 01/05/2017 INACTIVE
 123 01/08/2017 01/10/2017 ACTIVE

this is great, it was grouping on the difference where I was coming up short... Thanks for pointing me in the right direction!

Stack Exchange Network

Grouping Subsets of Rows with Null Values within an Ordered Set

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Grouping Subsets of Rows with Null Values within an Ordered Set

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions