Lets say we have an table where each row is a day, and it is ordered by this day column. Then we have left joined a membership data set showing which day the members were active (and not).
Lets say our current data set looks like this... Membership was active from day 3-5, inactive from 5-8, and active from day 9 onward etc.
DAY DATE MEMBER ACTIVE
1 2017年01月01日 123 null
2 2017年01月02日 123 null
3 2017年01月03日 123 2017年01月03日
4 2017年01月04日 123 2017年01月04日
5 2017年01月05日 123 2017年01月05日
6 2017年01月06日 123 null
7 2017年01月07日 123 null
8 2017年01月08日 123 null
9 2017年01月09日 123 2017年01月09日
10 2017年01月10日 123 2017年01月10日
...so ACTIVE=null
means membership was not active on those days.
With this data structure, I would like to get to a "collapsed" set, showing "spans" of time inactive/active:
MEMBER MIN(DATE) MAX(DATE) STATUS
123, 2017年01月01日, 2017年01月02日 INACTIVE
123, 2017年01月03日, 2017年01月05日 ACTIVE
123, 2017年01月06日, 2017年01月08日 INACTIVE
123, 2017年01月09日, 2017年01月10日 ACTIVE
I have tried using row_number() to somehow partition out the subsets of a certain status, but in this case, using min()
/max()
over the rows where ACTIVE is null, treats those as a single group, when in reality, there are several distinct spans of "inactive membership".
How can I distinguish the spans of inactive membership from one-another for grouping purposes? What technique can I use to achieve that output above?
Here is the script to generate the dummy source data:
CREATE TABLE ##SRC (ID INT, D DATE, MEMBER INT, ACTIVE DATE);
INSERT INTO ##SRC (ID, D, MEMBER, ACTIVE)
SELECT 1, '2017-01-01', 123, NULL UNION
SELECT 2, '2017-01-02', 123, NULL UNION
SELECT 3, '2017-01-03', 123, '2017-01-03' UNION
SELECT 4, '2017-01-04', 123, '2017-01-04' UNION
SELECT 5, '2017-01-05', 123, '2017-01-05' UNION
SELECT 6, '2017-01-06', 123, NULL UNION
SELECT 7, '2017-01-07', 123, NULL UNION
SELECT 8, '2017-01-08', 123, NULL UNION
SELECT 9, '2017-01-09', 123, '2017-01-09' UNION
SELECT 10, '2017-01-10', 123, '2017-01-10'
;
-
2This seems like a pretty straightforward gaps and islands problem, no?Erik Reasonable Rates Darling– Erik Reasonable Rates Darling2017年11月15日 18:12:03 +00:00Commented Nov 15, 2017 at 18:12
-
What version of SQL Server are you using? 2008, 2012, etc.?Scott Hodgin - Retired– Scott Hodgin - Retired2017年11月15日 19:09:28 +00:00Commented Nov 15, 2017 at 19:09
-
2008 r2. I will update the topic.GWR– GWR2017年11月15日 19:15:22 +00:00Commented Nov 15, 2017 at 19:15
2 Answers 2
Your sample data does not match your description and confused me at first. As sp_BlitzErik points out this is an island-and-gap problem. The solution is pretty straightforward if you have access to window functions. First, we can enumerate the table per member alone, let's call this full_order (this happens to be the same as day, but I'll ad it for generality). Second, we can enumerate the table per member and whether they were active on that day, let's call this partial_order
select day, active, date, member
, row_number() over (partition by member
order by day) as fullorder
, row_number() over (partition by member
,case when active is null then 0 else 1 end
order by day) as partialorder
from src
DAY ACTIVE MEMBER FULLORDER PARTIALORDER
----------- ---------- ----------- -------------------- --------------------
1 - 123 1 1
2 - 123 2 2
3 01/03/2017 123 3 1
4 01/04/2017 123 4 2
5 01/05/2017 123 5 3
6 - 123 6 3
7 - 123 7 4
8 - 123 8 5
9 01/09/2017 123 9 4
10 01/10/2017 123 10 5
If the difference between full_order and partial_order changes, it means that active has changed from null to a value, or vice versa. Therefore we can form a group with this difference. Within each such group we can pick the min(active) and max(active) to form an interval:
select member, grp, min(date), max(active)
from (
select day, active, date, member
, row_number() over (partition by member order by day)
- row_number() over (partition by member
,case when active is null then 0 else 1 end
order by day) as grp
from src
)
group by member, grp
MEMBER GRP 3 4
----------- -------------------- ---------- ----------
123 0 01/01/2017 -
123 2 01/03/2017 01/05/2017
123 3 01/05/2017 -
123 5 01/08/2017 01/10/2017
It's probably easiest to add another level of nesting to get the desired result:
select member, min_active
, coalesce(max_active, min_active) as max_active
, case when max_active is null then 'INACTIVE' else 'ACTIVE' end as status
from (
select member, grp, min(date) as min_active, max(active) as max_active
from (
select day, active, date, member
, row_number() over (partition by member order by day)
- row_number() over (partition by member
,case when active is null then 0 else 1 end
order by day) as grp
from src
)
group by member, grp)
MEMBER MIN_ACTIVE MAX_ACTIVE STATUS
----------- ---------- ---------- --------
123 01/01/2017 01/01/2017 INACTIVE
123 01/03/2017 01/05/2017 ACTIVE
123 01/05/2017 01/05/2017 INACTIVE
123 01/08/2017 01/10/2017 ACTIVE
-
this is great, it was grouping on the difference where I was coming up short... Thanks for pointing me in the right direction!GWR– GWR2017年11月16日 12:59:12 +00:00Commented Nov 16, 2017 at 12:59
-
Also I corrected the sample data...GWR– GWR2017年11月16日 13:50:40 +00:00Commented Nov 16, 2017 at 13:50
Then we have left joined a membership data set showing which day the members were active (and not).
Sorry If understood it wrong.you should have posted both table data then mention your problem.This way it is guaranteed to get best query.
I am using Recursive CTE in my query which can be avoided if you would have both the table.
anyway script is very short
CREATE TABLE #SRC (ID INT, D DATE, MEMBER INT, ACTIVE DATE);
INSERT INTO #SRC (ID, D, MEMBER, ACTIVE)
SELECT 1, '2017-01-01', 123, NULL UNION
SELECT 2, '2017-01-02', 123, NULL UNION
SELECT 3, '2017-01-03', 123, '2017-01-03' UNION
SELECT 4, '2017-01-04', 123, '2017-01-04' UNION
SELECT 5, '2017-01-05', 123, '2017-01-05' UNION
SELECT 6, '2017-01-06', 123, NULL UNION
SELECT 7, '2017-01-07', 123, NULL UNION
SELECT 8, '2017-01-08', 123, NULL UNION
SELECT 9, '2017-01-09', 123, '2017-01-09' UNION
SELECT 10, '2017-01-10', 123, '2017-01-10'
;with CTE as
(
select *, 1 flg from #SRC where id=1
union ALL
select s.*
,case when s.active is null and c.active is null then flg
when s.active is not null and c.active is not null then flg
else flg+1 end
from #SRC S
inner JOIN cte c on s.member=c.member
and s.id=c.id +1
)
select member,flg,MIN(D) MinD ,max(D)MaxD
from cte
group by member,flg
drop table #SRC
Explore related questions
See similar questions with these tags.