4

I have a table attendance as follows:

create table attendance (id int, name varchar(30), status varchar(10), date date) 

This table has the following records:

insert into attendance values (1,'John','absent','2016-03-01');
insert into attendance values (1,'John','absent','2016-03-02');
insert into attendance values (1,'John','absent','2016-03-03');
insert into attendance values (2,'Sam','present','2016-03-04');
insert into attendance values (3,'Sam','absent','2016-03-05');
insert into attendance values (1,'John','absent','2016-03-06');
insert into attendance values (1,'John','absent','2016-03-07');
insert into attendance values (1,'John','absent','2016-03-08');
insert into attendance values (1,'John','present','2016-03-09');
insert into attendance values (1,'John','absent','2016-03-10');
insert into attendance values (1,'John','absent','2016-03-11');
insert into attendance values (1,'John','present','2016-03-12'); 
insert into attendance values (1,'John','absent','2016-03-13'); 

Now I want to count that how many times a person has three consecutive absent records.
The result should be as follows:

id name count
 1 John 2 

because John is absent for three consecutive days two times.

If they have 6 absents on 6 consecutive dates, then it should count 2 i.e (3+3).

It should count 1 for 3 consecutive absents on 3 consecutive dates i.e if John is absent for 1,2 and 3 march, it should count 1. but if John is absent for 1,2 and 4 march then it should not count 1.
4 or 5 absents should be counted as 1.

If there is no entry for a date, It is considered 'Present'.

Any help is appreciated.

asked Aug 7, 2016 at 5:04
0

3 Answers 3

5

Another solution. Recursive part creates groups of rows of status "absent". The solution uses recursive CTEs and window functions.

WITH RECURSIVE a(id, name, date, n) as (
 SELECT id, name, q.date, 1 as n
 FROM (
 SELECT id, name, date,
 date-lag(date) OVER (PARTITION BY name ORDER BY date) as lag
 FROM attendance
 WHERE status='absent'
 ) q
 WHERE lag >1 or lag is null
 UNION
 SELECT a.id, a.name, a.date, a.n + 1
 FROM a
 JOIN attendance at ON (
 a.id = at.id and at.name = a.name and at.date = a.date + n)
 WHERE at.status='absent'
)
SELECT id, name, sum(long_absences) FROM (
 SELECT id, name, count(*)/3 as long_absences
 FROM a
 GROUP BY id, name, date having count(*) >=3
 ) as absences
GROUP BY id, name;
answered Aug 9, 2016 at 21:47
0
1

I beleive @user103153 solution is better so I'll just add this as an example of another technique. I did not have PostgreSQL to try with so some adjustments likely have to be made. I also use a calendar table, in psql either a recursive CTE or a generating expression can be used:

We start by constructing the domain of names and dates:

select distinct d, name 
from calendar c 
cross join attendance a

from there we can construct the status:

select nc.name, nc.d, a.date, a.name as aname
 , coalesce(a.status, 'present') as status 
from ( 
 select distinct d, name 
 from calendar c 
 cross join attendance a
) as nc
left join attendance a 
 on a.name = nc.name
 and a.date = nc.d

Then by using a common trick we can construct groups of consecutive name, date and status:

select name, d, status, row_number() over (partition by name order by d) 
 - row_number() over (partition by name, status order by d) as grp 
from (
 select nc.name, nc.d, a.date, a.name as aname
 , coalesce(a.status, 'present') as status 
 from ( 
 select distinct d, name 
 from calendar c cross join attendance a
 ) as nc
 left join attendance a 
 on a.name = nc.name
 and a.date = nc.d
) 

Next step is to determine the start and stop for each interval where status is 'absent':

select name, min(d), max(d) from (
 select name, d, status, row_number() over (partition by name order by d) 
 - row_number() over (partition by name, status order by d) as grp 
 from (
 select nc.name, nc.d, a.date, a.name as aname
 , coalesce(a.status, 'present') as status 
 from ( 
 select distinct d, name 
 from calendar c 
 cross join attendance a
 ) as nc
 left join attendance a 
 on a.name = nc.name
 and a.date = nc.d
 ) 
) 
where status = 'absent' 
group by grp, name order by min(d);
NAME 2 3 
------------------------------ ---------- ----------
John 03/01/2016 03/03/2016
Sam 03/05/2016 03/05/2016
John 03/06/2016 03/08/2016
John 03/10/2016 03/11/2016
John 03/13/2016 03/13/2016

Finally (left as an exercise :-) we take the length of the interval and integer divide that with 3 and filter out the ones with 0. Then it's just a matter of summing the number per name.

answered Aug 10, 2016 at 7:58
0

I think I cobbled together something that does what you're looking for. I used several subqueries to check for consecutive absences, and also to avoid double dipping on a long absence (meaning 6 days in a row should count ONLY the 2 combined marks from the first day, and not the individual marks on absences 2-4, 3-5, and 4-6).

Would be curious to see if there's a more elegant solution from someone else.

 select 
 z.name, 
 sum(z.absent_marks) 
 from (
 select 
 sub1.* 
 from (
 select 
 x.name, 
 x.adate, 
 floor(max(x.timeshift)/3)::integer as absent_marks
 from (
 select 
 a.name,
 a.date as adate,
 b.date as bdate,
 dense_rank() over (partition by a.name, a.date order by b.date) as timeshift
 from attendance a
 left join attendance b 
 on a."name" = b."name" 
 and a.date <= b.date
 and a.status = b.status 
 where a.status = 'absent'
 ) x 
 where x.timeshift - 1 = datediff('day', x.adate, x.bdate)
 group by x.name, x.adate
 ) sub1
 left join ( 
 select 
 x.name, 
 x.adate, 
 floor(max(x.timeshift)/3)::integer as absent_marks
 from (
 select 
 a.name,
 a.date as adate,
 b.date as bdate,
 dense_rank() over (partition by a.name, a.date order by b.date) as timeshift
 from attendance a
 left join attendance b 
 on a."name" = b."name" 
 and a.date <= b.date
 and a.status = b.status 
 where a.status = 'absent'
 ) x 
 where x.timeshift - 1 = datediff('day', x.adate, x.bdate)
 group by x.name, x.adate) sub2
 on sub1.adate = sub2.adate + 1
 and sub1.name = sub2.name 
 where (sub2.absent_marks = 0 or sub2.absent_marks is null)
 and sub1.absent_marks != 0
 ) z
 group by
 z.name
answered Aug 7, 2016 at 18:22
1
  • It did not give me the output. And instead of datediff() we have date_part in PostgreSQL. @peach Commented Aug 9, 2016 at 7:07

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.